# What we're covering in the Scikit-Learn Introduction

This notebook outlines the content covered in the Scikit-Learn Introduction.

It's a quick reference for the Scikit-Learn functions and modules used in each section.

The sections follow the Scikit-Learn workflow shown in the diagram below.

<img src="../images/sklearn-workflow-title.png"/>

## 0. Standard imports

In most machine learning projects, you'll see these libraries (Matplotlib, NumPy and pandas) imported at the top.

In [18]:
# Enabling inline display of matplotlib plots within the Jupyter Notebook
# '%matplotlib inline' is a magic function in IPython that renders the plots in a cell output within the Jupyter notebook itself.
# Without this line, plots might open in a new window or not display at all.
%matplotlib inline

# Importing the matplotlib.pyplot module
# This module provides a MATLAB-like plotting framework, which is widely used for plotting graphs and charts in Python.
# 'plt' is a commonly used alias for matplotlib.pyplot.
import matplotlib.pyplot as plt

# Importing the numpy module
# NumPy is a fundamental package for scientific computing in Python, providing support for arrays, mathematical functions, and more.
# 'np' is a widely used abbreviation for NumPy, making it more convenient to refer to in the code.
import numpy as np

# Importing the pandas module
# pandas is a powerful data manipulation and analysis library for Python, providing DataFrame objects for handling tabular data.
# 'pd' is the conventional alias for pandas, used for convenience in referencing the library.
import pandas as pd


We'll use 2 datasets for demonstration purposes.
* `heart_disease` - a classification dataset (predicting whether someone has heart disease or not)
* `california_housing_df` - a regression dataset (predicting the median house value of districts in California)

In [21]:
# Loading and Preparing Classification and Regression Datasets

# Classification Dataset: Heart Disease

# Loading the heart disease dataset into a pandas DataFrame.
# The dataset is read from a CSV file located at 'heart-disease.csv'.
# This dataset is typically used for classification tasks.
heart_disease = pd.read_csv("heart-disease.csv")

# Regression Dataset: California Housing

# Importing the fetch_california_housing function from sklearn.datasets.
# This function provides access to the California housing dataset,
# which is commonly used for regression tasks.
from sklearn.datasets import fetch_california_housing

# Fetching the California housing dataset.
# The dataset is loaded as a Bunch object (similar to a dictionary).
california_housing = fetch_california_housing()

# Converting the California housing dataset from a Bunch object to a pandas DataFrame.
# The 'data' attribute contains the features, and 'feature_names' attribute provides the column names.
# Creating a DataFrame 'california_housing_df' with features and column names.
california_housing_df = pd.DataFrame(california_housing.data, columns=california_housing.feature_names)

# Adding the target variable to the DataFrame.
# The 'target' attribute contains the dependent variable: the median house value
# of each California district, expressed in units of $100,000.
# It's added as a new column named 'target' in the DataFrame.
california_housing_df["target"] = california_housing.target
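
`fetch_california_housing()` downloads the data the first time it's called. As a sketch of an alternative loading pattern, the example below uses the bundled `load_diabetes` dataset (a stand-in chosen only because it needs no download; it isn't used elsewhere in this notebook) with `as_frame=True`. This returns the features and target already combined in one DataFrame through the `.frame` attribute. `fetch_california_housing` accepts the same argument, although its target column is named `MedHouseVal` rather than `target`.

```python
# Sketch: loading a bundled dataset directly as a pandas DataFrame
from sklearn.datasets import load_diabetes

# as_frame=True returns pandas objects; .frame holds the features plus the target
diabetes = load_diabetes(as_frame=True)
diabetes_df = diabetes.frame

print(diabetes_df.shape)        # (442, 11): 10 feature columns + 1 target column
print(diabetes_df.columns[-1])  # target
```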


## 1. Get the data ready

In [22]:
# Splitting the Heart Disease Dataset into Features and Target Variable

# Creating the features matrix (X) from the heart_disease DataFrame.
# This is done by dropping the 'target' column which is our dependent variable.
# The 'drop' method removes the specified column ('target') from the DataFrame.
# 'axis=1' indicates that we are dropping a column, not a row.
# The resulting DataFrame, assigned to X, contains only the independent variables (features).
X = heart_disease.drop("target", axis=1)

# Creating the target vector (y) from the heart_disease DataFrame.
# The target variable is what we are trying to predict or classify.
# In this case, 'y' is the 'target' column from the heart_disease DataFrame,
# which represents the presence or absence of heart disease.
y = heart_disease["target"]


In [23]:
# Splitting the Dataset into Training and Test Sets

# Importing the train_test_split function from sklearn.model_selection.
# This function randomly splits the dataset into training and test subsets.
# Holding out a test set lets us evaluate the model on data it hasn't seen during training, which is common practice in machine learning.
from sklearn.model_selection import train_test_split

# Splitting the features matrix (X) and target vector (y) into training and test sets.
# The 'train_test_split' function returns four subsets:
# X_train: part of the features used for training the model.
# X_test: part of the features used for testing the model.
# y_train: part of the target variable corresponding to X_train, used for training.
# y_test: part of the target variable corresponding to X_test, used for evaluating the model.
# By default, the function splits the data into 75% for training and 25% for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y)
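
Because `train_test_split` shuffles randomly, the split changes on every run, and so does every score computed from it. The sketch below shows the parameters that control the split: `test_size`, `random_state` and `stratify`. It uses a synthetic stand-in dataset generated with `make_classification` (an assumption for illustration, not the heart disease data):

```python
# Sketch: making train/test splits reproducible and class-balanced
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 303 rows, 13 features, binary target
X_demo, y_demo = make_classification(n_samples=303, n_features=13, random_state=42)

# test_size=0.2 holds out 20% of rows; random_state fixes the shuffle;
# stratify=y_demo keeps the class proportions similar in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
print(X_tr.shape, X_te.shape)  # (242, 13) (61, 13)

# Using the same random_state again gives exactly the same split
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
print(np.array_equal(X_te, X_te2))  # True
```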


## 2. Pick a model/estimator (to suit your problem)
To pick a model, we can use the [Scikit-Learn machine learning map](https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html).

<img src="../images/sklearn-ml-map.png" width=400/>

**Note:** Scikit-Learn refers to machine learning models and algorithms as estimators.
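
Every estimator shares the same interface: `fit()` to learn, `predict()` to make predictions and `score()` to evaluate. Swapping one model for another therefore usually means changing a single line. The minimal sketch below runs on synthetic data from `make_classification` (a stand-in, not the heart disease data), and the three models are chosen only for illustration:

```python
# Sketch: every estimator shares the same fit/predict/score interface
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (not the heart disease dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

estimators = {
    "Random forest": RandomForestClassifier(random_state=0),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbours": KNeighborsClassifier(),
}

scores = {}
for name, estimator in estimators.items():
    estimator.fit(X_tr, y_tr)                    # same method name for every model
    scores[name] = estimator.score(X_te, y_te)   # default classifier metric: mean accuracy
    print(f"{name}: {scores[name]:.3f}")
```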

In [24]:
# Importing and Instantiating a Random Forest Classifier for Classification Tasks

# Importing the RandomForestClassifier from sklearn.ensemble.
# RandomForestClassifier is an ensemble learning method used for classification tasks.
# It operates by constructing a multitude of decision trees at training time
# and, in Scikit-Learn's implementation, predicting the class with the highest
# probability once the individual trees' predicted probabilities are averaged.
from sklearn.ensemble import RandomForestClassifier

# Creating an instance of the RandomForestClassifier.
# Here, 'clf' (short for 'classifier') is instantiated as a RandomForestClassifier.
# Without any parameters, it will use the default settings of the classifier.
# You can customize it by passing parameters like n_estimators (number of trees),
# max_depth (maximum depth of each tree), and others, according to your dataset and task.
clf = RandomForestClassifier()
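
You can check directly that the forest combines its trees' predictions. The sketch below uses synthetic stand-in data from `make_classification`; `n_estimators=50` and the seeds are arbitrary choices. It averages the per-tree probabilities stored in `estimators_` and compares the result with the forest's own `predict_proba()`:

```python
# Sketch: how a fitted random forest combines its trees' predictions
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (not the heart disease dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=1)
forest = RandomForestClassifier(n_estimators=50, random_state=1).fit(X_demo, y_demo)

# Each fitted tree is available in forest.estimators_
tree_probas = np.array([tree.predict_proba(X_demo) for tree in forest.estimators_])
mean_proba = tree_probas.mean(axis=0)

# The forest's probabilities are the mean of the per-tree probabilities...
forest_proba = forest.predict_proba(X_demo)
print(np.allclose(mean_proba, forest_proba))  # True

# ...and the predicted class is the one with the highest averaged probability
argmax_labels = forest.classes_[forest_proba.argmax(axis=1)]
print(np.array_equal(argmax_labels, forest.predict(X_demo)))  # True
```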


In [25]:
# Importing and Instantiating a Random Forest Regressor for Regression Tasks

# Importing the RandomForestRegressor from sklearn.ensemble.
# RandomForestRegressor is an ensemble learning method primarily used for regression tasks.
# Similar to the classifier, it operates by constructing a multitude of decision trees at training time.
# For regression tasks, its prediction is the mean of the individual trees' predictions.

from sklearn.ensemble import RandomForestRegressor

# Creating an instance of the RandomForestRegressor.
# Here, 'model' is instantiated as a RandomForestRegressor.
# The default settings are used for this instance, but it can be customized with various parameters.
# Parameters like n_estimators (number of trees), max_depth (maximum depth of each tree),
# and others can be specified to tailor the model to specific datasets and regression tasks.
model = RandomForestRegressor()
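
The same check works for the regressor: averaging the individual trees' predictions should reproduce `predict()`. Here is a sketch on synthetic data from `make_regression`, which stands in for the California housing data:

```python
# Sketch: a random forest regressor averages its trees' predictions
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (not the California housing dataset)
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=2)
forest = RandomForestRegressor(n_estimators=50, random_state=2).fit(X_demo, y_demo)

# Predictions from every individual tree: shape (n_trees, n_samples)
tree_preds = np.array([tree.predict(X_demo) for tree in forest.estimators_])
print(tree_preds.shape)  # (50, 200)

# The forest's prediction is the mean over the trees
print(np.allclose(tree_preds.mean(axis=0), forest.predict(X_demo)))  # True
```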


## 3. Fit the model to the data and make a prediction


In [26]:
# Training the Model and Making Predictions

# Fitting the Model
# The 'fit' method is used to train the model using the training data.
# It adjusts the parameters of the model (clf) so it best fits the data.
# Here, clf.fit(X_train, y_train) trains the RandomForestClassifier 'clf' 
# using the features (X_train) and the target (y_train) from the training set.
clf.fit(X_train, y_train)

# Making Standard Predictions
# After the model is trained, you can use the 'predict' method to make predictions.
# 'clf.predict(X_test)' uses the features from the test set (X_test) to predict the target values.
# The predicted target values are stored in 'y_preds'.
y_preds = clf.predict(X_test)

# Making Predictions with Probabilities (specific to classification models)
# The 'predict_proba' method is used to predict class probabilities for classification models.
# It returns the probability of the test data belonging to each class.
# In the case of RandomForestClassifier, it gives the mean predicted class probabilities
# of the trees in the forest.
# The probabilities for the test set (X_test) are stored in 'y_probs'.
y_probs = clf.predict_proba(X_test)

# Viewing Predictions and Probabilities
# Printing 'y_preds' and 'y_probs' to see the predictions and the associated probabilities.
# 'y_preds' shows the predicted class labels, while 'y_probs' shows the probability estimates.
y_preds, y_probs


(array([0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0,
        1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0,
        0, 1, 1, 1, 1, 1, 1, 1, 0, 1], dtype=int64),
 array([[0.91, 0.09],
        [0.45, 0.55],
        [0.51, 0.49],
        [0.85, 0.15],
        [0.25, 0.75],
        [0.05, 0.95],
        [0.25, 0.75],
        [0.97, 0.03],
        [0.98, 0.02],
        [0.52, 0.48],
        [0.15, 0.85],
        [0.69, 0.31],
        [0.06, 0.94],
        [0.86, 0.14],
        [0.05, 0.95],
        [0.01, 0.99],
        [0.  , 1.  ],
        [0.86, 0.14],
        [0.95, 0.05],
        [0.95, 0.05],
        [0.46, 0.54],
        [0.93, 0.07],
        [0.31, 0.69],
        [0.29, 0.71],
        [0.36, 0.64],
        [0.28, 0.72],
        [0.2 , 0.8 ],
        [0.23, 0.77],
        [0.9 , 0.1 ],
        [0.21, 0.79],
        [0.95, 0.05],
        [0.87, 0.13],
        [0.98, 0.02],
        ...]))

## 4. Evaluate the model

Every Scikit-Learn model has a default evaluation metric, accessible through its `score()` method (mean accuracy for classifiers, R^2 for regressors).

However, there's a range of other evaluation metrics you can use, depending on the problem you're working on.

A full list of evaluation metrics can be [found in the documentation](https://scikit-learn.org/stable/modules/model_evaluation.html).
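
For the two model types used in this notebook, the default `score()` metric is mean accuracy for classifiers and R^2 for regressors. The sketch below compares `score()` with the matching function from `sklearn.metrics`, using synthetic stand-in data:

```python
# Sketch: what score() computes by default for classifiers and regressors
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split

# Classifier on synthetic data: score() is mean accuracy
Xc, yc = make_classification(n_samples=300, random_state=3)
Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(Xc, yc, random_state=3)
clf_demo = RandomForestClassifier(random_state=3).fit(Xc_tr, yc_tr)
clf_score = clf_demo.score(Xc_te, yc_te)
clf_acc = accuracy_score(yc_te, clf_demo.predict(Xc_te))

# Regressor on synthetic data: score() is R^2
Xr, yr = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=3)
Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(Xr, yr, random_state=3)
reg_demo = RandomForestRegressor(random_state=3).fit(Xr_tr, yr_tr)
reg_score = reg_demo.score(Xr_te, yr_te)
reg_r2 = r2_score(yr_te, reg_demo.predict(Xr_te))

print(np.isclose(clf_score, clf_acc), np.isclose(reg_score, reg_r2))  # True True
```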

In [27]:
# All models/estimators have a score() function
clf.score(X_test, y_test)

0.8289473684210527

In [28]:
# Evaluating a model using cross-validation is possible with cross_val_score
from sklearn.model_selection import cross_val_score

# scoring=None means default score() metric is used
print(cross_val_score(estimator=clf, 
                      X=X, 
                      y=y, 
                      cv=5, # use 5-fold cross-validation
                      scoring=None)) 

# Evaluate a model with a different scoring method
print(cross_val_score(estimator=clf, 
                      X=X, 
                      y=y,
                      cv=5, # use 5-fold cross-validation
                      scoring="precision"))

[0.81967213 0.86885246 0.81967213 0.78333333 0.76666667]
[0.82857143 0.93333333 0.80645161 0.81818182 0.76923077]
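
Calling `cross_val_score` twice, as above, fits the 5 folds twice. `cross_validate` can compute several metrics from a single set of fits and returns them in a dictionary keyed `test_<metric>`. The sketch below runs on synthetic stand-in data and summarises each metric as the mean plus or minus the standard deviation across folds:

```python
# Sketch: several metrics from one round of cross-validation
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Synthetic stand-in data (not the heart disease dataset)
X_demo, y_demo = make_classification(n_samples=300, random_state=4)

# A list of scorer names gives one array of fold scores per metric
cv_results = cross_validate(RandomForestClassifier(random_state=4),
                            X_demo, y_demo, cv=5,
                            scoring=["accuracy", "precision"])

for metric in ["test_accuracy", "test_precision"]:
    fold_scores = cv_results[metric]
    # Summarise the 5 fold scores as mean +/- standard deviation
    print(f"{metric}: {fold_scores.mean():.3f} +/- {fold_scores.std():.3f}")
```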


In [29]:
# Different classification metrics

# Accuracy
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_preds))

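# Receiver Operating Characteristic (ROC curve)/Area under curve (AUC)
from sklearn.metrics import roc_curve, roc_auc_score
false_positive_rate, true_positive_rate, thresholds = roc_curve(y_test, y_probs[:, 1])
# Note: roc_auc_score is usually given probability scores (e.g. y_probs[:, 1]);
# passing hard class labels (y_preds), as here, gives a coarser AUC estimate
print(roc_auc_score(y_test, y_preds))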

# Confusion matrix
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test, y_preds))

# Classification report
from sklearn.metrics import classification_report
print(classification_report(y_test, y_preds))

0.8289473684210527
0.8268292682926829
[[28  7]
 [ 6 35]]
              precision    recall  f1-score   support

           0       0.82      0.80      0.81        35
           1       0.83      0.85      0.84        41

    accuracy                           0.83        76
   macro avg       0.83      0.83      0.83        76
weighted avg       0.83      0.83      0.83        76
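
The AUC printed above was computed from hard class labels (`y_preds`) rather than probabilities. With 0/1 predictions, the ROC curve has only one point between (0, 0) and (1, 1), so the area reduces to (1 + TPR - FPR) / 2. From the confusion matrix above, TPR = 35/41 and FPR = 7/35, which gives (1 + 0.854 - 0.2) / 2 = 0.827 and matches the printed value.

The sketch below rebuilds label arrays from that confusion matrix to check this. It then shows that with continuous scores (hypothetical random values here), `auc()` over the `roc_curve()` points agrees with `roc_auc_score()`:

```python
# Sketch: AUC from hard labels versus continuous scores
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Label arrays rebuilt from the confusion matrix above: [[28 7] [6 35]]
# (rows are actual classes, columns are predicted classes)
y_true = np.array([0] * 35 + [1] * 41)
y_pred = np.array([0] * 28 + [1] * 7 + [0] * 6 + [1] * 35)

tpr = 35 / 41  # true positive rate: correctly predicted positives / actual positives
fpr = 7 / 35   # false positive rate: wrongly predicted positives / actual negatives
auc_from_labels = roc_auc_score(y_true, y_pred)
print(round((1 + tpr - fpr) / 2, 4), round(auc_from_labels, 4))  # 0.8268 0.8268

# With continuous scores, the area under the roc_curve() points matches roc_auc_score()
rng = np.random.default_rng(0)
scores = 0.3 * y_true + 0.7 * rng.random(len(y_true))  # hypothetical probability-like scores
fpr_curve, tpr_curve, _ = roc_curve(y_true, scores)
print(np.isclose(auc(fpr_curve, tpr_curve), roc_auc_score(y_true, scores)))  # True
```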



In [42]:
# Different regression metrics using the California Housing Dataset

# Make predictions first
# Using the California housing dataset DataFrame for the regression model
X = california_housing_df.drop("target", axis=1)
y = california_housing_df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Instantiate and train the RandomForestRegressor model
model = RandomForestRegressor()
model.fit(X_train, y_train)
y_preds = model.predict(X_test)

# Evaluate the model using various regression metrics

# R^2 (pronounced r-squared) or coefficient of determination
from sklearn.metrics import r2_score
print("R^2 Score:", r2_score(y_test, y_preds))

# Mean absolute error (MAE)
from sklearn.metrics import mean_absolute_error
print("Mean Absolute Error (MAE):", mean_absolute_error(y_test, y_preds))

# Mean squared error (MSE)
from sklearn.metrics import mean_squared_error
print("Mean Squared Error (MSE):", mean_squared_error(y_test, y_preds))


R^2 Score: 0.801891020387404
Mean Absolute Error (MAE): 0.32951328255813966
Mean Squared Error (MSE): 0.2579791311908462
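
Each of these metrics is a short formula. The sketch below computes them by hand with NumPy on a few hypothetical values and checks them against `sklearn.metrics`. MAE is in the target's units and MSE in squared units. The California housing target is measured in units of $100,000, so the MAE of about 0.33 above corresponds to an average error of roughly $33,000.

```python
# Sketch: the three regression metrics computed by hand with NumPy
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Small hypothetical example values
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))                    # average absolute error
mse = np.mean(errors ** 2)                       # average squared error (penalises big misses more)
ss_res = np.sum(errors ** 2)                     # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
r2 = 1 - ss_res / ss_tot                         # fraction of variance explained

print(mae, mse, round(r2, 4))  # 0.5 0.375 0.9486
print(np.isclose(mae, mean_absolute_error(y_true, y_pred)),
      np.isclose(mse, mean_squared_error(y_true, y_pred)),
      np.isclose(r2, r2_score(y_true, y_pred)))  # True True True
```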


## 5. Improve through experimentation

There are two main ways to improve a model's baseline metrics (the first evaluation metrics you get): improving the data and improving the model.

From a data perspective, ask:
* Could we collect more data? In machine learning, more data is generally better, as it gives a model more opportunities to learn patterns.
* Could we improve our data? This could mean filling in missing values or finding a better encoding (turning things into numbers) strategy.

From a model perspective, ask:
* Is there a better model we could use? If you've started out with a simple model, could you use a more complex one? (The [Scikit-Learn machine learning map](https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html) shows examples of this; ensemble methods are generally considered more complex models.)
* Could we improve the current model? If the model you're using performs well straight out of the box, can the **hyperparameters** be tuned to make it even better?

**Hyperparameters** are settings on a model that you can adjust to change, and potentially improve, how it finds patterns in data. Adjusting hyperparameters is referred to as hyperparameter tuning.

In [43]:
# How to find a model's hyperparameters
clf = RandomForestClassifier()
clf.get_params() # returns a dictionary of adjustable hyperparameters and their current values

{'bootstrap': True,
 'ccp_alpha': 0.0,
 'class_weight': None,
 'criterion': 'gini',
 'max_depth': None,
 'max_features': 'sqrt',
 'max_leaf_nodes': None,
 'max_samples': None,
 'min_impurity_decrease': 0.0,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'n_estimators': 100,
 'n_jobs': None,
 'oob_score': False,
 'random_state': None,
 'verbose': 0,
 'warm_start': False}
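
Hyperparameters can be set when the model is created, or changed afterwards with `set_params()`, which returns the estimator itself. The minimal sketch below uses the arbitrary values 200 and 5. Changing a hyperparameter on a model that's already fitted only takes effect after it's fitted again.

```python
# Sketch: two equivalent ways to set hyperparameters
from sklearn.ensemble import RandomForestClassifier

# 1. Pass them to the constructor
clf_a = RandomForestClassifier(n_estimators=200, max_depth=5)

# 2. Change them on an existing estimator; set_params returns the estimator
clf_b = RandomForestClassifier().set_params(n_estimators=200, max_depth=5)

print(clf_a.get_params()["n_estimators"], clf_b.get_params()["max_depth"])  # 200 5
```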

In [44]:
# Example of adjusting hyperparameters by hand

# Split data into X & y
X = heart_disease.drop("target", axis=1) # use all columns except target
y = heart_disease["target"] # we want to predict y using X

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Instantiate two models with different settings
clf_1 = RandomForestClassifier(n_estimators=100)
clf_2 = RandomForestClassifier(n_estimators=200)

# Fit both models on training data
clf_1.fit(X_train, y_train)
clf_2.fit(X_train, y_train)

# Evaluate both models on test data and see which is best
print(clf_1.score(X_test, y_test))
print(clf_2.score(X_test, y_test))

0.7631578947368421
0.7631578947368421
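
Comparing two settings on a single random test split can be misleading. Both forests above scored the same, and rerunning the cell will give different numbers because neither the split nor the forests use a fixed `random_state`. The sketch below makes a steadier by-hand comparison using 5-fold cross-validation and fixed seeds, on synthetic stand-in data:

```python
# Sketch: a fairer by-hand comparison using cross-validation and fixed seeds
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (not the heart disease dataset)
X_demo, y_demo = make_classification(n_samples=300, random_state=5)

results = {}
for n in [10, 100, 200]:
    # random_state fixes each forest's randomness so reruns give the same numbers
    model_n = RandomForestClassifier(n_estimators=n, random_state=5)
    # Averaging over 5 folds is less sensitive to one lucky or unlucky split
    results[n] = cross_val_score(model_n, X_demo, y_demo, cv=5).mean()
    print(f"n_estimators={n}: mean CV accuracy {results[n]:.3f}")
```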


In [57]:
# Example of adjusting hyperparameters computationally (recommended)

from sklearn.model_selection import RandomizedSearchCV, train_test_split, KFold
from sklearn.ensemble import RandomForestClassifier

# Define a grid of hyperparameters
# Note: newer Scikit-Learn versions no longer accept max_features="auto" for
# RandomForestClassifier (it raises InvalidParameterError), so "sqrt" and "log2" are used
grid = {"n_estimators": [10, 100, 200, 500, 1000, 1200],
        "max_depth": [None, 5, 10, 20, 30],
        "max_features": ["sqrt", "log2"],
        "min_samples_split": [2, 4, 6],
        "min_samples_leaf": [1, 2, 4]}

# Redefine X and y from the heart disease dataset
# (if cells are run out of order, X may still hold the car sales data,
# whose string columns such as "Make" cannot be converted to floats)
X = heart_disease.drop("target", axis=1)
y = heart_disease["target"]

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Instantiate RandomForestClassifier
clf = RandomForestClassifier(n_jobs=1)  # n_jobs set to 1 for compatibility

# Setup RandomizedSearchCV with KFold cross-validation
rs_clf = RandomizedSearchCV(estimator=clf,
                            param_distributions=grid,
                            n_iter=10,  # try 10 models total
                            cv=KFold(n_splits=5),  # using simple 5-fold cross-validation
                            verbose=2)  # print out results

# Fit the RandomizedSearchCV version of clf
rs_clf.fit(X_train, y_train)

# Find the best hyperparameters
print("Best hyperparameters:", rs_clf.best_params_)

# Scoring automatically uses the best hyperparameters
# (by default the search refits the best combination on the whole training set)
score = rs_clf.score(X_test, y_test)
print("Model score:", score)



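`RandomizedSearchCV` samples only `n_iter` combinations from the grid. The grid above has 6 x 5 x 2 x 3 x 3 = 540 combinations. At 5 folds each, trying them all would take 2,700 fits, so sampling 10 of them is much cheaper. When the grid is small, `GridSearchCV` tries every combination instead. Here is a sketch on synthetic stand-in data with a deliberately small grid:

```python
# Sketch: exhaustive search with GridSearchCV (every combination is tried)
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (not the heart disease dataset)
X_demo, y_demo = make_classification(n_samples=300, random_state=6)

# A deliberately small grid: 2 x 2 x 2 = 8 combinations
small_grid = {"n_estimators": [50, 100],
              "max_depth": [None, 5],
              "max_features": ["sqrt", "log2"]}

gs_demo = GridSearchCV(estimator=RandomForestClassifier(random_state=6),
                       param_grid=small_grid,
                       cv=3)  # 8 combinations x 3 folds = 24 fits

gs_demo.fit(X_demo, y_demo)
print(gs_demo.best_params_)
print(round(gs_demo.best_score_, 3))  # mean cross-validated accuracy of the best combination
```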

## 6. Save and reload your trained model
You can save and load a model with `pickle`.

In [46]:
# Saving a model with pickle
import pickle

# Save an existing model to file
# (the with-block closes the file once the model has been written)
with open("rs_random_forest_model_1.pkl", "wb") as f:
    pickle.dump(rs_clf, f)

In [47]:
# Load a saved pickle model
with open("rs_random_forest_model_1.pkl", "rb") as f:
    loaded_pickle_model = pickle.load(f)

# Evaluate loaded model
loaded_pickle_model.score(X_test, y_test)

0.8032786885245902

You can do the same with `joblib`. `joblib` is usually more efficient than `pickle` for objects that contain large NumPy arrays, which trained models such as random forests often do.

In [48]:
# Saving a model with joblib
from joblib import dump, load

# Save a model to file
dump(rs_clf, filename="gs_random_forest_model_1.joblib") 

['gs_random_forest_model_1.joblib']

In [49]:
# Import a saved joblib model
loaded_joblib_model = load(filename="gs_random_forest_model_1.joblib")

In [50]:
# Evaluate joblib predictions 
loaded_joblib_model.score(X_test, y_test)

0.8032786885245902
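
With `pickle`, opening the file inside a `with` block makes sure it's closed once the model has been written or read. `joblib`'s `dump()` and `load()` take a filename and manage the file themselves. The sketch below saves a model both ways to a temporary directory and checks that the reloaded copies make identical predictions. The model here is a stand-in trained on synthetic data. Loading a model with a different Scikit-Learn version from the one it was saved with isn't guaranteed to work, so it's worth keeping versions consistent.

```python
# Sketch: save, reload and check that predictions are unchanged
import pickle
import tempfile
from pathlib import Path

import numpy as np
from joblib import dump, load
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in model (not the tuned heart disease model)
X_demo, y_demo = make_classification(n_samples=200, random_state=7)
model_demo = RandomForestClassifier(random_state=7).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp:
    # pickle: with-blocks close (and flush) the file after writing and reading
    pkl_path = Path(tmp) / "model.pkl"
    with open(pkl_path, "wb") as f:
        pickle.dump(model_demo, f)
    with open(pkl_path, "rb") as f:
        from_pickle = pickle.load(f)

    # joblib: takes a path directly and handles the file itself
    jl_path = Path(tmp) / "model.joblib"
    dump(model_demo, jl_path)
    from_joblib = load(jl_path)

original = model_demo.predict(X_demo)
print(np.array_equal(original, from_pickle.predict(X_demo)),
      np.array_equal(original, from_joblib.predict(X_demo)))  # True True
```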

## 7. Putting it all together (not pictured)

We can put a number of different Scikit-Learn functions together using `Pipeline`.

As an example, we'll use `car-sales-extended-missing-data.csv`, which contains missing values as well as non-numeric data. Most Scikit-Learn models need data with no missing values and only numeric features, so both problems have to be handled before modelling.

The problem we're solving here is predicting a car's sale price from a number of attributes of the car (a regression problem).

In [52]:
# Getting data ready
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Modelling
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Setup random seed
import numpy as np
np.random.seed(42)

# Import data and drop rows with missing target values (Price)
data = pd.read_csv("car-sales-extended-missing-data.csv")  # Update path as needed
data.dropna(subset=["Price"], inplace=True)

# Define different features and transformer pipelines
categorical_features = ["Make", "Colour"]
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
    ("onehot", OneHotEncoder(handle_unknown="ignore"))])

door_feature = ["Doors"]
door_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="constant", fill_value=4))])

numeric_features = ["Odometer (KM)"]
numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="mean"))
])

# Setup preprocessing steps (fill missing values, then convert to numbers)
preprocessor = ColumnTransformer(
    transformers=[
        ("cat", categorical_transformer, categorical_features),
        ("door", door_transformer, door_feature),
        ("num", numeric_transformer, numeric_features)])

# Create a preprocessing and modelling pipeline
model = Pipeline(steps=[("preprocessor", preprocessor),
                        ("model", RandomForestRegressor())])

# Split data
X = data.drop("Price", axis=1)
y = data["Price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Fit and score the model
model.fit(X_train, y_train)
model.score(X_test, y_test)


0.22188417408787875
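
To try the pipeline without the CSV, here's a self-contained sketch of the same preprocessing and model on a small DataFrame with the same column names. All values are made up (hypothetical), so its results aren't comparable to the score above. Because the imputers and the encoder are inside the pipeline, `fit()` learns fill values and categories from the training split only. `predict()` can then take rows that still contain missing values.

```python
# Sketch: the same preprocessing + model pipeline on a small in-memory DataFrame
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical car data with the same column names as the CSV (values are made up)
rng = np.random.default_rng(42)
n = 100
cars = pd.DataFrame({
    "Make": rng.choice(["Honda", "Toyota", "Nissan", "BMW"], size=n).astype(object),
    "Colour": rng.choice(["White", "Blue", "Red"], size=n).astype(object),
    "Doors": rng.choice([3.0, 4.0, 5.0], size=n),
    "Odometer (KM)": rng.uniform(10_000, 250_000, size=n),
})
cars["Price"] = 30_000 - 0.08 * cars["Odometer (KM)"] + rng.normal(0, 1_000, size=n)

# Blank out roughly 10% of each feature column to mimic missing data
for col in ["Make", "Colour", "Doors", "Odometer (KM)"]:
    cars.loc[rng.random(n) < 0.1, col] = np.nan

preprocessor = ColumnTransformer(transformers=[
    ("cat", Pipeline([("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]),
     ["Make", "Colour"]),
    ("door", SimpleImputer(strategy="constant", fill_value=4), ["Doors"]),
    ("num", SimpleImputer(strategy="mean"), ["Odometer (KM)"]),
])
pipe = Pipeline(steps=[("preprocessor", preprocessor),
                       ("model", RandomForestRegressor(random_state=42))])

X_cars = cars.drop("Price", axis=1)
y_cars = cars["Price"]
X_tr, X_te, y_tr, y_te = train_test_split(X_cars, y_cars, test_size=0.2, random_state=42)

# fit() learns fill values and categories from the training split only;
# predict() reuses them, so test rows with missing values can be passed in directly
pipe.fit(X_tr, y_tr)
preds = pipe.predict(X_te)
print(preds.shape, np.isnan(preds).any())  # (20,) False
```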