# 1. Selecting Best Models Using Exhaustive Search

# <b style="color:red">Problem: 
You want to select the best model by searching over a range of hyperparameters.</b>

# <b style="color:blue">Solution: 
Use scikit-learn’s GridSearchCV:</b>

In [1]:

# Load libraries
import numpy as np
import warnings
warnings.filterwarnings('ignore')
from sklearn import linear_model, datasets
from sklearn.model_selection import GridSearchCV
# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target
# Create logistic regression
logistic = linear_model.LogisticRegression()
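# Create range of candidate penalty hyperparameter values
# (note: in scikit-learn versions where lbfgs is the default solver, only 'l2'
# is supported, so the 'l1' candidates fail to fit and receive a NaN score;
# pass solver='liblinear' to LogisticRegression above to search both penalties)
penalty = ['l1', 'l2']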
# Create range of candidate regularization hyperparameter values
C = np.logspace(0, 4, 10)
# Create dictionary hyperparameter candidates
hyperparameters = dict(C=C, penalty=penalty)

# Create grid search
gridsearch = GridSearchCV(logistic, hyperparameters, cv=5, verbose=0)

# Fit grid search
best_model = gridsearch.fit(features, target)
# View the 10 candidate values of C
np.logspace(0, 4, 10)

# View best hyperparameters
print('Best Penalty:', best_model.best_estimator_.get_params()['penalty'])
print('Best C:', best_model.best_estimator_.get_params()['C'])

# Predict target vector
best_model.predict(features)

Best Penalty: l2
Best C: 7.742636826811269


array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

# <b style="color:green">Supported Discussion </b>

GridSearchCV is a brute-force approach to model selection using cross-validation.
Specifically, a user defines sets of possible values for one or multiple hyperparameters,
and then GridSearchCV trains a model using every value and/or combination of values.
The model with the best performance score is selected as the best model. For
example, in our solution we used logistic regression as our learning algorithm, containing
two hyperparameters: C and the regularization penalty. Don’t worry if you
don’t know what C and regularization mean; we cover them in the next few chapters.
Just realize that C and the regularization penalty can take a range of values, which
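have to be specified prior to training. For C, we define 10 possible values with
np.logspace(0, 4, 10), logarithmically spaced from 1 to 10,000. For the penalty, we
define two possible values: 'l1' and 'l2'. GridSearchCV therefore evaluates
10 × 2 = 20 candidate models, each across five folds.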
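
As a rough illustration of the grid described above, the following sketch regenerates the candidate values of C and counts the combinations that the search would try. It assumes the same np.logspace(0, 4, 10) call, penalty list, and cv=5 as the solution. ParameterGrid is used here only as a convenient way to enumerate the grid; it is not part of the original recipe.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Ten candidate values for C, logarithmically spaced from 10^0 to 10^4
C = np.logspace(0, 4, 10)
print(C)

# Every (C, penalty) pair is one candidate model
grid = ParameterGrid({"C": C, "penalty": ["l1", "l2"]})
n_candidates = len(grid)

# With cv=5, each candidate is trained once per fold
n_fits = n_candidates * 5
print(n_candidates, n_fits)
```

The third value in the printed array is the one reported as "Best C" in the solution's output.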

# 2. Selecting Best Models Using Randomized Search

# <b style="color:red">Problem
You want a computationally cheaper method than exhaustive search to select the best
model.</b>

# <b style="color:blue">Solution
Use scikit-learn’s RandomizedSearchCV:</b>

In [2]:
# Load libraries
from scipy.stats import uniform
from sklearn import linear_model, datasets
from sklearn.model_selection import RandomizedSearchCV
# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target
# Create logistic regression
logistic = linear_model.LogisticRegression()
# Create range of candidate regularization penalty hyperparameter values
penalty = ['l1', 'l2']
# Create distribution of candidate regularization hyperparameter values
C = uniform(loc=0, scale=4)
# Create hyperparameter options
hyperparameters = dict(C=C, penalty=penalty)
# Create randomized search
randomizedsearch = RandomizedSearchCV(
    logistic, hyperparameters, random_state=1, n_iter=100, cv=5, verbose=0,
    n_jobs=-1)
# Fit randomized search
best_model = randomizedsearch.fit(features, target)

# Define a uniform distribution between 0 and 4, sample 10 values
uniform(loc=0, scale=4).rvs(10)

# View best hyperparameters
print('Best Penalty:', best_model.best_estimator_.get_params()['penalty'])
print('Best C:', best_model.best_estimator_.get_params()['C'])

# Predict target vector
best_model.predict(features)

Best Penalty: l2
Best C: 3.730229437354635


array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

# <b style="color:green">Supported Discussion </b>
In Recipe 1, we used GridSearchCV on a user-defined set of hyperparameter values
to search for the best model according to a score function. A more efficient method
than GridSearchCV’s brute-force search is to search over a specific number of random
combinations of hyperparameter values from user-supplied distributions (e.g., normal,
uniform). scikit-learn implements this randomized search technique with
RandomizedSearchCV.
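
To make the idea of sampling from distributions concrete, here is a minimal sketch that draws a handful of candidate combinations the way a randomized search would. It assumes the same uniform(loc=0, scale=4) distribution and penalty list as the solution. ParameterSampler and the choice of n_iter=5 are illustrative additions; the solution itself uses n_iter=100.

```python
from scipy.stats import uniform
from sklearn.model_selection import ParameterSampler

# C is drawn from a continuous uniform distribution on [0, 4];
# penalty is drawn uniformly from a list of options
distributions = {"C": uniform(loc=0, scale=4), "penalty": ["l1", "l2"]}

# Draw five random candidate combinations (n_iter controls how many)
samples = list(ParameterSampler(distributions, n_iter=5, random_state=1))

for candidate in samples:
    print(candidate)
```

Unlike a grid, the number of candidates is fixed by n_iter, regardless of how many hyperparameters or values are in play.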

# 3. Selecting Best Models from Multiple Learning Algorithms

# <b style="color:red">Problem
You want to select the best model by searching over a range of learning algorithms
and their respective hyperparameters.</b>

# <b style="color:blue">Solution: 
Create a dictionary of candidate learning algorithms and their hyperparameters:</b>

In [3]:
# Load libraries
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
# Set random seed
np.random.seed(0)
# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target
# Create a pipeline
pipe = Pipeline([("classifier", RandomForestClassifier())])
# Create dictionary with candidate learning algorithms and their hyperparameters
search_space = [{"classifier": [LogisticRegression()],
                 "classifier__penalty": ['l1', 'l2'],
                 "classifier__C": np.logspace(0, 4, 10)},
                {"classifier": [RandomForestClassifier()],
                 "classifier__n_estimators": [10, 100, 1000],
                 "classifier__max_features": [1, 2, 3]}]
# Create grid search
gridsearch = GridSearchCV(pipe, search_space, cv=5, verbose=0)
# Fit grid search
best_model = gridsearch.fit(features, target)

# View best model
best_model.best_estimator_.get_params()["classifier"]

# Predict target vector
best_model.predict(features)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

# <b style="color:green">Supported Discussion </b>
In the previous two recipes we found the best model by searching over possible
hyperparameter values of a learning algorithm. However, what if we are not certain
which learning algorithm to use? Recent versions of scikit-learn allow us to include
learning algorithms as part of the search space. In our solution we define a search
space that includes two learning algorithms: logistic regression and random forest
classifier. Each learning algorithm has its own hyperparameters, and we define their
candidate values using the format classifier__[hyperparameter name].
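
The classifier__[hyperparameter name] format follows the pipeline's own parameter naming: the part before the double underscore is the step name, and the part after is a parameter of the estimator in that step. The sketch below applies such settings by hand with set_params, which is roughly what the search does for each candidate. The specific values chosen are arbitrary and only for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# A pipeline with a single step named "classifier"
pipe = Pipeline([("classifier", RandomForestClassifier())])

# "classifier__n_estimators" sets n_estimators on the "classifier" step
pipe.set_params(classifier__n_estimators=10, classifier__max_features=2)
forest = pipe.get_params()["classifier"]
print(forest.n_estimators, forest.max_features)

# The step itself can be swapped for a different learning algorithm
pipe.set_params(classifier=LogisticRegression())
# ...after which the same naming scheme reaches the new estimator's parameters
pipe.set_params(classifier__C=10.0)
logit = pipe.get_params()["classifier"]
print(type(logit).__name__, logit.C)
```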

# 4. Selecting Best Models When Preprocessing

# <b style="color:red">Problem
You want to include a preprocessing step during model selection.</b>

# <b style="color:blue">Solution:
Create a pipeline that includes the preprocessing step and any of its parameters:</b>

In [4]:
# Load libraries
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
# Set random seed
np.random.seed(0)
# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target
# Create a preprocessing object that includes StandardScaler features and PCA
preprocess = FeatureUnion([("std", StandardScaler()), ("pca", PCA())])
# Create a pipeline
pipe = Pipeline([("preprocess", preprocess),
                 ("classifier", LogisticRegression())])
# Create space of candidate values
search_space = [{"preprocess__pca__n_components": [1, 2, 3],
                 "classifier__penalty": ["l1", "l2"],
                 "classifier__C": np.logspace(0, 4, 10)}]
# Create grid search
clf = GridSearchCV(pipe, search_space, cv=5, verbose=0, n_jobs=-1)
# Fit grid search
best_model = clf.fit(features, target)

# View the best number of principal components
best_model.best_estimator_.get_params()['preprocess__pca__n_components']

2

# <b style="color:green">Supported Discussion </b>
We have to be careful to properly handle preprocessing when conducting model selection.
First, GridSearchCV uses cross-validation to determine which model has the
highest performance. However, in cross-validation we are in effect pretending that
the held-out fold is an unseen test set, so it must not be part of fitting any preprocessing
steps (e.g., scaling or standardization). For this reason, we cannot preprocess
the data and then run GridSearchCV. Rather, the preprocessing steps must be a part
of the set of actions taken by GridSearchCV. While this might appear complex, the
reality is that scikit-learn makes it simple. FeatureUnion allows us to combine multiple
preprocessing actions properly. In our solution we use FeatureUnion to combine
two preprocessing steps: standardizing the feature values (StandardScaler) and principal
component analysis (PCA). This object is called preprocess and contains both of
our preprocessing steps. We then include preprocess in a pipeline with our learning
algorithm. The end result is that we can outsource the proper (and confusing)
handling of fitting, transforming, and training the models with combinations
of hyperparameters to scikit-learn. Second, preprocessing steps can have their own
hyperparameters, which can be searched like any other: in our solution,
preprocess__pca__n_components tries 1, 2, and 3 principal components.
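
A minimal sketch of the same principle without any search: placing the scaler inside a pipeline means that, within each cross-validation split, the scaler is fit only on the training folds and then applied to the held-out fold. The step names and max_iter=1000 are choices made for this illustration, not part of the original solution.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

features, target = datasets.load_iris(return_X_y=True)

# Preprocessing lives inside the pipeline, so it is refit on every training split
pipe = Pipeline([("std", StandardScaler()),
                 ("classifier", LogisticRegression(max_iter=1000))])

# Each of the 5 held-out folds is scaled with statistics from the other 4 only
scores = cross_val_score(pipe, features, target, cv=5)
print(scores.mean())
```

Scaling the full dataset first and then cross-validating would let information from the held-out fold leak into the scaler's mean and variance.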

# 5. Speeding Up Model Selection with Parallelization

# <b style="color:red">Problem
You need to speed up model selection.</b>

# <b style="color:blue">Solution:
Use all the cores in your machine by setting n_jobs=-1:</b>

In [5]:
# Load libraries
import numpy as np
from sklearn import linear_model, datasets
from sklearn.model_selection import GridSearchCV
# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target

# Create logistic regression
logistic = linear_model.LogisticRegression()
# Create range of candidate regularization penalty hyperparameter values
penalty = ["l1", "l2"]
# Create range of candidate values for C
C = np.logspace(0, 4, 1000)
# Create hyperparameter options
hyperparameters = dict(C=C, penalty=penalty)
# Create grid search
gridsearch = GridSearchCV(logistic, hyperparameters, cv=5, n_jobs=-1, verbose=1)
# Fit grid search
best_model = gridsearch.fit(features, target)

# Create grid search using one core
clf = GridSearchCV(logistic, hyperparameters, cv=5, n_jobs=1, verbose=1)
# Fit grid search
best_model = clf.fit(features, target)

Fitting 5 folds for each of 2000 candidates, totalling 10000 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 200 tasks      | elapsed:    1.8s
[Parallel(n_jobs=-1)]: Done 1400 tasks      | elapsed:   11.6s
[Parallel(n_jobs=-1)]: Done 3400 tasks      | elapsed:   30.6s
[Parallel(n_jobs=-1)]: Done 6200 tasks      | elapsed:   59.4s
[Parallel(n_jobs=-1)]: Done 9800 tasks      | elapsed:  1.6min
[Parallel(n_jobs=-1)]: Done 10000 out of 10000 | elapsed:  1.7min finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Fitting 5 folds for each of 2000 candidates, totalling 10000 fits


[Parallel(n_jobs=1)]: Done 10000 out of 10000 | elapsed:  2.7min finished


# <b style="color:green">Supported Discussion </b>
In the recipes of this chapter, we have kept the number of candidate models small to
make the code complete quickly. However, in the real world we will often have many
thousands or tens of thousands of models to train. The end result is that it can take
many hours to find the best model. To speed up the process, scikit-learn lets us train
multiple models simultaneously. Without going into too much technical detail, scikit-learn
can simultaneously train models up to the number of cores on the machine.
Most modern laptops have four cores, so (assuming you are currently on a laptop) we
can potentially train four models at the same time. This can substantially increase the
speed of our model selection process: in the output above, the 10,000 fits took about
1.7 minutes with four workers versus 2.7 minutes with one. The parameter n_jobs defines
the number of models to train in parallel; setting it to -1 uses all available cores.
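
To check how many workers a given n_jobs value would resolve to on the current machine, one option is to ask joblib, the library that produces the [Parallel(...)] log lines in the output above. This is a small sketch using joblib's cpu_count and effective_n_jobs helpers; the numbers printed depend on the machine.

```python
from joblib import cpu_count, effective_n_jobs

# Number of cores joblib detects on this machine
print("Cores detected:", cpu_count())

# n_jobs=-1 resolves to all available cores; n_jobs=1 means no parallelism
workers_all = effective_n_jobs(-1)
workers_one = effective_n_jobs(1)
print(workers_all, workers_one)
```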

# 6. Speeding Up Model Selection Using Algorithm-Specific Methods

# <b style="color:red">Problem
You need to speed up model selection.</b>

# <b style="color:blue">Solution:
If you are using a select number of learning algorithms, use scikit-learn’s model-specific
cross-validation hyperparameter tuning. For example, LogisticRegressionCV:</b>

In [6]:
# Load libraries
from sklearn import linear_model, datasets
# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target
# Create cross-validated logistic regression
logit = linear_model.LogisticRegressionCV(Cs=100)
# Train model
logit.fit(features, target)

LogisticRegressionCV(Cs=100)

# <b style="color:green">Supported Discussion </b>
Sometimes the characteristics of a learning algorithm allow us to search for the best
hyperparameters significantly faster than either brute force or randomized model
search methods. In scikit-learn, many learning algorithms (e.g., ridge, lasso, and elastic
net regression) have an algorithm-specific cross-validation method to take advantage
of this. For example, LogisticRegression is a standard logistic regression classifier,
while LogisticRegressionCV implements an efficient cross-validated
logistic regression classifier that can identify the optimum
value of the hyperparameter C.
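
The following sketch shows how the candidate and selected values of C can be inspected after fitting. When Cs is an integer, scikit-learn builds that many candidates on a logarithmic scale between 1e-4 and 1e4. The smaller Cs=10, the explicit cv=5, and max_iter=1000 are choices made here to keep the example quick; they differ from the Cs=100 used in the solution.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegressionCV

features, target = datasets.load_iris(return_X_y=True)

# Cross-validated logistic regression over 10 candidate values of C
logit = LogisticRegressionCV(Cs=10, cv=5, max_iter=1000)
logit.fit(features, target)

# Candidate values that were tried
print(logit.Cs_)

# Value(s) of C selected by cross-validation
print(np.ravel(logit.C_))
```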

# 7. Evaluating Performance After Model Selection

# <b style="color:red">Problem
You want to evaluate the performance of a model found through model selection.</b>

# <b style="color:blue">Solution
Use nested cross-validation to avoid biased evaluation:</b>

In [7]:
# Load libraries
import numpy as np
from sklearn import linear_model, datasets
from sklearn.model_selection import GridSearchCV, cross_val_score
# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target
# Create logistic regression
logistic = linear_model.LogisticRegression()
# Create range of 20 candidate values for C
C = np.logspace(0, 4, 20)
# Create hyperparameter options
hyperparameters = dict(C=C)
# Create grid search

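gridsearch = GridSearchCV(logistic, hyperparameters, cv=5, n_jobs=-1, verbose=0)
# Conduct nested cross-validation and output the average score
cross_val_score(gridsearch, features, target).mean()

# Re-create the grid search with verbose=1 to show how often the inner search runs
gridsearch = GridSearchCV(logistic, hyperparameters, cv=5, verbose=1)

# Fitting the grid search directly runs the inner cross-validation once
best_model = gridsearch.fit(features, target)

# Nested cross-validation repeats the inner search once per outer fold
scores = cross_val_score(gridsearch, features, target)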

Fitting 5 folds for each of 20 candidates, totalling 100 fits


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    3.3s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Fitting 5 folds for each of 20 candidates, totalling 100 fits


[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    3.0s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Fitting 5 folds for each of 20 candidates, totalling 100 fits


[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    3.1s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Fitting 5 folds for each of 20 candidates, totalling 100 fits


[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    3.1s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Fitting 5 folds for each of 20 candidates, totalling 100 fits


[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    3.2s finished
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.


Fitting 5 folds for each of 20 candidates, totalling 100 fits


[Parallel(n_jobs=1)]: Done 100 out of 100 | elapsed:    3.1s finished


# <b style="color:green">Supported Discussion </b>
Nested cross-validation during model selection is a difficult concept for many people
to grasp the first time. Remember that in k-fold cross-validation, we train our model
on k–1 folds of the data, use this model to make predictions on the remaining fold,
and then evaluate our model based on how well its predictions compare to the
true values. We then repeat this process k times.

In the model selection searches described in this chapter (i.e., GridSearchCV and
RandomizedSearchCV), we used cross-validation to evaluate which hyperparameter values
produced the best models. However, a nuanced and generally underappreciated problem
arises: since we used the data to select the best hyperparameter values, we cannot
use that same data to evaluate the model’s performance. The solution? Wrap the
cross-validation used for model search in another cross-validation! In nested cross-validation,
the “inner” cross-validation selects the best model, while the “outer” cross-validation
provides us with an unbiased evaluation of the model’s performance. In
our solution, the inner cross-validation is our GridSearchCV object, which we then
wrap in an outer cross-validation using cross_val_score. The verbose output above
shows this: one inner search of 100 fits when gridsearch is fit directly, followed by
five more, one for each outer fold.
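
The inner and outer loops can be made explicit by giving each its own splitter. The sketch below is one way to write that, assuming three folds for each loop, a three-value grid for C, and max_iter=1000, all chosen to keep the run short rather than taken from the solution.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

features, target = datasets.load_iris(return_X_y=True)

# Inner loop: chooses C using only the data it is given
inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)
inner_search = GridSearchCV(LogisticRegression(max_iter=1000),
                            {"C": np.logspace(0, 2, 3)},
                            cv=inner_cv)

# Outer loop: each outer training split runs its own inner search,
# and the selected model is scored on the outer held-out fold
outer_cv = KFold(n_splits=3, shuffle=True, random_state=1)
outer_scores = cross_val_score(inner_search, features, target, cv=outer_cv)

print(outer_scores)
print(outer_scores.mean())
```

Because the outer held-out fold is never seen by the inner search, the mean of outer_scores is an estimate of how the whole selection procedure performs on new data.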