# Hyperparameter optimisation



A hyperparameter is a parameter whose value controls the learning process of a machine learning model and is set before training, as opposed to a parameter that the model learns during training. Hyperparameter optimisation is the search for the set of hyperparameter values that gives the best model performance. In this notebook, we introduce two methods of hyperparameter optimisation implemented in sklearn: grid search and random search.
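
To make the distinction concrete, here is a minimal standard-library sketch (not part of sklearn). The function `fit_slope` and its arguments `learning_rate` and `n_steps` are hypothetical names chosen for illustration. The slope `w` is a parameter learnt from the data, while `learning_rate` and `n_steps` are hyperparameters chosen before training.

```python
def fit_slope(xs, ys, learning_rate=0.05, n_steps=200):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0  # parameter: its value is learnt from the data
    n = len(xs)
    for _ in range(n_steps):
        # gradient of mean((w * x - y) ** 2) with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w


# toy data generated by y = 2 * x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# same data and same model, different hyperparameter values
w_good = fit_slope(xs, ys, learning_rate=0.05)
w_slow = fit_slope(xs, ys, learning_rate=0.0001)

print(w_good, w_slow)
```

With the larger learning rate the learnt slope reaches the true value of 2, whereas the very small learning rate leaves it far from converged after the same number of steps. Choosing such values well is what hyperparameter optimisation automates.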

In [1]:
import pandas as pd

from scipy.stats import randint as sp_randint
from sklearn.base import TransformerMixin,BaseEstimator
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.pipeline import Pipeline

We will again use the 20 Newsgroups dataset for this section. The modelling task in this case is to classify a document as being about atheism or religion.


In [2]:
# downloading the data
categories = [
    'alt.atheism',
    'talk.religion.misc',
]

training_data = fetch_20newsgroups(subset='train', categories=categories)
X_train = training_data.data
Y_train = training_data.target

testing_data = fetch_20newsgroups(subset='test', categories=categories)
X_test = testing_data.data
Y_test = testing_data.target

In [3]:
# we will first define a pipeline, where one of the steps is a custom transformer defined in the transformers and pipelines notebook
class ToDenseTransformer(BaseEstimator,TransformerMixin):

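    # define the transform operation
    def transform(self, X, y=None, **fit_params):
        # toarray() returns a plain ndarray; todense() returns np.matrix,
        # which recent scikit-learn versions reject as estimator input
        return X.toarray()

    # no parameters to learn in this case,
    # so fit just returns the object unchanged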
    def fit(self, X, y=None, **fit_params):
        return self

In [4]:
# initiate a pipeline object
pipeline = Pipeline([
    ('cv', CountVectorizer()),
    ('to_dense', ToDenseTransformer()),
    ('pca', PCA()),
    ('clf', RandomForestClassifier())
])

A common method of hyperparameter optimisation is a grid search. For each hyperparameter of interest we define a set of values to test. We then exhaustively train and evaluate our model on every possible combination of those values. In sklearn we can grid search individual estimators and whole pipelines. Below is an example of grid searching the pipeline defined above.
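
To see what "every possible combination" means in practice, the following standard-library sketch enumerates a small grid. The option lists are illustrative, keyed after the steps of the pipeline above, and the fold count of 3 is an assumption for the example.

```python
from itertools import product

# illustrative option lists for three hyperparameters
grid_options = {
    'cv__max_features': [100, 200, 500, 1000],
    'pca__n_components': [2, 5, 10, 20],
    'clf__n_estimators': [200, 500],
}

# the Cartesian product of the option lists gives every combination
names = list(grid_options)
combinations = [
    dict(zip(names, values))
    for values in product(*grid_options.values())
]

# each combination is trained and scored once per cross-validation fold
n_folds = 3
n_fits = len(combinations) * n_folds

print(len(combinations), 'combinations,', n_fits, 'cross-validation fits')
```

The number of combinations is the product of the list lengths (4 × 4 × 2 = 32 here), so adding one more hyperparameter with four options would multiply the work by four.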

In [5]:
# first we need to define a parameter search grid.
# Our parameter grid is a python dictionary or a list of python dictionaries.
# Each dictionary key is the name of a hyperparameter you want to tune, and
# the corresponding value is a list of the options to try.

# if you are grid searching a pipeline, each key must use the syntax
# 'pipelineStepName__hyperparameterName' (note the double underscore).

param_grid = [
    {
        'cv__max_features': [100, 200, 500, 1000],
        'pca__n_components': [2, 5, 10, 20],
        'clf__n_estimators': [200, 500]
    }
]

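# we can also grid search the transformer or estimator choice itself
# (TfidfVectorizer is in sklearn.feature_extraction.text and XGBClassifier
# comes from the separate xgboost package; neither is imported above)
# for example:
# param_grid = [
#     {
#         'cv': [CountVectorizer(), TfidfVectorizer()],
#         'pca__n_components': [2, 5, 10, 20],
#         'clf': [RandomForestClassifier(), XGBClassifier()]
#     }
# ]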


grid = GridSearchCV(pipeline,
                    cv=3, # the number of folds for cross validation
                    param_grid=param_grid,
                    n_jobs=-1)
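
A misspelt key in the parameter grid only surfaces as an error when the search is fitted, so it can help to check the keys first. One way to do this is to compare them with the names returned by the pipeline's `get_params()` method. This is a minimal sketch: `demo_pipeline` is a shortened two-step stand-in for the pipeline above, and `candidate_keys` is a hypothetical list containing one deliberate typo.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline

# shortened stand-in for the notebook's pipeline
demo_pipeline = Pipeline([
    ('cv', CountVectorizer()),
    ('clf', RandomForestClassifier()),
])

# get_params() lists every tunable name, including 'step__hyperparameter' keys
valid_names = set(demo_pipeline.get_params())

# the last key is deliberately misspelt (the real name is clf__max_features)
candidate_keys = ['cv__max_features', 'clf__n_estimators', 'clf__max_feature']
unknown = [key for key in candidate_keys if key not in valid_names]

print(unknown)
```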


In [6]:
# to perform the grid search we call the fit method
grid.fit(X_train,Y_train);

In [7]:
# to see the best hyperparameters
grid.best_params_

{'clf__n_estimators': 200, 'cv__max_features': 1000, 'pca__n_components': 20}
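
Besides `best_params_`, a fitted search object keeps the cross-validated score of every combination in its `cv_results_` attribute. The sketch below shows one way to rank them. To stay quick and self-contained it assumes a small synthetic dataset and a plain `LogisticRegression` rather than the newsgroups pipeline, so the names `X_demo`, `y_demo` and `demo_search` are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# small synthetic classification problem, used only for illustration
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=0)

demo_search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={'C': [0.01, 0.1, 1.0, 10.0]},
    cv=3,
)
demo_search.fit(X_demo, y_demo)

# cv_results_ is a dict of arrays with one entry per combination tried
results = demo_search.cv_results_
ranked = sorted(
    zip(results['rank_test_score'], results['mean_test_score'], results['params']),
    key=lambda row: row[0],  # rank 1 is the best mean test score
)

for rank, mean_score, params in ranked:
    print(rank, round(float(mean_score), 3), params)
```

Looking at the whole ranking, rather than only the winner, shows whether the best combination is clearly ahead or whether several settings perform about equally well.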

In [8]:
# by default the search refits the best configuration on the full training set,
# so calling predict uses that best-performing model
Y_pred = grid.predict(X_test)
print(classification_report(Y_test, Y_pred))

              precision    recall  f1-score   support

           0       0.67      0.69      0.68       319
           1       0.59      0.56      0.57       251

    accuracy                           0.63       570
   macro avg       0.63      0.63      0.63       570
weighted avg       0.63      0.63      0.63       570



Grid searching suffers from the curse of dimensionality: the number of combinations grows multiplicatively with each hyperparameter added, and can quickly become infeasible to test. Another type of hyperparameter optimisation that avoids this problem is a random search. Random search draws each hyperparameter's value at random, from a set of discrete options or from a distribution, and repeats this a predefined number of times, so the cost is fixed by that number rather than by the size of the search space.
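
The sampling step can be sketched with the standard library alone. The integer ranges and the budget of 20 draws below are illustrative assumptions, and `search_space` and `candidates` are hypothetical names. This shows only how candidates are drawn, not how they are scored.

```python
import random

rng = random.Random(0)  # seeded so the draws are repeatable

# illustrative integer ranges, written as (low, high) with high excluded
search_space = {
    'cv__max_features': (100, 3000),
    'pca__n_components': (1, 60),
    'clf__n_estimators': (50, 1000),
}

# draw a fixed number of candidates, each value sampled independently
n_iter = 20
candidates = [
    {name: rng.randrange(low, high) for name, (low, high) in search_space.items()}
    for _ in range(n_iter)
]

# size of an exhaustive grid over every integer in each range
full_grid_size = 1
for low, high in search_space.values():
    full_grid_size *= high - low

print(len(candidates), 'sampled candidates out of', full_grid_size, 'possible')
```

The number of models to train is set by `n_iter` alone, while an exhaustive grid over the same ranges would contain more than 160 million combinations.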

In [9]:
# define our parameter distributions. In this case we treat each hyperparameter
# as a uniform discrete random variable; sp_randint(low, high) excludes high.
param_dist = {
    'cv__max_features': sp_randint(100, 3000),
    'pca__n_components': sp_randint(1, 60),
    'clf__n_estimators': sp_randint(50, 1000)
}

n_iter_search = 20
random_search = RandomizedSearchCV(pipeline,
                                   param_distributions=param_dist, 
                                   n_iter=n_iter_search, cv=3)

In [10]:
# to perform the random search we call the fit method
random_search.fit(X_train,Y_train);

In [11]:
Y_pred = random_search.predict(X_test)
print(classification_report(Y_test, Y_pred))

              precision    recall  f1-score   support

           0       0.68      0.78      0.72       319
           1       0.65      0.53      0.58       251

    accuracy                           0.67       570
   macro avg       0.66      0.65      0.65       570
weighted avg       0.66      0.67      0.66       570



There are other methods of hyperparameter optimisation not covered here, for example Bayesian and genetic optimisation.