# 3.5 Hyperparameter Tuning

The model hyperparameters best suited to a specific problem can be found automatically by searching the hyperparameter space. The type of algorithm is held fixed during the search.

scikit-learn provides built-in tools to perform the hyperparameter search.
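To make the idea concrete, the sketch below does by hand what a search tool automates: it loops over a few candidate values of one hyperparameter, scores each with cross-validation, and keeps the best one. The candidate values for `C` and the 3-fold setting are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

candidate_C = [0.1, 1, 10]  # illustrative values (assumption)
mean_scores = {}
for C in candidate_C:
    model = SVC(C=C, gamma=0.001)                # the algorithm stays fixed, only C changes
    scores = cross_val_score(model, X, y, cv=3)  # accuracy on each of 3 folds
    mean_scores[C] = scores.mean()

# keep the candidate with the highest mean cross-validated accuracy
best_C = max(mean_scores, key=mean_scores.get)
print(mean_scores)
print("best C:", best_C)
```

The search classes used in the rest of this section wrap this loop and add bookkeeping such as timing, ranking, and refitting the best model.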

In [1]:
# basic tools
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import wget
import h5py
import pandas as pd
import os

In [2]:
import numpy as np
from sklearn.datasets import load_digits,fetch_openml
digits = load_digits()
digits.keys()

dict_keys(['data', 'target', 'frame', 'feature_names', 'target_names', 'images', 'DESCR'])

In [3]:
# explore data type
data,y = digits["data"].copy(),digits["target"].copy()
print(type(data[0][:]),type(y[0]))
# note: .copy() ensures that we do not modify the raw data stored in the digits dictionary.

<class 'numpy.ndarray'> <class 'numpy.int64'>


In [4]:
print(min(data[0]),max(data[0]))
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
scaler = MinMaxScaler()
scaler.fit(data)  # fit the scaler used for data normalization
newdata = scaler.transform(data)  # scaled copy of the data (a numpy array); it is not used in the split below

# Split the (unscaled) data into 80% train and 20% test subsets
print(f"There are {data.shape[0]} data samples")
X_train, X_test, y_train, y_test = train_test_split(
    data, y, test_size=0.2, shuffle=False)


0.0 15.0
There are 1797 data samples


In [5]:
import sklearn
from sklearn import metrics
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# Support Vector Machine classifier
clf = SVC(gamma=0.001) # model design
clf.fit(X_train, y_train) # learn
svc_prediction = clf.predict(X_test) # predict on test
print("SVC Accuracy:", metrics.accuracy_score(y_true=y_test ,y_pred=svc_prediction))


SVC Accuracy: 0.9583333333333334


Which parameters are we trying to optimize? The ``get_params()`` method lists every hyperparameter of the estimator:

In [6]:
clf.get_params()

{'C': 1.0,
 'break_ties': False,
 'cache_size': 200,
 'class_weight': None,
 'coef0': 0.0,
 'decision_function_shape': 'ovr',
 'degree': 3,
 'gamma': 0.001,
 'kernel': 'rbf',
 'max_iter': -1,
 'probability': False,
 'random_state': None,
 'shrinking': True,
 'tol': 0.001,
 'verbose': False}

A search consists of:

* an estimator (regressor or classifier such as ``SVC()``);

* a parameter space;

* a method for searching or sampling candidates;

* a cross-validation scheme; and

* a loss function.
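The five components listed above can be mapped onto the arguments of a scikit-learn search object, as in the sketch below. The grid values and the 3-fold scheme are illustrative assumptions. Note that scikit-learn takes a `scoring` argument, where higher is better, in the place of a loss function.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

estimator = SVC()                                     # 1. the estimator
param_space = {"C": [1, 10], "gamma": [1e-3, 1e-4]}   # 2. the parameter space (illustrative)
cv_scheme = StratifiedKFold(n_splits=3)               # 4. the cross-validation scheme

# 3. the search method is the class itself (exhaustive grid search here)
# 5. the score function plays the role of the loss (higher is better)
demo_search = GridSearchCV(estimator, param_space, cv=cv_scheme, scoring="accuracy")
demo_search.fit(X, y)

print(demo_search.best_params_, demo_search.best_score_)
```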


## 1. Grid Search Cross Validation
Grid search performs an exhaustive (brute-force) search over a user-defined parameter grid and scores every combination with cross-validation. The scikit-learn class is ``GridSearchCV``. More details [here](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html).





In [7]:
from sklearn.model_selection import GridSearchCV
param_grid = [
  {'C': [1, 5,10,50, 100,500, 1000], 'kernel': ['linear']},
  {'C': [1,5, 10,50, 100,500, 1000], 'gamma': [0.01,0.001, 0.0001], 'kernel': ['rbf']},
 ]

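The algorithm evaluates every combination of parameters in the grid: 7 candidates for the linear kernel plus 7 × 3 = 21 for the RBF kernel, 28 in total. The parameters can belong to the model itself or, when the estimator is a pipeline, to earlier steps such as the choice of features.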
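The number of candidates can be checked before launching the search. The sketch below uses scikit-learn's `ParameterGrid` to enumerate the same grid; the fold count of 5 matches the `cv=5` used in this section.

```python
from sklearn.model_selection import ParameterGrid

param_grid = [
    {"C": [1, 5, 10, 50, 100, 500, 1000], "kernel": ["linear"]},
    {"C": [1, 5, 10, 50, 100, 500, 1000], "gamma": [0.01, 0.001, 0.0001], "kernel": ["rbf"]},
]

candidates = list(ParameterGrid(param_grid))  # one dict per parameter combination
n_candidates = len(candidates)
n_fits = n_candidates * 5                     # one fit per candidate and per fold

print(n_candidates, n_fits)  # 28 140
```

Because the cost grows multiplicatively with every added parameter, it is worth checking this count before adding values to a grid.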

In [8]:
search = GridSearchCV(clf, param_grid, cv=5,verbose=3)

In [9]:
search.fit(X_train, y_train) # learn

Fitting 5 folds for each of 28 candidates, totalling 140 fits
[CV 1/5] END ................C=1, kernel=linear;, score=0.948 total time=   0.0s
[CV 2/5] END ................C=1, kernel=linear;, score=0.948 total time=   0.0s
[CV 3/5] END ................C=1, kernel=linear;, score=0.944 total time=   0.0s
[CV 4/5] END ................C=1, kernel=linear;, score=0.965 total time=   0.0s
[CV 5/5] END ................C=1, kernel=linear;, score=0.972 total time=   0.0s
[CV 1/5] END ................C=5, kernel=linear;, score=0.948 total time=   0.0s
[CV 2/5] END ................C=5, kernel=linear;, score=0.948 total time=   0.0s
[CV 3/5] END ................C=5, kernel=linear;, score=0.944 total time=   0.0s
[CV 4/5] END ................C=5, kernel=linear;, score=0.965 total time=   0.0s
[CV 5/5] END ................C=5, kernel=linear;, score=0.972 total time=   0.0s
[CV 1/5] END ...............C=10, kernel=linear;, score=0.948 total time=   0.0s
[CV 2/5] END ...............C=10, kernel=linear

GridSearchCV(cv=5, estimator=SVC(gamma=0.001),
             param_grid=[{'C': [1, 5, 10, 50, 100, 500, 1000],
                          'kernel': ['linear']},
                         {'C': [1, 5, 10, 50, 100, 500, 1000],
                          'gamma': [0.01, 0.001, 0.0001], 'kernel': ['rbf']}],
             verbose=3)

In [10]:
search.get_params()

{'cv': 5,
 'error_score': nan,
 'estimator__C': 1.0,
 'estimator__break_ties': False,
 'estimator__cache_size': 200,
 'estimator__class_weight': None,
 'estimator__coef0': 0.0,
 'estimator__decision_function_shape': 'ovr',
 'estimator__degree': 3,
 'estimator__gamma': 0.001,
 'estimator__kernel': 'rbf',
 'estimator__max_iter': -1,
 'estimator__probability': False,
 'estimator__random_state': None,
 'estimator__shrinking': True,
 'estimator__tol': 0.001,
 'estimator__verbose': False,
 'estimator': SVC(gamma=0.001),
 'n_jobs': None,
 'param_grid': [{'C': [1, 5, 10, 50, 100, 500, 1000], 'kernel': ['linear']},
  {'C': [1, 5, 10, 50, 100, 500, 1000],
   'gamma': [0.01, 0.001, 0.0001],
   'kernel': ['rbf']}],
 'pre_dispatch': '2*n_jobs',
 'refit': True,
 'return_train_score': False,
 'scoring': None,
 'verbose': 3}

In [11]:
search.cv_results_

{'mean_fit_time': array([0.0202435 , 0.02067056, 0.01902642, 0.02032223, 0.01920686,
        0.01953688, 0.01891789, 0.20665789, 0.06004992, 0.04982471,
        0.20282912, 0.06482105, 0.03267107, 0.20722747, 0.06392918,
        0.03161941, 0.20358534, 0.06525946, 0.03085084, 0.21259098,
        0.06442966, 0.03066716, 0.21929455, 0.06760669, 0.02960839,
        0.20766106, 0.06300383, 0.02873507]),
 'std_fit_time': array([0.00114511, 0.00178355, 0.00027251, 0.00060302, 0.00085577,
        0.00123313, 0.00039122, 0.01153848, 0.00060062, 0.00076676,
        0.00878694, 0.00794466, 0.00047886, 0.00471677, 0.00182859,
        0.00178545, 0.00475893, 0.00588141, 0.00295882, 0.01458503,
        0.00468219, 0.00284282, 0.00717183, 0.00408197, 0.00229398,
        0.00970747, 0.00293367, 0.00059843]),
 'mean_score_time': array([0.00566401, 0.00627389, 0.00564022, 0.00647168, 0.00561452,
        0.00579333, 0.00536418, 0.04278455, 0.02072887, 0.02577839,
        0.04078202, 0.02195706, 0.016586

In [12]:
search.best_params_

{'C': 5, 'gamma': 0.001, 'kernel': 'rbf'}
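The raw `cv_results_` dictionary is hard to read. One common way to inspect it, sketched below, is to load it into a pandas `DataFrame` and sort by rank. The small grid and 3-fold setting are assumptions chosen to keep the example fast; they are not the grid used above.

```python
import pandas as pd
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

small_grid = {"C": [1, 10], "gamma": [1e-3, 1e-4]}  # illustrative grid (assumption)
small_search = GridSearchCV(SVC(), small_grid, cv=3).fit(X, y)

# every key of cv_results_ becomes a column, every candidate a row
results = pd.DataFrame(small_search.cv_results_)
columns = ["params", "mean_test_score", "std_test_score", "rank_test_score"]
ranked = results[columns].sort_values("rank_test_score")
print(ranked)
```

The spread in `std_test_score` helps to judge whether the top-ranked candidate is meaningfully better than the runners-up.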

## 2. Random Search Cross Validation
Random search samples a fixed number of candidates from user-defined parameter distributions and scores each candidate with cross-validation. The scikit-learn class is ``RandomizedSearchCV``. More details [here](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html).

The advantage is that it can cover a wide hyperparameter space while limiting the cost to ``n_iter`` sampled candidates.
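To see what "sampling candidates" means, the sketch below draws a few parameter settings with scikit-learn's `ParameterSampler`. The distributions mirror the RBF case of this section; `n_iter=5` and `random_state=0` are arbitrary choices for illustration.

```python
from scipy.stats import loguniform, uniform
from sklearn.model_selection import ParameterSampler

distributions = {
    "C": uniform(loc=1, scale=1000),     # uniform on [1, 1001]
    "gamma": loguniform(1e-4, 1e-2),     # uniform in log-space on [1e-4, 1e-2]
    "kernel": ["rbf"],                   # lists are sampled uniformly
}

samples = list(ParameterSampler(distributions, n_iter=5, random_state=0))
for sample in samples:
    print(sample)
```

Note that a uniform distribution on [1, 1001] rarely proposes small values of `C`; a log-uniform distribution may be a better fit when a parameter spans several orders of magnitude.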

In [13]:
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform,norm ,loguniform

 
distributions= [ {'C': uniform(loc=1, scale=1000), 'kernel': ['linear']},
  {'C': uniform(loc=1, scale=1000), 'gamma': loguniform(1e-4,1e-2), 'kernel': ['rbf']}]
clf2 = RandomizedSearchCV(clf, distributions, random_state=0,cv=5,n_iter=100)
clf2.fit(X_train,y_train)

RandomizedSearchCV(cv=5, estimator=SVC(gamma=0.001), n_iter=100,
                   param_distributions=[{'C': <scipy.stats._distn_infrastructure.rv_continuous_frozen object at 0x7facafc962e0>,
                                         'kernel': ['linear']},
                                        {'C': <scipy.stats._distn_infrastructure.rv_continuous_frozen object at 0x7facafc96670>,
                                         'gamma': <scipy.stats._distn_infrastructure.rv_continuous_frozen object at 0x7facafc96130>,
                                         'kernel': ['rbf']}],
                   random_state=0)

Compare the best parameters found by the two searches:

In [14]:
print(search.best_params_)
print(clf2.best_params_)

{'C': 5, 'gamma': 0.001, 'kernel': 'rbf'}
{'C': 20.987665408758737, 'gamma': 0.0007645780792982153, 'kernel': 'rbf'}
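The best parameters alone do not say which search found the better model. A hedged sketch of a fuller comparison is to look at the mean cross-validated score (`best_score_`) and at the accuracy on the held-out test set. The reduced grid, the log-uniform ranges, `n_iter=5`, and `cv=3` below are assumptions made to keep the example fast, so the numbers will not match the searches above.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

grid = GridSearchCV(SVC(), {"C": [1, 10], "gamma": [1e-3, 1e-4]}, cv=3)
rand = RandomizedSearchCV(
    SVC(),
    {"C": loguniform(1, 100), "gamma": loguniform(1e-4, 1e-2)},
    n_iter=5, cv=3, random_state=0,
)

comparison = {}
for name, searcher in (("grid", grid), ("random", rand)):
    searcher.fit(X_train, y_train)
    # best_score_: mean CV accuracy of the best candidate
    # score(): accuracy of the refit best estimator on unseen data
    comparison[name] = (searcher.best_score_, searcher.score(X_test, y_test))
    print(name, searcher.best_params_, comparison[name])
```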


## 3. Ensemble learning

A group of models often performs better than its individual members. For example, a Random Forest usually performs better than the individual Decision Trees that constitute it.


### 3.1 Voting Classifier

A voting classifier aggregates the predictions of each classifier and predicts the class that gets the most votes (hard voting).
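The majority vote itself is simple enough to write by hand. The sketch below uses a hypothetical helper, `hard_vote`, and made-up predictions from three classifiers on four samples; it assumes non-negative integer class labels.

```python
import numpy as np

def hard_vote(predictions):
    """Return the most frequent label per sample.

    predictions: array-like of shape (n_classifiers, n_samples)
    holding non-negative integer labels.
    """
    predictions = np.asarray(predictions)
    # for each sample (column), count the votes per label and keep the winner;
    # argmax returns the smallest label when there is a tie
    return np.array([np.bincount(column).argmax() for column in predictions.T])

toy_predictions = [
    [0, 1, 2, 1],  # classifier A
    [0, 2, 2, 1],  # classifier B
    [1, 1, 2, 0],  # classifier C
]
votes = hard_vote(toy_predictions)
print(votes)  # [0 1 2 1]
```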

![Voting Classifier](votingclassifier.png)

From "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow" (Géron)

In [17]:
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


svc_clf = SVC(gamma=0.001) # model design
log_clf = LogisticRegression()
rf_clf = RandomForestClassifier()

voting_clf = VotingClassifier(
    estimators = [ ('lr',log_clf) , ('rf',rf_clf),('svc',svc_clf)],
    voting='hard')

voting_clf.fit(X_train, y_train) # learn
y_pred=voting_clf.predict(X_test)
accuracy_score(y_test,y_pred)


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


0.9388888888888889

Compare with the individual classifiers:

In [18]:
for model in (log_clf,rf_clf,svc_clf,voting_clf):  # "model" avoids overwriting the earlier clf variable
    model.fit(X_train,y_train)
    y_pred=model.predict(X_test)
    print(model.__class__.__name__,accuracy_score(y_test,y_pred))

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


LogisticRegression 0.9055555555555556
RandomForestClassifier 0.9222222222222223
SVC 0.9583333333333334


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


VotingClassifier 0.9472222222222222
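The convergence message repeated in the output above comes from `LogisticRegression`, and it suggests either scaling the data or increasing `max_iter`. The sketch below applies both remedies with a pipeline; `max_iter=1000` is an arbitrary assumption, and the resulting accuracy may differ from the numbers above.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

# the scaler is fit on the training data only, then applied to the test data
scaled_log_clf = make_pipeline(MinMaxScaler(), LogisticRegression(max_iter=1000))
scaled_log_clf.fit(X_train, y_train)

scaled_accuracy = scaled_log_clf.score(X_test, y_test)
print("Scaled LogisticRegression accuracy:", scaled_accuracy)
```

Since a pipeline is itself an estimator, it can be passed to `VotingClassifier` in place of the plain `LogisticRegression`.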
