# CHAPTER 10: Optimizing Hyperparameters

This notebook is a supplement to *Chapter 10. Hyperparameter Tuning* of the book **Machine Learning For Everyone**.




In this notebook, we will see how to use GridSearchCV and how it can improve the performance of a model.

GridSearchCV is a class in the model_selection module of scikit-learn (also called sklearn), so the scikit-learn library must be installed on your computer. It loops through predefined hyperparameter values and fits your estimator (model) on the training set for each combination. In the end, we can select the best combination from the listed hyperparameter values.
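
A quick way to confirm that scikit-learn is available is to import it and print its version. This is only a minimal check; if the import fails, the library can usually be installed with `pip install scikit-learn`.

```python
# Check that scikit-learn is installed and that GridSearchCV can be imported
import sklearn
from sklearn.model_selection import GridSearchCV

print("scikit-learn version:", sklearn.__version__)
```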

# 1. Implementation

Now, let us see how to use GridSearchCV to improve the accuracy of our model. We are going to train the model twice: once without GridSearchCV, i.e. using the default hyperparameter values, and once with GridSearchCV to find the best hyperparameter values for the dataset at hand. We use the well-known Breast Cancer Wisconsin (Diagnostic) Data Set, which ships with sklearn.

## 1.1. Without GridSearch

Let's first import the necessary libraries we are going to use in this notebook.

In [8]:
# import all necessary libraries
import sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split

Let's now import the Breast Cancer Wisconsin (Diagnostic) Data Set.

In [11]:
# load the dataset
dataset = load_breast_cancer()
X = dataset.data    # feature matrix
Y = dataset.target  # class labels (0 or 1)

In [22]:
#looking at X
X

array([[1.799e+01, 1.038e+01, 1.228e+02, ..., 2.654e-01, 4.601e-01,
        1.189e-01],
       [2.057e+01, 1.777e+01, 1.329e+02, ..., 1.860e-01, 2.750e-01,
        8.902e-02],
       [1.969e+01, 2.125e+01, 1.300e+02, ..., 2.430e-01, 3.613e-01,
        8.758e-02],
       ...,
       [1.660e+01, 2.808e+01, 1.083e+02, ..., 1.418e-01, 2.218e-01,
        7.820e-02],
       [2.060e+01, 2.933e+01, 1.401e+02, ..., 2.650e-01, 4.087e-01,
        1.240e-01],
       [7.760e+00, 2.454e+01, 4.792e+01, ..., 0.000e+00, 2.871e-01,
        7.039e-02]])

In [24]:
#looking at Y
Y

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0,
       1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0,
       1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0,
       0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1,
       1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
       0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1,
       1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0,

Let's split the dataset into training and testing sets, holding out 30% of the samples for testing.

In [25]:
#split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.30, random_state = 101)
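
To see what the split produces, it can help to look at the array shapes. The sketch below repeats the loading and splitting steps so that it runs on its own; the dataset has 569 samples with 30 features each, so a 30% test split leaves 398 samples for training and 171 for testing.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Reload the data so this cell is self-contained
dataset = load_breast_cancer()
X, Y = dataset.data, dataset.target

# Same split as in the notebook: 30% test set, fixed random seed
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.30, random_state=101
)

print(X.shape)        # (569, 30)
print(X_train.shape)  # (398, 30)
print(X_test.shape)   # (171, 30)
```

The 171 test samples match the total `support` reported in the classification reports in this notebook.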

Now let's use the training set, that is, X_train and y_train, to train an SVM classifier with the default hyperparameter values.

In [None]:
# train the model on train set without using GridSearchCV 
model = SVC() 
model.fit(X_train, y_train)
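
It is worth knowing what "default hyperparameter values" means here. As a small sketch, `get_params()` lists every hyperparameter of an estimator; the values in the comments assume scikit-learn 0.22 or later, where the default for gamma is 'scale'.

```python
from sklearn.svm import SVC

# Inspect the defaults of an SVC created without any arguments
defaults = SVC().get_params()

print(defaults['C'])       # 1.0
print(defaults['kernel'])  # rbf
print(defaults['gamma'])   # scale
```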

Lastly, let's evaluate how the model performs after training. For that, we compute the model's predictions on the test set (X_test) and compare them with the actual labels (y_test). Conveniently, classification_report from sklearn.metrics does the comparison for us: it takes two arguments, the actual labels and the model predictions.

In [26]:
# calculate model predictions 
predictions = model.predict(X_test) 

# print classification report
print(classification_report(y_test, predictions)) 

              precision    recall  f1-score   support

           0       0.95      0.85      0.90        66
           1       0.91      0.97      0.94       105

    accuracy                           0.92       171
   macro avg       0.93      0.91      0.92       171
weighted avg       0.93      0.92      0.92       171
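
To make the numbers in such a report less abstract, here is a small sketch on made-up labels (the lists `y_true` and `y_pred` are hypothetical and are not the notebook's data). It builds a confusion matrix and derives precision, recall and accuracy from its counts.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Hypothetical toy labels, chosen so the counts are easy to verify by hand
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]

# Rows are actual classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[2 1]
#  [1 4]]

# For class 1: 4 true positives, 1 false positive, 1 false negative
precision_1 = precision_score(y_true, y_pred)  # 4 / (4 + 1) = 0.8
recall_1 = recall_score(y_true, y_pred)        # 4 / (4 + 1) = 0.8
accuracy = accuracy_score(y_true, y_pred)      # 6 correct out of 8 = 0.75

print(precision_1, recall_1, accuracy)
```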



The report shows precision, recall and f1-score for each class, along with an overall accuracy of 0.92. Let's see if we can reach a higher accuracy with grid search!

## 1.2. With GridSearch

As mentioned above, we pass predefined hyperparameter values to GridSearchCV. We do this by defining a dictionary that maps each hyperparameter name to the list of values it can take. Below, *param_grid* is an example.

In [37]:
# defining parameter range
param_grid = {'C': [0.1, 1, 10, 100],        # regularization strength
              'gamma': ['scale', 'auto'],    # kernel coefficient settings
              'kernel': ['linear']}          # kernel type

Here C, gamma and kernel are some of the hyperparameters of an SVM model. The rest of the hyperparameters are left at their default values. GridSearchCV tries **all the combinations** of the values passed in the dictionary and evaluates the model for each combination using cross-validation. Hence, after running it, we get a cross-validated score (accuracy by default for a classifier) for every combination of hyperparameters and can choose the one with the best performance. Note that gamma only affects the 'rbf', 'poly' and 'sigmoid' kernels, so with a linear kernel the two gamma settings produce the same model.
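
To see exactly which combinations are tried, one option is scikit-learn's `ParameterGrid`, which expands a dictionary of value lists into every combination. The sketch below uses the grid as it appears in the fitted GridSearchCV output of this notebook: 4 values of C, 2 of gamma and 1 kernel.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': ['scale', 'auto'],
              'kernel': ['linear']}

# Expand the dictionary into the full list of candidate settings
candidates = list(ParameterGrid(param_grid))

print(len(candidates))  # 4 * 2 * 1 = 8
for params in candidates:
    print(params)
```

With the default 5-fold cross-validation, each of the 8 candidates is fitted 5 times, which is where the 40 fits reported during training come from.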

Next, we pass the dictionary to GridSearchCV(), along with the model and a few other parameters (yes, GridSearchCV() has settings of its own), which are briefly described in the code comments. You can see the complete list in the [original documentation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html).

In [31]:
# pass the SVM model and the predefined hyperparameter dictionary param_grid to GridSearchCV()
grid = GridSearchCV(SVC(),                    # the defined SVM model
                    param_grid,               # hyperparameter dictionary
                    verbose = 3,              # controls how much is printed while fitting; higher values print more detail
                    n_jobs=-1)                # number of jobs to run in parallel; -1 uses all available processors
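
Two settings that the call above leaves at their defaults are the number of cross-validation folds (`cv`) and the metric used to rank candidates (`scoring`). The sketch below spells them out explicitly. It assumes scikit-learn 0.22 or later, where `cv` defaults to 5 folds, and relies on the fact that the default score of a classifier such as SVC is accuracy. The variable name `explicit_grid` is only illustrative.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': ['scale', 'auto'],
              'kernel': ['linear']}

# Same search as before, with the default settings written out
explicit_grid = GridSearchCV(
    SVC(),
    param_grid,
    cv=5,                # 5-fold cross-validation
    scoring='accuracy',  # rank candidates by mean accuracy across folds
    refit=True,          # retrain the best candidate on the whole training set
    verbose=0,           # print nothing while fitting
    n_jobs=-1,           # use all available processors
)

print(explicit_grid.cv, explicit_grid.scoring, explicit_grid.refit)
```

Other values can be tried here, for example `scoring='f1'` or `scoring='recall'` when missing a positive case is more costly than a false alarm.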

Now, similarly, we fit the GridSearch on X_train and y_train.

In [32]:
# fitting the model for grid search 
grid.fit(X_train, y_train) 

Fitting 5 folds for each of 8 candidates, totalling 40 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 16 concurrent workers.
[Parallel(n_jobs=-1)]: Done  23 out of  40 | elapsed:    4.4s remaining:    3.2s
[Parallel(n_jobs=-1)]: Done  37 out of  40 | elapsed:    9.0s remaining:    0.7s
[Parallel(n_jobs=-1)]: Done  40 out of  40 | elapsed:   12.6s finished


GridSearchCV(estimator=SVC(), n_jobs=-1,
             param_grid={'C': [0.1, 1, 10, 100], 'gamma': ['scale', 'auto'],
                         'kernel': ['linear']},
             verbose=3)

While training, GridSearchCV tries every combination of hyperparameter values mentioned in param_grid. After training, we can read the best hyperparameter values from grid.best_params_. The model refitted with those values is available as grid.best_estimator_.

In [34]:
# print best parameter after tuning 
print(grid.best_params_)

{'C': 100, 'gamma': 'scale', 'kernel': 'linear'}
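
Besides `best_params_`, a fitted search object also exposes `best_score_`, `best_estimator_` and `cv_results_`. The sketch below shows how they can be read. To keep it quick to run, it uses a deliberately small grid with the RBF kernel; this grid (`small_grid`) is an assumption made for illustration and is not the grid used in this notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, Y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.30, random_state=101
)

# Illustrative small grid (not the notebook's grid), chosen to run fast
small_grid = {'C': [0.1, 1, 10], 'kernel': ['rbf']}
search = GridSearchCV(SVC(), small_grid)
search.fit(X_train, y_train)

print(search.best_params_)     # winning combination
print(search.best_score_)      # its mean cross-validated accuracy
print(search.best_estimator_)  # model refitted on the full training set

# Mean cross-validated score of every candidate
for params, mean_score in zip(search.cv_results_['params'],
                              search.cv_results_['mean_test_score']):
    print(params, round(mean_score, 3))
```

Note that `best_score_` is computed on the cross-validation folds of the training set, so it is not the same number as the accuracy on X_test.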


You can see that this set of hyperparameter values gave the best cross-validated performance. Now let's build a classification report for this specific model. As before, we compute predictions on the test set (X_test) and compare them with the actual labels (y_test) using classification_report. Calling grid.predict uses the best model found by the search.

In [36]:
# calculate model predictions 
grid_predictions = grid.predict(X_test) 
   
# print classification report 
print(classification_report(y_test, grid_predictions)) 

              precision    recall  f1-score   support

           0       0.97      0.91      0.94        66
           1       0.94      0.98      0.96       105

    accuracy                           0.95       171
   macro avg       0.96      0.95      0.95       171
weighted avg       0.95      0.95      0.95       171



Indeed, the accuracy is better with the model that GridSearchCV selected: 0.95 on the test set, compared with 0.92 for the default hyperparameters.

**Last note:** You might think that {'C': 100, 'gamma': 'scale', 'kernel': 'linear'} are the best hyperparameter values for any SVM model. This is not the case: they are only the best among the values we tried, for the dataset we are working on. For any other dataset, the SVM model can have different optimal hyperparameter values.
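
One way to explore this is to run the same search on two datasets and compare the winners. The sketch below does so for the breast cancer data and the iris data that also ships with sklearn. The grid uses the RBF kernel, which is an assumption made here so that the example runs quickly. The two datasets may or may not end up with the same best values; the point is that the search has to be rerun for each dataset.

```python
from sklearn.datasets import load_breast_cancer, load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative grid (not the notebook's grid)
param_grid = {'C': [0.1, 1, 10, 100],
              'gamma': ['scale', 'auto'],
              'kernel': ['rbf']}

best_params = {}
for name, loader in [('breast_cancer', load_breast_cancer), ('iris', load_iris)]:
    X, y = loader(return_X_y=True)
    search = GridSearchCV(SVC(), param_grid)  # default 5-fold cross-validation
    search.fit(X, y)
    best_params[name] = search.best_params_
    print(name, search.best_params_, round(search.best_score_, 3))
```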

This brings us to the end of this notebook, where we learned how to find optimal hyperparameters of an SVM model to get the best performance out of it.