# Importing packages

In [1]:
# Import all the necessary libraries
import sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Loading the data

In [20]:
# Load the dataset and split it into training and test sets
dataset = load_breast_cancer()

X = dataset.data
Y = dataset.target

X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.10, random_state=123)

# Fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler()

scaler.fit(X_train)

X_train_norm = scaler.transform(X_train)

X_test_norm = scaler.transform(X_test)
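
As a quick sanity check, the sketch below rebuilds the same split and verifies that the scaled training features have approximately zero mean and unit standard deviation. The names `train_means` and `train_stds` are introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=123
)

# Fit on the training set only, then transform both sets
scaler = StandardScaler().fit(X_train)
X_train_norm = scaler.transform(X_train)
X_test_norm = scaler.transform(X_test)

# Per-feature statistics of the scaled training data
train_means = X_train_norm.mean(axis=0)
train_stds = X_train_norm.std(axis=0)

print(X_train_norm.shape, X_test_norm.shape)
print(np.allclose(train_means, 0.0, atol=1e-6), np.allclose(train_stds, 1.0))
```

The test set is transformed with the statistics of the training set, so its means and standard deviations will generally not be exactly 0 and 1.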

# Using GridSearch with Sklearn

In almost any machine learning project, we train different models on the dataset and select the one with the best performance. However, there is usually room for improvement, because a model trained with arbitrary settings is not necessarily the best it can be for the problem at hand. One important factor in a model's performance is its hyperparameters: with appropriate values, the performance of a model can improve significantly. In this notebook, we will see how to search for good hyperparameter values with GridSearchCV.

## What is grid search?

Grid search is a hyperparameter tuning method: it evaluates a model for every combination of candidate hyperparameter values in order to find the best one. As mentioned above, the performance of a model depends significantly on the values of its hyperparameters. There is no way to know the best values in advance, so in practice we try a set of candidate values and compare the results. Doing this manually can take a considerable amount of time and resources, so we use GridSearchCV to automate the search.

GridSearchCV is a class in scikit-learn's (sklearn) model_selection module. It loops through the predefined hyperparameter values and fits your estimator (model) on the training set for each combination, so that in the end we can select the best parameters among those listed.

Feel free to check the [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html) to find more information.

# How does grid search work?

As mentioned above, we pass predefined hyperparameter values to GridSearchCV. We do this by defining a dictionary that maps each hyperparameter name to the list of values to try.

Here is an example:

In [21]:
dict_of_hyperparameters = {'C': [0.1, 1, 10, 100, 1000],
                           'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
                           'kernel': ['rbf', 'linear', 'sigmoid']}

Here C, gamma and kernel are some of the hyperparameters of an SVM model. The rest of the hyperparameters keep their default values.

GridSearchCV tries every combination of the values passed in the dictionary and evaluates the model for each combination using cross-validation. After fitting, we therefore have a cross-validated score for every combination of hyperparameters, and we can choose the one with the best performance.
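
To make the mechanism concrete, here is a minimal sketch of the same idea written by hand with `ParameterGrid` and `cross_val_score`. It is not how GridSearchCV is implemented internally; the names `small_grid`, `candidates` and `results` are illustrative, and the grid is deliberately smaller than the one above so that it runs quickly.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
# Scaled once up front for brevity (assumption made for this sketch)
X_scaled = StandardScaler().fit_transform(X)

# The example dictionary above expands to 5 * 5 * 3 = 75 combinations
full_grid = {'C': [0.1, 1, 10, 100, 1000],
             'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
             'kernel': ['rbf', 'linear', 'sigmoid']}
n_full = len(ParameterGrid(full_grid))

# A smaller, hypothetical grid for the manual loop
small_grid = {'C': [0.1, 1, 10], 'kernel': ['rbf', 'linear']}
candidates = list(ParameterGrid(small_grid))

results = []
for params in candidates:
    # 5-fold cross-validated accuracy for this combination
    scores = cross_val_score(SVC(**params), X_scaled, y, cv=5)
    results.append((scores.mean(), params))

best_score, best_params = max(results, key=lambda item: item[0])
print(n_full, len(candidates))
print(best_params, round(best_score, 3))
```

The cost grows multiplicatively with each hyperparameter added to the grid, which is why the grids used later in this notebook are kept small.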

# How to use grid search?

In this section, we shall see how to use GridSearchCV and find out whether it improves the performance of the model.

First, let us look at some of the arguments that GridSearchCV takes:

**estimator**: the model instance whose hyperparameters you want to tune.

**param_grid**: the dictionary that holds the hyperparameters and the values you want to try.

**scoring**: the evaluation metric to use; you can pass a valid metric name as a string or a scorer object.

**cv**: the cross-validation strategy, for example the number of folds used to evaluate each combination of hyperparameters (5 by default).

**verbose**: controls how much is printed while fitting; set it to 1 or higher to get progress messages.

**n_jobs**: the number of jobs to run in parallel for this task; -1 uses all available processors.
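
The arguments listed above could be passed explicitly as in the following sketch. The grid values, the `'f1'` metric and `cv=3` are arbitrary choices made for illustration, not settings used elsewhere in this notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

search = GridSearchCV(
    estimator=SVC(),                                    # model to tune
    param_grid={'C': [1, 10], 'gamma': [0.01, 0.001]},  # values to try
    scoring='f1',   # metric name; accuracy is used for classifiers if omitted
    cv=3,           # 3-fold cross-validation per combination
    verbose=1,      # print a one-line summary of the number of fits
    n_jobs=1,       # single process; -1 would use all available processors
)
search.fit(X_scaled, y)

print(search.best_params_, round(search.best_score_, 3))
```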

# Using grid search

## Train a model with default hyperparameters

Initialize a support vector machine model for classification using the default hyperparameters.

Feel free to use the [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html).

In [22]:
# initialize the model with default hyperparameters (no GridSearchCV)
model = SVC()

Train the model.

In [23]:
model.fit(X_train_norm, y_train) 

SVC()

Make predictions on the test set.

In [24]:
default_predictions = model.predict(X_test_norm) 

Use *classification_report* to get the performance of your model.

Feel free to use the [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html).

In [25]:
print(classification_report(y_test, default_predictions)) 

              precision    recall  f1-score   support

           0       1.00      0.96      0.98        24
           1       0.97      1.00      0.99        33

    accuracy                           0.98        57
   macro avg       0.99      0.98      0.98        57
weighted avg       0.98      0.98      0.98        57



## Train a model with optimized hyperparameters

Define a dictionary called *param_grid* with these hyperparameters:

**C** : 0.1, 1, 10

**gamma** : 0.1, 0.01

**kernel** : linear, rbf

In [46]:
# define the parameter grid
param_grid = {'C': [0.1, 1, 10],
              'gamma': [0.1, 0.01],
              'kernel': ['linear', 'rbf']}
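
One detail worth knowing: `gamma` has no effect when `kernel='linear'`, so the grid above fits some equivalent linear models twice. A possible alternative, sketched below, is to pass a list of dictionaries so that `gamma` is only varied for the RBF kernel. The name `split_grid` is hypothetical.

```python
from sklearn.model_selection import ParameterGrid

# Grid used in this notebook: every value is combined with every other
param_grid = {'C': [0.1, 1, 10],
              'gamma': [0.1, 0.01],
              'kernel': ['linear', 'rbf']}

# Alternative: one sub-grid per kernel, gamma only where it matters
split_grid = [
    {'kernel': ['linear'], 'C': [0.1, 1, 10]},
    {'kernel': ['rbf'], 'C': [0.1, 1, 10], 'gamma': [0.1, 0.01]},
]

n_full = len(ParameterGrid(param_grid))
n_split = len(ParameterGrid(split_grid))
print(n_full, n_split)  # 12 9
```

GridSearchCV accepts either form for its `param_grid` argument.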

Initialize the *GridSearchCV* object.

Feel free to use the [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html).

In [47]:
grid = GridSearchCV(SVC(), param_grid, verbose=3) 

Fit the grid search on the training set.

In [48]:
# fitting the model for grid search 
grid.fit(X_train_norm, y_train) 

Fitting 5 folds for each of 12 candidates, totalling 60 fits
[CV 1/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=1.000 total time=   0.0s
[CV 2/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.961 total time=   0.0s
[CV 3/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.990 total time=   0.0s
[CV 4/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.980 total time=   0.0s
[CV 5/5] END ...C=0.1, gamma=0.1, kernel=linear;, score=0.951 total time=   0.0s
[CV 1/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.990 total time=   0.0s
[CV 2/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.951 total time=   0.0s
[CV 3/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.922 total time=   0.0s
[CV 4/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.951 total time=   0.0s
[CV 5/5] END ......C=0.1, gamma=0.1, kernel=rbf;, score=0.902 total time=   0.0s
[CV 1/5] END ..C=0.1, gamma=0.01, kernel=linear;, score=1.000 total time=   0.0s
[CV 2/5] END ..C=0.1, gamma=0.01, kernel=linear;

GridSearchCV(estimator=SVC(),
             param_grid={'C': [0.1, 1, 10], 'gamma': [0.1, 0.01],
                         'kernel': ['linear', 'rbf']},
             verbose=3)
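
A caveat about the setup above: the scaler was fitted on the whole training set before cross-validation, so each validation fold has already influenced the scaling statistics. A common way to avoid this is to put the scaler and the model in a `Pipeline` and search over that instead. The sketch below assumes the same split and grid; the step names `'scaler'` and `'svc'` and the variable names are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=123
)

# The scaler is refitted inside each cross-validation training fold
pipeline = Pipeline([('scaler', StandardScaler()), ('svc', SVC())])

# Parameters of a pipeline step are addressed as <step name>__<parameter>
pipeline_grid = {'svc__C': [0.1, 1, 10],
                 'svc__gamma': [0.1, 0.01],
                 'svc__kernel': ['linear', 'rbf']}

pipeline_search = GridSearchCV(pipeline, pipeline_grid)
pipeline_search.fit(X_train, y_train)  # raw, unscaled features go in

test_accuracy = pipeline_search.score(X_test, y_test)
print(pipeline_search.best_params_, round(test_accuracy, 3))
```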

Print the best parameters found by the search.

In [49]:
# print the best parameters after tuning
print(grid.best_params_)
# predict with the best model, which GridSearchCV refits on the whole training set
grid_predictions = grid.predict(X_test_norm)

{'C': 10, 'gamma': 0.01, 'kernel': 'rbf'}
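
Beyond `best_params_`, a fitted search also exposes the cross-validated score of every candidate through `cv_results_`. The sketch below rebuilds the same search (without verbose output) and prints the three best-ranked candidates; the names `results` and `order` are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=123
)
scaler = StandardScaler().fit(X_train)
X_train_norm = scaler.transform(X_train)

param_grid = {'C': [0.1, 1, 10],
              'gamma': [0.1, 0.01],
              'kernel': ['linear', 'rbf']}
grid = GridSearchCV(SVC(), param_grid)
grid.fit(X_train_norm, y_train)

results = grid.cv_results_
# Indices of the candidates sorted from best rank (1) to worst
order = results['rank_test_score'].argsort()
for i in order[:3]:
    print(results['rank_test_score'][i],
          round(results['mean_test_score'][i], 3),
          results['params'][i])

# Mean cross-validated score of the best candidate
print(round(grid.best_score_, 3))
```

Looking at the runner-up scores helps judge whether the best candidate is clearly ahead or only marginally better than its neighbours.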


Use *classification_report* to get the performance of your model.

Feel free to use the [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html).

In [50]:
# print classification report 
print(classification_report(y_test, grid_predictions))

              precision    recall  f1-score   support

           0       1.00      0.96      0.98        24
           1       0.97      1.00      0.99        33

    accuracy                           0.98        57
   macro avg       0.99      0.98      0.98        57
weighted avg       0.98      0.98      0.98        57
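
The two reports above are identical to two decimals. With only 57 test samples, a single misclassification shifts accuracy by roughly 1.8 percentage points, so the raw counts of a confusion matrix can be more informative than rounded scores. The sketch below compares the default model with an SVC that uses the best parameters printed earlier; the variable names are illustrative, and no particular output is claimed here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=123
)
scaler = StandardScaler().fit(X_train)
X_train_norm = scaler.transform(X_train)
X_test_norm = scaler.transform(X_test)

# Default hyperparameters versus the best parameters reported above
default_model = SVC().fit(X_train_norm, y_train)
tuned_model = SVC(C=10, gamma=0.01, kernel='rbf').fit(X_train_norm, y_train)

# Rows are true classes, columns are predicted classes
default_cm = confusion_matrix(y_test, default_model.predict(X_test_norm))
tuned_cm = confusion_matrix(y_test, tuned_model.predict(X_test_norm))

print(default_cm)
print(tuned_cm)
```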

