## GridSearchCV
### Hyper-Parameter Tuning in Machine Learning
Hyper-parameter tuning refers to the process of finding the hyper-parameters that yield the best result. This, of course, sounds a lot easier than it actually is. Finding the best hyper-parameters can be an elusive art, especially given that it depends largely on your training and testing data.

As your data evolves, the hyper-parameters that were once high performing may no longer perform well. Keeping track of the success of your model is critical to ensure it grows with the data.

One way to tune your hyper-parameters is to use a grid search. This is probably the simplest method as well as the most crude. In a grid search, you try a grid of hyper-parameters and evaluate the performance of each combination of hyper-parameters.
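
To make the idea concrete, here is a minimal sketch of a grid search written by hand. The `toy_score` function and the grid values are made up for illustration: they stand in for "train a model and measure its performance" and are not part of Sklearn.

```python
from itertools import product

# Hypothetical grid: every value of `k` is paired with every value of `weight`.
grid = {
    "k": [1, 3, 5, 7],
    "weight": [0.1, 0.5, 1.0],
}

def toy_score(k, weight):
    """Made-up scoring function standing in for model evaluation.

    It peaks at k=5, weight=0.5; a real search would train and score a model here.
    """
    return -((k - 5) ** 2) - (weight - 0.5) ** 2

best_score = float("-inf")
best_combo = None

# Try every combination in the grid and keep the best one seen so far.
for k, weight in product(grid["k"], grid["weight"]):
    score = toy_score(k, weight)
    if score > best_score:
        best_score = score
        best_combo = {"k": k, "weight": weight}

print(best_combo)
```

The loop visits 4 * 3 = 12 combinations, which is why a grid search gets expensive quickly as more values are added.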

The GridSearchCV class in Sklearn serves a dual purpose in tuning your model. The class allows you to:
1. Apply a grid search to an array of hyper-parameters, and
2. Cross-validate your model using k-fold cross validation
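
The k-fold cross validation in the second point can be sketched in plain Python. This is a simplified illustration with a hypothetical helper, `k_fold_indices`: it does no shuffling or stratification, both of which Sklearn's own splitters can do.

```python
def k_fold_indices(n_samples, k):
    """Split the indices 0..n_samples-1 into k (train, validation) pairs."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, validation))
        start += size
    return folds

folds = k_fold_indices(n_samples=10, k=5)
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Each sample is used for validation exactly once and for training in the other folds, so every candidate is scored on data it was not trained on.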

To start, let's check the difference between parameters and hyperparameters.

### Parameter
Parameters in a machine learning model are the variables that the algorithm itself learns (such as a coefficient) in order to produce a prediction. These parameters are not set or hard-coded and depend on the training data that is passed into your model. Because of this, they’re likely to change when your data changes.
- Parameters are internal to the model.
- Predictions require the use of parameters.
- They are estimated while the model is being trained.
- They are learned and set by the model itself.

### Hyperparameter
Hyper-parameters are variables that you specify while building a machine-learning model. This means that it’s the user who defines the hyper-parameters while building the model. For example, in a k-nearest neighbour algorithm, the hyper-parameters can refer to the value of k or the type of distance measurement used.
- Hyperparameters are explicitly specified and control the training process.
- Model optimization necessitates the use of hyperparameters.
- They are established prior to the start of the model’s training.
- They are external to the model.
- They are set manually by a machine learning engineer/practitioner.

In short, hyper-parameters control the learning process, while parameters are learned.
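
The distinction can be seen in a short sketch. A k-nearest neighbour model has no learned coefficients, so this example swaps in Sklearn's `Ridge` regression on synthetic data (both are assumptions made purely for illustration): `alpha` is chosen by us before training, while `coef_` and `intercept_` are learned from the data.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data where y is roughly 3 * x plus a little noise (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

# alpha is a hyper-parameter: we choose it before training.
model = Ridge(alpha=1.0)
print(model.get_params()["alpha"])

# coef_ and intercept_ are parameters: they only exist after fitting.
model.fit(X, y)
print(model.coef_, model.intercept_)
```

Refitting on different data would change `coef_` and `intercept_`, but `alpha` stays at whatever value we set.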



GridSearchCV tries all the combinations of the values passed in a dictionary of hyper-parameters and evaluates the model for each combination using cross-validation. Hence, after fitting, we get a score (such as accuracy) for every combination of hyperparameters and we can choose the one with the best performance.

The “best” parameters that GridSearchCV identifies are the best only among the combinations that you included in your parameter grid; values outside the grid are never tried.

In [1]:
# importing dataset
import pandas as pd
df = pd.read_csv('penguins.csv')
df.head()

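| Unnamed: 0 | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---|---|---|---|---|---|---|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | MALE |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | FEMALE |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | FEMALE |
| 3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | FEMALE |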


In [2]:
# dropping all missing values
df = df.dropna(how='any')

In [3]:
# obtaining labels and features
# note: the 'Unnamed: 0' row-index column read from the CSV remains in X
X = df.drop(columns=['species', 'island', 'sex'])
y = df['species']

In [4]:
# splitting dataset into test and train dataset
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1234)

From there, we can create a KNN classifier object as well as a GridSearchCV object. For this, we’ll need to import the classes from neighbors and model_selection respectively. We can also define a dictionary of the hyper-parameters we want to evaluate.

In [5]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
knn = KNeighborsClassifier()

A k-nearest neighbour classifier has a number of different hyper-parameters available. In this case, we’ll focus on:

- n_neighbors, which determines the number of neighbours to look at
- weights, which determines whether to weight each neighbour by its distance
- p, which determines the type of distance measure to use. For example, 1 would imply the use of the Manhattan distance, while 2 would imply the use of the Euclidean distance.
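
The effect of p can be worked through by hand. The sketch below uses a hypothetical helper, `minkowski_distance`, written in plain Python to show the arithmetic; it is not Sklearn's implementation.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

point_a = (0.0, 0.0)
point_b = (3.0, 4.0)

print(minkowski_distance(point_a, point_b, p=1))  # Manhattan: |3| + |4| = 7.0
print(minkowski_distance(point_a, point_b, p=2))  # Euclidean: sqrt(9 + 16) = 5.0
```

The same pair of points is 7 units apart under one measure and 5 under the other, so the choice of p can change which neighbours count as "nearest".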

In [6]:
params = {
    'n_neighbors' : [3,5,7,9,11,13], 
    'weights' : ['uniform', 'distance'], 
    'p' : [1,2]
}
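
To see which candidates this dictionary expands into, one option is Sklearn's `ParameterGrid` helper. This is a small sketch for inspection only; GridSearchCV does not require you to build the grid yourself.

```python
from sklearn.model_selection import ParameterGrid

params = {
    'n_neighbors': [3, 5, 7, 9, 11, 13],
    'weights': ['uniform', 'distance'],
    'p': [1, 2],
}

# Each element is one dictionary of hyper-parameter values to try.
combinations = list(ParameterGrid(params))
print(len(combinations))  # 6 * 2 * 2 = 24

# Show the first few candidates.
for candidate in combinations[:3]:
    print(candidate)
```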

The GridSearchCV class takes a number of parameters. Let’s explore these in a bit more detail:

- estimator= takes an estimator object, such as a classifier or a regression model.
- param_grid= takes a dictionary or a list of dictionaries. The dictionaries should be key-value pairs, where the key is the name of the hyper-parameter and the value is the list of hyper-parameter values to test.
- cv= determines the cross-validation strategy to apply. It accepts an integer number of folds, a cross-validation splitter, or an iterable of train/test splits. If None is passed, then 5-fold cross-validation is used.
- scoring= takes a string or a callable. This represents the strategy used to evaluate the performance of the cross-validated model on each test fold.
- n_jobs= represents the number of jobs to run in parallel. Since this is a time-consuming process, running more jobs in parallel (if your computer can handle it) can speed up the process.
- verbose= determines how much information is displayed. A value of 1 prints only a summary of the number of folds, candidates and fits. Values above 1 also display the computation time for each fold and candidate, values above 2 also display the score, and values above 3 also display the fold and candidate parameter indexes.
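
As a rough illustration of these arguments working together, the sketch below passes a list of dictionaries as the grid. It uses the iris dataset bundled with Sklearn as a stand-in so that it runs on its own; the grid values are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset bundled with Sklearn

# A list of dictionaries: each dictionary is expanded into its own sub-grid.
param_grid = [
    {'n_neighbors': [3, 5], 'weights': ['uniform']},
    {'n_neighbors': [7], 'weights': ['uniform', 'distance'], 'p': [1, 2]},
]

search = GridSearchCV(
    estimator=KNeighborsClassifier(),
    param_grid=param_grid,
    cv=3,                # 3 folds
    scoring='accuracy',  # name of a built-in scorer
    n_jobs=1,            # run fits one at a time
    verbose=0,           # print nothing
)
search.fit(X, y)

print(len(search.cv_results_['params']))  # 2 + 4 = 6 candidates
```

A list of dictionaries is handy when some hyper-parameters only make sense together, since the sub-grids are searched separately rather than crossed with each other.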

In [7]:
clf = GridSearchCV(estimator=knn, param_grid=params, cv = 5, n_jobs=5, verbose=1)

At this point, we’ve created a clf object, which is our GridSearchCV object. So far, we’ve only instantiated it; no models have been trained yet.

Let’s apply the .fit() method to the object, by passing in our training data:

In [8]:
clf.fit(X_train, y_train)

Fitting 5 folds for each of 24 candidates, totalling 120 fits


GridSearchCV(cv=5, estimator=KNeighborsClassifier(), n_jobs=5,
             param_grid={'n_neighbors': [3, 5, 7, 9, 11, 13], 'p': [1, 2],
                         'weights': ['uniform', 'distance']},
             verbose=1)

Because we instructed Sklearn to be verbose, we can see that the search ran 120 fits; the entire task took 0.6s!

At this point, our object contains a number of really helpful attributes. One of these is the .best_params_ attribute, which provides the hyper-parameters that performed best for the given data and the options in the parameter grid.

In [9]:
print(clf.best_params_)

{'n_neighbors': 11, 'p': 1, 'weights': 'distance'}


This indicates that it’s best to use 11 neighbours, the Manhattan distance, and a distance-weighted neighbour search.

In [10]:
# checking best score
print(clf.best_score_)

0.8381551362683439
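
Beyond the single best score, the fitted object keeps the results for every candidate in its `cv_results_` attribute. The sketch below loads them into a DataFrame; it uses the iris dataset bundled with Sklearn and a smaller grid as stand-ins so that it runs on its own.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset bundled with Sklearn

search = GridSearchCV(
    estimator=KNeighborsClassifier(),
    param_grid={'n_neighbors': [3, 5, 7], 'weights': ['uniform', 'distance']},
    cv=5,
)
search.fit(X, y)

# One row per candidate, with the mean and spread of its fold scores.
results = pd.DataFrame(search.cv_results_)
columns = ['params', 'mean_test_score', 'std_test_score', 'rank_test_score']
print(results[columns].sort_values('rank_test_score'))

# best_score_ is the mean cross-validated score of the top-ranked candidate.
print(search.best_score_, results['mean_test_score'].max())
```

Looking at the whole table shows how close the runners-up are, which is useful when several candidates score almost the same.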


## Do We Need to Split Data with Sklearn GridSearchCV?
An important topic to consider is whether or not we need to split data into training and testing data when using GridSearchCV. The reason this is a consideration (and not a given), is that the cross validation process itself splits the data into training and testing data.

By first splitting our dataset, we’re effectively reducing the data that can be used by GridSearchCV. There are polarized opinions about whether pre-splitting the data is a good idea or not.

In general, there is potential for data leakage into the hyper-parameters by not first splitting your data. By reserving a percentage of records for your true testing of the model, you’re able to get a more representative view of whether or not the model actually performs effectively.
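
A minimal sketch of this workflow is shown below: the search only sees the training split, and the held-out split is scored once at the end. It uses the iris dataset bundled with Sklearn and a small grid as stand-ins so that it runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in dataset bundled with Sklearn
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1234
)

search = GridSearchCV(
    estimator=KNeighborsClassifier(),
    param_grid={'n_neighbors': [3, 5, 7]},
    cv=5,
)
search.fit(X_train, y_train)  # cross-validation only ever sees the training split

cv_score = search.best_score_              # mean score across validation folds
test_score = search.score(X_test, y_test)  # score on data the search never saw
print(cv_score, test_score)
```

With the default `refit=True`, the best candidate is refitted on the whole training split, and that refitted model is what `search.score` evaluates.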

## Limitations of Sklearn GridSearchCV
At first glance, the GridSearchCV class looks like a miracle. It automates some very mundane tasks and gives you a good sense of what hyper-parameters will work best for your model.

That said, there are a number of limitations for the grid search.

1. best_params_ doesn’t show the overall best parameters, but rather the best parameters of the ones you passed in to search.
2. The process can end up being incredibly time consuming. When we fit the data, the method ran through 120 fits of our model! Imagine running through a significantly larger dataset, with more parameters.

The reason that this required 120 runs of the model is that each of the hyper-parameters is tested in combination with each other. This is then multiplied by the number of cross-validation folds.

In our case, we tested with:

- 6 values for n_neighbors
- 2 distance measures (p)
- 2 weighting schemes (weights)
- 5 cross-validation folds

This amounts to 6 * 2 * 2 * 5 = 120 fits.
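
The same arithmetic can be checked in a few lines of Python. This sketch counts the fits without training anything; `cv_folds` is just a local name for the value passed as cv.

```python
from math import prod

params = {
    'n_neighbors': [3, 5, 7, 9, 11, 13],
    'weights': ['uniform', 'distance'],
    'p': [1, 2],
}
cv_folds = 5

# Number of candidates is the product of the list lengths in the grid.
n_candidates = prod(len(values) for values in params.values())
n_fits = n_candidates * cv_folds
print(n_candidates, n_fits)  # 24 120
```

Adding just one more hyper-parameter with three values would triple both numbers, which is how grids grow out of hand.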

## Suggestion
Although GridSearchCV has numerous benefits, you may not want to spend too much time and effort perfectly tuning your model. A better use of time may be to investigate your features further. Feature engineering and selecting subsets of features can increase (or decrease) the performance of your model tremendously. This will take much more effort than plugging numbers into a parameter grid but, in return, it will also further develop your understanding of the dataset and may uncover new relationships between features.

## The Bottom Line
GridSearchCV is a useful tool to fine-tune the hyper-parameters of your model. Depending on the estimator being used, there may be even more hyper-parameters that need tuning than the ones in this blog (e.g. K-Neighbors vs Random Forest). Do not expect the search to improve your results greatly. It may be more efficient to go back and explore your selected features or find other relationships between features to improve your model performance.