## Hyperparameter Tuning using Grid and Random Search
#### Model Parameters vs Hyperparameters

| # | Hyperparameters | Model parameters |
|---|---|---|
| 1 | Set manually by the ML engineer/practitioner before the model's training starts. | Learnt by the learning algorithm during the training phase. |
| 2 | Used to optimize the machine learning model. | Used later for prediction. |
| 3 | External to the model. | Internal to the model. |
| 4 | Examples: value of K in KNN, learning rate for training a neural network, number of trees in a random forest. | Examples: coefficients in a linear or logistic regression, support vectors in a support vector machine, weights in an artificial neural network. |


#### How to find the hyperparameters of a model

In [3]:
from sklearn.linear_model import Ridge
model = Ridge()
model.get_params()

{'alpha': 1.0,
 'copy_X': True,
 'fit_intercept': True,
 'max_iter': None,
 'positive': False,
 'random_state': None,
 'solver': 'auto',
 'tol': 0.0001}
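
To make the distinction concrete, here is a minimal sketch on a small synthetic dataset (the data and the variable names are assumptions for illustration, not part of the advertising dataset used later). `alpha` is fixed before training, while `coef_` and `intercept_` only exist after `fit()`.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Tiny synthetic dataset (illustrative only): y = 2*x0 - 1*x1 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = Ridge(alpha=1.0)            # hyperparameter: chosen before training
print(model.get_params()["alpha"])  # the value we set ourselves

model.fit(X, y)                     # model parameters are learned here
print(model.coef_)                  # learned coefficients, close to [2, -1]
print(model.intercept_)             # learned intercept, close to 0
```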

### How to find the best hyperparameters of our model
***Hyperparameter Tuning Techniques***
- 1) ***By Hand***: Select the hyperparameter values based on intuition/experience/guessing, train the model with those hyperparameters, and score it on the validation data. Repeat the process until you run out of patience or are satisfied with the results.
- 2) ***GridSearchCV***: Set up a grid of hyperparameter values and, for each combination, train a model and score it on the validation data. In this approach every single combination of hyperparameter values is tried, which can be very inefficient.
- 3) ***RandomizedSearchCV***: Set up a grid of hyperparameter values and select random combinations to train the model and score it. The number of search iterations is chosen based on the available time/resources.
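
The difference between trying every combination and trying a random subset can be sketched with the standard library alone. This is only an illustration of which candidates each strategy would visit; no model is trained, and the grid values simply mirror the lists used later in this notebook.

```python
import itertools
import random

# Candidate values, mirroring the alpha/solver lists used later in this notebook
grid = {
    "alpha": [1000, 100, 10, 5, 1, 0.8, 0.5, 0.2],
    "solver": ["lsqr", "svd"],
}

# Grid search: every combination (Cartesian product of the value lists)
names = list(grid)
all_combos = [dict(zip(names, values)) for values in itertools.product(*grid.values())]
print(len(all_combos))   # 8 alphas x 2 solvers = 16 combinations

# Random search: a fixed budget of combinations drawn without replacement
rng = random.Random(42)  # seeded only so the sketch is repeatable
sampled = rng.sample(all_combos, k=6)
print(len(sampled))      # 6
```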


#### Hyperparameter Tuning Steps
1) Make a list of candidate values for each hyperparameter, based on the problem at hand. If there is more than one hyperparameter, make a grid of the different combinations.
2) Fit the model separately with each combination.
3) Note down the model performance for each one.
4) Choose the best-performing combination.
5) Always use a cross-validation technique for hyperparameter tuning, so that the chosen values are not overfitted to one particular split of the data.


## 3. Find Optimized Hyperparameter By Hand

### a. Find Optimized Hyperparameter by Trial and Error
- Approach 1: Use train_test_split and manually tune parameters by trial and error

In [19]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
df = pd.read_csv("datasets/advertising4D.csv")
X = df.drop('sales', axis=1)
y = df['sales']
# Do a train-test-split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True)
# SCALE DATA
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
# Specify values of hyperparameters
model = Ridge(alpha=100, solver='lsqr')
# Train
model.fit(X_train, y_train)
# Evaluate
r2 = r2_score(y_test, model.predict(X_test))  # same value as model.score(X_test, y_test)
print("R2 Score: ", r2)


R2 Score:  0.769629851355745


- Each time you execute the above code cell, you get a different R2 score. This is because `train_test_split` shuffles the data and no `random_state` is fixed, so the model is trained on a different 80% and tested on a different 20% of the data each time. None of these R2 scores is a reliable estimate of performance on the dataset as a whole. The solution is to use a cross-validation technique.
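
The effect of the split on the score can be seen in a small sketch. It uses synthetic data from `make_regression` as a stand-in for the advertising CSV (an assumption, so the numbers will not match the output above), skips scaling for brevity, and uses a hypothetical helper `holdout_r2`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the advertising data (an assumption for this sketch)
X, y = make_regression(n_samples=200, n_features=3, noise=20.0, random_state=0)

def holdout_r2(seed):
    """R2 on a 20% hold-out set for one particular random split."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    return Ridge(alpha=100, solver="lsqr").fit(X_tr, y_tr).score(X_te, y_te)

scores = [holdout_r2(seed) for seed in range(5)]
print(np.round(scores, 3))             # one score per split; they typically differ
print(holdout_r2(0) == holdout_r2(0))  # same seed, same split, same score
```

Fixing `random_state` makes a single run repeatable, but it does not make that one split any more representative, which is why cross-validation is still the better remedy.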

### Use of cross_val_score() Method (KFold)

In [26]:
from sklearn.model_selection import cross_val_score
df = pd.read_csv("datasets/advertising4D.csv")
X = df.drop('sales', axis=1)
y = df['sales']
# Do a train-test-split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True)
# SCALE DATA
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
# cross_val_score() is a cross-validation method that trains and tests a model over multiple folds of the data
cv_scores = cross_val_score(
              estimator = Ridge(alpha=100, solver='lsqr'), 
              X = X_train, 
              y = y_train, 
              scoring = 'r2',
              cv = 5  # Instead of an integer value you can pass a KFold object
)
print("R2 scores for all the folds: ", cv_scores)
print("Mean R2 score: ", np.mean(cv_scores))

R2 scores for all the folds:  [0.74168497 0.73276148 0.7480924  0.75002622 0.72592539]
Mean R2 score:  0.7396980922198394
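
As the comment on `cv` notes, a `KFold` object can be passed instead of an integer. The sketch below shows one way to do that, and additionally wraps the scaler and the model in a pipeline so the scaler is re-fit on each training fold. Both the pipeline and the synthetic `make_regression` data are assumptions of this sketch rather than part of the cell above.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the advertising data (an assumption for this sketch)
X, y = make_regression(n_samples=200, n_features=3, noise=20.0, random_state=0)

# Scaler + model in one estimator: the scaler is re-fit on each training fold
pipe = make_pipeline(StandardScaler(), Ridge(alpha=100, solver="lsqr"))

# Explicit KFold object instead of cv=5, with shuffling and a fixed seed
kf = KFold(n_splits=5, shuffle=True, random_state=0)

cv_scores = cross_val_score(pipe, X, y, scoring="r2", cv=kf)
print("R2 per fold:", cv_scores)
print("Mean R2:", cv_scores.mean())
```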


### b. Find Optimized Hyperparameter using For Loops
#### (i) Use Simple Train-Test-Split

In [29]:
df = pd.read_csv("datasets/advertising4D.csv")
X = df.drop('sales', axis=1)
y = df['sales']
# Do a train-test-split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True)
# SCALE DATA
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
alpha_list = [1000, 100, 10, 5, 1, 0.8, 0.5, 0.2]
solver_list = ['lsqr', 'svd']
for a in alpha_list:
    for s in solver_list:
        model = Ridge(alpha=a, solver=s)  # Instantiate the model with this alpha/solver pair
        model.fit(X_train, y_train)  # Fit the model to training data
        r2 = model.score(X_test, y_test)  # Evaluate by calculating R2 Score
        print(a, s, ":", r2)

1000 lsqr : 0.2456569245047806
1000 svd : 0.2456569245047806
100 lsqr : 0.8004208874327983
100 svd : 0.8004208874327985
10 lsqr : 0.9350626771176971
10 svd : 0.9350626771176971
5 lsqr : 0.9362068233757561
5 svd : 0.9362068233757561
1 lsqr : 0.9356287521153018
1 svd : 0.9356287521153018
0.8 lsqr : 0.935559868216586
0.8 svd : 0.935559868216586
0.5 lsqr : 0.9354489002095089
0.5 svd : 0.9354489002095089
0.2 lsqr : 0.9353286478012036
0.2 svd : 0.9353286478012036


***Limitations:***
- Model evaluation is done on just 20% of the data.
- After finalizing the best combination of hyperparameters, we are not left with any unseen data on which we can do the final evaluation of the model with the best hyperparameters.
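
One common way around the second limitation is a three-way split: tune on a validation set and keep a test set aside until the very end. A minimal sketch, assuming synthetic data and a 60/20/20 split (both choices are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the advertising data (an assumption for this sketch)
X, y = make_regression(n_samples=200, n_features=3, noise=20.0, random_state=0)

# First cut: hold back 20% as the final test set (touched only once, at the end)
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Second cut: 25% of the remaining 80% becomes the validation set (20% overall)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))   # 120 40 40
```

Hyperparameters would then be compared on `X_val`, and `X_test` used only for the final score.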

#### (ii) Use Cross Validation

In [31]:
df = pd.read_csv("datasets/advertising4D.csv")
X = df.drop('sales', axis=1)
y = df['sales']
# Do a train-test-split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True)
# SCALE DATA
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
alpha_list = [1000, 100, 10, 5, 1, 0.8, 0.5, 0.2]
solver_list = ['lsqr', 'svd']
for a in alpha_list: 
    for s in solver_list: 
        cv_scores = cross_val_score(Ridge(alpha=a, solver=s ), X_train, y_train, scoring = 'r2', cv = 5)
        print(a, s,":",np.mean(cv_scores))

1000 lsqr : 0.18308426373164532
1000 svd : 0.18307934795177522
100 lsqr : 0.7177160715206126
100 svd : 0.7177160715206126
10 lsqr : 0.8770035136002179
10 svd : 0.8770035136002179
5 lsqr : 0.8799663254092133
5 svd : 0.8799663254092133
1 lsqr : 0.8807775554934073
1 svd : 0.8807775554934073
0.8 lsqr : 0.8807758504716677
0.8 svd : 0.8807758504716677
0.5 lsqr : 0.8807651636870044
0.5 svd : 0.8807651636870044
0.2 lsqr : 0.8807445895239574
0.2 svd : 0.8807445895239576
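
Rather than reading the best combination off the printed list, the loop can collect the scores and pick the maximum itself. A sketch of that idea, again on synthetic data (an assumption, so the scores differ from the output above):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the advertising data (an assumption for this sketch)
X, y = make_regression(n_samples=200, n_features=3, noise=20.0, random_state=0)

alpha_list = [1000, 100, 10, 5, 1, 0.8, 0.5, 0.2]
solver_list = ["lsqr", "svd"]

results = []  # (mean CV R2, alpha, solver) for every combination
for a in alpha_list:
    for s in solver_list:
        cv_scores = cross_val_score(Ridge(alpha=a, solver=s), X, y, scoring="r2", cv=5)
        results.append((np.mean(cv_scores), a, s))

# Pick the combination with the highest mean cross-validated R2
best_score, best_alpha, best_solver = max(results, key=lambda r: r[0])
print("Best:", best_alpha, best_solver, "mean R2 =", best_score)
```

This bookkeeping is essentially what `GridSearchCV` automates.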


## 4. Find Optimized Hyperparameters using GridSearchCV()
- Grid search is a method for hyperparameter optimization that involves specifying a list of values for each hyperparameter that you want to optimize, and then training a model for each combination of these values.
- Basically, we divide the domain of the hyperparameters into a discrete grid.
- Then, we try every combination of values in this grid, calculating a performance metric for each one using cross-validation.
- The point of the grid that maximizes the average cross-validation score is the optimal combination of values for the hyperparameters.
- Additionally, it is recommended to use cross-validation when performing hyperparameter optimization. This can provide a more accurate estimate of the model’s performance and help to avoid overfitting.

In [33]:
from sklearn.model_selection import GridSearchCV
df = pd.read_csv("datasets/advertising4D.csv")
X = df.drop('sales', axis=1)
y = df['sales']
# Do a train-test-split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True)
# SCALE DATA
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
# Dictionary with parameter names (`str`) as keys and lists of parameter settings to try as values
params = {'alpha': [1000, 100, 10, 1, 0.5],
          'solver': ['lsqr', 'svd']}
gs = GridSearchCV(estimator=Ridge(),
                  param_grid=params,
                  scoring='r2',
                  cv=5,
                  n_jobs=-1)
gs.fit(X_train, y_train)
print("Best Score: ", gs.best_score_)
print("Best Params: ", gs.best_params_)


Best Score:  0.8810203980478049
Best Params:  {'alpha': 0.5, 'solver': 'lsqr'}
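
The held-out test set created by `train_test_split` can be used for a final check of the selected model. The sketch below is one possible way to do this, not the notebook's own method: it uses synthetic data, and it places the scaler inside a `Pipeline` (with step names `scaler` and `ridge` chosen here for illustration) so that scaling is re-fit within each cross-validation fold.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the advertising data (an assumption for this sketch)
X, y = make_regression(n_samples=200, n_features=3, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

pipe = Pipeline([("scaler", StandardScaler()), ("ridge", Ridge())])

# Pipeline steps are addressed as <step name>__<parameter name>
params = {"ridge__alpha": [1000, 100, 10, 1, 0.5],
          "ridge__solver": ["lsqr", "svd"]}

gs = GridSearchCV(pipe, param_grid=params, scoring="r2", cv=5)
gs.fit(X_train, y_train)  # refit=True (the default) retrains the best setting on all of X_train

print("Best CV score:", gs.best_score_)
print("Best params:", gs.best_params_)
print("Test R2:", gs.score(X_test, y_test))  # one-off check on data the search never saw
```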


***Limitations of GridSearchCV:***
- Grid search is an exhaustive algorithm that spans all the combinations, so it does find the best point on the grid, but only by training a separate model for every combination of hyperparameter values (and for every cross-validation fold).
- Suppose you have millions of data points in your dataset and many hyperparameters, each with several values to tune. In that case the grid becomes multidimensional, and the search becomes computationally expensive as well as time-consuming.
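
The cost can be estimated before running anything: the number of model fits is the product of the number of values per hyperparameter, multiplied by the number of folds. A small worked example using the grid from the `GridSearchCV` cell above:

```python
import math

# Grid and fold count from the GridSearchCV cell above
params = {"alpha": [1000, 100, 10, 1, 0.5], "solver": ["lsqr", "svd"]}
cv_folds = 5

n_combinations = math.prod(len(values) for values in params.values())
n_fits = n_combinations * cv_folds

print(n_combinations)  # 5 * 2 = 10
print(n_fits)          # 10 * 5 = 50 (plus one final refit on the full training set)

# Adding a third hyperparameter with 10 candidate values multiplies the cost by 10
print(n_fits * 10)     # 500
```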


## 5. Find Optimized Hyperparameters using RandomizedSearchCV()
- Random search is similar to grid search, but instead of using all the points in the grid, it tests only a randomly selected subset of these points.
- The smaller this subset, the faster but less accurate the optimization. The larger this subset, the more accurate the optimization, but the closer it gets to a grid search.
- Random search is a very useful option when you have several hyperparameters with a fine-grained grid of values.


In [34]:
from sklearn.model_selection import RandomizedSearchCV
df = pd.read_csv("datasets/advertising4D.csv")
X = df.drop('sales', axis=1)
y = df['sales']
# Do a train-test-split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True)
# SCALE DATA
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
# Dictionary with parameter names (`str`) as keys and lists of parameter settings to try as values
params = {'alpha': [1000, 100, 10, 1, 0.5],
          'solver': ['lsqr', 'svd']}
rs = RandomizedSearchCV(estimator=Ridge(),
                        param_distributions=params,
                        n_iter=6,  # Number of parameter combinations to try
                        scoring='r2',
                        cv=5,
                        n_jobs=-1)
rs.fit(X_train, y_train)
print("Best Score: ", rs.best_score_)
print("Best Params: ", rs.best_params_)


Best Score:  0.903727913298369
Best Params:  {'solver': 'svd', 'alpha': 1}
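
`param_distributions` also accepts distributions, not only lists, so `alpha` can be sampled from a continuous range instead of a fixed set of values. The sketch below assumes synthetic data and a log-uniform range of 0.01 to 1000 for `alpha` (both are illustrative choices), and fixes `random_state` so the sampled candidates are repeatable.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the advertising data (an assumption for this sketch)
X, y = make_regression(n_samples=200, n_features=3, noise=20.0, random_state=0)

# alpha is drawn from a log-uniform distribution; solver is still picked from a list
param_distributions = {"alpha": loguniform(1e-2, 1e3),
                       "solver": ["lsqr", "svd"]}

rs = RandomizedSearchCV(Ridge(), param_distributions=param_distributions,
                        n_iter=6, scoring="r2", cv=5, random_state=0)
rs.fit(X, y)

print("Best score:", rs.best_score_)
print("Best params:", rs.best_params_)
```

With a continuous distribution there is no finite grid to exhaust, so `n_iter` alone sets the search budget.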
