# Hyperparameter Tuning

Grid search and random search are automated ways of tuning hyperparameters. Both require a search space to sample from (the hyperparameter-value combinations to consider), a cross-validation scheme, and a scoring function.

## Grid Search
Grid Search exhaustively tries all combinations within the sample space. No sampling methodology is necessary.

Grid Search is computationally expensive, but it is guaranteed to find the best-scoring combination within the sample space.
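To make "all combinations" concrete, the sketch below enumerates a small grid with `itertools.product`. The candidate values are the ones used in the manual grid search cell of this notebook; the variable name `grid` is only illustrative.

```python
from itertools import product

# Candidate values taken from the manual grid search cell in this notebook.
learn_rates = [0.001, 0.01, 0.05]
max_depths = [4, 6, 8, 10]

# Each (learning rate, depth) pair is one model that grid search must train.
grid = list(product(learn_rates, max_depths))

print(len(grid))  # 3 * 4 = 12 combinations
print(grid[:3])   # the first few pairs, in nested-loop order
```

The number of models grows multiplicatively with every hyperparameter added, which is where the computational cost comes from.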

## Setup

In [49]:
import numpy as np
import pandas as pd
from itertools import product
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

In [27]:
df = pd.read_csv('diabetes.csv')
df.head()

Unnamed: 0,age,bmi,systolic_bp,diastolic_bp,cholesterol_level,glucose_level,activity_level,family_history,smoking_status,diet_score,diabetes_risk
0,56,33.4,159,79,205,151,358,1,0,10,1
1,69,27.5,135,104,245,146,219,0,0,9,0
2,46,43.0,132,66,224,145,105,0,0,10,0
3,32,41.0,110,73,292,107,374,0,1,6,0
4,60,16.4,112,68,181,140,69,1,1,7,0


## Train, test, split

In [28]:
X = df.drop('diabetes_risk', axis=1)
y = df['diabetes_risk']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

## Manual Grid Search

In [29]:
def gbm_grid_search(learn_rate, max_depth):
    model = GradientBoostingClassifier(
        learning_rate=learn_rate,
        max_depth=max_depth,
    )
    predictions = model.fit(X_train, y_train).predict(X_test)
    return [learn_rate, max_depth, accuracy_score(y_test, predictions)]

In [30]:
learn_rates = [0.001, 0.01, 0.05]
max_depths = [4,6,8,10]
results_list = []

for learn_rate in learn_rates:
    for max_depth in max_depths:
        results_list.append(gbm_grid_search(learn_rate, max_depth))

results_df = pd.DataFrame(results_list, columns=['learn_rate', 'max_depth', 'accuracy'])

results_df

Unnamed: 0,learn_rate,max_depth,accuracy
0,0.001,4,0.755
1,0.001,6,0.79
2,0.001,8,0.86
3,0.001,10,0.85
4,0.01,4,0.92
5,0.01,6,0.86
6,0.01,8,0.825
7,0.01,10,0.85
8,0.05,4,0.965
9,0.05,6,0.92


## Grid Search with Scikit Learn

Four steps:
1. Select an algorithm (sometimes referred to as an estimator) whose hyperparameters will be tuned
2. Define hyperparameters to tune
3. Define range of values for hyperparameters
4. Set a cross-validation scheme and scoring function

## Create a GridSearchCV object

In [40]:
# create the grid
param_grid = {'max_depth': [2, 4, 6, 8], 'min_samples_leaf': [1, 2, 4, 6]}

# create a base classifier
rf_class = RandomForestClassifier(criterion='entropy', max_features=None)

grid_rf_class = GridSearchCV(
    estimator=rf_class,
    param_grid=param_grid,
    scoring='accuracy',
    n_jobs=4,
    cv=10,
    refit=True, # enables direct use of the object as an estimator
    return_train_score=True
)

# fit the estimator to the data
grid_rf_class.fit(X_train, y_train)

# make predictions
predictions = grid_rf_class.predict(X_test)

# calculate accuracy
print(accuracy_score(y_test, predictions))

0.925


## Analyze GridSearchCV output

GridSearchCV properties

Results log:
- cv_results_

Best results:
- best_index_, best_params_, best_score_

Extra information:
- scorer_, n_splits_, refit_time_
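As a rough illustration of the "best results" and "extra information" attributes, the sketch below fits a small search and reads them back. It uses synthetic data from `make_classification` rather than the diabetes dataset, and the names `X_demo`, `y_demo` and `demo_search` are assumptions made for this example only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption, not the diabetes dataset).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

demo_search = GridSearchCV(
    estimator=RandomForestClassifier(n_estimators=20, random_state=0),
    param_grid={'max_depth': [2, 4], 'min_samples_leaf': [1, 4]},
    scoring='accuracy',
    cv=3,
    refit=True,
)
demo_search.fit(X_demo, y_demo)

# Best results
print(demo_search.best_index_)   # row of cv_results_ holding the winner
print(demo_search.best_params_)  # hyperparameters of the winner
print(demo_search.best_score_)   # its mean cross-validated score

# Extra information
print(demo_search.n_splits_)     # number of cross-validation folds
print(demo_search.refit_time_)   # seconds spent refitting on the full data
```

`best_index_` ties the three "best" attributes together: it is the position in `cv_results_` from which `best_params_` and `best_score_` are read.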

### cv_results_

In [47]:
results = grid_rf_class.cv_results_ # A dictionary of results for each model created
results_df = pd.DataFrame(results)

results_df.head()

Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_max_depth,param_min_samples_leaf,params,split0_test_score,split1_test_score,split2_test_score,...,split2_train_score,split3_train_score,split4_train_score,split5_train_score,split6_train_score,split7_train_score,split8_train_score,split9_train_score,mean_train_score,std_train_score
0,0.094573,0.004188,0.003008,0.000126,2,1,"{'max_depth': 2, 'min_samples_leaf': 1}",0.6625,0.825,0.8125,...,0.809722,0.825,0.830556,0.834722,0.847222,0.8375,0.830556,0.825,0.826528,0.01522
1,0.091536,0.000654,0.003055,8.2e-05,2,2,"{'max_depth': 2, 'min_samples_leaf': 2}",0.675,0.8,0.8125,...,0.823611,0.823611,0.818056,0.820833,0.848611,0.836111,0.858333,0.818056,0.827361,0.014998
2,0.090912,0.00077,0.00296,0.000126,2,4,"{'max_depth': 2, 'min_samples_leaf': 4}",0.6625,0.7875,0.7875,...,0.797222,0.831944,0.818056,0.848611,0.85,0.838889,0.833333,0.848611,0.826528,0.022571
3,0.090571,0.000717,0.002986,8.6e-05,2,6,"{'max_depth': 2, 'min_samples_leaf': 6}",0.6625,0.8,0.75,...,0.813889,0.830556,0.815278,0.823611,0.843056,0.838889,0.838889,0.841667,0.824028,0.017569
4,0.14091,0.000583,0.003125,6.9e-05,4,1,"{'max_depth': 4, 'min_samples_leaf': 1}",0.85,0.825,0.875,...,0.931944,0.934722,0.929167,0.938889,0.940278,0.940278,0.919444,0.919444,0.932917,0.007634
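The full `cv_results_` table is wide, so it can help to rank it and keep only a few columns. The sketch below is one possible way to do that, again on synthetic data; `demo_search`, `ranked` and the `train_test_gap` column are names invented for this example.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (an assumption, not the diabetes dataset).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

demo_search = GridSearchCV(
    estimator=RandomForestClassifier(n_estimators=20, random_state=0),
    param_grid={'max_depth': [2, 4], 'min_samples_leaf': [1, 4]},
    scoring='accuracy',
    cv=3,
    return_train_score=True,  # needed for the mean_train_score column
)
demo_search.fit(X_demo, y_demo)

cols = ['params', 'mean_train_score', 'mean_test_score', 'rank_test_score']
ranked = (
    pd.DataFrame(demo_search.cv_results_)
    .sort_values('rank_test_score')[cols]
    # A large gap between train and test scores hints at overfitting.
    .assign(train_test_gap=lambda d: d['mean_train_score'] - d['mean_test_score'])
)
print(ranked)
```

Sorting by `rank_test_score` puts the best combination first, and the gap column gives a quick view of how much each model overfits.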


## Random Search
Random Search randomly selects a subset of combinations within the provided sample space. Uniform is the default sampling methodology, though others can be selected.

Random Search is less computationally expensive than Grid Search. It is not guaranteed to find the best combination in the sample space, but it is likely to find a *good* one *faster*.
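A short calculation gives some intuition for why a random sample tends to find a good combination quickly. The sketch assumes independent uniform draws and defines "good" as landing in the top 5% of combinations; both the threshold and the helper name `prob_hit_top` are assumptions made for illustration.

```python
def prob_hit_top(n_draws, top_fraction=0.05):
    """Chance that at least one of n_draws uniform draws lands in the top fraction."""
    # Each draw misses the top fraction with probability (1 - top_fraction).
    return 1 - (1 - top_fraction) ** n_draws

for n in (10, 30, 60, 100):
    print(n, round(prob_hit_top(n), 3))
```

Under these assumptions, about 60 draws already give a roughly 95% chance of sampling at least one top-5% combination, regardless of how large the grid is.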

### Manual Random Search

In [52]:
# set hyperparameter limits
learn_rates_list = np.linspace(0.001, 2, 150)
min_samples_leaf_list = list(range(1,51))

# create a list of combinations (150 learning rates x 50 leaf sizes = 7,500)
combinations_list = [list(x) for x in product(learn_rates_list, min_samples_leaf_list)]

# randomly select 100 models
random_combinations_index = np.random.choice(
    range(0,len(combinations_list)), 100,
    replace=False)

selected_combinations = [combinations_list[x] for x in random_combinations_index]
print(selected_combinations)

### Random Search with Scikit Learn
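A minimal sketch of the scikit-learn counterpart, `RandomizedSearchCV`, is shown below. It reuses the hyperparameter ranges from the manual random search cell, but runs on synthetic data with a small `n_iter` so that it finishes quickly. The names `X_demo`, `y_demo` and `random_search`, and the settings `n_estimators=20`, `n_iter=10` and `cv=3`, are assumptions chosen for this example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data (an assumption, not the diabetes dataset).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

# Same ranges as the manual random search cell, given as plain lists.
param_distributions = {
    'learning_rate': np.linspace(0.001, 2, 150).tolist(),
    'min_samples_leaf': list(range(1, 51)),
}

random_search = RandomizedSearchCV(
    estimator=GradientBoostingClassifier(n_estimators=20, random_state=0),
    param_distributions=param_distributions,
    n_iter=10,            # number of combinations to sample
    scoring='accuracy',
    cv=3,
    refit=True,
    return_train_score=True,
    random_state=0,       # makes the sampled combinations reproducible
)
random_search.fit(X_demo, y_demo)

print(random_search.best_params_)
print(random_search.best_score_)
```

The interface mirrors `GridSearchCV`: the main differences are `param_distributions` in place of `param_grid` and the `n_iter` budget, and the fitted object exposes the same `cv_results_`, `best_params_` and `best_score_` attributes.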

## Considerations
Random search may be more valuable than Grid Search if:
- There is a large quantity of data
- There are a large number of hyperparameter-value combinations
- There are limited computational resources
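The trade-off in these considerations can be estimated by counting model fits. The sketch below assumes one fit per candidate per fold plus one final refit; the helper `n_fits` and the random search budget of 8 candidates are hypothetical.

```python
def n_fits(n_candidates, cv_folds, refit=True):
    """Total model fits: one per candidate per fold, plus an optional final refit."""
    return n_candidates * cv_folds + (1 if refit else 0)

# Grid from the GridSearchCV cell: 4 max_depth values x 4 min_samples_leaf values.
grid_candidates = 4 * 4
print(n_fits(grid_candidates, cv_folds=10))  # 161 fits

# Hypothetical random search budget of 8 sampled candidates.
print(n_fits(8, cv_folds=10))                # 81 fits
```

With random search the budget is set directly, so the cost stays fixed even when the sample space grows.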