# Hyperparameter Tuning in Deep Learning Using `hyperopt`

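- Hyperparameter tuning is the process of systematically searching for the combination of hyperparameter values that gives the best performance for a machine learning model
- Each candidate combination is used to train the model and the outcome is scored; in the end, the combination with the best score is selected. Unlike an exhaustive grid search, `hyperopt` does not test every permutation: it evaluates a fixed number of candidates (`max_evals`) sampled from the search space.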

> NOTE: This notebook was run in Google Colab

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, roc_auc_score, confusion_matrix, classification_report
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier # scikit-learn's neural network (multi-layer perceptron)


- We'll be using the `hyperopt` library to perform the hyperparameter tuning
- We define a search space for the hyperparameters
- Then, it uses the TPE (Tree-structured Parzen Estimator) algorithm to optimize the parameters
- Imported names:
  - `hp` has the functions for defining the hyperparameter search space
  - `fmin` takes the objective function, search space, and optimization algorithm. Then, it minimizes the objective function to find the optimal parameters
  - `tpe` implements the TPE algorithm; `tpe.suggest` proposes the next candidates to evaluate based on the results of earlier trials

In [2]:
from hyperopt import hp, fmin, tpe

## 1. Data Import and Prep

In [3]:
load_breast_cancer_data = load_breast_cancer()
load_breast_cancer_data.keys()

dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename', 'data_module'])

In [4]:
print(load_breast_cancer_data['DESCR'])

.. _breast_cancer_dataset:

Breast cancer wisconsin (diagnostic) dataset
--------------------------------------------

**Data Set Characteristics:**

:Number of Instances: 569

:Number of Attributes: 30 numeric, predictive attributes and the class

:Attribute Information:
    - radius (mean of distances from center to points on the perimeter)
    - texture (standard deviation of gray-scale values)
    - perimeter
    - area
    - smoothness (local variation in radius lengths)
    - compactness (perimeter^2 / area - 1.0)
    - concavity (severity of concave portions of the contour)
    - concave points (number of concave portions of the contour)
    - symmetry
    - fractal dimension ("coastline approximation" - 1)

    The mean, standard error, and "worst" or largest (mean of the three
    worst/largest values) of these features were computed for each image,
    resulting in 30 features.  For instance, field 0 is Mean Radius, field
    10 is Radius SE, field 20 is Worst Radius.


In [5]:
load_breast_cancer_data['target_names']

array(['malignant', 'benign'], dtype='<U9')

In [6]:
#define X and y
X, y = load_breast_cancer_data['data'], load_breast_cancer_data['target']

In [7]:
#optional - build it as a dataframe
df = pd.DataFrame(X, columns=load_breast_cancer_data['feature_names'])
df['cancer'] = y
df.head()

| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | cancer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | 0.2419 | 0.07871 | ... | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 | 0 |
| 1 | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 | 0 |
| 2 | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | 0.2069 | 0.05999 | ... | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 | 0 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | 0.2597 | 0.09744 | ... | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 | 0 |
| 4 | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | 0.1809 | 0.05883 | ... | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 | 0 |


In [8]:
df['cancer'].value_counts(normalize=True)

| cancer | proportion |
|---|---|
| 1 | 0.627417 |
| 0 | 0.372583 |


In [9]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

## 2. Initialize The Hyperparameter Space

Possible Hyperparameters:
```python
class sklearn.neural_network.MLPClassifier(hidden_layer_sizes=(100,), activation='relu', *, solver='adam', alpha=0.0001, batch_size='auto', learning_rate='constant', learning_rate_init=0.001, power_t=0.5, max_iter=200, shuffle=True, random_state=None, tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False, validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10, max_fun=15000)
```



In [10]:
param_grid = {
    # adjust the number of hidden layers and the number of neurons in each
    'hidden_layer_sizes': hp.choice('hidden_layer_sizes', [(10,), (10, 20), (20, 30, 40)]), # e.g. the last one is 3 hidden layers with 20, 30, and 40 neurons respectively
    # choose the activation function - MLPClassifier applies the same one to every hidden layer
    # e.g. for 3 layers: relu, relu, relu or tanh, tanh, tanh
    'activation': hp.choice('activation', ['relu', 'tanh']),
    # choose the optimizer algorithm (solver)
    'solver': hp.choice('solver', ['adam', 'sgd']),
    # strength of L2 regularization
    'alpha': hp.choice('alpha', [0.0001, 0.05]),
    # adjust the learning rate schedule (only used when solver='sgd')
    'learning_rate': hp.choice('learning_rate', ['constant', 'adaptive']),
    # adjust the initial learning rate
    'learning_rate_init': hp.choice('learning_rate_init', [0.001, 0.01]),
    #'max_iter': hp.choice('max_iter', [200, 500]),
}

## 3. Define The Objective Function

In [11]:
# define the objective function to be minimized
def objective_func(input_params):
  '''
  Train an MLPClassifier with the given set of hyperparameters and return the negative
  mean cross-validated accuracy, which is the loss that fmin minimizes
  '''

  # create a dictionary to pull in the hyperparameters
  params_dict = {
      'hidden_layer_sizes': input_params['hidden_layer_sizes'],
      'activation': input_params['activation'],
      'solver': input_params['solver'],
      'alpha': input_params['alpha'],
      'learning_rate': input_params['learning_rate'],
      'learning_rate_init': input_params['learning_rate_init'],
      'random_state': 1000, # set a random seed for reproducibility
      'max_iter': 100 # set the num of iterations
  }

  # instantiate the MLP Classifier model with the given hyperparameters
  mlp_model = MLPClassifier(**params_dict)

  # perform 4-fold cross-validation
  model_score = cross_val_score(mlp_model, X_train, y_train, cv=4, scoring='accuracy').mean()

  # fmin minimizes its objective, so to maximize accuracy we return the negative of the model score
  return -model_score
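
The notebook imports `StandardScaler` but the objective above trains on unscaled features, and multi-layer perceptrons are generally sensitive to feature scale. A possible variant, sketched below, wraps the scaler and the classifier in a pipeline so that scaling is fit inside each cross-validation fold. The function name `scaled_objective` and the example parameters are hypothetical and this is not the objective used for the results in this notebook.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Recreate the same split as the notebook so the block is self-contained.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

def scaled_objective(input_params):
    """Hypothetical variant: standardize features, then fit the MLP."""
    model = make_pipeline(
        StandardScaler(),  # fit on the training folds only, inside each CV split
        MLPClassifier(random_state=1000, max_iter=100, **input_params),
    )
    score = cross_val_score(
        model, X_train, y_train, cv=4, scoring="accuracy"
    ).mean()
    return -score  # negative accuracy, so that minimizing it maximizes accuracy

# Example call with one hand-picked (assumed) configuration.
example_loss = scaled_objective(
    {"hidden_layer_sizes": (10,), "activation": "relu"}
)
print(example_loss)
```

Because the function keeps the same signature and return convention, it could be passed to `fmin` in place of `objective_func` without other changes.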

## 4. Perform Hyperparameter Tuning Using TPE

In [12]:
grid_search = fmin(
    fn=objective_func, # the objective function defined above
    space=param_grid, # the custom hyperparameter search space
    algo=tpe.suggest, # optimization algorithm for hyperparameter tuning
    max_evals=50 # number of trials; add rstate=np.random.default_rng(123) for reproducible sampling
)

  2%|▏         | 1/50 [00:01<01:21,  1.66s/trial, best loss: -0.6291218479985893]

  8%|▊         | 4/50 [00:03<00:34,  1.33trial/s, best loss: -0.9153808852054311]

 14%|█▍        | 7/50 [00:06<00:39,  1.08trial/s, best loss: -0.9319123611356022]

100%|██████████| 50/50 [00:30<00:00,  1.63trial/s, best loss: -0.9319123611356022]





## 5. Get The Best Hyperparameters

In [14]:
grid_search

{'activation': 0,
 'alpha': 0,
 'hidden_layer_sizes': 0,
 'learning_rate': 0,
 'learning_rate_init': 1,
 'solver': 0}

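For `hp.choice` parameters, `fmin` returns the index of the selected option rather than the value itself. Mapping the indices back to the search space:

- 'activation': 0 -> **'relu'**
- 'alpha': 0 -> **0.0001**
- 'hidden_layer_sizes': 0 -> **1 hidden layer with 10 neurons**
- 'learning_rate': 0 -> **'constant'**
- 'learning_rate_init': 1 -> **0.01**
- 'solver': 0 -> **'adam'**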

## 6. Test The Best Hyperparameters

In [15]:
best_mlp_model = MLPClassifier(
    hidden_layer_sizes=10,
    activation='relu',
    solver='adam',
    alpha=0.0001,
    learning_rate='constant',
    learning_rate_init=0.01)

best_score = cross_val_score(best_mlp_model, X_train, y_train, cv=4, scoring='accuracy').mean()
print('Best Score with Best Hyperparameters',best_score)

Best Score with Best Hyperparameters 0.9530285663904072
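
The score above is still a cross-validated score on the training split, and it differs from the best loss reported by `fmin` most likely because this model uses the default `max_iter=200` and no fixed `random_state`, whereas the objective function used `max_iter=100` and `random_state=1000`. The held-out `X_test` and `y_test` have not been used yet, so a final check could look like the sketch below. The name `final_model` and the choice of `random_state=1000` are assumptions for illustration, and no result is quoted because this cell was not part of the original run.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Recreate the same split as the notebook so the block is self-contained.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Best hyperparameters found by the search, with a fixed seed (assumed).
final_model = MLPClassifier(
    hidden_layer_sizes=(10,),
    activation="relu",
    solver="adam",
    alpha=0.0001,
    learning_rate="constant",
    learning_rate_init=0.01,
    random_state=1000,
)

# Fit on the full training split, then score on data the search never saw.
final_model.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, final_model.predict(X_test))
print("Held-out test accuracy:", test_accuracy)
```

Reporting the test accuracy alongside the cross-validated score gives a less optimistic estimate, since the hyperparameters were selected using the training split.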


