# Design Pattern 15 - Hyperparameter tuning (Chapter 4)

Hyperparameter tuning is the process of selecting the best values for those elements of the model architecture and training loop that are not learned during training. These are the *hyperparameters*, in contrast with the *parameters* or *weights*. In a sense it is an outer loop wrapped around training: the inner loop fits the parameters, and the outer loop selects the hyperparameters.

A key difference is that the objective of the inner loop is usually differentiable with respect to the parameters, which allows techniques such as gradient descent to move relatively smoothly towards an optimum of the cost function (minimising a loss, maximising an accuracy metric, etc.). The outer loop of hyperparameter tuning is not usually differentiable, so it needs a different approach that is often more costly.
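
As a minimal sketch of the two nested loops, assume a synthetic linear-regression problem. The data and the helper names `train` and `validation_mse` are made up for illustration and are not part of the exercise. The inner loop uses gradients to fit the weights; the outer loop has no gradient to follow, so it simply tries each candidate learning rate and compares validation scores.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (illustrative only).
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=200)
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]


def train(learning_rate, n_steps=100):
    """Inner loop: gradient descent on the weights (the parameters)."""
    w = np.zeros(X_train.shape[1])
    for _ in range(n_steps):
        grad = 2.0 * X_train.T @ (X_train @ w - y_train) / len(y_train)
        w -= learning_rate * grad
    return w


def validation_mse(w):
    """Metric used by the outer loop to compare trials."""
    return float(np.mean((X_val @ w - y_val) ** 2))


# Outer loop: one full training run per candidate hyperparameter value.
candidate_learning_rates = [1e-4, 1e-3, 1e-2, 1e-1]
results = {lr: validation_mse(train(lr)) for lr in candidate_learning_rates}
best_lr = min(results, key=results.get)
print(results)
print("best learning rate:", best_lr)
```

Each entry in `results` costs a complete training run, which is why the outer loop tends to be the expensive one.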


## Hyperparameter tuning approaches
* manual - manually select some hyperparameter combinations, then run and evaluate each one independently.
* grid search - select some values for each hyperparameter, and run trials of all combinations.
* random - select a distribution to sample from for each hyperparameter, then choose a number of trials to run, each of which will be a random sample.
* Bayesian - train a surrogate model with the hyperparameters as input and the metric value as the target. Use it to predict the metric value, so that only a smaller number of promising trials need a full training loop.
* genetic algorithms - treat candidate hyperparameter combinations as individuals in a population, and use genetic algorithms to mix, persist or remove individuals according to *fitness* (performance against a specified metric).
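
The Bayesian approach is the least obvious of these, so here is a rough sketch of the idea, under stated assumptions: synthetic data from `make_classification`, a single hyperparameter (`max_depth`), a Gaussian process as the surrogate, and an upper-confidence-bound rule for choosing the next trial. These are illustrative choices only; dedicated libraries use more refined surrogates and acquisition rules.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (illustrative only).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)


def objective(max_depth):
    """Expensive step: a full cross-validated training run for one trial."""
    clf = DecisionTreeClassifier(max_depth=int(max_depth), random_state=0)
    return cross_val_score(clf, X, y, cv=3).mean()


candidates = np.arange(1, 21)   # hypothetical search space for max_depth
tried = [1, 10, 20]             # a few initial trials
scores = [objective(d) for d in tried]

for _ in range(5):
    # Surrogate model: hyperparameter value -> observed metric.
    surrogate = GaussianProcessRegressor(
        kernel=Matern(nu=2.5), alpha=1e-3, normalize_y=True
    )
    surrogate.fit(np.array(tried, dtype=float).reshape(-1, 1), scores)

    # Cheap step: predict the metric (with uncertainty) for every candidate.
    mean, std = surrogate.predict(
        candidates.reshape(-1, 1).astype(float), return_std=True
    )
    ucb = mean + 1.96 * std                      # favour high mean or high uncertainty
    ucb[np.isin(candidates, tried)] = -np.inf    # do not repeat a trial

    # Only the most promising candidate gets a full training run.
    next_depth = int(candidates[np.argmax(ucb)])
    tried.append(next_depth)
    scores.append(objective(next_depth))

best_depth = tried[int(np.argmax(scores))]
print("trials:", tried)
print("best max_depth:", best_depth, "score:", max(scores))
```

Only 8 of the 20 candidate depths are ever trained here; the surrogate decides which ones are worth the cost.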

## Types of hyperparameters

* model architecture - hyperparameters related to the architecture of the ML model to be trained
  * examples: number of layers, neurons per layer, decision tree depth, number of estimators in a random forest
* model training - hyperparameters related to how the training loop is run
  * examples: neural network learning rate, maximum number of iterations
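
To make the distinction concrete, the sketch below shows where the examples above appear as constructor arguments in scikit-learn. The specific values, and the grouping into two dictionaries, are arbitrary choices for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Architecture hyperparameters: the shape of the model.
architecture_hparams = {"hidden_layer_sizes": (32, 16)}  # two hidden layers, 32 and 16 neurons

# Training hyperparameters: how the training loop is run.
training_hparams = {"learning_rate_init": 0.01, "max_iter": 200}

mlp = MLPClassifier(**architecture_hparams, **training_hparams)

# Architecture hyperparameters for tree-based models.
tree = DecisionTreeClassifier(max_depth=5)
forest = RandomForestClassifier(n_estimators=50)

for model in (mlp, tree, forest):
    print(type(model).__name__, "has", len(model.get_params()), "hyperparameters")
```

None of these values is changed by calling `fit`; they are fixed before training starts, which is what makes them hyperparameters rather than parameters.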


### Trade-offs and alternatives
* managed hyperparameter tuning - use a managed tuning service or framework to ensure systematic tuning, with logging of trials etc.
* genetic algorithms - consider the trials to be a population, and breed the population to search for the "fittest" individuals.
* ensemble - the outcome might not be a single best combination, but an ensemble of combinations that are each better for different metrics (e.g. different places on your Pareto frontier).
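
The Pareto frontier mentioned in the last point can be sketched as follows. The helper `pareto_front` and the (precision, recall) scores are made up for illustration; the idea is to keep every trial that no other trial beats on all metrics at once.

```python
import numpy as np


def pareto_front(scores):
    """Return indices of trials that are not dominated by any other trial.

    scores: array-like of shape (n_trials, n_metrics), higher is better.
    A trial is dominated if another trial is at least as good on every
    metric and strictly better on at least one.
    """
    scores = np.asarray(scores, dtype=float)
    keep = []
    for i, s in enumerate(scores):
        at_least_as_good = np.all(scores >= s, axis=1)
        strictly_better = np.any(scores > s, axis=1)
        if not np.any(at_least_as_good & strictly_better):
            keep.append(i)
    return keep


# Hypothetical (precision, recall) pairs for five tuning trials.
trial_scores = [(0.90, 0.10), (0.60, 0.40), (0.40, 0.70), (0.50, 0.30), (0.40, 0.60)]
front = pareto_front(trial_scores)
print(front)  # → [0, 1, 2]
```

Trials 3 and 4 are dropped because trials 1 and 2 respectively match or beat them on both metrics; the three remaining trials each represent a different trade-off.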


![dp15_ch4_hyperparameter_tuning_loops](dp15_ch4_hyperparameter_tuning_loops.jpg)

## Exercise - Hyperparameter tuning


In [1]:
import pathlib
import os
import functools
import math
import datetime

In [2]:
import pandas
import numpy

In [3]:
import matplotlib
import matplotlib.pyplot
%matplotlib inline

In [4]:
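import sklearn
import sklearn.tree
import sklearn.preprocessing
import sklearn.ensemble
import sklearn.model_selection  # GridSearchCV, RandomizedSearchCV, KFold
import sklearn.metrics  # precision_recall_fscore_support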

In [9]:
try:
    falklands_data_dir = pathlib.Path(os.environ['OPMET_ROTORS_DATA_ROOT'])
except KeyError:
    falklands_data_dir = pathlib.Path('/project/informatics_lab/data_science_cop/ML_challenges/2021_opmet_challenge') / 'rotors'
print(falklands_data_dir.is_dir())
falklands_data_dir

True


PosixPath('/Users/stephen.haddad/data/ml_challenges/Rotors')

In [10]:
falklands_data_fname = 'new_training.csv'
falklands_data_path = falklands_data_dir / falklands_data_fname
falklands_df = pandas.read_csv(falklands_data_path)

In [11]:
temp_feature_names = [f'air_temp_{i1}' for i1 in range(1,23)]
humidity_feature_names = [f'sh_{i1}' for i1 in range(1,23)]
wind_direction_feature_names = [f'winddir_{i1}' for i1 in range(1,23)]
wind_speed_feature_names = [f'windspd_{i1}' for i1 in range(1,23)]
target_feature_name = 'rotors_present'


In [12]:
falklands_df = falklands_df.rename({'Rotors 1 is true': target_feature_name},axis=1)
# treat a missing rotor label as "no rotors present"
falklands_df[target_feature_name] = falklands_df[target_feature_name].fillna(0)
# parse the date-time group, then drop duplicate and missing timestamps
falklands_df['DTG'] = pandas.to_datetime(falklands_df['DTG'])
falklands_df = falklands_df.drop_duplicates(subset=['DTG'])
falklands_df = falklands_df[~falklands_df['DTG'].isnull()]
# keep only rows where all observations are non-negative
falklands_df = falklands_df[(falklands_df['wind_speed_obs'] >= 0.0) &
                            (falklands_df['air_temp_obs'] >= 0.0) &
                            (falklands_df['wind_direction_obs'] >= 0.0) &
                            (falklands_df['dewpoint_obs'] >= 0.0)
                           ]
falklands_df[target_feature_name] = falklands_df[target_feature_name].astype(bool)
falklands_df['time'] = falklands_df['DTG']

In [13]:
def get_v_wind(wind_dir_name, wind_speed_name, row1):
    return math.cos(math.radians(row1[wind_dir_name])) * row1[wind_speed_name]

def get_u_wind(wind_dir_name, wind_speed_name, row1):
    return math.sin(math.radians(row1[wind_dir_name])) * row1[wind_speed_name]

In [14]:
%%time
u_feature_template = 'u_wind_{level_ix}'
v_feature_template = 'v_wind_{level_ix}'
u_wind_feature_names = []
v_wind_features_names = []
for wsn1, wdn1 in zip(wind_speed_feature_names, wind_direction_feature_names):
    level_ix = int( wsn1.split('_')[1])
    u_feature = u_feature_template.format(level_ix=level_ix)
    u_wind_feature_names += [u_feature]
    falklands_df[u_feature] = falklands_df.apply(functools.partial(get_u_wind, wdn1, wsn1), axis='columns')
    v_feature = v_feature_template.format(level_ix=level_ix)
    v_wind_features_names += [v_feature]
    falklands_df[v_feature] = falklands_df.apply(functools.partial(get_v_wind, wdn1, wsn1), axis='columns')

CPU times: user 14.6 s, sys: 1.31 s, total: 15.9 s
Wall time: 16 s


In [15]:
# split by time: train on data before 2020, test on 2020 onwards
rotors_train_df = falklands_df[falklands_df['time'] < datetime.datetime(2020,1,1,0,0)]
rotors_test_df = falklands_df[falklands_df['time'] >= datetime.datetime(2020,1,1,0,0)]

In [16]:
def preproc_input(data_subset, pp_dict):
    return numpy.concatenate([scaler1.transform(data_subset[[if1]]) for if1,scaler1 in pp_dict.items()],axis=1)

def preproc_target(data_subset, enc1):
    # pass a 1D column, as LabelEncoder expects
    return enc1.transform(data_subset[target_feature_name])


In [17]:
input_feature_names = temp_feature_names + humidity_feature_names + u_wind_feature_names + v_wind_features_names
preproc_dict = {}
for if1 in input_feature_names:
    scaler1 = sklearn.preprocessing.StandardScaler()
    scaler1.fit(rotors_train_df[[if1]])
    preproc_dict[if1] = scaler1
    
target_encoder = sklearn.preprocessing.LabelEncoder()
target_encoder.fit(rotors_train_df[target_feature_name])

LabelEncoder()

In [18]:
X_train_rotors = preproc_input(rotors_train_df, preproc_dict)
y_train_rotors = preproc_target(rotors_train_df, target_encoder)


In [19]:
X_test_rotors = preproc_input(rotors_test_df, preproc_dict)
y_test_rotors = preproc_target(rotors_test_df, target_encoder)


In [20]:
sklearn.tree.DecisionTreeClassifier().get_params()

{'ccp_alpha': 0.0,
 'class_weight': None,
 'criterion': 'gini',
 'max_depth': None,
 'max_features': None,
 'max_leaf_nodes': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'random_state': None,
 'splitter': 'best'}

In [21]:
clf_opts = {'max_depth':[5,10,15,20], 
            'min_samples_leaf': [1,2,5],
            'min_samples_split': [4,10,20],
            # 'ccp_alpha': [0.0, 0.01, 0.1],
           }
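
Before running a grid search it can help to estimate its cost. A minimal sketch, using scikit-learn's `ParameterGrid` to enumerate the same options as `clf_opts` and assuming 5-fold cross-validation:

```python
from sklearn.model_selection import ParameterGrid

clf_opts = {'max_depth': [5, 10, 15, 20],
            'min_samples_leaf': [1, 2, 5],
            'min_samples_split': [4, 10, 20],
           }

grid = ParameterGrid(clf_opts)
n_combinations = len(grid)   # 4 * 3 * 3 combinations
n_folds = 5                  # assumed number of cross-validation folds
n_fits = n_combinations * n_folds

print(n_combinations, "combinations,", n_fits, "cross-validation fits")
print(list(grid)[0])         # one combination, as a dict of keyword arguments
```

With these options the grid contains 36 combinations, so 5-fold cross-validation trains 180 trees, plus one final refit of the best combination when `refit=True` (the `GridSearchCV` default). Adding one more hyperparameter with three values would triple that.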


In [22]:
%%time
clf1 = sklearn.tree.DecisionTreeClassifier()
cv1 = sklearn.model_selection.KFold(n_splits=5, shuffle=True)
hpt_grid = sklearn.model_selection.GridSearchCV(estimator=clf1, 
                                                param_grid=clf_opts,
                                                cv=cv1,
                                               )
res1 = hpt_grid.fit(X_train_rotors, y_train_rotors)

CPU times: user 2min 19s, sys: 946 ms, total: 2min 20s
Wall time: 2min 24s


In [23]:
hpt_grid.best_estimator_

DecisionTreeClassifier(max_depth=5, min_samples_leaf=5, min_samples_split=20)

In [25]:
import scipy.stats

In [26]:
%%time
hpt_random = sklearn.model_selection.RandomizedSearchCV(estimator=sklearn.tree.DecisionTreeClassifier(),
                                                        param_distributions={
                                                            'max_depth': scipy.stats.randint(5,10), 
                                                            'min_samples_leaf': scipy.stats.randint(1,7),
                                                            'min_samples_split': scipy.stats.randint(4,20),
                                                        },
                                                        cv=sklearn.model_selection.KFold(n_splits=5, shuffle=True),
                                                        n_iter=20,
                                                     )
res1 = hpt_random.fit(X_train_rotors, y_train_rotors)  

CPU times: user 58.1 s, sys: 113 ms, total: 58.2 s
Wall time: 58.4 s
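
To see what the random search actually samples, the sketch below draws 20 candidates from the same distributions with scikit-learn's `ParameterSampler`. The `random_state=0` seed is an assumption added here for repeatability. Note that `scipy.stats.randint(low, high)` excludes `high`, so `randint(5,10)` only ever yields depths from 5 to 9.

```python
import scipy.stats
from sklearn.model_selection import ParameterSampler

param_distributions = {
    'max_depth': scipy.stats.randint(5, 10),         # integers 5..9
    'min_samples_leaf': scipy.stats.randint(1, 7),   # integers 1..6
    'min_samples_split': scipy.stats.randint(4, 20), # integers 4..19
}

samples = list(ParameterSampler(param_distributions, n_iter=20, random_state=0))
sampled_depths = sorted({int(s['max_depth']) for s in samples})

print(len(samples), "sampled combinations")
print("max_depth values sampled:", sampled_depths)
print(samples[0])
```

Unlike the grid, the number of trials here is fixed by `n_iter` regardless of how many hyperparameters or how wide a range is searched.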


In [27]:
hpt_random.best_estimator_

DecisionTreeClassifier(max_depth=5, min_samples_leaf=4, min_samples_split=19)

In [28]:
hpt_random.best_estimator_.get_params()

{'ccp_alpha': 0.0,
 'class_weight': None,
 'criterion': 'gini',
 'max_depth': 5,
 'max_features': None,
 'max_leaf_nodes': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 4,
 'min_samples_split': 19,
 'min_weight_fraction_leaf': 0.0,
 'random_state': None,
 'splitter': 'best'}

In [29]:
sklearn.metrics.precision_recall_fscore_support(
    y_test_rotors, 
    hpt_random.best_estimator_.predict(X_test_rotors)
)

(array([0.95570055, 0.        ]),
 array([0.9996408, 0.       ]),
 array([0.97717697, 0.        ]),
 array([2784,  129]))

In [30]:
hpt_random_recall = sklearn.model_selection.RandomizedSearchCV(estimator=sklearn.tree.DecisionTreeClassifier(),
                                                        param_distributions={
                                                            'max_depth': scipy.stats.randint(5,10), 
                                                            'min_samples_leaf': scipy.stats.randint(1,7),
                                                            'min_samples_split': scipy.stats.randint(4,20),
                                                        },
                                                        cv=sklearn.model_selection.KFold(n_splits=5, shuffle=True),
                                                        n_iter=20,
                                                        scoring='recall',
                                                     )
res1 = hpt_random_recall.fit(X_train_rotors, y_train_rotors) 

In [31]:
hpt_random_recall.best_estimator_.get_params()

{'ccp_alpha': 0.0,
 'class_weight': None,
 'criterion': 'gini',
 'max_depth': 8,
 'max_features': None,
 'max_leaf_nodes': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 3,
 'min_samples_split': 15,
 'min_weight_fraction_leaf': 0.0,
 'random_state': None,
 'splitter': 'best'}

In [32]:
sklearn.metrics.precision_recall_fscore_support(
    y_test_rotors, 
    hpt_random_recall.best_estimator_.predict(X_test_rotors)
)

(array([0.95911296, 0.40740741]),
 array([0.99425287, 0.08527132]),
 array([0.97636684, 0.14102564]),
 array([2784,  129]))

In [33]:
hpt_random_ba = sklearn.model_selection.RandomizedSearchCV(estimator=sklearn.tree.DecisionTreeClassifier(),
                                                        param_distributions={
                                                            'max_depth': scipy.stats.randint(5,10), 
                                                            'min_samples_leaf': scipy.stats.randint(1,7),
                                                            'min_samples_split': scipy.stats.randint(4,20),
                                                        },
                                                        cv=sklearn.model_selection.KFold(n_splits=5, shuffle=True),
                                                        n_iter=20,
                                                        scoring='balanced_accuracy',
                                                     )
res1 = hpt_random_ba.fit(X_train_rotors, y_train_rotors) 

In [34]:
hpt_random_ba.best_estimator_.get_params()

{'ccp_alpha': 0.0,
 'class_weight': None,
 'criterion': 'gini',
 'max_depth': 9,
 'max_features': None,
 'max_leaf_nodes': None,
 'min_impurity_decrease': 0.0,
 'min_impurity_split': None,
 'min_samples_leaf': 6,
 'min_samples_split': 19,
 'min_weight_fraction_leaf': 0.0,
 'random_state': None,
 'splitter': 'best'}

### Further reading

Libraries and platforms
* scikit-learn hyperparameter search - https://scikit-learn.org/stable/modules/grid_search.html
* Keras Tuner - https://www.tensorflow.org/tutorials/keras/keras_tuner
* Optuna - https://optuna.org/
* Azure ML HyperDrive - https://learn.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters?view=azureml-api-2
* AWS SageMaker hyperparameter tuning - https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html

