<center><a target="_blank" href="http://www.propulsion.academy"><img src="https://drive.google.com/uc?id=1MleNI0rcICpvrGd7SdYuQz7dn8NlAlEc" width="200" style="background:none; border:none; box-shadow:none;" /></a> </center>

_____

<center> <h1> Hyperparameter Tuning Methodologies (Live coding) </h1> </center>

<p style="margin-bottom:1cm;"></p>

_____

<center>SIT Academy, 2022</center>



# 1. Introduction <a id="1"></a> <br>

**Hyperparameter tuning** is the process of choosing a set of optimal hyperparameters for a learning algorithm.

**What is a hyperparameter?**

**A hyperparameter is a parameter whose value is set before the learning process begins.**

Some examples of hyperparameters include `penalty` in logistic regression and `loss` in stochastic gradient descent.

Hyperparameters are not learnt directly within estimators. In scikit-learn, they are passed as arguments to the constructor of the estimator classes. Typical examples include `C`, `kernel` and `gamma` for the Support Vector Classifier, `alpha` for Lasso, etc.

It is possible and recommended to search the hyperparameter space for the combination that gives the best cross-validation score, i.e. the best estimate of the estimator's performance on unseen data.

Any parameter provided when constructing an estimator may be optimized in this manner. To find the names and current values of all parameters of a given estimator, we can use the following method:

`estimator.get_params()`
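
As a quick illustration, the sketch below uses a `RandomForestClassifier` (the model tuned later in this notebook) with arbitrary example values to show that constructor arguments are reported by `get_params()`. It also uses the companion method `set_params()` to change a hyperparameter after construction.

```python
from sklearn.ensemble import RandomForestClassifier

# Constructor arguments are the hyperparameters reported by get_params()
clf = RandomForestClassifier(n_estimators=50, max_depth=10)
params = clf.get_params()

print(params["n_estimators"], params["max_depth"])  # 50 10

# set_params() changes a hyperparameter after construction
clf.set_params(n_estimators=200)
print(clf.get_params()["n_estimators"])  # 200
```

Every key in `params` is a valid name to use in a search space such as `param_grid`.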

A search consists of:

* an estimator (regressor or classifier such as sklearn.svm.SVC());
* a parameter space;
* a method for searching or sampling candidates;
* a cross-validation scheme;
* a score function.

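Some models allow for specialized, efficient parameter search strategies; for example, estimators such as `LassoCV` and `LogisticRegressionCV` tune their regularization parameter internally with cross-validation.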

Two generic approaches to sampling search candidates are provided in scikit-learn:
![](https://developer.qualcomm.com/sites/default/files/attachments/learning_resources_03-05.png)
**GridSearchCV**: exhaustively generates candidates from a grid of parameter values specified with the `param_grid` parameter, and evaluates every combination.
For instance, the following `param_grid` defines a single grid with ten `alpha` values between 0.0001 and 0.01 and a fixed `max_iter` of 10000 (the maximum number of iterations), i.e. 10 × 1 = 10 candidate combinations.

`param_grid = {'alpha':[0.01,0.001,0.0001,0.0002,0.0003,0.0004,0.0005,0.0006,0.0007,0.0009],'max_iter':[10000]}`
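
A minimal sketch of how this `param_grid` could be used, assuming a `Lasso` regressor (the original does not name the estimator, but `alpha` and `max_iter` are both `Lasso` parameters) and synthetic data from `make_regression`:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Synthetic regression data, used only for illustration
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

param_grid = {'alpha': [0.01, 0.001, 0.0001, 0.0002, 0.0003, 0.0004,
                        0.0005, 0.0006, 0.0007, 0.0009],
              'max_iter': [10000]}

# Every combination in the grid is evaluated: 10 alphas x 1 max_iter = 10 candidates
search = GridSearchCV(Lasso(), param_grid, cv=5)
search.fit(X, y)

print(len(search.cv_results_['params']))  # 10
print(search.best_params_)
```

Because the grid has a single value for `max_iter`, the number of candidates equals the number of `alpha` values; with `cv=5`, this means 50 model fits.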

**RandomizedSearchCV**: samples a given number of candidates (`n_iter`) from a parameter space, where each parameter can be specified either as a list of values or as a probability distribution.
We use both tools in practice in the Random Search and Grid Search sections below.
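
For comparison, a sketch of `RandomizedSearchCV` on the same hypothetical `Lasso` setup, this time drawing `alpha` from a continuous log-uniform distribution (`scipy.stats.loguniform`, an assumption chosen for illustration) rather than from a fixed list:

```python
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import RandomizedSearchCV

# Synthetic regression data, used only for illustration
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# alpha is sampled from a log-uniform distribution on [1e-4, 1e-2];
# lists (like max_iter) are sampled uniformly from their values
param_distributions = {'alpha': loguniform(1e-4, 1e-2), 'max_iter': [10000]}

# n_iter fixes how many candidates are tried, regardless of the size of the space
search = RandomizedSearchCV(Lasso(), param_distributions, n_iter=5, cv=5,
                            random_state=0)
search.fit(X, y)

for params in search.cv_results_['params']:
    print(params)
```

Only `n_iter=5` candidates are fitted, even though the distribution contains infinitely many possible `alpha` values.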

Note that often only a small subset of an estimator's parameters has a large impact on the predictive or computational performance of the model, while the others can be left at their default values. It is recommended to read the docstring of the estimator class to get a finer understanding of their expected behavior.

Now let's put this into practice.

To perform hyperparameter optimization in Python, we will use the Credit Card Fraud Detection dataset.


In [None]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split


The dataset can be downloaded from [here](https://drive.google.com/file/d/1sEfpj80zNodR8GPQCT_qa8XaQkCbA8RH/view?usp=sharing). If you work in Google Colab, add it to your Google Drive in the folder `MyDrive/Machine Learning/data`.

In [None]:
# from google.colab import drive
# drive.mount('/gdrive')

Mounted at /gdrive


In [None]:
# data_path="/gdrive/MyDrive/Machine Learning/data"
# df = pd.read_csv(f'{data_path}/creditcard.csv',na_values = '#NAME?')

For large files, we can download the data with the command below, replacing `FILEID` and `FILENAME`:<br>
`wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=FILEID' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=FILEID" -O FILENAME && rm -rf /tmp/cookies.txt`


In [None]:
link = 'https://drive.google.com/file/d/1sEfpj80zNodR8GPQCT_qa8XaQkCbA8RH/view?usp=sharing' #public access link for data
file_id = link.split("/")[-2]
file_id

'1sEfpj80zNodR8GPQCT_qa8XaQkCbA8RH'

In [None]:
!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1sEfpj80zNodR8GPQCT_qa8XaQkCbA8RH' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1sEfpj80zNodR8GPQCT_qa8XaQkCbA8RH" -O 'creditcard.csv' && rm -rf /tmp/cookies.txt


In [None]:
df = pd.read_csv('creditcard.csv',na_values = '#NAME?')
df.shape

(284807, 31)

In [None]:
df_0 = df[df.Class == 0].sample(n=8000, random_state=999) # randomly select 8000 non-fraudulent transaction rows
df_1 = df[df.Class == 1].sample(n=80, random_state=999) # randomly select 80 fraudulent transaction rows
df = pd.concat([df_0, df_1], ignore_index=True)

In [None]:
X = df.drop(columns='Class')  # features
Y = df['Class']  # target: 1 = fraudulent, 0 = non-fraudulent

In [None]:
X.head()

|  | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | V10 | V11 | V12 | V13 | V14 | V15 | V16 | V17 | V18 | V19 | V20 | V21 | V22 | V23 | V24 | V25 | V26 | V27 | V28 | Amount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 85563.0 | -2.101793 | -1.319715 | 1.666445 | -2.277249 | -0.756431 | -0.463003 | -1.247058 | 0.681235 | -2.461398 | 0.87461 | 1.089353 | -0.181977 | 0.582483 | -0.162549 | -0.148541 | 0.043328 | 0.337144 | 0.484291 | -1.028459 | -0.138556 | 0.114293 | 0.414879 | -0.070529 | 0.008534 | 0.373403 | -0.170413 | 0.075017 | 0.027026 | 52.0 |
| 1 | 41688.0 | 0.586683 | -0.922169 | 0.533519 | 1.402979 | -0.580172 | 0.838403 | -0.128986 | 0.227051 | 0.513178 | -0.249108 | 0.760067 | 1.536614 | 0.471439 | -0.214581 | -1.018911 | -0.391637 | -0.07387 | -0.147979 | -0.017214 | 0.418535 | 0.183667 | 0.252728 | -0.380845 | -0.23964 | 0.53816 | -0.274996 | 0.023832 | 0.058588 | 277.22 |
| 2 | 44881.0 | -0.718815 | 1.15003 | 0.987635 | 0.553702 | 0.080471 | 0.728115 | 0.688676 | -0.016909 | -0.338821 | 0.466067 | 0.151933 | 0.128944 | 0.345608 | 0.023787 | 0.652402 | 0.4523 | -1.084521 | 1.264135 | 0.933067 | -0.030013 | 0.23313 | 0.883804 | -0.351194 | -0.801294 | -0.254959 | -0.263018 | -0.608508 | -0.397528 | 77.3 |
| 3 | 119711.0 | 1.707167 | -1.372546 | -2.076366 | -1.731858 | 1.453493 | 3.507479 | -0.962745 | 0.91578 | 1.30493 | -0.461834 | 0.196558 | 0.549854 | 0.073093 | 0.220704 | 1.5942 | 0.235073 | -0.581078 | -0.400018 | 0.0838 | 0.256385 | -0.074749 | -0.600467 | 0.301127 | 0.722943 | -0.666974 | 0.336647 | -0.051282 | -0.022926 | 175.0 |
| 4 | 33924.0 | -0.884826 | 0.745981 | 1.967062 | -0.356323 | 0.139664 | 0.909589 | 0.135413 | 0.195846 | 0.630394 | 0.254632 | -0.256223 | 0.18205 | -0.100807 | -0.693385 | -0.51989 | 0.491002 | -0.969612 | 0.830814 | 0.903568 | 0.336394 | -0.163673 | -0.041402 | -0.368921 | -0.933949 | 0.081282 | 0.339777 | 0.135426 | 0.000846 | 11.5 |


In [None]:
Y.head()

0    0
1    0
2    0
3    0
4    0
Name: Class, dtype: int64

In [None]:
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=0.3, random_state=42)

In [None]:
X_Train.shape

(5656, 30)

In [None]:
Y_Train.value_counts()

0    5597
1      59
Name: Class, dtype: int64

# 2. Manual Search <a id="2"></a> <br>
We will use a Random Forest Classifier as the model to optimize. Random Forest models are formed by a large number of decorrelated decision trees, which together constitute an ensemble. In a Random Forest, each decision tree makes its own prediction and the ensemble combines them: conceptually, the output is the class predicted most frequently, although scikit-learn's implementation averages the trees' predicted class probabilities instead of counting votes.

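We can start by evaluating our base model. Because the classes are highly imbalanced, we look at the confusion matrix and the per-class precision, recall and F1-score rather than at accuracy alone.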

In [None]:
from sklearn.metrics import classification_report,confusion_matrix
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(random_state=42).fit(X_Train, Y_Train)
predictionforest = model.predict(X_Test)

print(confusion_matrix(Y_Test, predictionforest))
print(classification_report(Y_Test, predictionforest))

[[2401    2]
 [   2   19]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      2403
           1       0.90      0.90      0.90        21

    accuracy                           1.00      2424
   macro avg       0.95      0.95      0.95      2424
weighted avg       1.00      1.00      1.00      2424



In [None]:
model

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=None, oob_score=False, random_state=42, verbose=0,
                       warm_start=False)

When using Manual Search, we choose some model hyperparameters based on our judgment/experience. We then train the model, evaluate its performance and start the process again. This loop is repeated until a satisfactory score is reached.

The main parameters used by a Random Forest Classifier are:

* criterion = the function used to evaluate the quality of a split.
* max_depth = maximum number of levels allowed in each tree.
* max_features = maximum number of features considered when splitting a node.
* min_samples_leaf = minimum number of samples required to be at a leaf node.
* min_samples_split = minimum number of samples a node must contain before it can be split.
* n_estimators = number of trees in the ensemble.

In [None]:
model = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_Train,Y_Train)
predictionforest = model.predict(X_Test)
print(confusion_matrix(Y_Test,predictionforest))
print(classification_report(Y_Test,predictionforest))

[[2401    2]
 [   2   19]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      2403
           1       0.90      0.90      0.90        21

    accuracy                           1.00      2424
   macro avg       0.95      0.95      0.95      2424
weighted avg       1.00      1.00      1.00      2424
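
The manual search loop described earlier can be sketched as below. This is an illustration under stated assumptions: it uses synthetic imbalanced data from `make_classification` instead of the credit-card dataset, a few hand-picked candidate settings, and a separate validation split, so that the test set is not used to choose hyperparameters.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the credit-card dataset (assumption)
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.95],
                           random_state=0)
# Validation split used for tuning; a real test set would stay untouched
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3,
                                            stratify=y, random_state=0)

# Hand-picked candidates, as in Manual Search
candidates = [
    {'n_estimators': 50, 'max_depth': None},
    {'n_estimators': 100, 'max_depth': 10},
    {'n_estimators': 200, 'max_depth': 20},
]

scores = {}
for params in candidates:
    model = RandomForestClassifier(random_state=42, **params).fit(X_tr, y_tr)
    scores[str(params)] = f1_score(y_val, model.predict(X_val))
    print(params, round(scores[str(params)], 3))

# Keep the best setting found so far; a human would then decide what to try next
best = max(scores, key=scores.get)
print('Best so far:', best)
```

Each iteration of this loop is one "train, evaluate, adjust" step; the searches in the next sections automate exactly this process.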



# 3. Random Search <a id="3"></a> <br>

In Random Search, we create a grid of hyperparameters and train/test our model on a random subset of the possible combinations. In this example, we additionally perform Cross-Validation on the training set.

When performing Machine Learning tasks, we generally divide our dataset into training and test sets, so that we can test our model after training it and check its performance on unseen data. When using Cross-Validation, we further divide our training set into several partitions, which gives a more reliable estimate of performance and helps us detect overfitting.

One of the most commonly used Cross-Validation methods is K-Fold Cross-Validation. In K-Fold, we divide our training set into K partitions (folds) and then iteratively train our model on K-1 folds and evaluate it on the left-over fold, changing the left-over fold at each iteration. After training the model K times, we average the K validation scores to obtain the overall cross-validated performance.
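
The K-Fold procedure can be sketched with scikit-learn's `KFold` splitter, here on synthetic data (an assumption for illustration). The manual loop and `cross_val_score` produce the same fold scores, because the model has a fixed `random_state` and both use identical splits:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data, used only for illustration
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

K = 5
kf = KFold(n_splits=K, shuffle=True, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=42)

# Manual K-Fold: train on K-1 folds, score on the held-out fold, K times
fold_scores = []
for train_idx, val_idx in kf.split(X):
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[val_idx], y[val_idx]))  # accuracy

print(np.round(fold_scores, 3), 'mean =', round(np.mean(fold_scores), 3))

# cross_val_score runs the same loop (fitting a fresh clone per fold)
auto_scores = cross_val_score(model, X, y, cv=kf)
print(np.allclose(fold_scores, auto_scores))  # True
```

The `cv=3` argument used in the searches below is a shorthand for a 3-fold split of this kind (stratified for classifiers).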

Using Cross-Validation when optimizing hyperparameters can be really important: it helps us avoid choosing hyperparameters that work really well on the training data but not as well on the test data.
We can now start implementing Random Search by first defining a grid of hyperparameters, which will be randomly sampled when calling `RandomizedSearchCV()`.

In [None]:
import numpy as np 
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import cross_val_score

random_search = {'criterion': ['entropy', 'gini'],
                 'max_depth': [10, 20, None],
                 'max_features': ['sqrt', 'log2'],
                 'n_estimators': [50, 100, 200]}

               # 2 x 3 x 2 x 3 = 36 => randomly selects 10 combinations (models) from 36 possible model hyperparameter configs
               # 1. RandomForestClassifier(criterion='gini', max_depth=10, max_features='sqrt', n_estimators=50)
               # 2. RandomForestClassifier(criterion='entropy', max_depth=20, max_features='log2', n_estimators=200)
               # ... total of n_iter models


clf = RandomForestClassifier(random_state=42)

random_search_obj = RandomizedSearchCV(estimator=clf, 
                           param_distributions=random_search, 
                           n_iter=10, # total number of models it will try out by random selections
                           scoring='f1',
                           cv=3, verbose=1, random_state=42, n_jobs=-1)

random_search_obj.fit(X_Train, Y_Train)

Fitting 3 folds for each of 10 candidates, totalling 30 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  30 out of  30 | elapsed:   30.4s finished


RandomizedSearchCV(cv=3, error_score=nan,
                   estimator=RandomForestClassifier(bootstrap=True,
                                                    ccp_alpha=0.0,
                                                    class_weight=None,
                                                    criterion='gini',
                                                    max_depth=None,
                                                    max_features='auto',
                                                    max_leaf_nodes=None,
                                                    max_samples=None,
                                                    min_impurity_decrease=0.0,
                                                    min_impurity_split=None,
                                                    min_samples_leaf=1,
                                                    min_samples_split=2,
                                                    min_weight_fraction_leaf=0.0,
               

Once the search has finished, we can inspect how changing some of the hyperparameters affects the cross-validated score. The `cv_results_` attribute records, for each sampled combination (for example, different numbers of estimators and split criteria), the F1-score obtained on every fold; we tabulate it below.

We can then evaluate how our model performed using Random Search. In this case, the best model found by Random Search gives exactly the same test-set results as our base model (2 false positives and 2 false negatives), so the search did not improve test performance here.
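
A sketch of how `cv_results_` could be summarized per hyperparameter value, using synthetic data and a reduced search space (both assumptions, chosen to keep it fast). Each sampled value is stored in a `param_<name>` column, so grouping by these columns shows how, for example, the criterion and the number of estimators affect the mean cross-validated F1-score:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic imbalanced data, used only for illustration
X, y = make_classification(n_samples=500, n_features=10, weights=[0.9],
                           random_state=0)

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions={'criterion': ['entropy', 'gini'],
                         'n_estimators': [50, 100, 200]},
    n_iter=4, scoring='f1', cv=3, random_state=42)
search.fit(X, y)

# One row per sampled candidate, with one 'param_<name>' column per hyperparameter
results = pd.DataFrame(search.cv_results_)
summary = (results
           .groupby(['param_criterion', 'param_n_estimators'])['mean_test_score']
           .mean())
print(summary)
```

The same grouping could be passed to a plotting library to visualize the effect of each hyperparameter.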

In [None]:
# best model
best_model = random_search_obj.best_estimator_
best_model

RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=20, max_features='sqrt',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=200,
                       n_jobs=None, oob_score=False, random_state=42, verbose=0,
                       warm_start=False)

In [None]:
# best params
random_search_obj.best_params_

{'criterion': 'gini',
 'max_depth': 20,
 'max_features': 'sqrt',
 'n_estimators': 200}

In [None]:
# detailed history of all models tried and their performance in cross validation
pd.set_option('display.max_colwidth', None)

cv_result_df = pd.DataFrame({
    'Model Rank': random_search_obj.cv_results_['rank_test_score'],
    'Model Hyperparams': random_search_obj.cv_results_['params'],
    'Avg CV F1-Score': random_search_obj.cv_results_['mean_test_score'],
    'Std Dev CV F1-Score': random_search_obj.cv_results_['std_test_score'],
    'CV Fold 1 F1-Score': random_search_obj.cv_results_['split0_test_score'],
    'CV Fold 2 F1-Score': random_search_obj.cv_results_['split1_test_score'],
    'CV Fold 3 F1-Score': random_search_obj.cv_results_['split2_test_score']
})


cv_result_df.sort_values(by=['Model Rank'], ascending=True)

|  | Model Rank | Model Hyperparams | Avg CV F1-Score | Std Dev CV F1-Score | CV Fold 1 F1-Score | CV Fold 2 F1-Score | CV Fold 3 F1-Score |
|---|---|---|---|---|---|---|---|
| 2 | 1 | {'n_estimators': 200, 'max_features': 'sqrt', 'max_depth': 20, 'criterion': 'gini'} | 0.851852 | 0.094426 | 0.888889 | 0.944444 | 0.722222 |
| 3 | 2 | {'n_estimators': 50, 'max_features': 'sqrt', 'max_depth': None, 'criterion': 'gini'} | 0.841799 | 0.085187 | 0.888889 | 0.914286 | 0.722222 |
| 5 | 2 | {'n_estimators': 100, 'max_features': 'sqrt', 'max_depth': None, 'criterion': 'gini'} | 0.841799 | 0.085187 | 0.888889 | 0.914286 | 0.722222 |
| 0 | 4 | {'n_estimators': 200, 'max_features': 'log2', 'max_depth': None, 'criterion': 'gini'} | 0.82963 | 0.10229 | 0.888889 | 0.914286 | 0.685714 |
| 6 | 5 | {'n_estimators': 50, 'max_features': 'log2', 'max_depth': 10, 'criterion': 'gini'} | 0.816744 | 0.120433 | 0.888889 | 0.914286 | 0.647059 |
| 1 | 6 | {'n_estimators': 100, 'max_features': 'sqrt', 'max_depth': None, 'criterion': 'entropy'} | 0.815126 | 0.07793 | 0.857143 | 0.882353 | 0.705882 |
| 8 | 6 | {'n_estimators': 200, 'max_features': 'sqrt', 'max_depth': 20, 'criterion': 'entropy'} | 0.815126 | 0.07793 | 0.857143 | 0.882353 | 0.705882 |
| 7 | 8 | {'n_estimators': 50, 'max_features': 'sqrt', 'max_depth': None, 'criterion': 'entropy'} | 0.814419 | 0.078499 | 0.888889 | 0.848485 | 0.705882 |
| 9 | 9 | {'n_estimators': 200, 'max_features': 'log2', 'max_depth': None, 'criterion': 'entropy'} | 0.810967 | 0.059286 | 0.857143 | 0.848485 | 0.727273 |
| 4 | 10 | {'n_estimators': 100, 'max_features': 'log2', 'max_depth': None, 'criterion': 'entropy'} | 0.798972 | 0.053875 | 0.857143 | 0.8125 | 0.727273 |


In [None]:
predictionforest = best_model.predict(X_Test)
print(confusion_matrix(Y_Test,predictionforest))
print(classification_report(Y_Test,predictionforest))

[[2401    2]
 [   2   19]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      2403
           1       0.90      0.90      0.90        21

    accuracy                           1.00      2424
   macro avg       0.95      0.95      0.95      2424
weighted avg       1.00      1.00      1.00      2424



# 4. Grid Search <a id="4"></a> <br>
In Grid Search, we set up a grid of hyperparameters and train/test our model on each of the possible combinations.
To choose the parameters to use in Grid Search, we can look at which parameters worked best with Random Search and build a grid around them to see if we can find a better combination. For simplicity, the example below reuses the same grid as Random Search, so all 36 combinations are evaluated.

Grid Search can be implemented in Python using scikit-learn's `GridSearchCV` class.
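
To check how many models a grid search will train before running it, the grid can be expanded with scikit-learn's `ParameterGrid`, which enumerates every combination. With `cv=3`, the 36 combinations of this grid correspond to 3 × 36 = 108 fits:

```python
from sklearn.model_selection import ParameterGrid

grid_search = {'criterion': ['entropy', 'gini'],
               'max_depth': [10, 20, None],
               'max_features': ['sqrt', 'log2'],
               'n_estimators': [50, 100, 200]}

# ParameterGrid enumerates every combination that GridSearchCV will evaluate
combos = list(ParameterGrid(grid_search))
print(len(combos))  # 36 (= 2 x 3 x 2 x 3)
print(combos[0])
```

Counting the combinations this way is a cheap sanity check, since the cost of Grid Search grows multiplicatively with every value added to the grid.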

In [None]:
from sklearn.model_selection import GridSearchCV

grid_search = {'criterion': ['entropy', 'gini'],
                 'max_depth': [10, 20, None],
                 'max_features': ['sqrt', 'log2'],
                 'n_estimators': [50, 100, 200]}
               # 2 x 3 x 2 x 3 = 36 => Total of 36 models from 36 possible model hyperparameter configs
               # 1. RandomForestClassifier(criterion='entropy', max_depth=10, max_features='sqrt', n_estimators=50)
               # 2. RandomForestClassifier(criterion='entropy', max_depth=20, max_features='sqrt', n_estimators=50)
               # 3. RandomForestClassifier(criterion='entropy', max_depth=None, max_features='sqrt', n_estimators=50)
               # 4. RandomForestClassifier(criterion='entropy', max_depth=10, max_features='log2', n_estimators=50)
               # ... total of 36 models

clf = RandomForestClassifier(random_state=42)

grid_search_obj = GridSearchCV(estimator=clf, 
                               param_grid=grid_search,
                               scoring='f1', cv=3, verbose=5, n_jobs=-1)

grid_search_obj.fit(X_Train,Y_Train)

predictionforest = grid_search_obj.best_estimator_.predict(X_Test)
print(confusion_matrix(Y_Test,predictionforest))
print(classification_report(Y_Test,predictionforest))

Fitting 3 folds for each of 36 candidates, totalling 108 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=-1)]: Done  14 tasks      | elapsed:   13.2s
[Parallel(n_jobs=-1)]: Done  68 tasks      | elapsed:  1.0min
[Parallel(n_jobs=-1)]: Done 108 out of 108 | elapsed:  1.6min finished


[[2401    2]
 [   2   19]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      2403
           1       0.90      0.90      0.90        21

    accuracy                           1.00      2424
   macro avg       0.95      0.95      0.95      2424
weighted avg       1.00      1.00      1.00      2424



In [None]:
grid_search_obj.best_params_

{'criterion': 'gini',
 'max_depth': 20,
 'max_features': 'sqrt',
 'n_estimators': 200}

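Grid Search is slower than Random Search (here, 108 fits took about 1.6 minutes, versus about 30 seconds for the 30 fits of Random Search), but it can be more thorough because it evaluates every combination in the grid. Random Search is faster, but it might miss some important points in the search space. In this example, both searches selected the same hyperparameters.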
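
The difference in coverage can be made concrete with `ParameterSampler`, the sampler that `RandomizedSearchCV` uses internally to draw its candidates. The sketch below draws 10 of the 36 combinations of the grid used above, i.e. roughly 28% of the search space:

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

space = {'criterion': ['entropy', 'gini'],
         'max_depth': [10, 20, None],
         'max_features': ['sqrt', 'log2'],
         'n_estimators': [50, 100, 200]}

all_combos = list(ParameterGrid(space))  # what Grid Search evaluates
# With only lists in the space, sampling is done without replacement
sampled = list(ParameterSampler(space, n_iter=10, random_state=42))

print(len(all_combos), len(sampled))  # 36 10
print(f'coverage: {len(sampled) / len(all_combos):.0%}')

# Combinations that Random Search never evaluates in this run
missed = [c for c in all_combos if c not in sampled]
print(len(missed))  # 26
```

If the best combination happens to be among the missed ones, Random Search cannot find it; increasing `n_iter` trades run time for coverage.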
# 5. Automated Hyperparameter Tuning <a id="5"></a> <br>

![](https://better.future-processing.com/directus/storage/uploads/2399317284eda5016daac68812d5d3c3.png)

As we have seen above, tuning machine learning hyperparameters is a tedious but crucial task, as the performance of an algorithm can be highly dependent on the choice of hyperparameters. Manual tuning takes time away from important steps of the machine learning pipeline, such as feature engineering and interpreting results. Grid and random search are hands-off, but they require long run times because they waste time evaluating unpromising areas of the search space. Increasingly, hyperparameter tuning is done by automated methods that aim to find optimal hyperparameters in less time, using an informed search that requires no manual effort beyond the initial set-up.

When using Automated Hyperparameter Tuning, the model hyperparameters are identified using techniques such as Bayesian Optimization, Gradient Descent and Evolutionary Algorithms.

## Bayesian Optimization using HyperOpt <a id="51"></a> <br>

![](https://i.imgur.com/BWbgCSx.jpg)
Bayesian optimization is a model-based method for finding the minimum of a function, i.e. the input value that gives the lowest possible output value. Applied to hyperparameter tuning, it often achieves better performance than random search while requiring fewer iterations. Bayesian Optimization can, therefore, lead to better performance in the testing phase and reduced optimization time.

Bayesian Optimization can be performed in Python using the Hyperopt library.  

![](https://camo.githubusercontent.com/b92ead141ef3726da38eef053864aa1173012789/68747470733a2f2f692e706f7374696d672e63632f54506d66665772702f68797065726f70742d6e65772e706e67)

In Hyperopt, Bayesian Optimization can be implemented by passing three main arguments to the function fmin():

* **Objective Function** = defines the loss function to minimize.
* **Domain Space** = defines the range of input values to test (in Bayesian Optimization, this space defines a probability distribution for each of the hyperparameters used).
* **Optimization Algorithm** = defines the search algorithm used to select the input values to try in each new iteration.

Additionally, the maximum number of evaluations to perform can be set in **fmin()** through its `max_evals` argument.

Bayesian Optimization can reduce the number of search iterations by choosing the input values to try based on past outcomes. In this way, the search can concentrate early on values that are closer to the desired output.
We can now run our Bayesian Optimizer using the **fmin()** function. A `Trials()` object is created first, so that we can later inspect what happened while **fmin()** was running (e.g. how the loss changed and which hyperparameters were tried).


Hyperopt is one of several automated hyperparameter tuning libraries that use Bayesian optimization. These libraries differ in the algorithm used both to construct the surrogate (probability model) of the objective function and to choose the next hyperparameters to evaluate. Hyperopt uses the Tree-structured Parzen Estimator (TPE). Other Python libraries include Spearmint, which uses a Gaussian process for the surrogate, and SMAC, which uses a random forest regression.

Hyperopt has a simple syntax for structuring an optimization problem which extends beyond hyperparameter tuning to any problem that involves minimizing a function.

In [None]:
!pip install hyperopt



In [None]:
from hyperopt import hp, fmin, tpe, STATUS_OK, Trials

space = {
    'criterion': hp.choice('criterion', ['entropy', 'gini']),
    'max_depth':  hp.choice('max_depth', [10, 20, None]),
    'max_features': hp.choice('max_features', ['sqrt','log2']),
    'n_estimators' : hp.choice('n_estimators', [50, 100, 200])   
}

def objective(space):
    model = RandomForestClassifier(criterion = space['criterion'], 
                                   max_depth = space['max_depth'],
                                   max_features = space['max_features'],
                                   n_estimators = space['n_estimators'], 
                                   random_state=42
                                 )
    
    f1 = cross_val_score(model, X_Train, Y_Train, cv=3, scoring='f1').mean()

    # We aim to maximize the F1-score, so we return its negative as the loss for fmin() to minimize
    return {'loss': -f1, 'status': STATUS_OK }
    
trials = Trials()

best = fmin(fn=objective,
            space=space,
            algo=tpe.suggest,
            max_evals=20,
            trials=trials)
best

100%|██████████| 20/20 [01:12<00:00,  3.63s/it, best loss: -0.851851851851852]


{'criterion': 1, 'max_depth': 2, 'max_features': 0, 'n_estimators': 2}

We can now retrieve the **best** set of parameters identified and test our model with it. Because every parameter in `space` was defined with `hp.choice`, the **best** dictionary stores the index of the chosen option rather than the value itself, so we first need to map the indices back to the actual values before passing them to our Random Forest.

In [None]:
crit = {0: 'entropy', 1: 'gini'}
feat = {0: 'sqrt', 1: 'log2'}
depth = {0: 10, 1: 20, 2: None}
est = {0: 50, 1: 100, 2: 200}

trainedforest = RandomForestClassifier(criterion = crit[best['criterion']], 
                                       max_depth = depth[best['max_depth']], 
                                       max_features = feat[best['max_features']],  
                                       n_estimators = est[best['n_estimators']],
                                       random_state=42
                                      ).fit(X_Train,Y_Train)
                                      
predictionforest = trainedforest.predict(X_Test)
print(confusion_matrix(Y_Test,predictionforest))
print(classification_report(Y_Test,predictionforest))

[[2401    2]
 [   2   19]]
              precision    recall  f1-score   support

           0       1.00      1.00      1.00      2403
           1       0.90      0.90      0.90        21

    accuracy                           1.00      2424
   macro avg       0.95      0.95      0.95      2424
weighted avg       1.00      1.00      1.00      2424




# Conclusion <a id="9"></a> <br>

**You should now have a fair understanding of how to perform hyperparameter tuning with the open-source libraries used above.**

In [None]:
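# Possible next step: apply the same tuning methods to another model, e.g. GradientBoostingClassifier
from sklearn.ensemble import GradientBoostingClassifier

GradientBoostingClassifier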