#### All Techniques of Hyperparameter Optimization:
1. GridSearchCV
2. RandomizedSearchCV
3. Bayesian Optimization - Automate Hyperparameter Tuning (Hyperopt)
4. Sequential Model-Based Optimization (tuning a scikit-learn estimator with skopt)
5. Optuna - Automate Hyperparameter Tuning 
6. Genetic Algorithms (TPOT Classifier)

References:
* https://github.com/fmfn/BayesianOptimization
* https://github.com/hyperopt/hyperopt
* https://www.jeremyjordan.me/hyperparameter-tuning/
* https://optuna.org/
* https://towardsdatascience.com/hyperparameters-optimization-526348bb8e2d (by Pier Paolo Ippolito)
* https://scikit-optimize.github.io/stable/auto_examples/hyperparameter-optimization.html

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
import pandas as pd
df=pd.read_csv(r'C:\Users\User\Desktop\diabetes.csv')
df.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


#### Lifecycle of a Data Science Project


1. Feature Engineering
2. Feature Selection 
3. Model Creation 
4. Hyperparameter Tuning (Random Forest Classifier)

Why is hyperparameter tuning required?
Hyperparameters are crucial because they control the overall behaviour of a machine learning model. The goal of tuning is to find the
combination of hyperparameters that minimizes a predefined loss function and so gives better results.

In [3]:
df.columns

Index(['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
       'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome'],
      dtype='object')

In [4]:
import numpy as np
df['Glucose']=np.where(df['Glucose']==0,df['Glucose'].median(),df['Glucose'])
df['Insulin']=np.where(df['Insulin']==0,df['Insulin'].median(),df['Insulin'])
df['SkinThickness']=np.where(df['SkinThickness']==0,df['SkinThickness'].median(),df['SkinThickness'])
df.head()

Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148.0,72,35.0,30.5,33.6,0.627,50,1
1,1,85.0,66,29.0,30.5,26.6,0.351,31,0
2,8,183.0,64,23.0,30.5,23.3,0.672,32,1
3,1,89.0,66,23.0,94.0,28.1,0.167,21,0
4,0,137.0,40,35.0,168.0,43.1,2.288,33,1


In practice, a person's glucose level can't be zero, so the zero values have been replaced with the median glucose value.

We are using a Random Forest model here, so do we need to scale the features?
Scaling is not required: a Random Forest is built from decision trees, which split each feature at a threshold, so only the ordering of the values within a feature matters, not their scale.
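A minimal sketch of this point, assuming scikit-learn is installed and using synthetic data from `make_classification` rather than the diabetes data: the same forest is trained once on raw features and once on standardized features, and the test predictions are compared. All variable names here are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a tabular dataset (not the diabetes data)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0)

# Standardize using statistics from the training split only
scaler = StandardScaler().fit(Xd_train)

# Same seed, so both forests draw the same bootstrap samples and feature subsets
rf_raw = RandomForestClassifier(n_estimators=50, random_state=0).fit(Xd_train, yd_train)
rf_scaled = RandomForestClassifier(n_estimators=50, random_state=0).fit(
    scaler.transform(Xd_train), yd_train)

pred_raw = rf_raw.predict(Xd_test)
pred_scaled = rf_scaled.predict(scaler.transform(Xd_test))

# Fraction of test rows on which the two models agree
agreement = np.mean(pred_raw == pred_scaled)
print("Agreement between raw and scaled models:", agreement)
```

The agreement should be at or very close to 1.0, because standardizing preserves the order of the values within each feature.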

Insulin is handled the same way: in practice it can't be zero, so its zero values have also been replaced with the median value.

The same applies to skin thickness, which can't be zero either.
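One detail worth noting: in the cell above the median is computed over the whole column, zeros included, which is why the zero Insulin readings become 30.5. A common variant, shown here as a sketch and not as the method used in this notebook, marks the zeros as missing first so that the median comes from the valid readings only. The small frame `demo` holds made-up values for illustration.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration; 0 stands for a missing reading
demo = pd.DataFrame({"Insulin": [0, 0, 94, 168, 0, 88]})

# Median over the whole column, zeros included (what the cell above does)
median_with_zeros = demo["Insulin"].median()

# Median over valid readings only: treat 0 as missing, then take the median
valid = demo["Insulin"].replace(0, np.nan)
median_without_zeros = valid.median()

# Fill the missing readings with the median of the valid ones
demo["Insulin_filled"] = valid.fillna(median_without_zeros)

print(median_with_zeros, median_without_zeros)
print(demo)
```

Which variant is appropriate depends on how many zeros a column contains; when most readings are zero, the two medians can differ a lot.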

In [5]:
#### Independent And Dependent features
X=df.drop('Outcome',axis=1) # Dropped Outcome column from X
y=df['Outcome'] # Assigned Outcome column to y , this is target variable ( dependent feature)

In [6]:
print(X.head())
print(y.head())

   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  \
0            6    148.0             72           35.0     30.5  33.6   
1            1     85.0             66           29.0     30.5  26.6   
2            8    183.0             64           23.0     30.5  23.3   
3            1     89.0             66           23.0     94.0  28.1   
4            0    137.0             40           35.0    168.0  43.1   

   DiabetesPedigreeFunction  Age  
0                     0.627   50  
1                     0.351   31  
2                     0.672   32  
3                     0.167   21  
4                     2.288   33  
0    1
1    0
2    1
3    0
4    1
Name: Outcome, dtype: int64


In [7]:
#### Train Test Split
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.20,random_state=33)

In [8]:
from sklearn.ensemble import RandomForestClassifier
rf_classifier=RandomForestClassifier(n_estimators=10).fit(X_train,y_train)
prediction=rf_classifier.predict(X_test)

In [9]:
y.value_counts()

0    500
1    268
Name: Outcome, dtype: int64

In [10]:
from sklearn.metrics import confusion_matrix,classification_report,accuracy_score
print(confusion_matrix(y_test,prediction))
print(accuracy_score(y_test,prediction))
print(classification_report(y_test,prediction))

[[84 15]
 [27 28]]
0.7272727272727273
              precision    recall  f1-score   support

           0       0.76      0.85      0.80        99
           1       0.65      0.51      0.57        55

    accuracy                           0.73       154
   macro avg       0.70      0.68      0.69       154
weighted avg       0.72      0.73      0.72       154



The main parameters used by a Random Forest Classifier are:

* criterion = the function used to evaluate the quality of a split.
* max_depth = maximum number of levels allowed in each tree.
* max_features = maximum number of features considered when splitting a node.
* min_samples_leaf = minimum number of samples which can be stored in a tree leaf.
* min_samples_split = minimum number of samples necessary in a node to cause node splitting.
* n_estimators = number of trees in the ensemble.
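
To see the values these parameters take when nothing is specified, the estimator can be asked directly. A small sketch, assuming scikit-learn is installed; the defaults that are printed depend on the installed version, and the names `names` and `main_defaults` are introduced here for illustration.

```python
from sklearn.ensemble import RandomForestClassifier

# The parameters discussed above
names = ["criterion", "max_depth", "max_features",
         "min_samples_leaf", "min_samples_split", "n_estimators"]

# get_params() returns every constructor argument with its current value
defaults = RandomForestClassifier().get_params()
main_defaults = {name: defaults[name] for name in names}
print(main_defaults)
```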

#### Manual Hyperparameter Tuning ( Random Selection )

In [11]:
model=RandomForestClassifier(n_estimators=300,criterion='entropy',
                             max_features='sqrt',min_samples_leaf=10,random_state=100).fit(X_train,y_train)
predictions=model.predict(X_test)
print(confusion_matrix(y_test,predictions))
print(accuracy_score(y_test,predictions))
print(classification_report(y_test,predictions))

[[87 12]
 [28 27]]
0.7402597402597403
              precision    recall  f1-score   support

           0       0.76      0.88      0.81        99
           1       0.69      0.49      0.57        55

    accuracy                           0.74       154
   macro avg       0.72      0.68      0.69       154
weighted avg       0.73      0.74      0.73       154



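Here, with manually (randomly) chosen values, accuracy rose only slightly (from 0.727 to 0.740), while recall for class 1 fell from 0.51 to 0.49. Manual tuning is guesswork, so it is not a technique to rely on.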
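
The comparison between the baseline and the manually tuned model can be checked by hand from the two confusion matrices printed above. This is a plain-Python sketch; the helper name `summarize` is introduced here for illustration.

```python
def summarize(cm):
    """Return (accuracy, recall for class 1) from a 2x2 confusion matrix.

    Rows are true labels and columns are predicted labels, as in scikit-learn.
    """
    (tn, fp), (fn, tp) = cm
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    recall_pos = tp / (fn + tp)
    return accuracy, recall_pos

baseline = [[84, 15], [27, 28]]   # RandomForestClassifier(n_estimators=10)
manual = [[87, 12], [28, 27]]     # manually chosen hyperparameters

for name, cm in [("baseline", baseline), ("manual", manual)]:
    acc, rec = summarize(cm)
    print(f"{name}: accuracy={acc:.3f}, recall(class 1)={rec:.3f}")
```

Accuracy moves from about 0.727 to 0.740, while recall for class 1 moves from about 0.51 to 0.49.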

Which hyperparameter search should we use first: RandomizedSearchCV or GridSearchCV?
Answer: suppose there are 1 million people spread across four areas, A, B, C and D, and you want to find 'Gaurav Singh',
but you have no idea which area I'm located in.
Grid search ---- checks every area ---- A, B, C, D ---- and finds me in B.
RandomizedSearchCV ---- searches areas at random ---- B & C ---- and narrows down the results (I may be in B or C).
GridSearchCV tries every combination of the parameter values it is given, so it is best used afterwards, on the narrowed-down region.
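
The difference can be made concrete by counting. The sketch below uses only the standard library and the same value lists as the random grid defined in the cells that follow: an exhaustive search has to try every combination, while a randomized search draws a fixed number of them. It illustrates the idea only and is not how scikit-learn implements either search.

```python
import itertools
import random

grid = {
    "n_estimators": list(range(200, 2001, 200)),
    "max_features": ["auto", "sqrt", "log2"],
    "max_depth": list(range(10, 1001, 110)),
    "min_samples_split": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "min_samples_leaf": [1, 2, 4, 6, 8],
    "criterion": ["entropy", "gini"],
}

# Grid search: every combination of every value
all_combinations = list(itertools.product(*grid.values()))
print("Grid search candidates:", len(all_combinations))

# Randomized search: a fixed-size random sample of those combinations
rng = random.Random(100)
sampled = rng.sample(all_combinations, k=100)
print("Randomized search candidates:", len(sampled))
print("One sampled candidate:", dict(zip(grid.keys(), sampled[0])))
```

With these lists the full grid holds 10 * 3 * 10 * 9 * 5 * 2 = 27000 combinations, against 100 for the randomized search.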

In [12]:
[int(x) for x in np.linspace(start = 200, stop = 2000, num = 10)]

[200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]

In [13]:
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 2000, num = 10)]
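# Number of features to consider at every split
# ('auto' was deprecated in scikit-learn 1.1 and removed in 1.3; for classifiers it meant the same as 'sqrt')
max_features = ['auto', 'sqrt','log2']
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(10, 1000,10)]
# Minimum number of samples required to split a node
# (an integer value must be >= 2, so candidates that draw 1 fail to fit and, with the default error_score, score NaN)
min_samples_split = [1,2,3,4,5,6,7,8,9]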
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 4,6,8]
# Create the random grid
random_grid = {'n_estimators': n_estimators,
               'max_features': max_features,
               'max_depth': max_depth,
               'min_samples_split': min_samples_split,
               'min_samples_leaf': min_samples_leaf,
              'criterion':['entropy','gini']}
print(random_grid)

{'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000], 'max_features': ['auto', 'sqrt', 'log2'], 'max_depth': [10, 120, 230, 340, 450, 560, 670, 780, 890, 1000], 'min_samples_split': [1, 2, 3, 4, 5, 6, 7, 8, 9], 'min_samples_leaf': [1, 2, 4, 6, 8], 'criterion': ['entropy', 'gini']}


In [14]:
# RandomizedSearchCV samples n_iter random combinations from the grid instead of trying all of them.
rf=RandomForestClassifier()
rf_randomcv=RandomizedSearchCV(estimator=rf,param_distributions=random_grid,n_iter=100,cv=3,verbose=2,
                               random_state=100,n_jobs=-1)
### fit the randomized model
rf_randomcv.fit(X_train,y_train)

Fitting 3 folds for each of 100 candidates, totalling 300 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:   34.6s
[Parallel(n_jobs=-1)]: Done 154 tasks      | elapsed:  2.4min
[Parallel(n_jobs=-1)]: Done 300 out of 300 | elapsed:  4.9min finished


RandomizedSearchCV(cv=3, estimator=RandomForestClassifier(), n_iter=100,
                   n_jobs=-1,
                   param_distributions={'criterion': ['entropy', 'gini'],
                                        'max_depth': [10, 120, 230, 340, 450,
                                                      560, 670, 780, 890,
                                                      1000],
                                        'max_features': ['auto', 'sqrt',
                                                         'log2'],
                                        'min_samples_leaf': [1, 2, 4, 6, 8],
                                        'min_samples_split': [1, 2, 3, 4, 5, 6,
                                                              7, 8, 9],
                                        'n_estimators': [200, 400, 600, 800,
                                                         1000, 1200, 1400, 1600,
                                                         1800, 2000]},
             

In [15]:
rf_randomcv.best_params_

{'n_estimators': 2000,
 'min_samples_split': 8,
 'min_samples_leaf': 6,
 'max_features': 'log2',
 'max_depth': 230,
 'criterion': 'entropy'}

In [16]:
rf_randomcv

RandomizedSearchCV(cv=3, estimator=RandomForestClassifier(), n_iter=100,
                   n_jobs=-1,
                   param_distributions={'criterion': ['entropy', 'gini'],
                                        'max_depth': [10, 120, 230, 340, 450,
                                                      560, 670, 780, 890,
                                                      1000],
                                        'max_features': ['auto', 'sqrt',
                                                         'log2'],
                                        'min_samples_leaf': [1, 2, 4, 6, 8],
                                        'min_samples_split': [1, 2, 3, 4, 5, 6,
                                                              7, 8, 9],
                                        'n_estimators': [200, 400, 600, 800,
                                                         1000, 1200, 1400, 1600,
                                                         1800, 2000]},
             

In [17]:
rf_randomcv.best_estimator_

RandomForestClassifier(criterion='entropy', max_depth=230, max_features='log2',
                       min_samples_leaf=6, min_samples_split=8,
                       n_estimators=2000)

In [18]:
best_random_grid=rf_randomcv.best_estimator_

In [19]:
from sklearn.metrics import accuracy_score
y_pred=best_random_grid.predict(X_test)
print(confusion_matrix(y_test,y_pred))
print("Accuracy Score {}".format(accuracy_score(y_test,y_pred)))
print("Classification report: {}".format(classification_report(y_test,y_pred)))

[[86 13]
 [24 31]]
Accuracy Score 0.7597402597402597
Classification report:               precision    recall  f1-score   support

           0       0.78      0.87      0.82        99
           1       0.70      0.56      0.63        55

    accuracy                           0.76       154
   macro avg       0.74      0.72      0.72       154
weighted avg       0.75      0.76      0.75       154



Only a small increase in accuracy (from 0.727 for the baseline model to 0.760).

#### GridSearchCV 

Here we build a narrow grid around the best parameters found by RandomizedSearchCV, so that the two searches can be compared.

In [20]:
rf_randomcv.best_params_

{'n_estimators': 2000,
 'min_samples_split': 8,
 'min_samples_leaf': 6,
 'max_features': 'log2',
 'max_depth': 230,
 'criterion': 'entropy'}

In [21]:
[rf_randomcv.best_params_['min_samples_leaf'], 
                         rf_randomcv.best_params_['min_samples_leaf']+2, 
                         rf_randomcv.best_params_['min_samples_leaf'] + 4]

[6, 8, 10]

In [22]:
[rf_randomcv.best_params_['min_samples_split'] - 2,
                          rf_randomcv.best_params_['min_samples_split'] - 1,
                          rf_randomcv.best_params_['min_samples_split'], 
                          rf_randomcv.best_params_['min_samples_split'] +1,
                          rf_randomcv.best_params_['min_samples_split'] + 2]

[6, 7, 8, 9, 10]

In [23]:
[rf_randomcv.best_params_['n_estimators'] - 200, rf_randomcv.best_params_['n_estimators'] - 100, 
                     rf_randomcv.best_params_['n_estimators'], 
                     rf_randomcv.best_params_['n_estimators'] + 100, rf_randomcv.best_params_['n_estimators'] + 200]

[1800, 1900, 2000, 2100, 2200]

In [24]:

from sklearn.model_selection import GridSearchCV

param_grid = {
    'criterion': [rf_randomcv.best_params_['criterion']],
    'max_depth': [rf_randomcv.best_params_['max_depth']],
    'max_features': [rf_randomcv.best_params_['max_features']],
    'min_samples_leaf': [rf_randomcv.best_params_['min_samples_leaf'], 
                         rf_randomcv.best_params_['min_samples_leaf']+2, 
                         rf_randomcv.best_params_['min_samples_leaf'] + 4],
    'min_samples_split': [rf_randomcv.best_params_['min_samples_split'] - 2,
                          rf_randomcv.best_params_['min_samples_split'] - 1,
                          rf_randomcv.best_params_['min_samples_split'], 
                          rf_randomcv.best_params_['min_samples_split'] +1,
                          rf_randomcv.best_params_['min_samples_split'] + 2],
    'n_estimators': [rf_randomcv.best_params_['n_estimators'] - 200, rf_randomcv.best_params_['n_estimators'] - 100, 
                     rf_randomcv.best_params_['n_estimators'], 
                     rf_randomcv.best_params_['n_estimators'] + 100, rf_randomcv.best_params_['n_estimators'] + 200]
}

print(param_grid)

{'criterion': ['entropy'], 'max_depth': [230], 'max_features': ['log2'], 'min_samples_leaf': [6, 8, 10], 'min_samples_split': [6, 7, 8, 9, 10], 'n_estimators': [1800, 1900, 2000, 2100, 2200]}


In [25]:
# The number of candidates GridSearchCV evaluates is the product of the number of values listed for each parameter:
1*1*1*3*5*5 

75
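
The same count can be computed from the grid itself, and multiplying it by the number of cross-validation folds gives the total number of model fits. A small sketch using the values printed above; `param_grid_values` is a hand-copied version of `param_grid`, and `cv_folds` is set to 10 to match the next cell.

```python
import math

param_grid_values = {
    "criterion": ["entropy"],
    "max_depth": [230],
    "max_features": ["log2"],
    "min_samples_leaf": [6, 8, 10],
    "min_samples_split": [6, 7, 8, 9, 10],
    "n_estimators": [1800, 1900, 2000, 2100, 2200],
}

# One candidate per combination: multiply the number of values per parameter
n_candidates = math.prod(len(values) for values in param_grid_values.values())

# Each candidate is fitted once per cross-validation fold
cv_folds = 10
n_fits = n_candidates * cv_folds
print(n_candidates, n_fits)
```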

In [26]:
#### Fit the grid_search to the data
rf=RandomForestClassifier()
grid_search=GridSearchCV(estimator=rf,param_grid=param_grid,cv=10,n_jobs=-1,verbose=2)
grid_search.fit(X_train,y_train)

Fitting 10 folds for each of 75 candidates, totalling 750 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done  33 tasks      | elapsed:  1.3min
[Parallel(n_jobs=-1)]: Done 154 tasks      | elapsed:  5.9min
[Parallel(n_jobs=-1)]: Done 357 tasks      | elapsed: 13.7min
[Parallel(n_jobs=-1)]: Done 640 tasks      | elapsed: 23.8min
[Parallel(n_jobs=-1)]: Done 750 out of 750 | elapsed: 27.7min finished


GridSearchCV(cv=10, estimator=RandomForestClassifier(), n_jobs=-1,
             param_grid={'criterion': ['entropy'], 'max_depth': [230],
                         'max_features': ['log2'],
                         'min_samples_leaf': [6, 8, 10],
                         'min_samples_split': [6, 7, 8, 9, 10],
                         'n_estimators': [1800, 1900, 2000, 2100, 2200]},
             verbose=2)

In [27]:
grid_search.best_estimator_

RandomForestClassifier(criterion='entropy', max_depth=230, max_features='log2',
                       min_samples_leaf=10, min_samples_split=7,
                       n_estimators=1800)

In [28]:
best_grid=grid_search.best_estimator_

In [29]:
best_grid

RandomForestClassifier(criterion='entropy', max_depth=230, max_features='log2',
                       min_samples_leaf=10, min_samples_split=7,
                       n_estimators=1800)

In [30]:
y_pred=best_grid.predict(X_test)
print(confusion_matrix(y_test,y_pred))
print("Accuracy Score {}".format(accuracy_score(y_test,y_pred)))
print("Classification report: {}".format(classification_report(y_test,y_pred)))

[[86 13]
 [29 26]]
Accuracy Score 0.7272727272727273
Classification report:               precision    recall  f1-score   support

           0       0.75      0.87      0.80        99
           1       0.67      0.47      0.55        55

    accuracy                           0.73       154
   macro avg       0.71      0.67      0.68       154
weighted avg       0.72      0.73      0.71       154



#### Automated Hyperparameter Tuning
Automated hyperparameter tuning can be done using techniques such as:

* Bayesian Optimization
* Gradient Descent
* Evolutionary Algorithms

#### Bayesian Optimization
Bayesian optimization uses probability to find the minimum of a function. The aim is to find the input value that gives the lowest possible output value. It often performs better than random, grid and manual search, giving better performance in the testing phase and a shorter optimization time. In Hyperopt, Bayesian optimization is implemented by passing three main parameters to the function fmin:

* Objective Function = defines the loss function to minimize.
* Domain Space = defines the range of input values to test (in Bayesian optimization this space creates a probability distribution for each of the hyperparameters used).
* Optimization Algorithm = defines the search algorithm used to select the best input values for each new iteration.

In [31]:
from hyperopt import hp,fmin,tpe,STATUS_OK,Trials

In [32]:
space = {'criterion': hp.choice('criterion', ['entropy', 'gini']),
        'max_depth': hp.quniform('max_depth', 10, 1200, 10),
        'max_features': hp.choice('max_features', ['auto', 'sqrt','log2', None]),
        'min_samples_leaf': hp.uniform('min_samples_leaf', 0, 0.5),
        'min_samples_split' : hp.uniform ('min_samples_split', 0, 1),
        'n_estimators' : hp.choice('n_estimators', [10, 50, 300, 750, 1200,1300,1500])
    }

In [33]:
space

{'criterion': <hyperopt.pyll.base.Apply at 0x205caab8220>,
 'max_depth': <hyperopt.pyll.base.Apply at 0x205caab83d0>,
 'max_features': <hyperopt.pyll.base.Apply at 0x205caab84c0>,
 'min_samples_leaf': <hyperopt.pyll.base.Apply at 0x205caab86a0>,
 'min_samples_split': <hyperopt.pyll.base.Apply at 0x205caab87f0>,
 'n_estimators': <hyperopt.pyll.base.Apply at 0x205caab88e0>}

In [34]:
from sklearn.model_selection import cross_val_score

def objective(space):
    model = RandomForestClassifier(criterion = space['criterion'],
                                 # hp.quniform returns a float, but max_depth must be an integer
                                 max_depth = int(space['max_depth']),
                                 max_features = space['max_features'],
                                 min_samples_leaf = space['min_samples_leaf'],
                                 min_samples_split = space['min_samples_split'],
                                 n_estimators = space['n_estimators'], 
                                 )
    
    # Mean 5-fold cross-validated accuracy on the training data
    accuracy = cross_val_score(model, X_train, y_train, cv = 5).mean()

    # We aim to maximize accuracy, therefore we return it as a negative value
    return {'loss': -accuracy, 'status': STATUS_OK }

In [35]:
from sklearn.model_selection import cross_val_score
trials = Trials()
best = fmin(fn= objective,
            space= space,
            algo= tpe.suggest,
            max_evals = 80,
            trials= trials)
best

100%|████████████████████████████████████████████| 80/80 [2:31:18<00:00, 113.48s/trial, best loss: -0.7752765560442489]


{'criterion': 1,
 'max_depth': 870.0,
 'max_features': 0,
 'min_samples_leaf': 0.0015576558449129276,
 'min_samples_split': 0.04643968321189477,
 'n_estimators': 4}
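
Note that for parameters defined with hp.choice, fmin returns the index of the chosen option, not the option itself, so 'criterion': 1 and 'n_estimators': 4 are positions in the lists given in the search space. Hyperopt provides `space_eval(space, best)` for this conversion; the sketch below does the same lookup by hand in plain Python so that the mapping is visible. The dictionary `choices` and the name `decoded` are introduced here for illustration.

```python
# Result returned by fmin above
best = {'criterion': 1,
        'max_depth': 870.0,
        'max_features': 0,
        'min_samples_leaf': 0.0015576558449129276,
        'min_samples_split': 0.04643968321189477,
        'n_estimators': 4}

# Option lists, copied from the hp.choice entries of the search space
choices = {
    'criterion': ['entropy', 'gini'],
    'max_features': ['auto', 'sqrt', 'log2', None],
    'n_estimators': [10, 50, 300, 750, 1200, 1300, 1500],
}

# Replace each hp.choice index by the option it points to; leave the rest as is
decoded = {
    name: choices[name][value] if name in choices else value
    for name, value in best.items()
}
# hp.quniform returns a float, while max_depth is a whole number of levels
decoded['max_depth'] = int(decoded['max_depth'])

print(decoded)
```

The decoded values can then be passed to RandomForestClassifier to train and evaluate the final model, in the same way as for the other search methods.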

#### Genetic Algorithms

Genetic algorithms try to apply natural selection mechanisms to machine learning.

Let's imagine we create a population of N machine learning models with some predefined hyperparameters. We can then calculate the accuracy of each model and decide to keep
just half of the models (the ones that perform best). We can now generate offspring with hyperparameters similar to those of the best models, so that we again get a population of N models.
At this point we can again calculate the accuracy of each model and repeat the cycle for a defined number of generations. In this way, only the best models survive at the end of the process.
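
The cycle described above can be sketched in a few lines of plain Python. This is an illustrative toy, not TPOT's implementation: the fitness function is a made-up stand-in for model accuracy (in practice it would be a cross-validated score), and the names `fitness`, `random_params`, `mutate` and `evolve` are assumptions introduced here.

```python
import random

def fitness(params):
    """Made-up stand-in for model accuracy; peaks at n_estimators=500, max_depth=20."""
    return -abs(params["n_estimators"] - 500) / 500 - abs(params["max_depth"] - 20) / 20

def random_params(rng):
    """Draw one random set of hyperparameters."""
    return {"n_estimators": rng.randint(10, 2000), "max_depth": rng.randint(1, 100)}

def mutate(params, rng):
    """Create an offspring with hyperparameters close to those of its parent."""
    return {
        "n_estimators": max(10, params["n_estimators"] + rng.randint(-50, 50)),
        "max_depth": max(1, params["max_depth"] + rng.randint(-5, 5)),
    }

def evolve(population_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    population = [random_params(rng) for _ in range(population_size)]
    initial_best = max(fitness(p) for p in population)
    for _ in range(generations):
        # Keep the best half of the population
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        # Refill the population with offspring of the survivors
        offspring = [mutate(rng.choice(survivors), rng)
                     for _ in range(population_size - len(survivors))]
        population = survivors + offspring
    return max(population, key=fitness), initial_best

ga_best, initial_best = evolve()
print(ga_best, fitness(ga_best), initial_best)
```

Because the best half always survives unchanged, the best fitness in the population can never get worse from one generation to the next.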

In [1]:
import numpy as np 
import pandas as pd 

In [5]:
from sklearn.model_selection import RandomizedSearchCV
# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 2000, num = 10 )]
# Number of features to consider at every split 
max_features = ['auto', 'sqrt', 'log2']
# max number of levels in tree 
max_depth = [int(x) for x in np.linspace(10, 1000, 10)]
# Minimum number of samples required to split a node 
min_samples_split = [2,5,10,14]
# Minimum number of samples required at each leaf node 
min_samples_leaf = [1,2,3,4,6,8]
# Create the random grid 
param = {'n_estimators' : n_estimators, 'max_features': max_features, 'max_depth': max_depth, 'min_samples_split': min_samples_split, 'min_samples_leaf': min_samples_leaf, 'criterion': ['entropy', 'gini']}
print(param)


{'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000], 'max_features': ['auto', 'sqrt', 'log2'], 'max_depth': [10, 120, 230, 340, 450, 560, 670, 780, 890, 1000], 'min_samples_split': [2, 5, 10, 14], 'min_samples_leaf': [1, 2, 3, 4, 6, 8], 'criterion': ['entropy', 'gini']}
