# All Techniques of Hyperparameter Optimization
1) GridSearchCV
2) RandomizedSearchCV
3) Bayesian Optimization - Automate Hyperparameter Tuning (Hyperopt)
4) Sequential Model-Based Optimization (Tuning a scikit-learn estimator with skopt)
5) Optuna - Automate Hyperparameter Tuning
6) Genetic Algorithms (TPOT Classifier)

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [2]:
import pandas as pd
df=pd.read_csv(r'C:\Users\user\Pictures\PYTHON_PANDAS\diabetes.csv')
df.head()

,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [3]:
import numpy as np
df['Glucose']=np.where(df['Glucose']==0,df['Glucose'].median(),df['Glucose'])
df['Insulin']=np.where(df['Insulin']==0,df['Insulin'].median(),df['Insulin'])
df['SkinThickness']=np.where(df['SkinThickness']==0,df['SkinThickness'].median(),df['SkinThickness'])

df.head()

,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148.0,72,35.0,30.5,33.6,0.627,50,1
1,1,85.0,66,29.0,30.5,26.6,0.351,31,0
2,8,183.0,64,23.0,30.5,23.3,0.672,32,1
3,1,89.0,66,23.0,94.0,28.1,0.167,21,0
4,0,137.0,40,35.0,168.0,43.1,2.288,33,1
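The replacement above takes the median over the whole column, zeros included, which is why the imputed `Insulin` value comes out as low as 30.5. One possible alternative, sketched here on made-up numbers rather than the real dataset, is to treat zeros as missing before computing the median. The `toy` frame and its values are purely illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy column: zeros stand in for missing measurements.
toy = pd.DataFrame({"Insulin": [0, 0, 0, 94, 168, 88]})

# Median over the full column (zeros included), as in the cell above.
median_with_zeros = toy["Insulin"].median()

# Median over the non-zero values only: mask() turns the zeros into NaN,
# and median() skips NaN by default.
non_zero = toy["Insulin"].mask(toy["Insulin"] == 0)
median_without_zeros = non_zero.median()

# Fill the masked entries with the non-zero median.
imputed = non_zero.fillna(median_without_zeros)

print(median_with_zeros, median_without_zeros)
print(imputed.tolist())
```

Whether this is preferable depends on how many zeros a column contains; with many zeros the two medians can differ considerably.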


In [4]:
#### Independent And Dependent features
X=df.drop('Outcome',axis=1)
y=df['Outcome']

In [5]:
print("shape of X : ",X.shape)
print("shape of y : ",y.shape)

shape of X :  (768, 8)
shape of y :  (768,)


### Train Test Split

In [10]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.20,random_state=33)

In [11]:
print("shape of X_train : ",X_train.shape)
print("shape of y_train : ",y_train.shape)
print("shape of X_test : ",X_test.shape)
print("shape of y_test : ",y_test.shape)

shape of X_train :  (614, 8)
shape of y_train :  (614,)
shape of X_test :  (154, 8)
shape of y_test :  (154,)


In [12]:
y.value_counts()


0    500
1    268
Name: Outcome, dtype: int64

-- The classes are moderately imbalanced (500 vs. 268, roughly 65% to 35%), but not severely so.
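To put the class counts above in proportion, the short sketch below computes each class's share and the accuracy of a trivial model that always predicts the majority class. The counts are copied from the `value_counts()` output; the variable names are illustrative.

```python
# Class counts copied from the value_counts() output above.
counts = {0: 500, 1: 268}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# A classifier that always predicts the majority class reaches this accuracy.
majority_baseline = max(shares.values())

for label, share in shares.items():
    print(f"class {label}: {share:.3f}")
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
```

Any accuracy reported later is best read against this baseline of about 0.65.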

## RandomForestClassifier

In [22]:
from sklearn.ensemble import RandomForestClassifier
rf_classifier=RandomForestClassifier(n_estimators=10).fit(X_train,y_train)
prediction=rf_classifier.predict(X_test)

In [23]:
from sklearn.metrics import confusion_matrix,classification_report,accuracy_score
print(confusion_matrix(y_test,prediction))
print("------------------------------------------------------------------")
print(accuracy_score(y_test,prediction))
print("------------------------------------------------------------------")
print(classification_report(y_test,prediction))

[[87 12]
 [24 31]]
------------------------------------------------------------------
0.7662337662337663
------------------------------------------------------------------
              precision    recall  f1-score   support

           0       0.78      0.88      0.83        99
           1       0.72      0.56      0.63        55

    accuracy                           0.77       154
   macro avg       0.75      0.72      0.73       154
weighted avg       0.76      0.77      0.76       154
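As a check on how the report relates to the confusion matrix, the numbers for class 1 can be recomputed by hand. This is a plain-Python sketch using the matrix printed above (rows are true classes, columns are predicted classes).

```python
# Confusion matrix copied from the output above.
cm = [[87, 12],
      [24, 31]]

tn, fp = cm[0]  # true class 0: predicted 0, predicted 1
fn, tp = cm[1]  # true class 1: predicted 0, predicted 1
total = tn + fp + fn + tp

accuracy = (tn + tp) / total
precision_1 = tp / (tp + fp)  # of everything predicted as 1, how much is really 1
recall_1 = tp / (tp + fn)     # of everything that is really 1, how much was found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(f"accuracy  = {accuracy:.4f}")
print(f"precision = {precision_1:.2f}")
print(f"recall    = {recall_1:.2f}")
print(f"f1-score  = {f1_1:.2f}")
```

The low recall for class 1 (about 0.56) shows that the model misses many positive cases even though overall accuracy looks reasonable.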



### The main parameters used by a Random Forest Classifier are:

- criterion = the function used to evaluate the quality of a split.
- max_depth = maximum number of levels allowed in each tree.
- max_features = maximum number of features considered when splitting a node.
- min_samples_leaf = minimum number of samples required in a tree leaf.
- min_samples_split = minimum number of samples a node must contain before it can be split.
- n_estimators = number of trees in the ensemble.
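The current values of these parameters can be read from any estimator with `get_params()`. The sketch below prints the defaults for the six parameters listed above; the exact defaults depend on the installed scikit-learn version, so no specific output is assumed here.

```python
from sklearn.ensemble import RandomForestClassifier

# Parameters discussed above.
names = [
    "criterion",
    "max_depth",
    "max_features",
    "min_samples_leaf",
    "min_samples_split",
    "n_estimators",
]

# get_params() returns a dict of all constructor parameters and their values.
params = RandomForestClassifier().get_params()

for name in names:
    print(f"{name} = {params[name]!r}")
```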

In [24]:
### Manual Hyperparameter Tuning
model=RandomForestClassifier(n_estimators=300,criterion='entropy',
                             max_features='sqrt',min_samples_leaf=10,random_state=100).fit(X_train,y_train)
predictions=model.predict(X_test)
print(confusion_matrix(y_test,predictions))
print(accuracy_score(y_test,predictions))
print(classification_report(y_test,predictions))

[[87 12]
 [28 27]]
0.7402597402597403
              precision    recall  f1-score   support

           0       0.76      0.88      0.81        99
           1       0.69      0.49      0.57        55

    accuracy                           0.74       154
   macro avg       0.72      0.68      0.69       154
weighted avg       0.73      0.74      0.73       154



# Randomized Search CV


In [25]:
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start = 200, stop = 2000, num = 10)]
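# Number of features to consider at every split
# ('auto' behaves like 'sqrt' for classifiers and is rejected by scikit-learn 1.3+)
max_features = ['auto', 'sqrt','log2']
# Maximum number of levels in tree
max_depth = [int(x) for x in np.linspace(10, 1000,10)]
# Minimum number of samples required to split a node
# (an integer value must be at least 2, so candidates that draw 1 fail to fit)
min_samples_split = [1,3,4,5,7,9]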
# Minimum number of samples required at each leaf node
min_samples_leaf = [1, 2, 4,6,8]

random_grid = {
    'n_estimators' : n_estimators,
    'max_features' : max_features ,
    'max_depth' : max_depth ,
    'min_samples_split' : min_samples_split,
    'min_samples_leaf' : min_samples_leaf,
    'criterion' : ['entropy','gini']
}

print(random_grid)

{'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000], 'max_features': ['auto', 'sqrt', 'log2'], 'max_depth': [10, 120, 230, 340, 450, 560, 670, 780, 890, 1000], 'min_samples_split': [1, 3, 4, 5, 7, 9], 'min_samples_leaf': [1, 2, 4, 6, 8], 'criterion': ['entropy', 'gini']}


In [27]:
rf=RandomForestClassifier()
rf_randomcv=RandomizedSearchCV(estimator= rf, param_distributions= random_grid,n_iter=100,cv=3,verbose=2,
                               random_state=100,n_jobs=-1 )
rf_randomcv.fit(X_train,y_train)

Fitting 3 folds for each of 100 candidates, totalling 300 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:   18.7s
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed:  1.5min
[Parallel(n_jobs=-1)]: Done 300 out of 300 | elapsed:  3.1min finished


RandomizedSearchCV(cv=3, estimator=RandomForestClassifier(), n_iter=100,
                   n_jobs=-1,
                   param_distributions={'criterion': ['entropy', 'gini'],
                                        'max_depth': [10, 120, 230, 340, 450,
                                                      560, 670, 780, 890,
                                                      1000],
                                        'max_features': ['auto', 'sqrt',
                                                         'log2'],
                                        'min_samples_leaf': [1, 2, 4, 6, 8],
                                        'min_samples_split': [1, 3, 4, 5, 7, 9],
                                        'n_estimators': [200, 400, 600, 800,
                                                         1000, 1200, 1400, 1600,
                                                         1800, 2000]},
                   random_state=100, verbose=2)
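To see why random sampling is used here, it helps to count how many combinations the grid above contains. The sketch below copies the candidate lists from the printed `random_grid` and compares the full space with the 100 candidates that were actually tried.

```python
from math import prod

# Candidate lists copied from the printed random_grid above.
random_grid = {
    "n_estimators": [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000],
    "max_features": ["auto", "sqrt", "log2"],
    "max_depth": [10, 120, 230, 340, 450, 560, 670, 780, 890, 1000],
    "min_samples_split": [1, 3, 4, 5, 7, 9],
    "min_samples_leaf": [1, 2, 4, 6, 8],
    "criterion": ["entropy", "gini"],
}

# Size of the full Cartesian product of all candidate lists.
total_combinations = prod(len(values) for values in random_grid.values())

n_iter = 100  # candidates sampled by RandomizedSearchCV in the call above
cv = 3        # folds per candidate

print(f"full grid:       {total_combinations} combinations")
print(f"sampled:         {n_iter} ({n_iter / total_combinations:.2%} of the grid)")
print(f"model fits (cv): {n_iter * cv}")
```

An exhaustive search over all 18000 combinations with 3 folds would need 54000 fits, compared with the 300 reported in the log.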

In [29]:
rf_randomcv.best_score_

0.7769249163079867

In [30]:
best_random_grid=rf_randomcv.best_estimator_


In [32]:
from sklearn.metrics import accuracy_score
y_pred=best_random_grid.predict(X_test)
print(confusion_matrix(y_test,y_pred))
print()
print("Accuracy Score {}".format(accuracy_score(y_test,y_pred)))
print()
print("Classification report: {}".format(classification_report(y_test,y_pred)))

[[87 12]
 [24 31]]

Accuracy Score 0.7662337662337663

Classification report:               precision    recall  f1-score   support

           0       0.78      0.88      0.83        99
           1       0.72      0.56      0.63        55

    accuracy                           0.77       154
   macro avg       0.75      0.72      0.73       154
weighted avg       0.76      0.77      0.76       154



## GridSearch CV

In [33]:
rf_randomcv.best_params_

{'n_estimators': 1000,
 'min_samples_split': 7,
 'min_samples_leaf': 2,
 'max_features': 'auto',
 'max_depth': 890,
 'criterion': 'gini'}

In [35]:
from sklearn.model_selection import GridSearchCV

param_grid = {
    'criterion': [rf_randomcv.best_params_['criterion']],
    'max_depth': [rf_randomcv.best_params_['max_depth']],
    'max_features': [rf_randomcv.best_params_['max_features']],
    'min_samples_leaf': [rf_randomcv.best_params_['min_samples_leaf'], 
                         rf_randomcv.best_params_['min_samples_leaf']+2, 
                         rf_randomcv.best_params_['min_samples_leaf'] + 4],
    'min_samples_split': [rf_randomcv.best_params_['min_samples_split'] - 2,
                          rf_randomcv.best_params_['min_samples_split'] - 1,
                          rf_randomcv.best_params_['min_samples_split'], 
                          rf_randomcv.best_params_['min_samples_split'] +1,
                          rf_randomcv.best_params_['min_samples_split'] + 2],
    'n_estimators': [rf_randomcv.best_params_['n_estimators'] - 500, rf_randomcv.best_params_['n_estimators'] - 300, 
                     rf_randomcv.best_params_['n_estimators'], 
                     rf_randomcv.best_params_['n_estimators'] + 200, rf_randomcv.best_params_['n_estimators'] + 300]
}

print(param_grid)

{'criterion': ['gini'], 'max_depth': [890], 'max_features': ['auto'], 'min_samples_leaf': [2, 4, 6], 'min_samples_split': [5, 6, 7, 8, 9], 'n_estimators': [500, 700, 1000, 1200, 1300]}


In [36]:
#### Fit the grid_search to the data
rf=RandomForestClassifier()
grid_search=GridSearchCV(estimator=rf,param_grid=param_grid,cv=10,n_jobs=-1,verbose=2)
grid_search.fit(X_train,y_train)

Fitting 10 folds for each of 75 candidates, totalling 750 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
[Parallel(n_jobs=-1)]: Done  25 tasks      | elapsed:   12.1s
[Parallel(n_jobs=-1)]: Done 146 tasks      | elapsed:  1.3min
[Parallel(n_jobs=-1)]: Done 349 tasks      | elapsed:  3.3min
[Parallel(n_jobs=-1)]: Done 632 tasks      | elapsed:  5.9min
[Parallel(n_jobs=-1)]: Done 750 out of 750 | elapsed:  7.0min finished


GridSearchCV(cv=10, estimator=RandomForestClassifier(), n_jobs=-1,
             param_grid={'criterion': ['gini'], 'max_depth': [890],
                         'max_features': ['auto'],
                         'min_samples_leaf': [2, 4, 6],
                         'min_samples_split': [5, 6, 7, 8, 9],
                         'n_estimators': [500, 700, 1000, 1200, 1300]},
             verbose=2)
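Unlike the randomized search, a grid search evaluates every combination. The sketch below expands the printed `param_grid` with `itertools.product` to show where the "75 candidates, totalling 750 fits" in the log comes from. It mimics the enumeration only and does not train any model.

```python
from itertools import product

# Values copied from the printed param_grid above.
param_grid = {
    "criterion": ["gini"],
    "max_depth": [890],
    "max_features": ["auto"],
    "min_samples_leaf": [2, 4, 6],
    "min_samples_split": [5, 6, 7, 8, 9],
    "n_estimators": [500, 700, 1000, 1200, 1300],
}

# Every combination of one value per parameter is one candidate.
keys = list(param_grid)
candidates = [dict(zip(keys, values)) for values in product(*param_grid.values())]

cv = 10  # folds used in the GridSearchCV call above
print(f"candidates: {len(candidates)}")
print(f"total fits: {len(candidates) * cv}")
print("first candidate:", candidates[0])
```

Because the cost grows multiplicatively with every added value, the grid is kept narrow around the best parameters found by the randomized search.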

In [37]:
grid_search.best_estimator_

RandomForestClassifier(max_depth=890, min_samples_leaf=6, min_samples_split=6,
                       n_estimators=1300)

In [38]:
best_grid=grid_search.best_estimator_
best_grid

RandomForestClassifier(max_depth=890, min_samples_leaf=6, min_samples_split=6,
                       n_estimators=1300)

In [39]:
y_pred=best_grid.predict(X_test)
print(confusion_matrix(y_test,y_pred))
print()
print("Accuracy Score {}".format(accuracy_score(y_test,y_pred)))
print()
print("Classification report: {}".format(classification_report(y_test,y_pred)))

[[87 12]
 [27 28]]

Accuracy Score 0.7467532467532467

Classification report:               precision    recall  f1-score   support

           0       0.76      0.88      0.82        99
           1       0.70      0.51      0.59        55

    accuracy                           0.75       154
   macro avg       0.73      0.69      0.70       154
weighted avg       0.74      0.75      0.74       154



# Automated Hyperparameter Tuning
Automated Hyperparameter Tuning can be done by using techniques such as:

1) Bayesian Optimization
2) Gradient Descent
3) Evolutionary Algorithms

# Bayesian Optimization
Bayesian optimization uses a probabilistic model to find the minimum of a function. The aim is to find the input value that gives the lowest possible output value. It often needs fewer evaluations than random, grid, or manual search to reach a comparable result, which can mean better performance in the testing phase and reduced optimization time. In Hyperopt, Bayesian optimization is implemented by passing three main parameters to the function fmin:

1) Objective Function = defines the loss function to minimize.
2) Domain Space = defines the range of input values to test (in Bayesian Optimization this space creates a probability distribution for each of the used Hyperparameters).
3) Optimization Algorithm = defines the search algorithm to use to select the best input values to use in each new iteration.