# Hyperparameter Tuning

Machine learning models have two different types of parameters:

- **Hyperparameters** are the parameters that the user sets before training starts (e.g. the number of estimators in a Random Forest). Hyperparameters determine how the model is structured in the first place.
- **Model parameters** are instead learned during training (e.g. the weights in Neural Networks or Linear Regression). They define how the input data is turned into the desired output.

Tuning a machine learning model is a type of optimization problem. We have a set of hyperparameters, and we aim to find the combination of values that gives either the minimum (e.g. loss) or the maximum (e.g. accuracy) of a function.

## Definition

Wikipedia states that:

*A hyperparameter is a parameter whose value is set before the learning process begins. Hyperparameter tuning is choosing a set of optimal hyperparameters for a learning algorithm.* 

Some examples of hyperparameters include `penalty` in logistic regression and `loss` in stochastic gradient descent. In scikit-learn, hyperparameters are passed as arguments to the constructor of the model classes.
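The difference between the two kinds of parameters can be seen directly in scikit-learn. The following is a minimal sketch that assumes a synthetic dataset from `make_classification` (not part of this notebook's data) and arbitrary values for `C` and `max_iter`:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data stands in for a real dataset (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hyperparameters: chosen by us and passed to the constructor before training.
model = LogisticRegression(C=0.5, max_iter=200)
hyperparams = model.get_params()
print(hyperparams["C"], hyperparams["max_iter"])

# Model parameters: learned from the data when fit() is called.
model.fit(X, y)
print(model.coef_.shape, model.intercept_.shape)
```

`get_params()` returns what was set before training, while `coef_` and `intercept_` only exist after `fit()`.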

# Tuning Strategies

### 1. Manual Search
**Advantage of manual tuning:**
- You learn how each hyperparameter behaves and can carry that knowledge into other projects. For that reason, it is worth tuning the major models manually at least once.

**Disadvantages:**
- Manual work is required.
- You may read too much into an unexpected movement of the score without running enough trials to check whether the movement generalizes.
- It takes time, so time management matters.
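A manual search can be as simple as a loop over a few hand-picked values. The sketch below is one possible way to do it, assuming synthetic data and an arbitrary list of `C` values for logistic regression:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data used only for illustration.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Try a few hand-picked values of C and record the mean CV accuracy for each.
manual_scores = {}
for C in [0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(C=C, max_iter=500)
    manual_scores[C] = cross_val_score(model, X, y, cv=5).mean()

for C, score in manual_scores.items():
    print(f"C={C:<5} mean CV accuracy={score:.3f}")

best_C = max(manual_scores, key=manual_scores.get)
print("best C:", best_C)
```

Looking at the whole table of scores, rather than only the winner, is what builds intuition about how the hyperparameter behaves.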

### 2. Random Search
**Advantage of random search:**
- You do not have to worry about run time, because you control the number of parameter combinations that are tried.

**Disadvantages:**
- You accept a compromise: the hyperparameter set finally selected might not be the true best within the ranges you searched.
- Depending on the number of searches and the size of the parameter space, some parameters might not be explored enough.
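To make the fixed budget concrete, the sketch below uses scikit-learn's `ParameterSampler` to draw a handful of combinations from a small, hypothetical parameter space. Only the sampled combinations would be trained:

```python
from sklearn.model_selection import ParameterSampler

# Hypothetical search space, chosen only for illustration.
param_space = {
    "max_depth": [10, 20, 30, 40, 50],
    "min_samples_leaf": [1, 2, 4],
    "n_estimators": [200, 400, 600, 800],
}

# n_iter fixes the budget: only this many combinations are drawn and tried.
candidates = list(ParameterSampler(param_space, n_iter=6, random_state=0))
for params in candidates:
    print(params)
```

The space holds 60 combinations, but only 6 are drawn, which is why some values may never be explored.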

### 3. Grid Search
**Advantage of grid search:**
- You cover every candidate combination in the grid. No matter how strongly you believe one set is the most viable, a neighboring set could do better, and grid search does not lose that possibility.

**Disadvantage:**
- A single run for one hyperparameter set takes a while, so the run time for the whole grid can be huge. The number of parameters you can explore therefore has practical limits.
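The cost of a grid is easy to estimate before running it. As a rough sketch, assuming a small hypothetical grid and 5-fold cross-validation, `ParameterGrid` reports how many combinations will be trained:

```python
from sklearn.model_selection import ParameterGrid

# Hypothetical grid, chosen only for illustration.
param_grid = {
    "max_depth": [10, 20, 30],
    "min_samples_leaf": [1, 2, 4],
    "n_estimators": [200, 400],
}

grid = ParameterGrid(param_grid)
n_combinations = len(grid)      # 3 * 3 * 2 combinations
n_fits = n_combinations * 5     # each one is fitted once per fold with cv=5
print(n_combinations, n_fits)
```

Adding one more value to any list multiplies the total, which is where the practical limit comes from.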

# More Detail on Hyperparameter Search

## 1. Random Search CV
In Random Search, we create a grid of hyperparameters and train/test our model on only some random combinations of these hyperparameters. In this example, we additionally perform **Cross-Validation** on the training set.

When performing machine learning tasks, we generally divide our dataset into training and test sets. This lets us test the model after training it, so we can check its performance on unseen data. When using Cross-Validation, we further divide the training set into K partitions to check that the model is not overfitting the data.

One of the most commonly used Cross-Validation methods is K-Fold Validation. In K-Fold, we divide our training set into K partitions and then iteratively train the model on K-1 partitions and validate it on the left-over partition (the left-over partition changes at each iteration). After training the model K times, we average the validation results from each iteration to obtain the overall performance estimate (Figure 3).

<img src='image_skssk\22_img.png' width='500'>

Using Cross-Validation during hyperparameter optimization can be really important. It helps us avoid hyperparameters that work really well on the training data but not so well on the test data.
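The K-Fold rotation described above can be sketched with scikit-learn's `KFold`. The example assumes 8 toy samples and 4 folds so that the index sets are easy to read:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(16).reshape(8, 2)   # 8 toy samples, 2 features each

kfold = KFold(n_splits=4)
fold_sizes = []
for fold, (train_idx, val_idx) in enumerate(kfold.split(X), start=1):
    # Each sample lands in the validation partition exactly once.
    fold_sizes.append((len(train_idx), len(val_idx)))
    print(f"fold {fold}: train={train_idx.tolist()} validate={val_idx.tolist()}")
```

Every fold trains on 6 samples and validates on the remaining 2, and the validation partition moves at each iteration.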

We can now start implementing Random Search by first defining a grid of hyperparameters, which will be randomly sampled when calling `RandomizedSearchCV()`.

For example, we can divide our training set into 4 folds (`cv = 4`) and select 80 as the number of combinations to sample (`n_iter = 80`). Using the scikit-learn `best_estimator_` attribute, we can then retrieve the model fitted with the set of hyperparameters that performed best during cross-validation and use it to test our model.
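A minimal sketch of this setup follows. It assumes synthetic data, a small hypothetical search space for a Random Forest, and `n_iter=5` instead of 80 so that it runs quickly:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic data used only for illustration.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Hypothetical search space, kept small on purpose.
param_distributions = {
    "n_estimators": [10, 20, 40],
    "max_depth": [2, 4, 8, None],
    "min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,          # the text uses 80; reduced here to keep the sketch fast
    cv=4,              # 4 folds, as in the text
    random_state=0,
)
search.fit(X_train, y_train)

print(search.best_params_)
# best_estimator_ is already refit on the whole training set.
test_accuracy = search.best_estimator_.score(X_test, y_test)
print(test_accuracy)
```

Setting `random_state` on the search makes the sampled combinations reproducible between runs.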

## 2. Grid Search CV
In Grid Search, we set up a grid of hyperparameters and train/test our model on each of the possible combinations.

In order to choose the parameters to use in Grid Search, we can now look at which parameters worked best with Random Search and form a grid based on them to see if we can find a better combination.

Grid Search can be implemented in Python using the scikit-learn `GridSearchCV()` class.
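As a rough sketch of that refinement step, the grid below is built from hypothetical values clustered around an imagined Random Search winner. The data is synthetic and the values are assumptions for illustration only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data used only for illustration.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Hypothetical values close to a previous random-search result.
param_grid = {
    "n_estimators": [20, 40],
    "max_depth": [4, 6, 8],
}

grid_search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=4)
grid_search.fit(X, y)

# Every combination in the grid is evaluated: 2 * 3 candidates.
print(len(grid_search.cv_results_["params"]))
print(grid_search.best_params_, round(grid_search.best_score_, 3))
```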

When using Grid Search, all the possible combinations of the parameters in the grid are tried. In this case, 12,800 combinations (2 × 10 × 4 × 4 × 4 × 10) will be used during training. By contrast, the Random Search example before used just 80 combinations.
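The size of the grid is the product of the number of values per hyperparameter. A quick check of that arithmetic, using only the list lengths given in the text:

```python
import math

grid_sizes = [2, 10, 4, 4, 4, 10]   # number of candidate values per hyperparameter
n_grid = math.prod(grid_sizes)      # combinations tried by grid search
n_random = 80                       # combinations sampled by random search

print(n_grid)
print(f"random search covers {n_random / n_grid:.3%} of the grid")
```

The product is 12,800, so the 80 sampled combinations cover well under 1% of the grid.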

## Cancer Dataset

In [31]:
import pandas as pd
import numpy as np
import seaborn
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix, recall_score, precision_score, roc_auc_score, f1_score

# for hyperparameter tuning
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import GridSearchCV

import warnings
warnings.filterwarnings('ignore')

In [32]:
df = pd.read_csv(r'C:\Users\HP.LAPTOP-5BTBEJFV\Documents\data science\SHIFTACADEMY\Bahan Ajar\SKSSK\Data\cancer.csv')
df.head(2)

Unnamed: 0,id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
0,842302,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,
1,842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,


In [33]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 569 entries, 0 to 568
Data columns (total 33 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   id                       569 non-null    int64  
 1   diagnosis                569 non-null    object 
 2   radius_mean              569 non-null    float64
 3   texture_mean             569 non-null    float64
 4   perimeter_mean           569 non-null    float64
 5   area_mean                569 non-null    float64
 6   smoothness_mean          569 non-null    float64
 7   compactness_mean         569 non-null    float64
 8   concavity_mean           569 non-null    float64
 9   concave points_mean      569 non-null    float64
 10  symmetry_mean            569 non-null    float64
 11  fractal_dimension_mean   569 non-null    float64
 12  radius_se                569 non-null    float64
 13  texture_se               569 non-null    float64
 14  perimeter_se             5

In [34]:
# drop the unused columns
df.drop(['Unnamed: 32', 'id'], axis=1, inplace = True)

In [35]:
# encode target
df['diagnosis'] = [1 if i == 'M' else 0 for i in df['diagnosis']]
df.head(2)

Unnamed: 0,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
0,1,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,1,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902


In [36]:
x = df.drop(['diagnosis'], axis = 1)
y = df['diagnosis']

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.10, random_state=0)

model_LogReg_Asli = LogisticRegression()
model_LogReg_Asli.fit(x_train, y_train)


print(model_LogReg_Asli.coef_)
print(model_LogReg_Asli.intercept_)

[[-1.67569291 -0.10617017 -0.10195365  0.00758992  0.07160538  0.3162873
   0.45167333  0.18833172  0.12051594  0.02144249 -0.07073227 -0.84283209
  -0.40117396  0.1266349   0.00664237  0.06911359  0.10163978  0.02474984
   0.02679342  0.00597987 -1.71174395  0.31516826  0.28402635  0.02064008
   0.13442829  1.01351095  1.31145365  0.37298164  0.3579491   0.10533305]]
[-0.32782913]


In [37]:
y_pred = model_LogReg_Asli.predict(x_test)

In [38]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       1.00      0.94      0.97        35
           1       0.92      1.00      0.96        22

    accuracy                           0.96        57
   macro avg       0.96      0.97      0.96        57
weighted avg       0.97      0.96      0.97        57



In [39]:
pd.DataFrame(data = [accuracy_score(y_test, y_pred)*100, recall_score(y_test, y_pred)*100,
                    precision_score(y_test, y_pred)*100, roc_auc_score(y_test, y_pred)*100,
                    f1_score(y_test, y_pred)*100],
            index = ['accuracy', 'recall', 'precision', 'roc_auc_score', 'f1_score'],
            columns = ['Score (%)'])

Unnamed: 0,Score (%)
accuracy,96.491228
recall,100.0
precision,91.666667
roc_auc_score,97.142857
f1_score,95.652174
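Note that `roc_auc_score` above receives the hard 0/1 predictions. ROC AUC is usually computed from a continuous score such as the predicted probability of the positive class. The sketch below shows both variants side by side, assuming synthetic data rather than the cancer dataset:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data used only for illustration.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression(max_iter=500).fit(X_train, y_train)

# AUC from hard labels: only a single threshold (0.5) is taken into account.
auc_from_labels = roc_auc_score(y_test, model.predict(X_test))
# AUC from probabilities: all thresholds are taken into account.
auc_from_scores = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(auc_from_labels, auc_from_scores)
```

The two numbers generally differ, so it helps to state which one is being reported.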


In [40]:
pd.DataFrame(data = [model_LogReg_Asli.score(x_train, y_train)*100,
                    model_LogReg_Asli.score(x_test, y_test)*100],
             index = ['Model Score in Data Train', 'Model Score in Data Test'],
             columns = ['Score (%)']
            )

Unnamed: 0,Score (%)
Model Score in Data Train,95.507812
Model Score in Data Test,96.491228


## Steps for Hyperparameter Tuning in Python

### 1. Inspect the Model's Default Parameters

In [41]:
# parameters used by the original model
model_LogReg_Asli.get_params()

{'C': 1.0,
 'class_weight': None,
 'dual': False,
 'fit_intercept': True,
 'intercept_scaling': 1,
 'l1_ratio': None,
 'max_iter': 100,
 'multi_class': 'auto',
 'n_jobs': None,
 'penalty': 'l2',
 'random_state': None,
 'solver': 'lbfgs',
 'tol': 0.0001,
 'verbose': 0,
 'warm_start': False}

### 2. Define the Parameter Options to Try

In [15]:
# logistic regression parameters to tune, with the candidate values for each
penalty = ['l1', 'l2', 'elasticnet', 'none']
solver = ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga']
max_iter = [1, 10, 100, 1000, 10000]

# store them in a variable named 'param'
param = {'penalty': penalty, 'solver': solver, 'max_iter': max_iter}
param

{'penalty': ['l1', 'l2', 'elasticnet', 'none'],
 'solver': ['newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'],
 'max_iter': [1, 10, 100, 1000, 10000]}
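Not every solver supports every penalty, so a full cross-product of `penalty` and `solver` contains combinations that cannot be fitted. One possible alternative, sketched here with synthetic data and a reduced set of values, is to pass a list of dictionaries so that only compatible pairs are generated. The penalty names follow the scikit-learn version used in this notebook and may differ in newer releases:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, ParameterGrid

# A list of dicts: each dict is expanded separately, so only valid pairs appear.
valid_param_grid = [
    {"solver": ["newton-cg", "lbfgs", "sag"], "penalty": ["l2"], "max_iter": [100, 1000]},
    {"solver": ["liblinear", "saga"], "penalty": ["l1", "l2"], "max_iter": [100, 1000]},
]
n_candidates = len(ParameterGrid(valid_param_grid))   # 3*1*2 + 2*2*2
print(n_candidates)

# Synthetic data used only for illustration.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
search = GridSearchCV(LogisticRegression(), valid_param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```

This keeps the search budget for combinations that can actually be trained.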

## Hyperparameter Tuning Options:
- Randomized Search Cross Validation
- Grid Search Cross Validation

### Trial 1: Randomized Search Cross Validation (RandomizedSearchCV)

In [44]:
# search for the best parameters for the model: logistic regression
model_LR = LogisticRegression()
model_LR_RS = RandomizedSearchCV(estimator = model_LR, param_distributions = param, cv = 5)


In [45]:
model_LR_RS.fit(x_train, y_train)
model_LR_RS.best_params_

{'solver': 'newton-cg', 'penalty': 'none', 'max_iter': 100}
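Besides `best_params_`, the fitted search object also exposes the score of every sampled combination and an already refitted model. The following sketch assumes synthetic data and a small hypothetical search space. It is not a rerun of the cell above:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data used only for illustration.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hypothetical search space.
param = {"C": [0.01, 0.1, 1.0, 10.0], "solver": ["lbfgs", "liblinear"]}
search = RandomizedSearchCV(
    LogisticRegression(max_iter=500), param, n_iter=4, cv=5, random_state=0
)
search.fit(X, y)

# One row per sampled combination, best-ranked first.
results = pd.DataFrame(search.cv_results_)[["params", "mean_test_score", "rank_test_score"]]
print(results.sort_values("rank_test_score"))

# refit=True (the default) means best_estimator_ is already trained on all of X, y.
tuned_model = search.best_estimator_
print(tuned_model.get_params()["C"] == search.best_params_["C"])
```

Using `best_estimator_` avoids retyping the winning values by hand when building the tuned model.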

### Comparison Before and After Tuning
#### Before Tuning:

In [46]:

y_pred_asli = model_LogReg_Asli.predict(x_test)

pd.DataFrame(data = [accuracy_score(y_test, y_pred_asli)*100, recall_score(y_test, y_pred_asli)*100,
                    precision_score(y_test, y_pred_asli)*100, roc_auc_score(y_test, y_pred_asli)*100,
                    f1_score(y_test, y_pred_asli)*100],
            index = ['accuracy', 'recall', 'precision', 'roc_auc_score', 'f1_score'],
            columns = ['Score (%)'])

Unnamed: 0,Score (%)
accuracy,96.491228
recall,100.0
precision,91.666667
roc_auc_score,97.142857
f1_score,95.652174


In [47]:
pd.DataFrame(data = [model_LogReg_Asli.score(x_train, y_train)*100,
                    model_LogReg_Asli.score(x_test, y_test)*100],
             index = ['Model Score in Data Train', 'Model Score in Data Test'],
             columns = ['Score (%)']
            )

Unnamed: 0,Score (%)
Model Score in Data Train,95.507812
Model Score in Data Test,96.491228


#### After Tuning

In [48]:
model_LogReg_RS = LogisticRegression(solver='newton-cg', penalty = 'none', max_iter = 100)
model_LogReg_RS.fit(x_train, y_train)

LogisticRegression(penalty='none', solver='newton-cg')

In [49]:
y_pred_RS = model_LogReg_RS.predict(x_test)

pd.DataFrame(data = [accuracy_score(y_test, y_pred_RS)*100, recall_score(y_test, y_pred_RS)*100,
                    precision_score(y_test, y_pred_RS)*100, roc_auc_score(y_test, y_pred_RS)*100,
                    f1_score(y_test, y_pred_RS)*100],
            index = ['accuracy', 'recall', 'precision', 'roc_auc_score', 'f1_score'],
            columns = ['Score (%)'])

Unnamed: 0,Score (%)
accuracy,98.245614
recall,95.454545
precision,100.0
roc_auc_score,97.727273
f1_score,97.674419


In [51]:
pd.DataFrame(data = [model_LogReg_RS.score(x_train, y_train)*100,
                    model_LogReg_RS.score(x_test, y_test)*100],
             index = ['Model Score in Data Train', 'Model Score in Data Test'],
             columns = ['Score (%)']
            )

Unnamed: 0,Score (%)
Model Score in Data Train,98.632812
Model Score in Data Test,98.245614


### Trial 2: Grid Search Cross Validation (GridSearchCV)

In [52]:
model_LR2 = LogisticRegression()
model_LR_GS = GridSearchCV(model_LR2, param, cv = 5)

In [53]:
model_LR_GS.fit(x_train, y_train)
model_LR_GS.best_params_

{'max_iter': 100, 'penalty': 'none', 'solver': 'newton-cg'}

#### Before Tuning

In [26]:
y_pred_asli = model_LogReg_Asli.predict(x_test)

pd.DataFrame(data = [accuracy_score(y_test, y_pred_asli)*100, recall_score(y_test, y_pred_asli)*100,
                    precision_score(y_test, y_pred_asli)*100, roc_auc_score(y_test, y_pred_asli)*100,
                    f1_score(y_test, y_pred_asli)*100],
            index = ['accuracy', 'recall', 'precision', 'roc_auc_score', 'f1_score'],
            columns = ['Score (%)'])

Unnamed: 0,Score (%)
accuracy,96.491228
recall,100.0
precision,91.666667
roc_auc_score,97.142857
f1_score,95.652174


In [27]:
pd.DataFrame(data = [model_LogReg_Asli.score(x_train, y_train)*100,
                    model_LogReg_Asli.score(x_test, y_test)*100],
             index = ['Model Score in Data Train', 'Model Score in Data Test'],
             columns = ['Score (%)']
            )

Unnamed: 0,Score (%)
Model Score in Data Train,95.507812
Model Score in Data Test,96.491228


#### After Tuning

In [28]:
model_LogReg_GS = LogisticRegression(solver='newton-cg', penalty = 'none', max_iter = 100)
model_LogReg_GS.fit(x_train, y_train)

LogisticRegression(penalty='none', solver='newton-cg')

In [29]:
y_pred_GS = model_LogReg_GS.predict(x_test)

pd.DataFrame(data = [accuracy_score(y_test, y_pred_GS)*100, recall_score(y_test, y_pred_GS)*100,
                    precision_score(y_test, y_pred_GS)*100, roc_auc_score(y_test, y_pred_GS)*100,
                    f1_score(y_test, y_pred_GS)*100],
            index = ['accuracy', 'recall', 'precision', 'roc_auc_score', 'f1_score'],
            columns = ['Score (%)'])

Unnamed: 0,Score (%)
accuracy,98.245614
recall,95.454545
precision,100.0
roc_auc_score,97.727273
f1_score,97.674419


In [30]:
pd.DataFrame(data = [model_LogReg_GS.score(x_train, y_train)*100,
                    model_LogReg_GS.score(x_test, y_test)*100],
             index = ['Model Score in Data Train', 'Model Score in Data Test'],
             columns = ['Score (%)']
            )

Unnamed: 0,Score (%)
Model Score in Data Train,98.632812
Model Score in Data Test,98.245614


## In-Class Exercise
1. The dataset used is the Heart Disease Dataset ('heart.csv').
2. Target labels: 1 = diseased (positive), 0 = healthy (negative).
3. Run hyperparameter tuning for a Random Forest classification model!
4. Explain the model's performance before and after tuning, based on the evaluation metric you choose!

**Parameter Option:**
- 'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],

- 'min_samples_leaf': [1, 2, 4],

- 'min_samples_split': [2, 5, 10],

- 'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]

In [56]:
df = pd.read_csv(r'C:\Users\HP.LAPTOP-5BTBEJFV\Documents\data science\SHIFTACADEMY\Bahan Ajar\SKSSK\Data\heart.csv')
df.head(3)

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63,1,3,145,233,1,0,150,0,2.3,0,0,1,1
1,37,1,2,130,250,0,1,187,0,3.5,0,0,2,1
2,41,0,1,130,204,0,0,172,0,1.4,2,0,2,1


In [57]:
x = df.drop(['target'], axis=1)
y = df['target']

In [58]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2, random_state=0)


## Fitting Model


In [60]:
from sklearn.ensemble import RandomForestClassifier

## Hyperparameters

In [74]:
max_depth = [10,20,30,40,50,60,70,80,90,100]
min_samples_leaf = [1,2,4]
min_samples_split = [2,5,10]
n_estimators = [200,400,600,800,1000,1200,1400,1600,1800,2000]

param = {'max_depth': max_depth, 'min_samples_leaf': min_samples_leaf, 
         'min_samples_split': min_samples_split, 'n_estimators': n_estimators}
param

{'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
 'min_samples_leaf': [1, 2, 4],
 'min_samples_split': [2, 5, 10],
 'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000]}

In [75]:
model_RF = RandomForestClassifier()
model_RF_RS = RandomizedSearchCV(estimator = model_RF, param_distributions= param, cv = 5)

In [76]:
model_RF_RS.fit(x_train, y_train)

RandomizedSearchCV(cv=5, estimator=RandomForestClassifier(),
                   param_distributions={'max_depth': [10, 20, 30, 40, 50, 60,
                                                      70, 80, 90, 100],
                                        'min_samples_leaf': [1, 2, 4],
                                        'min_samples_split': [2, 5, 10],
                                        'n_estimators': [200, 400, 600, 800,
                                                         1000, 1200, 1400, 1600,
                                                         1800, 2000]})

In [77]:
model_RF_RS.best_params_

{'n_estimators': 1600,
 'min_samples_split': 2,
 'min_samples_leaf': 4,
 'max_depth': 20}
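By default the search ranks candidates by accuracy. If the chosen evaluation metric is something else, for example recall for the positive (diseased) class, it can be passed through the `scoring` argument. The sketch below assumes synthetic data and a reduced, hypothetical search space so that it runs quickly:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic data used only for illustration.
X, y = make_classification(n_samples=300, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Hypothetical search space, smaller than the one in the exercise.
param = {"max_depth": [2, 4, 8], "min_samples_leaf": [1, 2, 4], "n_estimators": [10, 20, 40]}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param,
    n_iter=5,
    cv=5,
    scoring="recall",   # rank candidates by recall instead of the default accuracy
    random_state=0,
)
search.fit(X_train, y_train)

# predict() on the search object uses the refitted best estimator.
test_recall = recall_score(y_test, search.predict(X_test))
print(search.best_params_, round(search.best_score_, 3), round(test_recall, 3))
```

Here `best_score_` is the mean cross-validated recall of the winning combination, not its accuracy.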

In [78]:
model_RF_Baru = RandomForestClassifier(n_estimators=1600, 
                                       min_samples_split= 2,
                                       min_samples_leaf= 4,
                                       max_depth= 20)

model_RF_Baru.fit(x_train, y_train)
model_RF_Baru.score(x_test, y_test)

0.8524590163934426

In [79]:
model_asli = RandomForestClassifier()
model_asli.fit(x_train, y_train)
model_asli.score(x_test, y_test)

0.8032786885245902

In [85]:
y_pred_asli = model_asli.predict(x_test)

score_awal = pd.DataFrame(data = [accuracy_score(y_test, y_pred_asli)*100, recall_score(y_test, y_pred_asli)*100,
                    precision_score(y_test, y_pred_asli)*100, roc_auc_score(y_test, y_pred_asli)*100,
                    f1_score(y_test, y_pred_asli)*100],
            index = ['accuracy', 'recall', 'precision', 'roc_auc_score', 'f1_score'],
            columns = ['Score Before Tuning (%)'])

In [86]:
y_pred_tuning = model_RF_Baru.predict(x_test)

score_tuning = pd.DataFrame(data = [accuracy_score(y_test, y_pred_tuning)*100, recall_score(y_test, y_pred_tuning)*100,
                    precision_score(y_test, y_pred_tuning)*100, roc_auc_score(y_test, y_pred_tuning)*100,
                    f1_score(y_test, y_pred_tuning)*100],
            index = ['accuracy', 'recall', 'precision', 'roc_auc_score', 'f1_score'],
            columns = ['Score After Tuning (%)'])

In [87]:
pd.concat([score_awal, score_tuning], axis=1)

Unnamed: 0,Score Before Tuning (%),Score After Tuning (%)
accuracy,80.327869,85.245902
recall,82.352941,91.176471
precision,82.352941,83.783784
roc_auc_score,80.065359,84.477124
f1_score,82.352941,87.323944


## Reference:
- Tara Boyle, "Hyperparameter Tuning", https://towardsdatascience.com/hyperparameter-tuning-c5619e7e6624
- Pier Paolo Ippolito, "Hyperparameters Optimization", https://towardsdatascience.com/hyperparameters-optimization-526348bb8e2d
- Jiahao Weng, "Hyperparameter Tuning: A Practical Guide and Template", https://towardsdatascience.com/hyperparameter-tuning-a-practical-guide-and-template-b3bf0504f095
- Moto DEI, "Hyperparameter Tuning Explained — Tuning Phases, Tuning Methods, Bayesian Optimization, and Sample Code!", https://towardsdatascience.com/hyperparameter-tuning-explained-d0ebb2ba1d35