# Tutorial: Tuning Hyperparameters and Feature Engineering for a Classification Model

## 1.0 Function Objective

### Apply hyperparameter tuning and feature engineering to a classification model

## 1.1 References

- [cross_val_score](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html)
- [KFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html?highlight=kfold#sklearn.model_selection.KFold)
- [optuna](https://optuna.readthedocs.io/en/v0.19.0/)
- [RandomSampler](https://optuna.readthedocs.io/en/stable/reference/generated/optuna.samplers.RandomSampler.html)
- [make_scorer](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html)


## 2.0 Library Import

```python
# Notebook cell: the leading '!' runs a shell command; in a terminal, use `pip install mlutils`
!pip install mlutils
```

```python
# Hyperparameter tuning helper used in section 5.3
from mlutils.tuning_hyperparams import tuning_hyperparams
```

## 3.0 Treating the Data

```python
import pandas as pd

# Column names for the Pima Indians Diabetes dataset; 'class' is the target
colnames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']

df_diabetes = pd.read_csv(
    'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.csv',
    names=colnames,
)
df_diabetes.head()
```

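|   | preg | plas | pres | skin | test | mass | pedi | age | class |
|---|------|------|------|------|------|------|------|-----|-------|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |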
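In the rows above, `preg`, `skin` and `test` contain zeros. For this dataset a zero in columns such as `plas`, `pres`, `skin`, `test` or `mass` is physiologically implausible and is commonly treated as a missing measurement, while a zero in `preg` is a legitimate count. The tutorial does not impute these values; the sketch below only counts zeros per feature column, using the five rows printed above (re-typed, so it runs offline). On the full data, the same expression can be applied to `df_diabetes`.

```python
import pandas as pd

# The first five rows shown above, re-typed so the check runs without a download.
colnames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
rows = [
    [6, 148, 72, 35, 0, 33.6, 0.627, 50, 1],
    [1, 85, 66, 29, 0, 26.6, 0.351, 31, 0],
    [8, 183, 64, 0, 0, 23.3, 0.672, 32, 1],
    [1, 89, 66, 23, 94, 28.1, 0.167, 21, 0],
    [0, 137, 40, 35, 168, 43.1, 2.288, 33, 1],
]
sample = pd.DataFrame(rows, columns=colnames)

# Count exact zeros per feature column (the target 'class' is excluded).
zero_counts = (sample.drop(columns='class') == 0).sum()
print(zero_counts.to_dict())
```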


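## 4.0 Importing the Feature Engineering Functions

```python
# Feature selection helpers used in sections 4.1 to 4.3
from mlutils.feature_engineering import (
    feature_selection_filter,
    feature_selection_wrapper,
    feature_selection_embedded,
)
```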

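## 4.1 Applying the Chi-Squared (CHI2) Filter Method

```python
# Keep the 4 features with the strongest chi-squared association with 'class'
feature_selection_filter(df_diabetes, 'class', num_feats=4)
```

```text
['preg', 'plas', 'pedi', 'age']
```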
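The internals of `feature_selection_filter` are not shown in this tutorial. Assuming it ranks features by the chi-squared statistic, a comparable selection can be sketched with scikit-learn's `SelectKBest` and `chi2`. The helper name `chi2_filter` and the synthetic demo data are hypothetical. Note that `chi2` requires non-negative feature values, which holds for the Pima columns.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2


def chi2_filter(df, target, num_feats):
    """Keep the num_feats columns with the highest chi-squared score (hypothetical helper)."""
    X = df.drop(columns=target)
    y = df[target]
    selector = SelectKBest(score_func=chi2, k=num_feats).fit(X, y)
    # get_support() is a boolean mask over X's columns, in their original order
    return X.columns[selector.get_support()].tolist()


# Synthetic, non-negative demo data: only 'signal' depends on the class.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
demo = pd.DataFrame({
    "signal": y * 10 + rng.integers(0, 3, size=200),
    "noise_a": rng.integers(0, 10, size=200),
    "noise_b": rng.integers(0, 10, size=200),
    "class": y,
})
print(chi2_filter(demo, "class", num_feats=1))  # ['signal']
```

Calling `chi2_filter(df_diabetes, 'class', num_feats=4)` applies the same idea to the tutorial's data; whether it reproduces the mlutils output exactly depends on how that library is implemented.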

## 4.2 Applying the Wrapper Method (RFE)

```python
# Recursive feature elimination down to 4 features
feature_selection_wrapper(df_diabetes, 'class', num_feats=4, step=10)
```

```text
['preg', 'plas', 'mass', 'pedi']
```
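RFE (recursive feature elimination) repeatedly fits an estimator and drops the features it ranks lowest until `num_feats` remain. The estimator used inside `feature_selection_wrapper` is not stated here, so the sketch below assumes a `LogisticRegression`; the helper name `rfe_wrapper` and the demo data are hypothetical. Coefficient-based rankings are sensitive to feature scale, so standardizing the Pima columns first would usually be advisable with this estimator.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression


def rfe_wrapper(df, target, num_feats, step=1):
    """Recursive feature elimination around an assumed estimator (hypothetical helper)."""
    X = df.drop(columns=target)
    y = df[target]
    # LogisticRegression is an assumption; mlutils may use another estimator.
    estimator = LogisticRegression(max_iter=1000)
    selector = RFE(estimator, n_features_to_select=num_feats, step=step).fit(X, y)
    # support_ marks the columns that survived elimination
    return X.columns[selector.support_].tolist()


# Synthetic demo data: only 'signal' depends on the class.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
demo = pd.DataFrame({
    "signal": y + rng.normal(0, 0.3, size=200),
    "noise_a": rng.normal(0, 1, size=200),
    "noise_b": rng.normal(0, 1, size=200),
    "class": y,
})
print(rfe_wrapper(demo, "class", num_feats=1, step=10))  # ['signal']
```

If mlutils passes `step` straight to scikit-learn's `RFE`, then `step=10` with eight candidate features means all four surplus features are dropped in a single round, because RFE never removes more features than are still above `n_features_to_select`.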

## 4.3 Applying the Embedded Method (LGBMClassifier)

```python
# Keep the 4 features with the highest model-based importance
feature_selection_embedded(df_diabetes, 'class', num_feats=4, n_estimators=50)
```

```text
['plas', 'mass', 'pedi', 'age']
```
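An embedded method reuses the feature importances a model learns during training. The tutorial uses LightGBM's `LGBMClassifier`; to keep the sketch dependent on scikit-learn only, it swaps in `GradientBoostingClassifier`, a different gradient-boosting implementation, so its rankings will not necessarily match LightGBM's. `SelectFromModel` with `threshold=-np.inf` keeps exactly the `max_features` most important columns. The helper name `embedded_select` and the demo data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel


def embedded_select(df, target, num_feats, n_estimators=50):
    """Keep the num_feats most important columns of a boosted model (hypothetical helper)."""
    X = df.drop(columns=target)
    y = df[target]
    # GradientBoostingClassifier stands in for LightGBM's LGBMClassifier.
    model = GradientBoostingClassifier(n_estimators=n_estimators, random_state=0)
    # threshold=-np.inf disables the importance cut-off, so exactly
    # max_features columns (the most important ones) are kept.
    selector = SelectFromModel(model, threshold=-np.inf, max_features=num_feats).fit(X, y)
    return X.columns[selector.get_support()].tolist()


# Synthetic demo data: only 'signal' depends on the class.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
demo = pd.DataFrame({
    "signal": y + rng.normal(0, 0.3, size=200),
    "noise_a": rng.normal(0, 1, size=200),
    "noise_b": rng.normal(0, 1, size=200),
    "class": y,
})
print(embedded_select(demo, "class", num_feats=1))  # ['signal']
```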

## 5.0 Applying Hyperparameter Tuning

## 5.1 Importing the Algorithm and Metric for the Classification Model

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
```

## 5.2 Selecting Columns and Random Forest Parameters

```python
# Keep the four features chosen by the embedded method, plus the target
df = df_diabetes[['plas', 'mass', 'pedi', 'age', 'class']]
df.head()
```

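|   | plas | mass | pedi | age | class |
|---|------|------|------|-----|-------|
| 0 | 148 | 33.6 | 0.627 | 50 | 1 |
| 1 | 85 | 26.6 | 0.351 | 31 | 0 |
| 2 | 183 | 23.3 | 0.672 | 32 | 1 |
| 3 | 89 | 28.1 | 0.167 | 21 | 0 |
| 4 | 137 | 43.1 | 2.288 | 33 | 1 |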
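The three selection methods in section 4 agree only partly: `plas` and `pedi` were chosen by all three, while `preg`, `mass` and `age` were each chosen twice. The tutorial keeps the embedded method's list; the tally below is just one way to compare the results.

```python
from collections import Counter

# Feature lists returned by the three selection methods in section 4.
selected = {
    "filter (chi2)": ['preg', 'plas', 'pedi', 'age'],
    "wrapper (RFE)": ['preg', 'plas', 'mass', 'pedi'],
    "embedded (LGBM)": ['plas', 'mass', 'pedi', 'age'],
}

# Count in how many of the lists each feature appears.
votes = Counter(f for feats in selected.values() for f in feats)

# Features picked by every method.
consensus = [f for f, n in votes.items() if n == len(selected)]
print(consensus)  # ['plas', 'pedi']
print(dict(votes))
```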


```python
# Search space: each entry names a RandomForestClassifier argument and its integer range
param_RF = [
    {"name": "min_samples_leaf", "type": "Integer", "low": 50, "high": 75},
    {"name": "max_depth", "type": "Integer", "low": 12, "high": 24},
]
```
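How mlutils turns the `param_RF` entries into trial values is not shown. A plausible reading is that each `"Integer"` entry is sampled from `low` to `high` inclusive, as optuna's `trial.suggest_int(name, low, high)` does. The stdlib-only sketch below draws one configuration that way. The helper `sample_params` is hypothetical, and because only the `"Integer"` type appears in this tutorial, it rejects any other type rather than guess at mlutils' other options.

```python
import random

param_RF = [
    {"name": "min_samples_leaf", "type": "Integer", "low": 50, "high": 75},
    {"name": "max_depth", "type": "Integer", "low": 12, "high": 24},
]


def sample_params(spec, rng):
    """Draw one configuration from a param_RF-style spec (hypothetical helper)."""
    params = {}
    for p in spec:
        if p["type"] != "Integer":
            raise ValueError(f"unsupported parameter type: {p['type']}")
        # random.Random.randint includes both end points
        params[p["name"]] = rng.randint(p["low"], p["high"])
    return params


rng = random.Random(0)
print(sample_params(param_RF, rng))
```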

## 5.3 Calling the Tuning Function for the Classification Model

```python
# Run 20 trials and maximize accuracy of a RandomForestClassifier on df
tuning_hyperparams(
    df=df,
    target='class',
    parameters=param_RF,
    algorithm=RandomForestClassifier,
    metric=accuracy_score,
    scoring_option="maximize",
    n_trials=20,
)
```

```text
[I 2021-11-30 17:17:04,566] A new study created in memory with name: no-name-17547d6e-bbeb-4b5d-a99d-74a1c7e319b2
[I 2021-11-30 17:17:05,338] Trial 0 finished with value: 0.7694634313055365 and parameters: {'min_samples_leaf': 59, 'max_depth': 24}. Best is trial 0 with value: 0.7694634313055365.
[I 2021-11-30 17:17:06,076] Trial 1 finished with value: 0.7681476418318524 and parameters: {'min_samples_leaf': 69, 'max_depth': 19}. Best is trial 0 with value: 0.7694634313055365.
[I 2021-11-30 17:17:06,848] Trial 2 finished with value: 0.7616370471633629 and parameters: {'min_samples_leaf': 54, 'max_depth': 14}. Best is trial 0 with value: 0.7694634313055365.
[I 2021-11-30 17:17:07,619] Trial 3 finished with value: 0.76161995898838 and parameters: {'min_samples_leaf': 51, 'max_depth': 23}. Best is trial 0 with value: 0.7694634313055365.
[I 2021-11-30 17:17:08,360] Trial 4 finished with value: 0.7655331510594668 and pa
```

```text
{'min_samples_leaf': 61, 'max_depth': 22}
```
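The references listed in section 1.1 (`cross_val_score`, `KFold`, `make_scorer` and optuna's `RandomSampler`) suggest, but do not confirm, that `tuning_hyperparams` scores each trial by k-fold cross-validation and samples parameters at random. The sketch below reproduces that idea as a plain random search without optuna. The helper `random_search`, the 5 shuffled folds, the seed and the synthetic data are all assumptions, and the random forests themselves are not seeded, so the numbers will not match the log above.

```python
import random

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import KFold, cross_val_score


def random_search(X, y, spec, algorithm, metric,
                  direction="maximize", n_trials=20, seed=0):
    """Random search scored by k-fold cross-validation (hypothetical helper)."""
    rng = random.Random(seed)
    scorer = make_scorer(metric)  # wrap metric(y_true, y_pred) as a scorer
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    best_params, best_value = None, None
    for _ in range(n_trials):
        # Every entry is assumed to be an "Integer" range with inclusive bounds.
        params = {p["name"]: rng.randint(p["low"], p["high"]) for p in spec}
        value = cross_val_score(algorithm(**params), X, y,
                                scoring=scorer, cv=cv).mean()
        if best_value is None:
            improved = True
        elif direction == "maximize":
            improved = value > best_value
        else:
            improved = value < best_value
        if improved:
            best_params, best_value = params, value
    return best_params, best_value


# Synthetic stand-in for the Pima data so the sketch runs offline.
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
param_RF = [
    {"name": "min_samples_leaf", "type": "Integer", "low": 50, "high": 75},
    {"name": "max_depth", "type": "Integer", "low": 12, "high": 24},
]
best_params, best_value = random_search(
    X, y, param_RF, RandomForestClassifier, accuracy_score, n_trials=5)
print(best_params, round(best_value, 4))
```

On the tutorial data, `X = df.drop(columns='class')` and `y = df['class']` would take the place of the synthetic arrays.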