# **All Techniques of Hyperparameter Optimization**
- GridSearchCV
- RandomizedSearchCV
- Bayesian Optimization - Automate Hyperparameter Tuning (Hyperopt)
- Sequential Model-Based Optimization (Tuning a scikit-learn estimator with skopt)
- Optuna - Automate Hyperparameter Tuning
- Genetic Algorithms (TPOT Classifier)

## Why do we require hyperparameter tuning?
Hyperparameters are settings chosen before training, such as the number of trees in a random forest or the maximum depth of each tree; they are not learned from the data. They directly control a model's structure, behavior, and performance, so choosing appropriate values is an essential part of machine learning. Hyperparameter tuning is the process of searching for the values that give the best results on held-out data.
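
To make the claim about model structure concrete, here is a minimal sketch that fits a decision tree with three different `max_depth` values and reports the resulting tree depth and leaf count. The data is synthetic (`make_classification`), and the names `X_demo`, `y_demo`, and `tree_shapes` are illustrative assumptions, not part of this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration (not the diabetes dataset).
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

tree_shapes = {}
for max_depth in (1, 3, None):
    # max_depth is a hyperparameter: it is fixed before fitting and caps how deep the tree may grow.
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X_demo, y_demo)
    tree_shapes[max_depth] = (tree.get_depth(), tree.get_n_leaves())
    print(f"max_depth={max_depth}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}")
```

The same training data produces structurally different models depending on a single setting, which is why these values are worth searching over rather than leaving at their defaults.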

In [14]:
import warnings
warnings.filterwarnings('ignore')

In [15]:
import pandas as pd
df=pd.read_csv("/kaggle/input/pima-indians-diabetes-database/diabetes.csv")
df.head()

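|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |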


## Replace the '0' values in 'Glucose', 'Insulin', and 'SkinThickness' with the column median

In [16]:
import numpy as np
# Replace each 0 with the column median. The median is computed over all rows, zeros included.
df['Glucose']=np.where(df['Glucose']==0, df['Glucose'].median(), df['Glucose'])
df['Insulin']=np.where(df['Insulin']==0, df['Insulin'].median(), df['Insulin'])
df['SkinThickness']=np.where(df['SkinThickness']==0, df['SkinThickness'].median(), df['SkinThickness'])
df.head()

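|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148.0 | 72 | 35.0 | 30.5 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85.0 | 66 | 29.0 | 30.5 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183.0 | 64 | 23.0 | 30.5 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89.0 | 66 | 23.0 | 94.0 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137.0 | 40 | 35.0 | 168.0 | 43.1 | 2.288 | 33 | 1 |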
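
When many entries in a column are zero, the median taken over all rows is pulled toward zero (the Insulin fill value above is 30.5). One possible variant, shown here only as a sketch on made-up values, computes the median from the non-zero entries and uses that as the fill value. The `toy` frame and the `Insulin_imputed` column are hypothetical and are not rows from `diabetes.csv`.

```python
import numpy as np
import pandas as pd

# Toy values for illustration only; they are not rows from diabetes.csv.
toy = pd.DataFrame({"Insulin": [0, 0, 0, 94, 168, 88]})

# Median over all rows, zeros included (the approach used in the cell above).
median_with_zeros = toy["Insulin"].median()

# Median over the non-zero rows only (the alternative sketched here).
median_without_zeros = toy.loc[toy["Insulin"] != 0, "Insulin"].median()

# Mark zeros as missing, then fill them with the non-zero median.
toy["Insulin_imputed"] = toy["Insulin"].replace(0, np.nan).fillna(median_without_zeros)

print(median_with_zeros, median_without_zeros)
print(toy)
```

Which fill value is appropriate depends on whether a zero is a real measurement or a placeholder for a missing one; the sketch assumes the latter.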


## Do we require feature scaling if we are using RandomForest?
### No, because RandomForest is built from decision trees, which split on one feature at a time using thresholds, so rescaling a feature does not change the splits.
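
A small check of this point, as a sketch under stated assumptions: it uses a single `DecisionTreeClassifier` rather than a full forest, synthetic integer-valued features, and a simple per-column multiplication as the rescaling. All names (`X_raw`, `scale`, `same_predictions`) are illustrative and not part of this notebook.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic integer-valued features, stored as floats; illustration only.
X_raw = rng.integers(0, 100, size=(400, 4)).astype(float)
y_toy = (X_raw[:, 0] + X_raw[:, 1] > 100).astype(int)

# Put every column on a very different scale.
scale = np.array([1.0, 10.0, 100.0, 1000.0])
X_rescaled = X_raw * scale

# Fit one tree on the raw features and one on the rescaled features (first 300 rows).
tree_raw = DecisionTreeClassifier(random_state=0).fit(X_raw[:300], y_toy[:300])
tree_rescaled = DecisionTreeClassifier(random_state=0).fit(X_rescaled[:300], y_toy[:300])

# Compare predictions on the remaining 100 rows.
same_predictions = np.array_equal(
    tree_raw.predict(X_raw[300:]),
    tree_rescaled.predict(X_rescaled[300:]),
)
print(same_predictions)
```

Each split only asks whether a feature value is below a threshold, so multiplying a column by a positive constant moves the threshold by the same factor and leaves the partition of the rows unchanged.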

In [17]:
X = df.drop('Outcome', axis=1)
y=df['Outcome']

In [18]:
X.head()

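|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148.0 | 72 | 35.0 | 30.5 | 33.6 | 0.627 | 50 |
| 1 | 1 | 85.0 | 66 | 29.0 | 30.5 | 26.6 | 0.351 | 31 |
| 2 | 8 | 183.0 | 64 | 23.0 | 30.5 | 23.3 | 0.672 | 32 |
| 3 | 1 | 89.0 | 66 | 23.0 | 94.0 | 28.1 | 0.167 | 21 |
| 4 | 0 | 137.0 | 40 | 35.0 | 168.0 | 43.1 | 2.288 | 33 |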


In [19]:
y.head()

0    1
1    0
2    1
3    0
4    1
Name: Outcome, dtype: int64

## Train Test Split

In [28]:
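from sklearn.model_selection import train_test_split
# Note: train_size=0.20 keeps only 20% of the rows for training; the remaining 80% (615 rows) form the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size = 0.20, random_state=33)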
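
The difference between `train_size` and `test_size` is easy to overlook, so here is a short sketch of how each affects the split sizes. It uses placeholder arrays with 768 rows and the same 500/268 class counts as this dataset; `X_demo` and `y_demo` are illustrative names, and the `stratify` argument in the second call is an optional addition, not something the cell above uses.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with 768 rows and a 500/268 class split; illustration only.
X_demo = np.arange(768 * 2).reshape(768, 2)
y_demo = np.array([0] * 500 + [1] * 268)

# train_size=0.20: 20% of the rows are used for training, 80% for testing.
Xtr_a, Xte_a, ytr_a, yte_a = train_test_split(
    X_demo, y_demo, train_size=0.20, random_state=33
)

# test_size=0.20: 80% of the rows are used for training, 20% for testing.
# stratify=y_demo keeps the class proportions similar in both parts.
Xtr_b, Xte_b, ytr_b, yte_b = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=33, stratify=y_demo
)

print(len(Xtr_a), len(Xte_a))
print(len(Xtr_b), len(Xte_b))
```

With only a fifth of the rows available for fitting, the model sees far less data, which is worth keeping in mind when reading the scores that follow.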

In [29]:
from sklearn.ensemble import RandomForestClassifier
# Baseline model with 10 trees; no random_state is set, so results can vary between runs.
rf = RandomForestClassifier(n_estimators=10).fit(X_train, y_train)
pred = rf.predict(X_test)

In [30]:
y.value_counts()

0    500
1    268
Name: Outcome, dtype: int64

In [32]:
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score
print(confusion_matrix(y_test, pred))
print(accuracy_score(y_test, pred))
print(classification_report(y_test, pred))

[[358  47]
 [116  94]]
0.734959349593496
              precision    recall  f1-score   support

           0       0.76      0.88      0.81       405
           1       0.67      0.45      0.54       210

    accuracy                           0.73       615
   macro avg       0.71      0.67      0.68       615
weighted avg       0.73      0.73      0.72       615
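
The figures in the report can be recomputed by hand from the confusion matrix, which helps when reading them. The sketch below copies the four counts printed above and derives accuracy plus precision, recall, and F1 for class 1. It also computes the accuracy of always predicting class 0 on this test set, as a reference point; that baseline is an addition for context, not something the notebook computes.

```python
import numpy as np

# Confusion matrix printed above: rows are true classes, columns are predicted classes.
cm = np.array([[358, 47],
               [116, 94]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()      # correct predictions / all predictions
precision_1 = tp / (tp + fp)         # of the rows predicted 1, how many are truly 1
recall_1 = tp / (tp + fn)            # of the rows that are truly 1, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

# Reference point: accuracy of always predicting the majority class (0).
majority_baseline = cm[0].sum() / cm.sum()

print(f"accuracy={accuracy:.4f} precision={precision_1:.2f} "
      f"recall={recall_1:.2f} f1={f1_1:.2f} baseline={majority_baseline:.4f}")
```

The recall of 0.45 for class 1 means more than half of the positive cases in the test set are missed, even though overall accuracy is about 0.73.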

