## Hyperparameter Tuning 
A hyperparameter is a setting you choose before training a model. It controls how the training process works, rather than being learned from the data.
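To make the distinction concrete, here is a minimal sketch in plain Python, unrelated to the fraud dataset used below. In it, `learning_rate` and `n_steps` are hyperparameters fixed before training, while the weight `w` is a parameter learned from the data. The function name `fit_slope` and the toy data are assumptions made purely for illustration.

```python
def fit_slope(xs, ys, learning_rate, n_steps):
    """Fit y ~ w * x by gradient descent on the mean squared error."""
    w = 0.0  # model parameter: learned during training
    n = len(xs)
    for _ in range(n_steps):
        # Gradient of mean((w*x - y)^2) with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated with a true slope of 2

# Same data, same model, different hyperparameters
w_good = fit_slope(xs, ys, learning_rate=0.05, n_steps=200)
w_slow = fit_slope(xs, ys, learning_rate=0.0001, n_steps=200)
print("learning_rate=0.05   ->", w_good)
print("learning_rate=0.0001 ->", w_slow)
```

With the larger learning rate the learned slope settles at 2, while the smaller one is still far from it after the same number of steps. The data did not change; only the hyperparameter did.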


In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score as ras
from sklearn.model_selection import GridSearchCV 
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier
from skopt import BayesSearchCV 
from skopt.space import Real, Integer
from sklearn.ensemble import RandomForestClassifier

In [8]:
import warnings
warnings.filterwarnings('ignore')

In [9]:
df = pd.read_csv("onlinefraud.csv", nrows=1_000)
# df.info()
df.head()

| Unnamed: 0 | step | type | amount | nameOrig | oldbalanceOrg | newbalanceOrig | nameDest | oldbalanceDest | newbalanceDest | isFraud | isFlaggedFraud |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | PAYMENT | 9839.64 | C1231006815 | 170136.0 | 160296.36 | M1979787155 | 0.0 | 0.0 | 0 | 0 |
| 1 | 1 | PAYMENT | 1864.28 | C1666544295 | 21249.0 | 19384.72 | M2044282225 | 0.0 | 0.0 | 0 | 0 |
| 2 | 1 | TRANSFER | 181.0 | C1305486145 | 181.0 | 0.0 | C553264065 | 0.0 | 0.0 | 1 | 0 |
| 3 | 1 | CASH_OUT | 181.0 | C840083671 | 181.0 | 0.0 | C38997010 | 21182.0 | 0.0 | 1 | 0 |
| 4 | 1 | PAYMENT | 11668.14 | C2048537720 | 41554.0 | 29885.86 | M1230701703 | 0.0 | 0.0 | 0 | 0 |


In [10]:
type_new = pd.get_dummies(df['type'], drop_first=True)
df_new = pd.concat([df, type_new], axis=1)
df_new.head()

| Unnamed: 0 | step | type | amount | nameOrig | oldbalanceOrg | newbalanceOrig | nameDest | oldbalanceDest | newbalanceDest | isFraud | isFlaggedFraud | CASH_OUT | DEBIT | PAYMENT | TRANSFER |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | PAYMENT | 9839.64 | C1231006815 | 170136.0 | 160296.36 | M1979787155 | 0.0 | 0.0 | 0 | 0 | False | False | True | False |
| 1 | 1 | PAYMENT | 1864.28 | C1666544295 | 21249.0 | 19384.72 | M2044282225 | 0.0 | 0.0 | 0 | 0 | False | False | True | False |
| 2 | 1 | TRANSFER | 181.0 | C1305486145 | 181.0 | 0.0 | C553264065 | 0.0 | 0.0 | 1 | 0 | False | False | False | True |
| 3 | 1 | CASH_OUT | 181.0 | C840083671 | 181.0 | 0.0 | C38997010 | 21182.0 | 0.0 | 1 | 0 | True | False | False | False |
| 4 | 1 | PAYMENT | 11668.14 | C2048537720 | 41554.0 | 29885.86 | M1230701703 | 0.0 | 0.0 | 0 | 0 | False | False | True | False |


In [11]:
X = df_new.drop(['isFraud', 'type', 'nameOrig', 'nameDest', 'isFlaggedFraud', 'DEBIT', 'CASH_OUT', 'TRANSFER', 'PAYMENT'], axis=1)
y = df_new['isFraud']

In [12]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

In [None]:
# Baseline: XGBoost with default hyperparameters
Xgb_model = XGBClassifier()
Xgb_model.fit(X_train, y_train)
# ras is roc_auc_score, so the values printed here are ROC AUC, not accuracy
train_pred = Xgb_model.predict_proba(X_train)[:, 1]
print("Training ROC AUC", ras(y_train, train_pred))
y_pred = Xgb_model.predict_proba(X_test)[:, 1]
print("Testing ROC AUC", ras(y_test, y_pred))

Training ROC AUC 1.0
Testing ROC AUC 0.9823825503355704


### Manual Hyperparameter Tuning

In [14]:
# Hand-picked hyperparameters for a random forest
model = RandomForestClassifier(n_estimators=300, criterion='entropy',
                               max_features='sqrt', min_samples_leaf=10,
                               random_state=100).fit(X_train, y_train)
# Score the random forest itself (ras is roc_auc_score)
train_pred = model.predict_proba(X_train)[:, 1]
print("Training ROC AUC", ras(y_train, train_pred))
y_pred = model.predict_proba(X_test)[:, 1]
print("Testing ROC AUC", ras(y_test, y_pred))


## GridSearchCV

GridSearchCV searches a user-supplied grid of hyperparameter values exhaustively: it fits the model for every combination in the grid, scores each one with cross-validation, and keeps the best. You pass in the model and the grid; after fitting, the best values are available in `best_params_` and the refitted model can be used for predictions.
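As a rough illustration of the mechanics (not scikit-learn's actual implementation), the sketch below enumerates every combination in a small grid with `itertools.product` and scores each one with k-fold cross-validation. The toy 1-D dataset, the k-nearest-neighbour model, and the helper names `knn_predict`, `cv_accuracy`, and `grid_search` are all assumptions made for this example.

```python
from itertools import product

# Toy 1-D dataset: label is 1 when x >= 10, with two deliberately flipped labels.
DATA = [(float(x), int(x >= 10)) for x in range(20)]
DATA[3] = (3.0, 1)
DATA[15] = (15.0, 0)

def knn_predict(train, x, k, weighted):
    """Predict a 0/1 label from the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    score = 0.0
    for xi, label in nearest:
        w = 1.0 / (abs(xi - x) + 1e-9) if weighted else 1.0
        score += w if label == 1 else -w
    return 1 if score > 0 else 0

def cv_accuracy(k, weighted, n_folds=4):
    """Mean validation accuracy over n_folds interleaved folds of DATA."""
    fold_scores = []
    for fold in range(n_folds):
        valid = [p for i, p in enumerate(DATA) if i % n_folds == fold]
        train = [p for i, p in enumerate(DATA) if i % n_folds != fold]
        hits = sum(knn_predict(train, x, k, weighted) == y for x, y in valid)
        fold_scores.append(hits / len(valid))
    return sum(fold_scores) / n_folds

def grid_search(score_fn, grid):
    """Score every combination in the grid and return the best one."""
    names = list(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((score_fn(**params), params))
    # On ties, max() keeps the first combination it saw
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_params, best_score, results

grid = {"k": [1, 3, 5], "weighted": [False, True]}
best_params, best_score, results = grid_search(cv_accuracy, grid)
print(len(results), "combinations evaluated")
print("best:", best_params, "score:", round(best_score, 3))
```

With 3 values of `k` and 2 values of `weighted`, the search evaluates 3 × 2 = 6 combinations. Each additional hyperparameter multiplies that count again, which is why grid search becomes expensive quickly.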

Here I define the `param_grid` to search for the best parameters:

- `C` = inverse of regularization strength (as in scikit-learn's `LogisticRegression`),
- `solver` = solver algorithm,
- `max_iter` = maximum number of iterations,

In [15]:
param_grid = {
    'C': [0.01, 0.1, 1, 10, 100], 
    'solver': ['liblinear', 'lbfgs'], 
    'max_iter': [100, 200, 500]  
}
grid_search = GridSearchCV(XGBClassifier(), param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
best_score = grid_search.best_score_
best_params, best_score


({'C': 0.01, 'max_iter': 100, 'solver': 'liblinear'}, 0.9885714285714287)



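The search fit the XGB model for every combination in the grid (5 × 2 × 3 = 30 candidates, each with 5-fold cross-validation) and reported the following as best, with a mean cross-validated accuracy of about 0.989:

- `C`: 0.01
- `max_iter`: 100
- `solver`: 'liblinear'

Note that `C`, `solver`, and `max_iter` are `LogisticRegression` parameters. `XGBClassifier` has no hyperparameters with these names, so they should not affect the fitted model. The reported combination is the first candidate in the grid, which is what you would expect when every candidate ties. For a meaningful search over XGBoost, the grid should use its own hyperparameters, such as `max_depth`, `learning_rate`, and `n_estimators`.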

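#### Pros:
- Simple and easy to implement
- Exhaustive: every combination in the grid is evaluated

#### Cons:
- The number of fits grows multiplicatively with every added hyperparameter and value, so it scales poorly to large grids and large datasets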


## RandomizedSearchCV
Random search selects random combinations of hyperparameters from the predefined search space and evaluates their performance. Unlike grid search, it doesn’t test every possible combination.

##### Pros
- More efficient than grid search
- Works well with high-dimensional search spaces

##### Cons
- No guarantee of finding the best solution
- Still requires many trials
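The idea can be sketched in a few lines of plain Python. This is only an illustration of the sampling loop, not how `RandomizedSearchCV` is implemented: the search space `SPACE`, the stand-in objective `toy_score`, and the helper `random_search` are hypothetical names, and `toy_score` replaces what would normally be a cross-validated model score.

```python
import math
import random

# Hypothetical search space: each entry draws one value from a distribution.
SPACE = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, 0),  # log-uniform on [0.001, 1]
    "max_depth": lambda rng: rng.randint(2, 10),            # integers 2..10 inclusive
}

def toy_score(learning_rate, max_depth):
    """Stand-in for a cross-validated score; highest at lr = 0.1, depth = 6."""
    return -(math.log10(learning_rate) + 1) ** 2 - 0.05 * (max_depth - 6) ** 2

def random_search(score_fn, space, n_iter, seed=42):
    """Evaluate n_iter randomly sampled settings and return the best one."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_iter):
        params = {name: draw(rng) for name, draw in space.items()}
        history.append((score_fn(**params), params))
    best_score, best_params = max(history, key=lambda h: h[0])
    return best_params, best_score, history

best_params, best_score, history = random_search(toy_score, SPACE, n_iter=20)
print(len(history), "settings tried")
print("best:", best_params, "score:", round(best_score, 4))
```

The budget is set by `n_iter` rather than by the size of the space, so the cost stays fixed even when a hyperparameter is continuous. The trade-off is that the best sampled setting may still be some distance from the true optimum.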

In [16]:
# Reuses param_grid (C, solver, max_iter) from the grid-search cell above
random_search = RandomizedSearchCV(
    Xgb_model,
    param_distributions=param_grid,
    n_iter=20,
    cv=5,
    scoring='roc_auc',
    random_state=42
)
random_search.fit(X_train, y_train)
# best_estimator_ is the XGB model refit on the full training set
best_model = random_search.best_estimator_
train_pred = best_model.predict_proba(X_train)[:, 1]
train_auc = ras(y_train, train_pred)
print("Training ROC AUC:", train_auc)
y_pred = best_model.predict_proba(X_test)[:, 1]
test_auc = ras(y_test, y_pred)
print("Testing ROC AUC:", test_auc)
print("Best hyperparameters:", random_search.best_params_)



Training ROC AUC: 1.0
Testing ROC AUC: 0.9823825503355704
Best hyperparameters: {'solver': 'lbfgs', 'max_iter': 200, 'C': 100}


## Bayesian Optimization
Bayesian optimization is a probabilistic, model-based optimization technique. It fits a surrogate model (often a Gaussian process) to the scores of the hyperparameter settings tried so far, and uses that model's predictions to decide which setting to evaluate next.

##### Pros
- Efficient exploration
- Incorporates prior knowledge

##### Cons
- Complex to implement
- Sensitive to choice of surrogate model
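To show what "surrogate model" means in practice, here is a small self-contained sketch of the loop for a single hyperparameter on [0, 1]. It assumes a zero-mean Gaussian process with an RBF kernel as the surrogate and an upper-confidence-bound rule for picking the next point. `BayesSearchCV` is more sophisticated internally, so treat this as an illustration of the idea only. The names `rbf`, `solve`, `bayes_opt`, and `toy_score` are hypothetical, and `toy_score` stands in for a cross-validated score.

```python
import math

def rbf(a, b, length=0.15):
    """Squared-exponential kernel between two scalars."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def bayes_opt(objective, n_init=3, n_iter=10, kappa=2.0, noise=1e-4):
    """Maximize objective on [0, 1] with a GP surrogate and a UCB acquisition."""
    xs = [i / (n_init - 1) for i in range(n_init)]  # evenly spaced starting points
    ys = [objective(x) for x in xs]
    candidates = [i / 100 for i in range(101)]
    for _ in range(n_iter):
        # Fit the surrogate to everything observed so far
        K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
             for i, a in enumerate(xs)]
        alpha = solve(K, ys)  # K^-1 y, shared by all candidates

        def ucb(x):
            k_star = [rbf(a, x) for a in xs]
            mean = sum(k * a for k, a in zip(k_star, alpha))
            v = solve(K, k_star)
            var = max(1.0 - sum(k * w for k, w in zip(k_star, v)), 0.0)
            return mean + kappa * math.sqrt(var)  # optimistic estimate

        # Evaluate the real objective only at the most promising candidate
        x_next = max((c for c in candidates if c not in xs), key=ucb)
        xs.append(x_next)
        ys.append(objective(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], xs

def toy_score(x):
    """Stand-in for a cross-validated score; highest at x = 0.3."""
    return -(x - 0.3) ** 2

best_x, best_y, tried = bayes_opt(toy_score)
print("evaluations:", len(tried))
print("best x:", best_x, "score:", best_y)
```

The expensive objective is called only 13 times here (3 starting points plus 10 iterations). All other work happens on the cheap surrogate, which is the main appeal when each evaluation means training a model.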

In [17]:
param_space = {
    'learning_rate': Real(0.01, 0.3, prior='uniform'),
    'n_estimators': Integer(50, 500),
    'max_depth': Integer(3, 12),
    'min_child_weight': Integer(1, 10),
    'subsample': Real(0.5, 1.0, prior='uniform'),
    'colsample_bytree': Real(0.5, 1.0, prior='uniform'),
    'gamma': Real(0, 5, prior='uniform'),
    'reg_alpha': Real(0, 2, prior='uniform'),
    'reg_lambda': Real(0, 2, prior='uniform')
}
opt = BayesSearchCV(
    estimator=Xgb_model,
    search_spaces=param_space,
    n_iter=50,
    cv=3,      
    scoring='roc_auc',  
    random_state=42
)
opt.fit(X_train, y_train)
print("Best Hyperparameters:", opt.best_params_)
# ras is roc_auc_score, so these values are ROC AUC, not accuracy
train_pred = opt.predict_proba(X_train)[:, 1]
print("Training ROC AUC", ras(y_train, train_pred))
y_pred = opt.predict_proba(X_test)[:, 1]
print("Testing ROC AUC", ras(y_test, y_pred))


Best Hyperparameters: OrderedDict({'colsample_bytree': 0.5, 'gamma': 0.0, 'learning_rate': 0.3, 'max_depth': 11, 'min_child_weight': 1, 'n_estimators': 500, 'reg_alpha': 0.41184231284904066, 'reg_lambda': 0.0, 'subsample': 0.7592247633359859})
Training ROC AUC 1.0
Testing ROC AUC 0.9731543624161073


## Genetic Algorithms
Genetic algorithms are inspired by the process of natural selection and evolution. Hyperparameter sets are treated as chromosomes, and the algorithm iteratively evolves a population of candidates using operations like selection, crossover, and mutation.

### Pros
- Can explore complex search spaces.
- Capable of global search, which makes them less likely to get stuck in local optima.
- Flexible and can be adapted to various types of optimization problems.

### Cons
- Computationally expensive, especially for large populations or many generations.
- Can converge prematurely to suboptimal solutions.
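The selection, crossover, and mutation steps described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a production tuner: the search space `SPACE`, the stand-in objective `toy_fitness` (which replaces a cross-validated score), and the helper names are all hypothetical.

```python
import math
import random

# Hypothetical search space: one gene per hyperparameter.
SPACE = {
    "max_depth": [2, 3, 4, 5, 6, 8, 10],
    "learning_rate": [0.01, 0.03, 0.1, 0.3],
    "n_estimators": [50, 100, 200, 400],
}

def toy_fitness(ch):
    """Stand-in for a cross-validated score; best at depth 5, lr 0.1, 200 trees."""
    return -(abs(ch["max_depth"] - 5) / 5
             + abs(math.log10(ch["learning_rate"]) + 1)
             + abs(ch["n_estimators"] - 200) / 200)

def random_chromosome(rng):
    return {name: rng.choice(values) for name, values in SPACE.items()}

def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one of the two parents."""
    return {name: rng.choice((a[name], b[name])) for name in SPACE}

def mutate(ch, rng, rate=0.2):
    """Resample each gene from the search space with probability `rate`."""
    return {name: rng.choice(SPACE[name]) if rng.random() < rate else value
            for name, value in ch.items()}

def tournament(pop, fitness, rng, k=3):
    """Selection: the fittest of k randomly drawn individuals becomes a parent."""
    return max(rng.sample(pop, k), key=fitness)

def genetic_search(fitness, pop_size=12, generations=15, seed=0):
    rng = random.Random(seed)
    pop = [random_chromosome(rng) for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        elite = max(pop, key=fitness)
        history.append(fitness(elite))
        children = [elite]  # elitism: the current best always survives
        while len(children) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            children.append(mutate(crossover(p1, p2, rng), rng))
        pop = children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, fitness(best), history

best, best_score, history = genetic_search(toy_fitness)
print("best chromosome:", best)
print("best fitness:", round(best_score, 4))
```

Because the elite individual is copied unchanged into the next generation, the best fitness never decreases from one generation to the next. Without elitism, a good candidate can be lost to mutation.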



In [31]:
n_estimators = [int(x) for x in np.linspace(start=200, stop=2000, num=10)]
max_features = ['auto', 'sqrt', 'log2']  
max_depth = [int(x) for x in np.linspace(10, 1000, 10)]
min_samples_split = [2, 5, 10, 14]
min_samples_leaf = [1, 2, 4, 6, 8]

param_grid = {
    'n_estimators': n_estimators,
    'max_features': max_features,
    'max_depth': max_depth,
    'min_samples_split': min_samples_split,
    'min_samples_leaf': min_samples_leaf,
    'criterion': ['entropy', 'gini']
}

print("RandomizedSearchCV Parameter Grid:")
print(param_grid)

RandomizedSearchCV Parameter Grid:
{'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400, 1600, 1800, 2000], 'max_features': ['auto', 'sqrt', 'log2'], 'max_depth': [10, 120, 230, 340, 450, 560, 670, 780, 890, 1000], 'min_samples_split': [2, 5, 10, 14], 'min_samples_leaf': [1, 2, 4, 6, 8], 'criterion': ['entropy', 'gini']}


In [32]:
X_train_selected = X_train[['step', 'amount', 'oldbalanceOrg', 'newbalanceOrig', 'oldbalanceDest', 'newbalanceDest']]
X_test_selected = X_test[['step', 'amount', 'oldbalanceOrg', 'newbalanceOrig', 'oldbalanceDest', 'newbalanceDest']]

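# Note: this is a randomized search, not a genetic algorithm. param_grid uses
# RandomForestClassifier-style names; of these, XGBClassifier itself only
# defines n_estimators and max_depth.
Xgb_model = XGBClassifier()
random_search = RandomizedSearchCV(estimator=Xgb_model, param_distributions=param_grid,
                                   n_iter=10, cv=3, verbose=2, random_state=42)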
random_search.fit(X_train_selected, y_train)
print("Best Parameters:", random_search.best_params_)
train_pred = random_search.predict_proba(X_train_selected)[:, 1]
train_auc = ras(y_train, train_pred)
print("Training AUC:", train_auc)
y_pred = random_search.predict_proba(X_test_selected)[:, 1]
test_auc = ras(y_test, y_pred)
print("Testing AUC:", test_auc)

Fitting 3 folds for each of 10 candidates, totalling 30 fits
[CV] END criterion=gini, max_depth=230, max_features=auto, min_samples_leaf=2, min_samples_split=14, n_estimators=200; total time=   0.1s
[CV] END criterion=gini, max_depth=230, max_features=auto, min_samples_leaf=2, min_samples_split=14, n_estimators=200; total time=   0.0s
[CV] END criterion=gini, max_depth=230, max_features=auto, min_samples_leaf=2, min_samples_split=14, n_estimators=200; total time=   0.0s
[CV] END criterion=entropy, max_depth=120, max_features=sqrt, min_samples_leaf=2, min_samples_split=10, n_estimators=200; total time=   0.1s
[CV] END criterion=entropy, max_depth=120, max_features=sqrt, min_samples_leaf=2, min_samples_split=10, n_estimators=200; total time=   0.0s
[CV] END criterion=entropy, max_depth=120, max_features=sqrt, min_samples_leaf=2, min_samples_split=10, n_estimators=200; total time=   0.0s
[CV] END criterion=entropy, max_depth=890, max_features=log2, min_samples_leaf=8, min_samples_split=14

In [33]:
from sklearn.metrics import accuracy_score
# Predict on the same selected columns the search was fit on
train_pred = random_search.predict(X_train_selected)
train_accuracy = accuracy_score(y_train, train_pred)
print("Training Accuracy:", train_accuracy)

y_pred = random_search.predict(X_test_selected)
test_accuracy = accuracy_score(y_test, y_pred)
print("Testing Accuracy:", test_accuracy)


Training Accuracy: 1.0
Testing Accuracy: 0.9933333333333333
