# 🔧 ML Hyperparameter Tuning - Fraud Detection
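This notebook tunes four fraud-detection classifiers: Logistic Regression with an exhaustive `GridSearchCV`, and Random Forest, XGBoost, and LightGBM with `RandomizedSearchCV`. Each search uses 3-fold cross-validation scored by ROC AUC, and the best model from each search is then evaluated on a held-out 20% test split.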

## 📥 Step 1: Load and Prepare Data

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load enhanced dataset
df = pd.read_csv("enhanced_online_fraud_dataset.csv")

# Features and label: drop the target plus the account-name and step columns
X = df.drop(columns=['isFraud', 'nameOrig', 'nameDest', 'step'])
y = df['isFraud']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.2, random_state=42)

# Scale features: fit the scaler on the training split only, then apply it to the test split
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

## 🧪 Step 2: Hyperparameter Tuning

In [3]:
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from sklearn.metrics import roc_auc_score

# Logistic Regression: exhaustive grid over C (4 candidates x 3 folds), trained on the scaled features
print("🔍 Tuning Logistic Regression...")
lr_params = {'C': [0.01, 0.1, 1, 10], 'penalty': ['l2'], 'solver': ['liblinear']}
lr = GridSearchCV(LogisticRegression(class_weight='balanced', max_iter=1000), lr_params, scoring='roc_auc', cv=3)
lr.fit(X_train_scaled, y_train)
print("Best LR params:", lr.best_params_)
print("ROC AUC:", roc_auc_score(y_test, lr.predict_proba(X_test_scaled)[:, 1]))

🔍 Tuning Logistic Regression...
Best LR params: {'C': 10, 'penalty': 'l2', 'solver': 'liblinear'}
ROC AUC: 0.9955687196950602
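
In the cell above the scaler is fit on all of `X_train` before cross-validation, so each validation fold has already influenced the scaling statistics. One way to avoid this is to put the scaler inside a `Pipeline` so that it is refit on each training fold. The following is a minimal sketch under stated assumptions, not the notebook's method: it runs on synthetic data from `make_classification` rather than the fraud dataset, and the names `pipe`, `pipe_params`, and `lr_pipe` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the fraud data (assumption: not the real dataset)
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=8, weights=[0.95, 0.05], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, stratify=y_demo, test_size=0.2, random_state=42
)

# The scaler is a pipeline step, so it is refit on the training part of every CV fold
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000, solver="liblinear")),
])

# Parameters of a pipeline step are addressed as <step name>__<parameter>
pipe_params = {"clf__C": [0.01, 0.1, 1, 10]}

lr_pipe = GridSearchCV(pipe, pipe_params, scoring="roc_auc", cv=3)
lr_pipe.fit(X_tr, y_tr)
print("Best pipeline params:", lr_pipe.best_params_)
```

With this layout the unscaled split is passed to `fit` and `predict_proba`, because the pipeline applies the scaling itself.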


In [6]:
# Random Forest: sample 5 of the 24 grid combinations; trained on the unscaled split
print("🔍 Tuning Random Forest...")
rf_params = {
    'n_estimators': [100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5],
    'min_samples_leaf': [1, 2]
}
rf = RandomizedSearchCV(RandomForestClassifier(class_weight='balanced', random_state=42),
                        rf_params, n_iter=5, scoring='roc_auc', cv=3, random_state=42)
rf.fit(X_train, y_train)
print("Best RF params:", rf.best_params_)
print("ROC AUC:", roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1]))

🔍 Tuning Random Forest...


11 fits failed out of a total of 15.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
1 fits failed with the following error:
Traceback (most recent call last):
  File "c:\Users\israila.dare\AppData\Local\anaconda3\envs\diabetes\lib\site-packages\sklearn\model_selection\_validation.py", line 729, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "c:\Users\israila.dare\AppData\Local\anaconda3\envs\diabetes\lib\site-packages\sklearn\base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "c:\Users\israila.dare\AppData\Local\anaconda3\envs\diabetes\lib\site-packages\sklearn\ensemble\_forest.py", line 456, in fit
    trees = Parallel(
  File "c:\Users\israila.dare\AppData\Local\anacond

Best RF params: {'n_estimators': 100, 'min_samples_split': 2, 'min_samples_leaf': 1, 'max_depth': 10}
ROC AUC: 0.9997082704834712
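
The warning above reports 15 fits in total. The short sketch below shows where that number comes from, assuming the same grid as `rf_params`; the names `rf_params_demo`, `n_candidates`, and `total_fits` are illustrative and not part of the notebook.

```python
from itertools import product

# Same grid as rf_params in the cell above
rf_params_demo = {
    'n_estimators': [100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5],
    'min_samples_leaf': [1, 2],
}

# An exhaustive grid search would try every combination of values
n_candidates = len(list(product(*rf_params_demo.values())))

# The randomized search samples n_iter candidates and fits each one once per CV fold
n_iter, cv = 5, 3
total_fits = n_iter * cv

print("Grid combinations:", n_candidates)  # 24
print("Fits in the randomized search:", total_fits)  # 15
```

With 11 of the 15 fits failing, only 4 produced a score, so the best parameters reported above come from a largely failed search and are worth re-checking. The warning's own suggestion, passing `error_score='raise'` to `RandomizedSearchCV`, should surface the first underlying exception instead of recording it as `nan`.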


In [None]:
# XGBoost: sample 5 of the 24 grid combinations; trained on the unscaled split
print("🔍 Tuning XGBoost...")
xgb_params = {
    'n_estimators': [100, 200],
    'max_depth': [3, 6, 10],
    'learning_rate': [0.01, 0.1],
    'subsample': [0.8, 1.0]
}
# use_label_encoder is deprecated and ignored in recent XGBoost releases, so it is omitted here
xgb = RandomizedSearchCV(XGBClassifier(eval_metric='logloss'),
                         xgb_params, n_iter=5, scoring='roc_auc', cv=3, random_state=42)
xgb.fit(X_train, y_train)
print("Best XGB params:", xgb.best_params_)
print("ROC AUC:", roc_auc_score(y_test, xgb.predict_proba(X_test)[:, 1]))
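
Unlike the other three models, the XGBoost search above sets no class weighting. `XGBClassifier` has no `class_weight` argument; its `scale_pos_weight` parameter plays a similar role for binary problems, and a common starting value is the ratio of negative to positive training samples. The sketch below computes that ratio on toy labels. The list `y_demo` is a hypothetical stand-in for `y_train`, and whether the weighting helps on this dataset is not something the notebook establishes.

```python
from collections import Counter

# Toy labels standing in for y_train (assumption: 1 marks fraud, 0 marks legitimate)
y_demo = [0] * 990 + [1] * 10

counts = Counter(y_demo)

# Ratio of negative to positive samples, a common starting value for scale_pos_weight
scale_pos_weight = counts[0] / counts[1]
print("scale_pos_weight:", scale_pos_weight)  # 99.0

# Hypothetical use in the search above (not run here):
# XGBClassifier(eval_metric='logloss', scale_pos_weight=scale_pos_weight)
```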

In [None]:
# LightGBM: sample 5 of the 8 grid combinations; trained on the unscaled split
print("🔍 Tuning LightGBM...")
lgbm_params = {
    'n_estimators': [100, 200],
    'num_leaves': [31, 50],
    'learning_rate': [0.01, 0.1],
    'boosting_type': ['gbdt']
}
lgbm = RandomizedSearchCV(LGBMClassifier(class_weight='balanced'),
                          lgbm_params, n_iter=5, scoring='roc_auc', cv=3, random_state=42)
lgbm.fit(X_train, y_train)
print("Best LGBM params:", lgbm.best_params_)
print("ROC AUC:", roc_auc_score(y_test, lgbm.predict_proba(X_test)[:, 1]))
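
Each cell above repeats the same steps: fit the search, print the best parameters, and score the held-out split. A small helper can collect those values in one place, along with the mean cross-validated AUC (`best_score_`) that the cells do not print. This is a sketch under stated assumptions: `summarize_search` is a hypothetical helper, and the demo runs on synthetic data rather than the fraud dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split


def summarize_search(name, search, X_eval, y_eval):
    """Return best params, mean CV AUC, and held-out AUC for a fitted search."""
    test_auc = roc_auc_score(y_eval, search.predict_proba(X_eval)[:, 1])
    return {
        "model": name,
        "best_params": search.best_params_,
        "cv_auc": search.best_score_,  # mean AUC over the CV folds for the best candidate
        "test_auc": test_auc,          # AUC on the held-out split
    }


# Synthetic stand-in data (assumption: not the real dataset)
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=6, weights=[0.9, 0.1], random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, stratify=y_demo, test_size=0.2, random_state=0
)

demo_search = GridSearchCV(
    LogisticRegression(max_iter=1000), {"C": [0.1, 1]}, scoring="roc_auc", cv=3
)
demo_search.fit(X_tr, y_tr)

row = summarize_search("LR (demo)", demo_search, X_te, y_te)
print(row)
```

In the notebook, the same call could be made for `lr` with `X_test_scaled` and for `rf`, `xgb`, and `lgbm` with `X_test`, and the resulting rows placed in a `pd.DataFrame` for a side-by-side comparison.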