# Hyperparameter Tuning — Random Forest

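This notebook trains a baseline RandomForestClassifier, tunes its hyperparameters with RandomizedSearchCV, and runs a final evaluation on the credit card fraud detection dataset. Run the cells in order, because later cells reuse objects defined earlier (for example `compute_metrics`, `tuned_model`, and the results paths). `random_state=42` is used everywhere for reproducibility.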

## 1) Imports

In [6]:
import os
import time
import numpy as np
import pandas as pd
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import (recall_score, precision_score, f1_score, roc_auc_score, average_precision_score)
import warnings
warnings.filterwarnings('ignore')

## 2) Load data and print shapes

In [7]:
# Paths (notebook lives in `notebooks/`)
data_dir = os.path.join('..', 'data', 'processed')
X_train = np.load(os.path.join(data_dir, 'X_train_transformed.npy'))
X_val = np.load(os.path.join(data_dir, 'X_val_transformed.npy'))
X_test = np.load(os.path.join(data_dir, 'X_test_transformed.npy'))
y_train = np.load(os.path.join(data_dir, 'y_train.npy'))
y_val = np.load(os.path.join(data_dir, 'y_val.npy'))
y_test = np.load(os.path.join(data_dir, 'y_test.npy'))
print('X_train', X_train.shape)
print('y_train', y_train.shape)
print('X_val', X_val.shape)
print('y_val', y_val.shape)
print('X_test', X_test.shape)
print('y_test', y_test.shape)

X_train (170883, 30)
y_train (170883,)
X_val (28481, 30)
y_val (28481,)
X_test (85443, 30)
y_test (85443,)
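A quick arithmetic check on the printed shapes: the three splits add up to 284,807 rows, which works out to roughly a 60/10/30 train/validation/test split. The sketch below only reuses the row counts printed above. This notebook does not show how the split was produced, for example whether it was stratified.

```python
# Row counts copied from the shapes printed above
split_sizes = {"train": 170883, "val": 28481, "test": 85443}

total_rows = sum(split_sizes.values())
proportions = {name: n / total_rows for name, n in split_sizes.items()}

print(total_rows)  # 284807
for name, p in proportions.items():
    print(f"{name:5s}: {p:.3f}")
```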


## 3) Baseline Random Forest (n_estimators=100) — Evaluate on VALIDATION set

In [8]:
# Helper to compute metrics
def compute_metrics(y_true, y_pred, y_proba):
    rec = recall_score(y_true, y_pred)
    prec = precision_score(y_true, y_pred)
    f1 = f1_score(y_true, y_pred)
    roc = roc_auc_score(y_true, y_proba[:, 1])
    pr = average_precision_score(y_true, y_proba[:, 1])
    return {
        'recall': rec,
        'precision': prec,
        'f1': f1,
        'roc_auc': roc,
        'pr_auc': pr
    }

# Baseline model
baseline_clf = RandomForestClassifier(n_estimators=100, class_weight='balanced', random_state=42, n_jobs=-1)
baseline_clf.fit(X_train, y_train)
y_val_pred_baseline = baseline_clf.predict(X_val)
y_val_proba_baseline = baseline_clf.predict_proba(X_val)
metrics_baseline_val = compute_metrics(y_val, y_val_pred_baseline, y_val_proba_baseline)
print('Baseline (validation) metrics:')
print(metrics_baseline_val)

Baseline (validation) metrics:
{'recall': 0.7959183673469388, 'precision': 0.975, 'f1': 0.8764044943820225, 'roc_auc': np.float64(0.9580944293868363), 'pr_auc': np.float64(0.8678746782602694)}


## 4) Hyperparameter tuning with RandomizedSearchCV
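- `RandomizedSearchCV` is used because it evaluates a fixed number of sampled configurations instead of every combination, which is much cheaper than an exhaustive grid search.
- Settings: `n_iter=10`, `scoring='f1'`, `random_state=42`, with a shuffled 3-fold `StratifiedKFold` (`random_state=42`) as the CV splitter.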
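Every hyperparameter in the search space below is given as a list, so the grid has 3 × 3 × 2 × 2 × 2 = 72 combinations, and `n_iter=10` tries only 10 of them. According to the scikit-learn documentation, `RandomizedSearchCV` samples without replacement from the grid when all parameters are lists. The pure-Python sketch below imitates that idea. It does not reproduce scikit-learn's actual draws for `random_state=42`.

```python
import itertools
import random

# Same search space as the tuning cell below
param_distributions = {
    "n_estimators": [100, 200, 300],
    "max_depth": [None, 15, 30],
    "min_samples_split": [2, 5],
    "min_samples_leaf": [1, 2],
    "max_features": ["sqrt", "log2"],
}

# Enumerate the full grid of combinations
keys = list(param_distributions)
grid = [dict(zip(keys, values)) for values in itertools.product(*param_distributions.values())]

# Draw 10 distinct configurations (sampling without replacement)
rng = random.Random(42)
sampled = rng.sample(grid, k=10)

print(len(grid))  # 72
for params in sampled:
    print(params)
```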

In [9]:
# ============================================================
# FAST & STABLE RANDOMIZED SEARCH (WINDOWS-SAFE)
# ============================================================

import time
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold

print("\n" + "="*70)
print("RANDOM FOREST HYPERPARAMETER TUNING (REDUCED & STABLE)")
print("="*70)

# Define reduced hyperparameter search space
param_distributions = {
    'n_estimators': [100, 200, 300],
    'max_depth': [None, 15, 30],
    'min_samples_split': [2, 5],
    'min_samples_leaf': [1, 2],
    'max_features': ['sqrt', 'log2']
}

# Base Random Forest model
base_rf = RandomForestClassifier(
    class_weight='balanced',
    random_state=42
)

# Stratified CV (important for imbalance)
cv_strategy = StratifiedKFold(
    n_splits=3,
    shuffle=True,
    random_state=42
)

# Randomized Search
rsearch = RandomizedSearchCV(
    estimator=base_rf,
    param_distributions=param_distributions,
    n_iter=10,              # reduced for speed
    scoring='f1',           # F1 on the positive (fraud) class
    cv=cv_strategy,
    random_state=42,
    n_jobs=1,               # IMPORTANT: Windows stability
    verbose=2
)

# Run tuning
start_time = time.time()
rsearch.fit(X_train, y_train)
end_time = time.time()

# Best model
tuned_model = rsearch.best_estimator_

print("\n✓ RandomizedSearchCV completed successfully")
print(f"✓ Runtime: {end_time - start_time:.1f} seconds")

print("\nBEST HYPERPARAMETERS:")
for k, v in rsearch.best_params_.items():
    print(f"  {k:20s}: {v}")

print(f"\nBest CV F1-Score: {rsearch.best_score_:.4f}")



RANDOM FOREST HYPERPARAMETER TUNING (REDUCED & STABLE)
Fitting 3 folds for each of 10 candidates, totalling 30 fits
[CV] END max_depth=None, max_features=sqrt, min_samples_leaf=1, min_samples_split=5, n_estimators=200; total time= 1.9min
[CV] END max_depth=None, max_features=sqrt, min_samples_leaf=1, min_samples_split=5, n_estimators=200; total time= 2.0min
[CV] END max_depth=None, max_features=sqrt, min_samples_leaf=1, min_samples_split=5, n_estimators=200; total time= 2.1min
[CV] END max_depth=30, max_features=log2, min_samples_leaf=1, min_samples_split=2, n_estimators=300; total time= 2.4min
[CV] END max_depth=30, max_features=log2, min_samples_leaf=1, min_samples_split=2, n_estimators=300; total time= 4.7min
[CV] END max_depth=30, max_features=log2, min_samples_leaf=1, min_samples_split=2, n_estimators=300; total time= 2.7min
[CV] END max_depth=None, max_features=log2, min_samples_leaf=2, min_samples_split=2, n_estimators=100; total time=  47.7s
[CV] END max_depth=None, max_featur
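`StratifiedKFold` keeps the fraud rate roughly the same in each of the three folds, so no fold is left with almost no positives. The NumPy sketch below illustrates stratification on a hypothetical synthetic label vector with 0.2% positives: it shuffles each class separately, then deals that class's indices round-robin into the folds. This shows the idea only and is not scikit-learn's exact algorithm.

```python
import numpy as np

def stratified_folds(y, n_splits=3, seed=42):
    """Assign each sample a fold id so every class is spread evenly across folds."""
    rng = np.random.default_rng(seed)
    fold_ids = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        rng.shuffle(idx)
        # Deal this class's shuffled indices round-robin: 0, 1, 2, 0, 1, 2, ...
        fold_ids[idx] = np.arange(len(idx)) % n_splits
    return fold_ids

# Hypothetical, heavily imbalanced labels: 60 positives in 30,000 rows
y = np.zeros(30_000, dtype=int)
y[:60] = 1

folds = stratified_folds(y, n_splits=3)
for k in range(3):
    in_fold = folds == k
    print(f"fold {k}: size={in_fold.sum()}, positives={y[in_fold].sum()}")
```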

## 5) Compare Baseline vs Tuned (VALIDATION metrics)

In [10]:
# Evaluate tuned model on validation set
y_val_pred_tuned = tuned_model.predict(X_val)
y_val_proba_tuned = tuned_model.predict_proba(X_val)
metrics_tuned_val = compute_metrics(y_val, y_val_pred_tuned, y_val_proba_tuned)

comparison_df = pd.DataFrame([
    dict(model='baseline', **metrics_baseline_val),
    dict(model='tuned', **metrics_tuned_val)
])
comparison_df = comparison_df.set_index('model')
comparison_df

| model | recall | precision | f1 | roc_auc | pr_auc |
|---|---|---|---|---|---|
| baseline | 0.795918 | 0.975 | 0.876404 | 0.958094 | 0.867875 |
| tuned | 0.77551 | 0.926829 | 0.844444 | 0.988114 | 0.872869 |
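The notebook does not print confusion matrices. However, the validation scores above match one plausible set of counts, assuming 49 fraud cases in the validation split: baseline TP=39, FP=1, FN=10 and tuned TP=38, FP=3, FN=11. These counts are hypothetical. If they are right, the tuned model's lower F1 comes from one fewer detected fraud and two more false alarms. The sketch below recomputes precision, recall, and F1 from counts, using F1 = 2*TP / (2*TP + FP + FN).

```python
def scores_from_counts(tp, fp, fn):
    """Precision, recall and F1 for the positive (fraud) class from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Equivalent to 2 * precision * recall / (precision + recall)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts chosen because they reproduce the validation scores above
counts = {
    "baseline": {"tp": 39, "fp": 1, "fn": 10},
    "tuned": {"tp": 38, "fp": 3, "fn": 11},
}

results = {name: scores_from_counts(**c) for name, c in counts.items()}
for name, s in results.items():
    print(name, {k: round(v, 6) for k, v in s.items()})
```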


In [11]:
# Ensure results dir exists and save comparison
results_dir = os.path.join('..', 'results')
metrics_dir = os.path.join(results_dir, 'metrics')
os.makedirs(metrics_dir, exist_ok=True)
comparison_csv_path = os.path.join(metrics_dir, 'rf_tuning_comparison.csv')
comparison_df.to_csv(comparison_csv_path)
print('Saved validation comparison to', comparison_csv_path)

Saved validation comparison to ..\results\metrics\rf_tuning_comparison.csv


## 6) Final evaluation on TEST set (only once)

In [12]:
# Evaluate tuned model on test set
y_test_pred = tuned_model.predict(X_test)
y_test_proba = tuned_model.predict_proba(X_test)
metrics_tuned_test = compute_metrics(y_test, y_test_pred, y_test_proba)
print('Tuned model (test) metrics:')
print(metrics_tuned_test)

Tuned model (test) metrics:
{'recall': 0.7297297297297297, 'precision': 0.9391304347826087, 'f1': 0.8212927756653993, 'roc_auc': np.float64(0.9401630351261043), 'pr_auc': np.float64(0.8202488637943746)}


## 7) Save tuned model

In [13]:
# Save tuned model artifact
os.makedirs(results_dir, exist_ok=True)
model_path = os.path.join(results_dir, 'random_forest_tuned_model.pkl')
joblib.dump(tuned_model, model_path)
print('Saved tuned model to', model_path)
# Save test metrics alongside if desired
test_metrics_path = os.path.join(metrics_dir, 'rf_tuned_test_metrics.csv')
pd.DataFrame([dict(model='tuned_test', **metrics_tuned_test)]).set_index('model').to_csv(test_metrics_path)
print('Saved test metrics to', test_metrics_path)

Saved tuned model to ..\results\random_forest_tuned_model.pkl
Saved test metrics to ..\results\metrics\rf_tuned_test_metrics.csv


## 8) Explanations

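**Why RandomizedSearchCV:** RandomizedSearchCV evaluates a fixed number of randomly sampled configurations (`n_iter=10` of the 72 combinations in this search space) instead of every combination, so it becomes far cheaper than GridSearchCV as the search space grows. It does not guarantee an improvement, though. In this run the tuned model's validation F1 (0.844) is below the baseline's (0.876), while its ROC-AUC and PR-AUC are higher.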

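**Why F1 / PR-AUC:** In highly imbalanced classification, F1 (computed for the positive class) combines precision and recall on the minority (fraud) class and ignores the very large number of correctly rejected legitimate transactions. PR-AUC (average precision) reflects performance on imbalanced data better than ROC-AUC. The precision-recall curve does not use true negatives, so a huge majority class cannot inflate it. ROC-AUC, by contrast, can look high even when many of the flagged transactions are false alarms.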
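A toy example makes the ROC-AUC vs PR-AUC point concrete. Suppose there is one fraud among 100 transactions and the model ranks it 5th. ROC-AUC is still about 0.96, because only 4 of the 99 negatives outrank the fraud. Average precision is only 0.2, because at the point the fraud is found, 4 of the 5 flagged transactions are false alarms. The NumPy sketch below computes both metrics from scratch and is not the `sklearn.metrics` implementation. The AP function assumes no tied scores. For untied scores, it matches the step-wise definition used by `average_precision_score`.

```python
import numpy as np

def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative (ties count 1/2)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

def average_precision(y_true, scores):
    """Mean of precision@k taken at the rank k of each positive (assumes no tied scores)."""
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    y_sorted = y_true[order]
    precision_at_k = np.cumsum(y_sorted) / np.arange(1, len(y_sorted) + 1)
    return precision_at_k[y_sorted == 1].mean()

# Toy imbalanced case: 1 fraud among 100 transactions, ranked 5th by the model
scores = np.linspace(1.0, 0.0, 100)   # distinct scores, highest first
y = np.zeros(100, dtype=int)
y[4] = 1

print(f"ROC-AUC: {roc_auc(y, scores):.4f}")                      # 95/99
print(f"Average precision: {average_precision(y, scores):.4f}")  # 1/5
```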

**How data leakage is avoided:** All preprocessing was applied earlier and saved as transformed NumPy arrays. This assumes the transforms were fit on the training split only. This notebook only loads the preprocessed `X_train`, `X_val`, and `X_test`. It uses `X_train` for fitting and cross-validated tuning, `X_val` for comparing the baseline and tuned models, and the test set only once, at the end. No information from `X_val` or `X_test` enters model fitting or the randomized search itself.
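The preprocessing step is not part of this notebook, so we cannot check here whether it was fit on the training split only. The NumPy sketch below shows the leakage-safe pattern, assuming standardization as the transform. The statistics come from the training split alone and are reused unchanged on validation and test. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_standardizer(X_train):
    """Learn per-feature mean and std from the training split only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return mean, std

def transform(X, mean, std):
    """Apply previously learned statistics; never refit on the data being transformed."""
    return (X - mean) / std

# Hypothetical raw splits; validation/test are shifted to mimic a distribution difference
rng = np.random.default_rng(42)
X_train_raw = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))
X_val_raw = rng.normal(loc=5.5, scale=2.0, size=(200, 3))
X_test_raw = rng.normal(loc=5.5, scale=2.0, size=(500, 3))

mean, std = fit_standardizer(X_train_raw)   # fit on train only
X_train_t = transform(X_train_raw, mean, std)
X_val_t = transform(X_val_raw, mean, std)    # reuse train statistics
X_test_t = transform(X_test_raw, mean, std)

print(X_train_t.mean(axis=0).round(3))  # close to 0 by construction
print(X_val_t.mean(axis=0).round(3))    # not forced to 0: validation stats were never used
```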