## CatBoost Hyperparameter Tuning and Evaluation

1. **Data Split**  
   - The cleaned dataset (`creditcard_isoforest_cleaned_001.csv`) was split into training (80%) and test (20%) sets, stratified on the label to preserve the fraud/non-fraud ratio.

2. **Class Weight Calculation**  
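   - Computed `scale = (number of legit) / (number of fraud)` on the training set only, so the test set does not influence the weight.  
   - This ratio anchors the `scale_pos_weight` values in the search grid: CatBoost multiplies the weight of each fraud (positive) example by this factor, which offsets the scarcity of fraud cases.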

3. **Hyperparameter Grid**  
   A randomized search was run over:  
   - `iterations`: [300, 500, 700] — number of boosting rounds.  
   - `depth`: [4, 6, 8] — maximum tree depth for capturing interactions.  
   - `learning_rate`: [0.01, 0.05, 0.1, 0.2] — step size shrinkage.  
   - `l2_leaf_reg`: [1, 3, 5, 7] — L2 regularization coefficient on leaf weights.  
   - `border_count`: [32, 64, 128] — number of splits for numerical features (controls binning granularity).  
   - `scale_pos_weight`: [scale, scale × 0.5, scale × 2] — balances the positive (fraud) class.  
   - `subsample`: [0.6, 0.8, 1.0] — fraction of samples per tree to add randomness.

4. **Search Strategy**  
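   - Used `RandomizedSearchCV` with **30 iterations** and **3-fold cross-validation** (90 fits in total).  
   - Optimized for **F1-score of the fraud class** via a custom scorer (`pos_label=1`).  
   - The scorer calls `predict`, so cross-validated F1 is measured at the default 0.5 probability cut-off, whereas the held-out evaluation in the code applies a stricter 0.7 threshold.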
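
To put the 30 sampled candidates in context, the grid above can be enumerated directly. The following is a minimal sketch in plain Python; `scale` is hard-coded to the 663.96 value that the notebook prints, purely as a stand-in so the snippet runs on its own, and the names `grid_size`, `coverage`, and `total_fits` are illustrative.

```python
from math import prod

# Stand-in for the training-set ratio neg / pos; 663.96 is the value the
# notebook prints, hard-coded here only so the sketch is self-contained.
scale = 663.96

param_grid = {
    "iterations": [300, 500, 700],
    "depth": [4, 6, 8],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "l2_leaf_reg": [1, 3, 5, 7],
    "border_count": [32, 64, 128],
    "scale_pos_weight": [scale, scale * 0.5, scale * 2],
    "subsample": [0.6, 0.8, 1.0],
}

n_iter, cv_folds = 30, 3

# Number of distinct parameter combinations in the full grid
grid_size = prod(len(values) for values in param_grid.values())
# Fraction of the grid that a 30-candidate random search can visit
coverage = n_iter / grid_size
# Each sampled candidate is fitted once per CV fold
total_fits = n_iter * cv_folds

print(f"combinations in grid: {grid_size}")
print(f"sampled by the search: {n_iter} ({coverage:.2%})")
print(f"model fits with {cv_folds}-fold CV: {total_fits}")
```

With these lists the search visits well under 1% of the grid, so the reported best parameters are best read as a good configuration among those tried rather than the optimum of the grid.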

In [None]:
import os
import pandas as pd
import joblib
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.metrics import classification_report, confusion_matrix, make_scorer, f1_score
from catboost import CatBoostClassifier

# Load the cleaned dataset (without SMOTE)
df = pd.read_csv("data/creditcard_isoforest_cleaned_001.csv")

# Features and target
X = df.drop("Class", axis=1)
y = df["Class"]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Calculate class weight (approximate scale_pos_weight)
neg, pos = (y_train == 0).sum(), (y_train == 1).sum()
scale = neg / pos

# Define hyperparameter grid
param_grid = {
    'iterations': [300, 500, 700],
    'depth': [4, 6, 8],
    'learning_rate': [0.01, 0.05, 0.1, 0.2],
    'l2_leaf_reg': [1, 3, 5, 7],
    'border_count': [32, 64, 128],
    'scale_pos_weight': [scale, scale * 0.5, scale * 2],
    'subsample': [0.6, 0.8, 1.0],
}

# Create CatBoost model
cat = CatBoostClassifier(
    verbose=0,
    random_seed=42,
    task_type='CPU',
    eval_metric='Logloss',
    loss_function='Logloss'
)

# Custom scorer to focus on fraud class
scorer = make_scorer(f1_score, pos_label=1)

# Run randomized search
search = RandomizedSearchCV(
    estimator=cat,
    param_distributions=param_grid,
    scoring=scorer,
    n_iter=30,
    cv=3,
    verbose=2,
    random_state=42,
    n_jobs=-1
)

search.fit(X_train, y_train)
print("Best params:", search.best_params_)
print("Best score:", search.best_score_)

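# Evaluate best model on the held-out test set.
# The search scored F1 at the default 0.5 cut-off; here a stricter 0.7
# threshold is applied to the predicted fraud probability.
best_model = search.best_estimator_
y_prob = best_model.predict_proba(X_test)[:, 1]
threshold = 0.7
y_pred = (y_prob >= threshold).astype(int)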

print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred, digits=4, target_names=["Legit", "Fraud"]))

# Save model with auto-incrementing name
model_dir = "models"
os.makedirs(model_dir, exist_ok=True)
base_filename = "catboost_tuned_eval"
ext = ".pkl"
i = 0
while True:
    filename = f"{base_filename}{'' if i == 0 else f'_{i:02d}'}{ext}"
    filepath = os.path.join(model_dir, filename)
    if not os.path.exists(filepath):
        break
    i += 1

joblib.dump(best_model, filepath)
print(f"Model saved to {filepath}")

Fitting 3 folds for each of 30 candidates, totalling 90 fits
[CV] END border_count=128, depth=8, iterations=300, l2_leaf_reg=3, learning_rate=0.05, scale_pos_weight=1327.9296187683285, subsample=0.6; total time=  20.9s
[CV] END border_count=128, depth=8, iterations=300, l2_leaf_reg=3, learning_rate=0.05, scale_pos_weight=1327.9296187683285, subsample=0.6; total time=  21.5s
[CV] END border_count=128, depth=8, iterations=300, l2_leaf_reg=3, learning_rate=0.05, scale_pos_weight=1327.9296187683285, subsample=0.6; total time=  21.6s
[CV] END border_count=128, depth=6, iterations=500, l2_leaf_reg=1, learning_rate=0.01, scale_pos_weight=1327.9296187683285, subsample=0.6; total time=  29.8s
[CV] END border_count=128, depth=6, iterations=500, l2_leaf_reg=1, learning_rate=0.01, scale_pos_weight=1327.9296187683285, subsample=0.6; total time=  29.9s
[CV] END border_count=128, depth=6, iterations=500, l2_leaf_reg=1, learning_rate=0.01, scale_pos_weight=1327.9296187683285, subsample=0.6; total time

## LightGBM

- Switched from **CatBoost** to **LightGBM** as the main classifier.
- Used a different set of hyperparameters in the randomized search, specific to LightGBM:
  - Included `num_leaves`, `colsample_bytree`, and `max_depth`.
  - Removed CatBoost-specific parameters like `border_count`.
- Still used the class imbalance ratio (`scale`) as the basis for `scale_pos_weight`.
- Kept the same decision threshold (0.7) and the same fraud-class F1 scorer for the search.
- The final model was saved under a new name: `lightgbm_tuned_eval.pkl`.
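
The 0.7 cut-off is a fixed choice. One way to sanity-check it is to sweep candidate thresholds over held-out predictions and see where the fraud-class F1 peaks. The sketch below is plain Python on made-up labels and scores; `f1_at_threshold` and `best_f1_threshold` are hypothetical helpers, not part of scikit-learn or LightGBM, and in practice the sweep should run on a separate validation split rather than the test set, so that the reported test metrics stay unbiased.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """F1 of the positive class when predicting 1 for y_prob >= threshold."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1_threshold(y_true, y_prob):
    """Try every distinct score as a threshold; return (threshold, f1)."""
    candidates = sorted(set(y_prob))
    scored = [(f1_at_threshold(y_true, y_prob, t), t) for t in candidates]
    best_f1, best_t = max(scored)  # ties resolve to the higher threshold
    return best_t, best_f1


# Toy validation data (illustrative only, not from the credit card dataset)
y_val = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
p_val = [0.05, 0.10, 0.20, 0.30, 0.55, 0.65, 0.60, 0.72, 0.80, 0.95]

for t in (0.5, 0.7):
    print(f"threshold={t}: F1={f1_at_threshold(y_val, p_val, t):.3f}")
print("best (threshold, F1):", best_f1_threshold(y_val, p_val))
```

The same two helpers could be applied to `y_prob` from either model, which would show how sensitive the fraud-class F1 is to the cut-off.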

In [None]:
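# Reuses X_train, X_test, y_train, y_test, scale and the sklearn, os and
# joblib imports from the CatBoost cell above.
import lightgbm as lgb

print(f"scale_pos_weight = {scale:.2f}")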

# Define parameter grid
param_grid = {
    'n_estimators': [100, 200, 300, 500],
    'max_depth': [-1, 4, 6, 8],
    'learning_rate': [0.01, 0.05, 0.1, 0.2],
    'num_leaves': [15, 31, 63],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0],
    'scale_pos_weight': [scale, scale * 0.5, scale * 2],
}

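# Create LightGBM model
lgbm = lgb.LGBMClassifier(
    objective='binary',
    random_state=42,
    n_jobs=-1,
    class_weight=None,  # imbalance is handled via scale_pos_weight in the grid
    subsample_freq=1,   # bagging only runs when subsample_freq > 0; otherwise `subsample` is ignored
)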

# Custom scorer for fraud class
scorer = make_scorer(f1_score, pos_label=1)

# Run randomized search
search = RandomizedSearchCV(
    estimator=lgbm,
    param_distributions=param_grid,
    n_iter=30,
    scoring=scorer,
    cv=3,
    verbose=2,
    random_state=42,
    n_jobs=-1
)

# Train
search.fit(X_train, y_train)
print("Best params:", search.best_params_)
print("Best score:", search.best_score_)

# Predict with threshold
best_model = search.best_estimator_
y_prob = best_model.predict_proba(X_test)[:, 1]
threshold = 0.7
y_pred = (y_prob >= threshold).astype(int)

# Evaluation
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred, digits=4, target_names=["Legit", "Fraud"]))

# Save model
model_dir = "models"
os.makedirs(model_dir, exist_ok=True)
base_filename = "lightgbm_tuned_eval"
ext = ".pkl"
i = 0
while True:
    filename = f"{base_filename}{'' if i == 0 else f'_{i:02d}'}{ext}"
    filepath = os.path.join(model_dir, filename)
    if not os.path.exists(filepath):
        break
    i += 1

joblib.dump(best_model, filepath)
print(f"Model saved to {filepath}")

scale_pos_weight = 663.96
Fitting 3 folds for each of 30 candidates, totalling 90 fits
[LightGBM] [Info] Number of positive: 227, number of negative: 150941
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.043233 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 7650
[LightGBM] [Info] Number of data points in the train set: 151168, number of used features: 30
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.001502 -> initscore=-6.499694
[LightGBM] [Info] Start training from score -6.499694
[LightGBM] [Info] Number of positive: 228, number of negative: 150941
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.017419 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 7650
[LightGBM] [Info] Number of data points in the train set: 151169, number of used features: 30
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.001508 ->
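
The `pavg` and `initscore` values in the log above are consistent with LightGBM starting each fold from the log-odds of the fraud rate in that fold. As a quick check, and not a statement about LightGBM internals, the sketch below recomputes both numbers from the counts reported for the first fold.

```python
import math

# Counts copied from the first fold's log lines above
n_pos, n_neg = 227, 150941

pavg = n_pos / (n_pos + n_neg)            # share of fraud rows in the fold
init_score = math.log(pavg / (1 - pavg))  # log-odds of that share

print(f"pavg      = {pavg:.6f}")
print(f"initscore = {init_score:.6f}")
```

The strongly negative starting score reflects how rare the positive class is: before any tree is added, the model's baseline prediction is a fraud probability of roughly 0.15%.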

## Conclusions: CatBoost and LightGBM Models

1. **General Performance**
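   - Both the CatBoost and LightGBM models reached **accuracy of ~99.96–99.98%** and **ROC AUC above 0.997**. Accuracy alone says little here: with roughly 664 legitimate transactions per fraud case (the `scale_pos_weight = 663.96` printed above), always predicting "legit" would already score about 99.85%, so the fraud-class precision, recall, and F1 reported below are the more informative figures.
   - These results place them among the **top-performing models**, comparable to the best XGBoost variants.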

2. **CatBoost Tuned Model (`catboost_tuned_eval.pkl`)**
   - Achieved **precision = 0.94**, **recall = 0.97**, and **F1-score = 0.95** on the fraud class.
   - The model made only **3 false negatives** and **6 false positives** out of ~57k samples.
   - ROC AUC = **0.9976**, indicating very strong discrimination capability.
   - **Excellent balance** between catching fraud and minimizing false alarms.

3. **LightGBM Tuned Model (`lightgbm_tuned_eval.pkl`)**
   - Achieved the same **recall = 0.97**, but noticeably lower **precision = 0.81** on the fraud class.
   - Produced more false positives (**21**, versus 6 for CatBoost), which pulled its F1-score down to 0.88.
   - ROC AUC = **0.9980**, marginally higher than CatBoost's 0.9976.
   - **Strong recall with moderate precision**; this may suit settings where missing fraud costs more than investigating false alarms.

4. **Conclusion**
   - Both models are **highly effective** and are strong candidates for production use.
   - **CatBoost** matches LightGBM on recall (0.97) while achieving higher **precision** (0.94 vs 0.81), making it the better-balanced of the two.
   - **LightGBM** may still be preferred where model speed matters more than precision; it does not offer a recall advantage in these results.
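
The F1-scores quoted above follow from the reported precision and recall through the harmonic mean, `F1 = 2 * P * R / (P + R)`. As a consistency check, the sketch below recomputes them from the rounded figures in this section; `f1_from_pr` is an illustrative helper name, and because the inputs are already rounded to two decimals the results are only approximate.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded fraud-class metrics as reported in the conclusions above
reported = {
    "CatBoost": {"precision": 0.94, "recall": 0.97, "f1": 0.95},
    "LightGBM": {"precision": 0.81, "recall": 0.97, "f1": 0.88},
}

for name, m in reported.items():
    f1 = f1_from_pr(m["precision"], m["recall"])
    print(f"{name}: recomputed F1 = {f1:.4f} (reported {m['f1']:.2f})")
```

Both recomputed values round to the reported F1-scores, and since recall is identical for the two models, the F1 gap comes entirely from the difference in precision.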