### Model Building and Training
1. Data Preparation
2. Build Baseline Model
3. Build Ensemble Model
4. Cross-Validation (recommended)
5. Model Comparison and Selection

## Import Libraries

In [1]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import sys
sys.path.append('..')
from src.modeling import ModelTrainer

model_trainer = ModelTrainer()

## Load feature and target values data

In [2]:
X = np.load('../Data/processed/x_credit.npy', allow_pickle=True)
y = np.load('../Data/processed/y_credit.npy', allow_pickle=True)
type(X), type(y)

(numpy.ndarray, numpy.ndarray)

## Data Preprocessing

In [3]:
# STRATIFIED TRAIN-TEST SPLIT
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
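The effect of `stratify=y` can be checked on a small example. The sketch below uses synthetic labels with a 1% positive rate (an assumption for illustration, not the credit data) and compares the positive rate in each split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, illustrative data: 100 positives among 10,000 rows (1%).
rng = np.random.default_rng(42)
y_demo = np.zeros(10_000, dtype=int)
y_demo[:100] = 1
X_demo = rng.normal(size=(10_000, 3))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42
)

# With stratify=y_demo, both splits should keep roughly the original positive rate.
train_rate = y_tr.mean()
test_rate = y_te.mean()
print(f"train positive rate: {train_rate:.4f}")
print(f"test positive rate:  {test_rate:.4f}")
```

Without `stratify`, a rare class can end up over- or under-represented in the test split purely by chance.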

## Baseline Model: Logistic Regression


In [4]:
lr = LogisticRegression(solver='saga',
                        max_iter=3000,
                        class_weight="balanced",
                        random_state=42,
                        n_jobs=-1)
lr = model_trainer.train_model(lr, X_train, y_train) # Train model
y_pred_lr, y_proba_lr = model_trainer.predict(lr, X_test) # Predictions
metrics_lr = model_trainer.evaluate_model(y_test, y_pred_lr, y_proba_lr) # Evaluation

F1-score: 0.10246913580246914
AUC-PR: 0.7346170640457217
Confusion Matrix:
 [[55209  1442]
 [   12    83]]
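The reported F1-score can be recomputed from the confusion matrix, which also shows where the low value comes from. This sketch assumes the matrix uses the scikit-learn layout (rows are true classes, columns are predicted classes); `evaluate_model` itself is not shown in this notebook.

```python
# Confusion matrix printed above, assumed to be in scikit-learn layout:
# rows are true classes, columns are predicted classes.
tn, fp = 55209, 1442
fn, tp = 12, 83

precision = tp / (tp + fp)  # share of flagged transactions that are fraud
recall = tp / (tp + fn)     # share of fraud cases that were flagged
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Under that assumption, recall is high but precision is very low, and the harmonic mean in F1 is dominated by the smaller of the two.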




## Build Ensemble Model: Random Forest

In [11]:

rf = RandomForestClassifier(random_state=42,
                            n_estimators=200, 
                            max_depth=10,
                            class_weight="balanced",
                            min_samples_split=5,
                            n_jobs=-1)



## Train Final Model on Full Training Set

In [12]:
rf = model_trainer.train_model(rf, X_train, y_train) # Train model

## Evaluate ONCE on Test Set

In [13]:
y_pred_rf, y_proba_rf = model_trainer.predict(rf, X_test) # Predictions
metrics_rf = model_trainer.evaluate_model(y_test, y_pred_rf, y_proba_rf) # Evaluation

F1-score: 0.8
AUC-PR: 0.7758886978338179
Confusion Matrix:
 [[56641    10]
 [   25    70]]


## Stratified K-Fold Cross-Validation (k=5)

In [None]:
_, lr_cv_results = model_trainer.cross_validation(lr, X_train, y_train)
_, rf_cv_results = model_trainer.cross_validation(rf, X_train, y_train)
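`ModelTrainer.cross_validation` lives in `src.modeling` and is not shown here. As a rough idea of what such a helper might do, the sketch below runs stratified 5-fold cross-validation with scikit-learn and returns the same result keys used in the next cell. The function name `cross_validate_sketch`, the use of average precision as the AUC-PR estimate, and the synthetic data are all assumptions; the real method also returns a first value that this sketch omits.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, f1_score
from sklearn.model_selection import StratifiedKFold


def cross_validate_sketch(model, X, y, n_splits=5, random_state=42):
    """Hypothetical stand-in for ModelTrainer.cross_validation."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    f1_scores, auc_pr_scores = [], []
    for train_idx, val_idx in skf.split(X, y):
        fold_model = clone(model)  # fresh, unfitted copy for every fold
        fold_model.fit(X[train_idx], y[train_idx])
        y_pred = fold_model.predict(X[val_idx])
        y_proba = fold_model.predict_proba(X[val_idx])[:, 1]
        f1_scores.append(f1_score(y[val_idx], y_pred))
        # Average precision is used here as the AUC-PR estimate (assumption).
        auc_pr_scores.append(average_precision_score(y[val_idx], y_proba))
    return {
        "f1_mean": float(np.mean(f1_scores)),
        "f1_std": float(np.std(f1_scores)),
        "auc_pr_mean": float(np.mean(auc_pr_scores)),
        "auc_pr_std": float(np.std(auc_pr_scores)),
    }


# Synthetic imbalanced data for illustration only (about 5% positives).
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.95], random_state=42
)
results = cross_validate_sketch(
    LogisticRegression(max_iter=1000, class_weight="balanced"), X_demo, y_demo
)
print(results)
```

Cloning the estimator inside the loop matters when an already fitted model is passed in: each fold then trains from scratch on its own training indices.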

In [None]:
cv_results = {
    'Model':['Logistic Regression', 'Random Forest'],
    'F1 Score (mean ± std)':[
        f"{lr_cv_results['f1_mean']:.4f} ± {lr_cv_results['f1_std']:.4f}",
        f"{rf_cv_results['f1_mean']:.4f} ± {rf_cv_results['f1_std']:.4f}",
    ],
    'AUC-PR (mean ± std)':[
        f"{lr_cv_results['auc_pr_mean']:.4f} ± {lr_cv_results['auc_pr_std']:.4f}",
        f"{rf_cv_results['auc_pr_mean']:.4f} ± {rf_cv_results['auc_pr_std']:.4f}"
    ]
}
comparison_df = pd.DataFrame(cv_results)
print(comparison_df) 

                 Model F1 Score (mean ± std) AUC-PR (mean ± std)
0  Logistic Regression       0.1087 ± 0.0127     0.7505 ± 0.0288
1        Random Forest       0.8333 ± 0.0291     0.8120 ± 0.0269


In [None]:
recommended_model = "Random Forest" if rf_cv_results['f1_mean'] > lr_cv_results['f1_mean'] else "Logistic Regression"
print(f"Recommended model based on CV F1-score: {recommended_model}")


Recommended model based on CV F1-score: Random Forest


## Save model

In [None]:
import joblib
joblib.dump(rf, "../models/credit_model.joblib")
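A quick round-trip check is a cheap way to confirm that a saved model loads and predicts as expected. The sketch below uses a small illustrative model and a temporary directory rather than the notebook's `rf` and `../models/` path.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Small illustrative model; in the notebook, `rf` would take its place.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 4))
y_demo = (X_demo[:, 0] > 0).astype(int)
model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "credit_model.joblib")  # temporary demo path
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions.
same_predictions = np.array_equal(model.predict(X_demo), restored.predict(X_demo))
print(same_predictions)
```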



In [4]:
np.save('../Data/processed/y_credit_test.npy', y_test)
np.save('../Data/processed/x_credit_test.npy', X_test)
np.save('../Data/processed/x_credit_train.npy', X_train)

## Model Comparison and Recommendation
## Evaluation Strategy

Models were evaluated with stratified 5-fold cross-validation on the training set, so every fold keeps the original class ratio and the performance estimates do not depend on a single split.
Because fraud is rare (95 of the 56,746 transactions in the test set), F1-score and AUC-PR (area under the precision-recall curve) were selected as the primary evaluation metrics:

- F1-score balances precision and recall, which is critical for minimizing both missed fraud cases and false alarms.
- AUC-PR provides a more informative assessment than ROC-AUC for rare-event classification problems.
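One way to see why AUC-PR is the more demanding metric here: a model with no skill still scores about 0.5 on ROC-AUC, while its AUC-PR falls to roughly the fraud prevalence. The sketch below illustrates this with random scores on labels that have the same class counts as the test set; average precision is used as the AUC-PR estimate, which is an assumption about how the notebook's metric is computed.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# Labels with the test set's class counts (95 fraud cases among 56,746
# transactions); the scores are pure noise, i.e. a model with no skill.
rng = np.random.default_rng(0)
y_true = np.zeros(56_746, dtype=int)
y_true[:95] = 1
random_scores = rng.random(y_true.size)

prevalence = y_true.mean()
roc_auc = roc_auc_score(y_true, random_scores)
auc_pr = average_precision_score(y_true, random_scores)

print(f"prevalence:               {prevalence:.4f}")
print(f"ROC-AUC of random scores: {roc_auc:.3f}")
print(f"AUC-PR of random scores:  {auc_pr:.4f}")
```

Against a baseline that low, the AUC-PR values reported for both models represent a large lift over chance.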

### Model Performance Summary (Cross-Validation)

| Model | F1 Score (mean ± std) | AUC-PR (mean ± std) |
|---|---|---|
| Logistic Regression | 0.1087 ± 0.0127 | 0.7505 ± 0.0288 |
| Random Forest | 0.8333 ± 0.0291 | 0.8120 ± 0.0269 |

## Model Interpretation
### Logistic Regression

- Serves as an interpretable baseline: the model is linear, and its coefficients show the direction and weight of each feature.
- Achieves a cross-validated AUC-PR of 0.7505, so it ranks transactions by fraud risk reasonably well.
- Its F1-score is very low (0.1087 in cross-validation) because precision is poor: on the test set it flags 83 of 95 fraud cases but also raises 1,442 false alarms, a precision of about 5%.
- As a linear model, it struggles with complex, non-linear fraud patterns.
- Best suited for explainability and benchmarking rather than deployment.

### Random Forest

- Outperforms Logistic Regression on both cross-validated metrics: F1-score 0.8333 vs 0.1087 and AUC-PR 0.8120 vs 0.7505.
- Captures non-linear relationships and feature interactions, which are common in fraudulent behavior.
- Performs consistently across folds (standard deviation of about 0.03 on both metrics), and its test-set F1-score of 0.80 is in line with the cross-validated estimate, which suggests the model generalizes.
- Achieves a much better balance between detecting fraud and limiting false alarms: on the test set it flags 70 of 95 fraud cases with only 10 false positives (precision 0.875, recall about 0.74).
- While less inherently interpretable, it can be explained using SHAP-based feature importance.
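SHAP requires the separate `shap` package, so the sketch below swaps in a different, simpler technique that ships with scikit-learn: permutation importance. It is not SHAP and gives only global (not per-prediction) explanations, but it shows the general idea of ranking features by how much the model relies on them. The synthetic data and model settings are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic imbalanced data for illustration only (about 10% positives).
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=6, n_informative=3, n_redundant=0,
    weights=[0.9], random_state=42,
)
model = RandomForestClassifier(
    n_estimators=50, class_weight="balanced", random_state=42
).fit(X_demo, y_demo)

# Shuffle one feature at a time and measure the drop in average precision.
result = permutation_importance(
    model, X_demo, y_demo, scoring="average_precision",
    n_repeats=5, random_state=42,
)

# Rank features from most to least important.
ranking = np.argsort(result.importances_mean)[::-1]
for idx in ranking:
    print(f"feature {idx}: {result.importances_mean[idx]:.4f}")
```

In practice the importance would be computed on held-out data rather than on the training rows used here for brevity.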

## Final Model Recommendation
### ✅ Selected Model: Random Forest

Justification:

- Achieves the highest F1-score (0.8333 vs 0.1087 in cross-validation). On the test set it misses more fraud cases than Logistic Regression (25 vs 12 of 95) but cuts false alarms from 1,442 to 10; both error types are costly in fraud detection, and F1 weighs them together.
- Produces a higher AUC-PR (0.8120 vs 0.7505), the more informative ranking metric for highly imbalanced datasets.
- Shows consistent cross-validation performance, indicating robustness and reduced overfitting risk.
- Better suited for real-world fraud detection systems, where predictive performance is prioritized.

Logistic Regression remains valuable as a baseline and explanatory model, but Random Forest provides the best overall trade-off between detection accuracy, robustness, and practical applicability.