# 🏦 Credit Risk Modeling — Model Training Workflow

This section documents the model training workflow for the Credit Risk Modeling project after feature engineering is completed.

---

## ✅ Objective

Train machine learning models that are **robust to outliers** on the final engineered dataset. Outliers are **not removed**; instead, they are **flagged in an `is_outlier` column**.

---

## 📊 Data Preparation

- Dataset includes engineered features and an `is_outlier` flag (`1` = outlier, `0` = normal).
- Outliers are retained to preserve data distribution and potential signal.
- **Target Variable:** `TARGET`  
  - `1` → Default  
  - `0` → Non-default
- Features are selected based on domain knowledge and feature-importance analysis.
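
The preparation steps above can be sketched as follows. This is a minimal illustration, not the project's actual loading code: the DataFrame is synthetic, the feature names (`income`, `credit_amount`, `age_years`) are hypothetical, and the 2% quantile rule used to fill `is_outlier` is only a placeholder for whatever flagging rule the feature engineering stage applied. Only the `TARGET` and `is_outlier` column names come from this document.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered dataset (feature names are hypothetical).
rng = np.random.default_rng(42)
n_rows = 1_000
df = pd.DataFrame({
    "income": rng.lognormal(mean=10.0, sigma=1.0, size=n_rows),
    "credit_amount": rng.lognormal(mean=11.0, sigma=0.8, size=n_rows),
    "age_years": rng.integers(21, 70, size=n_rows),
})

# Placeholder flagging rule (an assumption): mark the highest incomes as outliers.
df["is_outlier"] = (df["income"] > df["income"].quantile(0.98)).astype(int)

# Imbalanced binary target: 1 = default, 0 = non-default.
df["TARGET"] = (rng.random(n_rows) < 0.10).astype(int)

# Keep every row, outliers included, and keep `is_outlier` as a model feature.
X = df.drop(columns=["TARGET"])
y = df["TARGET"]

# Stratify so both splits keep the same default rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```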

---

## ⚙️ Modeling Strategy

### 🧱 Baseline Model

- **Logistic Regression with Regularization (L2 penalty)**  
  Establishes a performance benchmark against which the other models are compared.
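
A baseline of this kind might look like the sketch below. It assumes scikit-learn and uses synthetic data from `make_classification` in place of the real dataset; the `StandardScaler` step and the values `C=1.0` and `max_iter=1000` are illustrative choices, not settings taken from the project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the engineered dataset (roughly 10% positives).
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    n_clusters_per_class=1, weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale first: the L2 penalty treats all coefficients alike, so features
# should be on comparable scales.
baseline = make_pipeline(
    StandardScaler(),
    LogisticRegression(C=1.0, max_iter=1000),  # L2 is scikit-learn's default penalty
)
baseline.fit(X_train, y_train)

# Score with the predicted probability of the positive (default) class.
baseline_auc = roc_auc_score(y_test, baseline.predict_proba(X_test)[:, 1])
print(f"Baseline ROC-AUC: {baseline_auc:.3f}")
```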

---

### 🌳 Robust Models Trained

| Model         | Robust to Outliers | Notes                                                     |
|---------------|--------------------|-----------------------------------------------------------|
| Decision Tree | ✅ Yes             | Simple and interpretable                                  |
| Random Forest | ✅ Yes             | Ensemble of decision trees                                |
| XGBoost       | ✅ Yes             | Gradient boosting framework                               |
| LightGBM      | ✅ Yes             | Efficient gradient boosting                               |
| CatBoost      | ✅ Yes             | Handles categorical features                              |
| Ridge / Lasso | ⚠️ Moderate        | L2 / L1 regularized linear models; require feature scaling |

> ❌ Models such as plain (unregularized) Logistic Regression, KNN, and SVM (RBF kernel) are generally **sensitive to outliers** and are either avoided or handled with care.
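
Training several of the tree-based models from the table in one loop could be sketched as below. The example assumes scikit-learn, uses synthetic data, and covers only the Decision Tree and Random Forest; the hyperparameter values are illustrative. XGBoost, LightGBM, and CatBoost ship scikit-learn-compatible classifiers (`XGBClassifier`, `LGBMClassifier`, `CatBoostClassifier`) that could be added to the same dictionary if those packages are installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the engineered dataset.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Tree-based models split on feature thresholds, so extreme feature values
# do not need to be scaled or removed before training.
models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    results[name] = roc_auc_score(y_test, proba)

# Print models from best to worst ROC-AUC.
for name, auc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: ROC-AUC = {auc:.3f}")
```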

---

## 🧮 Evaluation Metrics

Models are evaluated using the following metrics:

- **ROC-AUC Score**
- **Precision, Recall, F1-Score**
- **Confusion Matrix**
- **PR AUC (area under the Precision-Recall curve)**: recommended for imbalanced classification problems, where ROC-AUC alone can look optimistic.
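
The metrics listed above can be computed with scikit-learn roughly as follows. This is a sketch on synthetic data with a stand-in Random Forest; the 0.5 decision threshold is an assumption, and `average_precision_score` is used here as one common summary of the precision-recall curve rather than a trapezoidal area.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    average_precision_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the engineered dataset.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Ranking metrics use predicted probabilities; threshold metrics use hard labels.
y_score = model.predict_proba(X_test)[:, 1]
y_pred = (y_score >= 0.5).astype(int)  # 0.5 threshold is an assumption

roc_auc = roc_auc_score(y_test, y_score)
pr_auc = average_precision_score(y_test, y_score)
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class

print(f"ROC-AUC: {roc_auc:.3f}")
print(f"PR AUC (average precision): {pr_auc:.3f}")
print("Confusion matrix:")
print(cm)
# Precision, recall, and F1 per class.
print(classification_report(y_test, y_pred, digits=3, zero_division=0))
```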
---

## ✅ Next Steps

- Perform hyperparameter tuning (e.g., GridSearchCV, Optuna)
- Compare the models and rank them by AUC (ROC and PR) and F1
- Select the best-performing model for deployment or integration into risk scoring systems
- Optionally explore SHAP or LIME for model explainability
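
As an illustration of the tuning step, a small `GridSearchCV` run might look like the sketch below. The model, parameter grid, fold count, and the `average_precision` scoring choice are all assumptions made for the example; the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic, imbalanced stand-in for the engineered dataset.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)

# Deliberately tiny grid so the sketch runs quickly; a real search would be wider.
param_grid = {
    "max_depth": [3, 6],
    "min_samples_leaf": [1, 20],
}

search = GridSearchCV(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid=param_grid,
    scoring="average_precision",  # PR-based score, suited to imbalanced targets
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated average precision: {search.best_score_:.3f}")
```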

---
