# 📊 Customer Churn Prediction Report: Transformer vs Random Forest (Evaluation and Recommendation)

## 📁 Dataset Summary

- **Binary Target Variable**: Churn (Yes/No)
- **Class Imbalance**: Slight imbalance → handled using **SMOTE**
- **Gender Distribution**: Equal distribution (Male ≈ Female)
- **Outliers**: No outliers detected in continuous variables (via boxplots)
- **Correlation**: 
  - `MonthlyCharges` and `TotalCharges` → positively correlated.
  - Customers with higher `MonthlyCharges` tend to **churn more**.
- **Tenure Insights**:
  - Churn is **very high during the first year**.
  - Very **low churn** beyond 4 years → indicates customer **loyalty increases with tenure**.
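
The report does not show how SMOTE was applied, so the following is only a minimal sketch of the idea, assuming oversampling is done on the training split alone so that the test set keeps the real class ratio. `smote_oversample`, the toy `X_train`/`y_train` values, and `k=2` are hypothetical; a real pipeline would more likely use the `SMOTE` class from `imbalanced-learn` than this hand-rolled version.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by SMOTE-style interpolation.

    minority: list of numeric feature vectors (at least two rows).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base row
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # The new point lies on the segment between base and neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy training split (made-up values): columns are [tenure, MonthlyCharges]
X_train = [[1, 30.0], [2, 35.0], [60, 20.0], [48, 25.0], [55, 22.0], [70, 19.0],
           [3, 90.0], [5, 85.0], [2, 95.0]]
y_train = [0, 0, 0, 0, 0, 0, 1, 1, 1]

minority_rows = [x for x, y in zip(X_train, y_train) if y == 1]
n_needed = y_train.count(0) - y_train.count(1)
new_rows = smote_oversample(minority_rows, n_needed, k=2)

X_balanced = X_train + new_rows
y_balanced = y_train + [1] * len(new_rows)
print(y_balanced.count(0), y_balanced.count(1))  # 6 6
```

Because every synthetic row is an interpolation between two real churners, the new points stay inside the region the minority class already occupies.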

---

## 🧠 Model 1: Transformer-Based Model (TabTransformer)

### ⚙️ Architecture Overview
- Utilizes **self-attention layers** to capture dependencies among categorical features.
- Categorical features are **embedded** into dense vectors.
- These embeddings are **processed by Transformer blocks** (multi-head attention + feed-forward layers).
- Outputs are concatenated with normalized numerical features and passed through an MLP head.
- Trained using `Binary Cross-Entropy Loss` with the `Adam` optimizer.
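
The architecture described above can be sketched roughly as follows in PyTorch. This is not the report's actual implementation: the class name `TabTransformerSketch`, the layer sizes, the use of `LayerNorm` for the numerical features, and `BCEWithLogitsLoss` (binary cross-entropy applied to raw logits) are all assumptions made for illustration, and the inputs are random tensors.

```python
import torch
import torch.nn as nn


class TabTransformerSketch(nn.Module):
    def __init__(self, cardinalities, n_numeric, d_model=32, n_heads=4, n_layers=2):
        super().__init__()
        # One embedding table per categorical column
        self.embeddings = nn.ModuleList(
            [nn.Embedding(n_cat, d_model) for n_cat in cardinalities]
        )
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.norm_numeric = nn.LayerNorm(n_numeric)
        head_in = len(cardinalities) * d_model + n_numeric
        self.head = nn.Sequential(
            nn.Linear(head_in, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, x_cat, x_num):
        # x_cat: (batch, n_categorical) integer codes
        # x_num: (batch, n_numeric) floats
        tokens = torch.stack(
            [emb(x_cat[:, i]) for i, emb in enumerate(self.embeddings)], dim=1
        )  # (batch, n_categorical, d_model)
        # Self-attention mixes information across the categorical columns
        context = self.encoder(tokens).flatten(start_dim=1)
        features = torch.cat([context, self.norm_numeric(x_num)], dim=1)
        return self.head(features).squeeze(-1)  # one logit per row


torch.manual_seed(0)
cardinalities = [2, 3, 4]  # hypothetical category counts per column
model = TabTransformerSketch(cardinalities, n_numeric=3)

# Random stand-in batch of 8 customers
x_cat = torch.stack([torch.randint(0, c, (8,)) for c in cardinalities], dim=1)
x_num = torch.randn(8, 3)
y = torch.randint(0, 2, (8,)).float()

loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step
logits = model(x_cat, x_num)
loss = loss_fn(logits, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Only the categorical embeddings pass through the Transformer blocks; the numerical features join at the MLP head, which matches the flow in the bullets above.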

### 📈 Performance
| Metric         | Class 0 (No) | Class 1 (Yes) |
|----------------|-------------|---------------|
| Precision      | 0.83        | 0.66          |
| Recall         | 0.91        | 0.50          |
| F1-Score       | 0.87        | 0.57          |

- **Accuracy**: 0.80  
- **Macro F1**: 0.72  
- **Weighted F1**: 0.79  
- **ROC AUC Score**: 0.8156 ✅

> 🔎 **Observations**:  
> - **Strong overall accuracy and precision**, especially for class 0 (non-churners).
> - Recall for churners reaches **0.50**, which still leaves half of actual churners undetected.
> - ROC AUC of **0.82** indicates good separation capability.
> - Still underperforms in detecting churners compared to Random Forest.
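
As a quick consistency check, the F1 and macro F1 figures above can be recomputed from the reported precision and recall. The helper `f1` is a hypothetical name; the inputs are the table values. Weighted F1 is left out because it needs the per-class sample counts, which the report does not list.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Per-class (precision, recall) as reported for the Transformer
transformer = {"No": (0.83, 0.91), "Yes": (0.66, 0.50)}

f1_scores = {label: f1(p, r) for label, (p, r) in transformer.items()}
macro_f1 = sum(f1_scores.values()) / len(f1_scores)  # unweighted class mean

for label, score in f1_scores.items():
    print(f"F1 ({label}): {score:.2f}")
print(f"Macro F1: {macro_f1:.2f}")
# F1 (No): 0.87
# F1 (Yes): 0.57
# Macro F1: 0.72
```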

---

## 🌲 Model 2: Random Forest Classifier

### ⚙️ Config
```python
RandomForestClassifier(
    max_depth=10,
    max_features='log2',
    min_samples_leaf=2,
    n_estimators=200,
    random_state=0
)
```

### 📈 Performance
| Metric         | Class 0 (No) | Class 1 (Yes) |
|----------------|-------------|---------------|
| Precision      | 0.89        | 0.54          |
| Recall         | 0.77        | 0.72          |
| F1-Score       | 0.83        | 0.62          |

- **Accuracy**: 0.76  
- **Macro F1**: 0.72  
- **Weighted F1**: 0.77  

> 🔎 **Observations**:  
> - Slightly **lower overall accuracy** than the Transformer (0.76 vs 0.80).
> - **Higher recall for churners (0.72)** than the Transformer (0.50), which is critical for identifying at-risk customers.
> - More balanced F1 scores across the two classes (0.83 and 0.62, versus 0.87 and 0.57 for the Transformer).
> - Offers interpretability through feature importance.

---

## 🔍 Feature Importance from Random Forest

| Rank | Feature                            | Importance |
|------|------------------------------------|------------|
| 1    | `tenure`                           | 0.1361     |
| 2    | `PaymentMethod_Electronic check`   | 0.1081     |
| 3    | `TotalCharges`                     | 0.1077     |
| 4    | `MonthlyCharges`                   | 0.0988     |
| 5    | `InternetService_Fiber optic`      | 0.0981     |
| 6    | `tenure_range`                     | 0.0866     |
| 7    | `Contract_Two year`                | 0.0802     |

> 📌 Key Insight:
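> - **Tenure** is the top predictive feature, consistent with the tenure pattern noted in the dataset summary.
> - **Payment method** (electronic check) and **monthly charges** also rank highly. Importance scores show how much the model relies on a feature, not the direction of its effect on churn.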
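
A ranking like the one above can be read from a fitted forest's `feature_importances_` attribute in scikit-learn. The sketch below reuses the hyperparameters from the config section but fits on synthetic data from `make_classification`, so the feature names and resulting scores are placeholders, not the report's values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the encoded churn features (not the report's data)
X, y = make_classification(n_samples=500, n_features=7, n_informative=4,
                           random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

model = RandomForestClassifier(
    max_depth=10,
    max_features="log2",
    min_samples_leaf=2,
    n_estimators=200,
    random_state=0,
)
model.fit(X, y)

# Impurity-based importances: non-negative and summing to 1 over all features
ranking = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for rank, (name, importance) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {importance:.4f}")
```

Since the scores sum to 1 across every feature, the seven rows in the table cover only part of the total; the remainder is spread over features that are not listed.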

---

## ✅ Recommendation

While both models perform well, the **Random Forest** model remains the **preferred choice** for churn prediction:

### ✔ Why Random Forest?
- **Higher recall (0.72) for churners** — crucial for preventing revenue loss.
- Balanced performance with interpretability.
- Useful for **feature importance analysis** to inform business strategies.

### ⚠️ Considerations for Transformer:
- Better overall accuracy (0.80 vs 0.76), but it still **lags behind in identifying churners** (recall 0.50 vs 0.72).

> 📝 **Final Verdict**:  
> For a churn prediction task, **recall for the positive class (churn)** is most important.  
> **Random Forest** provides a **better trade-off** between recall, interpretability, and real-world utility.
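
One way to make the recall-first argument concrete is an F-beta score, which weights recall more heavily than precision when beta is above 1. The choice of beta = 2 here is an illustrative assumption, not a figure from the evaluation, and `f_beta` is a hypothetical helper name.

```python
def f_beta(precision, recall, beta):
    """F-beta score; beta > 1 favours recall, beta = 1 gives the usual F1."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Churn-class (Yes) precision and recall from the two performance tables
churn_metrics = {
    "Transformer": (0.66, 0.50),
    "Random Forest": (0.54, 0.72),
}

scores = {
    name: (f_beta(p, r, 1), f_beta(p, r, 2))
    for name, (p, r) in churn_metrics.items()
}
for name, (f1_score, f2_score) in scores.items():
    print(f"{name}: F1={f1_score:.3f}  F2={f2_score:.3f}")
# Transformer: F1=0.569  F2=0.525
# Random Forest: F1=0.617  F2=0.675
```

Under this weighting the gap between the models widens in favour of Random Forest, which is the same conclusion the verdict reaches qualitatively.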
