# 📊 Task 4: Analysis, Comparison, and Future Steps

This final report synthesizes the results from the Predictive Deep Learning (DL) Model (Task 2) and the Prescriptive Offline Reinforcement Learning (RL) Agent (Task 3).

---

## 1. Key Results Summary

| Metric | Model 1: Deep Learning (DL) | Model 2: Offline RL (CQL) |
| :--- | :--- | :--- |
| **Primary Goal** | Predict Default Risk (Probability) | Maximize Financial Return (Policy) |
| **Key Metric** | **ROC AUC Score** | **Estimated Policy Value** |
| **Metric Value** | **0.7329** | **\$212.50** (per loan) |
| F1-Score (Class 1) | 0.2348 | N/A |
| Baseline (Historical) | N/A | **-\$1,806.30** (per loan) |

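As a rough illustration of how the two dollar figures in the table relate, the sketch below computes a historical baseline and a policy value on toy data. It assumes, and the report does not confirm, that every logged loan was approved, that a denial earns a reward of 0, and that each loan's realized reward is already known. The records, the `fico` cutoff, and the function names are hypothetical, and the report's \$212.50 figure may instead come from the CQL agent's own value estimates.

```python
from statistics import mean

# Hypothetical logged loans. Every one was approved historically, so its
# realized reward (profit or loss in dollars) is known.
logged = [
    {"fico": 720, "realized_reward": 1_000.0},
    {"fico": 640, "realized_reward": -10_000.0},
    {"fico": 650, "realized_reward": 480.0},
    {"fico": 600, "realized_reward": -5_000.0},
]


def historical_baseline(loans):
    """Mean realized reward per loan when every logged loan was approved."""
    return mean(loan["realized_reward"] for loan in loans)


def policy_value(loans, approve):
    """Mean reward per loan if `approve(loan)` decides; a denial earns 0 (assumption)."""
    return mean(
        loan["realized_reward"] if approve(loan) else 0.0
        for loan in loans
    )


baseline = historical_baseline(logged)
# Hypothetical stand-in policy: approve only applicants with FICO >= 650.
value = policy_value(logged, approve=lambda loan: loan["fico"] >= 650)
print(f"baseline: {baseline:.2f} per loan, policy: {value:.2f} per loan")
```
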
---

## 2. Explaining the Metrics

The choice of metrics reflects the fundamental difference in model objectives:

### Why AUC vs. Policy Value

| Metric | DL Model (0.7329 AUC) | RL Agent (\$212.50 Policy Value) |
| :--- | :--- | :--- |
| **What it Measures** | **Quality of Ranking.** AUC is the probability that the model scores a randomly chosen defaulter as riskier than a randomly chosen non-defaulter. It is a passive predictor of *what* will happen. | **Expected Profit.** This is the average expected reward (profit or loss) per loan if the learned policy is deployed. It belongs to a prescriptive decision-maker concerned with *what to do* to maximize profit. |
| **Justification** | AUC suits **prediction** tasks with **imbalanced data** because it does not depend on a decision threshold or on the class ratio. An AUC of 0.7329 indicates the model ranks risk clearly better than chance (0.5), even though the F1-Score (0.2348) is low, which is common when actual defaults are rare. | This is the ultimate **business metric**. The goal isn't just to predict default; it's to increase profit. This metric directly quantifies the financial impact of the agent's Approve/Deny policy. |

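To make the AUC versus F1 contrast concrete, here is a minimal pure-Python sketch on toy data. The labels, scores, and thresholds are invented for illustration, and the report does not state which threshold produced its F1-Score. The sketch only shows that a model can rank defaulters well (high AUC) while its F1 at one fixed threshold is poor.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_at_threshold(y_true, scores, threshold=0.5):
    """F1 for the positive class after thresholding the scores."""
    pred = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, y_true))
    fp = sum(p == 1 and y == 0 for p, y in zip(pred, y_true))
    fn = sum(p == 0 and y == 1 for p, y in zip(pred, y_true))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


# Toy data: 2 defaults among 10 loans. The defaulters get relatively high
# scores, but neither score reaches 0.5.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.12, 0.15, 0.20, 0.22, 0.25, 0.45, 0.40, 0.48]

print("AUC:", roc_auc(y_true, scores))
print("F1 at 0.5:", f1_at_threshold(y_true, scores, 0.5))
print("F1 at 0.3:", f1_at_threshold(y_true, scores, 0.3))
```
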
---

## 3. Policy Comparison and Disagreement

The RL agent is designed to prioritize **profit** over simple prediction, leading to strategic disagreements with the DL model.

### Case Study: The "Risky but Profitable" Borrower

| Profile | DL Model's Decision | RL Agent's Decision | Explanation |
| :--- | :--- | :--- | :--- |
| **Low Principal, High Interest** (e.g., \$2,000 at 24% with low FICO/high DTI) | **DENY** | **APPROVE** | The **DL Model** (ROC AUC focus) sees the low FICO and high DTI, predicts a high default probability (e.g., 70%), and recommends denial. The **RL Agent** (Estimated Value focus) weighs the potential loss of principal (≈ -\$2,000) against the interest income at 24%. It approves when it judges this small, high-return loan to be an **acceptable gamble** that raises its average expected profit across the portfolio. |

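Whether such a gamble pays off depends on the exact reward definition and on the true default probability. The sketch below works through the arithmetic under an assumed reward that the report does not spell out: an approved loan earns `loan_amnt * int_rate` if repaid and loses `loan_amnt` if it defaults. Under that assumption, a \$2,000 loan at 24% has positive expected reward only while the default probability stays below roughly 19%, so a 70% estimate alone would not justify approval. The function names are hypothetical.

```python
def expected_reward(loan_amnt, int_rate, p_default):
    """Expected reward of approving under the ASSUMED reward:
    +loan_amnt * int_rate if repaid, -loan_amnt if defaulted."""
    return (1 - p_default) * loan_amnt * int_rate - p_default * loan_amnt


def break_even_default_prob(int_rate):
    """Default probability at which the expected reward is exactly zero."""
    return int_rate / (1 + int_rate)


for p in (0.10, 0.19, 0.70):
    ev = expected_reward(2_000, 0.24, p)
    decision = "APPROVE" if ev > 0 else "DENY"
    print(f"p_default={p:.2f}  expected reward={ev:9.2f}  -> {decision}")

print(f"break-even default probability: {break_even_default_prob(0.24):.4f}")
```
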
---

## 4. Final Review and Next Steps

### Justification of Reward Function

The reward function was chosen to enforce the primary objective: **maximize profit, not just minimize error.** It ties the reward to **Loan Amount** and **Interest Rate**, so that profitable loans are rewarded and large losses are penalized on the same dollar scale.
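
One plausible shape for such a reward is sketched below. The exact formula is an assumption, since the report only says the reward depends on loan amount and interest rate; the `reward` function, its action labels, and its zero reward for denials are hypothetical.

```python
def reward(action, loan_amnt, int_rate, defaulted):
    """Hypothetical per-loan reward in dollars.

    deny              -> 0 (no gain, no loss)
    approve + repaid  -> loan_amnt * int_rate (interest earned)
    approve + default -> -loan_amnt (principal lost)
    """
    if action == "deny":
        return 0.0
    if defaulted:
        return -float(loan_amnt)
    return loan_amnt * int_rate


print(reward("approve", 10_000, 0.12, defaulted=False))
print(reward("approve", 10_000, 0.12, defaulted=True))
print(reward("deny", 10_000, 0.12, defaulted=True))
```

Under this form a single default on a 12% loan wipes out the interest earned on roughly eight repaid loans of the same size, which is why the agent has to be selective about approvals.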

**Long-Term Business Risk:**  
The single biggest risk is that the reward function **ignores the loan duration (`term`)**. By treating a 36-month loan and a 60-month loan as equally profitable, the policy underestimates the increased **time-based risk** of longer loans. This bias could lead the deployed policy to overload the portfolio with high-risk, long-term products, resulting in systemic losses over time.

### Future Work

1. **Deployment:** Conduct a small, controlled **A/B test** of the RL policy on new loan applications to validate the \$212.50 estimate with real-world financial data.  
2. **Reward Function Refinement:** Integrate the **Time Value of Money (TVM)** into the reward and include a **risk penalty** for longer-duration loans.  
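3. **Data Strategy:** Source data on **denied applications** so the RL agent can be trained and evaluated on the full applicant population, not only on loans that were historically approved. This helps it reason about counterfactual decisions and mitigates the selection bias in the current training set.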
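
The reward refinement in item 2 could look something like the sketch below. It is only one possible design under stated assumptions: the loan is fully amortizing with level monthly payments, `int_rate` is an annual rate, a default loses the whole principal with no recovery, and the discount rate and per-year term penalty are placeholder values. The names `monthly_payment` and `tvm_reward` are hypothetical.

```python
def monthly_payment(loan_amnt, int_rate, term_months):
    """Level payment on a fully amortizing loan; int_rate is an annual rate."""
    r = int_rate / 12
    if r == 0:
        return loan_amnt / term_months
    return loan_amnt * r / (1 - (1 + r) ** -term_months)


def tvm_reward(loan_amnt, int_rate, term_months, defaulted,
               annual_discount_rate=0.05, term_penalty_per_year=0.0):
    """Discounted reward for an approved loan, minus an optional term penalty."""
    # Hypothetical risk penalty: a fraction of principal per year of term.
    penalty = term_penalty_per_year * (term_months / 12) * loan_amnt
    if defaulted:
        # Simplification: the full principal is lost and nothing is recovered.
        return -float(loan_amnt) - penalty
    d = annual_discount_rate / 12
    pay = monthly_payment(loan_amnt, int_rate, term_months)
    if d == 0:
        present_value = pay * term_months
    else:
        # Present value of a level annuity of `pay` over `term_months`.
        present_value = pay * (1 - (1 + d) ** -term_months) / d
    return present_value - loan_amnt - penalty


r36 = tvm_reward(10_000, 0.12, 36, defaulted=False)
r60 = tvm_reward(10_000, 0.12, 60, defaulted=False)
r60_penalized = tvm_reward(10_000, 0.12, 60, defaulted=False,
                           term_penalty_per_year=0.02)
print(f"36 months: {r36:.2f}  60 months: {r60:.2f}  60 months with penalty: {r60_penalized:.2f}")
```

Unlike a reward based only on loan amount and interest rate, this version gives 36-month and 60-month loans different values, and the penalty parameter lets the longer term be discouraged explicitly.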
