# FinRisk: Credit Risk & Fraud Detection - Notebook 7
## Phase 4: Portfolio Analysis & Deployment

**Objectives:**
1.  **Portfolio Stress Testing:** Simulate the impact of an adverse economic scenario on our loan portfolio's default rate.
2.  **Model Governance & Deployment:** Outline the framework for versioning, deploying, and serving our models as a real-time API.
3.  **Monitoring & Reporting:** Define the key metrics for a business dashboard to track model performance and portfolio health.

In [3]:
# ==============================================================================
# 1. Import Libraries
# ==============================================================================
import pandas as pd
import numpy as np
import os
import joblib
import matplotlib.pyplot as plt
import seaborn as sns

# Settings
import warnings
warnings.filterwarnings('ignore')


sns.set_style("whitegrid")

print("Libraries imported successfully.")

Libraries imported successfully.


## 2. Portfolio Stress Testing

Stress testing is a forward-looking analysis used to determine the stability of a financial institution under unfavorable economic conditions. We will simulate a recessionary scenario and measure its impact on the predicted default rate of our entire portfolio using our trained credit risk model.

**Scenario:** A moderate recession hits, causing:
* A **15% decrease** in annual incomes across the board.
* A **20% increase** in the debt-to-income ratio for all customers due to increased living costs.
* A uniform **25-point drop** in credit scores as financial situations tighten.
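
To make the shock concrete, here is a minimal sketch that applies the three adjustments to a single applicant. The applicant's values, the `SCENARIO` mapping, and the `apply_shock` helper are hypothetical and exist only for illustration; the feature names are the portfolio columns shocked in the stress test cell below.

```python
# Hypothetical encoding of the scenario: each feature is either scaled or shifted.
SCENARIO = {
    "annual_income_profile": ("scale", 0.85),  # 15% decrease
    "debt_to_income_ratio": ("scale", 1.20),   # 20% increase
    "credit_score_profile": ("shift", -25),    # 25-point drop
}


def apply_shock(record, scenario):
    """Return a stressed copy of one applicant record (the input is not modified)."""
    stressed = dict(record)
    for feature, (kind, amount) in scenario.items():
        if kind == "scale":
            stressed[feature] = record[feature] * amount
        elif kind == "shift":
            stressed[feature] = record[feature] + amount
        else:
            raise ValueError(f"Unknown shock type: {kind}")
    return stressed


# Illustrative applicant; these values are made up for the example.
applicant = {
    "annual_income_profile": 60000.0,
    "debt_to_income_ratio": 0.30,
    "credit_score_profile": 700,
}
stressed_applicant = apply_shock(applicant, SCENARIO)
for feature, value in stressed_applicant.items():
    print(f"{feature}: {applicant[feature]} -> {value:.2f}")
```

Keeping the scenario in a mapping like this is one way to run several scenarios (mild, moderate, severe) through the same code path.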

In [None]:
# --- Load Model and Data ---
MODELS_PATH = '../models/'
PROCESSED_DATA_PATH = '../data/processed/'

try:
    # Load our champion credit model
    credit_model = joblib.load(os.path.join(MODELS_PATH, 'champion_credit_model.joblib'))
    
    # Load the full feature set for our portfolio
    portfolio_df = pd.read_csv(os.path.join(PROCESSED_DATA_PATH, 'credit_risk_features.csv'))
    
    print("Model and portfolio data loaded.")
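except FileNotFoundError as exc:
    # Raise explicitly: `assert` statements are skipped when Python runs with -O
    raise FileNotFoundError("Model or data not found. Please run previous notebooks.") from exc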

# --- Define the Stress Test Function ---
# Identifier, date, target, and status columns that are not model inputs
NON_FEATURE_COLUMNS = [
    'application_id', 'customer_id', 'application_date', 'last_activity_date',
    'default_flag', 'application_status', 'city'
]

def run_stress_test(df, model):
    """Applies a predefined economic shock to the portfolio and predicts the new default rate."""

    # 1. Baseline Prediction: mean predicted default probability across the portfolio
    X_baseline = df.drop(columns=NON_FEATURE_COLUMNS)
    baseline_default_rate = model.predict_proba(X_baseline)[:, 1].mean()

    # 2. Apply Economic Shock (on a copy, so the caller's DataFrame is untouched)
    df_stressed = df.copy()
    df_stressed['annual_income_profile'] *= 0.85  # 15% decrease
    df_stressed['debt_to_income_ratio'] *= 1.20   # 20% increase
    df_stressed['credit_score_profile'] -= 25     # 25-point drop

    # 3. Post-Shock Prediction
    X_stressed = df_stressed.drop(columns=NON_FEATURE_COLUMNS)
    stressed_default_rate = model.predict_proba(X_stressed)[:, 1].mean()

    return baseline_default_rate, stressed_default_rate

# --- Execute and Report Results ---
baseline_rate, stressed_rate = run_stress_test(portfolio_df, credit_model)

print("\n--- Stress Test Results ---")
print(f"Baseline Predicted Portfolio Default Rate: {baseline_rate:.2%}")
print(f"Stressed Predicted Portfolio Default Rate:  {stressed_rate:.2%}")

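# Relative change in the predicted default rate (not a percentage-point difference)
increase_percentage = ((stressed_rate - baseline_rate) / baseline_rate) * 100
print(f"\nThis represents a {increase_percentage:.2f}% relative increase in the expected default rate under the stress scenario.")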

## 3. Model Governance & Deployment

Deploying a model into production requires a robust framework to ensure it is versioned, auditable, and can be served reliably.

### Model Governance
This involves:
* **Version Control:** Every trained model should be versioned. Our `champion_credit_model.joblib` should be named `champion_credit_model_v1.0.joblib`. When we retrain, the new model becomes `v1.1`. This ensures reproducibility.
* **Model Registry:** In a mature system, a tool like **MLflow** would be used. It acts as a central repository, logging model versions, parameters, performance metrics, and the data it was trained on, providing a full audit trail for regulatory compliance.
* **Documentation:** Every model version must be accompanied by documentation detailing its purpose, features used, performance metrics, and limitations.
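
As a lightweight illustration of the versioning and documentation points above, the sketch below writes a JSON "model card" next to a serialized model file. It uses only the standard library. The `write_model_card` helper, the field names, and the sidecar-file convention are assumptions made for this example, not an existing project utility; a registry such as MLflow would normally take over this role.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def write_model_card(artifact_path, version, metrics, features):
    """Write a JSON sidecar describing one serialized model file and return its path."""
    artifact = Path(artifact_path)
    record = {
        "artifact": artifact.name,
        "version": version,
        # The hash ties this record to one exact file, so silent overwrites are detectable.
        "sha256": hashlib.sha256(artifact.read_bytes()).hexdigest(),
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "metrics": metrics,    # e.g. validation AUC / KS, supplied by the caller
        "features": features,  # names of the model's input columns
    }
    # champion_credit_model_v1.0.joblib -> champion_credit_model_v1.0.json
    card_path = artifact.with_suffix(".json")
    card_path.write_text(json.dumps(record, indent=2))
    return card_path
```

Calling this right after `joblib.dump(...)` for each new version would leave a small audit record alongside every artifact, which can later be compared against the file's hash before loading it in production.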

### Deployment as a Real-Time API
The `real_time_scoring_api` function from our previous notebook needs to be wrapped in a web framework to become a live endpoint. **FastAPI** is an excellent modern choice due to its high performance.

**Example `main.py` using FastAPI:**
```python
# main.py - This would be a separate file, not run in the notebook

from fastapi import FastAPI, HTTPException
import joblib
import pandas as pd

# Initialize the API
app = FastAPI(title="FinRisk Scoring API")

# Load models on startup
credit_model = joblib.load('models/champion_credit_model.joblib')
# ... load other necessary data or models ...

# Define the input data structure (using Pydantic)
# ... Pydantic model definition would go here ...

@app.post("/score")
def score_customer(customer_data: dict):
    """
    API endpoint to score a customer for credit risk.
    (Fraud scoring is omitted from this simplified example.)
    """
    # 1. Convert incoming JSON data to a single-row DataFrame
    input_df = pd.DataFrame([customer_data])
    
    # 2. Predict using the loaded model
    # (Note: This is a simplified example. The full feature lookup would be needed)
    try:
        probability = credit_model.predict_proba(input_df)[:, 1][0]
    except Exception as e:
        # Signal failure with an HTTP error status rather than a 200 response
        # carrying an error body; 400 assumes the input was the cause.
        raise HTTPException(status_code=400, detail=str(e))

    # Cast the NumPy scalar to a plain float so it serializes cleanly to JSON
    return {
        "customer_id": customer_data.get("customer_id"),
        "default_probability": float(probability),
    }

```

## 4. Monitoring & Business Dashboard

Once deployed, the model's job has just begun. Continuous monitoring is essential to detect performance degradation or data drift. This information would be fed into a business dashboard (e.g., using Tableau, Power BI, or Plotly Dash).

**Key Dashboard Metrics:**

1.  **Model Performance:**
    * **AUC / Gini / KS Over Time:** Track our key metrics on a weekly or monthly basis to see if the model's predictive power is decaying.
    * **Actual vs. Predicted Default Rate:** Compare the model's predictions to what actually happened.

2.  **API Performance:**
    * **Latency:** Track the average API response time to ensure it stays below the 100ms target.
    * **Uptime:** Monitor API availability (target: 99.9%).
    * **Error Rate:** Track the percentage of API calls that fail.

3.  **Data Drift:**
    * **Feature Distribution:** Plot the distribution of key input features (e.g., `credit_score_profile`, `debt_to_income_ratio`) for incoming applications and compare it to the distribution in the training data. A significant shift indicates that the model is scoring data unlike what it was trained on, and a retrain may be necessary.
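
One common way to put a number on the feature-distribution comparison is the Population Stability Index (PSI). The sketch below is a minimal standard-library version, assuming decile bins built from the reference (training) sample; the function name, the bin count, and the `eps` floor are illustrative choices rather than project settings.

```python
import math
from bisect import bisect_right
from statistics import quantiles


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a reference sample (e.g. training data) and a recent sample."""
    # Interior bin edges are quantiles of the reference sample (n_bins - 1 cut points).
    edges = quantiles(expected, n=n_bins)

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            # bisect_right maps a value to the index of the bin it falls into.
            counts[bisect_right(edges, v)] += 1
        # Floor each share at eps so the logarithm is defined for empty bins.
        return [max(c / len(values), eps) for c in counts]

    expected_shares = bin_shares(expected)
    actual_shares = bin_shares(actual)
    return sum(
        (a - e) * math.log(a / e)
        for a, e in zip(actual_shares, expected_shares)
    )
```

In practice this would be called once per monitored feature, passing the training column as `expected` and the most recent batch of applications as `actual`. A widely used rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, though the thresholds should be calibrated to the portfolio.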

**Simulating Data for a Monitoring Dashboard:**
```python
# Simulate generating a weekly performance report
def generate_weekly_report(df, model, week_number):
    """Simulates calculating the KS statistic for a week's worth of data."""
    # For simulation, we'll just sample the data
    weekly_data = df.sample(n=1000, random_state=week_number)
    
    X_week = weekly_data.drop(columns=[
        'application_id', 'customer_id', 'application_date', 'last_activity_date',
        'default_flag', 'application_status', 'city'
    ])
    y_true_week = weekly_data['default_flag']
    
    probs = model.predict_proba(X_week)[:, 1]
    
    data = pd.DataFrame({'y_true': y_true_week, 'y_pred_proba': probs})
    data = data.sort_values(by='y_pred_proba', ascending=False)
    data['cumulative_good'] = (1 - data['y_true']).cumsum() / (1 - data['y_true']).sum()
    data['cumulative_bad'] = data['y_true'].cumsum() / data['y_true'].sum()
    ks_statistic = np.max(np.abs(data['cumulative_bad'] - data['cumulative_good']))
    
    return {"Week": week_number, "KS Statistic": ks_statistic * 100}

# Generate reports for 12 weeks
monitoring_data = [generate_weekly_report(portfolio_df, credit_model, i) for i in range(1, 13)]
monitoring_df = pd.DataFrame(monitoring_data)

# Plot the results
plt.figure(figsize=(12, 6))
sns.lineplot(data=monitoring_df, x='Week', y='KS Statistic', marker='o')
plt.axhline(40, color='r', linestyle='--', label='Success Threshold (KS > 40)')
plt.title('Weekly Model Performance Monitoring (KS Statistic)')
plt.ylabel('KS Statistic')
plt.xlabel('Week Number')
plt.legend()
plt.ylim(0, 100)
plt.grid(True)
plt.show()

print(monitoring_df)
```
