# 🏦⭐⭐⭐ Complete Use Case: Credit Scoring

<div style="background-color: #fff3e0; padding: 20px; border-radius: 5px; border-left: 5px solid #ff6f00;">
<b>🔥 CRITICAL CASE STUDY - REAL PRODUCTION SCENARIO</b><br>
<b>Level:</b> Advanced - Production Ready<br>
<b>Duration:</b> 45-60 minutes<br>
<b>Dataset:</b> Credit Scoring (synthetic realistic)<br>
<b>Importance:</b> ⭐⭐⭐ ESSENTIAL - Real-world regulated application
</div>

---

## 🎯 Objectives

By the end of this notebook, you will be able to:
- ✅ Execute a complete end-to-end ML validation workflow
- ✅ Check a model against regulatory fairness requirements (fair lending laws, EEOC 80% Rule)
- ✅ Test all validation dimensions (robustness, fairness, uncertainty)
- ✅ Generate professional audit documentation
- ✅ Make data-driven deployment decisions
- ✅ Apply this workflow to your own projects

---

## 📚 Table of Contents

1. [Business Context & Problem](#context)
2. [Regulatory Requirements](#regulatory)
3. [Data Preparation](#data)
4. [Exploratory Data Analysis](#eda)
5. [Model Training](#training)
6. [Baseline Validation](#baseline)
7. [Regulatory Validation - Fairness](#fairness)
8. [Robustness Testing](#robustness)
9. [Uncertainty Quantification](#uncertainty)
10. [Drift Detection](#drift)
11. [Generate Audit Reports](#reports)
12. [Deployment Decision](#decision)
13. [Production Monitoring Plan](#monitoring)
14. [Conclusion](#conclusion)

<a id="context"></a>
## 1. 📖 Business Context & Problem

### 🏦 The Scenario

You are a **Senior ML Engineer** at **SecureBank**, a mid-sized financial institution.

**Your Task:**
> "We need an automated credit scoring model to decide loan approvals. The current manual process is slow, inconsistent, and expensive. Your model must be accurate, fair, robust, and compliant with all regulations before we deploy it."
> 
> -- VP of Risk Management

### 💰 Business Impact

- **Cost Savings:** $2M/year from automation
- **Processing Time:** 3 days → 3 seconds
- **Consistency:** Reduce reviewer-to-reviewer variation and manual errors
- **Risk:** Model must not discriminate or fail unexpectedly

### ⚠️ Stakes Are High!

**If you deploy a bad model:**
- 💸 **Fines:** $10M+ from regulators (e.g., CFPB, OCC)
- ⚖️ **Lawsuits:** Class action from applicants who were discriminated against
- 📉 **Reputation:** Brand damage, customer loss
- 🚫 **License:** Risk of losing operating license

**If you validate properly:**
- ✅ Safe deployment
- ✅ Regulatory approval
- ✅ Business value realized
- ✅ Your career advances!

### 🎯 Success Criteria

1. ✅ **Performance:** ROC AUC ≥ 0.85, Accuracy ≥ 0.80
2. ✅ **Fairness:** Pass EEOC 80% Rule (gender, race, age)
3. ✅ **Robustness:** Score ≥ 0.85 (resist manipulation attempts)
4. ✅ **Uncertainty:** Quantified for high-risk decisions
5. ✅ **Drift:** Monitoring plan for production
6. ✅ **Documentation:** Complete audit trail
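
The numeric criteria above (1 to 3) can be written down as data, so that the final deployment decision is a mechanical check rather than a judgment made from memory. The sketch below is one possible layout. The dictionary keys, the `evaluate_criteria` helper, and the example numbers are illustrative assumptions, not part of DeepBridge and not results from this notebook.

```python
# Thresholds copied from the success criteria list; the key names are assumptions.
THRESHOLDS = {
    "roc_auc": 0.85,
    "accuracy": 0.80,
    "disparate_impact": 0.80,
    "robustness_score": 0.85,
}


def evaluate_criteria(measured, thresholds=THRESHOLDS):
    """Return {criterion: passed}; a metric that was never measured counts as failed."""
    return {
        name: (name in measured and measured[name] >= minimum)
        for name, minimum in thresholds.items()
    }


# Placeholder values for illustration only.
example = {"roc_auc": 0.91, "accuracy": 0.84, "disparate_impact": 0.78}
report = evaluate_criteria(example)

for name, passed in report.items():
    print(f"{name:<18} {'PASS' if passed else 'FAIL'}")
print("Deployable:", all(report.values()))
```

Treating a missing metric as a failure is a deliberate choice here: a validation step that was skipped should block deployment just like one that failed.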

**Let's get started!** 🚀

<a id="regulatory"></a>
## 2. ⚖️ Regulatory Requirements

### 🇺🇸 US Federal Laws

Your model MUST comply with:

#### 1. Equal Credit Opportunity Act (ECOA)
- ❌ **Prohibited:** Discrimination based on race, color, religion, national origin, sex, marital status, age
- ✅ **Required:** Adverse action notices, recordkeeping

#### 2. Fair Housing Act
- ❌ **Prohibited:** Housing credit discrimination
- ✅ **Required:** Equal access to credit

#### 3. Fair Credit Reporting Act (FCRA)
- ✅ **Required:** Accurate credit data, consumer rights

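#### 4. EEOC Guidelines - 80% Rule
- 📐 **Formula:** `Selection Rate(Unprivileged) / Selection Rate(Privileged) ≥ 0.80`
- 🎯 **Disparate Impact:** A ratio below 0.80 is treated as evidence of disparate impact and must be investigated!
- ⚠️ **Consequence:** An unexplained disparity can lead to regulatory action
- ℹ️ **Origin:** The rule (also called the four-fifths rule) comes from EEOC employment-selection guidelines; this notebook uses it as a screening threshold for lending decisions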
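
As a worked example of the formula above, the following minimal sketch computes the ratio from raw counts. The function name `disparate_impact_ratio` and the applicant counts are made up for illustration; they do not come from DeepBridge or from this notebook's dataset.

```python
def disparate_impact_ratio(approved_unpriv, total_unpriv, approved_priv, total_priv):
    """Selection rate of the unprivileged group divided by that of the privileged group."""
    if total_unpriv <= 0 or total_priv <= 0:
        raise ValueError("group sizes must be positive")
    rate_unpriv = approved_unpriv / total_unpriv
    rate_priv = approved_priv / total_priv
    if rate_priv == 0:
        raise ValueError("privileged selection rate is zero; the ratio is undefined")
    return rate_unpriv / rate_priv


# Hypothetical counts: 270 of 500 approved vs. 360 of 500 approved.
ratio = disparate_impact_ratio(270, 500, 360, 500)
print(f"Selection rates: {270 / 500:.2f} vs {360 / 500:.2f}")
print(f"Disparate impact ratio: {ratio:.2f}")
print("PASS" if ratio >= 0.80 else "FAIL")
```

With these counts the selection rates are 0.54 and 0.72, so the ratio is 0.75 and the check fails even though more than half of the unprivileged group was approved.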

### 🔍 Regulators Watching You

1. **CFPB** - Consumer Financial Protection Bureau
2. **OCC** - Office of the Comptroller of the Currency
3. **FDIC** - Federal Deposit Insurance Corporation
4. **FTC** - Federal Trade Commission

### 📋 Model Documentation Requirements

You must document:
- ✅ Model development process
- ✅ Training data characteristics
- ✅ Validation methodology
- ✅ Fairness testing results
- ✅ Ongoing monitoring plan
- ✅ Model governance procedures

### 🎓 Key Principle

> **"Regulation doesn't ban ML in credit - it requires responsible, validated, fair ML!"**

**DeepBridge helps you achieve this!** ✅

<a id="data"></a>
## 3. 📊 Data Preparation

### Setup

In [None]:
# Standard libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from pathlib import Path
from datetime import datetime
import joblib

# sklearn
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (
    accuracy_score, roc_auc_score, f1_score, precision_score, recall_score,
    confusion_matrix, classification_report, roc_curve, auc
)

# DeepBridge - Your validation toolkit!
from deepbridge import DBDataset, Experiment

# Visualization
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette("Set2")
%matplotlib inline

# Settings
warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', None)
pd.set_option('display.precision', 3)

# Reproducibility
RANDOM_STATE = 42
np.random.seed(RANDOM_STATE)

print("✅ Setup complete!")
print(f"📅 Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("🏦 Project: SecureBank Credit Scoring Model Validation")

### Generate Realistic Credit Scoring Dataset

For this demo, we'll create a synthetic but realistic dataset that mirrors real credit data.

In [None]:
print("🏗️ Generating realistic credit scoring dataset...\n")

np.random.seed(RANDOM_STATE)
n_samples = 5000

# Generate realistic features
df = pd.DataFrame({
    # Demographic (some are protected attributes!)
    'age': np.random.randint(21, 70, n_samples),
    'gender': np.random.choice(['Male', 'Female'], n_samples, p=[0.52, 0.48]),
    'race': np.random.choice(
        ['White', 'Black', 'Hispanic', 'Asian', 'Other'],
        n_samples,
        p=[0.60, 0.13, 0.18, 0.06, 0.03]
    ),
    
    # Financial features (legitimate for credit decisions)
    'annual_income': np.random.gamma(2, 30000, n_samples).clip(15000, 500000),
    'credit_score': np.random.randint(300, 850, n_samples),
    'employment_years': np.random.randint(0, 40, n_samples),
    'loan_amount': np.random.gamma(2, 15000, n_samples).clip(1000, 100000),
    'debt_to_income': np.random.beta(2, 5, n_samples),
    'num_credit_lines': np.random.randint(1, 15, n_samples),
    'num_delinquencies': np.random.poisson(0.5, n_samples),
    'has_mortgage': np.random.choice([0, 1], n_samples, p=[0.6, 0.4]),
    'has_cosigner': np.random.choice([0, 1], n_samples, p=[0.7, 0.3]),
    'employment_type': np.random.choice(
        ['Full-time', 'Part-time', 'Self-employed', 'Retired'],
        n_samples,
        p=[0.65, 0.15, 0.15, 0.05]
    ),
})

# Create realistic target (loan approval)
# Use legitimate features + small unintentional bias
approval_score = (
    # Legitimate factors (strong influence)
    (df['credit_score'] - 500) / 100 * 0.25 +
    (df['annual_income'] / 100000) * 0.20 +
    (df['employment_years'] / 20) * 0.12 +
    (1 - df['debt_to_income']) * 0.15 +
    (1 - df['num_delinquencies'] / 5) * 0.10 +
    df['has_cosigner'] * 0.08 +
    
    # Unintentional bias (small, but exists - we'll detect and fix!)
    (df['gender'] == 'Male') * 0.03 +  # ← Historical bias
    (df['race'] == 'White') * 0.02      # ← Historical bias
)

# Convert to binary with noise
df['approved'] = (approval_score + np.random.normal(0, 0.12, n_samples) > 0.52).astype(int)

print(f"✅ Dataset created: {df.shape}")
print(f"\n📊 Approval rate: {df['approved'].mean():.1%}")
print("\n📋 Features:")
print(f"   Total: {len(df.columns) - 1}")
print("   Protected attributes: gender, race, age")
print("   Legitimate credit factors: income, credit_score, employment, etc.")

<a id="eda"></a>
## 4. 📊 Exploratory Data Analysis

### Dataset Overview

In [None]:
print("📊 DATASET OVERVIEW")
print("=" * 80)

print(f"\nShape: {df.shape}")
print(f"\nData types:")
print(df.dtypes.value_counts())

print(f"\nMissing values: {df.isnull().sum().sum()}")

print(f"\nFirst 5 rows:")
display(df.head())

### Statistical Summary

In [None]:
# Numerical features
print("📈 NUMERICAL FEATURES SUMMARY")
display(df.describe())

### Target Distribution

In [None]:
# Approval distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Count (sort_index keeps the bars in 0, 1 order so they match the tick labels)
approval_counts = df['approved'].value_counts().sort_index()
approval_counts.plot(kind='bar', ax=axes[0], color=['coral', 'lightgreen'],
                     edgecolor='black', alpha=0.8)
axes[0].set_title('Loan Approval Distribution', fontsize=13, fontweight='bold')
axes[0].set_xlabel('Approved', fontsize=11)
axes[0].set_ylabel('Count', fontsize=11)
axes[0].set_xticklabels(['Rejected (0)', 'Approved (1)'], rotation=0)
axes[0].grid(axis='y', alpha=0.3)

# Percentage (same 0, 1 order, so coral = rejected and green = approved)
approval_pct = df['approved'].value_counts(normalize=True).sort_index() * 100
approval_pct.plot(kind='pie', ax=axes[1], autopct='%1.1f%%',
                  colors=['coral', 'lightgreen'], startangle=90)
axes[1].set_title('Approval Rate', fontsize=13, fontweight='bold')
axes[1].set_ylabel('')

plt.tight_layout()
plt.show()

print("\n📊 Approval Statistics:")
print(f"   Approved: {df['approved'].sum()} ({df['approved'].mean():.1%})")
print(f"   Rejected: {(1-df['approved']).sum()} ({(1-df['approved'].mean()):.1%})")

### Check for Disparities in Raw Data

⚠️ **CRITICAL:** Before training, check whether the data itself already contains disparities!

In [None]:
print("⚠️  CHECKING FOR DISPARITIES IN RAW DATA")
print("=" * 80)

# By gender
print("\n👥 APPROVAL RATE BY GENDER:")
gender_approval = df.groupby('gender')['approved'].agg(['mean', 'count'])
gender_approval['mean'] = gender_approval['mean'] * 100
display(gender_approval)

# Calculate disparate impact
male_rate = df[df['gender'] == 'Male']['approved'].mean()
female_rate = df[df['gender'] == 'Female']['approved'].mean()
di_gender = female_rate / male_rate if male_rate > 0 else 0

print(f"\n   Disparate Impact (Gender): {di_gender:.3f}")
print(f"   {'✅ PASS' if di_gender >= 0.8 else '❌ FAIL'} EEOC 80% Rule")

# By race
print("\n🌍 APPROVAL RATE BY RACE:")
race_approval = df.groupby('race')['approved'].agg(['mean', 'count'])
race_approval['mean'] = race_approval['mean'] * 100
race_approval = race_approval.sort_values('mean', ascending=False)
display(race_approval)

# Check disparate impact for each race vs highest
max_race = race_approval['mean'].idxmax()
max_rate = race_approval.loc[max_race, 'mean'] / 100

print(f"\n   Reference group (highest): {max_race} ({max_rate:.1%})")
print("\n   Disparate Impact by Race:")
for race in race_approval.index:
    rate = race_approval.loc[race, 'mean'] / 100
    di = rate / max_rate if max_rate > 0 else 0
    status = "✅" if di >= 0.8 else "❌"
    print(f"      {status} {race}: DI = {di:.3f}")
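
The success criteria also list age as a protected attribute, but the cell above covers only gender and race. One possible way to extend the same check to a numeric attribute is to bin it first. The sketch below runs on a tiny hand-made frame rather than `df`; the helper name `disparate_impact_by_group` and the age bands are illustrative assumptions, not regulatory definitions.

```python
import pandas as pd


def disparate_impact_by_group(frame, group_col, outcome_col):
    """Approval rate per group and its ratio to the group with the highest rate."""
    rates = frame.groupby(group_col, observed=True)[outcome_col].mean()
    result = pd.DataFrame({"approval_rate": rates, "di_vs_max": rates / rates.max()})
    result["passes_80pct"] = result["di_vs_max"] >= 0.8
    return result


# Toy data for illustration only.
toy = pd.DataFrame({
    "age":      [23, 27, 31, 38, 44, 47, 52, 58, 63, 68],
    "approved": [1, 0, 1, 1, 1, 1, 0, 1, 0, 0],
})

# Right-inclusive bins: (20, 40], (40, 60], (60, 70].
toy["age_band"] = pd.cut(toy["age"], bins=[20, 40, 60, 70],
                         labels=["21-40", "41-60", "61-70"])

age_report = disparate_impact_by_group(toy, "age_band", "approved")
print(age_report)
```

In the notebook the same helper could be called with `df` and a binned `age` column. The choice of bands changes the result, so it should be fixed and documented before looking at the numbers.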

### Feature Distributions

In [None]:
# Key financial features
fig, axes = plt.subplots(2, 3, figsize=(16, 10))
axes = axes.ravel()

features_to_plot = ['credit_score', 'annual_income', 'debt_to_income', 
                     'employment_years', 'loan_amount', 'num_credit_lines']

for i, feature in enumerate(features_to_plot):
    axes[i].hist(df[feature], bins=30, color='steelblue', edgecolor='black', alpha=0.7)
    axes[i].set_title(f'{feature}', fontsize=12, fontweight='bold')
    axes[i].set_xlabel(feature)
    axes[i].set_ylabel('Frequency')
    axes[i].grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

### Correlation Analysis

In [None]:
# Select numerical features
num_features = ['age', 'annual_income', 'credit_score', 'employment_years',
                'loan_amount', 'debt_to_income', 'num_credit_lines', 
                'num_delinquencies', 'approved']

plt.figure(figsize=(12, 10))
correlation = df[num_features].corr()
sns.heatmap(correlation, annot=True, fmt='.2f', cmap='coolwarm', center=0,
            square=True, linewidths=1, cbar_kws={"shrink": 0.8})
plt.title('Feature Correlation Matrix', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

# Features most correlated with approval (drop the target's self-correlation)
print("\n🎯 FEATURES MOST CORRELATED WITH APPROVAL:")
approval_corr = correlation['approved'].drop('approved').sort_values(ascending=False)
print(approval_corr.head(10))

<div style="background-color: #d4edda; padding: 15px; border-radius: 5px; border-left: 5px solid #28a745;">
<b>✅ EDA Complete!</b> We understand the data. Key findings:
<ul>
<li>The dataset is synthetic, with 5,000 applicants</li>
<li>Both classes are well represented (see the approval rate printed above)</li>
<li>Some disparities exist in the raw data (expected in real scenarios, and injected on purpose here)</li>
<li>Credit score, income, and debt_to_income are strong predictors</li>
</ul>
</div>

<a id="training"></a>
## 5. 🤖 Model Training

### Prepare Features for Modeling

In [None]:
print("🔧 Preparing features for modeling...\n")

# Encode categorical features
df_encoded = df.copy()

# Gender encoding
df_encoded['gender_enc'] = (df['gender'] == 'Male').astype(int)

# Race encoding (one-hot)
df_encoded['race_White'] = (df['race'] == 'White').astype(int)
df_encoded['race_Black'] = (df['race'] == 'Black').astype(int)
df_encoded['race_Hispanic'] = (df['race'] == 'Hispanic').astype(int)
df_encoded['race_Asian'] = (df['race'] == 'Asian').astype(int)

# Employment type encoding
df_encoded['emp_fulltime'] = (df['employment_type'] == 'Full-time').astype(int)
df_encoded['emp_parttime'] = (df['employment_type'] == 'Part-time').astype(int)
df_encoded['emp_self'] = (df['employment_type'] == 'Self-employed').astype(int)

# Feature list
feature_cols = [
    'age', 'annual_income', 'credit_score', 'employment_years',
    'loan_amount', 'debt_to_income', 'num_credit_lines', 'num_delinquencies',
    'has_mortgage', 'has_cosigner',
    'gender_enc',  # ← Note: Including for now, will test fairness!
    'race_White', 'race_Black', 'race_Hispanic', 'race_Asian',
    'emp_fulltime', 'emp_parttime', 'emp_self'
]

X = df_encoded[feature_cols]
y = df_encoded['approved']

print("✅ Features prepared!")
print(f"   Total features: {len(feature_cols)}")
print(f"   Samples: {len(X)}")

### Train/Test Split

In [None]:
# Stratified split
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=RANDOM_STATE,
    stratify=y
)

print("✅ Data split:")
print(f"   Train: {X_train.shape} (approval rate: {y_train.mean():.1%})")
print(f"   Test: {X_test.shape} (approval rate: {y_test.mean():.1%})")
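
`stratify=y` keeps the approval rate equal across the two splits, but it does not guarantee that small protected groups are proportionally represented in the test set, which matters for the per-group checks later on. One option, sketched here on toy data, is to stratify on a combined key of group and label. The column names and sizes below are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "feature": rng.normal(size=400),
    "group": rng.choice(["A", "B"], size=400, p=[0.9, 0.1]),
    "approved": rng.integers(0, 2, size=400),
})

# Combined key such as "A_1": every (group, label) cell is split 80/20.
strata = toy["group"] + "_" + toy["approved"].astype(str)

train, test = train_test_split(toy, test_size=0.2, random_state=0, stratify=strata)

print("Share of group B overall: ", round((toy["group"] == "B").mean(), 3))
print("Share of group B in test: ", round((test["group"] == "B").mean(), 3))
```

This only works while every combined cell has at least two rows; with several protected attributes the cells can become too small, and stratifying on the target alone is then the fallback.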

### Train Primary Model - Random Forest

In [None]:
print("🌲 Training Random Forest Classifier...\n")

# Model configuration
model = RandomForestClassifier(
    n_estimators=200,
    max_depth=12,
    min_samples_split=10,
    min_samples_leaf=5,
    random_state=RANDOM_STATE,
    n_jobs=-1
)

# Train
model.fit(X_train, y_train)

print("✅ Model trained!")
print(f"   Algorithm: RandomForestClassifier")
print(f"   Trees: {model.n_estimators}")
print(f"   Max depth: {model.max_depth}")

<a id="baseline"></a>
## 6. ✅ Baseline Validation

### Performance Metrics

In [None]:
# Predictions
y_pred_train = model.predict(X_train)
y_pred_test = model.predict(X_test)
y_proba_train = model.predict_proba(X_train)[:, 1]
y_proba_test = model.predict_proba(X_test)[:, 1]

# Calculate metrics
metrics = {
    'Accuracy': [accuracy_score(y_train, y_pred_train), accuracy_score(y_test, y_pred_test)],
    'ROC AUC': [roc_auc_score(y_train, y_proba_train), roc_auc_score(y_test, y_proba_test)],
    'F1 Score': [f1_score(y_train, y_pred_train), f1_score(y_test, y_pred_test)],
    'Precision': [precision_score(y_train, y_pred_train), precision_score(y_test, y_pred_test)],
    'Recall': [recall_score(y_train, y_pred_train), recall_score(y_test, y_pred_test)]
}

metrics_df = pd.DataFrame(metrics, index=['Train', 'Test']).T

print("📊 BASELINE PERFORMANCE METRICS")
print("=" * 60)
display(metrics_df.style.format("{:.3f}").background_gradient(cmap='RdYlGn', axis=1))

# Check overfitting
overfit = (metrics_df['Train'] - metrics_df['Test']).abs()
print("\n💡 Overfitting Analysis:")
print(f"   Average Train-Test gap: {overfit.mean():.3f}")
if overfit.mean() < 0.05:
    print("   ✅ Good generalization!")
else:
    print("   ⚠️  Some overfitting detected")
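
A single train/test gap depends on one particular split. `cross_val_score`, already imported in the setup cell, gives a spread across folds instead of a single number. The sketch below uses `make_classification` as stand-in data so that it runs on its own; in the notebook the same call would presumably take `X_train` and `y_train` with the model configuration from the training section.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stand-in data, not the credit dataset.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=42
)

clf = RandomForestClassifier(n_estimators=50, max_depth=6, random_state=42)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# One ROC AUC value per fold; each fold is scored on data the model did not see.
scores = cross_val_score(clf, X_demo, y_demo, cv=cv, scoring="roc_auc")

print(f"ROC AUC per fold: {scores.round(3)}")
print(f"Mean = {scores.mean():.3f}, std = {scores.std():.3f}")
```

A large standard deviation across folds is a warning sign in its own right, even when the mean clears the 0.85 target.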

### Confusion Matrix

In [None]:
# Confusion matrix (labels fixed so the 2x2 layout is guaranteed)
cm = confusion_matrix(y_test, y_pred_test, labels=[0, 1])

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Rejected', 'Approved'],
            yticklabels=['Rejected', 'Approved'])
plt.title('Confusion Matrix - Test Set', fontsize=14, fontweight='bold')
plt.ylabel('Actual', fontsize=12)
plt.xlabel('Predicted', fontsize=12)
plt.tight_layout()
plt.show()

tn, fp, fn, tp = cm.ravel()
print("\n📊 Confusion Matrix Breakdown:")
print(f"   True Negatives (correctly rejected): {tn}")
print(f"   False Positives (incorrectly approved): {fp}")
print(f"   False Negatives (incorrectly rejected): {fn}")
print(f"   True Positives (correctly approved): {tp}")

### ROC Curve

In [None]:
# ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_proba_test)
roc_auc = auc(fpr, tpr)

plt.figure(figsize=(10, 6))
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (AUC = {roc_auc:.3f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--', label='Random classifier')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate', fontsize=12)
plt.ylabel('True Positive Rate', fontsize=12)
plt.title('Receiver Operating Characteristic (ROC) Curve', fontsize=14, fontweight='bold')
plt.legend(loc="lower right", fontsize=11)
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

### Feature Importance

In [None]:
# Feature importance
importances = pd.DataFrame({
    'Feature': feature_cols,
    'Importance': model.feature_importances_
}).sort_values('Importance', ascending=False)

plt.figure(figsize=(12, 8))
plt.barh(importances['Feature'].head(15), importances['Importance'].head(15),
         color='steelblue', edgecolor='black', alpha=0.8)
plt.xlabel('Importance', fontsize=12)
plt.title('Top 15 Feature Importances', fontsize=14, fontweight='bold')
plt.gca().invert_yaxis()
plt.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.show()

print("\n🔝 TOP 10 MOST IMPORTANT FEATURES:")
print(importances.head(10).to_string(index=False))
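
`feature_importances_` on a random forest is impurity-based: it is derived from the training data and tends to favour features with many distinct values. Permutation importance on held-out data is a common cross-check. A self-contained sketch on stand-in data follows; in the notebook, `model`, `X_test`, and `y_test` would take the place of the demo objects.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Stand-in data, not the credit dataset.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=6, n_informative=3, n_redundant=0, random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

clf = RandomForestClassifier(n_estimators=50, random_state=42).fit(X_tr, y_tr)

# Shuffle one column at a time and measure the drop in held-out ROC AUC.
perm = permutation_importance(
    clf, X_te, y_te, scoring="roc_auc", n_repeats=5, random_state=42
)

for i in perm.importances_mean.argsort()[::-1]:
    print(f"feature_{i}: {perm.importances_mean[i]:.3f} +/- {perm.importances_std[i]:.3f}")
```

If a protected attribute such as `gender_enc` ranks high under both methods, that is an early hint that the fairness checks in the next section may fail.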

<div style="background-color: #d4edda; padding: 15px; border-radius: 5px; border-left: 5px solid #28a745;">
<b>✅ Baseline Validation Complete!</b><br>
Good baseline metrics are only the starting point. Before production, the model still has to be validated for:<br>
<ul>
<li>Fairness (regulatory requirement)</li>
<li>Robustness (resist manipulation)</li>
<li>Uncertainty (quantify confidence)</li>
<li>Drift detection (temporal stability)</li>
</ul>
</div>

<a id="fairness"></a>
## 7. ⚖️ REGULATORY VALIDATION - FAIRNESS (CRITICAL!)

<div style="background-color: #ffebee; padding: 20px; border-radius: 5px; border-left: 5px solid #d32f2f;">
<b>🚨 CRITICAL COMPLIANCE CHECK</b><br>
This section is MANDATORY for deployment. Failure = regulatory violation!
</div>

### Create DBDataset & Experiment

In [None]:
print("🔬 Setting up DeepBridge for comprehensive validation...\n")

# Create DBDataset
dataset = DBDataset(
    data=df_encoded,
    target_column='approved',
    model=model,
    test_size=0.2,
    random_state=RANDOM_STATE,
    dataset_name='SecureBank Credit Scoring Model'
)

# Create Experiment with protected attributes
exp = Experiment(
    dataset=dataset,
    experiment_type='binary_classification',
    experiment_name='Credit Scoring Production Validation',
    protected_attributes=['gender', 'race'],  # ← CRITICAL!
    random_state=RANDOM_STATE
)

print("✅ DeepBridge setup complete!")
print(f"   Protected attributes: {exp.protected_attributes}")
print(f"   Experiment: {exp.experiment_name}")

### Execute Complete Fairness Analysis

In [None]:
print("⚖️  EXECUTING COMPLETE FAIRNESS ANALYSIS")
print("=" * 80)
print("\n   📊 Calculating 15+ fairness metrics")
print("   🛡️  Verifying EEOC 80% Rule compliance")
print("   👥 Analyzing by group (gender, race)")
print("   📈 Threshold impact analysis")
print("\n⏳ This may take a few minutes...\n")

# Run fairness tests
fairness_result = exp.run_fairness_tests(config='full')

print("\n✅ Fairness analysis complete!")

### EEOC 80% Rule Compliance Check

In [None]:
print("⚖️  EEOC 80% RULE COMPLIANCE CHECK")
print("=" * 80)

# Get predictions for test set
test_indices = X_test.index
test_gender = df.loc[test_indices, 'gender']
test_race = df.loc[test_indices, 'race']
y_pred_test = model.predict(X_test)

# Gender analysis
print("\n👥 GENDER ANALYSIS:")
male_approval = y_pred_test[test_gender == 'Male'].mean()
female_approval = y_pred_test[test_gender == 'Female'].mean()
di_gender = female_approval / male_approval if male_approval > 0 else 0

print(f"   Male approval rate: {male_approval:.1%}")
print(f"   Female approval rate: {female_approval:.1%}")
print(f"   Disparate Impact: {di_gender:.3f}")
print(f"   Status: {'✅ PASS' if di_gender >= 0.8 else '❌ FAIL'} (threshold = 0.80)")

if di_gender < 0.8:
    print("   ⚠️  ACTION REQUIRED: Model discriminates by gender!")
    print(f"   Gap: {abs(male_approval - female_approval):.1%}")

# Race analysis
print("\n🌍 RACE ANALYSIS:")
race_groups = test_race.unique()
race_approvals = {}

for race in race_groups:
    mask = test_race == race
    if mask.sum() > 0:
        race_approvals[race] = y_pred_test[mask].mean()

# Reference: highest approval rate
max_race = max(race_approvals, key=race_approvals.get)
max_approval = race_approvals[max_race]

print(f"   Reference group: {max_race} ({max_approval:.1%})\n")

race_pass = True
for race in sorted(race_approvals.keys()):
    approval = race_approvals[race]
    di = approval / max_approval if max_approval > 0 else 0
    status = "✅" if di >= 0.8 else "❌"
    if di < 0.8:
        race_pass = False
    print(f"   {status} {race}: {approval:.1%} (DI = {di:.3f})")

# Final verdict
print("\n" + "=" * 80)
all_pass = di_gender >= 0.8 and race_pass

if all_pass:
    print("\n🎉 ✅ EEOC COMPLIANCE: APPROVED")
    print("   Model passes 80% Rule for gender and race")
    print("   ✅ Model can proceed to production (from fairness perspective)")
else:
    print("\n🚨 ❌ EEOC COMPLIANCE: FAILED")
    print("   ⚠️  MODEL CANNOT GO TO PRODUCTION")
    print("\n   📋 Required Actions:")
    print("      1. Remove or reduce bias in training data")
    print("      2. Apply fairness constraints during training")
    print("      3. Use bias mitigation techniques (reweighting, etc.)")
    print("      4. Consider removing protected attributes from features")
    print("      5. Consult legal and compliance teams")
    print("      6. Re-validate after changes")

print("=" * 80)
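
Of the required actions printed above, reweighting is the easiest to prototype. One well-known scheme (reweighing, due to Kamiran and Calders) gives each training row the weight P(group) * P(label) / P(group, label), so that group and label look statistically independent in the weighted data. The sketch below uses toy data; the helper name `reweighing_weights` is an assumption, and in the notebook the resulting array would be passed as `sample_weight` to `model.fit`.

```python
import pandas as pd


def reweighing_weights(group, label):
    """Per-row weight w = P(group) * P(label) / P(group, label)."""
    frame = pd.DataFrame({"g": group.to_numpy(), "y": label.to_numpy()})
    p_group = frame["g"].map(frame["g"].value_counts(normalize=True))
    p_label = frame["y"].map(frame["y"].value_counts(normalize=True))
    p_joint = frame.groupby(["g", "y"])["y"].transform("count") / len(frame)
    return (p_group * p_label / p_joint).to_numpy()


# Toy data: 4 of 6 men approved, 1 of 4 women approved.
toy = pd.DataFrame({
    "gender":   ["Male"] * 6 + ["Female"] * 4,
    "approved": [1, 1, 1, 1, 0, 0, 1, 0, 0, 0],
})
toy["weight"] = reweighing_weights(toy["gender"], toy["approved"])
print(toy.groupby(["gender", "approved"])["weight"].first())

# Weighted approval rate per group; both equal the overall rate after reweighing.
weighted_sum = (toy["approved"] * toy["weight"]).groupby(toy["gender"]).sum()
weighted_rate = weighted_sum / toy["weight"].groupby(toy["gender"]).sum()
print(weighted_rate)
```

Reweighing changes what the model learns from, not the model's predictions directly, so the 80% Rule check still has to be rerun on the retrained model.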

### Confusion Matrices by Protected Group

In [None]:
# Confusion matrices for gender
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

y_test_subset = y_test.loc[test_indices]

for idx, gender in enumerate(['Male', 'Female']):
    mask = test_gender == gender
    cm = confusion_matrix(y_test_subset[mask], y_pred_test[mask], labels=[0, 1])
    
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                xticklabels=['Rejected', 'Approved'],
                yticklabels=['Rejected', 'Approved'],
                ax=axes[idx], cbar_kws={'label': 'Count'})
    axes[idx].set_title(f'Confusion Matrix - {gender}', fontsize=13, fontweight='bold')
    axes[idx].set_ylabel('Actual')
    axes[idx].set_xlabel('Predicted')

plt.tight_layout()
plt.show()

# Calculate TPR, FPR by gender
print("\n📊 PERFORMANCE METRICS BY GENDER:")
for gender in ['Male', 'Female']:
    mask = test_gender == gender
    # labels=[0, 1] keeps the matrix 2x2 even if a group contains only one class
    cm = confusion_matrix(y_test_subset[mask], y_pred_test[mask], labels=[0, 1])
    tn, fp, fn, tp = cm.ravel()
    
    tpr = tp / (tp + fn) if (tp + fn) > 0 else 0
    fpr = fp / (fp + tn) if (fp + tn) > 0 else 0
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0
    
    print(f"\n   {gender}:")
    print(f"      True Positive Rate (Recall): {tpr:.3f}")
    print(f"      False Positive Rate: {fpr:.3f}")
    print(f"      Precision: {precision:.3f}")

## Part 2 - Continuing in next section...

Due to notebook length, continuing with:
- Section 8: Robustness Testing
- Section 9: Uncertainty Quantification
- Section 10: Drift Detection
- Section 11-14: Reports, Decision, Monitoring, Conclusion