[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/YOUR_USERNAME/Digital-Finance-Introduction/blob/main/day_02/notebooks/NB04_Credit_Scoring.ipynb)

# NB04: Credit Scoring with Alternative Data

**Topic 2.3: Data-Driven Finance**

## Learning Objectives
- Build a simple credit scoring model using machine learning
- Understand how feature selection impacts predictions
- Recognize potential sources of algorithmic bias
- Explore the role of alternative data in modern finance

---

## Introduction

Traditional credit scoring relies on:
- Credit history
- Income verification
- Employment records

**Alternative data** includes:
- Mobile phone usage patterns
- Social media activity
- Utility bill payments
- E-commerce behavior

This notebook demonstrates how algorithms make lending decisions and where bias can creep in.

## 1. Setup and Data Generation

In [None]:
# Install required packages (uncomment if needed)
# !pip install scikit-learn pandas numpy matplotlib seaborn

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, roc_curve
from sklearn.preprocessing import StandardScaler

# Set random seed for reproducibility
np.random.seed(42)

# Configure plotting
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('Set2')

print("Libraries imported successfully!")
print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")

## 2. Generate Synthetic Borrower Data

We'll create a synthetic dataset of loan applicants with plausible relationships between features:
- **Demographic features**: Age, gender, location, region
- **Traditional credit features**: Income, employment, credit history
- **Alternative data features**: Mobile usage, social media, e-commerce behavior

### Why Synthetic Data?

Real credit data is highly sensitive and regulated. Synthetic data allows us to:
1. Learn without privacy concerns
2. Control and understand data relationships
3. Explore bias scenarios safely
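Each applicant's outcome in the generator is built in three steps. First, features are combined into a weighted risk score. Next, a sigmoid maps that score to a default probability. Finally, a random draw against that probability decides whether the applicant defaults. The sketch below strips this down to two features. The weights (2.0, 0.3) and the offset (-1.5) are illustrative assumptions, not the values used in the generator.

```python
import numpy as np

def simulate_defaults(debt_to_income, late_payments, rng):
    """Toy version of the score -> probability -> outcome pipeline.

    The weights and offset are illustrative assumptions only.
    """
    # 1. Linear risk score: higher debt and more late payments raise risk
    score = 2.0 * debt_to_income + 0.3 * late_payments - 1.5
    # 2. Sigmoid maps any real-valued score into a probability in (0, 1)
    prob = 1.0 / (1.0 + np.exp(-score))
    # 3. Bernoulli draw: default (1) with probability `prob`
    default = (rng.random(len(prob)) < prob).astype(int)
    return prob, default

rng = np.random.default_rng(0)
n = 10_000
dti = rng.exponential(0.25, n)
late = rng.poisson(0.8, n)
prob, default = simulate_defaults(dti, late, rng)
print(f"Mean default probability: {prob.mean():.3f}")
print(f"Observed default rate:    {default.mean():.3f}")
```

With enough applicants, the observed default rate tracks the mean probability. This is why the synthetic labels are noisy for any single applicant but still learnable in aggregate.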

In [None]:
def generate_borrower_data(n_samples=2000):
    """
    Generate synthetic borrower data for credit scoring.
    
    The data includes realistic correlations between features and default probability.
    Some features intentionally introduce potential for bias to demonstrate fairness issues.
    
    Args:
        n_samples: Number of loan applicants to generate
    
    Returns:
        DataFrame with borrower features and default outcome
    """
    np.random.seed(42)
    
    # ======================
    # Demographic Features
    # ======================
    
    # Age: 18-70, normally distributed around 35
    age = np.clip(np.random.normal(35, 12, n_samples), 18, 70).astype(int)
    
    # Gender: binary here for simplicity; real populations are not limited to two categories
    gender = np.random.choice(['Male', 'Female'], n_samples, p=[0.52, 0.48])
    
    # Location: Urban vs Rural (affects access to services)
    location = np.random.choice(['Urban', 'Rural'], n_samples, p=[0.65, 0.35])
    
    # Region: Different economic conditions
    region = np.random.choice(['North', 'South', 'East', 'West'], n_samples, 
                              p=[0.25, 0.30, 0.20, 0.25])
    
    # ======================
    # Traditional Credit Features
    # ======================
    
    # Annual income (correlated with age and location)
    base_income = 30000 + (age - 18) * 800  # Income increases with age
    location_multiplier = np.where(location == 'Urban', 1.2, 0.9)
    income = (base_income * location_multiplier * 
              np.random.uniform(0.7, 1.3, n_samples)).astype(int)
    
    # Employment length (years, correlated with age)
    max_employment = np.clip(age - 18, 0, 40)
    employment_length = (max_employment * np.random.beta(2, 2, n_samples)).astype(int)
    
    # Existing debt-to-income ratio
    debt_to_income = np.clip(np.random.exponential(0.25, n_samples), 0, 0.8)
    
    # Number of existing credit accounts
    num_credit_accounts = np.random.poisson(3, n_samples)
    
    # Credit history length (months)
    credit_history_months = np.clip(
        np.random.normal(employment_length * 12 * 0.8, 24, n_samples), 0, 360
    ).astype(int)
    
    # Number of late payments in past 2 years
    late_payments = np.random.poisson(0.8, n_samples)
    
    # Previous loan defaults (binary)
    previous_default = np.random.choice([0, 1], n_samples, p=[0.92, 0.08])
    
    # Loan amount requested
    loan_amount = np.clip(
        income * np.random.uniform(0.3, 1.5, n_samples), 5000, 100000
    ).astype(int)
    
    # ======================
    # Alternative Data Features
    # ======================
    
    # Mobile phone usage (monthly bill amount) - proxy for stability
    mobile_bill = np.clip(np.random.normal(50, 25, n_samples), 10, 200).astype(int)
    
    # Phone contract type (postpaid suggests stability)
    phone_contract = np.random.choice(['Prepaid', 'Postpaid'], n_samples, 
                                       p=[0.4, 0.6])
    
    # Social media connections (network size)
    social_connections = np.clip(
        np.random.exponential(200, n_samples), 10, 2000
    ).astype(int)
    
    # Online shopping frequency (transactions per month)
    online_shopping_freq = np.clip(
        np.random.poisson(5, n_samples), 0, 30
    )
    
    # Utility bill payment consistency (share of bills paid on time, clipped to 0.5-1.0)
    utility_payment_rate = np.clip(
        np.random.beta(8, 2, n_samples), 0.5, 1.0
    )
    
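    # Device type: in real data often a proxy for economic status (problematic!).
    # Note: here it is drawn independently of income and does not enter the default score.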
    device_type = np.random.choice(['Budget', 'Mid-range', 'Premium'], n_samples,
                                    p=[0.35, 0.45, 0.20])
    
    # App usage patterns (financial apps installed)
    financial_apps = np.random.poisson(2, n_samples)
    
    # ======================
    # Generate Default Outcome
    # ======================
    
    # Calculate default probability based on features
    # (This is a simplified model - real credit scoring is more complex)
    
    default_score = np.zeros(n_samples)
    
    # Traditional factors (strong predictors)
    default_score += debt_to_income * 2.0  # Higher debt = higher risk
    default_score += late_payments * 0.3  # Late payments increase risk
    default_score += previous_default * 1.5  # Previous default is major risk
    default_score -= employment_length * 0.03  # Employment stability reduces risk
    default_score -= np.log1p(income) * 0.15  # Higher income reduces risk
    default_score -= credit_history_months * 0.002  # Longer history reduces risk
    
    # Alternative data factors (moderate predictors)
    default_score -= utility_payment_rate * 0.5  # Consistent payments reduce risk
    default_score -= np.where(phone_contract == 'Postpaid', 0.2, 0)  # Postpaid = stability
    default_score -= financial_apps * 0.05  # Financial awareness reduces risk
    
    # Add noise
    default_score += np.random.normal(0, 0.3, n_samples)
    
    # Convert to probability and then binary outcome
    default_prob = 1 / (1 + np.exp(-default_score + 0.5))  # Sigmoid of (score - 0.5); the offset lowers the base default rate
    default = (np.random.random(n_samples) < default_prob).astype(int)
    
    # Create DataFrame
    df = pd.DataFrame({
        # Demographics
        'age': age,
        'gender': gender,
        'location': location,
        'region': region,
        
        # Traditional credit features
        'annual_income': income,
        'employment_length': employment_length,
        'debt_to_income': debt_to_income.round(3),
        'num_credit_accounts': num_credit_accounts,
        'credit_history_months': credit_history_months,
        'late_payments': late_payments,
        'previous_default': previous_default,
        'loan_amount': loan_amount,
        
        # Alternative data
        'mobile_bill': mobile_bill,
        'phone_contract': phone_contract,
        'social_connections': social_connections,
        'online_shopping_freq': online_shopping_freq,
        'utility_payment_rate': utility_payment_rate.round(3),
        'device_type': device_type,
        'financial_apps': financial_apps,
        
        # Outcome
        'default': default
    })
    
    return df

# Generate the dataset
df = generate_borrower_data(n_samples=2000)

print("Borrower Dataset Generated")
print("=" * 50)
print(f"Total applicants: {len(df):,}")
print(f"Default rate: {df['default'].mean():.1%}")
print(f"\nFeatures: {len(df.columns) - 1}")
print(f"  - Demographics: 4")
print(f"  - Traditional credit: 8")
print(f"  - Alternative data: 7")

print("\nSample data:")
df.head()

In [None]:
# Basic statistics
print("Dataset Overview")
print("=" * 50)

print("\nNumerical Features:")
print(df.describe().round(2))

print("\nCategorical Features:")
for col in ['gender', 'location', 'region', 'phone_contract', 'device_type']:
    print(f"\n{col}:")
    print(df[col].value_counts())

## 3. Exploratory Data Analysis

Before building models, we need to understand our data:
- Feature distributions
- Correlations between features
- Relationships with the default outcome
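A quick way to see how a numeric feature relates to default is to bin it and compute the default rate in each bin. The following is a minimal sketch. It uses a hypothetical helper, `default_rate_by_bin`, and a toy dataset in which risk is assumed to rise with debt-to-income. Quantile bins (`pd.qcut`) keep group sizes comparable, which equal-width bins do not guarantee on skewed features like income.

```python
import numpy as np
import pandas as pd

def default_rate_by_bin(df, feature, target="default", q=5):
    """Default rate and count within quantile bins of a numeric feature.

    Hypothetical helper: pd.qcut splits `feature` into `q` roughly equal-sized
    bins; duplicates="drop" merges bins when many values are tied.
    """
    bins = pd.qcut(df[feature], q=q, duplicates="drop")
    return df.groupby(bins, observed=True)[target].agg(["mean", "count"])

# Toy data: default risk rises with debt-to-income (illustrative, not the notebook's data)
rng = np.random.default_rng(1)
toy = pd.DataFrame({"debt_to_income": rng.uniform(0, 0.8, 5000)})
toy["default"] = (rng.random(5000) < 0.05 + 0.3 * toy["debt_to_income"]).astype(int)
print(default_rate_by_bin(toy, "debt_to_income"))
```

On the notebook's own data, `default_rate_by_bin(df, 'debt_to_income')` produces the same kind of table.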

In [None]:
# Visualize key feature distributions
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# Income distribution by default status
ax = axes[0, 0]
for default_val, label in [(0, 'No Default'), (1, 'Default')]:
    subset = df[df['default'] == default_val]['annual_income'] / 1000
    ax.hist(subset, bins=30, alpha=0.6, label=label, density=True)
ax.set_xlabel('Annual Income ($K)')
ax.set_ylabel('Density')
ax.set_title('Income Distribution by Default Status')
ax.legend()

# Age distribution
ax = axes[0, 1]
for default_val, label in [(0, 'No Default'), (1, 'Default')]:
    subset = df[df['default'] == default_val]['age']
    ax.hist(subset, bins=25, alpha=0.6, label=label, density=True)
ax.set_xlabel('Age')
ax.set_ylabel('Density')
ax.set_title('Age Distribution by Default Status')
ax.legend()

# Debt-to-income ratio
ax = axes[0, 2]
for default_val, label in [(0, 'No Default'), (1, 'Default')]:
    subset = df[df['default'] == default_val]['debt_to_income']
    ax.hist(subset, bins=30, alpha=0.6, label=label, density=True)
ax.set_xlabel('Debt-to-Income Ratio')
ax.set_ylabel('Density')
ax.set_title('DTI Distribution by Default Status')
ax.legend()

# Late payments
ax = axes[1, 0]
default_rates = df.groupby('late_payments')['default'].mean()
ax.bar(default_rates.index, default_rates.values, color='coral', edgecolor='black')
ax.set_xlabel('Number of Late Payments')
ax.set_ylabel('Default Rate')
ax.set_title('Default Rate by Late Payments')

# Utility payment rate
ax = axes[1, 1]
df['utility_bin'] = pd.cut(df['utility_payment_rate'], bins=5)
utility_rates = df.groupby('utility_bin', observed=True)['default'].mean()
ax.bar(range(len(utility_rates)), utility_rates.values, color='steelblue', edgecolor='black')
ax.set_xticks(range(len(utility_rates)))
ax.set_xticklabels([f'{x.left:.0%}-{x.right:.0%}' for x in utility_rates.index], rotation=45)
ax.set_xlabel('Utility Payment Rate')
ax.set_ylabel('Default Rate')
ax.set_title('Default Rate by Utility Payment Consistency')

# Device type (alternative data)
ax = axes[1, 2]
device_rates = df.groupby('device_type')['default'].mean().reindex(['Budget', 'Mid-range', 'Premium'])
ax.bar(device_rates.index, device_rates.values, color=['#ff9999', '#99ccff', '#99ff99'], edgecolor='black')
ax.set_xlabel('Device Type')
ax.set_ylabel('Default Rate')
ax.set_title('Default Rate by Device Type (Alternative Data)')

plt.tight_layout()
plt.show()

# Clean up temporary column
df.drop('utility_bin', axis=1, inplace=True)

In [None]:
# Correlation analysis
# Select numerical features for correlation
numerical_cols = ['age', 'annual_income', 'employment_length', 'debt_to_income',
                  'num_credit_accounts', 'credit_history_months', 'late_payments',
                  'previous_default', 'loan_amount', 'mobile_bill', 'social_connections',
                  'online_shopping_freq', 'utility_payment_rate', 'financial_apps', 'default']

correlation_matrix = df[numerical_cols].corr()

# Plot correlation heatmap
plt.figure(figsize=(14, 10))
mask = np.triu(np.ones_like(correlation_matrix, dtype=bool))
sns.heatmap(correlation_matrix, mask=mask, annot=True, fmt='.2f', 
            cmap='RdBu_r', center=0, vmin=-1, vmax=1,
            linewidths=0.5, square=True)
plt.title('Feature Correlation Matrix', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

# Show correlations with default
print("\nCorrelations with Default (sorted by absolute value):")
print("=" * 50)
default_corr = correlation_matrix['default'].drop('default').sort_values(key=abs, ascending=False)
for feature, corr in default_corr.items():
    direction = 'increases' if corr > 0 else 'decreases'
    print(f"  {feature:<25} {corr:>6.3f}  (higher value {direction} default risk)")

In [None]:
# Demographic breakdown of default rates
print("Default Rates by Demographic Groups")
print("=" * 50)

fig, axes = plt.subplots(1, 4, figsize=(16, 4))

# By gender
ax = axes[0]
gender_rates = df.groupby('gender')['default'].agg(['mean', 'count'])
bars = ax.bar(gender_rates.index, gender_rates['mean'], color=['#ff9999', '#99ccff'], edgecolor='black')
ax.set_ylabel('Default Rate')
ax.set_title('Default Rate by Gender')
for bar, count in zip(bars, gender_rates['count']):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01, 
            f'n={count}', ha='center', fontsize=9)

# By location
ax = axes[1]
location_rates = df.groupby('location')['default'].agg(['mean', 'count'])
bars = ax.bar(location_rates.index, location_rates['mean'], color=['#99ff99', '#ffcc99'], edgecolor='black')
ax.set_ylabel('Default Rate')
ax.set_title('Default Rate by Location')
for bar, count in zip(bars, location_rates['count']):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01, 
            f'n={count}', ha='center', fontsize=9)

# By region
ax = axes[2]
region_rates = df.groupby('region')['default'].agg(['mean', 'count'])
bars = ax.bar(region_rates.index, region_rates['mean'], color=sns.color_palette('Set2', 4), edgecolor='black')
ax.set_ylabel('Default Rate')
ax.set_title('Default Rate by Region')
for bar, count in zip(bars, region_rates['count']):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01, 
            f'n={count}', ha='center', fontsize=9)

# By age group
ax = axes[3]
# include_lowest=True keeps 18-year-olds in the first bin (pd.cut bins are right-closed by default)
df['age_group'] = pd.cut(df['age'], bins=[18, 25, 35, 45, 55, 70], include_lowest=True,
                         labels=['18-25', '26-35', '36-45', '46-55', '56-70'])
age_rates = df.groupby('age_group', observed=True)['default'].agg(['mean', 'count'])
bars = ax.bar(age_rates.index.astype(str), age_rates['mean'], color=sns.color_palette('viridis', 5), edgecolor='black')
ax.set_ylabel('Default Rate')
ax.set_title('Default Rate by Age Group')
ax.tick_params(axis='x', rotation=45)
for bar, count in zip(bars, age_rates['count']):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01, 
            f'n={count}', ha='center', fontsize=8)

plt.tight_layout()
plt.show()

# Print the statistics
print("\nGender:")
print(gender_rates.round(3))
print("\nLocation:")
print(location_rates.round(3))
print("\nRegion:")
print(region_rates.round(3))

## 4. Traditional Credit Score Model

First, we'll build a credit scoring model using only **traditional features** that are commonly used in credit decisions:
- Income
- Employment history
- Credit history
- Existing debt

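We'll use **Logistic Regression**, a standard model in credit scoring because:
1. It's interpretable: each coefficient shows the direction and strength of a feature's effect on default risk
2. It outputs probabilities, not just yes/no decisions
3. Its decisions are easy to explain, which helps lenders meet rules that require reasons for credit denials (e.g., adverse action notices under the US Equal Credit Opportunity Act)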
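Both the probability output and the interpretability can be checked on a toy problem. The sketch below uses two made-up features (`feature_a` and `feature_b` are hypothetical names). It shows that `predict_proba` returns the sigmoid of the linear score `X @ coef + intercept`. It also shows that, on standardized inputs, `exp(coef)` is the multiplicative change in the odds of default for a one-standard-deviation increase in a feature.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Toy data with known effects: feature_a raises risk, feature_b lowers it
rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 2))
true_logit = 1.2 * X[:, 0] - 0.8 * X[:, 1] - 1.0
y = (rng.random(1000) < 1 / (1 + np.exp(-true_logit))).astype(int)

X_scaled = StandardScaler().fit_transform(X)
model = LogisticRegression().fit(X_scaled, y)

# 1. Probabilities are the sigmoid of a linear score
logit = X_scaled @ model.coef_[0] + model.intercept_[0]
manual_prob = 1 / (1 + np.exp(-logit))
print("Matches predict_proba:", np.allclose(manual_prob, model.predict_proba(X_scaled)[:, 1]))

# 2. exp(coefficient) = odds ratio for a one-standard-deviation increase
odds_ratios = np.exp(model.coef_[0])
for name, coef, ratio in zip(["feature_a", "feature_b"], model.coef_[0], odds_ratios):
    print(f"{name}: coef={coef:+.2f}, odds ratio={ratio:.2f}")
```

An odds ratio above 1 means the feature increases default risk. An odds ratio below 1 means it decreases default risk.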

In [None]:
# Define traditional features (no alternative data)
traditional_features = [
    'annual_income',
    'employment_length',
    'debt_to_income',
    'num_credit_accounts',
    'credit_history_months',
    'late_payments',
    'previous_default',
    'loan_amount'
]

print("Traditional Credit Features:")
print("=" * 50)
for i, feature in enumerate(traditional_features, 1):
    print(f"  {i}. {feature}")

# Prepare data
X_traditional = df[traditional_features].copy()
y = df['default'].copy()

# Train-test split (stratified so both sets keep the same default rate)
X_train_trad, X_test_trad, y_train, y_test = train_test_split(
    X_traditional, y, test_size=0.25, random_state=42, stratify=y
)

print(f"\nTraining set: {len(X_train_trad)} samples")
print(f"Test set: {len(X_test_trad)} samples")
print(f"Default rate in training: {y_train.mean():.1%}")
print(f"Default rate in test: {y_test.mean():.1%}")

In [None]:
# Standardize features for logistic regression
scaler_trad = StandardScaler()
X_train_trad_scaled = scaler_trad.fit_transform(X_train_trad)
X_test_trad_scaled = scaler_trad.transform(X_test_trad)

# Train logistic regression model
model_traditional = LogisticRegression(random_state=42, max_iter=1000)
model_traditional.fit(X_train_trad_scaled, y_train)

# Make predictions
y_pred_trad = model_traditional.predict(X_test_trad_scaled)
y_prob_trad = model_traditional.predict_proba(X_test_trad_scaled)[:, 1]

# Evaluate
print("Traditional Model Performance")
print("=" * 50)
print("\nClassification Report:")
print(classification_report(y_test, y_pred_trad, target_names=['No Default', 'Default']))

# Calculate AUC
auc_traditional = roc_auc_score(y_test, y_prob_trad)
print(f"\nAUC-ROC Score: {auc_traditional:.4f}")

In [None]:
# Feature importance for traditional model
feature_importance_trad = pd.DataFrame({
    'Feature': traditional_features,
    'Coefficient': model_traditional.coef_[0],
    'Abs_Coefficient': np.abs(model_traditional.coef_[0])
}).sort_values('Abs_Coefficient', ascending=True)

plt.figure(figsize=(10, 6))
colors = ['green' if x < 0 else 'red' for x in feature_importance_trad['Coefficient']]
plt.barh(feature_importance_trad['Feature'], feature_importance_trad['Coefficient'], color=colors)
plt.xlabel('Coefficient (Positive = Increases Default Risk)')
plt.title('Traditional Model: Feature Importance', fontsize=14, fontweight='bold')
plt.axvline(x=0, color='black', linestyle='-', linewidth=0.5)
plt.tight_layout()
plt.show()

print("\nFeature Interpretation:")
print("=" * 50)
for _, row in feature_importance_trad.sort_values('Abs_Coefficient', ascending=False).iterrows():
    direction = 'INCREASES' if row['Coefficient'] > 0 else 'DECREASES'
    print(f"  {row['Feature']:<25} {direction} default risk (coef: {row['Coefficient']:.3f})")

## 5. Alternative Data Model

Now let's add **alternative data features** to our model. These are increasingly used by:
- FinTech lenders
- Microfinance institutions
- Lenders in emerging and developing markets

### Benefits of Alternative Data:
- Can score "thin file" applicants (limited credit history)
- More real-time signals
- Potentially better predictions

### Risks of Alternative Data:
- Privacy concerns
- Potential for bias (e.g., device type correlates with income)
- Less regulatory clarity
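Categorical alternative data, such as phone contract and device type, must be converted to numbers before it can enter a logistic regression. The next cell does this by hand with 0/1 flags. An equivalent sketch with `pandas.get_dummies` is shown below. It assumes `Mid-range` is the reference device category and `Prepaid` is the reference contract, matching the manual encoding. The toy rows are made up.

```python
import pandas as pd

applicants = pd.DataFrame({
    "phone_contract": ["Prepaid", "Postpaid", "Postpaid", "Prepaid"],
    "device_type": ["Budget", "Mid-range", "Premium", "Budget"],
})

# Fix the category order so the first level becomes the dropped reference
applicants["device_type"] = pd.Categorical(
    applicants["device_type"], categories=["Mid-range", "Budget", "Premium"]
)
applicants["phone_contract"] = pd.Categorical(
    applicants["phone_contract"], categories=["Prepaid", "Postpaid"]
)

# drop_first=True removes one dummy per variable, avoiding perfect collinearity
encoded = pd.get_dummies(applicants, drop_first=True, dtype=int)
print(encoded)
```

Dropping one level per variable means each remaining coefficient is read relative to the reference group. For example, `device_type_Premium` compares Premium users with Mid-range users.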

In [None]:
# Define alternative data features
alternative_features = [
    'mobile_bill',
    'social_connections',
    'online_shopping_freq',
    'utility_payment_rate',
    'financial_apps'
]

# Encode categorical alternative features
df_encoded = df.copy()
df_encoded['phone_contract_postpaid'] = (df_encoded['phone_contract'] == 'Postpaid').astype(int)
df_encoded['device_premium'] = (df_encoded['device_type'] == 'Premium').astype(int)
df_encoded['device_budget'] = (df_encoded['device_type'] == 'Budget').astype(int)

# Combined features: traditional + alternative
all_features = traditional_features + alternative_features + [
    'phone_contract_postpaid', 'device_premium', 'device_budget'
]

print("Combined Feature Set (Traditional + Alternative):")
print("=" * 50)
print("\nTraditional Features:")
for f in traditional_features:
    print(f"  - {f}")
print("\nAlternative Data Features:")
for f in alternative_features + ['phone_contract_postpaid', 'device_premium', 'device_budget']:
    print(f"  - {f}")

print(f"\nTotal features: {len(all_features)}")

In [None]:
# Prepare combined dataset
X_combined = df_encoded[all_features].copy()

# Reuse the exact train/test rows from the traditional split for a fair comparison
X_train_comb = X_combined.loc[X_train_trad.index]
X_test_comb = X_combined.loc[X_test_trad.index]

# Standardize
scaler_comb = StandardScaler()
X_train_comb_scaled = scaler_comb.fit_transform(X_train_comb)
X_test_comb_scaled = scaler_comb.transform(X_test_comb)

# Train combined model
model_combined = LogisticRegression(random_state=42, max_iter=1000)
model_combined.fit(X_train_comb_scaled, y_train)

# Make predictions
y_pred_comb = model_combined.predict(X_test_comb_scaled)
y_prob_comb = model_combined.predict_proba(X_test_comb_scaled)[:, 1]

# Evaluate
print("Combined Model Performance (Traditional + Alternative Data)")
print("=" * 60)
print("\nClassification Report:")
print(classification_report(y_test, y_pred_comb, target_names=['No Default', 'Default']))

# Calculate AUC
auc_combined = roc_auc_score(y_test, y_prob_comb)
print(f"\nAUC-ROC Score: {auc_combined:.4f}")

In [None]:
# Feature importance for combined model
feature_importance_comb = pd.DataFrame({
    'Feature': all_features,
    'Coefficient': model_combined.coef_[0],
    'Abs_Coefficient': np.abs(model_combined.coef_[0]),
    'Type': ['Traditional'] * len(traditional_features) + 
            ['Alternative'] * (len(all_features) - len(traditional_features))
}).sort_values('Abs_Coefficient', ascending=True)

plt.figure(figsize=(12, 8))
colors = ['#2ecc71' if row['Type'] == 'Traditional' and row['Coefficient'] < 0 else
          '#e74c3c' if row['Type'] == 'Traditional' and row['Coefficient'] > 0 else
          '#3498db' if row['Type'] == 'Alternative' and row['Coefficient'] < 0 else
          '#9b59b6' for _, row in feature_importance_comb.iterrows()]

bars = plt.barh(feature_importance_comb['Feature'], feature_importance_comb['Coefficient'], color=colors)
plt.xlabel('Coefficient (Positive = Increases Default Risk)')
plt.title('Combined Model: Feature Importance', fontsize=14, fontweight='bold')
plt.axvline(x=0, color='black', linestyle='-', linewidth=0.5)

# Add legend
from matplotlib.patches import Patch
legend_elements = [
    Patch(facecolor='#2ecc71', label='Traditional (reduces risk)'),
    Patch(facecolor='#e74c3c', label='Traditional (increases risk)'),
    Patch(facecolor='#3498db', label='Alternative (reduces risk)'),
    Patch(facecolor='#9b59b6', label='Alternative (increases risk)')
]
plt.legend(handles=legend_elements, loc='lower right')
plt.tight_layout()
plt.show()

## 6. Model Comparison

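Let's compare the traditional model and the combined (traditional + alternative data) model using:
- **ROC Curves**: Trade-off between the true positive rate and the false positive rate across all probability thresholds
- **AUC Scores**: Area Under the ROC Curve (0.5 = random ranking, 1.0 = perfect ranking; higher is better)
- **Confusion Matrices**: Counts of correct decisions and of each error type at the default 0.5 probability cutoff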
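AUC has a concrete meaning. It is the probability that a randomly chosen defaulter receives a higher risk score than a randomly chosen non-defaulter, with ties counted as half. The sketch below computes this pairwise definition directly on made-up scores and checks it against `roc_auc_score`. The helper `pairwise_auc` is hypothetical and builds an `n_pos x n_neg` matrix, so it is only suitable for small samples.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def pairwise_auc(y_true, scores):
    """AUC as P(score of random defaulter > score of random non-defaulter); ties count half."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every defaulter vs every non-defaulter
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

rng = np.random.default_rng(3)
y_toy = rng.integers(0, 2, 400)
scores_toy = y_toy * 0.5 + rng.normal(size=400)  # defaulters score higher on average
print(f"Pairwise AUC: {pairwise_auc(y_toy, scores_toy):.4f}")
print(f"sklearn AUC:  {roc_auc_score(y_toy, scores_toy):.4f}")
```

For example, an AUC of 0.75 means the model ranks a random defaulter above a random non-defaulter three times out of four. AUC says nothing about where the approval cutoff should sit.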

In [None]:
# Calculate ROC curves
fpr_trad, tpr_trad, thresholds_trad = roc_curve(y_test, y_prob_trad)
fpr_comb, tpr_comb, thresholds_comb = roc_curve(y_test, y_prob_comb)

# Plot ROC curves
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# ROC Curve comparison
ax = axes[0]
ax.plot(fpr_trad, tpr_trad, label=f'Traditional Model (AUC = {auc_traditional:.3f})', 
        linewidth=2, color='steelblue')
ax.plot(fpr_comb, tpr_comb, label=f'Combined Model (AUC = {auc_combined:.3f})', 
        linewidth=2, color='coral')
ax.plot([0, 1], [0, 1], 'k--', label='Random Classifier', linewidth=1)
ax.set_xlabel('False Positive Rate', fontsize=12)
ax.set_ylabel('True Positive Rate', fontsize=12)
ax.set_title('ROC Curve Comparison', fontsize=14, fontweight='bold')
ax.legend(loc='lower right')
ax.grid(True, alpha=0.3)

# AUC improvement visualization
ax = axes[1]
models = ['Traditional\nModel', 'Combined Model\n(+Alternative Data)']
aucs = [auc_traditional, auc_combined]
colors = ['steelblue', 'coral']
bars = ax.bar(models, aucs, color=colors, edgecolor='black', linewidth=1.5)
ax.set_ylabel('AUC-ROC Score', fontsize=12)
ax.set_title('Model Performance Comparison', fontsize=14, fontweight='bold')
ax.set_ylim(0.5, 1.0)

# Add value labels
for bar, auc in zip(bars, aucs):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.01, 
            f'{auc:.3f}', ha='center', fontsize=12, fontweight='bold')

# Add improvement annotation
improvement = (auc_combined - auc_traditional) / auc_traditional * 100  # relative change in AUC
ax.annotate(f'{improvement:+.1f}%', xy=(1, auc_combined), xytext=(1.3, (auc_combined + auc_traditional)/2),
            fontsize=12, color='green', fontweight='bold',
            arrowprops=dict(arrowstyle='->', color='green'))

plt.tight_layout()
plt.show()

print("\nModel Comparison Summary")
print("=" * 50)
print(f"Traditional Model AUC: {auc_traditional:.4f}")
print(f"Combined Model AUC:    {auc_combined:.4f}")
print(f"Improvement:           {improvement:+.2f}%")
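When defaults are rare, an ROC curve can look reassuring, because the false positive rate is divided by the large number of non-defaulters. A precision-recall view instead asks how many flagged applicants actually default. The minimal sketch below uses `precision_recall_curve` and `average_precision_score` from `sklearn.metrics` on made-up scores with roughly 10% positives. A random scorer's average precision is roughly the positive rate, so that rate serves as the baseline.

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

rng = np.random.default_rng(4)
n = 2000
y_toy = (rng.random(n) < 0.1).astype(int)   # ~10% positives, like a default flag
scores_toy = y_toy * 1.0 + rng.normal(size=n)

precision, recall, thresholds = precision_recall_curve(y_toy, scores_toy)
ap = average_precision_score(y_toy, scores_toy)
baseline = y_toy.mean()  # average precision of a random scorer is about the positive rate
print(f"Average precision: {ap:.3f} (random baseline ≈ {baseline:.3f})")
```

For the notebook's models, pass `y_test` together with `y_prob_trad` or `y_prob_comb` to the same two functions.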

In [None]:
# Confusion matrices comparison
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Traditional model confusion matrix
cm_trad = confusion_matrix(y_test, y_pred_trad)
sns.heatmap(cm_trad, annot=True, fmt='d', cmap='Blues', ax=axes[0],
            xticklabels=['No Default', 'Default'],
            yticklabels=['No Default', 'Default'])
axes[0].set_xlabel('Predicted')
axes[0].set_ylabel('Actual')
axes[0].set_title('Traditional Model\nConfusion Matrix', fontsize=12, fontweight='bold')

# Combined model confusion matrix
cm_comb = confusion_matrix(y_test, y_pred_comb)
sns.heatmap(cm_comb, annot=True, fmt='d', cmap='Oranges', ax=axes[1],
            xticklabels=['No Default', 'Default'],
            yticklabels=['No Default', 'Default'])
axes[1].set_xlabel('Predicted')
axes[1].set_ylabel('Actual')
axes[1].set_title('Combined Model\nConfusion Matrix', fontsize=12, fontweight='bold')

plt.tight_layout()
plt.show()

# Calculate error metrics
print("\nError Analysis")
print("=" * 50)
print("\nTraditional Model:")
print(f"  False Positives (denied good borrowers): {cm_trad[0, 1]}")
print(f"  False Negatives (approved bad borrowers): {cm_trad[1, 0]}")
print(f"\nCombined Model:")
print(f"  False Positives (denied good borrowers): {cm_comb[0, 1]}")
print(f"  False Negatives (approved bad borrowers): {cm_comb[1, 0]}")
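Both confusion matrices use scikit-learn's default cutoff: an applicant is predicted to default when the probability exceeds 0.5. Denying a good borrower and approving a bad one rarely cost a lender the same, though. The following sketch picks the cutoff that minimizes total cost under an assumed cost ratio of 1:5 (a missed default costs five times a lost customer). The ratio, the helpers `expected_cost` and `best_threshold`, and the toy data are all hypothetical.

```python
import numpy as np

def expected_cost(y_true, prob, threshold, cost_fp, cost_fn):
    """Total cost of deny/approve decisions at a probability cutoff.

    cost_fp: assumed cost of denying a good borrower (lost business)
    cost_fn: assumed cost of approving a bad borrower (credit loss)
    """
    y_true = np.asarray(y_true)
    deny = np.asarray(prob) >= threshold       # predicted default means deny
    fp = np.sum(deny & (y_true == 0))          # good borrowers denied
    fn = np.sum(~deny & (y_true == 1))         # bad borrowers approved
    return fp * cost_fp + fn * cost_fn

def best_threshold(y_true, prob, cost_fp=1.0, cost_fn=5.0):
    """Grid-search the cutoff with the lowest total cost."""
    grid = np.linspace(0.01, 0.99, 99)
    costs = [expected_cost(y_true, prob, t, cost_fp, cost_fn) for t in grid]
    return grid[int(np.argmin(costs))]

rng = np.random.default_rng(5)
y_toy = (rng.random(3000) < 0.1).astype(int)
prob_toy = np.clip(0.1 + 0.3 * y_toy + rng.normal(0, 0.1, 3000), 0, 1)
t = best_threshold(y_toy, prob_toy)
print(f"Cost-minimizing cutoff: {t:.2f} (vs. default 0.5)")
```

When missed defaults are expensive, the best cutoff moves below 0.5. The model then denies more applicants, trading extra false positives for fewer false negatives.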

## 7. Algorithmic Bias Analysis

Credit scoring algorithms can inadvertently discriminate against protected groups. This can happen through:

1. **Direct discrimination**: Using protected attributes such as sex or race (prohibited in many jurisdictions, e.g., by the US Equal Credit Opportunity Act)
2. **Proxy discrimination**: Using features correlated with protected attributes
3. **Historical bias**: Training on biased historical data

Let's analyze our models for potential bias across demographic groups.
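Proxy discrimination can be screened for before any model is trained. The idea is to check how well each feature, on its own, separates two groups. The sketch below scores each feature with `2 * |AUC - 0.5|`, where 0 means no separation and 1 means perfect separation. The helper `proxy_strength`, the toy data, and the `shoe_size` feature are hypothetical, and this is a rough heuristic rather than a legal test.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def proxy_strength(df, group_col, features):
    """How well each feature alone separates two groups: 2 * |AUC - 0.5|."""
    groups = df[group_col].unique()
    if len(groups) != 2:
        raise ValueError("proxy_strength expects exactly two groups")
    is_first = (df[group_col] == groups[0]).astype(int)
    scores = {f: abs(roc_auc_score(is_first, df[f]) - 0.5) * 2 for f in features}
    return pd.Series(scores).sort_values(ascending=False)

rng = np.random.default_rng(6)
n = 2000
toy = pd.DataFrame({"location": rng.choice(["Urban", "Rural"], n)})
# Income depends on location (a proxy); shoe size does not (hypothetical feature)
toy["income"] = np.where(toy["location"] == "Urban", 60_000, 45_000) + rng.normal(0, 10_000, n)
toy["shoe_size"] = rng.normal(42, 2, n)
print(proxy_strength(toy, "location", ["income", "shoe_size"]))
```

On the notebook's data, `proxy_strength(df, 'location', traditional_features)` should rank `annual_income` near the top, because the generator gives urban applicants higher incomes.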

In [None]:
# Add predictions to test data for bias analysis
test_df = df.loc[X_test_trad.index].copy()  # select test rows by index label, in test-set order
test_df['pred_traditional'] = y_pred_trad
test_df['prob_traditional'] = y_prob_trad
test_df['pred_combined'] = y_pred_comb
test_df['prob_combined'] = y_prob_comb
test_df['actual'] = y_test.values

print("Bias Analysis Setup")
print("=" * 50)
print(f"Test set size: {len(test_df)}")
print("\nGroups analyzed:")
print("  - Gender (Male/Female)")
print("  - Location (Urban/Rural)")
print("  - Region (North/South/East/West)")

In [None]:
def analyze_group_bias(df, group_col, actual_col, pred_col, prob_col):
    """
    Analyze model bias across a demographic group.
    
    Returns metrics for each group:
    - Approval rate (predicted as no default)
    - Actual default rate
    - False positive rate
    - False negative rate
    - Average predicted probability
    """
    results = []
    
    for group in df[group_col].unique():
        group_df = df[df[group_col] == group]
        
        n = len(group_df)
        approval_rate = (group_df[pred_col] == 0).mean()  # Approved = predicted no default
        actual_default_rate = group_df[actual_col].mean()
        avg_prob = group_df[prob_col].mean()
        
        # Calculate FPR and FNR
        true_neg = ((group_df[actual_col] == 0) & (group_df[pred_col] == 0)).sum()
        false_pos = ((group_df[actual_col] == 0) & (group_df[pred_col] == 1)).sum()
        true_pos = ((group_df[actual_col] == 1) & (group_df[pred_col] == 1)).sum()
        false_neg = ((group_df[actual_col] == 1) & (group_df[pred_col] == 0)).sum()
        
        fpr = false_pos / (false_pos + true_neg) if (false_pos + true_neg) > 0 else 0
        fnr = false_neg / (false_neg + true_pos) if (false_neg + true_pos) > 0 else 0
        
        results.append({
            'Group': group,
            'N': n,
            'Approval_Rate': approval_rate,
            'Actual_Default_Rate': actual_default_rate,
            'Avg_Risk_Score': avg_prob,
            'FPR': fpr,
            'FNR': fnr
        })
    
    return pd.DataFrame(results)

# Analyze bias by gender
print("Bias Analysis: Gender")
print("=" * 70)

gender_bias_trad = analyze_group_bias(test_df, 'gender', 'actual', 'pred_traditional', 'prob_traditional')
gender_bias_comb = analyze_group_bias(test_df, 'gender', 'actual', 'pred_combined', 'prob_combined')

print("\nTraditional Model:")
print(gender_bias_trad.round(3).to_string(index=False))
print("\nCombined Model (with Alternative Data):")
print(gender_bias_comb.round(3).to_string(index=False))

In [None]:
# Visualize bias across demographic groups
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# Gender bias
gender_bias_trad_plot = gender_bias_trad.set_index('Group')
gender_bias_comb_plot = gender_bias_comb.set_index('Group')

ax = axes[0, 0]
x = np.arange(len(gender_bias_trad_plot))
width = 0.35
ax.bar(x - width/2, gender_bias_trad_plot['Approval_Rate'], width, label='Traditional', color='steelblue')
ax.bar(x + width/2, gender_bias_comb_plot['Approval_Rate'], width, label='Combined', color='coral')
ax.set_xticks(x)
ax.set_xticklabels(gender_bias_trad_plot.index)
ax.set_ylabel('Approval Rate')
ax.set_title('Approval Rate by Gender', fontweight='bold')
ax.legend()
ax.axhline(y=1 - test_df['pred_traditional'].mean(), color='steelblue', linestyle='--', alpha=0.5)  # overall approval rate (traditional)

# Location bias
location_bias_trad = analyze_group_bias(test_df, 'location', 'actual', 'pred_traditional', 'prob_traditional')
location_bias_comb = analyze_group_bias(test_df, 'location', 'actual', 'pred_combined', 'prob_combined')

ax = axes[0, 1]
x = np.arange(len(location_bias_trad))
ax.bar(x - width/2, location_bias_trad['Approval_Rate'], width, label='Traditional', color='steelblue')
ax.bar(x + width/2, location_bias_comb['Approval_Rate'], width, label='Combined', color='coral')
ax.set_xticks(x)
ax.set_xticklabels(location_bias_trad['Group'])
ax.set_ylabel('Approval Rate')
ax.set_title('Approval Rate by Location', fontweight='bold')
ax.legend()

# Region bias
region_bias_trad = analyze_group_bias(test_df, 'region', 'actual', 'pred_traditional', 'prob_traditional')
region_bias_comb = analyze_group_bias(test_df, 'region', 'actual', 'pred_combined', 'prob_combined')

ax = axes[0, 2]
x = np.arange(len(region_bias_trad))
ax.bar(x - width/2, region_bias_trad['Approval_Rate'], width, label='Traditional', color='steelblue')
ax.bar(x + width/2, region_bias_comb['Approval_Rate'], width, label='Combined', color='coral')
ax.set_xticks(x)
ax.set_xticklabels(region_bias_trad['Group'])
ax.set_ylabel('Approval Rate')
ax.set_title('Approval Rate by Region', fontweight='bold')
ax.legend()

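# False Positive Rates (denied good borrowers)
# Reset x to the gender groups: the region panel above set it to 4 positions
x = np.arange(len(gender_bias_trad_plot))
ax = axes[1, 0]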
ax.bar(x - width/2, gender_bias_trad_plot['FPR'], width, label='Traditional', color='steelblue')
ax.bar(x + width/2, gender_bias_comb_plot['FPR'], width, label='Combined', color='coral')
ax.set_xticks(x)
ax.set_xticklabels(gender_bias_trad_plot.index)
ax.set_ylabel('False Positive Rate')
ax.set_title('FPR by Gender\n(Good borrowers denied)', fontweight='bold')
ax.legend()

# False Negative Rates (approved bad borrowers)
ax = axes[1, 1]
ax.bar(x - width/2, gender_bias_trad_plot['FNR'], width, label='Traditional', color='steelblue')
ax.bar(x + width/2, gender_bias_comb_plot['FNR'], width, label='Combined', color='coral')
ax.set_xticks(x)
ax.set_xticklabels(gender_bias_trad_plot.index)
ax.set_ylabel('False Negative Rate')
ax.set_title('FNR by Gender\n(Bad borrowers approved)', fontweight='bold')
ax.legend()

# Average risk scores
ax = axes[1, 2]
ax.bar(x - width/2, gender_bias_trad_plot['Avg_Risk_Score'], width, label='Traditional', color='steelblue')
ax.bar(x + width/2, gender_bias_comb_plot['Avg_Risk_Score'], width, label='Combined', color='coral')
ax.set_xticks(x)
ax.set_xticklabels(gender_bias_trad_plot.index)
ax.set_ylabel('Average Risk Score')
ax.set_title('Average Predicted Risk by Gender', fontweight='bold')
ax.legend()

plt.tight_layout()
plt.show()

In [None]:
# Detailed bias report
print("\n" + "=" * 70)
print("ALGORITHMIC BIAS REPORT")
print("=" * 70)

def calculate_disparity(df_bias, metric='Approval_Rate'):
    """Return the min/max ratio of `metric` across groups (1.0 = perfect parity)."""
    values = df_bias[metric].values
    max_val = values.max()
    # Guard against division by zero when every group is at 0
    return values.min() / max_val if max_val > 0 else 1.0

print("\n1. GENDER DISPARITIES")
print("-" * 50)
approval_disparity_trad = calculate_disparity(gender_bias_trad)
approval_disparity_comb = calculate_disparity(gender_bias_comb)
print(f"Traditional Model - Approval Rate Ratio: {approval_disparity_trad:.3f}")
print(f"Combined Model - Approval Rate Ratio: {approval_disparity_comb:.3f}")
print("(Ratio of 1.0 = perfect parity; < 0.80 may indicate adverse impact)")

print("\n2. LOCATION DISPARITIES (Urban vs Rural)")
print("-" * 50)
loc_disparity_trad = calculate_disparity(location_bias_trad)
loc_disparity_comb = calculate_disparity(location_bias_comb)
print(f"Traditional Model - Approval Rate Ratio: {loc_disparity_trad:.3f}")
print(f"Combined Model - Approval Rate Ratio: {loc_disparity_comb:.3f}")

print("\n3. REGIONAL DISPARITIES")
print("-" * 50)
reg_disparity_trad = calculate_disparity(region_bias_trad)
reg_disparity_comb = calculate_disparity(region_bias_comb)
print(f"Traditional Model - Approval Rate Ratio: {reg_disparity_trad:.3f}")
print(f"Combined Model - Approval Rate Ratio: {reg_disparity_comb:.3f}")

print("\n" + "=" * 70)
print("KEY OBSERVATIONS")
print("=" * 70)
print("""
1. Adding alternative data can either increase or decrease disparities
   depending on how the alternative features correlate with protected groups.

2. Features like 'device_type' may serve as proxies for economic status,
   which can correlate with protected characteristics.

3. Location-based disparities may reflect historical inequalities in
   access to financial services (financial exclusion).

4. The "four-fifths rule" (80% rule) is a screening heuristic that originates
   in US employment-selection guidelines: if a group's approval rate is less
   than 80% of the highest group's rate, adverse impact may be present.
""")

## 8. Fairness Metrics

There are several mathematical definitions of fairness, and they often conflict: when default rates (base rates) differ between groups, an imperfect classifier generally cannot satisfy all of them at once.

### Key Fairness Metrics:

1. **Statistical Parity**: Equal approval rates across groups
2. **Equalized Odds**: Equal true positive rate (TPR) and false positive rate (FPR) across groups
3. **Predictive Parity**: Equal precision (positive predictive value) across groups
4. **Calibration**: Within each group, predicted default probabilities match observed default rates
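
Calibration is the only metric in this list that is based on predicted probabilities rather than hard approve/deny decisions. The minimal sketch below uses a small hand-made DataFrame (the column names `group`, `prob` and `actual` are hypothetical). It bins the predicted default probabilities and, separately for each group, compares the mean prediction in each bin with the observed default rate. A well-calibrated model shows a small `gap` in every group and bin.

```python
import pandas as pd

# Toy data (hypothetical): predicted default probability and actual outcome (1 = default)
toy_cal = pd.DataFrame({
    'group':  ['A'] * 6 + ['B'] * 6,
    'prob':   [0.1, 0.1, 0.2, 0.7, 0.8, 0.9, 0.1, 0.2, 0.2, 0.6, 0.7, 0.9],
    'actual': [0,   0,   0,   1,   1,   1,   0,   1,   0,   0,   1,   0],
})

# Bin probabilities into coarse low/high risk buckets: (0, 0.5] and (0.5, 1.0]
toy_cal['bin'] = pd.cut(toy_cal['prob'], bins=[0.0, 0.5, 1.0], labels=['low', 'high'])

# Within each group and bin: mean predicted probability vs observed default rate
calibration = (
    toy_cal.groupby(['group', 'bin'], observed=True)
           .agg(mean_pred=('prob', 'mean'), observed_rate=('actual', 'mean'), n=('actual', 'size'))
)
calibration['gap'] = calibration['observed_rate'] - calibration['mean_pred']
print(calibration.round(3))
```

In this toy data, group B's high-risk bin predicts a default rate of about 0.73 but observes only 0.33, so the model is poorly calibrated for that group even though the overall averages look similar. Real calibration checks use more bins and far more data per bin.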

In [None]:
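def calculate_fairness_metrics(df, group_col, actual_col, pred_col, prob_col):
    """
    Calculate per-group fairness metrics for a credit model.

    The positive class is default (1): predicting 1 means the applicant is denied.

    Args:
        df: DataFrame with group labels, outcomes, predictions, and probabilities
        group_col: Column with the group label (e.g. 'gender')
        actual_col: Column with the actual outcome (1 = default)
        pred_col: Column with the predicted outcome (1 = predicted default)
        prob_col: Column with the predicted default probability

    Returns:
        DataFrame with one row per group
    """
    groups = df[group_col].unique()
    metrics = {}
    
    for group in groups:
        group_df = df[df[group_col] == group]
        
        # Confusion-matrix counts
        n = len(group_df)
        tp = ((group_df[actual_col] == 1) & (group_df[pred_col] == 1)).sum()
        tn = ((group_df[actual_col] == 0) & (group_df[pred_col] == 0)).sum()
        fp = ((group_df[actual_col] == 0) & (group_df[pred_col] == 1)).sum()
        fn = ((group_df[actual_col] == 1) & (group_df[pred_col] == 0)).sum()
        
        # Rates (0 when the denominator is empty)
        tpr = tp / (tp + fn) if (tp + fn) > 0 else 0  # Share of actual defaulters flagged (recall)
        fpr = fp / (fp + tn) if (fp + tn) > 0 else 0  # Share of good borrowers denied
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0  # Share of denied who actually default
        approval_rate = (group_df[pred_col] == 0).mean()  # Approved = no default predicted
        
        metrics[group] = {
            'n': n,
            'approval_rate': approval_rate,  # Statistical parity
            'tpr': tpr,  # Equalized odds, component 1
            'fpr': fpr,  # Equalized odds, component 2
            'precision': precision,  # Predictive parity
            'base_rate': group_df[actual_col].mean(),  # Observed default rate
            'avg_prob': group_df[prob_col].mean(),  # Mean predicted default probability (compare with base_rate for calibration)
        }
    
    return pd.DataFrame(metrics).T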

# Calculate fairness metrics for both models
print("Fairness Metrics Analysis")
print("=" * 70)

fairness_trad_gender = calculate_fairness_metrics(
    test_df, 'gender', 'actual', 'pred_traditional', 'prob_traditional'
)
fairness_comb_gender = calculate_fairness_metrics(
    test_df, 'gender', 'actual', 'pred_combined', 'prob_combined'
)

print("\nTraditional Model - Gender Fairness:")
print(fairness_trad_gender.round(3))

print("\nCombined Model - Gender Fairness:")
print(fairness_comb_gender.round(3))

In [None]:
# Visualize fairness metrics
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Prepare data for plotting
metrics_to_plot = ['approval_rate', 'tpr', 'fpr', 'precision']
metric_names = ['Statistical Parity\n(Approval Rate)', 'True Positive Rate\n(Catches defaults)', 
                'False Positive Rate\n(Denies good borrowers)', 'Predictive Parity\n(Precision)']

for idx, (metric, name) in enumerate(zip(metrics_to_plot, metric_names)):
    ax = axes[idx // 2, idx % 2]
    
    groups = fairness_trad_gender.index
    x = np.arange(len(groups))
    width = 0.35
    
    ax.bar(x - width/2, fairness_trad_gender[metric], width, 
           label='Traditional', color='steelblue', edgecolor='black')
    ax.bar(x + width/2, fairness_comb_gender[metric], width, 
           label='Combined', color='coral', edgecolor='black')
    
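    ax.set_xticks(x)
    ax.set_xticklabels(groups)
    ax.set_ylabel(metric.upper() if metric in ('tpr', 'fpr') else metric.replace('_', ' ').title())
    ax.set_title(name, fontsize=12, fontweight='bold')
    
    # Add a reference line at the Traditional model's overall approval rate
    if metric == 'approval_rate':
        overall_approval = 1 - test_df['pred_traditional'].mean()
        ax.axhline(y=overall_approval, color='gray', linestyle='--', alpha=0.5, label='Overall (Traditional)')
    
    # Draw the legend last so it includes the reference line
    ax.legend()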

plt.tight_layout()
plt.show()

In [None]:
# Calculate and display fairness disparities
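def calculate_fairness_disparities(fairness_df):
    """
    Calculate the min/max ratio across groups for each fairness metric.
    Closer to 1.0 = more equal treatment.
    Columns that are not parity metrics (group size, observed default rate,
    mean predicted probability) are skipped.
    """
    skip_cols = {'n', 'base_rate', 'avg_prob'}
    disparities = {}
    for col in fairness_df.columns:
        if col in skip_cols:
            continue
        values = fairness_df[col].values
        if max(values) > 0:
            disparities[col] = min(values) / max(values)
        else:
            disparities[col] = 1.0  # All groups at zero: treat as parity
    return disparities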

print("\nFairness Disparity Ratios (closer to 1.0 = more fair)")
print("=" * 70)
print("\nThreshold: values < 0.80 may indicate unfair treatment (the four-fifths rule is defined for approval rates; applying it to the other metrics is a heuristic)")
print("-" * 70)

disp_trad = calculate_fairness_disparities(fairness_trad_gender)
disp_comb = calculate_fairness_disparities(fairness_comb_gender)

print(f"\n{'Metric':<25} {'Traditional':<15} {'Combined':<15} {'Closer to parity'}")
print("-" * 70)

for metric in disp_trad.keys():
    trad_val = disp_trad[metric]
    comb_val = disp_comb[metric]
    
    # Model whose ratio is closer to 1.0 (smaller disparity between groups)
    better = "Combined" if comb_val > trad_val else "Traditional" if trad_val > comb_val else "Equal"
    
    # Check if either violates four-fifths rule
    warning = ""
    if trad_val < 0.80 or comb_val < 0.80:
        warning = " [!]"
    
    print(f"{metric:<25} {trad_val:<15.3f} {comb_val:<15.3f} {better}{warning}")

print("\n[!] = Below 0.80 threshold (potential adverse impact)")

In [None]:
# The Fairness-Accuracy Trade-off
print("\n" + "=" * 70)
print("THE FAIRNESS-ACCURACY TRADE-OFF")
print("=" * 70)
print("""
A fundamental challenge in fair ML: we often cannot satisfy all fairness
criteria simultaneously while maximizing accuracy.

Key Trade-offs:

1. STATISTICAL PARITY vs ACCURACY
   - When default rates differ between groups, enforcing equal approval
     rates requires approving some higher-risk applicants in one group or
     denying some lower-risk applicants in another, reducing accuracy.

2. EQUALIZED ODDS vs STATISTICAL PARITY
   - If base rates differ between groups (they often do due to
     historical inequality), equal TPR/FPR implies unequal approval rates
     (unless TPR = FPR, i.e. the model is uninformative).

3. INDIVIDUAL vs GROUP FAIRNESS
   - Group fairness (equal rates across groups) may conflict with
     treating similar individuals similarly.

4. SHORT-TERM vs LONG-TERM FAIRNESS
   - Relaxing standards for disadvantaged groups now may perpetuate
     stereotypes or lead to higher default rates, harming them long-term.

There is no universally "correct" answer - these are value judgments
that should involve stakeholders, regulators, and affected communities.
""")

## 9. Challenge Exercises

Apply what you've learned with these hands-on challenges!

### Challenge 1: Build a Fair Credit Score

Implement a credit scoring model that achieves better fairness metrics while maintaining reasonable accuracy. Try techniques like:
- Removing features that act as proxies for protected attributes
- Reweighting training samples
- Post-processing predictions

In [None]:
# Challenge 1: Your code here

def build_fair_model(X_train, y_train, X_test, y_test, protected_group, fairness_weight=0.5):
    """
    Build a credit scoring model with fairness constraints.
    
    Args:
        X_train, y_train: Training data
        X_test, y_test: Test data
        protected_group: Series indicating protected group membership
        fairness_weight: How much to prioritize fairness vs accuracy (0-1)
    
    Returns:
        Trained model, predictions, fairness metrics
    
    TODO: Implement a fair model using one of these approaches:
    1. Pre-processing: Remove or modify biased features
    2. In-processing: Add fairness constraints to the optimization
    3. Post-processing: Adjust predictions to achieve fairness
    """
    # Hints:
    # - Try removing features correlated with protected attributes
    # - Try reweighting samples to balance group representation
    # - Try adjusting decision thresholds per group
    
    pass  # Replace with your implementation

# Test your fair model
# fair_model, fair_preds, fair_metrics = build_fair_model(...)
# Compare with original model

### Challenge 2: Alternative Data Feature Engineering

Create new features from the alternative data that might be more predictive or less biased.
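
The sketch below applies one pattern from the hints in the next cell to a hypothetical toy frame: a bill-to-income ratio computed with safe division, plus a binned version of it. The column names `mobile_bill` and `annual_income` come from those hint comments and are assumptions, as is the premise that the bill is monthly. Check both against the generated DataFrame before reusing this. The bin thresholds are purely illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical applicants; column names assumed from the exercise hints
toy_apps = pd.DataFrame({
    'mobile_bill':   [30.0, 80.0, 50.0, 40.0],       # assumed monthly bill
    'annual_income': [36000.0, 24000.0, 0.0, 60000.0],
})

# Annualize the bill and divide by income; zero income becomes NaN instead of inf
income = toy_apps['annual_income'].replace(0, np.nan)
toy_apps['bill_to_income'] = toy_apps['mobile_bill'] * 12 / income

# Bin the ratio into coarse categories (illustrative thresholds)
toy_apps['bill_burden'] = pd.cut(toy_apps['bill_to_income'], bins=[0, 0.02, 0.05, np.inf],
                                 labels=['low', 'medium', 'high'])
print(toy_apps)
```

Ratios like this can still act as proxies for protected attributes, so it is worth re-running the fairness metrics after adding them.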

In [None]:
# Challenge 2: Your code here

def engineer_alternative_features(df):
    """
    Create new features from alternative data.
    
    Args:
        df: DataFrame with alternative data features
    
    Returns:
        DataFrame with new engineered features
    
    TODO: Create features like:
    - Mobile bill to income ratio
    - Financial app engagement score
    - Digital footprint score (combination of online behaviors)
    - Payment consistency index
    """
    # Hints:
    # - Combine related features (e.g., mobile_bill / annual_income)
    # - Create interaction terms
    # - Bin continuous features into categories
    # - Create aggregate "digital stability" scores
    
    pass  # Replace with your implementation

# Test your features
# df_engineered = engineer_alternative_features(df)
# Train a model with new features and compare performance

### Challenge 3: Explainable Credit Decisions

Build a system that provides explanations for credit decisions - crucial for regulatory compliance and customer trust.
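
For a logistic regression on standardized features, one common starting point is to split the log-odds into per-feature contributions, each equal to the coefficient times the standardized value. This is an assumption about how to approach the challenge, not the notebook's required method. The contributions plus the intercept sum exactly to the model's `decision_function` output, so you can rank them to find the factors pushing an applicant toward default. The feature names, the synthetic data, and the helper `explain_with_contributions` are all hypothetical placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
demo_feature_names = ['income', 'debt_ratio', 'late_payments']   # hypothetical features

# Synthetic data: default risk falls with income, rises with debt and late payments
demo_X = rng.normal(size=(500, 3))
demo_logits = -1.0 * demo_X[:, 0] + 1.5 * demo_X[:, 1] + 1.0 * demo_X[:, 2]
demo_y = (rng.random(500) < 1 / (1 + np.exp(-demo_logits))).astype(int)   # 1 = default

demo_scaler = StandardScaler().fit(demo_X)
demo_model = LogisticRegression().fit(demo_scaler.transform(demo_X), demo_y)

def explain_with_contributions(model, scaler, applicant, feature_names, top_k=2):
    """Hypothetical helper: rank features by their contribution to default log-odds."""
    z = scaler.transform(applicant.reshape(1, -1))[0]
    contributions = model.coef_[0] * z                      # additive log-odds terms
    prob_default = model.predict_proba(z.reshape(1, -1))[0, 1]
    order = np.argsort(contributions)[::-1]                 # most risk-increasing first
    return {
        'decision': 'deny' if prob_default >= 0.5 else 'approve',
        'prob_default': round(float(prob_default), 3),
        'top_risk_factors': [(feature_names[i], round(float(contributions[i]), 3))
                             for i in order[:top_k]],
        'contributions': contributions,
    }

# Low income, high debt ratio, many late payments (raw units; features are roughly N(0, 1))
demo_applicant = np.array([-1.0, 2.0, 1.5])
demo_result = explain_with_contributions(demo_model, demo_scaler, demo_applicant, demo_feature_names)
print(demo_result['decision'], demo_result['prob_default'], demo_result['top_risk_factors'])
```

Features with the largest positive contributions are natural candidates for the reasons stated to a rejected applicant. The decomposition is exact here only because the model is linear in the standardized features. Tree ensembles such as the random forest need a different attribution method.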

In [None]:
# Challenge 3: Your code here

def explain_credit_decision(model, scaler, features, applicant_data, feature_names):
    """
    Generate a human-readable explanation for a credit decision.
    
    Args:
        model: Trained logistic regression model
        scaler: StandardScaler used for training
        features: Feature values for the applicant
        feature_names: Names of features
    
    Returns:
        Dictionary with decision and explanation
    
    TODO: Implement explanation logic:
    1. Get prediction and probability
    2. Identify top factors contributing to the decision
    3. Generate actionable feedback for rejected applicants
    """
    # Hints:
    # - Use model coefficients to identify important features
    # - Compare applicant values to approval thresholds
    # - Suggest specific improvements (e.g., "reduce debt by $X")
    # - Highlight both positive and negative factors
    
    pass  # Replace with your implementation

# Test your explanation system
# Pick a rejected applicant and generate explanation
# explanation = explain_credit_decision(model_combined, scaler_comb, ...)
# print(explanation)

## Summary

In this notebook, you learned about credit scoring with machine learning:

### Key Concepts Covered

1. **Credit Scoring Fundamentals**:
   - Traditional features: income, employment, credit history
   - Alternative data: mobile usage, social signals, digital footprint
   - Logistic regression for interpretable credit models

2. **Model Development**:
   - Feature engineering and selection
   - Train-test splitting with stratification
   - Standardization for logistic regression

3. **Model Evaluation**:
   - ROC curves and AUC scores
   - Confusion matrices
   - Precision, recall, and F1 scores

4. **Algorithmic Bias**:
   - Sources of bias in credit models
   - Proxy discrimination through correlated features
   - The four-fifths (80%) rule

5. **Fairness Metrics**:
   - Statistical parity (equal approval rates)
   - Equalized odds (equal TPR and FPR)
   - Predictive parity (equal precision)
   - The impossibility of satisfying all metrics simultaneously when base rates differ

### Real-World Applications

- **Consumer Lending**: Credit cards, personal loans
- **Mortgage Underwriting**: Home loan decisions
- **FinTech Lending**: Alternative data for thin-file borrowers
- **Microfinance**: Credit scoring in emerging markets
- **Buy-Now-Pay-Later**: Real-time credit decisions

### Regulatory Considerations

- **Equal Credit Opportunity Act (ECOA)**: Prohibits discrimination in lending
- **Fair Credit Reporting Act (FCRA)**: Regulates credit information use
- **EU AI Act**: Classifies credit scoring as high-risk AI
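- **Right to Explanation**: Many jurisdictions require explainable decisions (e.g. under ECOA/Regulation B, US lenders must give denied applicants the specific reasons for the adverse action)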

### Ethical Considerations

- Alternative data can either reduce or amplify existing inequalities
- Accuracy improvements must be weighed against fairness implications
- Transparency and explainability are crucial for trust
- Historical data reflects historical biases

### Further Reading

- [Fairlearn: ML Fairness Library](https://fairlearn.org/)
- [AI Fairness 360 (IBM)](https://aif360.mybluemix.net/)
- [The Lending Club Dataset](https://www.kaggle.com/wordsforthewise/lending-club)
- [CFPB Reports on Fair Lending](https://www.consumerfinance.gov/data-research/research-reports/)
- ["Fairness and Machine Learning" (Barocas, Hardt, Narayanan)](https://fairmlbook.org/)