# German Credit Risk Analysis: Complete Walkthrough

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/GlassAlpha/glassalpha/blob/main/examples/notebooks/german_credit_walkthrough.ipynb)

**Complete ML audit workflow**: Data exploration → Model training → Fairness analysis → SHAP explanations → Calibration → Professional PDF report

**Dataset**: German Credit (1000 applications) | **Protected Attributes**: Gender, Age, Foreign Worker

**API Reference**: [`from_model()` documentation](https://glassalpha.com/reference/api/api-audit/) | [User Guide](https://glassalpha.com/getting-started/quickstart/)

## Step 1: Installation

In [None]:
%pip install -q "glassalpha[explain,xgboost]"

In [None]:
# Environment verification for reproducibility
import platform
import random
import sys

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

import glassalpha as ga

SEED = 42
random.seed(SEED)
np.random.seed(SEED)

print({
    "python": sys.version.split()[0],
    "platform": platform.platform(),
    "glassalpha": getattr(ga, "__version__", "dev"),
    "seed": SEED
})
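The cell above seeds both `random` and `np.random` because the two generators keep separate state. The following check does not depend on GlassAlpha and confirms that re-seeding reproduces identical draws. The helper `draw` is hypothetical and exists only for this illustration.

```python
import random

import numpy as np


def draw(seed: int):
    """Hypothetical helper: seed both RNGs, then take three draws from each."""
    random.seed(seed)
    np.random.seed(seed)
    return [random.random() for _ in range(3)], np.random.rand(3).tolist()


first = draw(42)
second = draw(42)
# Same seed -> identical sequences from both generators
print(first == second)  # True
```

The models in Step 3 also receive `random_state=SEED` directly. This pins their randomness explicitly instead of relying on global RNG state.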

## Step 2: Load Data

In [None]:
df = ga.datasets.load_german_credit()
print(f"Dataset: {df.shape[0]} rows, {df.shape[1]} columns (features, target, and protected attributes)")
print(f"Target balance: {df['credit_risk'].mean():.1%} good credit")
df.head()

## Step 3: Train Models

In [None]:
protected_attrs = ['gender', 'age_group', 'foreign_worker']
feature_cols = [c for c in df.columns if c not in ['credit_risk'] + protected_attrs]
X, y = df[feature_cols], df['credit_risk']
protected_data = df[protected_attrs]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=SEED, stratify=y)
print(f"Train: {len(X_train)} | Test: {len(X_test)}")
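`stratify=y` makes the train and test splits keep the same class proportions as the full dataset. The sketch below is a simplified NumPy illustration of the idea: it splits each class separately, then combines the results. It is not scikit-learn's actual algorithm, and the helper `stratified_split` is hypothetical.

```python
import numpy as np


def stratified_split(y, test_size=0.3, seed=42):
    """Hypothetical helper: return (train_idx, test_idx), splitting each class in the same ratio."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    train_idx, test_idx = [], []
    for label in np.unique(y):
        # Shuffle the indices of this class, then cut off the test share
        idx = rng.permutation(np.flatnonzero(y == label))
        n_test = int(round(test_size * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.sort(train_idx), np.sort(test_idx)


# Toy labels with a 70/30 class ratio
y = np.array([1] * 700 + [0] * 300)
train_idx, test_idx = stratified_split(y)
print(len(train_idx), len(test_idx))            # 700 300
print(y[train_idx].mean(), y[test_idx].mean())  # 0.7 0.7
```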

In [None]:
from sklearn.model_selection import cross_val_score

candidates = {
    "RandomForest": RandomForestClassifier(n_estimators=100, max_depth=5, random_state=SEED),
    "XGBoost": XGBClassifier(n_estimators=100, max_depth=3, random_state=SEED, eval_metric="logloss"),
}
# Select via 5-fold CV on the training set so the test set stays unseen until the audit
cv_acc = {name: cross_val_score(est, X_train, y_train, cv=5).mean() for name, est in candidates.items()}
for name, acc in cv_acc.items():
    print(f"{name} 5-fold CV acc: {acc:.3f}")
best_name = max(cv_acc, key=cv_acc.get)
model = candidates[best_name].fit(X_train, y_train)
print(f"\n✓ Selected: {best_name}")

## Step 4: Generate Audit

In [None]:
result = ga.audit.from_model(
    model=model,
    X_test=X_test,
    y_test=y_test,
    protected_attributes={
        'gender': protected_data.loc[X_test.index, 'gender'],
        'age_group': protected_data.loc[X_test.index, 'age_group'],
        'foreign_worker': protected_data.loc[X_test.index, 'foreign_worker']
    },
    random_seed=SEED
)
result  # Display inline
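Step 4 retrieves each protected attribute with `protected_data.loc[X_test.index, ...]`. This works because `train_test_split` returns pandas objects that keep their original row labels. Label-based `.loc` therefore finds the matching rows even after shuffling. The pandas sketch below shows this alignment on a toy frame; the columns `income` and `gender` are made up. A random permutation stands in for the split.

```python
import numpy as np
import pandas as pd

# Toy data: one feature column and one protected attribute
df = pd.DataFrame({
    "income": [10, 20, 30, 40, 50, 60],
    "gender": ["f", "m", "f", "m", "f", "m"],
})
X = df[["income"]]
protected = df[["gender"]]

# Shuffle and keep 3 rows as a stand-in "test split"; row labels survive .iloc
rng = np.random.default_rng(0)
X_test = X.iloc[rng.permutation(len(X))[3:]]

# Label-based lookup returns the protected values in the same row order as X_test
aligned = protected.loc[X_test.index, "gender"]
print(pd.concat([X_test, aligned], axis=1))
```

If the split had produced bare NumPy arrays via `.values`, the row labels would be gone and this lookup would no longer be possible.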

## Step 5: Performance Analysis

In [None]:
print(f"Accuracy: {result.performance.accuracy:.3f}")
print(f"AUC-ROC: {result.performance.auc_roc:.3f}")
print(f"Precision: {result.performance.precision:.3f}")
print(f"Recall: {result.performance.recall:.3f}")

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
result.performance.plot_confusion_matrix(ax=ax1)
result.performance.plot_roc_curve(ax=ax2)
plt.tight_layout()
plt.show()
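The four headline numbers follow from the confusion matrix and the predicted scores. The NumPy sketch below applies the standard definitions. It assumes label `1` is the positive class and that scores are thresholded at 0.5; these are assumptions, since the GlassAlpha conventions are not stated here. AUC-ROC is computed through its rank interpretation: the probability that a random positive scores higher than a random negative, with ties counted as one half.

```python
import numpy as np


def binary_metrics(y_true, y_pred, y_score):
    """Accuracy, precision, recall, and AUC-ROC, with label 1 as the positive class."""
    y_true, y_pred, y_score = map(np.asarray, (y_true, y_pred, y_score))
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    # AUC = P(score of random positive > score of random negative), ties count 0.5
    pos, neg = y_score[y_true == 1], y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    auc = np.mean(diff > 0) + 0.5 * np.mean(diff == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp),  # nan (with a warning) if nothing is predicted positive
        "recall": tp / (tp + fn),
        "auc_roc": auc,
    }


y_true = [1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.6, 0.4, 0.7, 0.3, 0.1]
y_pred = [int(s >= 0.5) for s in y_score]  # threshold at 0.5 (assumed)
metrics = binary_metrics(y_true, y_pred, y_score)
print({k: round(float(v), 3) for k, v in metrics.items()})
```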

## Step 6: Fairness Analysis

In [None]:
print(f"Demographic Parity Difference: {result.fairness.demographic_parity_difference:.3f}")
print(f"Equal Opportunity Difference: {result.fairness.equal_opportunity_difference:.3f}")
print(f"\nBias detected: {'⚠️ YES' if result.fairness.has_bias(0.10) else '✓ NO'} (threshold: 0.10)")

result.fairness.plot_group_metrics()
plt.title('Fairness Across Protected Groups')
plt.show()
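Both printed gaps have standard definitions:

- **Demographic parity difference** is the gap in positive-prediction rate between groups.
- **Equal opportunity difference** is the gap in true positive rate (recall) between groups.

The NumPy sketch below computes both as max minus min over the groups of a single protected attribute. It illustrates the definitions only. How GlassAlpha aggregates across gender, age group, and foreign worker status is not stated here, so this should not be read as its implementation.

```python
import numpy as np


def fairness_gaps(y_true, y_pred, group):
    """Return (demographic parity difference, equal opportunity difference) for one attribute."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    selection_rate, tpr = {}, {}
    for g in np.unique(group):
        mask = group == g
        # P(y_pred = 1 | group = g)
        selection_rate[g] = y_pred[mask].mean()
        # P(y_pred = 1 | y_true = 1, group = g)
        tpr[g] = y_pred[mask & (y_true == 1)].mean()
    dp = max(selection_rate.values()) - min(selection_rate.values())
    eo = max(tpr.values()) - min(tpr.values())
    return dp, eo


y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
dp, eo = fairness_gaps(y_true, y_pred, group)
print(f"DP difference: {dp:.2f}, EO difference: {eo:.2f}")  # DP difference: 0.50, EO difference: 0.50
```

In this toy example, both gaps exceed the 0.10 threshold used in the cell above.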

## Step 7: Calibration

In [None]:
print(f"Expected Calibration Error: {result.calibration.expected_calibration_error:.4f}")
print(f"Brier Score: {result.calibration.brier_score:.4f}")
print(f"\nCalibration: {'✓ PASS' if result.calibration.expected_calibration_error < 0.05 else '⚠️ WARNING'} (ECE < 0.05 target)")

result.calibration.plot()
plt.show()
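The two calibration metrics work as follows:

- **Expected calibration error (ECE)** groups the predicted positive-class probabilities into bins. Within each bin, it takes the gap between the mean predicted probability and the observed positive rate. It then averages these gaps, weighted by the share of samples in each bin.
- **Brier score** is the mean squared error between the predicted probability and the 0/1 outcome.

The sketch below uses 10 equal-width bins. This is a common choice, but it is an assumption about the binning GlassAlpha uses.

```python
import numpy as np


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """ECE over equal-width probability bins, weighted by bin size."""
    y_true, y_prob = np.asarray(y_true, float), np.asarray(y_prob, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only: p in [0, 0.1) -> bin 0, p in [0.9, 1.0] -> bin 9
    bins = np.digitize(y_prob, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            gap = abs(y_prob[mask].mean() - y_true[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in this bin
    return ece


def brier_score(y_true, y_prob):
    """Mean squared error between predicted probability and the 0/1 outcome."""
    y_true, y_prob = np.asarray(y_true, float), np.asarray(y_prob, float)
    return np.mean((y_prob - y_true) ** 2)


calibrated = expected_calibration_error([1, 1, 1, 0], [0.75] * 4)     # predicted 0.75, observed 0.75
overconfident = expected_calibration_error([1, 0, 1, 0], [0.95] * 4)  # predicted 0.95, observed 0.5
print(calibrated, overconfident)
print(brier_score([1, 1, 1, 0], [0.75] * 4))
```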

## Step 8: SHAP Explanations

In [None]:
print("Top 10 Important Features:\n")
print(result.explanations.feature_importance.head(10))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
result.explanations.plot_importance(top_n=10, ax=ax1)
result.explanations.plot_summary(ax=ax2)
plt.tight_layout()
plt.show()
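SHAP values are Shapley values from cooperative game theory. Each feature's attribution is its marginal contribution to the prediction, averaged over every order in which features could be added. With only a few features, they can be computed exactly by enumerating coalitions.

The sketch below does this for a made-up model `model_fn` and makes a simplifying assumption: "absent" features take values from a single baseline row. The SHAP library instead averages over a background dataset, or uses TreeSHAP for tree ensembles, so this is an illustration of the definition, not what GlassAlpha runs.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(f, x, baseline):
    """Exact Shapley values for f at x; features outside a coalition take baseline values."""
    x, baseline = np.asarray(x, float), np.asarray(baseline, float)
    n = len(x)

    def value(coalition):
        z = baseline.copy()
        z[list(coalition)] = x[list(coalition)]
        return f(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


def model_fn(z):
    """Made-up model: linear terms plus an interaction between features 0 and 1."""
    return 2.0 * z[0] + 1.0 * z[1] + 0.5 * z[0] * z[1] - 1.0 * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(model_fn, x, baseline)
print(phi)
# Efficiency property: attributions sum to f(x) - f(baseline)
print(phi.sum(), model_fn(np.array(x)) - model_fn(np.array(baseline)))
```

The interaction term `0.5 * z[0] * z[1]` is split equally between features 0 and 1. Because of this symmetry, each of them receives 2.5.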

## Step 9: Export Audit Report

In [None]:
result.to_pdf('german_credit_audit.pdf')
result.to_json('metrics.json')
result.to_config('audit_config.yaml')
print('✓ Exported: PDF report, metrics JSON, config YAML')

## Summary

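**Performance**: Accuracy, AUC-ROC, precision, and recall on the held-out test set

**Fairness**: Demographic parity and equal opportunity gaps across gender, age group, and foreign worker status

**Calibration**: ECE and Brier score measure how closely predicted probabilities match observed outcomes

**Explainability**: SHAP values attribute each prediction to individual features

**Next Steps**: Review the PDF report, investigate any fairness gaps above the threshold, and monitor the model in production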