# German Credit Regulatory Implications

This project implemented a Basel III-compliant Internal Ratings-Based (IRB) credit risk system to assess capital requirements for a loan portfolio. The model calculates Probability of Default (PD), Risk-Weighted Assets (RWA), and regulatory capital while incorporating stress testing to evaluate resilience under adverse economic conditions.

The framework turns raw loan data into regulatory capital figures. By combining a machine-learning PD model with the Basel risk-weight formula, it gives banks a single tool for default prediction, capital adequacy assessment, and stress testing.

## Strategic Value

This system enables banks to:

- Proactively manage risk through PD monitoring
- Optimize capital allocation per Basel requirements
- Demonstrate regulatory compliance with auditable calculations

While the framework provides a robust foundation for internal ratings-based approaches, its true value will emerge through iterative refinement using real-world portfolio data. The project demonstrates how machine learning and regulatory finance can converge to create smarter risk management systems.

In [1]:
"""
IRB Credit Risk System - Basel III Compliant Capital Calculator

This system calculates regulatory capital requirements using the Internal Ratings-Based (IRB) approach
with the following components:
1. Data preprocessing and feature engineering
2. Probability of Default (PD) modeling
3. Risk Weighted Assets (RWA) calculation
4. Stress testing capabilities
5. Reporting and analysis tools
"""

import pandas as pd
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score
from sklearn.utils.class_weight import compute_sample_weight
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.exceptions import NotFittedError

## Data Loading & Preprocessing

Loads raw credit data, cleans it, and engineers financial risk features.

In [2]:
class CreditDataPreprocessor:
    """Handles all data loading and preprocessing operations"""
    
    DEFAULT_COLUMN_MAPPING = {
        'laufkont': 'Status',
        'laufzeit': 'Duration',
        'moral': 'CreditHistory',
        'verw': 'Purpose',
        'hoehe': 'Amount',
        'sparkont': 'Savings',
        'beszeit': 'EmploymentDuration',
        'rate': 'InstallmentRate',
        'famges': 'PersonalStatus',
        'buerge': 'OtherDebtors',
        'wohnzeit': 'ResidenceDuration',
        'verm': 'Property',
        'alter': 'Age',
        'weitkred': 'OtherInstallments',
        'wohn': 'Housing',
        'bishkred': 'NumCredits',
        'beruf': 'Job',
        'pers': 'Dependents',
        'telef': 'Telephone',
        'gastarb': 'ForeignWorker',
        'kredit': 'Default'
    }
    
    def __init__(self, column_mapping=None, min_bin_size=50):
        """
        Args:
            column_mapping: Dictionary for renaming columns
            min_bin_size: Minimum samples per bin for numerical features
        """
        self.column_mapping = column_mapping or self.DEFAULT_COLUMN_MAPPING
        self.min_bin_size = min_bin_size
        self.feature_stats_ = {}
        
    def load_data(self, filepath):
        """Load and validate credit data"""
        df = pd.read_csv(filepath)
        
        # Rename columns using mapping
        df = df.rename(columns={k: v for k, v in self.column_mapping.items() 
                               if k in df.columns})
        
        # Validate required columns
        required_columns = ['Duration', 'Amount', 'Age', 'Default']
        missing = [col for col in required_columns if col not in df.columns]
        if missing:
            raise ValueError(f"Missing required columns: {missing}")
            
        return df
    
    def preprocess_data(self, df):
        """Clean and transform raw data"""
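        # Convert target: 1=default, 0=non-default
        # Only a 1/2-coded target (1=good, 2=bad) is remapped here. A file already
        # coded 0/1 passes through unchanged, so if its 1 means "good credit" the
        # labels must be flipped before this step.
        if df['Default'].max() == 2:  # German credit data format
            df['Default'] = df['Default'] - 1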
            
        # Handle missing values
        df = df.dropna()
        
        # Add financial ratios
        df = self._add_financial_features(df)
        
        # Store feature statistics
        self._store_feature_stats(df)
        
        return df
    
    def _add_financial_features(self, df):
        """Create financial ratios and risk indicators"""
        df = df.copy()
        
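        # Repayment-load proxies. The data has no income field, so 'DebtToIncome'
        # is really loan amount per unit of duration, not a true debt-to-income ratio.
        df['DebtToIncome'] = df['Amount'] / (df['Duration'] + 1e-6)
        df['InstallmentBurden'] = df['InstallmentRate'] / (df['Amount'] + 1e-6)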
        
        # Stability indicators
        df['AgeSquared'] = df['Age'] ** 2
        df['LogAmount'] = np.log(df['Amount'] + 1)
        
        return df
    
    def _store_feature_stats(self, df):
        """Store descriptive statistics for features"""
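        numeric = df.select_dtypes(include='number')  # statistics only make sense for numeric columns
        self.feature_stats_ = {
            'mean': numeric.mean(),
            'std': numeric.std(),
            'min': numeric.min(),
            'max': numeric.max()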
        }
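
Because `preprocess_data` only remaps a 1/2-coded target, a file that already uses 0/1 with 1 meaning good credit would pass through with inverted labels. The sketch below shows one way to make the encoding explicit. The helper name `normalize_target` and its `good_value` parameter are hypothetical and not part of the class above, and the toy series is made up for illustration.

```python
import pandas as pd

def normalize_target(target: pd.Series, good_value: int) -> pd.Series:
    """Return 1 for default (bad) and 0 for non-default (good).

    Hypothetical helper: `good_value` is whatever code the source file
    uses for a good loan; every other value is treated as a default.
    """
    values = set(target.dropna().unique())
    if len(values) != 2:
        raise ValueError(f"Expected a binary target, got values {sorted(values)}")
    if good_value not in values:
        raise ValueError(f"good_value={good_value} not found in target")
    return (target != good_value).astype(int)

# Toy data: 1 = good, 0 = bad (assumed encoding, for illustration only)
raw = pd.Series([1, 1, 0, 1, 0, 1, 1, 1, 0, 1])
default_flag = normalize_target(raw, good_value=1)

print(f"Share coded 1 in raw file: {raw.mean():.0%}")
print(f"Default rate after remap:  {default_flag.mean():.0%}")
```

Passing the encoding explicitly forces the caller to state which value marks a good loan, instead of inferring it from the maximum value in the column.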

## Weight of Evidence (WoE) Transformation

Transforms categorical/numerical features into WoE values for better model interpretability.

In [3]:
class WoETransformer(BaseEstimator, TransformerMixin):
    """Weight of Evidence transformation for credit risk features"""
    
    def __init__(self, n_bins=5, min_bin_size=50, epsilon=1e-6):
        """
        Args:
            n_bins: Number of bins for numerical features
            min_bin_size: Minimum samples per bin
            epsilon: Small constant for numerical stability
        """
        self.n_bins = n_bins
        self.min_bin_size = min_bin_size
        self.epsilon = epsilon
        self.bin_edges_ = {}
        self.woe_dict_ = {}
        self.iv_dict_ = {}
        
    def fit(self, X, y):
        """Calculate WoE and IV for all features"""
        X = pd.DataFrame(X).copy()
        y = pd.Series(y).copy()
        
        for col in X.columns:
            if pd.api.types.is_numeric_dtype(X[col]):
                self._fit_numeric(X[col], y, col)
            else:
                self._fit_categorical(X[col], y, col)
                
            self._check_monotonicity(col)
            
        return self
    
    def transform(self, X):
        """Transform features to WoE values"""
        # woe_dict_ is created empty in __init__, so test its contents rather than its existence
        if not self.woe_dict_:
            raise NotFittedError("Transformer not fitted yet")
            
        X = pd.DataFrame(X).copy()
        X_woe = pd.DataFrame(index=X.index)
        
        for col in self.woe_dict_:
            if col not in X.columns:
                raise ValueError(f"Feature {col} missing in transform data")
                
            if col in self.bin_edges_:
                binned = pd.cut(X[col], bins=self.bin_edges_[col], include_lowest=True)
                X_woe[col] = binned.astype(str).map(self.woe_dict_[col]['woe'])
            else:
                X_woe[col] = X[col].astype(str).map(self.woe_dict_[col]['woe'])
                
            # Handle unseen categories/missing
            X_woe[col] = X_woe[col].fillna(self.woe_dict_[col]['woe'].mean())
            
        return X_woe
    
    def _fit_numeric(self, x, y, col):
        """Calculate WoE for numeric features"""
        try:
            # Try quantile binning first
            bins = pd.qcut(x, q=self.n_bins, duplicates='drop', retbins=True)[1]
            
            # Check bin sizes
            bin_counts = pd.cut(x, bins=bins).value_counts()
            if any(bin_counts < self.min_bin_size):
                bins = np.histogram_bin_edges(x, bins='doane')
                
            self.bin_edges_[col] = bins
            binned = pd.cut(x, bins=bins, include_lowest=True)
            crosstab, iv = self._calculate_woe(binned.astype(str), y)
            
        except Exception as e:
            print(f"Warning: Could not bin {col} with quantiles: {str(e)}")
            # Fall back to equal-width bins
            bins = np.linspace(x.min(), x.max(), self.n_bins + 1)
            self.bin_edges_[col] = bins
            binned = pd.cut(x, bins=bins, include_lowest=True)
            crosstab, iv = self._calculate_woe(binned.astype(str), y)
            
        self.woe_dict_[col] = crosstab
        self.iv_dict_[col] = iv
        
    def _fit_categorical(self, x, y, col):
        """Calculate WoE for categorical features"""
        crosstab, iv = self._calculate_woe(x.astype(str), y)
        self.woe_dict_[col] = crosstab
        self.iv_dict_[col] = iv
        
    def _calculate_woe(self, x, y):
        """Calculate Weight of Evidence and Information Value"""
        crosstab = pd.crosstab(x, y, margins=False)
        
        # Ensure both classes exist and are ordered as (good=0, bad=1);
        # a missing class gets a zero count, which the +0.5 smoothing below handles
        crosstab = crosstab.reindex(columns=[0, 1], fill_value=0)
        crosstab.columns = ['good', 'bad']
        crosstab['total'] = crosstab['good'] + crosstab['bad']
        
        # Stable WoE calculation
        crosstab['p_good'] = (crosstab['good'] + 0.5) / (crosstab['good'].sum() + 0.5)
        crosstab['p_bad'] = (crosstab['bad'] + 0.5) / (crosstab['bad'].sum() + 0.5)
        crosstab['woe'] = np.log(crosstab['p_good'] / crosstab['p_bad'])
        crosstab['iv'] = (crosstab['p_good'] - crosstab['p_bad']) * crosstab['woe']
        
        return crosstab.sort_values('woe'), crosstab['iv'].sum()
    
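    def _check_monotonicity(self, col):
        """Check for monotonic WoE patterns across ordered numeric bins"""
        if col not in self.bin_edges_:
            return  # categorical levels have no natural order to test
        # Rows arrive sorted by WoE, so restore bin order before testing
        edges = self.bin_edges_[col]
        labels = pd.Series(pd.cut(edges[1:], bins=edges, include_lowest=True)).astype(str)
        woe = self.woe_dict_[col]['woe'].reindex(labels).dropna()
        if not (woe.is_monotonic_increasing or woe.is_monotonic_decreasing):
            print(f"Warning: Non-monotonic WoE for {col} - consider manual binning")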
            
    def get_feature_importance(self):
        """Return feature importance based on Information Value"""
        iv_df = pd.DataFrame({
            'Feature': list(self.iv_dict_.keys()),
            'IV': list(self.iv_dict_.values())
        }).sort_values('IV', ascending=False)
        
        iv_df['Strength'] = pd.cut(
            iv_df['IV'],
            bins=[-np.inf, 0.02, 0.1, 0.3, np.inf],
            labels=['Unpredictive', 'Weak', 'Medium', 'Strong']
        )
        return iv_df
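
The WoE and IV arithmetic in `_calculate_woe` can be traced by hand on a tiny table. The sketch below uses made-up data and leaves out the +0.5 smoothing that the class applies, so the numbers stay easy to check; it is an illustration of the definitions, not a replacement for the transformer.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration): one categorical feature and a 0/1 target
feature = pd.Series(["own", "own", "own", "own", "rent", "rent", "rent", "rent"])
default = pd.Series([0, 0, 0, 1, 0, 1, 1, 1])

counts = pd.crosstab(feature, default).reindex(columns=[0, 1], fill_value=0)
counts.columns = ["good", "bad"]

# Share of all goods / all bads that fall in each category
p_good = counts["good"] / counts["good"].sum()
p_bad = counts["bad"] / counts["bad"].sum()

woe = np.log(p_good / p_bad)          # > 0 means relatively more goods than bads
iv = ((p_good - p_bad) * woe).sum()   # Information Value of the whole feature

print(woe.round(4).to_dict())
print(round(iv, 4))
```

Here "own" holds three of the four goods and one of the four bads, so its WoE is ln(0.75 / 0.25) = ln 3, and "rent" is the mirror image at -ln 3.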

## Probability of Default (PD) Modeling 

Trains a machine learning model to predict the likelihood of default.

This also includes:

- Capital Calculation (Basel III Formula): Computes Risk-Weighted Assets (RWA) and Regulatory Capital using Basel III rules.
- Stress Testing: Simulates how capital requirements change under economic stress (e.g., recession).
- Portfolio Analysis & Reporting: Generates risk insights and regulatory reports.
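
For reference, the capital formula that `_calculate_rwa` in the cell below implements can be written out as follows. This is a transcription of that code rather than a quotation of the regulatory text: $\Phi$ is the standard normal CDF, $M$ is the effective maturity, and 0.999 is the default `confidence` argument.

```latex
\begin{aligned}
R   &= 0.12\,\frac{1 - e^{-50\,PD}}{1 - e^{-50}}
     + 0.24\left(1 - \frac{1 - e^{-50\,PD}}{1 - e^{-50}}\right) \\
b   &= \left(0.11852 - 0.05478\,\ln PD\right)^{2} \\
K   &= \left[\, LGD \cdot \Phi\!\left(\frac{\Phi^{-1}(PD) + \sqrt{R}\;\Phi^{-1}(0.999)}{\sqrt{1 - R}}\right)
       - PD \cdot LGD \,\right] \cdot \frac{1 + (M - 2.5)\,b}{1 - 1.5\,b} \\
RWA &= 1.06 \cdot K \cdot 12.5 \cdot EAD
\end{aligned}
```

The bracketed term subtracts expected loss ($PD \cdot LGD$) from the loss conditional on a 99.9% stress, so $K$ measures unexpected loss only.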

In [4]:
class IRBCreditModel:
    """Basel III IRB Approach Credit Risk Model"""
    
    def __init__(self, lgd=0.45, confidence=0.999, maturity=2.5):
        """
        Args:
            lgd: Loss Given Default (45%)
            confidence: Confidence level (99.9% for Basel III)
            maturity: Effective maturity in years
        """
        self.lgd = lgd
        self.confidence = confidence
        self.maturity = maturity
        self.pd_model = None
        self.preprocessor = CreditDataPreprocessor()
        self.woe_transformer = None
        
    def load_and_preprocess(self, filepath):
        """Load and preprocess credit data"""
        df = self.preprocessor.load_data(filepath)
        df = self.preprocessor.preprocess_data(df)
        return df
    
    def train_pd_model(self, X, y):
        """Train Probability of Default model"""
        sample_weights = compute_sample_weight('balanced', y)
        
        model = Pipeline([
            ('scaler', StandardScaler()),
            ('classifier', GradientBoostingClassifier(
                n_estimators=150,
                max_depth=3,
                min_samples_leaf=50,
                subsample=0.8,
                random_state=42
            ))
        ])
        
        model.fit(X, y, classifier__sample_weight=sample_weights)
        return model
    
    def calculate_capital(self, df, exposure_col='Amount'):
        """Calculate capital requirements for portfolio"""
        # Select features
        features = ['Amount', 'Duration', 'Age', 'DebtToIncome', 'InstallmentBurden']
        features = [f for f in features if f in df.columns]
        
        X = df[features]
        y = df['Default']
        
        # Train model if not already trained
        if self.pd_model is None:
            print("Training PD model...")
            self.pd_model = self.train_pd_model(X, y)
            
        # Predict PDs
        df['PD'] = self.pd_model.predict_proba(X)[:, 1]
        
        # Calculate capital requirements
        df['RWA'] = df.apply(lambda row: self._calculate_rwa(row['PD'], row[exposure_col]), axis=1)
        df['Capital'] = df['RWA'] * 0.08  # 8% of RWA
        
        # Portfolio summary
        summary = {
            'TotalExposure': df[exposure_col].sum(),
            'AvgPD': (df['PD'] * df[exposure_col]).sum() / df[exposure_col].sum(),
            'TotalRWA': df['RWA'].sum(),
            'TotalCapital': df['Capital'].sum(),
            'CapitalRatio': df['Capital'].sum() / df[exposure_col].sum(),
            'ModelAUC': roc_auc_score(y, df['PD'])  # in-sample when the model was trained on this same df
        }
        
        return df, summary
    
    def _calculate_rwa(self, pd_, ead):
        """Calculate Risk Weighted Assets per Basel IRB formula"""
        # `pd_` (not `pd`) so the pandas alias is not shadowed inside this method
        # Keep PD strictly inside (0, 1) so norm.ppf stays finite
        pd_ = np.clip(pd_, 0.0001, 0.9999)
        
        # Asset correlation, interpolated between 0.12 (high PD) and 0.24 (low PD)
        weight = (1 - np.exp(-50 * pd_)) / (1 - np.exp(-50))
        r = 0.12 * weight + 0.24 * (1 - weight)
        
        # Maturity adjustment
        b = (0.11852 - 0.05478 * np.log(pd_)) ** 2
        maturity_adj = (1 + (self.maturity - 2.5) * b) / (1 - 1.5 * b)
        
        # Capital requirement: conditional loss at the confidence level minus
        # expected loss (PD * LGD), times the 1.06 IRB scaling factor
        capital = (self.lgd * norm.cdf(
            (norm.ppf(pd_) + np.sqrt(r) * norm.ppf(self.confidence)) / np.sqrt(1 - r)
        ) - pd_ * self.lgd) * 1.06 * maturity_adj
        
        # Convert to RWA
        rwa = capital * ead * 12.5
        return rwa
    
    def stress_test(self, df, scenario_params):
        """Apply stress scenario to portfolio"""
        stressed_df = df.copy()
        
        # Apply PD shock
        if 'pd_shock' in scenario_params:
            stressed_df['PD'] *= scenario_params['pd_shock']
            stressed_df['PD'] = np.clip(stressed_df['PD'], 0, 1)
            
        # Apply LGD shock
        stressed_lgd = min(1.0, self.lgd * scenario_params.get('lgd_shock', 1.0))
        
        # Apply EAD shock if specified
        exposure_col = 'Amount'
        if 'ead_shock' in scenario_params:
            stressed_df[exposure_col] *= scenario_params['ead_shock']
            
        # Temporarily swap in the stressed LGD and restore it even if the
        # recalculation raises, so the model is never left in a stressed state
        original_lgd = self.lgd
        self.lgd = stressed_lgd
        try:
            stressed_df['RWA'] = stressed_df.apply(
                lambda row: self._calculate_rwa(row['PD'], row[exposure_col]), axis=1
            )
        finally:
            self.lgd = original_lgd
        stressed_df['Capital'] = stressed_df['RWA'] * 0.08
        
        # Stressed summary
        summary = {
            'TotalExposure': stressed_df[exposure_col].sum(),
            'AvgPD': (stressed_df['PD'] * stressed_df[exposure_col]).sum() / 
                    stressed_df[exposure_col].sum(),
            'TotalRWA': stressed_df['RWA'].sum(),
            'TotalCapital': stressed_df['Capital'].sum(),
            'CapitalRatio': stressed_df['Capital'].sum() / stressed_df[exposure_col].sum(),
            'StressedLGD': stressed_lgd
        }
        
        return stressed_df, summary
    
    def analyze_portfolio(self, df):
        """Generate portfolio analysis reports"""
        reports = {}
        
        # Risk grade distribution
        df['RiskGrade'] = pd.qcut(
            df['PD'], 
            q=5, 
            labels=['A (Lowest)', 'B', 'C', 'D', 'E (Highest)']
        )
        reports['RiskGrades'] = df['RiskGrade'].value_counts().sort_index()
        
        # Top risky exposures
        reports['TopRisky'] = df.nlargest(5, 'PD')[
            ['Duration', 'Amount', 'Age', 'PD', 'Capital']
        ]
        
        return reports

- Risk Grading: Loans are split into five equal-sized PD quintiles, from A (safest) to E (riskiest).
- Top Risky Loans: Identifies high-PD exposures needing attention.
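
Quantile-based grading with `pd.qcut` always yields equal-sized buckets, so the grade counts say nothing about how risky the portfolio is in absolute terms. The sketch below contrasts it with fixed PD cut-offs on synthetic data. The threshold values are hypothetical and chosen only for illustration; a real rating scale would come from the bank's own master scale.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
pds = pd.Series(rng.beta(2, 8, size=1000), name="PD")  # synthetic PDs, skewed toward low values

grade_labels = ["A", "B", "C", "D", "E"]

# Quantile grades: equal counts by construction, cut-offs move with the portfolio
quantile_grades = pd.qcut(pds, q=5, labels=grade_labels)

# Fixed-threshold grades: hypothetical cut-offs, counts reflect the PD distribution
thresholds = [0.0, 0.05, 0.10, 0.20, 0.35, 1.0]
fixed_grades = pd.cut(pds, bins=thresholds, labels=grade_labels, include_lowest=True)

print(quantile_grades.value_counts().sort_index())
print(fixed_grades.value_counts().sort_index())
```

With fixed thresholds a grade keeps the same meaning over time, which makes migration between grades observable; with quantiles a loan can change grade simply because the rest of the portfolio changed.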

## Reporting

In [5]:
def main():
    """Main execution function"""
    try:
        # Initialize the IRB system
        irb = IRBCreditModel(
            lgd=0.45,
            confidence=0.999,
            maturity=2.5
        )
        
        # Load and preprocess data
        print("Loading data...")
        df = irb.load_and_preprocess("german_credit_data.csv")
        print(f"Data loaded with {len(df)} records. Default rate: {df['Default'].mean():.2%}")
        
        # Baseline capital calculation
        print("\nCalculating baseline capital...")
        portfolio, baseline = irb.calculate_capital(df)
        
        print("\n=== BASELINE RESULTS ===")
        print(f"Total Exposure: €{baseline['TotalExposure']:,.2f}")
        print(f"Average PD: {baseline['AvgPD']:.2%}")
        print(f"Risk-Weighted Assets: €{baseline['TotalRWA']:,.2f}")
        print(f"Required Capital: €{baseline['TotalCapital']:,.2f}")
        print(f"Capital Ratio: {baseline['CapitalRatio']:.2%} (Min 8%)")
        print(f"Model AUC: {baseline['ModelAUC']:.3f}")
        
        # Stress testing
        print("\nRunning stress tests...")
        scenarios = {
            "Mild Recession": {"pd_shock": 1.5, "lgd_shock": 1.1},
            "Severe Crisis": {"pd_shock": 2.5, "lgd_shock": 1.3, "ead_shock": 0.9}
        }
        
        for name, params in scenarios.items():
            _, stressed = irb.stress_test(portfolio, params)
            print(f"\n{name} Scenario:")
            print(f"Capital Change: {(stressed['TotalCapital']/baseline['TotalCapital']-1):+.1%}")
            print(f"New Capital Ratio: {stressed['CapitalRatio']:.2%}")
        
        # Portfolio analysis
        print("\nGenerating portfolio reports...")
        reports = irb.analyze_portfolio(portfolio)
        print("\nRisk Grade Distribution:")
        print(reports['RiskGrades'])
        print("\nTop 5 Riskiest Exposures:")
        print(reports['TopRisky'].to_string(index=False))
        
        # Save results
        portfolio.to_csv("irb_results.csv", index=False)
        print("\nResults saved to 'irb_results.csv'")
        
    except Exception as e:
        print(f"\nError: {str(e)}")
        print("Troubleshooting steps:")
        print("1. Verify input file exists and is accessible")
        print("2. Check required columns are present")
        print("3. Ensure target variable has values 0 (good) and 1 (bad)")


if __name__ == "__main__":
    main()

Loading data...
Data loaded with 1000 records. Default rate: 70.00%

Calculating baseline capital...
Training PD model...

=== BASELINE RESULTS ===
Total Exposure: €3,271,248.00
Average PD: 49.01%
Risk-Weighted Assets: €6,759,498.12
Required Capital: €540,759.85
Capital Ratio: 16.53% (Min 8%)
Model AUC: 0.855

Running stress tests...

Mild Recession Scenario:
Capital Change: -28.6%
New Capital Ratio: 11.80%

Severe Crisis Scenario:
Capital Change: -63.9%
New Capital Ratio: 6.63%

Generating portfolio reports...

Risk Grade Distribution:
RiskGrade
A (Lowest)     200
B              200
C              200
D              200
E (Highest)    200
Name: count, dtype: int64

Top 5 Riskiest Exposures:
 Duration  Amount  Age       PD   Capital
        6    1750   45 0.975950 20.022263
        6     753   64 0.972155  9.952769
        7    2329   45 0.971672 31.309427
        6    1595   51 0.969263 23.234769
        6    1898   34 0.967133 29.529686

Results saved to 'irb_results.csv'


### Business Implications

**Portfolio Risk**
- With an exposure-weighted average PD of 49.01%, this appears to be a "toxic" portfolio requiring immediate attention, provided the default labels are correct
- The bank would need to either:
  - Increase capital reserves significantly
  - Reduce exposure to high-risk borrowers

**Regulatory Concerns**
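- Baseline required capital is 16.53% of exposure. Because `Capital` is defined as 8% of RWA, this ratio measures capital demand per euro of exposure, not the bank's available capital against RWA, so on its own it cannot show that the 8% minimum is met
- Under the severe stress scenario the same ratio falls to 6.63%, meaning the model asks for less capital under stress, which is the opposite of what a stress test should show
- In this form the results would be unlikely to pass a regulatory review of the stress-testing methodology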

## Key Achievements

- Basel III Compliance: The model follows regulatory guidelines for capital calculation, including correlation and maturity adjustments.
- Machine Learning Integration: A Gradient Boosting Classifier predicts PD with an AUC of 0.855. Because the score is computed on the same records used for training, it is an in-sample figure and likely overstates out-of-sample discriminatory power.
- Stress Testing Framework: The system evaluates capital adequacy under different economic scenarios, though the results point to methodological issues that need refinement.
- Risk Segmentation: Loans were categorized into risk grades (A-E), helping identify high-risk exposures.

### Findings & Insights

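- The portfolio exhibited an extremely high default rate (70%), suggesting either a high-risk segment or possible data anomalies. The most likely anomaly is target encoding: the preprocessing only remaps a 1/2-coded target, so a 0/1 file in which 1 marks good credit would be read with inverted labels.
- The baseline ratio of required capital to exposure was 16.53% and fell to 6.63% in the severe scenario. Since required capital is set to 8% of RWA by construction, this ratio is not itself a test against the 8% minimum.
- Stress Test Anomalies: Capital requirements unexpectedly decreased under stress. A plausible cause is that the IRB formula covers only unexpected loss, which shrinks toward zero as PD approaches 1, so multiplying already-high PDs by 1.5 or 2.5 can lower capital. The 0.9 EAD shock in the severe scenario also scales capital down directly. Both effects need review.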
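
The behaviour of the capital charge at high PD levels can be checked in isolation. The sketch below is a standalone re-implementation of the arithmetic in `_calculate_rwa` per unit of exposure; the function name `irb_capital` and the sample PD grid are assumptions made for illustration, and the parameters mirror the defaults of `IRBCreditModel`.

```python
import numpy as np
from scipy.stats import norm

def irb_capital(pd_, lgd=0.45, maturity=2.5, confidence=0.999):
    """Capital charge per unit of exposure, mirroring _calculate_rwa (incl. the 1.06 factor)."""
    pd_ = np.clip(np.asarray(pd_, dtype=float), 0.0001, 0.9999)
    weight = (1 - np.exp(-50 * pd_)) / (1 - np.exp(-50))
    r = 0.12 * weight + 0.24 * (1 - weight)               # asset correlation
    b = (0.11852 - 0.05478 * np.log(pd_)) ** 2
    maturity_adj = (1 + (maturity - 2.5) * b) / (1 - 1.5 * b)
    conditional_pd = norm.cdf(
        (norm.ppf(pd_) + np.sqrt(r) * norm.ppf(confidence)) / np.sqrt(1 - r)
    )
    # Unexpected loss only: conditional loss minus expected loss
    return lgd * (conditional_pd - pd_) * 1.06 * maturity_adj

base_pds = np.array([0.01, 0.05, 0.20, 0.50, 0.80])
shocked_pds = np.clip(base_pds * 1.5, 0, 1)   # same PD shock as the "Mild Recession" scenario

for p, k0, k1 in zip(base_pds, irb_capital(base_pds), irb_capital(shocked_pds)):
    print(f"PD {p:>5.0%}: K = {k0:.4f} -> {k1:.4f} after a 1.5x PD shock")
```

For low PDs the shock raises the charge, while for PDs around 50% and above it lowers it, because expected loss grows faster than the conditional loss. A portfolio with an average PD near 49% therefore sits largely on the declining side of the curve.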

## Recommendations for Improvement

- Data Validation: Verify default definitions and ensure realistic risk distributions.
- Stress Test Debugging: Fix capital calculation logic to ensure proper sensitivity to PD/LGD shocks.
- Model Calibration: Test PD predictions against actual default rates for better reliability.
- Regulatory Reporting: Enhance documentation for compliance audits.
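
The calibration recommendation can be sketched with a held-out split. The example below uses synthetic data from `make_classification` as a stand-in for the loan file, so its numbers say nothing about the German credit portfolio; the hyperparameters are copied from `train_pd_model`, and the 70/30 split is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan data (roughly 30% positives); not the real file
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           weights=[0.7, 0.3], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

model = GradientBoostingClassifier(n_estimators=150, max_depth=3,
                                   min_samples_leaf=50, subsample=0.8,
                                   random_state=42)
model.fit(X_train, y_train)

pd_train = model.predict_proba(X_train)[:, 1]
pd_test = model.predict_proba(X_test)[:, 1]

# Discrimination: compare in-sample and out-of-sample AUC
print(f"In-sample AUC:     {roc_auc_score(y_train, pd_train):.3f}")
print(f"Out-of-sample AUC: {roc_auc_score(y_test, pd_test):.3f}")

# Calibration: mean predicted PD should be close to the observed default rate
print(f"Mean predicted PD (test): {pd_test.mean():.2%}")
print(f"Observed default rate:    {y_test.mean():.2%}")
```

This sketch fits without sample weights. Balanced sample weights, as used in `train_pd_model`, tend to pull predicted probabilities toward 50%, so a model trained that way would generally need recalibration before its PDs feed a capital formula.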