# Seminar QF: Credit Risk Analysis Pipeline

This notebook implements an end-to-end credit risk analysis pipeline that compares several asset-volatility models:
1.  **Merton Model**: Estimates asset values and asset volatility from equity data.
2.  **GARCH(1,1)**: Single-regime volatility model.
3.  **Regime-Switching (Hamilton Filter)**: Two-state volatility model with Markov regime changes.
4.  **MS-GARCH**: Optimized Markov-Switching GARCH model.
5.  **Probability of Default**: Merton-formula PDs computed with each model's volatility estimates.
6.  **Monte Carlo Simulation**: Forecasts asset values one year ahead.
7.  **CDS Spread Calculation**: Estimates credit default swap spreads from the simulated default probabilities.

### 0. Setup
Add the project root to `sys.path` so the `src` package can be imported, then import the configuration and the pipeline modules.

```python
# Setup
import sys
import os
from pathlib import Path
import pandas as pd
import numpy as np

# Add the project root to sys.path so that the `src` package can be imported.
# Assumes this notebook lives in notebooks/, so the project root is its parent directory.
project_root = Path.cwd().parent
if str(project_root) not in sys.path:
    sys.path.append(str(project_root))

# Import configuration and pipeline modules
try:
    from src.utils import config
    from src.data.data_processing import load_and_preprocess_data, run_merton_estimation, load_interest_rates
    from src.models.garch_model import run_garch_estimation
    from src.models.regime_switching import run_regime_switching_estimation
    from src.models.ms_garch_optimized import run_ms_garch_estimation_optimized
    from src.models.probability_of_default import run_pd_pipeline, calculate_merton_pd_normal
    from src.analysis.result_summary import generate_results_summary
    from src.analysis.monte_carlo_garch import monte_carlo_garch_1year
    from src.analysis.volatility_diagnostics import run_volatility_diagnostics, filter_problematic_firms
    from src.analysis.monte_carlo_regime_switching import monte_carlo_regime_switching_1year
    from src.analysis.monte_carlo_ms_garch import monte_carlo_ms_garch_1year
    from src.analysis.cds_spread_calculator import CDSSpreadCalculator

    print("Imports successful.")
    print(f"Data Directory: {config.DATA_DIR}")
except ImportError as e:
    print(f"Import Error: {e}")
    print(f"sys.path: {sys.path}")
    raise  # every later cell depends on these imports, so stop here
```

```text
Imports successful.
Data Directory: C:\Users\Chase\Downloads\Seminar QF\Seminar QF\data
```


### 1. Cache Cleanup
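Delete cached intermediate files (the Merton results cache and the Monte Carlo GARCH cache) so that these steps are recomputed from scratch.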

```python
# Cache Cleanup
print("Cleaning up cache files...")
cache_dir = config.INTERMEDIATES_DIR
cache_files = [
    'merton_results_cache.pkl',
    'mc_garch_cache.csv'
]

for cache_file in cache_files:
    cache_path = cache_dir / cache_file
    if cache_path.exists():
        try:
            os.remove(cache_path)
            print(f"✓ Deleted: {cache_path}")
        except Exception as e:
            print(f"⚠ Could not delete {cache_path}: {e}")
    else:
        print(f"  (No cached file: {cache_path})")

print("Cache cleanup complete.")
```

```text
Cleaning up cache files...
✓ Deleted: C:\Users\Chase\Downloads\Seminar QF\Seminar QF\data\intermediates\merton_results_cache.pkl
  (No cached file: C:\Users\Chase\Downloads\Seminar QF\Seminar QF\data\intermediates\mc_garch_cache.csv)
Cache cleanup complete.
```


### 2. Data Loading & Merton Model Estimation
*   **Load Interest Rates**: From ECB data.
*   **Load Equity and Liability Data**: Equity from the Excel inputs plus total liabilities; flagged companies are removed.
*   **Run Merton Model**: Solves for asset value ($V_t$) and asset volatility ($\sigma_A$) with the iterative approach.
*   **Output**: `merged_data_with_merton.csv` and `daily_asset_returns.csv`.

```python
# Load Interest Rates
interest_rates_df = load_interest_rates()
print(f"Loaded {len(interest_rates_df)} months of interest rate data")

# Load Equity/Liability Data
df = load_and_preprocess_data()

# Run Merton Model
df_merged, daily_returns_df = run_merton_estimation(df, interest_rates_df)

# Save Results
df_merged.to_csv(config.OUTPUT_DIR / "merged_data_with_merton.csv", index=False)
daily_returns_df.to_csv(config.OUTPUT_DIR / "daily_asset_returns.csv", index=False)

print(f"Saved to {config.OUTPUT_DIR}")
```

```text
Loaded 384 months of interest rate data
Loading equity data...
Removed 16 flagged companies (gvkeys: [101248, 25466, 203053, 245663, 19349, 243774, 17828, 333645, 101305, 61214, 15181, 14140, 100312, 101276, 100737, 214881])
Remaining firms: 34

Loading liability data...
Loaded liability data

MERTON MODEL ESTIMATION (Vectorized + Parallelized - EXACT ndtr)

Processing 34 firms with -1 parallel jobs...

Starting parallel Merton estimation...

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.
[Parallel(n_jobs=-1)]: Done   1 tasks      | elapsed:   16.0s
[Parallel(n_jobs=-1)]: Done   8 tasks      | elapsed:   18.6s
[Parallel(n_jobs=-1)]: Done  15 out of  34 | elapsed:   29.8s remaining:   37.8s
[Parallel(n_jobs=-1)]: Done  19 out of  34 | elapsed:   31.7s remaining:   25.0s
[Parallel(n_jobs=-1)]: Done  23 out of  34 | elapsed:   34.4s remaining:   16.4s
[Parallel(n_jobs=-1)]: Done  27 out of  34 | elapsed:   41.3s remaining:   10.6s
[Parallel(n_jobs=-1)]: Done  31 out of  34 | elapsed:   42.5s remaining:    4.0s
[Parallel(n_jobs=-1)]: Done  34 out of  34 | elapsed:   42.8s finished

✓ Parallel Merton complete in 0:00:43

Caching 125,648 Merton results...
✓ Cached to: C:\Users\Chase\Downloads\Seminar QF\Seminar QF\data\intermediates\merton_results_cache.pkl

Merton Estimation Complete
Total time: 0:00:43
Firms processed: 34
Daily results: 125,614

Saved to C:\Users\Chase\Downloads\Seminar QF\Seminar QF\data\output
```
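
The iterative approach is commonly implemented as a fixed-point scheme: guess $\sigma_A$, back out a daily asset-value path by inverting the Merton equity equation $E_t = V_t N(d_1) - D_t e^{-rT} N(d_2)$, re-estimate $\sigma_A$ from the implied asset log returns, and repeat until $\sigma_A$ stops changing. The sketch below is a minimal, stdlib-only illustration under assumed inputs (a one-year horizon $T$, a constant risk-free rate $r$, 252 trading days per year, synthetic data); the function names are hypothetical and are not the internals of `run_merton_estimation`.

```python
import math
import statistics

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def merton_equity(V, D, r, sigma, T=1.0):
    """Merton equity value: a European call on assets V with strike D and maturity T."""
    d1 = (math.log(V / D) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return V * norm_cdf(d1) - D * math.exp(-r * T) * norm_cdf(d2)

def invert_asset_value(E, D, r, sigma, T=1.0, iterations=100):
    """Solve merton_equity(V) = E by bisection; the root lies in [E, E + D*exp(-rT)]."""
    lo, hi = E, E + D * math.exp(-r * T)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if merton_equity(mid, D, r, sigma, T) < E:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def annualized_vol(values, days_per_year=252):
    """Annualized sample standard deviation of daily log changes."""
    log_changes = [math.log(b / a) for a, b in zip(values, values[1:])]
    return statistics.stdev(log_changes) * math.sqrt(days_per_year)

def merton_iterative(equity, debt, r, T=1.0, max_iter=100, tol=1e-6):
    """Fixed-point estimate of the asset path V_t and the asset volatility sigma_A."""
    sigma_e = annualized_vol(equity)
    # Starting guess: de-lever equity volatility by E / (E + D)
    sigma_a = sigma_e * equity[-1] / (equity[-1] + debt[-1])
    for _ in range(max_iter):
        assets = [invert_asset_value(E, D, r, sigma_a, T) for E, D in zip(equity, debt)]
        new_sigma = annualized_vol(assets)
        converged = abs(new_sigma - sigma_a) < tol
        sigma_a = new_sigma
        if converged:
            break
    # Back out the asset path once more so it is consistent with the final sigma_a
    assets = [invert_asset_value(E, D, r, sigma_a, T) for E, D in zip(equity, debt)]
    return assets, sigma_a

# Synthetic example: 60 days of equity values and a constant debt level
equity = [100.0 * math.exp(0.02 * math.sin(i / 3.0)) for i in range(60)]
debt = [80.0] * 60
assets, sigma_a = merton_iterative(equity, debt, r=0.02)
print(f"sigma_A = {sigma_a:.4f}, sigma_E = {annualized_vol(equity):.4f}")
```

Because equity is a levered claim on assets, the estimated $\sigma_A$ comes out below the equity volatility whenever debt is positive.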


### 3. GARCH(1,1) Estimation
Estimates a standard GARCH(1,1) model with Student-t innovations on the daily asset returns implied by the Merton model.
*   **Input**: `daily_returns_df` (the in-memory equivalent of `daily_asset_returns.csv`).
*   **Output**: `daily_asset_returns_with_garch.csv` (adds conditional volatility columns).

```python
final_daily_returns = run_garch_estimation(daily_returns_df)

# Save Results
final_daily_returns.to_csv(config.OUTPUT_DIR / "daily_asset_returns_with_garch.csv", index=False)
print("Saved 'daily_asset_returns_with_garch.csv'")
```

```text
Estimating GARCH(1,1) with t-distribution on DAILY Asset Returns...
Processing GARCH for 34 firms (Daily Data)...
Processed GARCH for 10 firms...
Processed GARCH for 20 firms...
Processed GARCH for 30 firms...
GARCH estimation complete.
Saved 'daily_asset_returns_with_garch.csv'
```
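
For reference, GARCH(1,1) models the conditional variance of the daily asset return $\varepsilon_t$ as $h_t = \omega + \alpha \varepsilon_{t-1}^2 + \beta h_{t-1}$, and estimation maximizes the Student-t log-likelihood over $(\omega, \alpha, \beta, \nu)$. The sketch below shows only the variance recursion and that objective for a unit-variance t distribution; it assumes zero-mean returns and starts $h_1$ at the unconditional variance $\omega/(1-\alpha-\beta)$, which may differ from what `run_garch_estimation` does. Function names and parameter values are illustrative.

```python
import math

def garch_variances(returns, omega, alpha, beta):
    """Conditional variances h_t of a zero-mean GARCH(1,1), started at the unconditional variance."""
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for a finite unconditional variance")
    h = omega / (1.0 - alpha - beta)
    variances = []
    for eps in returns:
        variances.append(h)                      # h_t uses information up to t-1
        h = omega + alpha * eps**2 + beta * h    # variance for the next day
    return variances

def student_t_loglik(returns, variances, nu):
    """Log-likelihood under a unit-variance Student-t scaled by sqrt(h_t); requires nu > 2."""
    const = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(math.pi * (nu - 2))
    total = 0.0
    for eps, h in zip(returns, variances):
        total += const - 0.5 * math.log(h) - (nu + 1) / 2 * math.log1p(eps**2 / ((nu - 2) * h))
    return total

# Example: three daily returns with illustrative parameters
returns = [0.01, -0.02, 0.0]
h = garch_variances(returns, omega=1e-6, alpha=0.1, beta=0.8)
print(h)
print(student_t_loglik(returns, h, nu=5.0))
```

As $\nu \to \infty$ the Student-t log-likelihood converges to the Gaussian one, which is a convenient sanity check on the implementation.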


### 4. Regime-Switching Model (Hamilton Filter)
Estimates a 2-state Markov-switching model (high- and low-volatility regimes) on the daily asset returns. Volatility is constant within each regime; there are no GARCH dynamics inside the states.
*   **Output**: `daily_asset_returns_with_regime.csv`.

```python
final_daily_returns_rs = run_regime_switching_estimation(daily_returns_df)

# Save Results
final_daily_returns_rs.to_csv(config.OUTPUT_DIR / "daily_asset_returns_with_regime.csv", index=False)
print("Saved 'daily_asset_returns_with_regime.csv'")
```

```text
Estimating Regime Switching Model (2-Regime Markov) on DAILY Returns...
(Hamilton Filter)

Processing Regime Switching for 34 firms...

  ✓ Firm 1/34: gvkey=14447
      Observations: 3762
      Regime 0: μ=0.499866, σ=0.008699
      Regime 1: μ=0.499871, σ=0.009390
      Transition: P(0→0)=0.316, P(1→1)=0.688

  ✓ Firm 2/34: gvkey=15532
      Observations: 3762
      Regime 0: μ=0.499996, σ=0.098713
      Regime 1: μ=0.499997, σ=0.097689
      Transition: P(0→0)=0.501, P(1→1)=0.548

  ✓ Firm 3/34: gvkey=15549
      Observations: 3762
      Regime 0: μ=0.500000, σ=0.177171
      Regime 1: μ=0.500000, σ=0.180112
      Transition: P(0→0)=0.564, P(1→1)=0.429

  ✓ Firm 4/34: gvkey=15617
      Observations: 3762
      Regime 0: μ=0.500000, σ=0.102694
      Regime 1: μ=0.500000, σ=0.091557
      Transition: P(0→0)=0.476, P(1→1)=0.535

  ✓ Firm 5/34: gvkey=15677
      Observations: 3762
      Regime 0: μ=0.499938, σ=0.005658
      Regime 1: μ=0.499940, σ=0.002316
      Transition: P(0→0)=0.179
```
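
The Hamilton filter behind this step is a predict/update recursion on the regime probabilities. The sketch below assumes Gaussian returns with a regime-specific mean $\mu_k$ and volatility $\sigma_k$, a 2×2 transition matrix with $p_{00}$ and $p_{11}$ on the diagonal, and a start at the chain's stationary distribution; these choices and the function names are illustrative rather than taken from `run_regime_switching_estimation`.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def hamilton_filter(returns, mu, sigma, p00, p11):
    """Two-state Hamilton filter: filtered P(s_t = k | r_1..r_t) and the log-likelihood."""
    P = [[p00, 1.0 - p00],          # P[i][j] = P(s_t = j | s_{t-1} = i)
         [1.0 - p11, p11]]
    pi0 = (1.0 - p11) / (2.0 - p00 - p11)   # stationary probability of regime 0
    prob = [pi0, 1.0 - pi0]
    filtered, loglik = [], 0.0
    for r in returns:
        # Predict: push yesterday's filtered probabilities through the transition matrix
        pred = [prob[0] * P[0][j] + prob[1] * P[1][j] for j in range(2)]
        # Update: weight each regime by the density of today's return (Bayes' rule)
        joint = [pred[k] * normal_pdf(r, mu[k], sigma[k]) for k in range(2)]
        total = joint[0] + joint[1]
        loglik += math.log(total)
        prob = [joint[0] / total, joint[1] / total]
        filtered.append(prob)
    return filtered, loglik

# Example: a calm stretch followed by one large shock
returns = [0.001] * 20 + [0.2]
filtered, ll = hamilton_filter(returns, mu=(0.0, 0.0), sigma=(0.01, 0.05), p00=0.95, p11=0.95)
print(f"P(high-vol regime) on the last day: {filtered[-1][1]:.4f}")
```

Maximizing `loglik` over $(\mu_k, \sigma_k, p_{00}, p_{11})$ gives the maximum-likelihood estimates; the stationary start requires $p_{00} + p_{11} < 2$.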

### 5. Optimized MS-GARCH Estimation
Estimates a full MS-GARCH model in which each regime follows its own GARCH(1,1) process.
*   **Optimizations**: GARCH(1,1) warm-start parameters, a Numba JIT-compiled Hamilton filter, cached intermediate results, and a tuned L-BFGS-B optimizer.
*   **Output**: `daily_asset_returns_with_msgarch.csv` and `ms_garch_parameters.csv`.

```python
print("Running MS-GARCH Estimation...")
final_daily_returns_msgarch = run_ms_garch_estimation_optimized(
    daily_returns_df,
    output_file=str(config.OUTPUT_DIR / "ms_garch_parameters.csv")
)

# Save Results
final_daily_returns_msgarch.to_csv(config.OUTPUT_DIR / "daily_asset_returns_with_msgarch.csv", index=False)
print("Saved 'daily_asset_returns_with_msgarch.csv'")
```

```text
Running MS-GARCH Estimation...

OPTIMIZED MS-GARCH ESTIMATION
Optimizations enabled:
  ✓ GARCH(1,1) warm start for initial parameters
  ✓ Numba JIT-compiled Hamilton filter
  ✓ Cached intermediate results
  ✓ L-BFGS-B optimizer with tuned settings

Processing 34 firms...

[1/34] Processing 14447
  Fitting OPTIMIZED MS-GARCH with t-distribution...
    → Getting GARCH(1,1) warm start parameters...

    → Warm start: omega=1.74e-06, alpha=0.043, beta=0.949, nu=4.4
    → Running L-BFGS-B optimization...
  MLE converged: True
  Log-likelihood: 11383.77
  Regime 0 (low vol): omega=0.000001, alpha=0.0412, beta=0.9575, nu=4.56
  Regime 1 (high vol): omega=0.000003, alpha=0.0550, beta=0.9102, nu=4.47
  Persistence: p00=0.9534, p11=0.9523
  ✓ Successfully estimated MS-GARCH for 14447

[2/34] Processing 15532
  Fitting OPTIMIZED MS-GARCH with t-distribution...
    → Getting GARCH(1,1) warm start parameters...
    → Warm start: omega=6.33e-05, alpha=0.092, beta=0.898, nu=30.0
    → Running L-BFGS-B optimization...
  MLE converged: True
  Log-likelihood: 12426.35
  Regime 0 (low vol): omega=0.000002, alpha=0.0612, beta=0.6165, nu=28.97
  Regime 1 (high vol): omega=0.000186, alpha=0.1396, beta=0.8508, nu=28.90
  Persistence: p00=0.9574, p11=0.9489
  ✓ Successfully estimated MS-GARCH for 15532

[3/34] Processing 15549
  Fitting OPTIMIZED MS-GARCH with t-distribution...
    → Getting GARCH(1,1
```
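
A fully general MS-GARCH likelihood is path-dependent, because each regime's variance would depend on the whole regime history. One common tractable variant, the "parallel" specification of Haas, Mittnik and Paolella (2004), lets each regime $k$ run its own recursion $h_{k,t} = \omega_k + \alpha_k \varepsilon_{t-1}^2 + \beta_k h_{k,t-1}$ on the observed returns and combines the regimes through a Hamilton filter. The sketch below implements that variant with unit-variance Student-t densities; whether `run_ms_garch_estimation_optimized` uses exactly this specification is an assumption, and the function names and parameters are hypothetical.

```python
import math

def std_t_logpdf(x, var, nu):
    """Log density of x under a unit-variance Student-t scaled to variance var (nu > 2)."""
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(math.pi * (nu - 2) * var)
            - (nu + 1) / 2 * math.log1p(x * x / ((nu - 2) * var)))

def ms_garch_filter(returns, params, p00, p11):
    """Parallel-variance MS-GARCH filter.

    params: [(omega, alpha, beta, nu) for regime 0, (omega, alpha, beta, nu) for regime 1]
    Returns filtered regime probabilities and the log-likelihood.
    """
    P = [[p00, 1.0 - p00], [1.0 - p11, p11]]
    pi0 = (1.0 - p11) / (2.0 - p00 - p11)
    prob = [pi0, 1.0 - pi0]
    # Each regime keeps its own variance, started at its unconditional level
    h = [omega / (1.0 - alpha - beta) for omega, alpha, beta, _ in params]
    filtered, loglik = [], 0.0
    for eps in returns:
        pred = [prob[0] * P[0][j] + prob[1] * P[1][j] for j in range(2)]
        logf = [std_t_logpdf(eps, h[k], params[k][3]) for k in range(2)]
        m = max(logf)  # log-sum-exp shift to avoid underflow
        joint = [pred[k] * math.exp(logf[k] - m) for k in range(2)]
        total = joint[0] + joint[1]
        loglik += m + math.log(total)
        prob = [joint[0] / total, joint[1] / total]
        filtered.append(prob)
        # Every regime's variance is updated with the same observed return
        h = [omega + alpha * eps**2 + beta * h_k
             for (omega, alpha, beta, _), h_k in zip(params, h)]
    return filtered, loglik

params = [(1e-6, 0.05, 0.90, 5.0),   # calm regime
          (2e-5, 0.10, 0.85, 5.0)]   # turbulent regime
returns = [0.001] * 30 + [0.15]
filtered, ll = ms_garch_filter(returns, params, p00=0.95, p11=0.95)
print(f"P(turbulent regime) on the last day: {filtered[-1][1]:.3f}")
```

In an estimation loop, `-ll` would be the objective handed to an optimizer such as L-BFGS-B, with a GARCH(1,1) fit supplying the starting values.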

### 6. Probability of Default (PD) Calculation
Calculates PD with the Merton model formula, substituting the volatility estimates from the GARCH, RS, and MS-GARCH models.
*   **Benchmark**: Also calculates the standard Merton PD under the normal distribution.
*   **Output**: `daily_pd_results.csv` and `daily_pd_results_merton_normal.csv`.

```python
pd_results = run_pd_pipeline(
    str(config.OUTPUT_DIR / 'daily_asset_returns_with_garch.csv'),
    str(config.OUTPUT_DIR / 'daily_asset_returns_with_regime.csv'),
    str(config.OUTPUT_DIR / 'daily_asset_returns_with_msgarch.csv')
)

pd_results.to_csv(config.OUTPUT_DIR / "daily_pd_results.csv", index=False)
print("Saved 'daily_pd_results.csv'")

# Merton PD (Benchmark)
merton_normal_pd = calculate_merton_pd_normal(str(config.OUTPUT_DIR / 'daily_asset_returns.csv'))
merton_normal_pd.to_csv(config.OUTPUT_DIR / "daily_pd_results_merton_normal.csv", index=False)
print("Saved 'daily_pd_results_merton_normal.csv'")
```

```text
PROBABILITY OF DEFAULT CALCULATION (Multi-Model, Daily Data)
Loading auxiliary data (liabilities and interest rates)...
  Loading liabilities...
    ✓ Loaded 741 liability records
  Loading interest rates...
    ✓ Loaded 384 months of interest rate data

Calculating PD for each model...

  Processing GARCH...
    ✓ GARCH: Calculated PD for 125,614 observations
       PD column name: 'pd_garch'

  Processing Regime Switching...
    ✗ Volatility column 'garch_volatility' not found in C:\Users\Chase\Downloads\Seminar QF\Seminar QF\data\output\daily_asset_returns_with_regime.csv.
       Available columns: ['gvkey', 'date', 'asset_return_daily', 'asset_value', 'asset_volatility', 'regime_state', 'regime_probability_0', 'regime_probability_1', 'fyear', 'liabilities_total', 'month_year', 'risk_free_rate']
    ⚠ Regime Switching model skipped

  Processing MS-GARCH...
    ✗ Volatility column 'msgarch_volatility' not found in C:\Users\Chase\Downloads\Seminar QF\Seminar QF\data\output\daily_ass
```
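
Under the Merton model the one-year PD follows from the distance to default. The sketch below uses the risk-neutral form $PD = N(-DD)$ with $DD = \frac{\ln(V/D) + (r - \tfrac{1}{2}\sigma_A^2)T}{\sigma_A\sqrt{T}}$ and assumes that a daily conditional volatility (for example from GARCH) is annualized by $\sqrt{252}$ before being substituted for $\sigma_A$; whether `run_pd_pipeline` uses the risk-neutral drift and this annualization is an assumption, and the function names are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def merton_pd(V, D, r, sigma_annual, T=1.0):
    """Risk-neutral Merton default probability N(-DD) over horizon T (in years)."""
    dd = (math.log(V / D) + (r - 0.5 * sigma_annual**2) * T) / (sigma_annual * math.sqrt(T))
    return norm_cdf(-dd)

def pd_from_daily_vol(V, D, r, sigma_daily, T=1.0, days_per_year=252):
    """Same PD, but from a daily volatility annualized with the square-root-of-time rule."""
    return merton_pd(V, D, r, sigma_daily * math.sqrt(days_per_year), T)

# Example: assets of 100, debt of 80, zero rate, 20% annual asset volatility
print(f"{merton_pd(100.0, 80.0, 0.0, 0.2):.4f}")
```

Swapping in GARCH, RS, or MS-GARCH volatilities only changes the `sigma` argument; a higher volatility lowers the distance to default and raises the PD.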

### 7. Monte Carlo Simulation (GARCH) & Diagnostics
Simulates 1,000 one-year (252 trading-day) asset-value paths per firm and start date using GARCH volatility dynamics.
It then runs **Volatility Diagnostics** to flag firms with explosive volatility that could distort downstream results.
*   **Output**: `daily_monte_carlo_garch_results.csv` and diagnostic files in `data/diagnostics/`.

```python
print("Run Monte Carlo GARCH (1 year)...")
mc_results = monte_carlo_garch_1year(
    str(config.OUTPUT_DIR / 'daily_asset_returns_with_garch.csv'),
    gvkey_selected=None,
    num_simulations=1000,
    num_days=252
)
mc_results.to_csv(config.OUTPUT_DIR / "daily_monte_carlo_garch_results.csv", index=False)

# Diagnostics
print("Running Volatility Diagnostics...")
diagnostics_results = run_volatility_diagnostics(
    garch_file=str(config.OUTPUT_DIR / 'daily_asset_returns_with_garch.csv'),
    mc_garch_file=str(config.OUTPUT_DIR / 'daily_monte_carlo_garch_results.csv'),
    output_dir=str(config.DIAGNOSTICS_DIR)
)

PROBLEMATIC_FIRMS = diagnostics_results['problematic_firms']
CLEAN_FIRMS = diagnostics_results['clean_firms']
print(f"Problematic Firms: {len(PROBLEMATIC_FIRMS)}")
```

```text
Run Monte Carlo GARCH (1 year)...

MONTE CARLO GARCH 1-YEAR CUMULATIVE VOLATILITY FORECAST

✓ Loaded 125,614 total observations
✓ Unique dates: 4015
✓ Unique firms: 34
✓ Date range: 2010-08-02 to 2025-12-19
✓ Expected output rows: ~136,510

Progress: 1/4015 (2010-08-02)
```
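
The core of a one-year GARCH Monte Carlo run is a loop that draws standardized shocks, updates the conditional variance, and accumulates log asset returns and variance along each path. The sketch below assumes Student-t innovations (matching the GARCH step), a constant daily drift `mu`, and a default event defined as the terminal asset value ending below the debt level `D`; the function names and that default definition are illustrative assumptions, not the internals of `monte_carlo_garch_1year`.

```python
import math
import random

def draw_std_t(rng, nu):
    """Unit-variance Student-t draw: N(0,1) / sqrt(chi2_nu / nu), rescaled by sqrt((nu-2)/nu)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)   # Gamma(nu/2, scale 2) is chi-squared with nu dof
    return z / math.sqrt(chi2 / nu) * math.sqrt((nu - 2.0) / nu)

def mc_garch_paths(V0, h0, omega, alpha, beta, nu, mu=0.0,
                   num_simulations=1000, num_days=252, seed=0):
    """Simulate GARCH(1,1) log asset returns; return terminal values and integrated variance per path."""
    rng = random.Random(seed)
    terminal, integrated_var = [], []
    for _ in range(num_simulations):
        log_v, h, total_var = math.log(V0), h0, 0.0
        for _ in range(num_days):
            eps = math.sqrt(h) * draw_std_t(rng, nu)
            log_v += mu + eps
            total_var += h
            h = omega + alpha * eps**2 + beta * h   # next day's conditional variance
        terminal.append(math.exp(log_v))
        integrated_var.append(total_var)
    return terminal, integrated_var

def default_rate(terminal, D):
    """Share of paths whose terminal asset value ends below the debt level D."""
    return sum(v < D for v in terminal) / len(terminal)

terminal, integrated_var = mc_garch_paths(V0=180.0, h0=1e-4, omega=2e-6, alpha=0.05,
                                          beta=0.93, nu=5.0, num_simulations=200)
print(f"Simulated 1-year default rate: {default_rate(terminal, D=80.0):.3f}")
print(f"Mean integrated variance: {sum(integrated_var) / len(integrated_var):.4f}")
```

Seeding a dedicated `random.Random` instance keeps each run reproducible without touching the global random state.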


### 8. Monte Carlo Simulation (Regime Switching & MS-GARCH)
Runs 1-year Monte Carlo simulations (1,000 paths, 252 days) for the Regime-Switching and MS-GARCH models.
*   **Regime Switching**: Simulates regime changes and draws returns from state-specific volatilities (no GARCH dynamics).
*   **MS-GARCH**: Simulates regime changes and GARCH dynamics within each state.

```python
print("Run MC Regime Switching...")
mc_rs_results = monte_carlo_regime_switching_1year(
    garch_file=str(config.OUTPUT_DIR / 'daily_asset_returns_with_garch.csv'),
    regime_params_file=str(config.OUTPUT_DIR / 'regime_switching_parameters.csv'),
    gvkey_selected=None,
    num_simulations=1000,
    num_days=252
)
mc_rs_results.to_csv(config.OUTPUT_DIR / "daily_monte_carlo_regime_switching_results.csv", index=False)

print("Run MC MS-GARCH...")
mc_msgarch_results = monte_carlo_ms_garch_1year(
    daily_returns_file=str(config.OUTPUT_DIR / 'daily_asset_returns_with_msgarch.csv'),
    ms_garch_params_file=str(config.OUTPUT_DIR / 'ms_garch_parameters.csv'),
    gvkey_selected=None,
    num_simulations=1000,
    num_days=252
)
mc_msgarch_results.to_csv(config.OUTPUT_DIR / "daily_monte_carlo_ms_garch_results.csv", index=False)
```
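
For the regime-switching run, each simulated path first draws a Markov chain of regimes and then a return from the active regime. The minimal sketch below assumes Gaussian log returns with a regime-specific daily mean and volatility and a known starting regime; for MS-GARCH, the constant `sigma[s]` would be replaced by a per-regime GARCH variance updated every day. The function names are hypothetical, and `monte_carlo_regime_switching_1year` may be organized differently.

```python
import math
import random

def simulate_regimes(p00, p11, s0, num_days, rng):
    """Draw a 2-state Markov chain of length num_days, transitioning from today's regime s0."""
    stay = (p00, p11)
    states, s = [], s0
    for _ in range(num_days):
        if rng.random() >= stay[s]:   # leave the current regime with probability 1 - p_ss
            s = 1 - s
        states.append(s)
    return states

def mc_regime_switching(V0, mu, sigma, p00, p11, s0,
                        num_simulations=1000, num_days=252, seed=0):
    """Terminal asset values from regime-switching log returns with state-specific mu and sigma."""
    rng = random.Random(seed)
    terminal = []
    for _ in range(num_simulations):
        log_v = math.log(V0)
        for s in simulate_regimes(p00, p11, s0, num_days, rng):
            log_v += rng.gauss(mu[s], sigma[s])
        terminal.append(math.exp(log_v))
    return terminal

terminal = mc_regime_switching(180.0, mu=(0.0, 0.0), sigma=(0.01, 0.03),
                               p00=0.98, p11=0.95, s0=0, num_simulations=500)
print(f"Share of paths ending below a debt level of 80: {sum(v < 80.0 for v in terminal) / len(terminal):.3f}")
```

Over a long horizon the chain spends a fraction $(1-p_{11})/(2-p_{00}-p_{11})$ of days in regime 0, which is a quick check on the simulated regime path.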

### 9. CDS Spread Calculation
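Calculates model-implied CDS spreads for 1-, 3-, and 5-year horizons.
*   **Input**: Monte Carlo results from all three models, plus `daily_asset_returns.csv` and `merged_data_with_merton.csv`.
*   **Output**: One CSV of CDS spreads per model (`cds_spreads_*_mc_all_firms.csv`) covering all firms; the problematic firms flagged in Section 7 (`PROBLEMATIC_FIRMS`) can be excluded from these results afterwards.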

```python
cds_calc = CDSSpreadCalculator(maturity_horizons=[1, 3, 5])

# GARCH
print("CDS Spreads: GARCH")
df_cds_spreads_garch_all = cds_calc.calculate_cds_spreads_from_mc_garch(
    mc_garch_file=str(config.OUTPUT_DIR / 'daily_monte_carlo_garch_results.csv'),
    daily_returns_file=str(config.OUTPUT_DIR / 'daily_asset_returns.csv'),
    merton_file=str(config.OUTPUT_DIR / 'merged_data_with_merton.csv'),
    output_file=str(config.OUTPUT_DIR / 'cds_spreads_garch_mc_all_firms.csv')
)

# Regime Switching
print("CDS Spreads: RS")
df_cds_spreads_rs_all = cds_calc.calculate_cds_spreads_from_mc_garch(
    mc_garch_file=str(config.OUTPUT_DIR / 'daily_monte_carlo_regime_switching_results.csv'),
    daily_returns_file=str(config.OUTPUT_DIR / 'daily_asset_returns.csv'),
    merton_file=str(config.OUTPUT_DIR / 'merged_data_with_merton.csv'),
    output_file=str(config.OUTPUT_DIR / 'cds_spreads_regime_switching_mc_all_firms.csv')
)

# MS-GARCH
print("CDS Spreads: MS-GARCH")
df_cds_spreads_msgarch_all = cds_calc.calculate_cds_spreads_from_mc_garch(
    mc_garch_file=str(config.OUTPUT_DIR / 'daily_monte_carlo_ms_garch_results.csv'),
    daily_returns_file=str(config.OUTPUT_DIR / 'daily_asset_returns.csv'),
    merton_file=str(config.OUTPUT_DIR / 'merged_data_with_merton.csv'),
    output_file=str(config.OUTPUT_DIR / 'cds_spreads_ms_garch_mc_all_firms.csv'),
    volatility_column='mc_msgarch_integrated_variance'
)
```
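
A common way to turn a simulated default probability into a par CDS spread is to equate the expected protection leg with the expected premium leg. The sketch below assumes a flat hazard rate $\lambda = -\ln(1 - PD_T)/T$ backed out of a cumulative $T$-year PD, quarterly premium payments without accrual on default, a flat risk-free rate, and 40% recovery; these conventions and the function name are illustrative assumptions rather than the documented behavior of `CDSSpreadCalculator`.

```python
import math

def cds_par_spread(pd_cumulative, T, r=0.02, recovery=0.4, payments_per_year=4):
    """Par CDS spread (decimal) from a cumulative T-year default probability, flat hazard rate."""
    if not 0.0 < pd_cumulative < 1.0:
        raise ValueError("pd_cumulative must be in (0, 1)")
    hazard = -math.log(1.0 - pd_cumulative) / T
    dt = 1.0 / payments_per_year
    protection, premium_annuity = 0.0, 0.0
    prev_survival = 1.0
    for i in range(1, int(round(T * payments_per_year)) + 1):
        t = i * dt
        survival = math.exp(-hazard * t)
        discount = math.exp(-r * t)
        # Protection leg: loss given default paid at the end of the period in which default occurs
        protection += (1.0 - recovery) * discount * (prev_survival - survival)
        # Premium leg per unit spread: paid at t only if the name has survived
        premium_annuity += dt * discount * survival
        prev_survival = survival
    return protection / premium_annuity

pd_5y = 1.0 - math.exp(-0.02 * 5)   # cumulative PD implied by a 2% hazard rate
print(f"5y spread: {cds_par_spread(pd_5y, T=5) * 1e4:.1f} bp")
```

Under a flat hazard rate the spread stays close to the "credit triangle" approximation $\lambda(1-R)$, here 120 bp, and does not depend on the maturity; a term structure of spreads only appears once the 1-, 3-, and 5-year PDs imply different hazard rates.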