# 01 — Through-the-Cycle (TtC) PD demo

This notebook walks through a small pipeline that turns Point-in-Time (PiT) observed defaults into a Through-the-Cycle (TtC) probability of default (PD) estimate. It is intentionally small and educational, suitable for a portfolio README or an interview demo.

Workflow:
1. Load the sample PiT dataset
2. Simulate an origination month and a month-dependent default label
3. Compute vintage (origination-period) default rates (observed PiT)
4. Fit a logistic PiT model with time (month) dummies
5. Produce TtC PDs by neutralizing the time (macro / cycle) effect

Note: this is a toy pipeline. A production version would add robust feature engineering, sample weighting, proper train/validation splits, PSI and K-S monitoring, calibration, and documented assumptions.

In [None]:
# --- Imports
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, brier_score_loss
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_theme(style='whitegrid')  # set_theme is the current name; sns.set is an older alias

# For reproducibility
RND = 42
np.random.seed(RND)

In [None]:
# --- Load sample data (raw CSV from this repo)
url = 'https://raw.githubusercontent.com/deveshusg/credit-risk-portfolio/main/data/sample_pit.csv'
df = pd.read_csv(url)
df.head()

## Data assumptions (sample)
- `id`: account/customer id
- `feature_a`, `feature_b`: simple numeric features
- `label`: binary default indicator (1 = default within observation window)

We'll add a synthetic `orig_month` column and re-simulate `label` with a month-dependent shift, so the data carries a time / cycle effect and we can compare PiT with TtC.

In [None]:
# Create a synthetic origination_month to illustrate time effects
n = len(df)
# simulate origination months across 12 months
months = pd.date_range(start='2019-01-01', periods=12, freq='MS').strftime('%Y-%m')
df['orig_month'] = np.random.choice(months, size=n, replace=True)

# Introduce a simple time-cycle signal: an additive shift on the probability scale,
# ranging from -0.04 to +0.08 over the 12 months (so PiT varies by month)
month_effect = {m: (0.02 + 0.06*np.sin(i/12*2*np.pi)) for i,m in enumerate(months)}

# convert label into probabilistic outcome influenced by features + month effect
base_logit = -3 + 0.05 * df['feature_a'] + 1.5 * df['feature_b']
p = 1/(1+np.exp(-base_logit))
p = p + df['orig_month'].map(month_effect)
p = np.clip(p, 0.001, 0.99)

# Draw the simulated label; keep the original label for reference in label_orig
df['label_sim'] = np.random.binomial(1, p)
df['label_orig'] = df['label'].fillna(0).astype(int)
df['label'] = df['label_sim']  # the rest of the notebook uses the simulated label

print('Records:', len(df))
df[['id','feature_a','feature_b','orig_month','label']].head()

## Observed vintage (PiT) default rates — by origination month
This is the classic empirical PiT vintage PD: observed default rate for each origination period.
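
With only a few hundred accounts per month, observed vintage rates are noisy. The sketch below computes the per-period default rate together with a Wilson score interval, which makes that sampling noise visible. The helper name `vintage_rates_with_ci`, the toy frame, and the choice of the Wilson interval are assumptions added for illustration; they are not part of the notebook's pipeline.

```python
import numpy as np
import pandas as pd

def vintage_rates_with_ci(frame, period_col, label_col, z=1.96):
    """Observed default rate per period with a Wilson score interval (hypothetical helper)."""
    g = frame.groupby(period_col)[label_col].agg(n="size", defaults="sum")
    rate = g["defaults"] / g["n"]
    # Wilson interval: centre is shrunk toward 0.5, width depends on n
    denom = 1 + z**2 / g["n"]
    centre = (rate + z**2 / (2 * g["n"])) / denom
    half = z * np.sqrt(rate * (1 - rate) / g["n"] + z**2 / (4 * g["n"] ** 2)) / denom
    return g.assign(pit_pd=rate, ci_low=centre - half, ci_high=centre + half).reset_index()

# Tiny made-up example: two origination months, four accounts each
toy = pd.DataFrame({
    "orig_month": ["2019-01"] * 4 + ["2019-02"] * 4,
    "label": [1, 0, 0, 0, 0, 0, 1, 1],
})
rates = vintage_rates_with_ci(toy, "orig_month", "label")
print(rates)
```

On the notebook's data the same helper could be called as `vintage_rates_with_ci(df, 'orig_month', 'label')`.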

In [None]:
vintage = (df
           .groupby('orig_month')
           .agg(n=('id','size'), defaults=('label','sum'))
           .assign(pit_pd=lambda d: d['defaults']/d['n'])
           .reset_index()
           .sort_values('orig_month'))

vintage.plot(x='orig_month', y='pit_pd', kind='bar', figsize=(10,4), title='Observed PiT vintage PD by origination month')


## Fit a simple PiT logistic model
We include the two features and **month dummies**. The month dummies capture the Point-in-Time cycle effect. To get a TtC PD we then neutralize them, either by setting them all to zero or by setting them to their average.
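
As a sketch of the model being fitted (the symbols are introduced here for illustration and do not appear in the code), write $a_i$ and $b_i$ for `feature_a` and `feature_b` of account $i$, and $\gamma_t$ for the coefficient of the dummy for origination month $t$:

$$
\operatorname{logit}\big(\mathrm{PD}^{\mathrm{PiT}}_{i,t}\big) = \beta_0 + \beta_a\, a_i + \beta_b\, b_i + \gamma_t
$$

Neutralizing the month effect replaces $\gamma_t$ with a single cycle-neutral value $\bar{\gamma}$:

$$
\operatorname{logit}\big(\mathrm{PD}^{\mathrm{TtC}}_{i}\big) = \beta_0 + \beta_a\, a_i + \beta_b\, b_i + \bar{\gamma}
$$

Setting every dummy to zero corresponds to $\bar{\gamma} = 0$. Setting each dummy to its mean frequency $w_t$ corresponds to $\bar{\gamma} = \sum_t w_t\, \gamma_t$.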

In [None]:
# Prepare model matrix
X = df[['feature_a','feature_b']].copy()
# dtype=float keeps the dummies numeric (recent pandas versions default to bool)
month_dummies = pd.get_dummies(df['orig_month'], prefix='m', dtype=float)
X = pd.concat([X, month_dummies], axis=1)
y = df['label']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=RND, stratify=y)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

y_pred_proba = model.predict_proba(X_test)[:,1]
print('AUC (PiT model):', round(roc_auc_score(y_test, y_pred_proba),3))
print('Brier score:', round(brier_score_loss(y_test, y_pred_proba),3))

## Generate PiT PD predictions and then convert to TtC
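- **PiT PDs**: model output with the month dummies left as they are.
- **TtC PDs**: model output with the month effect neutralized, so the prediction no longer depends on where in the cycle the loan was originated. This notebook does that by setting every month dummy to 0, which removes the month term from the logit. An alternative is to set each dummy to its mean frequency, which replaces the month term with a frequency-weighted average of the month coefficients. The two are not identical in general.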
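
A third way to neutralize the cycle is to predict each account under every period in turn and average the resulting PDs. The sketch below compares that with zeroing the dummies on a small synthetic dataset. Everything in it (the toy data, the names `ttc_zeroed` and `ttc_period_averaged`, the four-period cycle) is hypothetical and only meant to show the mechanics.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical toy data: one feature plus a four-period cycle on the logit scale
n = 2000
demo = pd.DataFrame({
    "x": rng.normal(size=n),
    "period": rng.choice(["p1", "p2", "p3", "p4"], size=n),
})
cycle = {"p1": -0.8, "p2": -0.2, "p3": 0.4, "p4": 1.0}
logit = -2.0 + 0.9 * demo["x"] + demo["period"].map(cycle)
demo["y"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_demo = pd.concat(
    [demo[["x"]], pd.get_dummies(demo["period"], prefix="m", dtype=float)], axis=1
)
clf = LogisticRegression(max_iter=1000).fit(X_demo, demo["y"])
period_cols = [c for c in X_demo.columns if c.startswith("m_")]

def ttc_zeroed(model, X, cols):
    """Predict with every period dummy set to 0."""
    X0 = X.copy()
    X0[cols] = 0.0
    return model.predict_proba(X0)[:, 1]

def ttc_period_averaged(model, X, cols):
    """Predict under each period in turn and average the PDs (equal weights)."""
    preds = []
    for c in cols:
        Xc = X.copy()
        Xc[cols] = 0.0
        Xc[c] = 1.0
        preds.append(model.predict_proba(Xc)[:, 1])
    return np.mean(preds, axis=0)

pd_zero = ttc_zeroed(clf, X_demo, period_cols)
pd_avg = ttc_period_averaged(clf, X_demo, period_cols)
print("mean PD, dummies zeroed:  ", round(pd_zero.mean(), 4))
print("mean PD, period-averaged: ", round(pd_avg.mean(), 4))
print("observed default rate:    ", round(demo["y"].mean(), 4))
```

Averaging happens on the probability scale here, whereas zeroing acts on the logit scale, so the two series need not match. Which one is appropriate depends on how the TtC PD will be calibrated afterwards.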

In [None]:
# Full X (all rows, train and test)
X_full = X.copy()
pit_proba = model.predict_proba(X_full)[:,1]

# TtC approach: set every month dummy to 0 so the month term drops out of the logit
month_cols = [c for c in X_full.columns if c.startswith('m_')]
X_ttc = X_full.copy()
X_ttc[month_cols] = 0.0
ttc_proba = model.predict_proba(X_ttc)[:,1]

# Attach both predictions to a copy of the data
df_out = df.copy()
df_out['pit_pred'] = pit_proba
df_out['ttc_pred'] = ttc_proba

# Summarize by vintage
summary = (df_out
           .groupby('orig_month')
           .agg(n=('id','size'), observed_pd=('label','mean'), pit_pd_pred=('pit_pred','mean'), ttc_pd_pred=('ttc_pred','mean'))
           .reset_index()
           .sort_values('orig_month'))
summary

In [None]:
# Plot observed PiT, model PiT, and derived TtC
plt.figure(figsize=(11,5))
plt.plot(summary['orig_month'], summary['observed_pd'], marker='o', label='Observed PiT')
plt.plot(summary['orig_month'], summary['pit_pd_pred'], marker='o', label='Model PiT (avg predicted)')
plt.plot(summary['orig_month'], summary['ttc_pd_pred'], marker='o', label='Derived TtC (month-neutral)')
plt.xticks(rotation=45)
plt.ylabel('PD')
plt.title('Observed PiT vs Model PiT vs Derived TtC by origination month')
plt.legend()
plt.show()

## Notes, next steps and caveats
- The TtC method above neutralizes the month dummies, which is simple and interpretable. In practice you'd:
  - Use macroeconomic variables (GDP, unemployment) rather than pure month dummies.
  - Get TtC PDs by predicting with the macro variables set to their long-run averages. Stress or scenario values give stressed or scenario-conditional PDs, not TtC PDs.
  - Calibrate outputs to observed long-run default rates and apply PD floor/cap rules.
- Validate with PSI and K-S across time, backtest on holdout vintages, and write explanation notes for model governance.
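
The macro-variable route can be sketched as follows: fit the PiT model on a macro series instead of month dummies, then predict with the macro variable held at its long-run average. The unemployment series, the column names, and the coefficients used to simulate defaults are all synthetic assumptions, and the "long-run average" is simply the mean of the 24 simulated months.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Hypothetical monthly unemployment rate (synthetic, one full cycle over 24 months)
months = pd.date_range("2019-01-01", periods=24, freq="MS").strftime("%Y-%m")
unemp = pd.Series(5.0 + 1.5 * np.sin(np.arange(24) / 24 * 2 * np.pi), index=months)

# Synthetic loans: one borrower feature plus the macro value at origination
n = 5000
loans = pd.DataFrame({
    "score": rng.normal(size=n),
    "orig_month": rng.choice(months, size=n),
})
loans["unemp"] = loans["orig_month"].map(unemp)
true_logit = -3.0 + 0.8 * loans["score"] + 0.6 * (loans["unemp"] - 5.0)
loans["default"] = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

# PiT model: borrower feature + macro variable
features = ["score", "unemp"]
pit_model = LogisticRegression(max_iter=1000).fit(loans[features], loans["default"])
loans["pit_pd"] = pit_model.predict_proba(loans[features])[:, 1]

# TtC: same model, macro variable fixed at its (assumed) long-run average
X_neutral = loans[features].copy()
X_neutral["unemp"] = unemp.mean()
loans["ttc_pd"] = pit_model.predict_proba(X_neutral)[:, 1]

by_month = loans.groupby("orig_month")[["pit_pd", "ttc_pd"]].mean()
print(by_month.head())
```

In this sketch the monthly average of `pit_pd` follows the unemployment cycle, while `ttc_pd` moves only with the mix of borrower scores in each month.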
