# Notebook 02 — Exploratory Data Analysis

## Credit Risk and Dollarization in Cambodia: Dual-Currency Spread Analysis

This notebook performs comprehensive exploratory analysis of the USD and KHR interest rate spreads (term loan rate − term deposit rate) computed in Notebook 01.

**Contents:**
1. Descriptive Statistics (full sample & sub-period)
2. Normality Tests (Shapiro-Wilk, Jarque-Bera)
3. Stationarity Tests (Augmented Dickey-Fuller)
4. Autocorrelation Analysis (ACF/PACF)
5. Correlation Analysis
6. Publication-Quality Visualizations (Figures 1–5)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.patches import Rectangle
from scipy import stats
from statsmodels.tsa.stattools import adfuller, acf, pacf
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import warnings
warnings.filterwarnings('ignore')

# Publication-quality plot settings
plt.rcParams.update({
    'figure.figsize': (12, 6),
    'figure.dpi': 150,
    'savefig.dpi': 300,
    'font.size': 11,
    'axes.titlesize': 14,
    'axes.labelsize': 12,
    'legend.fontsize': 10,
    'xtick.labelsize': 10,
    'ytick.labelsize': 10,
    'font.family': 'serif'
})

print('Libraries loaded successfully.')

In [None]:
# ─── Load Data ───────────────────────────────────────────────────────────────
usd = pd.read_csv('../data/processed/spreads_usd_new_amount.csv', parse_dates=['date'], index_col='date')
khr = pd.read_csv('../data/processed/spreads_khr_new_amount.csv', parse_dates=['date'], index_col='date')
rates = pd.read_csv('../data/processed/all_rates_wide_new_amount.csv', parse_dates=['Date'], index_col='Date')

# Combine spreads into a single DataFrame
spreads = pd.DataFrame({
    'USD_Spread': usd['spread'],
    'KHR_Spread': khr['spread']
})

print(f'Sample period: {spreads.index[0].strftime("%b %Y")} – {spreads.index[-1].strftime("%b %Y")}')
print(f'Total observations: {len(spreads)}')
spreads.head()

---
## 1. Descriptive Statistics

In [None]:
# ─── Full-Sample Descriptive Statistics ──────────────────────────────────────
def descriptive_stats(series, name):
    """Compute comprehensive descriptive statistics for a spread series."""
    return pd.Series({
        'Mean (%)': series.mean(),
        'Std. Dev. (%)': series.std(),
        'Min (%)': series.min(),
        'Max (%)': series.max(),
        'Median (%)': series.median(),
        'Skewness': series.skew(),
        'Kurtosis': series.kurtosis(),  # pandas returns excess kurtosis (normal = 0)
        'P5 (%)': series.quantile(0.05),
        'P25 (%)': series.quantile(0.25),
        'P50 (%)': series.quantile(0.50),
        'P75 (%)': series.quantile(0.75),
        'P95 (%)': series.quantile(0.95),
        'N': series.count()  # non-missing observations only
    }, name=name)

stats_table = pd.DataFrame([
    descriptive_stats(spreads['USD_Spread'], 'USD Spread'),
    descriptive_stats(spreads['KHR_Spread'], 'KHR Spread')
]).T

print('\n══════════════════════════════════════════════════════════════')
print('          TABLE 1: Descriptive Statistics — Full Sample')
print('══════════════════════════════════════════════════════════════')
print(stats_table.round(4).to_string())
print('══════════════════════════════════════════════════════════════')

### Interpretation — Table 1: Full-Sample Descriptive Statistics

The descriptive statistics reveal **striking differences** between the two currency segments:

**1. Level Difference:** The KHR spread averages **11.34%**, nearly **1.7 times** the USD spread average of **6.72%**. This gap reflects the **exchange rate risk premium** embedded in riel-denominated lending — borrowers taking KHR loans face currency depreciation risk, and banks compensate with wider margins. It also captures the less competitive, less mature KHR lending market compared to the deeper, more standardized USD segment.

**2. Volatility Difference:** KHR spread volatility (std = **7.11%**) is **3.5 times** higher than USD (std = **2.02%**). This is the single most important finding for credit risk: the KHR segment is **substantially less stable**. The KHR spread ranges from 4.24% to 26.65% (a 22.4 pp range), versus USD's 2.88% to 11.30% (an 8.4 pp range).

**3. Mean vs. Median Divergence:** The KHR median (6.95%) is **far below** its mean (11.34%), indicating the distribution is **heavily right-skewed** — dominated by the early-sample high values when KHR spreads exceeded 20%. By contrast, the USD median (6.04%) is closer to its mean, suggesting a more symmetric distribution around typical values.

**4. Skewness and Kurtosis:** Both spreads are positively skewed (USD: 0.74, KHR: 0.76), meaning large **widenings** are more common than large compressions, consistent with credit risk theory in which risk materializes through sudden spread blowouts. The negative excess kurtosis for KHR (−1.12; pandas reports kurtosis relative to the normal value of 0) indicates a **platykurtic** distribution: flatter and more spread out than a normal, with thin rather than heavy tails. This suggests risk in the KHR segment manifests as **sustained elevated spreads** rather than isolated spikes.

**5. Percentile Analysis:** The 95th percentile values (USD: 10.73%, KHR: 23.94%) will serve as **crisis thresholds** in Notebook 04. When spreads exceed these levels, the CRI will signal elevated credit risk conditions.
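
The threshold rule in point 5 can be sketched as follows. This is only an illustration: the helper name `exceedance_flags` and the strict `>` comparison are assumptions, and Notebook 04 may define the crisis signal differently.

```python
import numpy as np

def exceedance_flags(values, q=0.95):
    """Return the q-quantile threshold and a boolean mask of values above it.

    Hypothetical helper: the strict '>' rule is an assumption, not the
    definition used in Notebook 04.
    """
    values = np.asarray(values, dtype=float)
    threshold = np.quantile(values, q)  # linear interpolation, like pandas' default
    return threshold, values > threshold

# Toy input: the integers 1..100 stand in for a spread series
threshold, flags = exceedance_flags(np.arange(1, 101))
print(f'threshold = {threshold:.2f}, months flagged = {flags.sum()}')
```

On the real data the call would be `exceedance_flags(spreads['KHR_Spread'])`, which by construction flags roughly 5% of months.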

In [None]:
# ─── Sub-Period Descriptive Statistics ────────────────────────────────────────
periods = {
    'Pre-COVID (2013–2019)': ('2013-01-01', '2019-12-31'),
    'COVID (2020–2021)':     ('2020-01-01', '2021-12-31'),
    'Post-COVID (2022–2025)':('2022-01-01', '2025-12-31')
}

sub_stats = []
for pname, (start, end) in periods.items():
    mask = (spreads.index >= start) & (spreads.index <= end)
    sub = spreads[mask]
    row = {
        'Period': pname,
        'N': len(sub),
        'USD Mean (%)': sub['USD_Spread'].mean(),
        'USD Std (%)': sub['USD_Spread'].std(),
        'KHR Mean (%)': sub['KHR_Spread'].mean(),
        'KHR Std (%)': sub['KHR_Spread'].std(),
        'Correlation': sub['USD_Spread'].corr(sub['KHR_Spread'])
    }
    sub_stats.append(row)

sub_df = pd.DataFrame(sub_stats).set_index('Period')
print('\n═══════════════════════════════════════════════════════════════════════')
print('         TABLE 1b: Sub-Period Summary Statistics')
print('═══════════════════════════════════════════════════════════════════════')
print(sub_df.round(4).to_string())
print('═══════════════════════════════════════════════════════════════════════')

### Interpretation — Table 1b: Sub-Period Summary Statistics

The sub-period breakdown reveals **dramatic structural shifts** across the three eras:

**1. KHR Spread Compression — The Dominant Story:**
The KHR mean spread **collapsed** from 16.02% (pre-COVID) to 6.12% (COVID) to 5.76% (post-COVID), a **64% decline** from the first to the last sub-period. This compression is consistent with the **maturation of Cambodia's financial sector**: increasing banking competition, improved credit assessment capabilities, NBC's de-dollarization policies promoting riel lending, and greater confidence in the domestic currency. KHR volatility similarly plummeted from 6.76% to just 0.82% during COVID and 0.70% post-COVID: the riel market of 2025 is **qualitatively different** from that of 2013.

**2. USD Spread Stability and Compression:**
The USD spread also declined but more modestly: from 7.98% (pre-COVID) to 5.77% (COVID) to 5.01% (post-COVID) — a **37% decline**. This reflects global factors (prolonged low interest rate environment 2013–2021) combined with Cambodia-specific banking sector deepening. USD volatility dropped from 1.91% pre-COVID to just 0.36% during COVID, reflecting the **NBC loan restructuring program** that stabilized lending conditions.

**3. Correlation Breakdown During COVID:**
The correlation between USD and KHR spreads **collapsed** from 0.73 (pre-COVID) to just **0.11** during COVID, then partially recovered to 0.41 post-COVID. This is a critical finding: **during the crisis, the two currency segments decoupled**. This means a single-currency credit risk framework would miss important divergent dynamics. It also suggests that COVID-specific policies (NBC restructuring, Fed rate cuts) affected the two segments through different channels — validating the dual-currency approach of this paper.

**4. Convergence of Spread Levels:**
By the post-COVID period, the gap between USD and KHR spreads has narrowed to just ~0.75 pp (5.01% vs 5.76%), compared to ~8.0 pp in the pre-COVID era. This **convergence** is consistent with NBC's de-dollarization progress — as KHR lending matures, its risk pricing approaches that of the more established USD segment.
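
The percentage declines and level gaps quoted above follow directly from the sub-period means. A minimal check, using only the figures reported in this interpretation (the helper name `pct_decline` is illustrative):

```python
def pct_decline(start, end):
    """Percentage decline from start to end."""
    return 100.0 * (start - end) / start

# Sub-period mean spreads quoted in the text above (%)
khr_pre, khr_post = 16.02, 5.76
usd_pre, usd_post = 7.98, 5.01

khr_decline = pct_decline(khr_pre, khr_post)
usd_decline = pct_decline(usd_pre, usd_post)
gap_pre = khr_pre - usd_pre      # KHR minus USD, pre-COVID (pp)
gap_post = khr_post - usd_post   # KHR minus USD, post-COVID (pp)

print(f'KHR decline: {khr_decline:.1f}%  USD decline: {usd_decline:.1f}%')
print(f'KHR-USD gap: {gap_pre:.2f} pp pre-COVID, {gap_post:.2f} pp post-COVID')
```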

---
## 2. Normality Tests

In [None]:
# ─── Normality Tests ─────────────────────────────────────────────────────────
normality_results = []
for col, label in [('USD_Spread', 'USD Spread'), ('KHR_Spread', 'KHR Spread')]:
    sw_stat, sw_p = stats.shapiro(spreads[col])
    jb_stat, jb_p = stats.jarque_bera(spreads[col])
    normality_results.append({
        'Series': label,
        'Shapiro-Wilk Stat': sw_stat,
        'Shapiro-Wilk p-value': sw_p,
        'Jarque-Bera Stat': jb_stat,
        'Jarque-Bera p-value': jb_p,
        'Normal at 5%?': 'Yes' if (sw_p > 0.05 and jb_p > 0.05) else 'No'
    })

norm_df = pd.DataFrame(normality_results).set_index('Series')
print('\n═══════════════════════════════════════════════════════════════')
print('                    Normality Test Results')
print('═══════════════════════════════════════════════════════════════')
print(norm_df.round(4).to_string())
print('═══════════════════════════════════════════════════════════════')

### Interpretation — Normality Tests

Both the Shapiro-Wilk and Jarque-Bera tests **strongly reject normality** for both spread series (p < 0.001 in all cases).

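**Why this matters:** The OU model assumes normally distributed innovations (the $dW_t$ driving noise). Rejecting normality of the **unconditional** distribution does **not** by itself invalidate the OU model, because the model's assumption concerns the innovations rather than the pooled levels. Both tests also assume independent observations, so their p-values are only indicative for series as persistent as these. Here's why the rejection is expected: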

- The **unconditional** distribution of an OU process is only normal if the process is in its stationary regime with constant parameters. Our data spans a period of **major structural change** (KHR spread compressed from 24% to 4%), which creates the right-skewed distribution we observe.
- What matters for the OU model is whether the **conditional innovations** (one-step-ahead residuals) are approximately normal. This is tested in Notebook 03's model diagnostics.
- The positive skewness (0.74 for USD, 0.76 for KHR) is consistent with credit risk theory: **spread widenings (risk events) tend to be larger and more sudden than compressions** — a well-documented asymmetry in financial markets.

**Implication for the paper:** We should mention this non-normality as a known limitation — the OU model captures the mean-reverting dynamics well, but may underestimate tail risks. The stress testing framework in Notebook 05 provides a complementary approach by directly examining how CRI responds to extreme scenarios.
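
The distinction between unconditional and conditional normality can be illustrated with a quick AR(1) residual check. This is a rough sketch, not the diagnostics of Notebook 03: the series below is simulated with hypothetical parameters, and the OLS fit stands in for the MLE used later.

```python
import numpy as np
from scipy import stats

def ar1_residuals(values):
    """OLS fit of x[t] = a + b * x[t-1] + e[t]; returns (a, b, residuals)."""
    x = np.asarray(values, dtype=float)
    lagged, current = x[:-1], x[1:]
    b, a = np.polyfit(lagged, current, 1)  # slope first, then intercept
    resid = current - (a + b * lagged)
    return a, b, resid

# Simulated stand-in for a spread series (hypothetical parameters, Gaussian shocks)
rng = np.random.default_rng(42)
n, a_true, b_true = 156, 0.6, 0.9
x = np.empty(n)
x[0] = a_true / (1 - b_true)  # start at the long-run mean
for t in range(1, n):
    x[t] = a_true + b_true * x[t - 1] + rng.normal(0, 0.5)

a_hat, b_hat, resid = ar1_residuals(x)
sw_stat, sw_p = stats.shapiro(resid)
jb_stat, jb_p = stats.jarque_bera(resid)
print(f'b_hat = {b_hat:.3f}, Shapiro-Wilk p = {sw_p:.3f}, Jarque-Bera p = {jb_p:.3f}')
```

Replacing `x` with `spreads['USD_Spread'].dropna().values` gives a first look at the one-step-ahead residuals before the full model diagnostics.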

---
## 3. Stationarity Tests (ADF)

In [None]:
# ─── Augmented Dickey-Fuller Tests ───────────────────────────────────────────
adf_results = []
for col, label in [('USD_Spread', 'USD Spread'), ('KHR_Spread', 'KHR Spread')]:
    result = adfuller(spreads[col], autolag='AIC')
    adf_results.append({
        'Series': label,
        'ADF Statistic': result[0],
        'p-value': result[1],
        'Lags Used': result[2],
        'Critical 1%': result[4]['1%'],
        'Critical 5%': result[4]['5%'],
        'Critical 10%': result[4]['10%'],
        'Stationary at 5%?': 'Yes' if result[1] < 0.05 else 'No'
    })

adf_df = pd.DataFrame(adf_results).set_index('Series')
print('\n═══════════════════════════════════════════════════════════════════')
print('              Augmented Dickey-Fuller Test Results')
print('═══════════════════════════════════════════════════════════════════')
print(adf_df.round(4).to_string())
print('═══════════════════════════════════════════════════════════════════')

### Interpretation — ADF Stationarity Tests

The ADF test **fails to reject** the unit root null hypothesis for both series:
- **USD Spread**: ADF statistic = −0.67, p-value = 0.86 (12 lags selected by AIC)
- **KHR Spread**: ADF statistic = −1.69, p-value = 0.44 (3 lags selected)

**This is a nuanced but important result.** At first glance, failure to reject a unit root appears to contradict the mean-reverting OU model. However, there are several key reasons why this does NOT invalidate our approach:

**1. Structural Breaks Reduce ADF Power:**
The KHR spread experienced a massive structural compression from ~24% to ~5% over the sample. The ADF test has **well-known low power** in the presence of structural breaks (Perron, 1989). The test interprets the downward trend as evidence of non-stationarity, when in reality the series may be mean-reverting around a **shifting** equilibrium — exactly what our rolling window analysis in Notebook 07 will capture.

**2. Slow Mean Reversion ≠ Unit Root:**
The OU model estimated in Notebook 03 shows KHR has κ = 0.46 (half-life ≈ 18 months). Slow mean reversion is **difficult to distinguish from a unit root** in finite samples — a classic identification problem in financial econometrics (Phillips, 1987). The ADF test lacks the statistical power to distinguish between a near-unit-root process and a slowly mean-reverting one with 156 observations.

**3. High Lag Selection for USD (12 lags):**
The AIC selected 12 lags for the USD spread, consuming degrees of freedom and further reducing test power. This high lag count itself indicates complex serial dependence patterns consistent with a persistent AR process.

**4. What Supports Mean Reversion Instead:**
- Interest rate spreads have a **natural economic floor** (banks cannot sustain negative spreads) and **competitive ceiling** (abnormally high spreads attract new entrants)
- The ACF analysis below shows significant but **decaying** autocorrelation — characteristic of mean reversion, not a random walk
- The MLE estimation in Notebook 03 confirms positive κ values for both currencies
- The sub-period analysis shows spreads **converging** to a new equilibrium rather than wandering unboundedly

**For the paper:** We should discuss the ADF results honestly, acknowledge the low power in the presence of structural shifts, and note that the economic argument for mean reversion is strong. The rolling window approach (Notebook 07) addresses this by allowing parameters to vary over time.
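
The low-power argument in point 2 can be checked with a small Monte Carlo experiment. The sketch below makes several simplifying assumptions: it uses a basic Dickey-Fuller regression with a constant and no lagged differences (not the AIC-selected ADF above), an approximate 5% critical value of −2.88, and simulated OU paths with the κ values quoted in this notebook.

```python
import numpy as np

def df_tstat(x):
    """Dickey-Fuller t-statistic from dx[t] = c + g * x[t-1] + e[t] (no extra lags)."""
    x = np.asarray(x, dtype=float)
    y = np.diff(x)
    X = np.column_stack([np.ones(len(y)), x[:-1]])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

def rejection_rate(kappa, n=156, dt=1 / 12, n_sims=500, crit=-2.88, seed=0):
    """Share of simulated mean-reverting paths for which the unit root is rejected."""
    rng = np.random.default_rng(seed)
    b = np.exp(-kappa * dt)  # exact AR(1) coefficient of a discretely sampled OU process
    rejections = 0
    for _ in range(n_sims):
        eps = rng.normal(size=n)
        x = np.empty(n)
        x[0] = eps[0] / np.sqrt(1 - b**2)  # draw from the stationary distribution
        for t in range(1, n):
            x[t] = b * x[t - 1] + eps[t]
        rejections += int(df_tstat(x) < crit)
    return rejections / n_sims

slow = rejection_rate(0.46)  # KHR-like persistence
fast = rejection_rate(1.7)   # USD-like persistence
print(f'Rejection rate: kappa=0.46 -> {slow:.2f}, kappa=1.7 -> {fast:.2f}')
```

Every simulated path here is truly mean-reverting, so any rejection rate well below 1 reflects a lack of test power rather than evidence of a unit root.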

---
## 4. Autocorrelation Analysis

In [None]:
# ─── ACF / PACF Plots ────────────────────────────────────────────────────────
fig, axes = plt.subplots(2, 2, figsize=(14, 8))

plot_acf(spreads['USD_Spread'], lags=30, ax=axes[0, 0], color='#2196F3', alpha=0.05)
axes[0, 0].set_title('ACF — USD Spread', fontweight='bold')

plot_pacf(spreads['USD_Spread'], lags=30, ax=axes[0, 1], color='#2196F3', alpha=0.05)
axes[0, 1].set_title('PACF — USD Spread', fontweight='bold')

plot_acf(spreads['KHR_Spread'], lags=30, ax=axes[1, 0], color='#E91E63', alpha=0.05)
axes[1, 0].set_title('ACF — KHR Spread', fontweight='bold')

plot_pacf(spreads['KHR_Spread'], lags=30, ax=axes[1, 1], color='#E91E63', alpha=0.05)
axes[1, 1].set_title('PACF — KHR Spread', fontweight='bold')

for ax in axes.flat:
    ax.set_xlabel('Lag (months)')

plt.tight_layout()
plt.savefig('../figures/fig_acf_pacf.png', dpi=300, bbox_inches='tight')
plt.show()

print(f'\nAR(1) autocorrelation at lag 1:')
print(f'  USD Spread: {spreads["USD_Spread"].autocorr(lag=1):.4f}')
print(f'  KHR Spread: {spreads["KHR_Spread"].autocorr(lag=1):.4f}')

### Interpretation — ACF/PACF Analysis

The autocorrelation analysis provides **strong support** for the mean-reverting OU model specification:

**1. High Lag-1 Autocorrelation:**
- **USD Spread**: ρ₁ = **0.87** — strong persistence from one month to the next
- **KHR Spread**: ρ₁ = **0.97** — extremely high persistence, nearly a unit root

These values directly map to the OU model: the AR(1) coefficient is b = e^(−κΔt) with Δt = 1/12 year for monthly data, so κ = −ln(b)/Δt. Thus b = 0.87 implies κ ≈ 1.7 (fast reversion) for USD, while b = 0.97 implies κ ≈ 0.4 (slow reversion) for KHR, broadly consistent with the MLE results in Notebook 03.
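
As a quick numerical check of this mapping, assuming monthly sampling with Δt = 1/12 year (the function names are illustrative, and the values are moment-based approximations rather than the MLE estimates):

```python
import numpy as np

def kappa_from_ar1(b, dt=1 / 12):
    """Annualized mean-reversion speed implied by an AR(1) coefficient b."""
    return -np.log(b) / dt

def half_life_months(b):
    """Months for a shock to decay by half under monthly AR(1) dynamics."""
    return np.log(2) / -np.log(b)

for label, b in [('USD', 0.87), ('KHR', 0.97)]:
    print(f'{label}: kappa = {kappa_from_ar1(b):.2f} per year, '
          f'half-life = {half_life_months(b):.1f} months')

ratio = half_life_months(0.97) / half_life_months(0.87)
print(f'KHR/USD half-life ratio = {ratio:.1f}')
```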

**2. Slowly Decaying ACF Pattern:**
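Both series show ACF that remains significant for many lags but **gradually decays**. For a stationary AR(1) process the ACF decays geometrically (ρ_k = b^k), whereas the sample ACF of a random walk declines only very slowly and roughly linearly, and white noise shows no significant autocorrelation. The observed decay is consistent with what an OU process produces, although with ρ₁ as high as 0.97 the KHR pattern is hard to tell apart from a unit root by eye.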

**3. PACF Drops After Lag 1:**
The PACF for both series shows a **dominant spike at lag 1** followed by a rapid drop to near zero. This is the signature of an **AR(1) process**, the discrete-time equivalent of the OU model. The plots show little evidence of higher-order autoregressive structure (AR(2), AR(3), etc.), which supports the OU model with three parameters (κ, θ, σ) as an appropriate and parsimonious specification. The 12 lags that AIC selected for the USD ADF regression hint at some remaining serial dependence, so this conclusion should be read alongside the residual diagnostics in Notebook 03.

**4. USD vs. KHR Persistence Difference:**
The KHR lag-1 autocorrelation of 0.97 (vs. USD's 0.87) means KHR spread shocks are **much more persistent**: a shock to the KHR spread takes roughly **4.5 times longer** to dissipate than a USD shock, comparing the AR(1) half-lives ln 2 / (−ln ρ₁) implied by the two coefficients. This has direct implications for credit risk: KHR credit conditions, once deteriorated, **remain elevated for much longer**, making the riel segment more vulnerable to prolonged stress periods.

**For the paper:** The ACF/PACF evidence provides the **strongest statistical justification** for the OU model choice. We can state that the data are consistent with a first-order autoregressive / mean-reverting process, and that the OU model captures the essential dynamics without over-parameterization.

---
## 5. Correlation Analysis

In [None]:
# ─── Correlation Analysis ────────────────────────────────────────────────────
full_corr = spreads['USD_Spread'].corr(spreads['KHR_Spread'])
print(f'Full-sample Pearson correlation: {full_corr:.4f}')
print(f'\nSub-period correlations:')
for pname, (start, end) in periods.items():
    mask = (spreads.index >= start) & (spreads.index <= end)
    sub = spreads[mask]
    corr = sub['USD_Spread'].corr(sub['KHR_Spread'])
    print(f'  {pname}: {corr:.4f}')

### Interpretation — Correlation Analysis

**Full-Sample Correlation: ρ = 0.84** — USD and KHR spreads are **strongly correlated** but not perfectly so. This dual nature is central to the paper's contribution:

- A correlation of 0.84 means the two segments share **common macroeconomic drivers** (GDP growth, global liquidity conditions, banking sector competition), which push both spreads in the same direction
- But the roughly 29% of variance that is not shared (1 − 0.84² ≈ 0.29) reflects **currency-specific factors**: exchange rate risk, NBC policy differences between currencies, and different borrower profiles in each segment

**The Sub-Period Correlation Story Is Dramatic:**

| Period | Correlation | Interpretation |
|--------|:-----------:|----------------|
| Pre-COVID | 0.73 | Strong co-movement in normal times |
| COVID | **0.11** | Near-complete decoupling during crisis |
| Post-COVID | 0.41 | Partial recovery, but weaker than before |

The **collapse to 0.11 during COVID** is one of the most important findings for the paper. It means:

1. **A single-currency analysis would be deeply misleading** — COVID affected USD and KHR credit risk through entirely different channels
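2. The USD segment was stabilized by **global Fed policy** (emergency rate cuts to a 0–0.25% target range), while KHR was influenced by **domestic NBC policies** (loan restructuring program, riel liquidity support)
3. Shifts in measured cross-market correlation under stress are a well-documented feature of financial crises. Forbes & Rigobon (2002) caution, however, that correlation coefficients depend on volatility, so part of the drop may reflect the unusually low variance of both spreads in this sub-period rather than a true decoupling. Either way, the two segments did not move together during COVID, which is why a **dual-currency framework** is necessary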

The merely partial recovery to 0.41 post-COVID suggests the relationship between the two segments may have **permanently shifted**, consistent with the KHR spread's structural compression to levels similar to USD.
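
Two quick calculations help put these correlations in context: the share of variance two series have in common is ρ², and a Fisher z-transform gives a rough confidence interval for a sub-period correlation. The interval assumes independent, bivariate-normal observations, which persistent monthly spreads violate, so it should be read as indicative only. The helper name `fisher_ci` is illustrative.

```python
import math

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a Pearson correlation (Fisher z)."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

shared = 0.84 ** 2             # variance shared by the two spreads, full sample
lo, hi = fisher_ci(0.11, 24)   # COVID sub-period: 24 monthly observations

print(f'Shared variance at rho = 0.84: {shared:.2f}')
print(f'95% CI for rho = 0.11 with n = 24: [{lo:.2f}, {hi:.2f}]')
```

The interval for the COVID period spans zero and is wide, so the point estimate of 0.11 is best described as "no detectable co-movement" rather than a precisely measured low correlation.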

---
## 6. Publication-Quality Visualizations

### Figure 1: Dual Time Series with COVID Shading

In [None]:
# ─── FIGURE 1: Dual Time Series ──────────────────────────────────────────────
fig, ax = plt.subplots(figsize=(14, 6))

ax.plot(spreads.index, spreads['USD_Spread'], color='#1565C0', linewidth=1.5,
        label='USD Spread', alpha=0.9)
ax.plot(spreads.index, spreads['KHR_Spread'], color='#C62828', linewidth=1.5,
        label='KHR Spread', alpha=0.9)

# COVID shading
ax.axvspan(pd.Timestamp('2020-01-01'), pd.Timestamp('2021-12-31'),
           alpha=0.12, color='grey', label='COVID-19 Period')

# Event annotations
events = [
    ('2020-03-01', 'COVID-19\nOnset', 15),
    ('2022-03-01', 'Fed Rate\nHikes Begin', 18),
    ('2023-07-01', 'Fed Peak\nRate', 15),
    ('2024-09-01', 'Fed Rate\nCuts Begin', 18),
]
for date, label, ypos in events:
    ax.annotate(label, xy=(pd.Timestamp(date), ypos),
                fontsize=7.5, ha='center', va='bottom',
                bbox=dict(boxstyle='round,pad=0.3', facecolor='lightyellow',
                          edgecolor='grey', alpha=0.8),
                arrowprops=dict(arrowstyle='->', color='grey', lw=0.8))

ax.set_xlabel('Date')
ax.set_ylabel('Interest Rate Spread (%)')
ax.set_title('Figure 1: USD and KHR Interest Rate Spreads (Jan 2013 – Dec 2025)',
             fontweight='bold', fontsize=13)
ax.legend(loc='upper right', framealpha=0.9)
ax.grid(True, alpha=0.3)
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
plt.xticks(rotation=45)

plt.tight_layout()
plt.savefig('../figures/fig1_spread_timeseries.png', dpi=300, bbox_inches='tight')
plt.show()
print('Saved: fig1_spread_timeseries.png')

### Interpretation — Figure 1: Time Series of Interest Rate Spreads

Figure 1 tells the **central story** of Cambodia's dual-currency credit risk evolution:

**Phase 1 — High KHR Divergence (2013–2016):** The KHR spread fluctuated between 13–27%, dwarfing the USD spread of 5–11%. This massive gap — sometimes exceeding **15 percentage points** — reflects the early-stage KHR lending market, where banks charged enormous risk premiums for riel-denominated loans due to limited riel liquidity, high perceived exchange rate risk, and the dominance of USD in banking.

**Phase 2 — KHR Compression (2017–2019):** The KHR spread underwent **rapid compression**, falling from ~15% to ~5%. This coincides with NBC's active de-dollarization efforts (higher USD reserve requirements, riel lending incentives) and the general maturation of the banking sector. The gap between currencies narrowed dramatically.

**Phase 3 — COVID Stability (2020–2021):** Surprisingly, both spreads **remained remarkably stable** during the pandemic, with USD holding at ~5.5% and KHR at ~6%. This reflects the NBC's aggressive **loan restructuring program** that prevented banks from repricing risk upward — essentially masking underlying credit deterioration.

**Phase 4 — Post-COVID Convergence (2022–2025):** Post-COVID, both spreads settled into a narrow 4–7% band, with intermittent volatility linked to the **Fed tightening cycle** (2022–2023). The convergence of USD and KHR spreads to similar levels is historic: by 2025, the long-standing KHR risk premium has effectively disappeared in the term lending market.

**Key Observation:** The anomalous KHR spike to ~27% in January 2017 is a notable outlier — likely a data artifact or a single large transaction distorting the weighted average. This does not invalidate the overall trend.
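
One way to screen for isolated spikes like this is to compare each observation with the median of its neighbours. The sketch below is an assumption-laden illustration, not part of the paper's pipeline: the function name, the 12-month window, and the 5-MAD cutoff are all arbitrary choices, and the input is a toy series.

```python
import numpy as np

def flag_spikes(values, window=12, n_mads=5.0):
    """Flag points far from the median of their neighbours (the point itself excluded)."""
    values = np.asarray(values, dtype=float)
    flags = np.zeros(values.size, dtype=bool)
    half = window // 2
    for i in range(values.size):
        lo, hi = max(0, i - half), min(values.size, i + half + 1)
        neighbours = np.delete(values[lo:hi], i - lo)
        med = np.median(neighbours)
        mad = np.median(np.abs(neighbours - med))
        scale = 1.4826 * mad  # MAD rescaled to be comparable to a standard deviation
        flags[i] = scale > 0 and abs(values[i] - med) > n_mads * scale
    return flags

# Toy series: a smooth decline from 15% to 13% with one artificial spike to 27%
series = np.linspace(15.0, 13.0, 24)
series[12] = 27.0
flags = flag_spikes(series)
print('Flagged positions:', np.flatnonzero(flags))
```

A flagged month is only a candidate for inspection; whether it is a data artifact has to be settled against the underlying NBC source data.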

### Figure 2: Spread Histograms with Normal Overlay

In [None]:
# ─── FIGURE 2: Histograms ────────────────────────────────────────────────────
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

for ax, col, color, label in [(ax1, 'USD_Spread', '#1565C0', 'USD'),
                               (ax2, 'KHR_Spread', '#C62828', 'KHR')]:
    data = spreads[col]
    ax.hist(data, bins=25, density=True, alpha=0.6, color=color, edgecolor='white')
    
    # Normal overlay
    x = np.linspace(data.min() - 1, data.max() + 1, 200)
    ax.plot(x, stats.norm.pdf(x, data.mean(), data.std()),
            color='black', linewidth=1.5, linestyle='--', label='Normal fit')
    
    ax.set_xlabel(f'{label} Spread (%)')
    ax.set_ylabel('Density')
    ax.set_title(f'{label} Spread Distribution', fontweight='bold')
    ax.legend()
    ax.grid(True, alpha=0.3)
    
    # Stats annotation
    textstr = f'μ = {data.mean():.2f}%\nσ = {data.std():.2f}%\nSkew = {data.skew():.2f}\nKurt = {data.kurtosis():.2f}'
    ax.text(0.97, 0.97, textstr, transform=ax.transAxes, fontsize=9,
            verticalalignment='top', horizontalalignment='right',
            bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

fig.suptitle('Figure 2: Distribution of Interest Rate Spreads', fontweight='bold', fontsize=13, y=1.02)
plt.tight_layout()
plt.savefig('../figures/fig2_spread_histograms.png', dpi=300, bbox_inches='tight')
plt.show()
print('Saved: fig2_spread_histograms.png')

### Interpretation — Figure 2: Distribution Histograms

The histograms visually confirm the non-normality detected by the statistical tests:

**USD Spread:** The distribution is **moderately right-skewed** with a concentration of observations between 4–7%. The bulk of the data clusters below the mean of 6.72%, with a long right tail extending to ~11%. This asymmetry arises because the USD spread spent most of the post-2018 period at lower levels (4–6%) but had higher values in the early sample (2013–2016). The normal overlay shows reasonable fit in the center of the distribution but underestimates the right tail — confirming that extreme spread widenings occur more frequently than a normal model predicts.

**KHR Spread:** The distribution is **bimodal or heavily skewed** — a large mass of observations at 4–7% (the post-2018 regime) and a wide, flat tail extending to 24–27% (the pre-2018 regime). The normal overlay is a poor fit, vastly underestimating both the left concentration and the right tail. This is a visual representation of the **structural break** — the KHR spread has effectively operated in two distinct regimes. For modeling purposes, this explains why a rolling window approach (Notebook 07) is valuable: parameters estimated on the full sample will be pulled toward a "compromise" that fits neither regime well.

### Figure 3: Raw Rates (Underlying Components)

In [None]:
# ─── FIGURE 3: Raw Rates ─────────────────────────────────────────────────────
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 9), sharex=True)

# USD rates
ax1.plot(rates.index, rates['USD_Term_Loans'], color='#1565C0', linewidth=1.3,
         label='USD Term Loan Rate')
ax1.plot(rates.index, rates['USD_Term_Deposits'], color='#64B5F6', linewidth=1.3,
         label='USD Term Deposit Rate', linestyle='--')
ax1.fill_between(rates.index, rates['USD_Term_Deposits'], rates['USD_Term_Loans'],
                 alpha=0.1, color='#1565C0')
ax1.axvspan(pd.Timestamp('2020-01-01'), pd.Timestamp('2021-12-31'),
            alpha=0.1, color='grey')
ax1.set_ylabel('Rate (%)')
ax1.set_title('USD — Term Loan and Term Deposit Rates', fontweight='bold')
ax1.legend(loc='upper right')
ax1.grid(True, alpha=0.3)

# KHR rates
ax2.plot(rates.index, rates['KHR_Term_Loans'], color='#C62828', linewidth=1.3,
         label='KHR Term Loan Rate')
ax2.plot(rates.index, rates['KHR_Term_Deposits'], color='#EF9A9A', linewidth=1.3,
         label='KHR Term Deposit Rate', linestyle='--')
ax2.fill_between(rates.index, rates['KHR_Term_Deposits'], rates['KHR_Term_Loans'],
                 alpha=0.1, color='#C62828')
ax2.axvspan(pd.Timestamp('2020-01-01'), pd.Timestamp('2021-12-31'),
            alpha=0.1, color='grey')
ax2.set_xlabel('Date')
ax2.set_ylabel('Rate (%)')
ax2.set_title('KHR — Term Loan and Term Deposit Rates', fontweight='bold')
ax2.legend(loc='upper right')
ax2.grid(True, alpha=0.3)

ax2.xaxis.set_major_locator(mdates.YearLocator())
ax2.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
plt.xticks(rotation=45)

fig.suptitle('Figure 3: Underlying Interest Rates — New Amount (Jan 2013 – Dec 2025)',
             fontweight='bold', fontsize=13, y=1.01)
plt.tight_layout()
plt.savefig('../figures/fig3_raw_rates.png', dpi=300, bbox_inches='tight')
plt.show()
print('Saved: fig3_raw_rates.png')

### Interpretation — Figure 3: Underlying Interest Rates

This figure reveals **what drives the spreads** by decomposing them into their lending and deposit rate components:

**USD Panel — Two Distinct Regimes:**
- **2013–2019:** USD loan rates declined from ~14% to ~9% while deposit rates remained stable at ~3.3%. The narrowing spread was driven **entirely by loan rate compression** — increasing banking competition in the dominant USD lending market.
- **2022–2025:** Deposit rates **surged** from ~3.5% to ~5.5% due to the Fed's aggressive tightening cycle (federal funds target range raised from 0–0.25% to 5.25–5.50%). This represents a transmission of U.S. monetary policy into Cambodia's dollarized economy: Cambodian banks had to raise USD deposit rates to remain competitive with rising global rates. Loan rates also ticked up modestly, but the net effect was **spread compression** to the 3–6% range.

**KHR Panel — Dramatic Loan Rate Collapse:**
- KHR loan rates fell by more than half, from ~30% (2013) to ~10–12% (2019–2025), the single largest change in the dataset. This reflects banking sector maturation, increased competition in the riel segment, NBC incentives for riel lending, and the general development of Cambodia's financial infrastructure.
- KHR deposit rates remained relatively stable at 5–7% throughout, slightly higher than USD deposits — reflecting the premium banks offer to attract riel deposits.
- The spread compression was therefore driven **almost entirely by declining loan rates**, not by rising deposit rates. This is a sign of **healthy financial development** — borrowers accessing credit at lower cost.

**Key Insight for the Paper:** The USD spread is increasingly influenced by **external factors** (Fed policy), while the KHR spread is driven by **domestic structural changes** (financial deepening, de-dollarization). This divergence in drivers explains the COVID-era correlation breakdown and supports modeling each currency separately.

### Figure 4: Scatter Plot — USD vs KHR Spread

In [None]:
# ─── FIGURE 4: Scatter Plot ──────────────────────────────────────────────────
fig, ax = plt.subplots(figsize=(8, 7))

# Color by period
colors = []
for d in spreads.index:
    if d < pd.Timestamp('2020-01-01'):
        colors.append('#1565C0')
    elif d < pd.Timestamp('2022-01-01'):
        colors.append('#FF6F00')
    else:
        colors.append('#2E7D32')

scatter = ax.scatter(spreads['USD_Spread'], spreads['KHR_Spread'],
                     c=colors, alpha=0.6, s=40, edgecolors='white', linewidth=0.5)

# Regression line
slope, intercept, r_value, p_value, std_err = stats.linregress(
    spreads['USD_Spread'], spreads['KHR_Spread'])
x_line = np.linspace(spreads['USD_Spread'].min(), spreads['USD_Spread'].max(), 100)
ax.plot(x_line, slope * x_line + intercept, 'k--', linewidth=1.2, alpha=0.7)

# Legend
from matplotlib.lines import Line2D
legend_elements = [
    Line2D([0], [0], marker='o', color='w', markerfacecolor='#1565C0', markersize=8, label='Pre-COVID'),
    Line2D([0], [0], marker='o', color='w', markerfacecolor='#FF6F00', markersize=8, label='COVID'),
    Line2D([0], [0], marker='o', color='w', markerfacecolor='#2E7D32', markersize=8, label='Post-COVID'),
]
ax.legend(handles=legend_elements, loc='upper right')

ax.set_xlabel('USD Spread (%)')
ax.set_ylabel('KHR Spread (%)')
ax.set_title('Figure 4: USD vs. KHR Spread Correlation', fontweight='bold', fontsize=13)
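# Report the computed p-value instead of a hard-coded string
p_text = 'p < 0.0001' if p_value < 1e-4 else f'p = {p_value:.4f}'
ax.text(0.05, 0.95, f'ρ = {r_value:.4f}\n{p_text}',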
        transform=ax.transAxes, fontsize=11, verticalalignment='top',
        bbox=dict(boxstyle='round', facecolor='lightyellow', alpha=0.8))
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('../figures/fig4_correlation.png', dpi=300, bbox_inches='tight')
plt.show()
print('Saved: fig4_correlation.png')

### Interpretation — Figure 4: Cross-Currency Scatter Plot

The scatter plot illustrates the **time-varying relationship** between the two currency segments:

**Three Distinct Clusters Emerge:**

1. **Pre-COVID (blue):** Occupies the upper-right region with a clear positive slope. When USD spreads were high (7–11%), KHR spreads were extremely high (12–26%). The wide dispersion of blue points reflects the volatile KHR market of 2013–2017.

2. **COVID (orange):** A compact cluster in the lower-left, around USD 5–6% and KHR 5–7%. The tight clustering reflects the **artificially stabilized** market during the NBC restructuring period. Within these 24 monthly observations the two spreads are nearly uncorrelated (correlation ≈ 0.11), so the points appear as a **random scatter** inside the cluster.

3. **Post-COVID (green):** Concentrated in the lower-left at USD 3.5–6.5% and KHR 4.5–7%. The cluster resembles the COVID one but is slightly more dispersed, reflecting the **new normal** of converged dual-currency pricing.

**The regression line** summarizes the overall positive relationship (full-sample ρ = 0.84), but the plot shows that this is driven by **cross-period variation** (the shift from upper-right to lower-left) rather than by within-period co-movement. A high pooled correlation can therefore be misleading when it reflects a common trend rather than genuine short-term co-movement, which further supports the dual-currency modeling approach.
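
The contrast between pooled and within-period correlation can be checked numerically. The sketch below is illustrative only: it runs on synthetic data (not the notebook's spreads), and `correlation_by_period` is a hypothetical helper, not a function from the notebook. The two break dates mirror the color coding used in Figure 4.

```python
import numpy as np
import pandas as pd


def correlation_by_period(df, x, y, breaks, labels):
    """Pearson correlation of df[x] and df[y], pooled and within each period.

    `breaks` are the interior cut dates; an observation dated exactly on a
    break is assigned to the later period.
    """
    cuts = pd.to_datetime(breaks).values
    # Index of the period each observation belongs to (0 = before first break)
    idx = np.searchsorted(cuts, df.index.values, side='right')
    period = np.asarray(labels)[idx]

    out = {'Pooled': df[x].corr(df[y])}
    for label in labels:
        sub = df[period == label]
        out[label] = sub[x].corr(sub[y])
    return pd.Series(out)


# Synthetic illustration: levels shift across periods, noise is independent
rng = np.random.default_rng(0)
dates = pd.date_range('2013-01-01', '2025-12-01', freq='MS')
pre = dates < pd.Timestamp('2020-01-01')
covid = (~pre) & (dates < pd.Timestamp('2022-01-01'))
level_x = np.select([pre, covid], [8.0, 5.5], default=5.0)
level_y = np.select([pre, covid], [16.0, 6.0], default=6.0)
demo = pd.DataFrame({
    'x': level_x + rng.normal(0, 0.5, len(dates)),
    'y': level_y + rng.normal(0, 0.5, len(dates)),
}, index=dates)

result = correlation_by_period(
    demo, 'x', 'y',
    breaks=['2020-01-01', '2022-01-01'],
    labels=['Pre-COVID', 'COVID', 'Post-COVID'])
print(result.round(2))
```

Because the synthetic series share only a level shift, the pooled correlation is high while the within-period values stay much closer to zero. On the notebook's data the same helper could be called as `correlation_by_period(spreads, 'USD_Spread', 'KHR_Spread', ...)`, assuming `spreads` has no missing values.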

### Figure 5: Box Plots by Year

In [None]:
# ─── FIGURE 5: Box Plots by Year ─────────────────────────────────────────────
spreads_year = spreads.copy()
spreads_year['Year'] = spreads_year.index.year

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(14, 8), sharex=True)

years = sorted(spreads_year['Year'].unique())
usd_by_year = [spreads_year[spreads_year['Year']==y]['USD_Spread'].values for y in years]
khr_by_year = [spreads_year[spreads_year['Year']==y]['KHR_Spread'].values for y in years]

bp1 = ax1.boxplot(usd_by_year, labels=years, patch_artist=True)
for box in bp1['boxes']:
    box.set(facecolor='#BBDEFB', edgecolor='#1565C0')
ax1.set_ylabel('Spread (%)')
ax1.set_title('USD Spread Distribution by Year', fontweight='bold')
ax1.grid(True, alpha=0.3, axis='y')

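# Shade COVID years (2020–2021). Boxes sit at 1-based positions, so the slot for
# years[i] is centered at i + 1 and runs from i + 0.5 to i + 1.5.
covid_start_idx = years.index(2020)
for ax in [ax1, ax2]:
    ax.axvspan(covid_start_idx + 0.5, covid_start_idx + 2.5, alpha=0.1, color='grey')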

bp2 = ax2.boxplot(khr_by_year, labels=years, patch_artist=True)
for box in bp2['boxes']:
    box.set(facecolor='#FFCDD2', edgecolor='#C62828')
ax2.set_xlabel('Year')
ax2.set_ylabel('Spread (%)')
ax2.set_title('KHR Spread Distribution by Year', fontweight='bold')
ax2.grid(True, alpha=0.3, axis='y')
plt.xticks(rotation=45)

fig.suptitle('Figure 5: Annual Spread Distributions', fontweight='bold', fontsize=13, y=1.01)
plt.tight_layout()
plt.savefig('../figures/fig5_boxplots_by_year.png', dpi=300, bbox_inches='tight')
plt.show()
print('Saved: fig5_boxplots_by_year.png')

### Interpretation — Figure 5: Annual Box Plots

The box plots provide a year-by-year view of the **distributional evolution**:

**USD Spread:**
- 2013–2016: High median (~8–10%) with wide boxes, indicating both high levels and high within-year volatility
- 2017–2019: Gradual compression to ~6%, boxes narrowing (more stable pricing)
- 2020–2021 (COVID, shaded): **Remarkably narrow boxes** centered at ~5.5–5.7% — the restructuring program effectively **froze** risk repricing
- 2022–2023: The impact of the Fed tightening cycle is visible in the compressed spreads (3–5%), as rising deposit rates ate into margins
- 2024–2025: Partial recovery to 4.5–6% as the Fed began cutting rates

**KHR Spread:**
- 2013–2016: Extremely high medians (18–24%) with large boxes — the immature KHR lending market
- 2017: A **dramatic outlier** (the Jan 2017 spike to 26.6%) is visible at the upper extreme of the distribution; apart from this spike, the median dropped to ~12%
- 2018–2019: Rapid compression to 6–8%, boxes much narrower
- 2020–2025: Stable at 5–7% with very tight boxes — the KHR market has matured

**Key Pattern — Volatility Compression:** Both currencies show a secular trend toward **tighter boxes** (lower within-year volatility) over time. This suggests that Cambodia's banking sector has become more efficient at pricing credit risk, with less month-to-month fluctuation. For the OU model, this implies that the volatility parameter σ should be **lower in recent sub-periods** — as confirmed in Notebook 06's COVID analysis.
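
The "tighter boxes" reading can be made quantitative, since the height of each box is the within-year interquartile range (IQR). The following is a minimal sketch on a deterministic toy series; `annual_dispersion` is a hypothetical helper introduced here for illustration and is not part of the notebook.

```python
import numpy as np
import pandas as pd


def annual_dispersion(series):
    """Within-year IQR (box height) and standard deviation of a dated series."""
    by_year = series.groupby(series.index.year)
    # Quartiles per year, reshaped so that 0.25 and 0.75 become columns
    q = by_year.quantile([0.25, 0.75]).unstack()
    return pd.DataFrame({
        'iqr': q[0.75] - q[0.25],
        'std': by_year.std(),
    })


# Toy data: a wide year followed by a narrow year (values are made up)
dates = pd.date_range('2013-01-01', periods=24, freq='MS')
values = np.concatenate([np.linspace(-3, 3, 12), np.linspace(-1, 1, 12)])
toy = pd.Series(values, index=dates)

table = annual_dispersion(toy)
print(table)
```

Applied to the notebook's data, for example `annual_dispersion(spreads['KHR_Spread'])`, a declining `iqr` column would correspond to the narrowing boxes in Figure 5.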

---
## Summary of Key Findings

The exploratory analysis reveals six fundamental characteristics of Cambodia's dual-currency credit risk landscape:

| Finding | USD | KHR | Implication |
|---------|-----|-----|-------------|
| Mean spread | 6.72% | 11.34% | KHR carries exchange rate risk premium |
| Volatility | 2.02% | 7.11% (3.5× higher) | KHR segment far more unstable |
| Normality | Rejected | Rejected | Stress testing needed for tail risks |
| ADF test (null: unit root) | Null not rejected (p=0.86) | Null not rejected (p=0.44) | Structural breaks reduce ADF power |
| Lag-1 autocorrelation | 0.87 | 0.97 | KHR shocks much more persistent |
| Cross-correlation | 0.84 full-sample, **0.11 during COVID** | | Dual-currency framework essential |

These findings motivate the choice of the Ornstein-Uhlenbeck stochastic process for modeling, with the caveat that the full-sample ADF tests do not reject a unit root, and they show that a **time-varying parameter approach** (rolling windows) will be essential to capture the structural shifts in the KHR market. The COVID-era correlation breakdown provides the strongest justification for the paper's dual-currency framework.
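
To connect the persistence row of the table to the OU model: for an OU process sampled at a fixed interval Δt, the lag-1 autocorrelation is φ = exp(−θΔt), so the mean-reversion speed is θ = −ln(φ)/Δt and the half-life of a shock is ln(2)/θ. The sketch below applies this to the lag-1 values reported above. It assumes monthly sampling (Δt = 1 month) and that the full-sample autocorrelation is a fair estimate of φ, which the structural breaks and ADF results call into question; `ou_half_life` is a hypothetical helper name.

```python
import math


def ou_half_life(phi, dt=1.0):
    """Half-life of a shock implied by lag-1 autocorrelation `phi` under an OU process.

    Uses phi = exp(-theta * dt); the result is in the same time units as `dt`.
    """
    if not 0.0 < phi < 1.0:
        raise ValueError('phi must lie strictly between 0 and 1 for mean reversion')
    theta = -math.log(phi) / dt  # mean-reversion speed
    return math.log(2.0) / theta


# Lag-1 autocorrelations taken from the summary table (monthly data assumed)
half_lives = {name: ou_half_life(phi) for name, phi in {'USD': 0.87, 'KHR': 0.97}.items()}
for name, hl in half_lives.items():
    print(f'{name}: implied half-life of about {hl:.1f} months')
```

Under these assumptions the implied half-lives are roughly 5 months for USD and roughly 23 months for KHR, which is one way to read "KHR shocks much more persistent". The estimates from the model-fitting notebooks may differ, particularly under rolling windows.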