# Market-Consistent Interest Rate Modeling and Forecasting for Cambodia: KHR vs USD

## Research Methodology — Team Assignment (3 Members)

---

### Team Workload Division

| Member | Focus Area | Components | Objectives | Research Questions |
|:------:|-----------|------------|:----------:|:------------------:|
| **Person 1** | Data Foundation, Historical Analysis & Spread–Credit Risk | Data collection & cleaning, Descriptive analysis, Loan–deposit spread analysis | RO1 – RO4 | RQ1 – RQ4 |
| **Person 2** | Econometric Modeling & Forecasting | VAR & Granger causality, ARIMA/SARIMA forecasting | RO5 – RO7 | RQ5 – RQ7 |
| **Person 3** | Stochastic Modeling, Yield Curves & Insurance Applications | Vasicek/CIR calibration & simulation, Nelson–Siegel yield curve, Discount rate determination | RO8 – RO11 | RQ8 – RQ11 |

### Dependency Flow

```
Person 1 (Data & Descriptive & Spreads)
    │
    ├──→ Person 2 (VAR, Granger, ARIMA/SARIMA)
    │
    └──→ Person 3 (Vasicek/CIR, Nelson–Siegel, Discount Rates)
```

Person 1 produces the **cleaned dataset** and **descriptive insights** that both Person 2 and Person 3 depend on. Person 2 and Person 3 can work in **parallel** once Person 1 delivers the prepared data.

---

## Shared: Dataset Overview

**Source:** National Bank of Cambodia (NBC)  
**File:** `16_loananddepositweightedratedec-25_8815__1_.xlsx`  
**Period:** January 2013 – December 2025 (156 monthly observations)  
**Rate Types:** Weighted Average on New Amount + Weighted Average on Outstanding Amount

| Currency | Deposit Products | Loan Products |
|:--------:|-----------------|---------------|
| **KHR** | Demand, Saving, Term, Other | Overdraft, Credit Card, Term Loans, Other Loans |
| **USD** | Demand, Saving, Term, Other | Overdraft, Credit Card, Term Loans, Other Loans |

**Total:** 32 interest rate series (2 currencies × 2 rate types × 8 products)

---
# 👤 PERSON 1: Data Foundation, Historical Analysis & Spread–Credit Risk
---

**Scope:** Person 1 is responsible for the entire data pipeline — from raw data collection through cleaning, descriptive analysis, and the complete loan–deposit spread and credit risk analysis. This is the foundational work upon which all other components depend.

## Person 1 — Research Objectives

| # | Objective | Corresponding RQ |
|---|----------|:-----------------:|
| **RO1** | Collect, clean, and prepare the historical KHR and USD interest rate dataset for time-series analysis | → RQ1 |
| **RO2** | Analyze historical behavior, trends, volatility, and structural patterns of deposit and loan interest rates for both currencies | → RQ2 |
| **RO3** | Quantify and compare loan–deposit spreads across currencies and product types, identify regime changes, and decompose spreads into economic components | → RQ3 |
| **RO4** | Assess the relationship between spread dynamics and credit risk indicators, and evaluate the impact of high spreads on borrowers, depositors, and financial inclusion | → RQ4 |

---

**RO1 — Data Foundation:** Collect historical weighted-average deposit and loan interest rates from the NBC for both KHR and USD across all product types. Clean the data by handling missing values, detecting outliers, testing for stationarity, and preparing the dataset in a format suitable for all downstream analyses (Person 2 and Person 3).

**RO2 — Historical Analysis:** Examine the long-term trends, short-term fluctuations, seasonal patterns, structural breaks, and volatility regimes of KHR and USD interest rates across deposit types (Demand, Saving, Term, Other) and loan types (Overdraft, Term Loan, Credit Card, Other). Identify how rates differ across products and currencies, and contextualize movements against macroeconomic events.

**RO3 — Spread Analysis:** Compute loan–deposit spreads for each currency and product pair over time. Compare KHR vs. USD spreads to measure the currency risk premium. Identify structural breaks in spread behavior and decompose spreads into credit risk premium, liquidity premium, operational cost, and profit margin components.

**RO4 — Credit Risk & Impact Assessment:** Analyze how spreads correlate with credit risk indicators (NPL ratios, provisioning). Test for feedback loops between spreads and defaults. Quantify the impact of high spreads on borrowers (excess cost of credit), depositors (real returns), and financial inclusion. Benchmark Cambodia against regional peers.

## Person 1 — Research Questions

**RQ1** *(→ RO1: Data Foundation)* **— Data Quality and Preparation**

What are the characteristics of the available KHR and USD interest rate data (completeness, frequency, outliers, stationarity), and what cleaning and transformation procedures are required to prepare a reliable dataset for time-series modeling?

---

**RQ2** *(→ RO2: Historical Analysis)* **— Historical Behavior, Trends, and Volatility**

What are the long-term trends, short-term fluctuations, seasonal patterns, structural breaks, and volatility characteristics of KHR and USD deposit and loan interest rates in Cambodia? How do these characteristics differ across product types and between the two currencies?

---

**RQ3** *(→ RO3: Spread Analysis)* **— Spread Dynamics and Decomposition**

How do loan–deposit spreads vary across currency denominations (KHR vs. USD) and product types over the 2013–2025 period? Is the KHR spread statistically significantly higher than the USD spread? Are there structural breaks in spread behavior, and what are the primary economic components (credit risk, liquidity, operational cost, profit) driving the spreads in each currency?

---

**RQ4** *(→ RO4: Credit Risk & Impact)* **— Credit Risk Relationship and Socioeconomic Impact**

How do spread movements relate to credit risk indicators (NPL ratios, provisioning)? Is there a feedback loop between high spreads and rising defaults? What is the quantitative impact of high spreads on borrowers (cost of credit), depositors (real return on savings), and financial inclusion in Cambodia compared to regional peers?

## Person 1 — Detailed Methodology

### Step 1: Data Collection *(→ RQ1)*

**Purpose:** Assemble the complete historical dataset from NBC and supplementary sources.

**Primary Data:**

| Source | Variables |
|--------|-----------|
| NBC | Weighted-average deposit rates (Demand, Saving, Term, Other) × (KHR, USD) × (New, Outstanding) |
| NBC | Weighted-average loan rates (Overdraft, Credit Card, Term Loan, Other) × (KHR, USD) × (New, Outstanding) |

**Supplementary Data (to collect additionally):**

| Variable | Source | Frequency | Purpose |
|----------|--------|-----------|---------|
| CPI / Inflation rate | NBC or NIS | Monthly | Real interest rate calculation |
| NPL ratio by currency | NBC Supervision Reports | Quarterly/Annual | Credit risk proxy |
| GDP growth | World Bank / NBC | Quarterly | Business cycle control |
| US Federal Funds Rate | FRED | Monthly | USD transmission benchmark |
| Bank ROA / NIM | NBC | Annual | Profitability validation |

**Deliverable to Team:** Cleaned, aligned DataFrame saved as `.csv` and `.pkl` files for Person 2 and Person 3.

### Step 2: Data Cleaning and Preparation *(→ RQ1)*

**A. Initial Inspection:**
- Load data into a structured DataFrame with a consistent monthly datetime index.
- Verify all rates are in annualized percentage points.
- Document completeness: 25 of 32 series are 100% complete; 7 have zero-value gaps.

**B. Missing Value Treatment:**

| Gap Type | Method | Series Affected |
|----------|--------|-----------------|
| 1–2 consecutive zeros | Linear interpolation | KHR Overdraft (10 zeros) |
| Early-period zeros (product didn't exist) | Trim series start date | KHR Credit Card (40 zeros), KHR Other Loans (49 zeros) |
| Large blocks of zeros | Flag and exclude from affected analyses | KHR Other Deposits (76 zeros) |
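
The gap rules in the table above can be sketched in pandas. This is a minimal illustration, assuming zeros in the NBC file encode missing values; the helper name `clean_rate_series` and the `max_gap` threshold are hypothetical choices, not part of the dataset.

```python
import numpy as np
import pandas as pd


def clean_rate_series(s: pd.Series, max_gap: int = 2) -> pd.Series:
    """Treat zeros as missing, trim the pre-launch period, fill short gaps."""
    s = s.astype(float).replace(0.0, np.nan)
    # Trim early-period zeros: start at the first genuine observation.
    s = s.loc[s.first_valid_index():]
    # Label each run of consecutive NaNs and measure its length.
    is_na = s.isna()
    run_id = (is_na != is_na.shift()).cumsum()
    run_len = is_na.groupby(run_id).transform("sum")
    # Interpolate interior gaps, then restore NaN where the gap was too long.
    filled = s.interpolate(method="linear", limit_area="inside")
    filled[is_na & (run_len > max_gap)] = np.nan
    return filled
```

Gaps longer than `max_gap` stay as NaN so that they can be flagged and excluded later rather than silently filled.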

**C. Outlier Detection:**
- Apply Z-score ($|Z| > 3$) and IQR methods.
- Cross-reference outliers with known events (policy changes, crises). Retain genuine events; correct errors.

**D. Stationarity Testing:**
- **Augmented Dickey–Fuller (ADF) test** on each series:
  - $H_0$: Unit root exists (non-stationary)
  - Reject at 5% → series is stationary
- **KPSS test** as complementary check:
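  - $H_0$: Series is stationary
  - Reject at 5% → series is non-stationary; treat a series as stationary only when ADF rejects and KPSS does not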
- Record which series require differencing ($d = 1$ or $d = 2$). Provide this to Person 2 for ARIMA.

**E. Data Transformation:**
- Log transformation for highly volatile KHR series (if needed to stabilize variance).
- Create differenced versions ($\Delta y_t = y_t - y_{t-1}$) for non-stationary series.

**Expected Output:** Clean dataset + stationarity test results table + data quality report.

### Step 3: Descriptive and Historical Analysis *(→ RQ2)*

**A. Summary Statistics:**

Compute for each interest rate series (by currency, product, rate type):

| Statistic | Formula | Interpretation |
|-----------|---------|----------------|
| Mean ($\bar{x}$) | $\frac{1}{T}\sum x_t$ | Average rate level |
| Std. Deviation ($\sigma$) | $\sqrt{\frac{1}{T-1}\sum(x_t - \bar{x})^2}$ | Rate volatility |
| Skewness | $\frac{1}{T}\sum\left(\frac{x_t-\bar{x}}{\sigma}\right)^3$ | Asymmetry |
| Kurtosis | $\frac{1}{T}\sum\left(\frac{x_t-\bar{x}}{\sigma}\right)^4$ | Tail risk |
| Min / Max / IQR | — | Range and central dispersion |
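
The table's formulas can be computed directly with NumPy, as in the sketch below (the function name `summary_stats` is hypothetical). It follows the definitions above literally; note that pandas' built-in `.skew()` and `.kurt()` apply bias corrections and `.kurt()` reports excess kurtosis, so their values will differ slightly.

```python
import numpy as np


def summary_stats(x) -> dict:
    """Summary statistics using the formulas in the table above."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    sd = x.std(ddof=1)                 # T - 1 denominator
    z = (x - mean) / sd
    q1, q3 = np.percentile(x, [25, 75])
    return {
        "mean": mean,
        "std": sd,
        "skewness": np.mean(z ** 3),
        "kurtosis": np.mean(z ** 4),   # raw kurtosis (about 3 for a normal), not excess
        "min": x.min(),
        "max": x.max(),
        "iqr": q3 - q1,
    }
```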

**B. Time-Series Visualization:**
1. Line plots of all deposit and loan rates over time, with KHR and USD overlaid.
2. Annotate key events: 2013 (start), 2016–2017 (NBC reforms), 2020–2021 (COVID-19), 2022+ (recovery).

**C. Trend Decomposition:**
- Apply **STL decomposition** (Seasonal-Trend decomposition using LOESS) to separate each series into trend, seasonal, and residual components.

**D. Rolling Volatility:**
- 6-month and 12-month rolling standard deviations:

$$\sigma_{t,w} = \sqrt{\frac{1}{w-1}\sum_{i=0}^{w-1}(x_{t-i} - \bar{x}_{t,w})^2}$$

- Compare KHR vs. USD volatility regimes.

**E. Structural Break Detection:**
- **Bai–Perron multiple breakpoint test** on key series (KHR Term Loan, USD Term Loan, KHR Term Deposit, USD Term Deposit).
- Report break dates, regime-specific means, and economic context.

**F. Correlation Matrix:**
- Contemporaneous and lagged correlations between all KHR and USD series.
- Identify preliminary evidence of cross-currency linkages (useful for Person 2's VAR).

**Expected Output:** Summary tables, time-series charts, STL decomposition plots, rolling volatility charts, break date table, and correlation heatmaps.

### Step 4: Loan–Deposit Spread Calculation and Analysis *(→ RQ3)*

**A. Spread Calculation:**

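$$\text{Spread}_{c,p,t} = \text{Loan Rate}_{c,l,t} - \text{Deposit Rate}_{c,d,t}$$

where $c$ is the currency, $p = (l, d)$ is the pairing of loan product $l$ with deposit product $d$, and $t$ is the month.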

**Spread Matrix:**

| Spread Pair | Loan Rate | Deposit Rate | Interpretation |
|-------------|-----------|--------------|----------------|
| Term Spread | Term Loan | Term Deposit | Core intermediation margin |
| Demand Spread | Overdraft | Demand Deposit | Short-term credit margin |
| Saving Spread | Term Loan | Saving Deposit | Margin on low-cost funding |
| Credit Card Spread | Credit Card | Demand Deposit | Unsecured consumer margin |
| Aggregate Spread | Average Loan | Average Deposit | Overall sector margin |
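
A minimal pandas sketch of the product-level rows of the spread matrix follows. The column naming scheme (`"<currency>_<product>"`) and the function name `compute_spreads` are assumptions about how the cleaned DataFrame is laid out; the aggregate spread is left out because it requires a weighting choice that the table does not specify.

```python
import pandas as pd

# Hypothetical column naming: "<currency>_<product>", values in percent per annum.
SPREAD_PAIRS = {
    "term": ("term_loan", "term_deposit"),
    "demand": ("overdraft", "demand_deposit"),
    "saving": ("term_loan", "saving_deposit"),
    "credit_card": ("credit_card", "demand_deposit"),
}


def compute_spreads(rates: pd.DataFrame, currency: str) -> pd.DataFrame:
    """Return one spread column per pair in SPREAD_PAIRS for one currency."""
    out = {}
    for name, (loan, deposit) in SPREAD_PAIRS.items():
        out[f"{currency}_{name}_spread"] = (
            rates[f"{currency}_{loan}"] - rates[f"{currency}_{deposit}"]
        )
    return pd.DataFrame(out, index=rates.index)
```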

**B. Cross-Currency Differential:**

$$\Delta\text{Spread}_t = \text{Spread}_{\text{KHR},t} - \text{Spread}_{\text{USD},t}$$

- **Paired t-test** or **Wilcoxon signed-rank test**: $H_0: \mu_{\Delta} = 0$ vs. $H_1: \mu_{\Delta} > 0$
- **Cohen's d** for effect size: $d = \bar{\Delta} / s_{\Delta}$
- **Linear trend**: $\Delta_t = \gamma_0 + \gamma_1 t + u_t$ (convergence vs. divergence)
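
The three checks above can be run together with SciPy, as sketched below (the function name `spread_differential_tests` is hypothetical). Both tests assume independent observations, whereas monthly spreads are autocorrelated, so the p-values should be read as optimistic.

```python
import numpy as np
from scipy import stats


def spread_differential_tests(khr_spread, usd_spread) -> dict:
    """One-sided tests, effect size, and trend for the KHR minus USD spread."""
    delta = np.asarray(khr_spread, float) - np.asarray(usd_spread, float)
    # One-sided paired t-test, H1: mean differential > 0.
    t_stat, t_p = stats.ttest_1samp(delta, 0.0, alternative="greater")
    # Non-parametric counterpart on the same differences.
    w_stat, w_p = stats.wilcoxon(delta, alternative="greater")
    cohen_d = delta.mean() / delta.std(ddof=1)
    # Linear trend: slope < 0 suggests convergence, slope > 0 divergence.
    trend = stats.linregress(np.arange(delta.size), delta)
    return {"t_p": t_p, "wilcoxon_p": w_p, "cohen_d": cohen_d,
            "trend_slope": trend.slope, "trend_p": trend.pvalue}
```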

**C. Structural Breaks in Spreads:**
- Bai–Perron test on aggregate KHR and USD spread series.
- Chow test at known policy dates (e.g., 2017 interest rate cap).
- CUSUM / CUSUM-SQ for parameter stability.

**D. Spread Decomposition:**

$$\text{Spread}_{c,t} = \underbrace{\text{CRP}_{c,t}}_{\text{Credit Risk}} + \underbrace{\text{LP}_{c,t}}_{\text{Liquidity}} + \underbrace{\text{OC}_{c,t}}_{\text{Operational Cost}} + \underbrace{\pi_{c,t}}_{\text{Profit}}$$

| Component | Proxy | Formula |
|-----------|-------|---------|
| Credit Risk Premium | NPL ratio × loss given default (LGD) | $\widehat{\text{CRP}} \approx \text{NPL} \times \text{LGD}$ |
| Liquidity Premium | Term–Demand deposit spread | $\widehat{\text{LP}} \approx r^{\text{term dep}} - r^{\text{demand dep}}$ |
| Operational Cost | Cost-to-income ratio proxy | From NBC annual reports |
| Profit Margin | Residual | $\hat{\pi} = \text{Spread} - \text{CRP} - \text{LP} - \text{OC}$ |

- Construct stacked area charts of composition over time for KHR and USD.

**Expected Output:** Spread panel, cross-currency test results, break dates, and decomposition charts.

### Step 5: Credit Risk Analysis and Impact Assessment *(→ RQ4)*

**A. Correlation Analysis:**

| Variable | Expected Sign | Rationale |
|----------|:---:|-----------|
| NPL ratio | + | Higher defaults → wider spread |
| Provisioning ratio | + | Higher expected losses → wider spread |
| GDP growth | − | Expansion → lower risk → narrower spread |
| Inflation | + | Uncertainty → higher premium |

**B. Regression Model:**

$$\text{Spread}_{c,t} = \alpha + \beta_1 \text{NPL}_{c,t} + \beta_2 \text{Inflation}_t + \beta_3 \text{GDP}_t + \beta_4 \text{LP}_{c,t} + \beta_5 D_{\text{KHR}} + \varepsilon_{c,t}$$

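- Newey-West standard errors for inference robust to heteroskedasticity and autocorrelation (HAC).
- Align frequencies before estimation: NPL and GDP are available only quarterly or annually, so aggregate the monthly spreads to the coarsest frequency used.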
- Interaction term $D_{\text{KHR}} \times \text{NPL}$ to test currency-specific credit risk sensitivity.

**C. Granger Causality (Spread ↔ NPL):**

$$H_0^{(1)}: \text{NPL does not Granger-cause Spread}$$
$$H_0^{(2)}: \text{Spread does not Granger-cause NPL}$$

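If both nulls are rejected, the evidence is consistent with a **vicious cycle** (high spreads → borrower stress → more defaults → wider spreads). Granger causality indicates predictive precedence only, not structural causation.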

**D. Impact on Borrowers:**

$$\text{Excess Cost}_t = (r_{c,t}^{\text{loan}} - r_{\text{benchmark}}) \times \text{Loan Volume}_{c,t}$$

**E. Real Return to Depositors:**

$$r_{c,t}^{\text{real}} \approx r_{c,t}^{\text{deposit}} - \text{Inflation}_t$$

Negative real returns → savers lose purchasing power → discourages savings mobilization.

**F. Regional Benchmarking:**

| Country | Typical Spread (%) |
|---------|:---:|
| Cambodia (KHR) | ~20–25% |
| Cambodia (USD) | ~8–12% |
| Vietnam | ~3–5% |
| Thailand | ~3–4% |
| Laos | ~5–10% |

**Expected Output:** Correlation matrices, regression tables, Granger causality results, excess cost estimates, real return analysis, and regional comparison.

---
# 👤 PERSON 2: Econometric Modeling & Forecasting
---

**Scope:** Person 2 is responsible for modeling the dynamic interaction between KHR and USD interest rates using VAR and Granger causality, and for generating short-term interest rate forecasts using ARIMA/SARIMA models.

**Depends on:** Person 1's cleaned dataset and stationarity test results.

## Person 2 — Research Objectives

| # | Objective | Corresponding RQ |
|---|----------|:-----------------:|
| **RO5** | Model the dynamic interaction and transmission mechanism between KHR and USD interest rates using VAR | → RQ5 |
| **RO6** | Determine the direction of causal influence between currencies using Granger causality testing | → RQ6 |
| **RO7** | Forecast short-term interest rate movements for both KHR and USD using ARIMA/SARIMA models | → RQ7 |

---

**RO5 — VAR Modeling:** Estimate a Vector Autoregression model to capture how KHR and USD interest rates jointly evolve over time. Quantify cross-currency spillover effects using impulse response functions (IRFs) and forecast error variance decomposition (FEVD). This is critical for understanding whether Cambodia's dollarized economy means that USD rates (driven by US Fed policy) dominate KHR rate dynamics.

**RO6 — Granger Causality:** Formally test the direction of influence between KHR and USD interest rates. Determine whether USD rates Granger-cause KHR rates (reflecting dollarization transmission), whether KHR rates Granger-cause USD rates (reflecting domestic monetary policy autonomy), or whether the relationship is bidirectional.

**RO7 — Short-Term Forecasting:** Develop ARIMA/SARIMA models for each key interest rate series to produce point forecasts and confidence intervals over a 1–12 month horizon. Validate forecast accuracy using out-of-sample testing (RMSE, MAE, MAPE). These forecasts provide actionable intelligence for bank ALM, loan pricing, and depositor strategy.

## Person 2 — Research Questions

**RQ5** *(→ RO5: VAR Modeling)* **— Cross-Currency Dynamic Interaction**

How do KHR and USD interest rates dynamically interact over time? What is the magnitude of cross-currency spillover — specifically, how does a shock to USD rates propagate to KHR rates (and vice versa), and how long does the effect persist?

---

**RQ6** *(→ RO6: Granger Causality)* **— Direction of Causal Influence**

Does the USD interest rate Granger-cause changes in KHR rates (reflecting the dominance of US monetary policy in a dollarized economy), does the KHR rate Granger-cause USD rates (reflecting domestic monetary policy transmission), or is the relationship bidirectional? What proportion of forecast error variance in each currency is attributable to shocks from the other?

---

**RQ7** *(→ RO7: ARIMA/SARIMA Forecasting)* **— Short-Term Forecast Accuracy**

How accurately can ARIMA and SARIMA models forecast KHR and USD interest rate movements over a 1–12 month horizon? Which model specification (order $p, d, q$ and seasonal components $P, D, Q, s$) provides the best fit for each currency and product type? Do the forecasts capture the declining trend in KHR rates and the relative stability of USD rates?

## Person 2 — Detailed Methodology

### Step 6: VAR Model Estimation *(→ RQ5)*

**Input Required from Person 1:** Cleaned, stationary (or appropriately differenced) KHR and USD interest rate series.

**Model Specification — VAR($p$):**

The bivariate VAR system:

$$\begin{pmatrix} \text{KHR}_t \\ \text{USD}_t \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} + \sum_{i=1}^{p} \begin{pmatrix} b_{11}^{(i)} & b_{12}^{(i)} \\ b_{21}^{(i)} & b_{22}^{(i)} \end{pmatrix} \begin{pmatrix} \text{KHR}_{t-i} \\ \text{USD}_{t-i} \end{pmatrix} + \begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \end{pmatrix}$$

For the baseline VAR(1):

$$\text{KHR}_t = a_1 + b_{11} \cdot \text{KHR}_{t-1} + b_{12} \cdot \text{USD}_{t-1} + \varepsilon_{1t}$$
$$\text{USD}_t = a_2 + b_{21} \cdot \text{KHR}_{t-1} + b_{22} \cdot \text{USD}_{t-1} + \varepsilon_{2t}$$

**Key coefficients of interest:**
- $b_{12}$: Effect of lagged USD rate on current KHR rate (dollarization transmission)
- $b_{21}$: Effect of lagged KHR rate on current USD rate (domestic policy influence)

**Procedures:**

1. **Confirm stationarity:** Use ADF/KPSS results from Person 1. If series are I(1), consider:
   - VAR in first differences, or
   - VECM (Vector Error Correction Model) if cointegration is detected (Johansen test)

2. **Lag order selection:** Estimate VAR for $p = 1, 2, \ldots, 12$ and select optimal $p$ using:

| Criterion | Formula | Preference |
|-----------|---------|------------|
| AIC | $\ln|\hat{\Sigma}| + \frac{2k}{T}$ | Flexible (may overfit) |
| BIC | $\ln|\hat{\Sigma}| + \frac{k \ln T}{T}$ | Parsimonious |
| HQ | $\ln|\hat{\Sigma}| + \frac{2k \ln(\ln T)}{T}$ | Compromise |

3. **Estimate VAR($p$)** by OLS equation-by-equation.

4. **Diagnostic checks:**
   - **Residual autocorrelation:** Ljung-Box test on each equation's residuals, multivariate Portmanteau test on the system
   - **Residual normality:** Jarque-Bera test
   - **Stability:** Verify all eigenvalues of the companion matrix lie inside the unit circle

5. **Impulse Response Functions (IRFs):**
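   - Compute orthogonalized IRFs using Cholesky decomposition. The variable ordering matters: place USD first (treated as contemporaneously exogenous under dollarization) and report the reverse ordering as a robustness check.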
   - Trace a 1-standard-deviation shock to USD → effect on KHR over 24 months (and vice versa).
   - Bootstrap 95% confidence bands (1,000+ replications).

6. **Forecast Error Variance Decomposition (FEVD):**
   - At horizon $h = 1, 3, 6, 12, 24$ months, decompose:
     - What % of KHR forecast error is due to USD shocks?
     - What % of USD forecast error is due to KHR shocks?
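
For a VAR(1), the orthogonalized IRFs and the FEVD have simple closed forms, sketched below in NumPy. The function names are hypothetical, and the inputs are assumed to be the coefficient matrix and residual covariance of a fitted VAR(1) with variables ordered USD, KHR, so that a USD shock can move KHR within the same month but not the reverse.

```python
import numpy as np


def orth_irf(B: np.ndarray, sigma: np.ndarray, horizon: int) -> np.ndarray:
    """Orthogonalized IRFs Theta_h = B^h P, with P the Cholesky factor of sigma.

    Result[h, i, j] is the response of variable i at horizon h to a
    one-standard-deviation shock in variable j.
    """
    P = np.linalg.cholesky(sigma)
    out = np.empty((horizon + 1, *B.shape))
    Bh = np.eye(B.shape[0])
    for h in range(horizon + 1):
        out[h] = Bh @ P
        Bh = Bh @ B
    return out


def fevd(irf: np.ndarray) -> np.ndarray:
    """Share of each variable's forecast error variance due to each shock."""
    cum = np.cumsum(irf ** 2, axis=0)             # accumulate squared responses
    return cum / cum.sum(axis=2, keepdims=True)   # shares sum to 1 at every horizon
```

Bootstrap confidence bands would wrap these functions in a resampling loop; for higher lag orders the same recursion applies to the companion matrix.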

**Expected Output:** Estimated VAR coefficients table, lag selection criteria comparison, diagnostic test results, IRF plots with confidence bands, and FEVD table.

### Step 7: Granger Causality Testing *(→ RQ6)*

**Purpose:** Formally test the direction of predictive influence between KHR and USD rates.

**Granger Causality Definition:** Variable $X$ Granger-causes $Y$ if past values of $X$ contain information useful for predicting $Y$ beyond what past values of $Y$ alone provide.

**Test 1: USD → KHR (Dollarization Transmission)**

- $H_0$: USD does **not** Granger-cause KHR (i.e., $b_{12}^{(i)} = 0 \ \forall \ i$)
- $H_1$: USD **does** Granger-cause KHR
- Test statistic: F-test (Wald test) on joint significance of lagged USD coefficients in the KHR equation

**Test 2: KHR → USD (Domestic Policy Autonomy)**

- $H_0$: KHR does **not** Granger-cause USD (i.e., $b_{21}^{(i)} = 0 \ \forall \ i$)
- $H_1$: KHR **does** Granger-cause USD

**Interpretation Grid:**

| USD→KHR | KHR→USD | Interpretation |
|:---:|:---:|---|
| ✅ Reject | ❌ Fail to reject | USD drives KHR — dollarization dominance |
| ❌ Fail to reject | ✅ Reject | KHR drives USD — domestic monetary policy effective |
| ✅ Reject | ✅ Reject | Bidirectional — feedback between currencies |
| ❌ Fail to reject | ❌ Fail to reject | No significant linkage — rates move independently |

**Procedures:**
1. Run Granger causality tests at multiple lag orders ($p = 1, 2, 3, 6, 12$) for robustness.
2. Report F-statistics and p-values.
3. Run tests on multiple product pairs (e.g., Term Deposits, Term Loans, Saving Deposits) to check consistency.
4. Combine with IRF and FEVD results from Step 6 for a comprehensive narrative.
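
The F-test behind both hypotheses compares a restricted regression (own lags only) with an unrestricted one (own lags plus the other currency's lags). A self-contained sketch, with hypothetical function names and stationary input series assumed:

```python
import numpy as np
from scipy import stats


def lag_matrix(x: np.ndarray, p: int) -> np.ndarray:
    """Columns x_{t-1}, ..., x_{t-p} aligned with observations t = p, ..., T-1."""
    return np.column_stack([x[p - i: len(x) - i] for i in range(1, p + 1)])


def granger_f_test(y, x, p: int):
    """F-test of H0: lags 1..p of x add nothing to an AR(p) model for y."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    target = y[p:]
    ones = np.ones((len(target), 1))
    X_r = np.hstack([ones, lag_matrix(y, p)])    # restricted: own lags only
    X_u = np.hstack([X_r, lag_matrix(x, p)])     # unrestricted: plus lags of x

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        return float(np.sum((target - X @ beta) ** 2))

    rss_r, rss_u = rss(X_r), rss(X_u)
    dof = len(target) - X_u.shape[1]
    f_stat = ((rss_r - rss_u) / p) / (rss_u / dof)
    return f_stat, float(stats.f.sf(f_stat, p, dof))
```

Calling `granger_f_test(khr, usd, p)` tests USD → KHR, and swapping the first two arguments tests KHR → USD.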

**Expected Output:** Granger causality test table (F-stats, p-values, lag orders), interpretation summary, and combined narrative with IRF/FEVD.

### Step 8: ARIMA/SARIMA Forecasting *(→ RQ7)*

**Purpose:** Develop univariate time-series forecasts for short-term (1–12 month) interest rate prediction.

**Model Specification — ARIMA($p, d, q$):**

$$\phi(L)(1 - L)^d y_t = \theta(L)\varepsilon_t$$

where:
- $\phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^p$ (AR polynomial)
- $\theta(L) = 1 + \theta_1 L + \cdots + \theta_q L^q$ (MA polynomial)
- $d$ = differencing order (from Person 1's stationarity tests)
- $\varepsilon_t \sim WN(0, \sigma^2)$

**SARIMA($p, d, q$)($P, D, Q$)$_s$ extension (seasonal):**

$$\phi(L)\Phi(L^s)(1-L)^d(1-L^s)^D y_t = \theta(L)\Theta(L^s)\varepsilon_t$$

with seasonal period $s = 12$ (monthly data).

**Procedures:**

**A. Model Identification:**
1. Examine ACF and PACF plots of the (differenced) series.
   - PACF cuts off at lag $p$ → AR($p$) candidate
   - ACF cuts off at lag $q$ → MA($q$) candidate
   - ACF/PACF spikes at seasonal lags ($12, 24, \ldots$) → seasonal terms needed
2. Use `pmdarima.auto_arima` for automated order selection (AIC/BIC minimization).

**B. Estimation:**
- Fit via Maximum Likelihood Estimation (MLE).
- Target series: KHR Term Deposits, KHR Term Loans, USD Term Deposits, USD Term Loans (at minimum). Optionally extend to all 25 complete series.

**C. Diagnostic Checking:**

| Test | Purpose | Criterion |
|------|---------|----------|
| Ljung-Box | Residual autocorrelation | p-value > 0.05 |
| Jarque-Bera | Residual normality | p-value > 0.05 |
| ACF of residuals | Visual white noise check | No significant spikes |
| AIC / BIC comparison | Model parsimony | Lower is better |
| Coefficient significance | Parameter relevance | p-value < 0.05 |

**D. Out-of-Sample Validation:**
- Split: ~130 observations (training) / ~26 observations (test).
- Rolling or expanding window forecast.
- Accuracy metrics:

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum(\hat{y}_t - y_t)^2} \qquad \text{MAE} = \frac{1}{n}\sum|\hat{y}_t - y_t| \qquad \text{MAPE} = \frac{100}{n}\sum\left|\frac{\hat{y}_t - y_t}{y_t}\right|$$

- **Diebold-Mariano test** to compare forecast accuracy between competing models.
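
The three accuracy metrics and an expanding-window loop can be sketched as follows. The `forecaster` argument is a hypothetical callable that takes the history observed so far and returns a one-step-ahead forecast; any fitted model can be wrapped this way.

```python
import numpy as np


def forecast_accuracy(y_true, y_pred) -> dict:
    """RMSE, MAE, and MAPE as defined above."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_pred - y_true
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        # MAPE is undefined when y_true contains zeros.
        "mape": float(100.0 * np.mean(np.abs(err / y_true))),
    }


def expanding_window_forecasts(y, n_train: int, forecaster) -> np.ndarray:
    """One-step-ahead forecasts; the model sees y[:t] when predicting y[t]."""
    y = np.asarray(y, float)
    return np.array([forecaster(y[:t]) for t in range(n_train, len(y))])
```

A naive benchmark such as `lambda hist: hist[-1]` (last observed value) gives a useful baseline against which ARIMA/SARIMA accuracy can be judged.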

**E. Forecast Generation:**
- Point forecasts + 95% confidence intervals for $h = 1, 3, 6, 12$ months ahead.
- Compare KHR vs. USD forecast uncertainty (KHR expected to have wider intervals).

**Expected Output:** Optimal ARIMA/SARIMA orders table, diagnostic test results, out-of-sample accuracy comparison, and forecast plots with confidence bands.

---
# 👤 PERSON 3: Stochastic Modeling, Yield Curves & Insurance Applications
---

**Scope:** Person 3 is responsible for calibrating long-term stochastic interest rate models (Vasicek and CIR), constructing yield curves using the Nelson–Siegel framework, and deriving market-consistent discount rates for insurance liability valuation.

**Depends on:** Person 1's cleaned dataset and descriptive statistics (mean, volatility, mean-reversion evidence).

## Person 3 — Research Objectives

| # | Objective | Corresponding RQ |
|---|----------|:-----------------:|
| **RO8** | Calibrate and compare Vasicek and CIR stochastic models to historical KHR and USD interest rate data | → RQ8 |
| **RO9** | Simulate long-term interest rate paths and generate probabilistic scenario distributions | → RQ9 |
| **RO10** | Construct Nelson–Siegel yield curves for both currencies and interpret the term structure | → RQ10 |
| **RO11** | Derive market-consistent discount rates for insurance liability valuation and provide safe rate recommendations | → RQ11 |

---

**RO8 — Stochastic Model Calibration:** Estimate the parameters of the Vasicek and Cox–Ingersoll–Ross (CIR) models ($a$, $b$, $\sigma$) for KHR and USD interest rates using Maximum Likelihood Estimation (MLE) or Generalized Method of Moments (GMM). Compare model fit to determine which specification better captures each currency's rate dynamics.

**RO9 — Long-Term Simulation:** Use the calibrated models to generate Monte Carlo simulations of interest rate paths over 10–30 year horizons. Produce fan charts showing percentile-based distributions of future rates. These scenarios support long-term strategic planning and stress testing.

**RO10 — Yield Curve Construction:** Fit Nelson–Siegel yield curves for both KHR and USD using available cross-sectional interest rate data at different effective maturities. Interpret the level, slope, and curvature parameters to characterize the term structure in Cambodia's dual-currency environment.

**RO11 — Insurance Discount Rates:** Extract spot rates from the yield curves and compute stochastic discount factors from Monte Carlo simulations to derive market-consistent discount rates. Perform sensitivity analysis and recommend safe (conservative) discount rates for KHR and USD insurance liabilities.

## Person 3 — Research Questions

**RQ8** *(→ RO8: Stochastic Model Calibration)* **— Vasicek vs. CIR Fit**

How well do the Vasicek and CIR models capture the mean-reverting dynamics of KHR and USD interest rates? What are the calibrated parameters ($a$, $b$, $\sigma$) for each currency, and which model provides a better fit based on log-likelihood, AIC/BIC, and visual comparison with historical data?

---

**RQ9** *(→ RO9: Long-Term Simulation)* **— Probabilistic Rate Scenarios**

What are the plausible long-term paths of KHR and USD interest rates over 10–30 year horizons under the calibrated stochastic models? What are the median, optimistic (5th percentile), and pessimistic (95th percentile) rate trajectories, and how do the Vasicek and CIR scenarios differ in terms of tail risk and rate floor behavior?

---

**RQ10** *(→ RO10: Yield Curve Construction)* **— Term Structure of Interest Rates**

What is the shape of the yield curve for KHR and USD in Cambodia? What do the Nelson–Siegel parameters ($\beta_0$, $\beta_1$, $\beta_2$, $\lambda$) reveal about the long-term rate level, slope, and curvature of the term structure in each currency? How does the KHR curve compare to the USD curve?

---

**RQ11** *(→ RO11: Insurance Discount Rates)* **— Market-Consistent Discount Rates**

What are the appropriate market-consistent discount rates for insurance liability valuation in Cambodia, separately for KHR and USD liabilities? How sensitive are present value calculations to discount rate assumptions (±50bps, ±100bps), and what safe discount rates should be recommended to ensure adequate reserving?

## Person 3 — Detailed Methodology

### Step 9: Stochastic Model Calibration *(→ RQ8)*

**Input Required from Person 1:** Cleaned, level (undifferenced) KHR and USD interest rate series + descriptive statistics (mean, std dev, autocorrelation).

**A. Vasicek Model (1977):**

$$dr_t = a(b - r_t)\,dt + \sigma\,dW_t$$

| Parameter | Meaning | Expected Range |
|-----------|---------|----------------|
| $a$ | Speed of mean reversion | 0.01 – 2.0 |
| $b$ | Long-run mean rate | KHR: ~10–15%, USD: ~5–8% |
| $\sigma$ | Volatility (constant) | KHR: higher, USD: lower |

**Properties:** Gaussian; allows negative rates (limitation); closed-form bond pricing.

**B. Cox–Ingersoll–Ross Model (1985):**

$$dr_t = a(b - r_t)\,dt + \sigma\sqrt{r_t}\,dW_t$$

**Properties:** Non-central chi-squared transition distribution; rates stay non-negative, and strictly positive ($r_t > 0$) when the **Feller condition** $2ab \geq \sigma^2$ holds; volatility proportional to $\sqrt{r_t}$.

**Calibration via MLE:**

The discretized versions (Euler-Maruyama, time step $\Delta t$):

$$\text{Vasicek: } r_{t+\Delta t} = r_t + a(b - r_t)\Delta t + \sigma\sqrt{\Delta t}\,Z_t$$
$$\text{CIR: } r_{t+\Delta t} = r_t + a(b - r_t)\Delta t + \sigma\sqrt{r_t}\sqrt{\Delta t}\,Z_t$$

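where $Z_t \sim N(0,1)$. Under this Euler approximation the conditional distribution of the next observation is Gaussian for both models (the exact CIR transition is non-central chi-squared):
- Vasicek: $r_{t+\Delta t} | r_t \sim N(\mu_t, s_t^2)$ where $\mu_t = r_t + a(b - r_t)\Delta t$, $s_t = \sigma\sqrt{\Delta t}$
- CIR: $r_{t+\Delta t} | r_t \sim N(\mu_t, s_t^2)$ where $\mu_t = r_t + a(b - r_t)\Delta t$, $s_t = \sigma\sqrt{r_t \Delta t}$

Log-likelihood (observations indexed $t = 1, \ldots, T$ at spacing $\Delta t$, so $r_{t+1}$ denotes the next monthly observation):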

$$\mathcal{L}(a, b, \sigma) = \sum_{t=1}^{T-1} \left[ -\frac{1}{2}\ln(2\pi s_t^2) - \frac{(r_{t+1} - \mu_t)^2}{2s_t^2} \right]$$

Maximize $\mathcal{L}$ by minimizing $-\mathcal{L}$ with `scipy.optimize.minimize` (Nelder-Mead or L-BFGS-B).
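
A minimal sketch of this calibration is given below. It assumes rates are expressed as decimals (0.08 for 8%) with monthly spacing, uses the Euler (Gaussian) likelihood for both models, and picks Nelder-Mead with data-driven starting values; the function names and the starting guess $a_0 = 0.5$ are hypothetical choices.

```python
import numpy as np
from scipy.optimize import minimize


def neg_loglik(params, r, dt, model="vasicek"):
    """Negative Euler (Gaussian) log-likelihood for Vasicek or CIR."""
    a, b, sigma = params
    if a <= 0 or sigma <= 0:
        return np.inf
    r_now, r_next = r[:-1], r[1:]
    mu = r_now + a * (b - r_now) * dt
    var = sigma ** 2 * dt
    if model == "cir":
        var = var * np.maximum(r_now, 1e-12)   # variance scales with the rate level
    return 0.5 * np.sum(np.log(2 * np.pi * var) + (r_next - mu) ** 2 / var)


def calibrate(r, dt=1 / 12, model="vasicek"):
    """Return MLE estimates, the log-likelihood, and the Feller check."""
    r = np.asarray(r, float)
    sigma0 = np.std(np.diff(r), ddof=1) / np.sqrt(dt)
    if model == "cir":
        sigma0 /= np.sqrt(r.mean())
    x0 = [0.5, r.mean(), sigma0]
    res = minimize(neg_loglik, x0, args=(r, dt, model), method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-8})
    a, b, sigma = res.x
    return {"a": a, "b": b, "sigma": sigma, "loglik": -res.fun,
            "feller": 2 * a * b >= sigma ** 2}
```

With only 156 monthly observations the mean-reversion speed $a$ is typically the least precisely estimated parameter, so its standard error deserves particular attention.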

**Procedures:**
1. Calibrate Vasicek and CIR separately for KHR and USD (4 calibrations total).
2. Report estimated $\hat{a}$, $\hat{b}$, $\hat{\sigma}$ with standard errors.
3. Check Feller condition for CIR ($2\hat{a}\hat{b} \geq \hat{\sigma}^2$).
4. Compare models:

| Metric | Better Model |
|--------|-------------|
| Log-likelihood | Higher is better |
| AIC = $-2\mathcal{L} + 2k$ | Lower is better |
| BIC = $-2\mathcal{L} + k\ln T$ | Lower is better |

**Expected Output:** Parameter estimates table (4 models), model comparison table, Feller condition check.

### Step 10: Monte Carlo Simulation *(→ RQ9)*

**Purpose:** Generate large-scale simulations of future interest rate paths using calibrated models.

**Simulation Setup:**

| Parameter | Value |
|-----------|-------|
| Number of paths ($N$) | 10,000 |
| Horizon | 10, 20, and 30 years |
| Time step ($\Delta t$) | 1/12 (monthly) |
| Starting rate ($r_0$) | Last observed rate in dataset (Dec 2025) |

**Simulation Algorithm (per path):**

```
For each path i = 1, ..., N:
    Set r_0 = last observed rate
    For each time step t = 1, ..., T:
        Draw Z ~ N(0,1)
        Vasicek: r_t = r_{t-1} + a(b - r_{t-1})Δt + σ√(Δt) × Z
        CIR:     r_t = r_{t-1} + a(b - r_{t-1})Δt + σ√(r_{t-1})√(Δt) × Z
                 r_t = max(r_t, 0)  # floor at zero for CIR
```

**Output Processing:**

At each future time point, compute from the $N$ simulated paths:

| Percentile | Interpretation |
|:----------:|----------------|
| 5th | Optimistic scenario (low rates) |
| 25th | Mild low scenario |
| 50th (median) | Central scenario |
| 75th | Mild high scenario |
| 95th | Pessimistic scenario (high rates) |
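
A vectorized NumPy version of the simulation algorithm and the percentile step might look as follows. Rates are assumed to be decimals, and the function names, the `seed` argument, and the array layout (time along the first axis) are illustrative choices.

```python
import numpy as np


def simulate_paths(r0, a, b, sigma, years, n_paths=10_000, dt=1 / 12,
                   model="vasicek", seed=0):
    """Euler simulation; returns an array of shape (n_steps + 1, n_paths)."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(years / dt))
    r = np.empty((n_steps + 1, n_paths))
    r[0] = r0
    for t in range(1, n_steps + 1):
        z = rng.standard_normal(n_paths)
        prev = r[t - 1]
        vol = sigma * np.sqrt(prev) if model == "cir" else sigma
        r[t] = prev + a * (b - prev) * dt + vol * np.sqrt(dt) * z
        if model == "cir":
            r[t] = np.maximum(r[t], 0.0)   # floor at zero, as in the algorithm above
    return r


def fan_percentiles(paths, q=(5, 25, 50, 75, 95)):
    """Percentiles across paths at every time step; shape (len(q), n_steps + 1)."""
    return np.percentile(paths, q, axis=1)
```

The rows of `fan_percentiles(...)` map directly onto the bands of a fan chart, and `(paths < 0).mean()` gives the share of negative Vasicek rates.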

**Fan Charts:**
- Plot the median path with shaded percentile bands (5–95%, 25–75%).
- Overlay historical data for context.
- Produce separate charts for KHR and USD, and for Vasicek vs. CIR.

**Model Comparison via Simulation:**
- Check if Vasicek produces negative rates (especially for low USD rates).
- Check if CIR rate floor behavior is more realistic.
- Compare simulation distributions against historical distribution.

**Expected Output:** 10,000-path simulation results, fan charts (4: KHR-Vasicek, KHR-CIR, USD-Vasicek, USD-CIR), percentile tables at key horizons.

### Step 11: Nelson–Siegel Yield Curve Construction *(→ RQ10)*

**Purpose:** Construct smooth, continuous yield curves for both KHR and USD.

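**Nelson–Siegel Model (1987), spot-rate form:**

$$y(\tau) = \beta_0 + \beta_1 \cdot \frac{1 - e^{-\tau/\lambda}}{\tau/\lambda} + \beta_2 \cdot \left( \frac{1 - e^{-\tau/\lambda}}{\tau/\lambda} - e^{-\tau/\lambda} \right)$$

This is the yield obtained by averaging the Nelson–Siegel instantaneous forward curve $f(\tau) = \beta_0 + \beta_1 e^{-\tau/\lambda} + \beta_2 \frac{\tau}{\lambda} e^{-\tau/\lambda}$ over $[0, \tau]$. The observed rates are yields, not forward rates, so the spot-rate form is the one to fit.

**Parameter Interpretation:**

| Parameter | Role | Contribution as $\tau \to 0$ | Contribution as $\tau \to \infty$ |
|-----------|------|:---:|:---:|
| $\beta_0$ | Level (long-term rate) | $\beta_0$ | $\beta_0$ |
| $\beta_1$ | Slope (short-term deviation) | $\beta_1$ | $0$ |
| $\beta_2$ | Curvature (medium-term hump) | $0$ | $0$ |
| $\lambda$ | Decay time scale (years) | — | — |

- Short-rate: $y(0) = \beta_0 + \beta_1$
- Long-rate: $y(\infty) = \beta_0$
- Hump location: the curvature loading in the spot-rate form peaks at $\tau^* \approx 1.79\,\lambda$ (in the forward-rate form it peaks at $\tau = \lambda$).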

**Maturity Assignment (using available data):**

Since explicit bond yields at multiple maturities are not available, assign proxy maturities:

| Product | Assigned Maturity ($\tau$, years) | Rationale |
|---------|:---:|---|
| Demand Deposit | 0.08 (1 month) | Overnight / on-demand |
| Saving Deposit | 0.25 (3 months) | Short-term savings |
| Term Deposit | 1.0 (1 year) | Standard term |
| Other Deposit | 2.0 (2 years) | Longer-term products |
| Overdraft Rate | 0.5 (6 months) | Revolving short-term |
| Term Loan Rate | 5.0 (5 years) | Medium-term lending |

**Fitting Procedure:**

1. For each time period $t$ and currency $c$, collect the observed rates at the assigned maturities.
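2. Fit $\beta_0, \beta_1, \beta_2, \lambda$ via **Nonlinear Least Squares (NLS)** over the $n$ assigned maturities $\tau_1, \ldots, \tau_n$ (written $n$ to avoid confusion with the number of Monte Carlo paths $N$ in Step 10):

$$\min_{\beta_0, \beta_1, \beta_2, \lambda} \sum_{i=1}^{n} \left[ y_i^{\text{obs}} - y(\tau_i; \beta_0, \beta_1, \beta_2, \lambda) \right]^2$$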

   Use `scipy.optimize.curve_fit` or `scipy.optimize.least_squares`.

3. Constrain: $\lambda > 0$, and optionally $\beta_0 > 0$.

4. Fit separate curves for KHR and USD.

5. **Validation:** Compute $R^2$ and RMSE between fitted and observed rates.

6. **Time-varying estimation (optional):** Fit Nelson–Siegel at each time point to track how $\beta_0(t)$, $\beta_1(t)$, $\beta_2(t)$ evolve. Plot the dynamics of the level, slope, and curvature factors.
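
As a rough illustration of the fitting step, the sketch below uses the spot-rate (yield) form of Nelson–Siegel and a NumPy-only variant of NLS: for a fixed $\lambda$ the model is linear in $\beta_0, \beta_1, \beta_2$, so $\lambda$ is searched on a grid and the betas are solved by ordinary least squares at each grid point. This is a substitute for `scipy.optimize.curve_fit`, not the same routine, and it does not impose $\beta_0 > 0$. The function names, the grid, and the synthetic "observed" rates are assumptions for illustration.

```python
import numpy as np

def ns_loadings(tau, lam):
    """Level, slope and curvature loadings of the spot-rate Nelson-Siegel curve."""
    x = np.asarray(tau, dtype=float) / lam        # requires tau > 0
    slope = (1.0 - np.exp(-x)) / x
    return np.column_stack([np.ones_like(x), slope, slope - np.exp(-x)])

def fit_nelson_siegel(tau, y_obs, lam_grid=None):
    """Grid search over lambda with OLS for the betas; returns (betas, lam, rmse)."""
    if lam_grid is None:
        lam_grid = np.linspace(0.1, 10.0, 100)    # assumed search range, in years
    y_obs = np.asarray(y_obs, dtype=float)
    best = None
    for lam in lam_grid:
        X = ns_loadings(tau, lam)
        betas, *_ = np.linalg.lstsq(X, y_obs, rcond=None)
        rmse = float(np.sqrt(np.mean((X @ betas - y_obs) ** 2)))
        if best is None or rmse < best[2]:
            best = (betas, float(lam), rmse)
    return best

# Proxy maturities from the table above; the rates are synthetic, generated
# from hypothetical parameters so the fit can be checked against them.
tau = np.array([0.08, 0.25, 0.5, 1.0, 2.0, 5.0])
true_betas = np.array([0.08, -0.05, 0.02])
y_obs = ns_loadings(tau, 1.5) @ true_betas
betas, lam, rmse = fit_nelson_siegel(tau, y_obs)
print(betas, lam, rmse)
```

With only six proxy maturities and four parameters, a near-perfect in-sample fit is expected, so $R^2$ and RMSE say little on their own; the stability of the fitted parameters across months is the more informative check.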

**Expected Output:** Fitted yield curve plots for KHR and USD, parameter estimates table, $R^2$/RMSE fit metrics, and optional time-varying factor dynamics.

### Step 12: Insurance Discount Rate Determination *(→ RQ11)*

**Purpose:** Derive market-consistent discount rates for insurance liability valuation.

**A. Spot Rates from Nelson–Siegel:**

Extract discount rates at each required maturity from the fitted yield curves:

$$r_t = y(t) \quad \text{for } t = 1, 2, 3, \ldots, 30 \text{ years}$$

**B. Stochastic Discount Factors from Monte Carlo:**

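From Step 10 simulations, compute scenario-averaged discount factors:

$$D(0, T) = \mathbb{E}\left[\exp\left(-\int_0^T r_s\,ds\right)\right] \approx \frac{1}{N}\sum_{i=1}^{N} \exp\left(-\sum_{k=0}^{T/\Delta t - 1} r_k^{(i)} \Delta t\right)$$

where $r_k^{(i)}$ is the simulated rate on path $i$ at step $k$, and $T/\Delta t$ is the number of monthly steps up to maturity $T$ (in years), so the inner sum approximates the integral over $[0, T]$.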

The implied discount rate at maturity $T$:

$$r^{\text{discount}}(T) = -\frac{\ln D(0,T)}{T}$$
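
A minimal sketch of this calculation, assuming the paths are stored as an array of shape `(n_paths, n_steps + 1)` with monthly steps and that the integral is approximated by a left-point sum. The function name `discount_curve` is hypothetical, and the constant-rate array is used only so that the result can be verified by hand.

```python
import numpy as np

def discount_curve(paths, dt=1/12):
    """Monte Carlo discount factors and implied rates on the simulation grid.

    Returns (T, D, r) where T holds maturities in years, D[j] estimates
    D(0, T[j]) and r[j] = -ln(D[j]) / T[j].
    """
    integral = np.cumsum(paths[:, :-1] * dt, axis=1)   # running integral of r per path
    D = np.exp(-integral).mean(axis=0)                 # average over paths
    T = dt * np.arange(1, paths.shape[1])
    return T, D, -np.log(D) / T

# Sanity check with constant 5% paths: D(0, T) should equal exp(-0.05 T).
flat_paths = np.full((100, 121), 0.05)
T, D, implied = discount_curve(flat_paths)
print(T[-1], D[-1], implied[-1])
```

The implied rates are continuously compounded, which matters when they are placed next to annually compounded rates in the summary table.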

**C. Present Value Calculation:**

$$PV = \sum_{t=1}^{T} \frac{CF_t}{(1 + r_t)^t}$$

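where $r_t$ is the annually compounded spot rate at maturity $t$ (in years) and $CF_t$ is the cash flow at the end of year $t$. A continuously compounded rate $r^{c}$, such as the implied rate in part B, is converted with $r = e^{r^{c}} - 1$ before it is used in this formula.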
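
The formula translates directly into code. The sketch below assumes annual cash flows at the end of years $1, \ldots, T$ and one annually compounded spot rate per year; the cash flows and the flat 5% curve are hypothetical.

```python
def present_value(cash_flows, spot_rates):
    """PV = sum of CF_t / (1 + r_t)^t for t = 1..T (annual compounding)."""
    if len(cash_flows) != len(spot_rates):
        raise ValueError("need one spot rate per cash flow")
    return sum(
        cf / (1.0 + r) ** t
        for t, (cf, r) in enumerate(zip(cash_flows, spot_rates), start=1)
    )

# Hypothetical example: 100 per year for 3 years, flat 5% spot curve.
pv_example = present_value([100.0, 100.0, 100.0], [0.05, 0.05, 0.05])
print(round(pv_example, 4))
```

With a flat curve this reduces to the standard annuity value, which gives a quick check on the implementation.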

**D. Sensitivity Analysis:**

Test impact of discount rate changes on a hypothetical liability portfolio:

| Scenario | Rate Shift | Purpose |
|----------|:---:|---|
| Base | 0 bps | Central estimate |
| Mild stress | ±50 bps | Moderate rate movement |
| Severe stress | ±100 bps | Significant rate shock |
| Historical worst | Min/max from data | Extreme scenario |

Report: $\Delta PV$ (change in present value) and $\Delta PV / PV$ (percentage change) for each scenario.
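
One way to produce the $\Delta PV$ figures is sketched below, under the assumption that each scenario is a parallel shift of the whole spot curve. The 30-year level cash-flow pattern and the flat 6% curve are hypothetical stand-ins for the liability portfolio and the fitted curve.

```python
import numpy as np

def pv(cash_flows, spot_rates):
    """Present value with annual compounding, cash flows at years 1..T."""
    t = np.arange(1, len(cash_flows) + 1)
    return float(np.sum(np.asarray(cash_flows) / (1.0 + np.asarray(spot_rates)) ** t))

def sensitivity(cash_flows, spot_rates, shifts_bps=(-100, -50, 0, 50, 100)):
    """Return {shift in bps: (delta PV, delta PV / base PV)} for parallel shifts."""
    base = pv(cash_flows, spot_rates)
    out = {}
    for bps in shifts_bps:
        shifted = pv(cash_flows, np.asarray(spot_rates) + bps / 10_000)
        out[bps] = (shifted - base, (shifted - base) / base)
    return out

# Hypothetical liability: 100 per year for 30 years, flat 6% curve.
cash_flows = np.full(30, 100.0)
spot_rates = np.full(30, 0.06)
results = sensitivity(cash_flows, spot_rates)
for bps, (d_pv, pct) in results.items():
    print(f"{bps:+5d} bps: dPV = {d_pv:9.2f} ({pct:+.2%})")
```

A downward shift raises the liability value by more than the same upward shift lowers it, so the two directions of each stress should be reported separately instead of as a single symmetric figure.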

**E. Comparison with Regulatory Benchmarks:**
- Compare derived rates with any NBC-prescribed or IFRS 17 reference rates.
- Identify gaps between market rates and regulatory assumptions.

**F. Safe Discount Rate Recommendation:**

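Recommend conservative discount rates that:
- Start from the fitted yield curve and apply a **risk margin** toward lower rates (e.g., the 25th percentile of the Monte Carlo rate distribution rather than the median), because a lower discount rate produces a higher liability value and therefore larger reserves.
- Target reserve adequacy at a stated confidence level (e.g., 90% or 95%, corresponding to the 10th or 5th percentile of simulated rates).
- Are provided separately for KHR and USD liabilities.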
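
A percentile-based margin could be derived from the simulations roughly as follows. The sketch assumes the relevant quantity is the average short rate along each path up to the horizon, and it leaves the percentile `q` as a parameter, since the choice of percentile is a judgment the recommendation has to justify. For discounting liabilities, a lower rate is the more cautious choice. The function name and the evenly spaced demo paths are hypothetical.

```python
import numpy as np

def percentile_discount_rate(paths, q):
    """q-th percentile of the pathwise average short rate over the full horizon."""
    avg_rate = paths[:, :-1].mean(axis=1)   # left-point average, one value per path
    return float(np.percentile(avg_rate, q))

# Stand-in paths: 101 constant-rate paths from 1% to 9% (not simulation output).
levels = np.linspace(0.01, 0.09, 101)
demo_paths = np.repeat(levels[:, None], 13, axis=1)
print(percentile_discount_rate(demo_paths, 50), percentile_discount_rate(demo_paths, 25))
```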

**Discount Rate Summary Table (example format):**

| Maturity | KHR (Nelson–Siegel) | KHR (Stochastic) | KHR (Safe) | USD (Nelson–Siegel) | USD (Stochastic) | USD (Safe) |
|:--------:|:---:|:---:|:---:|:---:|:---:|:---:|
| 1 year | — | — | — | — | — | — |
| 5 years | — | — | — | — | — | — |
| 10 years | — | — | — | — | — | — |
| 20 years | — | — | — | — | — | — |
| 30 years | — | — | — | — | — | — |

**Expected Output:** Discount rate tables by currency and maturity, sensitivity analysis results, regulatory benchmark comparison, and final safe rate recommendations.

---
# 🔗 Integration & Collaboration Plan
---

## Timeline and Dependencies

```
Week 1–2:  Person 1 → Data collection, cleaning, stationarity tests
           └── Deliver cleaned dataset to Person 2 and Person 3
           
Week 2–4:  Person 1 → Descriptive analysis + Spread analysis (parallel)
           Person 2 → VAR + Granger + ARIMA (parallel)
           Person 3 → Vasicek/CIR calibration + simulation (parallel)

Week 4–5:  Person 3 → Nelson–Siegel + Discount rates
           All → Review cross-references between sections

Week 5–6:  All → Integration, final report, recommendations
```

## Shared Deliverables Between Members

| From | To | Deliverable |
|:----:|:--:|-------------|
| Person 1 | Person 2 | Cleaned dataset (.csv/.pkl), stationarity test results, differencing orders |
| Person 1 | Person 3 | Cleaned dataset (.csv/.pkl), descriptive stats (mean, std dev, autocorrelation), historical rate levels |
| Person 1 | Person 2 & 3 | Structural break dates (useful context for modeling) |
| Person 2 | Person 3 | Granger causality results (inform whether to model currencies jointly or separately) |
| Person 2 | All | ARIMA forecasts (can be compared to Vasicek/CIR short-term paths) |
| Person 3 | All | Yield curves and discount rates (final product for insurance application) |

## Full Traceability Matrix

| Member | Objective | Research Question | Methodology Step | Primary Output |
|:------:|:---------:|:-----------------:|:----------------:|----------------|
| **P1** | RO1 | RQ1 | Steps 1–2 | Cleaned dataset + data quality report |
| **P1** | RO2 | RQ2 | Step 3 | Descriptive analysis + visualizations |
| **P1** | RO3 | RQ3 | Step 4 | Spread panel + decomposition + break analysis |
| **P1** | RO4 | RQ4 | Step 5 | Credit risk regression + impact assessment |
| **P2** | RO5 | RQ5 | Step 6 | VAR coefficients + IRFs + FEVD |
| **P2** | RO6 | RQ6 | Step 7 | Granger causality test results |
| **P2** | RO7 | RQ7 | Step 8 | ARIMA/SARIMA forecasts + accuracy metrics |
| **P3** | RO8 | RQ8 | Step 9 | Vasicek/CIR calibrated parameters |
| **P3** | RO9 | RQ9 | Step 10 | Monte Carlo simulations + fan charts |
| **P3** | RO10 | RQ10 | Step 11 | Nelson–Siegel yield curves |
| **P3** | RO11 | RQ11 | Step 12 | Discount rate tables + safe rate recommendations |

---

## Tools and Software (All Members)

| Tool | Person 1 | Person 2 | Person 3 |
|------|:--------:|:--------:|:--------:|
| `pandas`, `numpy` | ✅ | ✅ | ✅ |
| `matplotlib`, `seaborn`, `plotly` | ✅ | ✅ | ✅ |
| `statsmodels` | ✅ ADF, KPSS, OLS | ✅ VAR, Granger, ARIMA | — |
| `pmdarima` | — | ✅ auto_arima | — |
| `scipy.optimize` | — | — | ✅ MLE, NLS curve fitting |
| `scipy.stats` | ✅ t-test, Wilcoxon | ✅ F-test | ✅ chi-squared |
| `ruptures` | ✅ Structural break detection (Bai–Perron-style, e.g., `Dynp`, `Pelt`) | — | — |
| Custom scripts | — | — | ✅ Monte Carlo simulation |
| Jupyter Notebook | ✅ | ✅ | ✅ |

---

*This methodology provides a complete, team-assignable analytical framework with clear ownership, dependencies, and traceability from objectives through questions, methods, and deliverables.*