[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/danpele/Time-Series-Analysis/blob/main/chapter7_seminar_notebook.ipynb)

---

# Chapter 7 Seminar: Cointegration and VECM - Practice

**Course:** Time Series Analysis and Forecasting  
**Program:** Bachelor program, Faculty of Cybernetics, Statistics and Economic Informatics, Bucharest University of Economic Studies, Romania  
**Academic Year:** 2025-2026

---

## Seminar Objectives

In this practical seminar, you will:
1. Practice distinguishing spurious regression from genuine cointegration
2. Apply Engle-Granger and Johansen tests
3. Estimate and interpret VECM models
4. Work through a pairs trading example on simulated stock prices
5. Analyze PPP (Purchasing Power Parity) relationships

## Setup

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

from statsmodels.tsa.stattools import adfuller, coint
from statsmodels.tsa.vector_ar.vecm import coint_johansen, VECM
from statsmodels.tsa.api import VAR
from statsmodels.regression.linear_model import OLS
from statsmodels.tools.tools import add_constant
from statsmodels.stats.stattools import durbin_watson
from scipy import stats

try:
    import pandas_datareader.data as web
    HAS_PDR = True
except ImportError:
    HAS_PDR = False

plt.rcParams['figure.figsize'] = (12, 5)
plt.rcParams['font.size'] = 11
plt.rcParams['axes.facecolor'] = 'none'
plt.rcParams['figure.facecolor'] = 'none'
plt.rcParams['axes.grid'] = False
plt.rcParams['axes.spines.top'] = False
plt.rcParams['axes.spines.right'] = False
plt.rcParams['legend.frameon'] = False

COLORS = {'blue': '#1A3A6E', 'red': '#DC3545', 'green': '#2E7D32', 'orange': '#E67E22', 'gray': '#666666'}

print("Setup complete!")

## Exercise 1: Spurious vs Cointegrated Series

**Task:** Generate two independent random walks (a spurious regression) and two genuinely cointegrated series, then tell them apart.

In [None]:
np.random.seed(42)
n = 300

# Case A: Two INDEPENDENT random walks (spurious)
y1_spurious = np.cumsum(np.random.randn(n))
y2_spurious = np.cumsum(np.random.randn(n))

# Case B: Two COINTEGRATED series (genuine)
common_trend = np.cumsum(np.random.randn(n))
y1_coint = common_trend + np.random.randn(n) * 0.5
y2_coint = 0.7 * common_trend + np.random.randn(n) * 0.5

# Visualize
fig, axes = plt.subplots(2, 2, figsize=(14, 8))

# Spurious
axes[0, 0].plot(y1_spurious, color=COLORS['blue'], label='Y1')
axes[0, 0].plot(y2_spurious, color=COLORS['orange'], label='Y2')
axes[0, 0].set_title('Case A: Independent Random Walks', fontweight='bold')
axes[0, 0].legend()

spread_spurious = y1_spurious - y2_spurious
axes[0, 1].plot(spread_spurious, color=COLORS['red'])
axes[0, 1].set_title('Case A: "Spread" (Non-Stationary!)', fontweight='bold', color=COLORS['red'])

# Cointegrated
axes[1, 0].plot(y1_coint, color=COLORS['blue'], label='Y1')
axes[1, 0].plot(y2_coint / 0.7, color=COLORS['orange'], label='Y2 (scaled)')
axes[1, 0].set_title('Case B: Cointegrated Series', fontweight='bold')
axes[1, 0].legend()

# Estimate cointegrating coefficient
beta_est = np.polyfit(y2_coint, y1_coint, 1)[0]
spread_coint = y1_coint - beta_est * y2_coint
axes[1, 1].plot(spread_coint, color=COLORS['green'])
axes[1, 1].axhline(y=np.mean(spread_coint), color='red', linestyle='--')
axes[1, 1].set_title('Case B: Spread (Stationary!)', fontweight='bold', color=COLORS['green'])

plt.tight_layout()
plt.show()

In [None]:
# Statistical tests
print("Statistical Analysis")
print("="*70)

print("\n--- Case A: Independent Random Walks ---")
# Regression
X_sp = add_constant(y2_spurious)
reg_sp = OLS(y1_spurious, X_sp).fit()
dw_sp = durbin_watson(reg_sp.resid)

print(f"Regression: Y1 = {reg_sp.params[0]:.3f} + {reg_sp.params[1]:.3f}*Y2")
print(f"R-squared: {reg_sp.rsquared:.4f}")
print(f"Durbin-Watson: {dw_sp:.4f}")
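# Granger-Newbold rule of thumb: R² > DW hints at a spurious regression (not a formal test)
print(f"R² > DW? {reg_sp.rsquared > dw_sp} → {'Possibly spurious!' if reg_sp.rsquared > dw_sp else 'No warning from this rule'}")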

# Cointegration test
stat_sp, pval_sp, _ = coint(y1_spurious, y2_spurious)
print(f"Engle-Granger test: stat={stat_sp:.3f}, p-value={pval_sp:.4f}")
print(f"Conclusion: {'Cointegrated' if pval_sp < 0.05 else 'NOT Cointegrated'}")

print("\n--- Case B: Cointegrated Series ---")
# Regression
X_co = add_constant(y2_coint)
reg_co = OLS(y1_coint, X_co).fit()
dw_co = durbin_watson(reg_co.resid)

print(f"Regression: Y1 = {reg_co.params[0]:.3f} + {reg_co.params[1]:.3f}*Y2")
print(f"R-squared: {reg_co.rsquared:.4f}")
print(f"Durbin-Watson: {dw_co:.4f}")

# Cointegration test
stat_co, pval_co, _ = coint(y1_coint, y2_coint)
print(f"Engle-Granger test: stat={stat_co:.3f}, p-value={pval_co:.4f}")
print(f"Conclusion: {'Cointegrated ✓' if pval_co < 0.05 else 'NOT Cointegrated'}")
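
The `coint` call above wraps a two-step procedure: regress one series on the other, then test the residuals for a unit root. The sketch below reproduces those two steps with NumPy only, under simplifying assumptions: no lagged differences in the Dickey-Fuller regression, and a seed and helper name (`engle_granger_stat`) chosen purely for illustration. Its statistic will therefore not match the `coint` output exactly, since `coint` selects lags automatically.

```python
import numpy as np

def engle_granger_stat(y, x):
    """Return (slope, t_stat): OLS slope of y on x and the Dickey-Fuller t-statistic of the residuals."""
    # Step 1: cointegrating regression y = a + b*x + u
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ coef

    # Step 2: Dickey-Fuller regression on the residuals (no constant, no lags):
    #   Δu_t = rho * u_{t-1} + e_t
    du, u_lag = np.diff(u), u[:-1]
    rho = (u_lag @ du) / (u_lag @ u_lag)
    e = du - rho * u_lag
    se = np.sqrt((e @ e) / (len(du) - 1) / (u_lag @ u_lag))
    return coef[1], rho / se

rng = np.random.default_rng(0)   # illustrative seed, not the one used above
n = 300
trend = np.cumsum(rng.standard_normal(n))
y1 = trend + 0.5 * rng.standard_normal(n)
y2 = 0.7 * trend + 0.5 * rng.standard_normal(n)

beta_hat, eg_stat = engle_granger_stat(y1, y2)
print(f"beta = {beta_hat:.3f}, EG statistic = {eg_stat:.2f}")
print("Cointegrated at 5%?", eg_stat < -3.34)   # -3.34: 5% critical value quoted in Problem 1
```

The statistic must be compared with Engle-Granger critical values rather than the standard ADF table, because the residuals come from an estimated regression.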

## Exercise 2: Pairs Trading Example

**Concept:** Find cointegrated stocks, trade the mean-reverting spread.

**Strategy:**
- When spread > mean + 2σ: Sell stock A, buy β units of stock B (β = hedge ratio)
- When spread < mean - 2σ: Buy stock A, sell β units of stock B
- Exit when spread ≈ mean
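
The three rules above can be written as a small state machine that turns a sequence of spread z-scores into positions. This is a minimal sketch, assuming that "spread ≈ mean" means the z-score has crossed zero; the function name `spread_positions` and the sample z-scores are hypothetical.

```python
def spread_positions(z_scores, entry=2.0):
    """Map spread z-scores to positions: -1 = short spread, +1 = long spread, 0 = flat."""
    positions, current = [], 0
    for z in z_scores:
        if current == 0:
            if z > entry:
                current = -1      # spread too high: sell A, buy B
            elif z < -entry:
                current = 1       # spread too low: buy A, sell B
        elif (current == -1 and z <= 0) or (current == 1 and z >= 0):
            current = 0           # spread is back at its mean: close the trade
        positions.append(current)
    return positions

z = [0.0, 2.5, 1.0, -0.1, -2.5, -1.0, 0.2]   # made-up z-scores for illustration
print(spread_positions(z))   # → [0, -1, -1, 0, 1, 1, 0]
```

Holding the position until the spread returns to its mean is what separates a trade from the individual out-of-band observations marked on the chart.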

In [None]:
# Simulate two cointegrated "stock prices" (like Coca-Cola and Pepsi)
np.random.seed(789)
n = 500

# Common market factor
market = np.cumsum(np.random.randn(n) * 0.5) + 100

# Two related stocks
stock_A = market + np.random.randn(n) * 2  # e.g., Coca-Cola
stock_B = 0.9 * market + 10 + np.random.randn(n) * 2  # e.g., Pepsi

print("Pairs Trading: Stock A and Stock B")
print("="*60)

# Test for cointegration
coint_stat, coint_pval, _ = coint(stock_A, stock_B)
print(f"Cointegration test: p-value = {coint_pval:.4f}")
print(f"Result: {'Cointegrated - Good for pairs trading!' if coint_pval < 0.05 else 'Not cointegrated'}")

In [None]:
# Estimate hedge ratio (cointegrating coefficient)
hedge_ratio = np.polyfit(stock_B, stock_A, 1)[0]
spread = stock_A - hedge_ratio * stock_B

# Trading signals (bands use the full-sample mean and std, so this is an in-sample illustration)
mean_spread = np.mean(spread)
std_spread = np.std(spread)

upper_band = mean_spread + 2 * std_spread
lower_band = mean_spread - 2 * std_spread

# Visualize trading strategy
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

# Stock prices
axes[0].plot(stock_A, color=COLORS['blue'], label='Stock A', linewidth=1)
axes[0].plot(stock_B, color=COLORS['orange'], label='Stock B', linewidth=1)
axes[0].set_title('Cointegrated Stock Pair', fontweight='bold')
axes[0].legend(loc='upper left')
axes[0].set_ylabel('Price ($)')

# Trading spread with signals
axes[1].plot(spread, color=COLORS['green'], linewidth=1, label='Spread')
axes[1].axhline(y=mean_spread, color='black', linestyle='-', label='Mean')
axes[1].axhline(y=upper_band, color=COLORS['red'], linestyle='--', label='+2σ (Sell A, Buy B)')
axes[1].axhline(y=lower_band, color=COLORS['blue'], linestyle='--', label='-2σ (Buy A, Sell B)')
axes[1].fill_between(range(n), lower_band, upper_band, alpha=0.1, color='green')

# Mark trading signals
sell_signals = spread > upper_band
buy_signals = spread < lower_band
axes[1].scatter(np.where(sell_signals)[0], spread[sell_signals], color=COLORS['red'], s=20, alpha=0.5, zorder=5)
axes[1].scatter(np.where(buy_signals)[0], spread[buy_signals], color=COLORS['blue'], s=20, alpha=0.5, zorder=5)

axes[1].set_title('Trading Spread with Signals', fontweight='bold')
axes[1].set_xlabel('Time')
axes[1].set_ylabel('Spread')
axes[1].legend(loc='upper left', fontsize=9)

plt.tight_layout()
plt.show()

print(f"\nHedge ratio (β): {hedge_ratio:.4f}")
print(f"Mean spread: {mean_spread:.4f}")
print(f"Trading bands: [{lower_band:.2f}, {upper_band:.2f}]")
print(f"\nObservations outside the bands: {sell_signals.sum()} above (sell A), {buy_signals.sum()} below (buy A)")

## Exercise 3: VECM Estimation and Interpretation

**Task:** Estimate VECM and interpret adjustment coefficients.

In [None]:
# Create economic example: Consumption and Income
np.random.seed(321)
n = 200

# Income follows random walk with drift
income = np.cumsum(np.random.randn(n) * 0.5 + 0.1) + 100

# Consumption is cointegrated with income (permanent income hypothesis)
# C = 0.9 * Y + stationary component
consumption = 0.9 * income + np.random.randn(n) * 2

# Create DataFrame
data = pd.DataFrame({
    'Consumption': consumption,
    'Income': income
}, index=pd.date_range('2000-01', periods=n, freq='ME'))

print("Consumption-Income Cointegration Analysis")
print("="*60)

# Unit root tests
print("\n1. Unit Root Tests:")
for col in data.columns:
    adf = adfuller(data[col])
    print(f"   {col}: ADF = {adf[0]:.3f}, p = {adf[1]:.4f} → {'I(0)' if adf[1] < 0.05 else 'I(1)'}")

# Cointegration test
print("\n2. Cointegration Test:")
stat, pval, crit = coint(data['Consumption'], data['Income'])
print(f"   Engle-Granger: stat = {stat:.3f}, p-value = {pval:.4f}")
print(f"   Result: {'Cointegrated ✓' if pval < 0.05 else 'Not cointegrated'}")

In [None]:
# Estimate VECM
print("\n3. VECM Estimation:")
print("="*60)

vecm = VECM(data, k_ar_diff=1, coint_rank=1, deterministic='ci')
vecm_results = vecm.fit()

print(vecm_results.summary())

In [None]:
# Interpret results
print("\n4. Economic Interpretation:")
print("="*60)

alpha = vecm_results.alpha
beta = vecm_results.beta

print(f"\nCointegrating vector (β):")
print(f"   Consumption coef: {beta[0, 0]:.4f}")
print(f"   Income coef: {beta[1, 0]:.4f}")
print(f"   → Long-run: C = {-beta[1, 0]/beta[0, 0]:.4f} × Y")
print(f"   → Marginal propensity to consume ≈ {-beta[1, 0]/beta[0, 0]:.2f}")

print(f"\nAdjustment coefficients (α):")
print(f"   Consumption α: {alpha[0, 0]:.4f}")
print(f"   Income α: {alpha[1, 0]:.4f}")

print(f"\nInterpretation:")
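lr_coef = -beta[1, 0] / beta[0, 0]  # estimated long-run coefficient
print(f"   • When C is above equilibrium (C > {lr_coef:.2f}Y):")
if alpha[0, 0] < 0:
    print(f"     - Consumption decreases ({abs(alpha[0, 0])*100:.1f}% of the gap corrected per period)")
# Heuristic only: a small |α| suggests weak exogeneity; confirm with a t-test on α
if abs(alpha[1, 0]) < 0.05:
    print(f"   • Income α is close to zero, which suggests weak exogeneity (income barely responds to the C-Y gap)")
    print(f"     - This is consistent with income driving consumption rather than the reverse")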

In [None]:
# Visualize error correction mechanism
fig, axes = plt.subplots(2, 2, figsize=(14, 8))

# Time series
axes[0, 0].plot(data.index, data['Consumption'], color=COLORS['blue'], label='Consumption')
axes[0, 0].plot(data.index, data['Income'] * (-beta[1, 0]/beta[0, 0]), color=COLORS['orange'], 
                label=f'{-beta[1, 0]/beta[0, 0]:.2f} × Income', alpha=0.7)
axes[0, 0].set_title('Consumption and Scaled Income', fontweight='bold')
axes[0, 0].legend()

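# Error correction term (the constant inside the cointegrating relation, from deterministic='ci',
# is left out here, so the series need not be centered exactly on zero)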
ec_term = data['Consumption'].values - (-beta[1, 0]/beta[0, 0]) * data['Income'].values
axes[0, 1].plot(data.index, ec_term, color=COLORS['green'])
axes[0, 1].axhline(y=0, color='red', linestyle='--')
axes[0, 1].fill_between(data.index, 0, ec_term, where=ec_term > 0, alpha=0.3, color='red', label='Above equilibrium')
axes[0, 1].fill_between(data.index, 0, ec_term, where=ec_term < 0, alpha=0.3, color='blue', label='Below equilibrium')
axes[0, 1].set_title('Error Correction Term (C - βY)', fontweight='bold')
axes[0, 1].legend()

# Changes vs EC term
delta_c = np.diff(data['Consumption'].values)
ec_lagged = ec_term[:-1]
axes[1, 0].scatter(ec_lagged, delta_c, alpha=0.5, s=20, color=COLORS['blue'])
z = np.polyfit(ec_lagged, delta_c, 1)
axes[1, 0].plot(ec_lagged, np.poly1d(z)(ec_lagged), color='red', linewidth=2)
axes[1, 0].axhline(y=0, color='black', linestyle='-', alpha=0.3)
axes[1, 0].axvline(x=0, color='black', linestyle='-', alpha=0.3)
axes[1, 0].set_xlabel('EC term (t-1)')
axes[1, 0].set_ylabel('ΔConsumption')
axes[1, 0].set_title(f'Error Correction: α = {alpha[0, 0]:.3f}', fontweight='bold')

# IRF
irf = vecm_results.irf(20)
axes[1, 1].plot(irf.irfs[:, 0, 1], color=COLORS['blue'], linewidth=2, label='C response to Y shock')
axes[1, 1].plot(irf.irfs[:, 1, 1], color=COLORS['orange'], linewidth=2, label='Y response to Y shock')
axes[1, 1].axhline(y=0, color='black', linestyle='-', alpha=0.3)
axes[1, 1].set_xlabel('Horizon')
axes[1, 1].set_title('IRF: Response to Income Shock', fontweight='bold')
axes[1, 1].legend()

plt.tight_layout()
plt.show()

## Exercise 4: Johansen Test with 3 Variables

**Task:** Test for cointegration rank with multiple variables.

In [None]:
# Create 3-variable system with 2 cointegrating relationships
np.random.seed(654)
n = 300

# One common trend
trend = np.cumsum(np.random.randn(n))

# Three variables sharing this trend (2 cointegrating relationships possible)
y1 = trend + np.random.randn(n) * 0.5
y2 = 0.8 * trend + 5 + np.random.randn(n) * 0.5
y3 = 1.2 * trend - 3 + np.random.randn(n) * 0.5

data_3var = pd.DataFrame({'Y1': y1, 'Y2': y2, 'Y3': y3})

print("Three-Variable Cointegration Analysis")
print("="*60)

# Johansen test
johansen = coint_johansen(data_3var.values, det_order=0, k_ar_diff=1)

print("\nTrace Test:")
print(f"{'H0':>8} {'Trace Stat':>12} {'95% CV':>10} {'Reject?':>10}")
print("-"*45)
for i in range(3):
    reject = "Yes ✓" if johansen.lr1[i] > johansen.cvt[i, 1] else "No"
    print(f"r ≤ {i:>4} {johansen.lr1[i]:>12.2f} {johansen.cvt[i, 1]:>10.2f} {reject:>10}")

# Determine rank
rank = 0
for i in range(3):
    if johansen.lr1[i] > johansen.cvt[i, 1]:
        rank = i + 1
    else:
        break

print(f"\nConclusion: Cointegrating rank r = {rank}")
print(f"            Number of common trends = {3 - rank}")
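
Both Johansen statistics are built from the same estimated eigenvalues: the trace statistic for H0: rank ≤ r sums the terms $-T\ln(1-\lambda_i)$ over the eigenvalues beyond the r-th, while the maximum-eigenvalue statistic uses only the next one. The sketch below illustrates that arithmetic with the standard library. The eigenvalues are hypothetical, not output of the cell above, and `n_obs=298` assumes the effective sample is 300 observations minus 2 for differencing and one lag.

```python
import math

def johansen_stats(eigenvalues, n_obs):
    """Trace and maximum-eigenvalue statistics for H0: rank <= r, r = 0, 1, ..."""
    terms = [-n_obs * math.log(1.0 - lam) for lam in eigenvalues]
    trace = [sum(terms[r:]) for r in range(len(terms))]
    return trace, terms   # terms[r] is the max-eigenvalue statistic for rank r

# Hypothetical eigenvalues, sorted from largest to smallest (illustration only)
eigs = [0.30, 0.20, 0.005]
trace, max_eig = johansen_stats(eigs, n_obs=298)
for r, (tr, me) in enumerate(zip(trace, max_eig)):
    print(f"r <= {r}: trace = {tr:7.2f}, max-eig = {me:7.2f}")
```

Two large eigenvalues and one near zero is the pattern expected for three series driven by a single common trend: the first two hypotheses are rejected and the third is not.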

In [None]:
# Visualize the 3-variable system
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# All three series
axes[0].plot(y1, color=COLORS['blue'], label='Y1', linewidth=1)
axes[0].plot(y2, color=COLORS['orange'], label='Y2', linewidth=1)
axes[0].plot(y3, color=COLORS['green'], label='Y3', linewidth=1)
axes[0].set_title('Three Cointegrated Variables', fontweight='bold')
axes[0].legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), ncol=3)

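# Spreads, demeaned so the zero line is meaningful (the intercepts +5 and -3 shift the raw spreads)
spread1 = y1 - y2/0.8
spread2 = y1 - y3/1.2
spread1, spread2 = spread1 - spread1.mean(), spread2 - spread2.mean()
axes[1].plot(spread1, color=COLORS['blue'], label='Y1 - Y2/0.8 (demeaned)', linewidth=1, alpha=0.7)
axes[1].plot(spread2, color=COLORS['orange'], label='Y1 - Y3/1.2 (demeaned)', linewidth=1, alpha=0.7)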
axes[1].axhline(y=0, color='black', linestyle='--', alpha=0.5)
axes[1].set_title('Cointegrating Combinations (Stationary)', fontweight='bold')
axes[1].legend(loc='upper center', bbox_to_anchor=(0.5, -0.1), ncol=2)

plt.tight_layout()
plt.subplots_adjust(bottom=0.2)
plt.show()

## Exercise 5: Practice Problems

### Problem 1: Identify Cointegration
Given the ADF statistic computed on the residuals of a cointegrating regression with 2 variables:
- ADF = -3.50, Critical value (5%) = -3.34

**Question:** Is there cointegration at the 5% level?

In [None]:
# Problem 1 Solution
print("Problem 1: Cointegration Test")
print("="*50)

adf_stat = -3.50
critical_value = -3.34

print(f"ADF Statistic: {adf_stat}")
print(f"Critical Value (5%): {critical_value}")
print(f"\nDecision: {adf_stat} < {critical_value}? {adf_stat < critical_value}")

if adf_stat < critical_value:
    print("\n✓ REJECT H0: Residuals are stationary → COINTEGRATED!")
else:
    print("\n✗ Cannot reject H0: No evidence of cointegration")
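
The critical value of -3.34 is more negative than the usual ADF value because the residuals come from an estimated regression: OLS picks the combination that looks most stationary, so a stricter threshold is needed. The short comparison below shows why the choice of table matters. The value -2.86 is the asymptotic 5% Dickey-Fuller critical value for a regression with a constant, and the borderline statistic -3.00 is made up for illustration.

```python
ADF_CV_5PCT = -2.86   # asymptotic Dickey-Fuller 5% value (constant, observed series)
EG_CV_5PCT = -3.34    # 5% value from the problem statement (residuals, 2 variables)

for stat in (-3.50, -3.00):   # -3.00 is a made-up borderline value
    print(f"stat = {stat:.2f}: "
          f"ADF table rejects = {stat < ADF_CV_5PCT}, "
          f"Engle-Granger table rejects = {stat < EG_CV_5PCT}")
```

A statistic of -3.00 would be declared stationary under the standard ADF table but does not support cointegration under the Engle-Granger table.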

### Problem 2: VECM Interpretation
Given the VECM:
$$\Delta Y_t = 0.02 - 0.15(Y_{t-1} - 2X_{t-1}) + \dots$$
$$\Delta X_t = 0.01 + 0.05(Y_{t-1} - 2X_{t-1}) + \dots$$

**Questions:**
1. What is the cointegrating vector?
2. What is the long-run equilibrium?
3. Which variable adjusts faster?
4. How do Y and X respond when Y is above its equilibrium value?

In [None]:
# Problem 2 Solution
print("Problem 2: VECM Interpretation")
print("="*50)

print("\n1. Cointegrating vector:")
print("   β = (1, -2) normalized on Y")
print("   Cointegrating relationship: Y - 2X = 0")

print("\n2. Long-run equilibrium:")
print("   When EC term = 0: Y = 2X")
print("   Long-run: Y is twice as large as X")

print("\n3. Adjustment speeds:")
alpha_Y = -0.15
alpha_X = 0.05
print(f"   αY = {alpha_Y} → Y corrects {abs(alpha_Y)*100:.0f}% of the disequilibrium per period")
print(f"   αX = {alpha_X} → X moves by {abs(alpha_X)*100:.0f}% of the disequilibrium per period")
print(f"\n   |αY| is {abs(alpha_Y)/abs(alpha_X):.0f}x |αX|, so Y responds faster than X")

print("\n4. Economic interpretation:")
print("   When Y > 2X (positive EC term):")
print(f"   - Y decreases (α = {alpha_Y} < 0)")
print(f"   - X increases (α = {alpha_X} > 0)")
print("   → Both adjust to restore equilibrium")
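
The two adjustment coefficients combine into a single speed for the gap itself: subtracting twice the second equation from the first gives $\Delta(Y_t - 2X_t) = (-0.15 - 2 \times 0.05)(Y_{t-1} - 2X_{t-1}) = -0.25\,(Y_{t-1} - 2X_{t-1})$, and the constants cancel since $0.02 - 2 \times 0.01 = 0$. The sketch below iterates the system under the assumption that the omitted short-run terms ("...") and the shocks are zero; the starting values are arbitrary.

```python
import math

alpha_Y, alpha_X = -0.15, 0.05
beta_X = 2.0                               # from the EC term Y - 2X

# Δ(Y - 2X) = (alpha_Y - 2*alpha_X) * (Y - 2X); the constants 0.02 - 2*0.01 cancel
gap_speed = alpha_Y - beta_X * alpha_X     # -0.25
half_life = math.log(0.5) / math.log(1 + gap_speed)

# Deterministic path from an arbitrary starting point (Y0 = 12, X0 = 5, gap = 2)
Y, X = 12.0, 5.0
for t in range(5):
    gap = Y - beta_X * X
    print(f"t={t}: Y={Y:.3f}, X={X:.3f}, gap={gap:.3f}")
    Y, X = Y + 0.02 + alpha_Y * gap, X + 0.01 + alpha_X * gap

print(f"Gap shrinks by {abs(gap_speed):.0%} per period; half-life ≈ {half_life:.2f} periods")
```

Under these assumptions the gap closes by a quarter each period, faster than either coefficient alone suggests, because both variables move toward equilibrium at once.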

### Problem 3: Weak Exogeneity Test
Given: α₁ = -0.20 (s.e. = 0.05), α₂ = 0.03 (s.e. = 0.04)

**Question:** Is variable 2 weakly exogenous at the 5% level?

In [None]:
# Problem 3 Solution
print("Problem 3: Weak Exogeneity Test")
print("="*50)

alpha2 = 0.03
se2 = 0.04

t_stat = alpha2 / se2
critical_t = 1.96  # 5%, two-tailed

print(f"H0: α₂ = 0 (weak exogeneity)")
print(f"\nt-statistic: {alpha2}/{se2} = {t_stat:.2f}")
print(f"Critical value (5%): ±{critical_t}")
print(f"\n|t| = {abs(t_stat):.2f} < {critical_t}? {abs(t_stat) < critical_t}")

if abs(t_stat) < critical_t:
    print("\n✓ CANNOT REJECT H0: The data are consistent with variable 2 being weakly exogenous")
    print("  → No significant response of variable 2 to disequilibrium")
    print("  → Variable 1 does the adjusting")
else:
    print("\n✗ REJECT H0: Variable 2 is NOT weakly exogenous")
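
The same decision can be expressed as a p-value. The sketch below is a minimal illustration assuming the standard normal approximation behind the ±1.96 critical value; the helper name `two_sided_p` is hypothetical. It also runs the test for α₁, whose estimate and standard error are given in the problem statement.

```python
import math

def two_sided_p(t_stat):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))

for name, alpha, se in [("α₁", -0.20, 0.05), ("α₂", 0.03, 0.04)]:
    t = alpha / se
    print(f"{name}: t = {t:.2f}, p ≈ {two_sided_p(t):.4f}")
```

A p-value well above 0.05 for α₂ and well below it for α₁ matches the conclusion that only variable 1 responds significantly to disequilibrium.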

## Summary

### Key Takeaways from This Seminar

1. **Spurious Regression vs Cointegration**
   - Spurious regression: High R², low DW, non-stationary residuals
   - Cointegration: Stationary spread (error correction term)

2. **Testing for Cointegration**
   - Engle-Granger: Test residuals for stationarity
   - Johansen: Sequential testing of rank

3. **VECM Interpretation**
   - β defines long-run equilibrium
   - α measures adjustment speed
   - Weak exogeneity: α = 0 for that variable (it does not respond to disequilibrium)

4. **Applications**
   - Pairs trading: Trade mean-reverting spreads
   - Interest rates: Term structure analysis
   - Consumption-Income: Permanent income hypothesis

### Practical Workflow
1. Test for unit roots
2. If the series are I(1), test for cointegration
3. Determine cointegrating rank (Johansen)
4. Estimate VECM with correct rank
5. Interpret α and β
6. Test weak exogeneity if needed
7. Diagnostic checks
8. Forecasting and IRF analysis