[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/danpele/Time-Series-Analysis/blob/main/chapter5_lecture_notebook.ipynb)

---

# Chapter 5: VAR Models and Granger Causality

**Course:** Time Series Analysis and Forecasting  
**Program:** Bachelor program, Faculty of Cybernetics, Statistics and Economic Informatics, Bucharest University of Economic Studies, Romania  
**Academic Year:** 2025-2026

---

## Learning Objectives

By the end of this notebook, you will be able to:
1. Understand the structure of Vector Autoregression (VAR) models
2. Estimate VAR models and select the optimal lag order
3. Conduct and interpret Granger causality tests
4. Compute and interpret Impulse Response Functions (IRF)
5. Perform Forecast Error Variance Decomposition (FEVD)
6. Understand cointegration and Vector Error Correction Models (VECM)

## Setup and Imports

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

# VAR and multivariate time series
from statsmodels.tsa.api import VAR
from statsmodels.tsa.stattools import adfuller, grangercausalitytests, ccf
from statsmodels.tsa.vector_ar.vecm import coint_johansen, VECM
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from scipy import stats

# Plotting style - clean, professional
plt.rcParams['figure.figsize'] = (12, 5)
plt.rcParams['font.size'] = 11
plt.rcParams['axes.facecolor'] = 'none'
plt.rcParams['figure.facecolor'] = 'none'
plt.rcParams['savefig.facecolor'] = 'none'
plt.rcParams['axes.grid'] = False
plt.rcParams['axes.spines.top'] = False
plt.rcParams['axes.spines.right'] = False

# Colors (IDA color scheme)
COLORS = {
    'blue': '#1A3A6E',
    'red': '#DC3545',
    'green': '#2E7D32',
    'orange': '#E67E22',
    'gray': '#666666'
}

print("All libraries loaded successfully!")

## 1. Introduction to Multivariate Time Series

In many applications, we have **multiple time series** that are related:
- GDP, consumption, investment, government spending
- Stock prices of related companies
- Interest rates at different maturities
- Inflation and unemployment (Phillips curve)

**Why multivariate models?**
- Capture interdependencies between variables
- Improve forecasts by using information from related series
- Analyze dynamic relationships (causality, impulse responses)

In [None]:
# Generate example macroeconomic data
np.random.seed(42)
n = 200

# Simulate a simple bivariate system (GDP growth and Inflation)
# Y1_t = 2 + 0.6*Y1_{t-1} + 0.2*Y2_{t-1} + e1_t,  e1_t ~ N(0, 0.5^2)
# Y2_t = 1 + 0.1*Y1_{t-1} + 0.5*Y2_{t-1} + e2_t,  e2_t ~ N(0, 0.3^2)

Y1 = np.zeros(n)  # GDP growth
Y2 = np.zeros(n)  # Inflation

for t in range(1, n):
    Y1[t] = 0.6 * Y1[t-1] + 0.2 * Y2[t-1] + np.random.randn() * 0.5 + 2
    Y2[t] = 0.1 * Y1[t-1] + 0.5 * Y2[t-1] + np.random.randn() * 0.3 + 1

# Create DataFrame
data = pd.DataFrame({
    'GDP_Growth': Y1,
    'Inflation': Y2
}, index=pd.date_range('2000-01', periods=n, freq='ME'))

print(f"Simulated Macro Data: {len(data)} monthly observations")
print(data.describe())

In [None]:
# Plot the multivariate time series
fig, axes = plt.subplots(2, 1, figsize=(14, 8), sharex=True)

axes[0].plot(data.index, data['GDP_Growth'], color=COLORS['blue'], linewidth=1, label='GDP Growth')
axes[0].axhline(y=data['GDP_Growth'].mean(), color='red', linestyle='--', alpha=0.5)
axes[0].set_title('GDP Growth Rate (%)', fontweight='bold')
axes[0].set_ylabel('%')
axes[0].legend(loc='upper right', frameon=False)

axes[1].plot(data.index, data['Inflation'], color=COLORS['orange'], linewidth=1, label='Inflation')
axes[1].axhline(y=data['Inflation'].mean(), color='red', linestyle='--', alpha=0.5)
axes[1].set_title('Inflation Rate (%)', fontweight='bold')
axes[1].set_xlabel('Date')
axes[1].set_ylabel('%')
axes[1].legend(loc='upper right', frameon=False)

plt.tight_layout()
plt.show()

# Correlation
print(f"\nCorrelation between GDP Growth and Inflation: {data['GDP_Growth'].corr(data['Inflation']):.4f}")

## 2. The VAR(p) Model

A **Vector Autoregression of order p**, VAR(p), for $K$ variables is:

$$\mathbf{Y}_t = \mathbf{c} + \mathbf{A}_1 \mathbf{Y}_{t-1} + \mathbf{A}_2 \mathbf{Y}_{t-2} + \cdots + \mathbf{A}_p \mathbf{Y}_{t-p} + \boldsymbol{\varepsilon}_t$$

where:
- $\mathbf{Y}_t$ is a $K \times 1$ vector of endogenous variables
- $\mathbf{c}$ is a $K \times 1$ vector of constants
- $\mathbf{A}_i$ are $K \times K$ coefficient matrices
- $\boldsymbol{\varepsilon}_t$ is a $K \times 1$ vector of white noise errors

### Number of Parameters
- Each equation has: $1 + Kp$ parameters (constant + K coefficients × p lags)
- Total system: $K(1 + Kp)$ parameters
- Plus $K(K+1)/2$ covariance parameters
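The counting rules above are easy to tabulate. The following is a minimal sketch; the helper name `var_param_counts` is introduced here for illustration and is not part of statsmodels.

```python
def var_param_counts(K: int, p: int) -> dict:
    """Count the free parameters of a VAR(p) with K variables and a constant."""
    per_equation = 1 + K * p          # constant + K coefficients for each of p lags
    mean_params = K * per_equation    # all K equations together
    cov_params = K * (K + 1) // 2     # distinct entries of the symmetric error covariance
    return {
        "per_equation": per_equation,
        "mean": mean_params,
        "covariance": cov_params,
        "total": mean_params + cov_params,
    }

# The count grows quadratically in K, which is why large VARs need long samples
for K, p in [(2, 1), (3, 4), (6, 4)]:
    print(f"K={K}, p={p}: {var_param_counts(K, p)}")
```

For the bivariate VAR(1) used in this notebook this gives 3 parameters per equation, 6 coefficients in total, and 3 covariance parameters.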

In [None]:
# VAR(1) in matrix form
print("VAR(1) for 2 Variables:")
print("="*60)
print()
print("[ Y1_t ]   [ c1 ]   [ a11  a12 ] [ Y1_{t-1} ]   [ e1_t ]")
print("[      ] = [    ] + [          ] [          ] + [      ]")
print("[ Y2_t ]   [ c2 ]   [ a21  a22 ] [ Y2_{t-1} ]   [ e2_t ]")
print()
print("Written as two equations:")
print("  Y1_t = c1 + a11*Y1_{t-1} + a12*Y2_{t-1} + e1_t")
print("  Y2_t = c2 + a21*Y1_{t-1} + a22*Y2_{t-1} + e2_t")
print()
print(f"Parameters per equation: 1 + K*p = 1 + 2*1 = 3")
print(f"Total parameters: K*(1 + K*p) = 2*(1 + 2*1) = 6")

## 3. VAR Stability Condition

A VAR(p) is **stable**, and therefore stationary, if all eigenvalues of the companion matrix lie strictly inside the unit circle:

$$|\lambda_i| < 1 \quad \text{for all } i$$
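Stability also pins down the long-run mean of the process. For a stable VAR(1) with constant $\mathbf{c}$, taking expectations on both sides gives $\boldsymbol{\mu} = (\mathbf{I} - \mathbf{A}_1)^{-1}\mathbf{c}$. The sketch below applies this to the coefficients and constants used in the simulation of Section 1; the variable names are chosen here for illustration.

```python
import numpy as np

# Coefficients and constants taken from the simulation loop in Section 1
A = np.array([[0.6, 0.2],
              [0.1, 0.5]])
c = np.array([2.0, 1.0])

# The mean formula is only valid when every eigenvalue modulus is below 1
moduli = np.abs(np.linalg.eigvals(A))
is_stable = bool(np.all(moduli < 1))

# Solve (I - A) mu = c instead of forming the inverse explicitly
mu = np.linalg.solve(np.eye(2) - A, c)

print(f"Stable: {is_stable}")
print(f"Unconditional mean [GDP_Growth, Inflation]: {mu}")
```

Because the simulation starts both series at zero, the first few observations lie below these long-run means before the process settles.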

### Companion Form
Any VAR(p) can be written as a VAR(1) in companion form:

$$\mathbf{Z}_t = \mathbf{A} \mathbf{Z}_{t-1} + \mathbf{u}_t$$

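where $\mathbf{Z}_t = [\mathbf{Y}_t', \mathbf{Y}_{t-1}', \ldots, \mathbf{Y}_{t-p+1}']'$ is $Kp \times 1$, the $Kp \times Kp$ matrix $\mathbf{A}$ has $[\mathbf{A}_1, \ldots, \mathbf{A}_p]$ as its first block row with identity blocks below it, and the constant is omitted for brevity.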
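To make the companion form concrete, here is a small sketch that stacks the lag matrices of a VAR(p) and checks stability. The function name `companion_matrix` and the VAR(2) coefficients are hypothetical, chosen only for illustration.

```python
import numpy as np

def companion_matrix(coefs):
    """Stack lag matrices A_1..A_p (each K x K) into the Kp x Kp companion matrix."""
    coefs = [np.asarray(a, dtype=float) for a in coefs]
    K = coefs[0].shape[0]
    p = len(coefs)
    top = np.hstack(coefs)                      # first block row: [A_1 A_2 ... A_p]
    if p == 1:
        return top                              # a VAR(1) is its own companion form
    # Lower block rows shift Y_{t-1}, ..., Y_{t-p+1} down by one position
    bottom = np.hstack([np.eye(K * (p - 1)), np.zeros((K * (p - 1), K))])
    return np.vstack([top, bottom])

# Hypothetical VAR(2) coefficients (not estimated from the notebook's data)
A1 = np.array([[0.5, 0.1],
               [0.0, 0.4]])
A2 = np.array([[0.2, 0.0],
               [0.1, 0.1]])

F = companion_matrix([A1, A2])
moduli = np.abs(np.linalg.eigvals(F))
print("Companion matrix shape:", F.shape)
print("Largest eigenvalue modulus:", moduli.max())
print("Stable:", bool(moduli.max() < 1))
```

For a fitted statsmodels model, the list `[results.coefs[i] for i in range(results.k_ar)]` could be passed to the same helper.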

In [None]:
# Demonstrate stability condition
# True coefficient matrix from our simulation
A_true = np.array([[0.6, 0.2],
                   [0.1, 0.5]])

eigenvalues = np.linalg.eigvals(A_true)

print("Coefficient Matrix A:")
print(A_true)
print(f"\nEigenvalues: {eigenvalues}")
print(f"Moduli: {np.abs(eigenvalues)}")
print(f"\nStable: {all(np.abs(eigenvalues) < 1)}")

# Visualize in complex plane
fig, ax = plt.subplots(figsize=(6, 6))

# Unit circle
theta = np.linspace(0, 2*np.pi, 100)
ax.plot(np.cos(theta), np.sin(theta), 'k--', alpha=0.5, label='Unit Circle')

# Eigenvalues
ax.scatter(eigenvalues.real, eigenvalues.imag, s=200, c=COLORS['red'], 
           marker='x', linewidths=3, label='Eigenvalues')

ax.set_xlim(-1.5, 1.5)
ax.set_ylim(-1.5, 1.5)
ax.set_aspect('equal')
ax.axhline(y=0, color='gray', linewidth=0.5)
ax.axvline(x=0, color='gray', linewidth=0.5)
ax.set_xlabel('Real')
ax.set_ylabel('Imaginary')
ax.set_title('VAR Stability: Eigenvalues Inside Unit Circle', fontweight='bold')
ax.legend(loc='upper left', frameon=False)

plt.tight_layout()
plt.show()

## 4. Estimating VAR Models

In [None]:
# Fit VAR model
model = VAR(data)

# Select optimal lag order
lag_selection = model.select_order(maxlags=8)
print("Lag Order Selection Criteria:")
print(lag_selection.summary())

In [None]:
# Fit a VAR(1), the true order of the simulated process (check that BIC in the table above agrees)
results = model.fit(1)
print(results.summary())

In [None]:
# Compare estimated vs true coefficients
print("Coefficient Comparison:")
print("="*50)

print("\nTrue A matrix:")
print(A_true)

print("\nEstimated A matrix:")
A_hat = results.coefs[0]
print(A_hat)

print(f"\nMaximum absolute error: {np.max(np.abs(A_true - A_hat)):.4f}")

## 5. Granger Causality

**Granger causality** tests whether lagged values of one variable help predict another.

### Definition
$X$ **Granger-causes** $Y$ if:
- Past values of $X$ contain information useful for predicting $Y$
- Beyond what is already contained in past values of $Y$ itself

### The Test
In a VAR with $Y$ and $X$:
$$Y_t = c + \sum_{i=1}^p \alpha_i Y_{t-i} + \sum_{i=1}^p \beta_i X_{t-i} + \varepsilon_t$$

Test $H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$ (X does NOT Granger-cause Y)
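The null hypothesis can be tested with a standard F-test that compares a restricted regression (own lags only) against an unrestricted one (own lags plus lags of $X$). The sketch below does this by hand with NumPy on simulated data; the function name `granger_f_stat` and the data-generating coefficients are assumptions made for illustration.

```python
import numpy as np

def granger_f_stat(y, x, p):
    """F statistic for H0: lags 1..p of x add nothing to an AR(p) regression of y."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = len(y)
    T = n - p                                   # usable observations after lagging
    target = y[p:]
    y_lags = np.column_stack([y[p - i:n - i] for i in range(1, p + 1)])
    x_lags = np.column_stack([x[p - i:n - i] for i in range(1, p + 1)])
    const = np.ones((T, 1))

    def ssr(X):
        # Sum of squared residuals from an OLS fit of target on X
        beta, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ beta
        return resid @ resid

    ssr_r = ssr(np.hstack([const, y_lags]))            # restricted: beta_i = 0
    ssr_u = ssr(np.hstack([const, y_lags, x_lags]))    # unrestricted
    df_num = p
    df_den = T - (2 * p + 1)
    f_stat = ((ssr_r - ssr_u) / df_num) / (ssr_u / df_den)
    return f_stat, df_num, df_den

# Simulated example: x drives y with one lag, but y does not feed back into x
rng = np.random.default_rng(0)
n = 500
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + rng.standard_normal()

f_xy, df1, df2 = granger_f_stat(y, x, p=2)   # does x Granger-cause y?
f_yx, _, _ = granger_f_stat(x, y, p=2)       # does y Granger-cause x?
print(f"x -> y: F({df1}, {df2}) = {f_xy:.2f}")
print(f"y -> x: F({df1}, {df2}) = {f_yx:.2f}")
```

A p-value follows from the upper tail of the $F(p,\, T - 2p - 1)$ distribution, for example with `scipy.stats.f.sf(f_stat, df_num, df_den)`.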

### Important Caveats
- Granger causality ≠ true causality
- May be spurious due to omitted variables
- Sensitive to lag selection

In [None]:
# Granger causality tests
print("Granger Causality Tests")
print("="*60)

# Test: Does Inflation Granger-cause GDP Growth?
print("\n1. H₀: Inflation does NOT Granger-cause GDP Growth")
print("-"*50)
gc_infl_to_gdp = grangercausalitytests(data[['GDP_Growth', 'Inflation']], maxlag=4, verbose=True)

In [None]:
# Test: Does GDP Growth Granger-cause Inflation?
print("\n2. H₀: GDP Growth does NOT Granger-cause Inflation")
print("-"*50)
gc_gdp_to_infl = grangercausalitytests(data[['Inflation', 'GDP_Growth']], maxlag=4, verbose=True)

In [None]:
# Summary of Granger causality
print("\nGranger Causality Summary (at lag 1):")
print("="*50)

p_infl_to_gdp = gc_infl_to_gdp[1][0]['ssr_ftest'][1]
p_gdp_to_infl = gc_gdp_to_infl[1][0]['ssr_ftest'][1]

print(f"Inflation → GDP: p-value = {p_infl_to_gdp:.4f} {'✓ Significant' if p_infl_to_gdp < 0.05 else '✗ Not significant'}")
print(f"GDP → Inflation: p-value = {p_gdp_to_infl:.4f} {'✓ Significant' if p_gdp_to_infl < 0.05 else '✗ Not significant'}")

print("\nInterpretation:")
if p_infl_to_gdp < 0.05 and p_gdp_to_infl < 0.05:
    print("  Bidirectional causality (feedback)")
elif p_infl_to_gdp < 0.05:
    print("  Inflation Granger-causes GDP (unidirectional)")
elif p_gdp_to_infl < 0.05:
    print("  GDP Granger-causes Inflation (unidirectional)")
else:
    print("  No Granger causality detected")

## 6. Impulse Response Functions (IRF)

**Impulse Response Functions** trace the effect of a one-time shock to one variable on all variables over time.

$$\text{IRF}_{ij}(h) = \frac{\partial Y_{i,t+h}}{\partial \varepsilon_{j,t}}$$

### Key Properties
- Shows dynamic multipliers
- For stable VAR: IRF → 0 as h → ∞
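- A structural interpretation requires identification (e.g., Cholesky ordering); without it, the responses are to unit forecast-error shocks, which may be correlated across equations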
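For a VAR(1) the impulse responses have a closed form: the response at horizon $h$ is the matrix power $\mathbf{A}_1^h$, and the Cholesky-orthogonalized version multiplies it by the lower-triangular factor of the error covariance. The following sketch uses the true coefficients and shock standard deviations from the simulation in Section 1; the function name `var1_irf` is introduced here for illustration.

```python
import numpy as np

A = np.array([[0.6, 0.2],
              [0.1, 0.5]])
# Simulation shocks are independent with standard deviations 0.5 and 0.3
sigma_u = np.diag([0.5**2, 0.3**2])

def var1_irf(A, horizon):
    """Non-orthogonalized responses of a VAR(1): irfs[h] = A**h for h = 0..horizon.

    irfs[h, i, j] is the response of variable i, h periods after a unit shock to variable j.
    """
    K = A.shape[0]
    irfs = np.empty((horizon + 1, K, K))
    irfs[0] = np.eye(K)                  # on impact, each shock moves only its own variable
    for h in range(1, horizon + 1):
        irfs[h] = A @ irfs[h - 1]
    return irfs

irfs = var1_irf(A, 20)

# Cholesky identification: responses to one-standard-deviation orthogonal shocks
P = np.linalg.cholesky(sigma_u)
orth_irfs = irfs @ P

print("Response after 1 period:\n", irfs[1])
print("Largest response after 20 periods:", np.abs(irfs[20]).max())
```

The responses die out geometrically at the rate of the largest eigenvalue modulus, which is why a stable VAR has impulse responses that converge to zero.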

In [None]:
# Compute IRFs
irf = results.irf(20)

# Plot IRFs
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Response to GDP shock
axes[0, 0].plot(irf.irfs[:, 0, 0], color=COLORS['blue'], linewidth=2, label='IRF')
axes[0, 0].fill_between(range(21), 
                        irf.irfs[:, 0, 0] - 1.96*irf.stderr()[:, 0, 0],
                        irf.irfs[:, 0, 0] + 1.96*irf.stderr()[:, 0, 0], 
                        alpha=0.2, color=COLORS['blue'], label='95% CI')
axes[0, 0].axhline(y=0, color='black', linestyle='-', alpha=0.3)
axes[0, 0].set_title('GDP → GDP', fontweight='bold')
axes[0, 0].legend(loc='upper right', frameon=False)

axes[0, 1].plot(irf.irfs[:, 1, 0], color=COLORS['blue'], linewidth=2, label='IRF')
axes[0, 1].fill_between(range(21), 
                        irf.irfs[:, 1, 0] - 1.96*irf.stderr()[:, 1, 0],
                        irf.irfs[:, 1, 0] + 1.96*irf.stderr()[:, 1, 0], 
                        alpha=0.2, color=COLORS['blue'], label='95% CI')
axes[0, 1].axhline(y=0, color='black', linestyle='-', alpha=0.3)
axes[0, 1].set_title('GDP → Inflation', fontweight='bold')
axes[0, 1].legend(loc='upper right', frameon=False)

# Response to Inflation shock
axes[1, 0].plot(irf.irfs[:, 0, 1], color=COLORS['orange'], linewidth=2, label='IRF')
axes[1, 0].fill_between(range(21), 
                        irf.irfs[:, 0, 1] - 1.96*irf.stderr()[:, 0, 1],
                        irf.irfs[:, 0, 1] + 1.96*irf.stderr()[:, 0, 1], 
                        alpha=0.2, color=COLORS['orange'], label='95% CI')
axes[1, 0].axhline(y=0, color='black', linestyle='-', alpha=0.3)
axes[1, 0].set_title('Inflation → GDP', fontweight='bold')
axes[1, 0].set_xlabel('Horizon (months)')
axes[1, 0].legend(loc='upper right', frameon=False)

axes[1, 1].plot(irf.irfs[:, 1, 1], color=COLORS['orange'], linewidth=2, label='IRF')
axes[1, 1].fill_between(range(21), 
                        irf.irfs[:, 1, 1] - 1.96*irf.stderr()[:, 1, 1],
                        irf.irfs[:, 1, 1] + 1.96*irf.stderr()[:, 1, 1], 
                        alpha=0.2, color=COLORS['orange'], label='95% CI')
axes[1, 1].axhline(y=0, color='black', linestyle='-', alpha=0.3)
axes[1, 1].set_title('Inflation → Inflation', fontweight='bold')
axes[1, 1].set_xlabel('Horizon (months)')
axes[1, 1].legend(loc='upper right', frameon=False)

plt.tight_layout()
plt.show()

print("IRF Interpretation:")
print("- Own shocks have immediate impact, then decay")
print("- Cross shocks show spillover effects")
print("- All responses converge to 0 (stable VAR)")

## 7. Forecast Error Variance Decomposition (FEVD)

**FEVD** decomposes the variance of forecast errors into contributions from each shock.

$$\text{FEVD}_{ij}(h) = \frac{\text{Variance of } Y_i \text{ due to shock } j}{\text{Total variance of } Y_i}$$

### Interpretation
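- Shows the relative importance of each shock
- At the one-step horizon ($h=1$), the first variable in the Cholesky ordering is explained 100% by its own shock; later variables are too only if the residuals are uncorrelated
- As $h$ increases, the shares converge to each shock's long-run importance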
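The decomposition can be computed directly from the orthogonalized impulse responses: square them, accumulate over horizons, and divide by each variable's total. This is a minimal sketch for a VAR(1) under Cholesky identification, using the true coefficients and the uncorrelated shock variances from the simulation in Section 1; the function name `var1_fevd` is hypothetical.

```python
import numpy as np

A = np.array([[0.6, 0.2],
              [0.1, 0.5]])
sigma_u = np.diag([0.5**2, 0.3**2])   # uncorrelated shocks, as in the simulation

def var1_fevd(A, sigma_u, horizon):
    """FEVD of a VAR(1) under Cholesky identification.

    Returns an array of shape (horizon, K, K) where entry [h-1, i, j] is the share
    of variable i's h-step forecast error variance attributed to shock j.
    """
    K = A.shape[0]
    theta = np.linalg.cholesky(sigma_u)   # orthogonalized response on impact
    cum_sq = np.zeros((K, K))
    shares = np.empty((horizon, K, K))
    for h in range(horizon):
        cum_sq += theta ** 2                                   # accumulate squared responses
        shares[h] = cum_sq / cum_sq.sum(axis=1, keepdims=True) # normalize each row to 1
        theta = A @ theta                                      # response one period later
    return shares

fevd = var1_fevd(A, sigma_u, 20)
for h in [1, 5, 20]:
    print(f"h={h}: GDP row = {fevd[h - 1, 0].round(3)}, Inflation row = {fevd[h - 1, 1].round(3)}")
```

With uncorrelated errors each variable's one-step forecast error is entirely its own shock. With correlated errors, part of the later-ordered variable's one-step variance would be attributed to the earlier shock, which is why the Cholesky ordering matters.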

In [None]:
# Compute FEVD
fevd = results.fevd(20)

# Plot FEVD
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# fevd.decomp has shape (variable, horizon, shock);
# index 0 on the horizon axis is the 1-step-ahead forecast
horizons = range(1, fevd.decomp.shape[1] + 1)

# FEVD for GDP
axes[0].stackplot(horizons, 
                  fevd.decomp[0, :, 0]*100, 
                  fevd.decomp[0, :, 1]*100,
                  labels=['GDP shock', 'Inflation shock'],
                  colors=[COLORS['blue'], COLORS['orange']], alpha=0.7)
axes[0].set_title('FEVD of GDP Growth', fontweight='bold')
axes[0].set_xlabel('Horizon (months)')
axes[0].set_ylabel('Percent')
axes[0].legend(loc='center right', frameon=False)
axes[0].set_ylim(0, 100)

# FEVD for Inflation
axes[1].stackplot(horizons, 
                  fevd.decomp[1, :, 0]*100, 
                  fevd.decomp[1, :, 1]*100,
                  labels=['GDP shock', 'Inflation shock'],
                  colors=[COLORS['blue'], COLORS['orange']], alpha=0.7)
axes[1].set_title('FEVD of Inflation', fontweight='bold')
axes[1].set_xlabel('Horizon (months)')
axes[1].set_ylabel('Percent')
axes[1].legend(loc='center right', frameon=False)
axes[1].set_ylim(0, 100)

plt.tight_layout()
plt.show()

# Print table (horizon h is stored at index h-1)
print("\nFEVD Table (%)")
print("="*70)
print(f"{'Horizon':<10} {'GDP by GDP':>12} {'GDP by Infl':>12} {'Infl by GDP':>12} {'Infl by Infl':>12}")
print("-"*70)
for h in [1, 5, 10, 20]:
    print(f"{h:<10} {fevd.decomp[0, h-1, 0]*100:>12.1f} {fevd.decomp[0, h-1, 1]*100:>12.1f} "
          f"{fevd.decomp[1, h-1, 0]*100:>12.1f} {fevd.decomp[1, h-1, 1]*100:>12.1f}")

## 8. VAR Forecasting

In [None]:
# Generate forecasts
forecast_steps = 12
lag_order = results.k_ar

forecast = results.forecast(data.values[-lag_order:], steps=forecast_steps)
forecast_interval = results.forecast_interval(data.values[-lag_order:], steps=forecast_steps, alpha=0.05)

# Create forecast dates
forecast_dates = pd.date_range(start=data.index[-1] + pd.DateOffset(months=1), 
                               periods=forecast_steps, freq='ME')

# Plot
fig, axes = plt.subplots(2, 1, figsize=(14, 8), sharex=True)

# GDP Growth
axes[0].plot(data.index[-36:], data['GDP_Growth'].values[-36:], 
             color=COLORS['blue'], linewidth=1.5, label='Historical')
axes[0].plot(forecast_dates, forecast[:, 0], 
             color=COLORS['red'], linewidth=2, linestyle='--', label='Forecast')
axes[0].fill_between(forecast_dates, forecast_interval[1][:, 0], forecast_interval[2][:, 0],
                     color=COLORS['red'], alpha=0.2, label='95% CI')
axes[0].axvline(x=data.index[-1], color='black', linestyle='-', alpha=0.3)
axes[0].set_title('GDP Growth Forecast', fontweight='bold')
axes[0].set_ylabel('%')
axes[0].legend(loc='upper right', frameon=False)

# Inflation
axes[1].plot(data.index[-36:], data['Inflation'].values[-36:], 
             color=COLORS['orange'], linewidth=1.5, label='Historical')
axes[1].plot(forecast_dates, forecast[:, 1], 
             color=COLORS['red'], linewidth=2, linestyle='--', label='Forecast')
axes[1].fill_between(forecast_dates, forecast_interval[1][:, 1], forecast_interval[2][:, 1],
                     color=COLORS['red'], alpha=0.2, label='95% CI')
axes[1].axvline(x=data.index[-1], color='black', linestyle='-', alpha=0.3)
axes[1].set_title('Inflation Forecast', fontweight='bold')
axes[1].set_ylabel('%')
axes[1].set_xlabel('Date')
axes[1].legend(loc='upper right', frameon=False)

plt.tight_layout()
plt.show()

# Forecast table
print("\nForecast Summary:")
print("="*70)
print(f"{'Date':<12} {'GDP':>10} {'GDP 95% CI':>20} {'Inflation':>10} {'Infl 95% CI':>20}")
print("-"*70)
for i in range(min(6, forecast_steps)):
    print(f"{str(forecast_dates[i].date()):<12} {forecast[i, 0]:>10.2f} "
          f"[{forecast_interval[1][i, 0]:>6.2f}, {forecast_interval[2][i, 0]:>6.2f}] "
          f"{forecast[i, 1]:>10.2f} [{forecast_interval[1][i, 1]:>6.2f}, {forecast_interval[2][i, 1]:>6.2f}]")

## 9. Model Diagnostics

In [None]:
# Residual diagnostics
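# results.resid is a DataFrame when the model was fit on one;
# convert to an array so the positional slicing below works
residuals = np.asarray(results.resid)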

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Residuals over time
axes[0, 0].plot(residuals[:, 0], color=COLORS['blue'], alpha=0.7, linewidth=0.5, label='GDP residuals')
axes[0, 0].plot(residuals[:, 1], color=COLORS['orange'], alpha=0.7, linewidth=0.5, label='Inflation residuals')
axes[0, 0].axhline(y=0, color='black', linestyle='--', alpha=0.5)
axes[0, 0].set_title('Residuals Over Time', fontweight='bold')
axes[0, 0].legend(loc='upper right', frameon=False)

# Cross-correlation of residuals
axes[0, 1].scatter(residuals[:, 0], residuals[:, 1], alpha=0.5, color=COLORS['blue'], s=20)
axes[0, 1].set_xlabel('GDP Residuals')
axes[0, 1].set_ylabel('Inflation Residuals')
axes[0, 1].set_title('Residual Cross-Plot', fontweight='bold')
corr = np.corrcoef(residuals[:, 0], residuals[:, 1])[0, 1]
axes[0, 1].text(0.05, 0.95, f'Corr = {corr:.3f}', transform=axes[0, 1].transAxes, 
                fontsize=12, verticalalignment='top')

# ACF of residuals
plot_acf(residuals[:, 0], ax=axes[1, 0], lags=20, color=COLORS['blue'], title='ACF: GDP Residuals')
axes[1, 0].set_title('ACF: GDP Residuals', fontweight='bold')

plot_acf(residuals[:, 1], ax=axes[1, 1], lags=20, color=COLORS['orange'], title='ACF: Inflation Residuals')
axes[1, 1].set_title('ACF: Inflation Residuals', fontweight='bold')

plt.tight_layout()
plt.show()

# Portmanteau test
print("\nPortmanteau Test for Residual Autocorrelation:")
print(results.test_whiteness(nlags=12).summary())

## 10. Cointegration and VECM

When variables are individually **I(1)** (non-stationary) but some linear combination of them is stationary, they share a long-run equilibrium and are **cointegrated**.

### Vector Error Correction Model (VECM)
$$\Delta \mathbf{Y}_t = \boldsymbol{\Pi} \mathbf{Y}_{t-1} + \sum_{i=1}^{p-1} \boldsymbol{\Gamma}_i \Delta \mathbf{Y}_{t-i} + \boldsymbol{\varepsilon}_t$$

where $\boldsymbol{\Pi} = \boldsymbol{\alpha} \boldsymbol{\beta}'$ contains:
- $\boldsymbol{\beta}$: cointegrating vectors (long-run relationships)
- $\boldsymbol{\alpha}$: adjustment speeds (error correction)
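For a VAR(1), subtracting $\mathbf{Y}_{t-1}$ from both sides gives $\boldsymbol{\Pi} = \mathbf{A}_1 - \mathbf{I}$, so the factorization can be checked by hand. The sketch below does this for a process of the form $x_t = x_{t-1} + e_t$, $y_t = 2x_t + u_t$ (constants ignored), which mirrors the simulated example in this section; the particular $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ are one valid normalization, not estimates.

```python
import numpy as np

# VAR(1) form with Y_t = [x_t, y_t]': substituting x_t = x_{t-1} + e_t into
# y_t = 2*x_t + u_t gives y_t = 2*x_{t-1} + (2*e_t + u_t)
A1 = np.array([[1.0, 0.0],
               [2.0, 0.0]])

Pi = A1 - np.eye(2)                    # long-run matrix of the VECM
rank = np.linalg.matrix_rank(Pi)       # cointegration rank

beta = np.array([[-2.0], [1.0]])       # cointegrating vector: y - 2x is stationary
alpha = np.array([[0.0], [-1.0]])      # x does not adjust; y closes the gap each period

print("Pi =\n", Pi)
print("rank(Pi) =", rank)
print("alpha @ beta' equals Pi:", np.allclose(alpha @ beta.T, Pi))
print("Eigenvalues of A1:", np.linalg.eigvals(A1))
```

The rank of 1 matches the single cointegrating relationship, and the unit eigenvalue of $\mathbf{A}_1$ reflects the one common stochastic trend shared by both series.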

In [None]:
# Generate cointegrated data
np.random.seed(123)
n = 200

# Random walk
x = np.cumsum(np.random.randn(n)) + 50

# Cointegrated with x (long-run relationship: y = 2x + 10 + noise)
y = 2 * x + np.random.randn(n) * 2 + 10

coint_data = pd.DataFrame({'X': x, 'Y': y})

# Plot
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

axes[0].plot(coint_data['X'], color=COLORS['blue'], label='X (random walk)')
axes[0].plot(coint_data['Y'], color=COLORS['orange'], label='Y (cointegrated with X)')
axes[0].set_title('Two Cointegrated Series', fontweight='bold')
axes[0].legend(loc='upper left', frameon=False)

# Spread (should be stationary)
spread = coint_data['Y'] - 2 * coint_data['X']
axes[1].plot(spread, color=COLORS['green'])
axes[1].axhline(y=spread.mean(), color='red', linestyle='--')
axes[1].set_title('Spread: Y - 2X (Stationary)', fontweight='bold')

plt.tight_layout()
plt.show()

# ADF tests
print("\nUnit Root Tests:")
print("="*50)
for col in ['X', 'Y']:
    result = adfuller(coint_data[col])
    print(f"{col}: ADF = {result[0]:.3f}, p-value = {result[1]:.4f} → {'Stationary' if result[1] < 0.05 else 'Non-stationary'}")

result = adfuller(spread)
print(f"Spread: ADF = {result[0]:.3f}, p-value = {result[1]:.4f} → {'Stationary' if result[1] < 0.05 else 'Non-stationary'}")

In [None]:
# Johansen cointegration test
print("\nJohansen Cointegration Test:")
print("="*60)

johansen_result = coint_johansen(coint_data, det_order=0, k_ar_diff=1)

print("\nTrace Statistics:")
print(f"{'Rank':>6} {'Trace Stat':>12} {'Crit 95%':>12} {'Crit 99%':>12}")
print("-"*50)
for i in range(2):
    sig = " **" if johansen_result.lr1[i] > johansen_result.cvt[i, 1] else ""
    print(f"{i:>6} {johansen_result.lr1[i]:>12.2f} {johansen_result.cvt[i, 1]:>12.2f} "
          f"{johansen_result.cvt[i, 2]:>12.2f}{sig}")

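# Sequential trace test: count the leading ranks rejected at the 5% level
est_rank = 0
for i in range(2):
    if johansen_result.lr1[i] <= johansen_result.cvt[i, 1]:
        break
    est_rank += 1

print(f"\nConclusion: the trace test selects cointegration rank r = {est_rank} at the 5% level")
print("→ r = 1 means there is 1 cointegrating relationship")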

In [None]:
# Fit VECM
vecm = VECM(coint_data, k_ar_diff=1, coint_rank=1, deterministic='ci')
vecm_results = vecm.fit()

print("\nVECM Estimation Results:")
print("="*60)
print(vecm_results.summary())

## 11. Cross-Correlation Function

In [None]:
# Cross-correlation between GDP and Inflation
from scipy import signal

# Compute CCF
x = data['GDP_Growth'].values
y = data['Inflation'].values

# Normalize
x_norm = (x - np.mean(x)) / np.std(x)
y_norm = (y - np.mean(y)) / np.std(y)

# Cross-correlation
ccf_values = np.correlate(x_norm, y_norm, mode='full') / len(x)
lags = np.arange(-len(x)+1, len(x))

# Plot
fig, ax = plt.subplots(figsize=(12, 5))

# Only show lags -20 to 20
mask = (lags >= -20) & (lags <= 20)
ax.stem(lags[mask], ccf_values[mask], linefmt=COLORS['blue'], markerfmt='o', basefmt=' ')
ax.axhline(y=0, color='black', linestyle='-')
ax.axhline(y=1.96/np.sqrt(len(x)), color='red', linestyle='--', alpha=0.5)
ax.axhline(y=-1.96/np.sqrt(len(x)), color='red', linestyle='--', alpha=0.5)
ax.axvline(x=0, color='gray', linestyle=':', alpha=0.5)

ax.set_xlabel('Lag (k)')
ax.set_ylabel('Cross-Correlation')
ax.set_title('Cross-Correlation: GDP Growth and Inflation', fontweight='bold')

plt.tight_layout()
plt.show()

print("Interpretation (np.correlate(x, y) pairs x[t+k] with y[t]):")
print("- Positive lag k: Inflation leads GDP by k periods")
print("- Negative lag k: GDP leads Inflation by k periods")
print("- Peak correlation shows dominant lead-lag relationship")

## Summary

### Key Takeaways

1. **VAR models** capture dynamic interdependencies between multiple time series
   - Each variable depends on its own lags AND lags of other variables
   - Equation-by-equation OLS is efficient because every equation has the same regressors

2. **Stability** requires all eigenvalues inside the unit circle
   - Ensures stationarity and convergent impulse responses

3. **Granger causality** tests predictive content, not true causation
   - Useful for understanding lead-lag relationships
   - Sensitive to lag selection and omitted variables

4. **Impulse Response Functions** trace shock propagation
   - Requires identification (Cholesky, structural restrictions)
   - Shows dynamic multipliers over time

5. **FEVD** decomposes forecast variance by shock source
   - Shows relative importance of different shocks

6. **Cointegration** exists when I(1) variables share a long-run equilibrium
   - Use a VECM to model both short-run and long-run dynamics
   - Use the Johansen test to determine the cointegration rank

### Practical Workflow
1. Check stationarity (unit root tests)
2. Select lag order (information criteria)
3. Estimate VAR or VECM
4. Diagnostic checks (residual autocorrelation)
5. Granger causality tests
6. IRF and FEVD analysis
7. Forecasting