<a href="https://colab.research.google.com/github/NikosAng/UEA-macro-lectures/blob/main/ueamacro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Colab bootstrap: install dependencies
!pip install pandas numpy matplotlib statsmodels arch fredapi pandas-datareader tqdm


## 1  Setup & Data Load

In this first section we:

1. Import required libraries.  
2. Pull our key series (Real GDP, Potential GDP, CPI, IP, Fed Funds).  
3. Provide a quick preview of the data.  
4. Define helper routines for subsequent plots and tests.


In [None]:
# 1.2 — Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas_datareader import data as web

# 1.3 — Date range
start = "1960-01-01"
end   = pd.Timestamp.today().strftime("%Y-%m-%d")

# 1.4 — FRED tickers dictionary
tickers = {
    "GDP":    "GDPC1",      # Real GDP (qtrly, SA)
    "POT_GDP":"GDPPOT",     # Potential GDP
    "CPI":    "CPIAUCSL",   # CPI all items
    "IP":     "INDPRO",     # Industrial production index
    "FFR":    "FEDFUNDS"    # Effective fed funds rate
}

# 1.5 — Download into a single DataFrame
data = pd.concat(
    {name: web.DataReader(code, "fred", start, end)[code]  # select the Series so columns stay flat
     for name, code in tickers.items()},
    axis=1
)

# 1.6 — Quick preview
data.tail()


## 2 Visual Diagnostics

We start by looking at basic time-series plots of real GDP, its growth rate, and simple diagnostics.

* **Level vs. growth** – visual cue for I(1) vs I(0).  
* **ACF / PACF** – check autocorrelation structure.  
* **Rolling variance** – see whether volatility is constant or trending.
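
To make the I(1) versus I(0) cue concrete, the sketch below simulates a random walk with drift and takes its first difference. This is a simulated stand-in for a log-level series, not GDP data; the seed, drift, and sample size are illustrative assumptions, and only the standard library is used so the cell runs on its own. Comparing the sample variance across the two halves of each series is a rough numeric counterpart to the rolling-variance plot.

```python
import random
import statistics

random.seed(0)                       # illustrative seed
n, drift = 400, 0.5                  # assumed sample size and drift per period

# I(0) shocks, and the I(1) level obtained by cumulating drift + shock
shocks = [random.gauss(0.0, 1.0) for _ in range(n)]
level = []
total = 0.0
for e in shocks:
    total += drift + e
    level.append(total)

# First-differencing the level recovers drift + shock, an I(0) series
diff = [level[t] - level[t - 1] for t in range(1, n)]

def half_variances(x):
    """Sample variance of the first and second half of a series."""
    mid = len(x) // 2
    return statistics.variance(x[:mid]), statistics.variance(x[mid:])

print("level variance by half:", half_variances(level))
print("diff  variance by half:", half_variances(diff))
```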


In [None]:
# --- Section 2: Load demo GDP series (statsmodels macrodata) ---------
import pandas as pd, numpy as np, matplotlib.pyplot as plt
from statsmodels.datasets import macrodata

demo = macrodata.load_pandas().data        # quarterly 1959-2009 sample
dates = pd.date_range("1959-01-01", periods=len(demo), freq="QS")
gdp   = pd.Series(demo["realgdp"].values, index=dates, name="GDP")

gdp.tail()      # quick preview


fig, axes = plt.subplots(2, 1, figsize=(9, 6), sharex=True)

# Log level
np.log(gdp).plot(ax=axes[0], title="Log Real GDP Level")
axes[0].set_ylabel("Log Level")

# Quarterly log growth in percent (q/q, not annualised)
growth = np.log(gdp).diff().mul(100)
growth.plot(ax=axes[1], title="Real GDP Growth Rate (%, q/q)")
axes[1].set_ylabel("Pct. Change")

plt.tight_layout()



In [None]:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, ax = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(growth.dropna(), lags=40, ax=ax[0])
ax[0].set_title("ACF: GDP Growth")
plot_pacf(growth.dropna(), lags=40, ax=ax[1])
ax[1].set_title("PACF: GDP Growth")
plt.tight_layout()


In [None]:
fig, ax = plt.subplots(figsize=(8, 3))
rolling_var = growth.rolling(window=40).var()
rolling_var.plot(ax=ax)
ax.set_title("Rolling Variance of GDP Growth (40-quarter window)")
ax.set_ylabel("Variance")
ax.set_xlabel("Date")
plt.tight_layout()


## 3 Unit-Root Testing Suite

We apply four complementary tests to **log GDP** and **GDP growth**:

| Test | Null \(H_0\) | Alternative | Note |
|------|-------------|-------------|------|
| ADF | Unit root | Stationary | Parametric lag augmentation; low power when the root is near 1 |
| Phillips–Perron | Unit root | Stationary | Non-parametric correction for serial correlation |
| DF-GLS | Unit root | Stationary | GLS-detrended; higher power than ADF |
| KPSS | (Trend-)Stationary | Unit root | Opposite null; useful cross-check |

Interpreting results:  
*Reject the null* in ADF/PP/DF-GLS → evidence of stationarity.  
*Reject the null* in KPSS → evidence of a unit root.
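
The decision rule above can be sketched as a small helper that combines one unit-root-null test with KPSS. The function name `classify_integration`, the 5% threshold, and the p-values passed in below are all hypothetical illustrations, not results from the GDP sample.

```python
def classify_integration(p_unit_root, p_kpss, alpha=0.05):
    """Combine a unit-root-null test (ADF/PP/DF-GLS) with KPSS.

    p_unit_root: p-value of a test whose null is a unit root.
    p_kpss:      p-value of KPSS, whose null is (trend-)stationarity.
    """
    reject_unit_root = p_unit_root < alpha
    reject_stationarity = p_kpss < alpha
    if reject_unit_root and not reject_stationarity:
        return "stationary (I(0))"
    if not reject_unit_root and reject_stationarity:
        return "unit root (I(1))"
    if reject_unit_root and reject_stationarity:
        return "conflicting: both nulls rejected"
    return "inconclusive: neither null rejected"

# Made-up p-values, for illustration only
print(classify_integration(0.40, 0.01))
print(classify_integration(0.001, 0.30))
```

The two mixed outcomes are kept separate on purpose: when both nulls or neither null is rejected, the tests disagree and the sample alone does not settle the question.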


In [None]:
# Install arch for unit‐root tests
!pip install arch


In [None]:
from arch.unitroot import ADF, PhillipsPerron, DFGLS, KPSS
import numpy as np
import pandas as pd

def unit_root_tests(series, lags=4, trend='ct'):
    """Return (test statistic, p-value) for four unit-root tests."""
    s = series.dropna()
    tests = {'ADF': ADF, 'PP': PhillipsPerron, 'DF-GLS': DFGLS, 'KPSS': KPSS}
    results = {}
    for name, test_cls in tests.items():
        fitted = test_cls(s, lags=lags, trend=trend)  # build each test only once
        results[name] = (fitted.stat, fitted.pvalue)
    return results



# Log GDP level (quarterly)
lvl_res = pd.DataFrame.from_dict(
    unit_root_tests(np.log(gdp), lags=4, trend='ct'),
    orient='index', columns=['Statistic', 'P-value']
)

# GDP growth (first difference)
grw_res = pd.DataFrame.from_dict(
    unit_root_tests(np.log(gdp).diff().dropna(), lags=4, trend='c'),
    orient='index', columns=['Statistic', 'P-value']
)

print("=== Log GDP Level ===")
display(lvl_res)

print("\n=== GDP Growth ===")
display(grw_res)






**Typical pattern (US GDP sample):**

* Log level – ADF, PP, DF-GLS *fail* to reject the unit-root null (\(p > 0.10\)), while KPSS
  *rejects* the stationarity null ⇒ evidence points to a **unit root (I(1))**.

* Growth rate – ADF/PP/DF-GLS all *reject* the unit-root null, KPSS
  *does not reject* stationarity ⇒ growth is **stationary (I(0))**.

This matches our visual impression from Section 2.


## 4 Persistence Measures

We’ll quantify how long shocks to GDP growth persist via:

1. **Half‐life** of an AR(1) process  
2. **Impulse‐Response Function (IRF)** of that AR(1)  
3. **Variance‐Ratio (VR)** statistic at various horizons
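
For the first two measures, an AR(1) with coefficient φ has impulse response φ^h at horizon h, so the half-life solves φ^h = 0.5, giving h = ln(0.5) / ln(|φ|). A minimal sketch of both formulas follows; the helper names and the value φ = 0.5 are hypothetical and are not estimates from the GDP data.

```python
import math

def ar1_half_life(phi):
    """Periods until an AR(1) shock decays to half its size: ln(0.5)/ln(|phi|)."""
    if not 0 < abs(phi) < 1:
        raise ValueError("half-life requires 0 < |phi| < 1")
    return math.log(0.5) / math.log(abs(phi))

def ar1_irf(phi, horizon):
    """Response at h = 0..horizon-1 to a unit shock: phi**h."""
    return [phi ** h for h in range(horizon)]

example_phi = 0.5   # hypothetical value, chosen so the half-life is one period
print(ar1_half_life(example_phi))
print(ar1_irf(example_phi, 4))
```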


In [None]:
import statsmodels.api as sm

# 4.1 — Fit AR(1) to GDP growth
y = growth.dropna()  # from Section 2: growth = np.log(gdp).diff().mul(100)
ar1 = sm.tsa.ARIMA(y, order=(1, 0, 0)).fit()
phi = ar1.params["ar.L1"]  # AR(1) coefficient, selected by name

# Compute half‐life
hl = np.log(0.5) / np.log(abs(phi))
print(f"Estimated φ = {phi:.3f}")
print(f"Half‐life ≈ {hl:.1f} quarters")

# IRF plot
irf_horizon = 20
irf = [phi**h for h in range(irf_horizon)]
plt.figure(figsize=(6,3))
plt.plot(range(irf_horizon), irf, marker='o')
plt.title("AR(1) Impulse‐Response Function")
plt.xlabel("h quarters")
plt.ylabel("Response to 1-unit shock")
plt.tight_layout()
plt.show()





# 4.2 — Variance‐Ratio function
def variance_ratio(series, k):
    num = series.diff(k).var()
    den = k * series.diff(1).var()
    return num / den

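# Compute VR at several horizons. The ratio is applied to the log level, so the
# k-period difference is cumulative k-quarter growth (VR(k) = 1 for a random walk).
ks = [1, 4, 8, 12, 20]
vr_values = [variance_ratio(np.log(gdp), k) for k in ks]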

# Plot VR(k)
plt.figure(figsize=(6,3))
plt.plot(ks, vr_values, marker='o')
plt.title("Variance Ratio vs. Horizon k")
plt.xlabel("k (quarters)")
plt.ylabel("VR(k)")
plt.xticks(ks)
plt.tight_layout()
plt.show()

# Display the numbers
import pandas as pd
pd.Series(vr_values, index=ks, name="VR(k)")








## 5 Trend/Cycle Decomposition

We’ll compare two approaches:

1. **Hodrick–Prescott (HP) filter** – purely statistical smoothing  
2. **Beveridge–Nelson (BN)** via a local‐level structural model  
   (trend = long-run forecast; cycle = deviation)
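
For reference, the HP filter picks the trend \(\tau_t\) by trading off fit against smoothness. The notation below is ours (\(y_t\) for log GDP, \(c_t\) for the cycle); the smoothing parameter \(\lambda = 1600\) is the conventional choice for quarterly data.

```latex
\min_{\{\tau_t\}_{t=1}^{T}} \;
\sum_{t=1}^{T} \left( y_t - \tau_t \right)^2
\;+\; \lambda \sum_{t=2}^{T-1}
\left[ \left( \tau_{t+1} - \tau_t \right) - \left( \tau_t - \tau_{t-1} \right) \right]^2 ,
\qquad c_t = y_t - \tau_t
```

As \(\lambda \to 0\) the trend reproduces the data, and as \(\lambda \to \infty\) it approaches a linear time trend.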


In [None]:
from statsmodels.tsa.filters.hp_filter import hpfilter

# Apply HP filter to log GDP
log_gdp = np.log(gdp)
cycle_hp, trend_hp = hpfilter(log_gdp, lamb=1600)

# Plot
fig, ax = plt.subplots(figsize=(9,4))
trend_hp.plot(ax=ax, label="HP Trend")
cycle_hp.plot(ax=ax, label="HP Cycle (dashed)", linestyle="--")
ax.set_title("HP Filter Decomposition")
ax.legend()
plt.tight_layout()
plt.show()



from statsmodels.tsa.statespace.structural import UnobservedComponents

# Fit local‐level model to log GDP
mod = UnobservedComponents(log_gdp, level='local level')
res = mod.fit(disp=False)

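# Extract the smoothed level (state 0) as the trend; the cycle is the deviation.
# Note: the smoothed state uses the full sample, whereas a strict BN trend
# (long-run forecast given data up to t) corresponds to the filtered state.
bn_trend  = pd.Series(res.smoothed_state[0], index=log_gdp.index)
bn_cycle  = log_gdp - bn_trend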

# Compare HP vs BN
fig, axes = plt.subplots(2,1,figsize=(9,6), sharex=True)

axes[0].plot(trend_hp, label="HP Trend")
axes[0].plot(bn_trend, label="BN Trend", linestyle='--')
axes[0].set_title("HP vs BN: Trend")
axes[0].legend()

axes[1].plot(cycle_hp, label="HP Cycle")
axes[1].plot(bn_cycle, label="BN Cycle", linestyle='--')
axes[1].set_title("HP vs BN: Cycle")
axes[1].legend()

plt.tight_layout()
plt.show()


## 6 Cointegration & ECM

We’ll explore long‐run equilibria between I(1) series using:

1. **Engle-Granger two-step**: regress one level series on the other, then run an ADF test on the residuals.  
2. **Johansen method**: VAR‐based test for cointegration rank, then VECM estimation.  
3. **Error‐Correction Model**: connect short‐run dynamics to long‐run disequilibrium.
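
The two Engle-Granger steps can be sketched on simulated data where the answer is known by construction. Everything below is an illustrative assumption (seed, sample size, true slope of 2, AR(1) equilibrium error); it uses only the standard library and omits the critical values that a real test needs.

```python
import random

random.seed(1)                      # illustrative seed
n, true_beta, rho = 500, 2.0, 0.5   # assumed sample size, slope, error persistence

# x is a random walk (I(1)); u is a stationary AR(1) equilibrium error
x, u = [], []
x_t, u_t = 0.0, 0.0
for _ in range(n):
    x_t += random.gauss(0.0, 1.0)
    u_t = rho * u_t + random.gauss(0.0, 1.0)
    x.append(x_t)
    u.append(u_t)
y = [1.0 + true_beta * xi + ui for xi, ui in zip(x, u)]

def ols_slope_intercept(xs, ys):
    """Closed-form simple OLS of ys on a constant and xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

# Step 1: long-run regression in levels
beta_hat, const_hat = ols_slope_intercept(x, y)
resid = [yi - const_hat - beta_hat * xi for xi, yi in zip(x, y)]

# Step 2: Dickey-Fuller regression on the residuals (no constant, no lags):
# d_resid_t = gamma * resid_{t-1} + error, with gamma < 0 under mean reversion
lagged = resid[:-1]
d_resid = [resid[t] - resid[t - 1] for t in range(1, n)]
gamma_hat = sum(a * b for a, b in zip(lagged, d_resid)) / sum(a * a for a in lagged)

print(f"beta_hat = {beta_hat:.3f}, gamma_hat = {gamma_hat:.3f}")
```

A negative `gamma_hat` means the residual is pulled back toward zero, which is the error-correction idea in item 3. Whether it is negative enough to reject no cointegration has to be judged against Engle-Granger critical values, not standard ADF ones.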


In [None]:
import pandas as pd
import numpy as np
from statsmodels.tsa.vector_ar.vecm import coint_johansen, VECM
from arch.unitroot import ADF

# 6.2.1 — Prepare two distinct I(1) series: GDP & Consumption
log_gdp = np.log(gdp)
# demo["realcons"] is real consumption from statsmodels’ macrodata
realcons = pd.Series(demo["realcons"].values, index=dates, name="CONS")
log_cons = np.log(realcons)

df_coint = pd.concat([log_gdp, log_cons], axis=1).dropna()

# 6.2.2 — Engle‐Granger two‐step
import statsmodels.api as sm
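from statsmodels.tsa.stattools import coint

# coint() runs both steps (OLS of GDP on CONS, then ADF on the residuals) and
# uses Engle-Granger critical values; a plain ADF p-value on estimated
# residuals would overstate the evidence for cointegration.
eg_stat, eg_pval, _ = coint(df_coint["GDP"], df_coint["CONS"],
                            trend='c', maxlag=4, autolag=None)
print("Engle-Granger Residual ADF:")
print(f"  Statistic = {eg_stat:.3f}, p-value = {eg_pval:.3f}")
print("  → Stationary residual ⇒ cointegration"
      if eg_pval < 0.05 else "  → No cointegration detected")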

# 6.2.3 — Johansen test
jres = coint_johansen(df_coint, det_order=0, k_ar_diff=1)
print("\nJohansen Trace Statistics:")
for r, (stat, crit) in enumerate(zip(jres.lr1, jres.cvt[:,1])):
    print(f" r ≤ {r}: {stat:.2f} (5% crit = {crit:.2f})")

# 6.2.4 — If rank=1, estimate VECM
if jres.lr1[0] > jres.cvt[0,1]:
    vecm = VECM(df_coint, k_ar_diff=1, coint_rank=1, deterministic='ci').fit()
    print("\nVECM adjustment speeds (alpha):")
    display(vecm.alpha)
