# 02 VAR and Impulse Responses

Fit VARs, test Granger causality, and interpret IRFs.


## Table of Contents
- [Build stationary dataset](#build-stationary-dataset)
- [Fit VAR + choose lags](#fit-var-choose-lags)
- [Granger causality](#granger-causality)
- [IRFs + forecasting](#irfs-forecasting)
- [Checkpoint (Self-Check)](#checkpoint-self-check)
- [Solutions (Reference)](#solutions-reference)


## Why This Notebook Matters
Time-series econometrics notebooks build the classical toolkit you need before trusting macro regressions:
- stationarity + unit roots,
- cointegration + error correction,
- VAR dynamics and impulse responses.


## What You Will Produce
- (no file output; learning/analysis notebook)

## Success Criteria
- You can explain what you built and why each step exists.
- You can run your work end-to-end without undefined variables.

## Common Pitfalls
- Running cells top-to-bottom without reading the instructions.
- Leaving `...` placeholders in code cells.
- Running tests without plotting or transforming the series first.
- Treating impulse responses as structural causality without an identification story.

## Matching Guide
- `docs/guides/08_time_series_econ/02_var_impulse_responses.md`



## How To Use This Notebook
- This notebook is hands-on. Most code cells are incomplete on purpose.
- Complete each TODO, then run the cell.
- Use the matching guide (`docs/guides/08_time_series_econ/02_var_impulse_responses.md`) for deep explanations and alternative examples.
- Write short interpretation notes as you go (what changed, why it matters).



<a id="environment-bootstrap"></a>
## Environment Bootstrap
Run this cell first. It makes the repo importable and defines common directories.



In [None]:
from __future__ import annotations

from pathlib import Path
import sys


def find_repo_root(start: Path) -> Path:
    p = start
    for _ in range(8):
        if (p / 'src').exists() and (p / 'docs').exists():
            return p
        p = p.parent
    raise RuntimeError('Could not find repo root. Start Jupyter from the repo root.')


PROJECT_ROOT = find_repo_root(Path.cwd())
if str(PROJECT_ROOT) not in sys.path:
    sys.path.append(str(PROJECT_ROOT))

DATA_DIR = PROJECT_ROOT / 'data'
RAW_DIR = DATA_DIR / 'raw'
PROCESSED_DIR = DATA_DIR / 'processed'
SAMPLE_DIR = DATA_DIR / 'sample'

PROJECT_ROOT



## Goal
Fit a VAR on transformed macro series and interpret:
- lag selection
- Granger causality
- impulse response functions (IRFs)



## Primer: Classical time-series econometrics with statsmodels (ADF/KPSS, VAR)

This repo already uses time-aware evaluation for ML.
This primer introduces the “classical” time-series econometrics toolkit in `statsmodels`.

### Stationarity and unit roots (ADF / KPSS)
Two common tests:
- **ADF**: null = unit root (nonstationary)
- **KPSS**: null = stationary

```python
from statsmodels.tsa.stattools import adfuller, kpss

# x is a 1D array-like with no missing values
# adf_stat, adf_p, *_ = adfuller(x)
# kpss_stat, kpss_p, *_ = kpss(x, regression='c', nlags='auto')
```

Interpretation habit:
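- If the ADF p-value is small → evidence against a unit root.
- If the KPSS p-value is small → evidence against stationarity.
- The two tests have opposite nulls, so a conclusion is most convincing when they agree (for example, ADF rejects and KPSS does not).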

### VAR: multivariate autoregression
VAR models multiple series together:
```python
from statsmodels.tsa.api import VAR

# df: DataFrame of stationary-ish series with a DatetimeIndex
# model = VAR(df)
# res = model.fit(maxlags=8, ic='aic')  # or choose lags manually
# print(res.summary())
```

Useful tools:
```python
# res.test_causality('y', ['x1', 'x2'])      # Granger causality: do lags of x1, x2 help predict y?
# irf = res.irf(12)                         # impulse responses out to 12 periods ahead
# irf.plot(orth=True)                       # orthogonalized (ordering matters)
```

### Practical cautions
- Nonstationary series can create **spurious regression** results.
- IRFs depend on identification choices (e.g., Cholesky ordering).
- Macro series are revised and can have structural breaks; treat results as conditional and fragile.
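
The point about Cholesky ordering can be seen without fitting any model. The sketch below uses a made-up 2x2 residual covariance matrix `sigma` (an assumption for illustration) and shows that two different orderings give two different impact matrices, both of which reproduce the same covariance.

```python
import numpy as np

# Hypothetical residual covariance for two variables (illustrative numbers only)
sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])

# Ordering A: variable 0 first. The Cholesky factor is lower triangular,
# so variable 0 does not respond on impact to the shock of variable 1.
impact_a = np.linalg.cholesky(sigma)

# Ordering B: variable 1 first. Permute, factor, then permute back so that
# rows and columns refer to the original variable positions again.
perm = np.array([1, 0])
inverse_perm = np.argsort(perm)
impact_b_permuted = np.linalg.cholesky(sigma[np.ix_(perm, perm)])
impact_b = impact_b_permuted[np.ix_(inverse_perm, inverse_perm)]

print("ordering A impact matrix:\n", impact_a)
print("ordering B impact matrix:\n", impact_b)
```

Both matrices satisfy `P @ P.T == sigma`, so the data cannot choose between them. The ordering is an identification assumption you have to defend.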


<a id="build-stationary-dataset"></a>
## Build stationary dataset

### Goal
Build a small stationary-ish dataset to fit a VAR.
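
As a small sketch of the usual transformation choices, the example below applies a first difference to a rate-like series and a log-difference (approximate percent growth) to an index-like series. The toy values and the column names `rate` and `index_level` are hypothetical, not taken from the sample panel.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2000-01-01", periods=6, freq="MS")
toy = pd.DataFrame(
    {
        "rate": [5.0, 5.1, 5.3, 5.2, 5.0, 4.9],                    # already in percent
        "index_level": [100.0, 101.0, 102.5, 102.0, 103.0, 104.5],  # arbitrary index units
    },
    index=idx,
)

transformed = pd.DataFrame(
    {
        # Change in percentage points
        "d_rate": toy["rate"].diff(),
        # Log-difference times 100: approximate percent growth per period
        "dlog_index": np.log(toy["index_level"]).diff() * 100,
    }
).dropna()  # the first row is NaN after differencing

print(transformed)
```

Both transforms use only current and past values, so they do not leak future information into earlier rows.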



### Your Turn: Load and transform


In [None]:
import pandas as pd

panel = pd.read_csv(SAMPLE_DIR / 'panel_monthly_sample.csv', index_col=0, parse_dates=True).dropna()

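# TODO: Choose a few columns and difference them.
# Note: plain first differences are used here; for an index level such as INDPRO,
# a log-difference (approximate growth rate) is a common alternative.
df = panel[['UNRATE', 'FEDFUNDS', 'INDPRO']].astype(float).diff().dropna()
df.head()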



<a id="fit-var-choose-lags"></a>
## Fit VAR + choose lags

### Goal
Fit a VAR and choose lags using an information criterion.



### Your Turn: Fit VAR


In [None]:
from statsmodels.tsa.api import VAR

# TODO: Fit and inspect chosen lag order
res = VAR(df).fit(maxlags=8, ic='aic')
# k_ar is the lag order selected by the information criterion
res.k_ar



<a id="granger-causality"></a>
## Granger causality

### Goal
Run at least one Granger causality test.

Reminder: Granger causality is predictive, not structural: it asks whether lags of one series improve forecasts of another, given the other lags in the model.



### Your Turn: Test causality


In [None]:
# Example: do lagged FEDFUNDS help predict UNRATE?
res.test_causality('UNRATE', ['FEDFUNDS']).summary()



<a id="irfs-forecasting"></a>
## IRFs + forecasting

### Goal
Compute and plot impulse responses.

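Caution:
- Orthogonalized IRFs rely on a Cholesky decomposition, so they depend on the column order of `df`: a shock to an earlier variable can move later variables within the same period, but not the reverse.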



### Your Turn: IRFs


In [None]:
irf = res.irf(12)
irf.plot(orth=True)



<a id="checkpoint-self-check"></a>
## Checkpoint (Self-Check)
Run a few asserts and write 2-3 sentences summarizing what you verified.
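
One way to sketch these checks, including a simple leakage test, is shown below on a synthetic table. The names `levels` and `toy_df` are hypothetical; substitute your own `df` and the level data it was built from.

```python
import numpy as np
import pandas as pd

# Synthetic monthly levels and their first differences
idx = pd.date_range("2010-01-01", periods=48, freq="MS")
rng = np.random.default_rng(3)
levels = pd.DataFrame(
    rng.standard_normal((48, 2)).cumsum(axis=0),
    index=idx,
    columns=["s1", "s2"],
)
toy_df = levels.diff().dropna()

# Well-formedness checks
assert isinstance(toy_df.index, pd.DatetimeIndex)
assert toy_df.index.is_monotonic_increasing
assert toy_df.shape[0] > 30
assert not toy_df.isna().any().any()

# Leakage check: recomputing the transform on a truncated history must
# reproduce the same early rows. If it did not, the transform would be
# using information from later dates.
cutoff = 36
truncated = levels.iloc[:cutoff].diff().dropna()
assert truncated.index.equals(toy_df.index[: len(truncated)])
assert np.allclose(truncated.to_numpy(), toy_df.iloc[: len(truncated)].to_numpy())
print("all checks passed")
```

The truncation test extends to any transformation: if early rows change when later data are removed, the transform is looking ahead.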



In [None]:
import pandas as pd

# TODO: Validate your time series table is well-formed.
# Example (adjust variable names):
# assert isinstance(df.index, pd.DatetimeIndex)
# assert df.index.is_monotonic_increasing
# assert df.shape[0] > 30
#
# TODO: If you built transformed series (diff/logdiff), confirm no future leakage.
# Hint: transformations should only use past/current values (shift/diff), never future.
...



## Extensions (Optional)
- Try one additional variant beyond the main path (a different variable ordering, a different lag order, or a different set of series).
- Write down what improved, what got worse, and your hypothesis for why.



## Reflection
- What did you assume implicitly (about timing, availability, stationarity, or costs)?
- If you had to ship this model, what would you monitor?



<a id="solutions-reference"></a>
## Solutions (Reference)

Try the TODOs first. Use these only to unblock yourself or to compare approaches.

<details><summary>Solution: Build stationary dataset</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 02_var_impulse_responses — Build stationary dataset
import pandas as pd

panel = pd.read_csv(SAMPLE_DIR / 'panel_monthly_sample.csv', index_col=0, parse_dates=True).dropna()
df = panel[['UNRATE', 'FEDFUNDS', 'INDPRO']].astype(float).diff().dropna()
df.head()
```

</details>

<details><summary>Solution: Fit VAR + choose lags</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 02_var_impulse_responses — Fit VAR + choose lags
from statsmodels.tsa.api import VAR

res = VAR(df).fit(maxlags=8, ic='aic')
res.k_ar
```

</details>

<details><summary>Solution: Granger causality</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 02_var_impulse_responses — Granger causality
# Example: do lagged FEDFUNDS help predict UNRATE?
res.test_causality('UNRATE', ['FEDFUNDS']).summary()
```

</details>

<details><summary>Solution: IRFs + forecasting</summary>

_One possible approach. Your variable names may differ; align them with the notebook._

```python
# Reference solution for 02_var_impulse_responses — IRFs + forecasting
irf = res.irf(12)
irf.plot(orth=True)
```

</details>

