82 changes: 44 additions & 38 deletions lectures/ar1_bayes.md
```{code-cell} ipython3
:tags: [hide-output]

!pip install numpyro
```

In addition to what's included in base Anaconda, we need to install the following packages
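
The import cell itself is elided in this diff; judging from the identifiers used below, it presumably looks roughly like the following sketch (reconstructed, not the lecture's exact cell):

```{code-cell} ipython3
:tags: [skip-execution]

import logging

import arviz as az
import jax.numpy as jnp
import matplotlib.pyplot as plt
import numpy as np
import numpyro
import numpyro.distributions as dist
import pymc as pmc
from jax import random
from numpyro.infer import MCMC, NUTS

# Silence pymc's logger, matching the logger.setLevel(logging.CRITICAL)
# context line visible in the diff
logger = logging.getLogger('pymc')
logger.setLevel(logging.CRITICAL)
```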

This lecture uses Bayesian methods offered by [pymc](https://www.pymc.io/projects/docs/en/stable/) and [numpyro](https://num.pyro.ai/en/stable/) to make statistical inferences about two parameters of a univariate first-order autoregression.


The model is a good laboratory for illustrating consequences of alternative ways of modeling the distribution of the initial $y_0$:

- As a fixed number

- As a random variable drawn from the stationary distribution of the process {eq}`eq:themodel`

The first component of the statistical model is

$$
y_{t+1} = \rho y_t + \sigma_x \epsilon_{t+1}, \quad t \geq 0
$$ (eq:themodel)

where the scalars $\rho$ and $\sigma_x$ satisfy $|\rho| < 1$ and $\sigma_x > 0$.

$\{\epsilon_{t+1}\}$ is a sequence of i.i.d. normal random variables with mean $0$ and variance $1$.
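
Equivalently, {eq}`eq:themodel` says that

$$
y_{t+1} \mid y_t \sim {\mathcal N}(\rho y_t, \sigma_x^2),
$$

which is the transition density from which the likelihood below is built.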

The second component of the statistical model is

$$
y_0 \sim {\mathcal N}(\mu_0, \sigma_0^2)
$$ (eq:themodel_2)

Consider a sample $\{y_t\}_{t=0}^T$ governed by this statistical model.

The model implies that the likelihood function of $\{y_t\}_{t=0}^T$ can be **factored**:

$$
f(y_T, y_{T-1}, \ldots, y_0) = f(y_T| y_{T-1}) f(y_{T-1}| y_{T-2}) \cdots f(y_1 | y_0 ) f(y_0)
$$

We want to study how inferences about the unknown parameters $(\rho, \sigma_x)$ depend on what is assumed about the parameters $\mu_0, \sigma_0$ of the distribution of $y_0$.

Below, we study two widely used alternative assumptions:

- $(\mu_0,\sigma_0) = (y_0, 0)$ which means that $y_0$ is drawn from the distribution ${\mathcal N}(y_0, 0)$.

In effect, we are **conditioning on an observed initial value**.

- $\mu_0,\sigma_0$ are functions of $\rho, \sigma_x$ because $y_0$ is drawn from the stationary distribution implied by $\rho, \sigma_x$.
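
The second assumption pins down $(\mu_0, \sigma_0)$ explicitly: taking unconditional means and variances on both sides of {eq}`eq:themodel` and imposing stationarity gives

$$
\mu_0 = \rho \mu_0 \implies \mu_0 = 0, \qquad
\sigma_0^2 = \rho^2 \sigma_0^2 + \sigma_x^2 \implies \sigma_0^2 = \frac{\sigma_x^2}{1-\rho^2}
$$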

Unknown parameters are $\rho, \sigma_x$.

We have independent **prior probability distributions** for $\rho, \sigma_x$ and want to compute a posterior probability distribution after observing a sample $\{y_{t}\}_{t=0}^T$.

The notebook uses `pymc4` and `numpyro` to compute a posterior distribution of $\rho, \sigma_x$.

We will use NUTS samplers to generate samples from the posterior in a chain.

Both of these libraries support NUTS samplers.

NUTS is a form of Markov chain Monte Carlo (MCMC) algorithm that bypasses random-walk behaviour and converges to a target distribution more quickly.

This not only speeds up sampling; it also allows complex models to be fitted without specialised knowledge of the theory underlying those fitting methods.

Thus, we explore consequences of making these alternative assumptions about the distribution of $y_0$:

- A first procedure is to condition on whatever value of $y_0$ is observed.

This amounts to assuming that the probability distribution of the random variable $y_0$ is a Dirac delta function that puts probability one on the observed value of $y_0$.

- A second procedure assumes that $y_0$ is drawn from the stationary distribution of a process described by {eq}`eq:themodel`, so that $y_0 \sim {\mathcal N} \left(0, {\sigma_x^2 \over 1-\rho^2} \right)$, as in the sketch below.
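
The two procedures correspond to two versions of the factored likelihood above. Here is a minimal sketch, assuming `numpy` is imported as `np` and `scipy` is available (this helper is ours, not part of the original lecture):

```{code-cell} ipython3
:tags: [skip-execution]

from scipy.stats import norm

def log_likelihood(y, ρ, σ_x, stationary_init=True):
    "Log of the factored likelihood f(y_0) ∏ f(y_t | y_{t-1})."
    # Conditional terms: y_t | y_{t-1} ~ N(ρ y_{t-1}, σ_x²)
    ll = norm.logpdf(y[1:], loc=ρ * y[:-1], scale=σ_x).sum()
    if stationary_init:
        # Second procedure: y_0 ~ N(0, σ_x² / (1 - ρ²))
        ll += norm.logpdf(y[0], loc=0., scale=σ_x / np.sqrt(1 - ρ**2))
    # First procedure: conditioning on y_0 sets f(y_0) = 1, adding nothing
    return ll
```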

When the initial value $y_0$ is far out in a tail of the stationary distribution, conditioning on an initial value gives a posterior that is **more accurate** in a sense that we'll explain.

We begin by solving a **direct problem** that simulates an AR(1) process.

How we select the initial value $y_0$ matters.

* If we think $y_0$ is drawn from the stationary distribution ${\mathcal N}(0, \frac{\sigma_x^{2}}{1-\rho^2})$, then it is a good idea to use this distribution as $f(y_0)$.

Why? Because $y_0$ contains information about $\rho, \sigma_x$.

* If we suspect that $y_0$ is far in the tails of the stationary distribution -- so that variation in early observations in the sample has a significant **transient component** -- it is better to condition on $y_0$ by setting $f(y_0) = 1$.

To illustrate the issue, we'll begin by choosing an initial $y_0$ that is far out in a tail of the stationary distribution.

```{code-cell} ipython3
def ar1_simulate(ρ, σ, y0, T):

    # Allocate space and draw epsilons
    y = np.empty(T)
    eps = np.random.normal(0., σ, T)

    # Initial condition and step forward
    y[0] = y0
    for t in range(1, T):
        y[t] = ρ * y[t-1] + eps[t]

    return y

σ = 1.
ρ = 0.5
T = 50

np.random.seed(145353452)
y = ar1_simulate(ρ, σ, 10, T)
```
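
Just how far out in the tail is $y_0 = 10$? With $\rho = 0.5$ and $\sigma_x = 1$ the stationary standard deviation is only about $1.15$, so the initial condition sits more than eight standard deviations from the stationary mean:

```{code-cell} ipython3
σ_stat = σ / np.sqrt(1 - ρ**2)
print(f"stationary sd ≈ {σ_stat:.3f}; y0 = 10 is {10/σ_stat:.1f} sds out")
```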

```{code-cell} ipython3
# Plot the simulated path; note the transient arising from y0 = 10
plt.plot(y)
plt.show()
```

First we'll use **pymc4**.

## PyMC Implementation

For a normal distribution in `pymc`, $\mathrm{var} = 1/\tau = \sigma^{2}$.

```{code-cell} ipython3
AR1_model = pmc.Model()

with AR1_model:

    # Start with priors
    rho = pmc.Uniform('rho', lower=-1., upper=1.)  # Assume stable ρ
    sigma = pmc.HalfNormal('sigma', sigma=np.sqrt(10))

    # Expected value of y at the next period (ρ * y)
    yhat = rho * y[:-1]

    # Likelihood of the actual realization (this line is elided in the diff;
    # reconstructed to mirror the numpyro version below)
    y_like = pmc.Normal('y_obs', mu=yhat, sigma=sigma, observed=y[1:])
```
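
The sampling step is elided in the diff; presumably it looks something like the following sketch (the draw counts and the use of `arviz` as `az` are our assumptions):

```{code-cell} ipython3
:tags: [skip-execution]

with AR1_model:
    trace = pmc.sample(50000, tune=10000, return_inferencedata=True)

with AR1_model:
    az.plot_trace(trace, figsize=(17, 6))
```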

Evidently, the posteriors aren't centered on the true values of $.5, 1$ that we used to generate the data.

This is a symptom of the classic **Hurwicz bias** for first order autoregressive processes (see Leonid Hurwicz {cite}`hurwicz1950least`).

The Hurwicz bias is worse the smaller is the sample (see {cite}`Orcutt_Winokur_69`).
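
A quick Monte Carlo check of this small-sample bias (our illustration, not part of the original lecture): for $\rho = 0.5$ and $T = 50$, the least-squares estimate of $\rho$ is biased toward zero on average.

```{code-cell} ipython3
:tags: [skip-execution]

rng = np.random.default_rng(0)
n_sims, bias_T, true_ρ = 10_000, 50, 0.5
est = np.empty(n_sims)
for i in range(n_sims):
    sim = np.empty(bias_T)
    sim[0] = rng.normal(0., 1 / np.sqrt(1 - true_ρ**2))  # stationary start
    for t in range(1, bias_T):
        sim[t] = true_ρ * sim[t-1] + rng.normal()
    est[i] = (sim[:-1] @ sim[1:]) / (sim[:-1] @ sim[:-1])  # OLS slope
print(est.mean())   # a bit below 0.5 on average
```

We now turn to the second procedure, under which the `pymc` model adds a likelihood term for $y_0$ drawn from the stationary distribution: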

```{code-cell} ipython3
AR1_model_y0 = pmc.Model()

with AR1_model_y0:

    # Start with priors
    rho = pmc.Uniform('rho', lower=-1., upper=1.)  # Assume stable ρ
    sigma = pmc.HalfNormal('sigma', sigma=np.sqrt(10))

    # Standard deviation of ergodic y
    y_sd = sigma / pmc.math.sqrt(1 - rho**2)

    # The likelihood terms below are elided in the diff; reconstructed by
    # analogy with the numpyro version later in the lecture
    y_like = pmc.Normal('y_obs', mu=rho * y[:-1], sigma=sigma, observed=y[1:])
    y0_like = pmc.Normal('y0_obs', mu=0., sigma=y_sd, observed=y[0])
```
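
Again the sampler invocation is elided; presumably it mirrors the earlier cell (our sketch, draw counts assumed):

```{code-cell} ipython3
:tags: [skip-execution]

with AR1_model_y0:
    trace_y0 = pmc.sample(50000, tune=10000, return_inferencedata=True)

with AR1_model_y0:
    az.plot_trace(trace_y0, figsize=(17, 6))
```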

Please note how the posterior for $\rho$ has shifted to the right relative to when we conditioned on $y_0$ instead of assuming that $y_0$ is drawn from the stationary distribution.
Think about why this happens.

```{hint}
It is connected to how Bayes Law (conditional probability) solves an **inverse problem** by putting high probability on parameter values that make observations more likely.
```

We'll return to this issue after we use `numpyro` to compute posteriors under our two alternative assumptions about the distribution of $y_0$.

## Numpyro Implementation

```{code-cell} ipython3
def plot_posterior(sample):
    """
    Plot trace and histogram of posterior samples.
    """
    # To np arrays (these extraction lines are elided in the diff)
    rhos = np.asarray(sample['rho'])
    sigmas = np.asarray(sample['sigma'])

    fig, axs = plt.subplots(2, 2, figsize=(17, 6))

    # Plot trace
    axs[0, 0].plot(rhos)    # ρ
    axs[1, 0].plot(sigmas)  # σ

    # Plot posterior
    axs[0, 1].hist(rhos, bins=50, density=True, alpha=0.7)
    axs[0, 1].set_xlim([0, 1])
    axs[1, 1].hist(sigmas, bins=50, density=True, alpha=0.7)

    axs[0, 0].set_title("ρ")
    axs[0, 1].set_title("ρ")
    axs[1, 0].set_title("σ")
    axs[1, 1].set_title("σ")
    plt.show()
```


```{code-cell} ipython3
def AR1_model(data):
    rho = numpyro.sample('rho', dist.Uniform(low=-1., high=1.))
    sigma = numpyro.sample('sigma', dist.HalfNormal(scale=np.sqrt(10)))

    # Expected value of y at the next period (ρ * y)
    yhat = rho * data[:-1]

    # Likelihood of the actual realization (the final line is elided in the
    # diff; reconstructed from the model structure above)
    numpyro.sample('y_obs', dist.Normal(loc=yhat, scale=sigma), obs=data[1:])
```
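
The run of this first `numpyro` model is elided in the diff; presumably it looks like this sketch (seed and draw counts are our assumptions):

```{code-cell} ipython3
:tags: [skip-execution]

mcmc = MCMC(NUTS(AR1_model), num_warmup=5000, num_samples=50000,
            progress_bar=False)
mcmc.run(rng_key=random.PRNGKey(0), data=y)
plot_posterior(mcmc.get_samples())
mcmc.print_summary()
```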

```{code-cell} ipython3
def AR1_model_y0(data):
    # Priors (elided in the diff; assumed to match AR1_model above)
    rho = numpyro.sample('rho', dist.Uniform(low=-1., high=1.))
    sigma = numpyro.sample('sigma', dist.HalfNormal(scale=np.sqrt(10)))

    # Standard deviation of ergodic y
    y_sd = sigma / jnp.sqrt(1 - rho**2)

    # Likelihood of y0 drawn from the stationary distribution (reconstructed)
    numpyro.sample('y0_obs', dist.Normal(loc=0., scale=y_sd), obs=data[0])

    # Expected value of y at the next period (ρ * y)
    yhat = rho * data[:-1]

    # Likelihood of the actual realization.
    numpyro.sample('y_obs', dist.Normal(loc=yhat, scale=sigma), obs=data[1:])
```
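
The corresponding run keeps only its `mcmc2.print_summary()` line in the diff; the rest of this cell is our sketch (seed and draw counts assumed):

```{code-cell} ipython3
:tags: [skip-execution]

mcmc2 = MCMC(NUTS(AR1_model_y0), num_warmup=5000, num_samples=50000,
             progress_bar=False)
mcmc2.run(rng_key=random.PRNGKey(0), data=y)
plot_posterior(mcmc2.get_samples())
mcmc2.print_summary()
```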

Look what happened to the posterior!

It has moved far from the true values of the parameters used to generate the data because of how Bayes' Law (i.e., conditional probability) is telling `numpyro` to explain what it interprets as "explosive" observations early in the sample.

Bayes' Law is able to generate a plausible likelihood for the first observation by driving $\rho \rightarrow 1$ and $\sigma \uparrow$ in order to raise the variance of the stationary distribution.
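
To see the mechanism, note that the stationary standard deviation $\sigma_x/\sqrt{1-\rho^2}$ explodes as $\rho \rightarrow 1$: for example,

$$
\left. \frac{\sigma_x}{\sqrt{1-\rho^2}} \right|_{\rho=0.99,\ \sigma_x=1} \approx 7.1,
$$

so an initial value of $10$ is no longer far out in the tail.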
