
Commit b45dab0: Styling fixes

1 parent 5fb07e9

File tree: 1 file changed (+46, -48 lines)

lectures/ar1_bayes.md: 46 additions & 48 deletions
@@ -43,96 +43,95 @@ logger.setLevel(logging.CRITICAL)
```

This lecture uses Bayesian methods offered by [pymc](https://www.pymc.io/projects/docs/en/stable/) and [numpyro](https://num.pyro.ai/en/stable/) to make statistical inferences about two parameters of a univariate first-order autoregression.

The model is a good laboratory for illustrating consequences of alternative ways of modeling the distribution of the initial $y_0$:

- As a fixed number

- As a random variable drawn from the stationary distribution of the $\{y_t\}$ stochastic process

The first component of the statistical model is

$$
y_{t+1} = \rho y_t + \sigma_x \epsilon_{t+1}, \quad t \geq 0
$$ (eq:themodel)

where the scalars $\rho$ and $\sigma_x$ satisfy $|\rho| < 1$ and $\sigma_x > 0$;
$\{\epsilon_{t+1}\}$ is a sequence of i.i.d. normal random variables with mean $0$ and variance $1$.

The second component of the statistical model is

$$
y_0 \sim {\cal N}(\mu_0, \sigma_0^2)
$$ (eq:themodel_2)

Consider a sample $\{y_t\}_{t=0}^T$ governed by this statistical model.

The model implies that the likelihood function of $\{y_t\}_{t=0}^T$ can be **factored**:

$$
f(y_T, y_{T-1}, \ldots, y_0) = f(y_T| y_{T-1}) f(y_{T-1}| y_{T-2}) \cdots f(y_1 | y_0 ) f(y_0)
$$

where we use $f$ to denote a generic probability density.

The statistical model {eq}`eq:themodel`-{eq}`eq:themodel_2` implies

$$
\begin{aligned}
f(y_t | y_{t-1}) & \sim {\mathcal N}(\rho y_{t-1}, \sigma_x^2) \\
f(y_0) & \sim {\mathcal N}(\mu_0, \sigma_0^2)
\end{aligned}
$$
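To make the factorization concrete, here is a small sketch (not part of the lecture's code; the sample and parameter values are illustrative assumptions) that evaluates the log-likelihood of an AR(1) sample by summing the conditional terms together with the $f(y_0)$ term:

```python
import numpy as np
from scipy.stats import norm

def ar1_log_likelihood(y, rho, sigma_x, mu_0, sigma_0):
    """Factored log-likelihood: log f(y_0) + sum_t log f(y_t | y_{t-1})."""
    ll = norm.logpdf(y[0], loc=mu_0, scale=sigma_0)                   # initial-condition term f(y_0)
    ll += norm.logpdf(y[1:], loc=rho * y[:-1], scale=sigma_x).sum()   # transition terms f(y_t | y_{t-1})
    return ll

# Illustrative sample and parameters (assumptions for this sketch)
rng = np.random.default_rng(0)
rho, sigma_x, T = 0.5, 1.0, 100
y = np.empty(T + 1)
y[0] = rng.normal(0, sigma_x / np.sqrt(1 - rho**2))   # draw y_0 from the stationary distribution
for t in range(T):
    y[t + 1] = rho * y[t] + sigma_x * rng.normal()

stat_sd = sigma_x / np.sqrt(1 - rho**2)
print(ar1_log_likelihood(y, rho, sigma_x, mu_0=0.0, sigma_0=stat_sd))
```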
We want to study how inferences about the unknown parameters $(\rho, \sigma_x)$ depend on what is assumed about the parameters $\mu_0, \sigma_0$ of the distribution of $y_0$.

Below, we study two widely used alternative assumptions:

- $(\mu_0,\sigma_0) = (y_0, 0)$, which means that $y_0$ is drawn from the distribution ${\mathcal N}(y_0, 0)$; in effect, we are **conditioning on an observed initial value**.

- $\mu_0,\sigma_0$ are functions of $\rho, \sigma_x$ because $y_0$ is drawn from the stationary distribution implied by $\rho, \sigma_x$.

**Note:** We do **not** treat a third possible case in which $\mu_0,\sigma_0$ are free parameters to be estimated.

The unknown parameters are $\rho, \sigma_x$.

We have independent **prior probability distributions** for $\rho, \sigma_x$ and want to compute a posterior probability distribution after observing a sample $\{y_{t}\}_{t=0}^T$.

The notebook uses `pymc4` and `numpyro` to compute a posterior distribution of $\rho, \sigma_x$.

Thus, we explore consequences of making these alternative assumptions about the distribution of $y_0$:

- A first procedure is to condition on whatever value of $y_0$ is observed. This amounts to assuming that the probability distribution of the random variable $y_0$ is a Dirac delta function that puts probability one on the observed value of $y_0$.

- A second procedure assumes that $y_0$ is drawn from the stationary distribution of a process described by {eq}`eq:themodel`,
so that $y_0 \sim {\cal N} \left(0, {\sigma_x^2 \over 1-\rho^2} \right)$.

When the initial value $y_0$ is far out in a tail of the stationary distribution, conditioning on an initial value gives a posterior that is **more accurate** in a sense that we'll explain.

Basically, when $y_0$ happens to be in a tail of the stationary distribution and we **don't condition on $y_0$**, the likelihood function for $\{y_t\}_{t=0}^T$ adjusts the posterior distribution of the parameter pair $\rho, \sigma_x$ to make the observed value of $y_0$ more likely than it really is under the stationary distribution, thereby adversely twisting the posterior in short samples.

An example below shows how not conditioning on $y_0$ adversely shifts the posterior probability distribution of $\rho$ toward larger values.
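Before turning to that example, here is a small self-contained sketch (not the lecture's code; parameter values are illustrative) of the mechanism: with $\sigma_x$ held fixed, the log-likelihood that includes the stationary $f(y_0)$ term is typically maximized at a larger $\rho$ than the conditional log-likelihood when $y_0$ sits far out in the tail.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
rho_true, sigma_x, T = 0.5, 1.0, 100
y = np.empty(T + 1)
y[0] = 10.0        # an initial condition far out in the tail of the stationary distribution
for t in range(T):
    y[t + 1] = rho_true * y[t] + sigma_x * rng.normal()

def log_lik(rho, condition_on_y0):
    """Log-likelihood of the sample; optionally includes the stationary f(y_0) term."""
    ll = norm.logpdf(y[1:], loc=rho * y[:-1], scale=sigma_x).sum()
    if not condition_on_y0:
        stat_sd = sigma_x / np.sqrt(1 - rho**2)
        ll += norm.logpdf(y[0], loc=0.0, scale=stat_sd)
    return ll

rho_grid = np.linspace(-0.99, 0.99, 199)
for cond in (True, False):
    rho_best = rho_grid[np.argmax([log_lik(r, cond) for r in rho_grid])]
    print(f"condition_on_y0={cond}: maximizing rho over the grid = {rho_best:.2f}")
```

With $\sigma_x$ fixed, raising $\rho$ raises the stationary variance $\sigma_x^2/(1-\rho^2)$, which is the only way this restricted likelihood can rationalize the extreme $y_0$.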
We begin by solving a **direct problem** that simulates an AR(1) process.

How we select the initial value $y_0$ matters.

* If we think $y_0$ is drawn from the stationary distribution ${\mathcal N}(0, \frac{\sigma_x^{2}}{1-\rho^2})$, then it is a good idea to use this distribution as $f(y_0)$. Why? Because $y_0$ contains information about $\rho, \sigma_x$.

* If we suspect that $y_0$ is far in the tails of the stationary distribution -- so that variation in early observations in the sample has a significant **transient component** -- it is better to condition on $y_0$ by setting $f(y_0) = 1$.

To illustrate the issue, we'll begin by choosing an initial $y_0$ that is far out in a tail of the stationary distribution.
@@ -148,7 +147,7 @@ def ar1_simulate(rho, sigma, y0, T):
    y[0] = y0
    for t in range(1, T):
        y[t] = rho*y[t-1] + eps[t]

    return y

sigma = 1.
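For readers of this excerpt, the cell above is shown only in part; here is one self-contained way to fill it out (the shock-generation line and the calling values are assumptions of this sketch, not necessarily the lecture's exact code):

```python
import numpy as np

def ar1_simulate(rho, sigma, y0, T):
    # One possible complete version consistent with the fragment above (assumed)
    eps = np.random.normal(0, sigma, size=T)   # i.i.d. N(0, sigma^2) shocks
    y = np.empty(T)
    y[0] = y0
    for t in range(1, T):
        y[t] = rho*y[t-1] + eps[t]

    return y

# Illustrative call: an initial condition far out in the tail of the stationary distribution
y = ar1_simulate(rho=0.5, sigma=1., y0=10., T=100)
```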
@@ -164,23 +163,23 @@ plt.plot(y)
plt.tight_layout()
```

Now we shall use Bayes' law to construct a posterior distribution, conditioning on the initial value of $y_0$.

(Later we'll assume that $y_0$ is drawn from the stationary distribution, but not now.)

First we'll use **pymc4**.

## PyMC Implementation

For a normal distribution in `pymc`,
$var = 1/\tau = \sigma^{2}$.
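As a quick illustration of that parameterization (a standalone sketch, not part of the lecture's code), `pmc.Normal` can be given either a standard deviation `sigma` or a precision `tau`, with $\tau = 1/\sigma^2$:

```python
import numpy as np
import pymc as pmc

with pmc.Model():
    # These two declarations describe the same N(0, 10) distribution:
    x_sigma = pmc.Normal('x_sigma', mu=0., sigma=np.sqrt(10))   # parameterized by standard deviation
    x_tau = pmc.Normal('x_tau', mu=0., tau=1/10)                # parameterized by precision tau = 1/sigma^2
```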
```{code-cell} ipython3

AR1_model = pmc.Model()

with AR1_model:

    # Start with priors
    rho = pmc.Uniform('rho', lower=-1., upper=1.)    # Assume stable rho
    sigma = pmc.HalfNormal('sigma', sigma=np.sqrt(10))
@@ -204,31 +203,31 @@ with AR1_model:
az.plot_trace(trace, figsize=(17,6))
```

Evidently, the posteriors aren't centered on the true values of $.5, 1$ that we used to generate the data.

This is a symptom of the classic **Hurwicz bias** for first-order autoregressive processes (see Leonid Hurwicz {cite}`hurwicz1950least`).

The Hurwicz bias is worse the smaller is the sample (see {cite}`Orcutt_Winokur_69`).
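A minimal Monte Carlo sketch of that small-sample bias (this uses a least-squares estimate of $\rho$ rather than the lecture's Bayesian posterior, and the parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
rho, sigma = 0.5, 1.0

def rho_hat_ols(T):
    # Simulate an AR(1) path started at y_0 = 0 and estimate rho by least squares through the origin
    y = np.zeros(T + 1)
    for t in range(T):
        y[t + 1] = rho * y[t] + sigma * rng.normal()
    return (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])

for T in (25, 100, 400):
    mean_est = np.mean([rho_hat_ols(T) for _ in range(2000)])
    print(f"T = {T:3d}: average rho_hat = {mean_est:.3f}  (true rho = {rho})")
```

The downward bias shrinks as $T$ grows, matching the statement above that the bias is worse the smaller is the sample.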
Be that as it may, here is more information about the posterior.

```{code-cell} ipython3
with AR1_model:
    summary = az.summary(trace, round_to=4)

summary
```

Now we shall compute a posterior distribution after seeing the same data but instead assuming that $y_0$ is drawn from the stationary distribution.

This means that

$$
y_0 \sim N \left(0, \frac{\sigma_x^{2}}{1 - \rho^{2}} \right)
$$

We alter the code as follows:

```{code-cell} ipython3
AR1_model_y0 = pmc.Model()
@@ -238,10 +237,10 @@ with AR1_model_y0:
    # Start with priors
    rho = pmc.Uniform('rho', lower=-1., upper=1.)    # Assume stable rho
    sigma = pmc.HalfNormal('sigma', sigma=np.sqrt(10))

    # Standard deviation of ergodic y
    y_sd = sigma / np.sqrt(1 - rho**2)

    # Expected value of y at the next period (rho * y)
    yhat = rho * y[:-1]
    y_data = pmc.Normal('y_obs', mu=yhat, sigma=sigma, observed=y[1:])
@@ -265,7 +264,7 @@ with AR1_model_y0:
```{code-cell} ipython3
with AR1_model_y0:
    summary_y0 = az.summary(trace_y0, round_to=4)

summary_y0
```
@@ -278,7 +277,7 @@ that make observations more likely.
We'll return to this issue after we use `numpyro` to compute posteriors under our two alternative assumptions about the distribution of $y_0$.

We'll now repeat the calculations using `numpyro`.

## Numpyro Implementation
@@ -319,10 +318,10 @@ def AR1_model(data):
    # Expected value of y at the next period (rho * y)
    yhat = rho * data[:-1]

    # Likelihood of the actual realization.
    y_data = numpyro.sample('y_obs', dist.Normal(loc=yhat, scale=sigma), obs=data[1:])
```
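The cell that actually runs this model is not shown in this excerpt; a typical driver would look something like the following sketch, conditioning on the simulated series `y` from earlier in the lecture (the variable name, sampler settings, and seed are assumptions here, not the lecture's exact values):

```python
from jax import random
from numpyro.infer import MCMC, NUTS

# Sample the posterior of (rho, sigma) with NUTS, conditioning on the observed series `y`
mcmc = MCMC(NUTS(AR1_model), num_warmup=1000, num_samples=2000, progress_bar=False)
mcmc.run(random.PRNGKey(0), data=y)
mcmc.print_summary()
```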
```{code-cell} ipython3
@@ -347,7 +346,7 @@ plot_posterior(mcmc.get_samples())
mcmc.print_summary()
```

Next, we again compute the posterior under the assumption that $y_0$ is drawn from the stationary distribution, so that

$$
y_0 \sim N \left(0, \frac{\sigma_x^{2}}{1 - \rho^{2}} \right)
@@ -366,7 +365,7 @@ def AR1_model_y0(data):
    # Expected value of y at the next period (rho * y)
    yhat = rho * data[:-1]

    # Likelihood of the actual realization.
    y_data = numpyro.sample('y_obs', dist.Normal(loc=yhat, scale=sigma), obs=data[1:])
    y0_data = numpyro.sample('y0_obs', dist.Normal(loc=0., scale=y_sd), obs=data[0])
@@ -402,4 +401,3 @@ is telling `numpyro` to explain what it interprets as "explosive" observations
Bayes' Law is able to generate a plausible likelihood for the first observation by driving $\rho \rightarrow 1$ and $\sigma \uparrow$ in order to raise the variance of the stationary distribution.
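A quick numerical check on that mechanism (the values of $y_0$ and $\sigma_x$ here are illustrative, not the lecture's): pushing $\rho$ toward $1$ inflates the stationary standard deviation and makes an extreme first observation far less improbable.

```python
import numpy as np
from scipy.stats import norm

sigma_x, y0 = 1.0, 10.0
for rho in (0.5, 0.9, 0.99):
    stat_sd = sigma_x / np.sqrt(1 - rho**2)              # stationary standard deviation
    log_f_y0 = norm.logpdf(y0, loc=0.0, scale=stat_sd)   # log f(y_0) under the stationary distribution
    print(f"rho = {rho:4.2f}: stationary sd = {stat_sd:5.2f}, log f(y0) = {log_f_y0:7.1f}")
```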
Our example illustrates the importance of what you assume about the distribution of initial conditions.
