Skip to content

Commit

Permalink
pdf formatting and correlated mcmc
Browse files Browse the repository at this point in the history
  • Loading branch information
Bob Carpenter committed Feb 19, 2019
1 parent 1070bc3 commit 78fb877
Show file tree
Hide file tree
Showing 8 changed files with 225 additions and 34 deletions.
2 changes: 1 addition & 1 deletion before-chapter.R
Expand Up @@ -9,7 +9,7 @@ knitr::opts_chunk$set(
comment = " ",
dev = "png",
dpi = 300,
echo = FALSE,
echo = TRUE,
fig.align = "center",
fig.width = 7,
fig.asp = 0.618,
Expand Down
40 changes: 19 additions & 21 deletions continuous-random-variables.Rmd
Expand Up @@ -178,32 +178,30 @@ $${10 \choose 1} \times {10 \choose 5} \times \left(\frac{1}{10}\right)^5 \times
\left(\frac{9}{10}\right)^5 \approx 0.015.$$] As usual, we turn to
repetition and sizing to see what's going on.
```{r}
set.seed(1234)
df_runif <- data.frame()
for (N in 2^c(2, 4, 6, 8, 10, 12))
df_runif <- rbind(df_runif,
data.frame(y = runif(N),
size = rep(N, N)))
```
```{r fig.cap="Histograms for uniform(0, 1) samples of increasing sizes. The proportion of draws falling into each bin becomes more uniform as the sample size increases. With each sample plotted to the same height, the vertical count axis varies in scale among the plots."}
```{r fig.asp = 1.25, fig.cap="Histograms for uniform(0, 1) samples of increasing sizes. The proportion of draws falling into each bin becomes more uniform as the sample size increases. With each sample plotted to the same height, the vertical count axis varies in scale among the plots."}
set.seed(1234)
df_unif <- data.frame()
for (N in 2^c(2, 4, 6, 8, 10, 12)) {
df_unif <- rbind(df_unif,
data.frame(y = runif(N),
size = rep(paste("N=", N), N)))
}
plot_runif <-
ggplot(df_runif, aes(y)) +
facet_wrap(size ~ .) +
geom_histogram(binwidth = 0.1, center = 0.05, color = "black",
fill="#ffffe6", size = 0.25) +
scale_x_continuous("y", breaks = seq(0, 1, 0.2), limits = c(0, 1),
expand = c(0, 0.02)) +
scale_y_continuous("proportion of draws", breaks = c(),
expand = c(0.02, 0)) +
coord_fixed(ratio = 0.5) +
plot <- ggplot(df_unif, aes(y)) +
facet_wrap(size ~ ., scales = "free") +
geom_histogram(binwidth = 0.1, center = 0.05, color = "black", fill="#ffffe6") +
scale_x_continuous("y", breaks = c(0, 0.5, 1), labels = c("0.0", "0.5", "1.0"), limits = c(0, 1), expand = c(0, 0.02)) +
scale_y_continuous("proportion of draws", breaks = c(), expand = c(0.02, 0)) +
# coord_fixed() +
theme(panel.spacing.y = unit(24, "pt")) +
theme(panel.spacing.x = unit(24, "pt")) +
ggtheme_tufte()
plot_runif
theme(aspect.ratio = 0.5)
plot
```
## Calculating $\pi$ via simulation
Now that we have a continuous random number generator, there are all
Expand All @@ -225,7 +223,7 @@ Simulations $x^{(m)}, y^{(m)}$ of these variables pick out a point on
the plane within a square bounded by -1 and 1 in both dimensions. Here
is a plot of the square in which the draws will fall. Also shown is a
circle of unit radius inscribed within that square. Draws may or may
not fall within the circle.[Technically, the
not fall within the circle.^[Technically, the
bearer of area is a disc and the line around its border a circle.
Mathematicians are picky because, topologically speaking, a disc has
two dimensions whereas a circle has but one.]
Expand Down
2 changes: 1 addition & 1 deletion index.Rmd
Expand Up @@ -3,7 +3,7 @@ knit: 'bookdown::render_book("index.Rmd", "tufte_html_book")'
title: 'Probability and Statistics'
subtitle: 'a simulation-based introduction'
author: 'Bob Carpenter'
date: '2018'
date: '2019'
output:
bookdown::tufte_html_book:
split_by: chapter
Expand Down
2 changes: 1 addition & 1 deletion inference.Rmd
Expand Up @@ -602,7 +602,7 @@ printf('estimated Pr[theta > 0.5] = %3.2f\n', sum(theta > 0.5) / M)

Now let's overlay the median and central 90% interval.

```{r fig.cap = "Histogram of $1\\,000\\,000$ draws from the posterior $$p(\\theta \\mid y, N) \\propto \\mbox{binomial}(y \\mid N, \\theta),$$ given $N = 10, y = 3$. The median (50% quantile) is indicated with a dashed line and the boundaries of the central 90% interval (5% and 95% quantiles) are picked out with dotted lines. The proportion of the total area shaded to the right of 0.5 represents the posterior probability that $\\theta > 0.5,$ which is about 11%."}
```{r fig.cap = 'Histogram of $1\\,000\\,000$ draws from the posterior $$p(\\theta \\mid y, N) \\propto \\mbox{binomial}(y \\mid N, \\theta),$$ given $N = 10, y = 3$. The median (50 percent quantile) is indicated with a dashed line and the boundaries of the central 90 percent interval (5 percent and 95 percent quantiles) are picked out with dotted lines. The proportion of the total area shaded to the right of 0.5 represents the posterior probability that $\\theta > 0.5,$ which is about 11 percent.'}
q05 <- quantile(binom_post_df2$theta, 0.05)
q50 <- quantile(binom_post_df2$theta, 0.50)
Expand Down
199 changes: 196 additions & 3 deletions mcmc.Rmd
Expand Up @@ -10,13 +10,206 @@ high dimensions.

## Markov chains

A sequence of random variables $Y = Y_1, Y_2, \ldots $ forms a *Markov
chain* if each element of the sequence is conditionally independent
given the previous element,
A sequence of random variables $Y = Y_1, Y_2, \ldots$ forms a *Markov
chain* if each elements of the sequence is conditionally independent
of the other elements given the previous element,

$$
p(y_n \mid y_1, y_2, \ldots, y_{n-1})
=
p(y_n \mid y_{n-1}).
$$

In all the examples of simulation we've seen up to now, each simulated
draw was independent of the others. In this chapter, we will consider
sequences of draws that make up a Markov chain, which are potentially
correlated (or anti-correlated) with each other.

## Finite Markov chains

The following function draws a Markov chain $y$ of length $M$ from the specified initial state $y_1$ given a transition matrix $\theta$ that indicates the probabilities $y_m$ given $y_{m-1}$. The initial state is included in the final result $y = y_1, \ldots, y_M$.

```{r}
sample_chain <- function(M, init_state, theta, seed = 1234) {
set.seed(seed)
y <- c(init_state)
for (m in 2:M)
y[m] <- sample(x = 0:1, size = 1, prob = theta[y[m - 1] + 1, ])
return(y)
}
M <- 100
```

Here, `y[m - 1]` is the value in the previous iteration (`0` or `1`). If `y[m - 1]` is `0`, then `theta[y[m - 1] + 1, ]` is the first row of `theta`; if `y[m - 1]` is `1`, it is the second row.

The base R function `sample()` draws a single value (`n = 1`) from the sequence `c(0, 1)`.


## Independent draws

```{r, engine="tikz", fig.ext="pdf", fig.width=1.5, fig.cap="State diagram for independent draws."}
\begin{tikzpicture}[->, auto, node distance=1.75cm, font=\footnotesize]
\node[circle,draw,semithick] (A) {{\bf 0}};
\node[circle,draw,semithick] (B) [right of=A] {{\bf 1}};
\path (A) edge [bend left] node {0.5} (B);
\path (B) edge [bend left] node {0.5} (A);
\path (A) edge [loop above] node {0.5} (A);
\path (B) edge [loop above] node {0.5} (B);
\end{tikzpicture}
```

This transition diagram has two nodes representing outputs, which are labeled $0$ and $1$. The edges are labeled with the probability of a transition. For example, the edge from $0$ to $1$ labeled $0.5$ indicates that

$$
\mbox{Pr}[y_m = 1 \mid y_{m - 1} = 0] \ = \ 0.5.
$$

Because the transition probabilities are the same out of each state, $y_n$ and $y_{n-1}$ are independent, i.e.,

$$
p(y_m) \ = \ p(y_m \mid y_{m - 1}) \ = \ 0.5.
$$

```{r}
y <- sample_chain(M, 1, matrix(c(0.5, 0.5,
0.5, 0.5), 2, 2))
```


```{r fig.height=2, fig.cap = "Traceplot of chain with independent draws."}
traceplot_bool <- function (y) {
df <- data.frame(iteration = 1:length(y), draw = y)
plot <- ggplot(df, aes(x = iteration, y = draw)) +
geom_line(aes(y = y)) +
scale_x_continuous("iteration", breaks = c(0, 50, 100), expand = c(0, 0)) +
scale_y_continuous("y", breaks = c(0, 1), expand = c(0.2, 0)) +
ggtheme_tufte()
return(plot)
}
traceplot_bool(y)
```

```{r}
y
```


## Correlated draws

```{r, engine='tikz', fig.ext="pdf", fig.width=1.5, fig.cap = "State diagram for correlated draws."}
\begin{tikzpicture}[->, auto, node distance=1.75cm, font=\footnotesize]
\node[circle,draw,semithick] (A) {{\bf 0}};
\node[circle,draw,semithick] (B) [right of=A] {{\bf 1}};
\path (A) edge [bend left] node {0.1} (B);
\path (B) edge [bend left] node {0.1} (A);
\path (A) edge [loop above] node {0.9} (A);
\path (B) edge [loop above] node {0.9} (B);
\end{tikzpicture}
```

To generate correlated draws, the transition matrix favors remaining in the current state over transitioning to a new state. In this particular case,

$$
p(y_{m} | y_{m-1}) =
\begin{cases}
0.9 & \mbox{if } y_m = y_{m-1}
\\[4pt]
0.1 & \mbox{otherwise}
\end{cases}
$$


```{r}
y <- sample_chain(M, 1, matrix(c(0.9, 0.1,
0.1, 0.9), 2, 2))
```

```{r fig.height=2, fig.cap = "Traceplot for chain with correlated draws."}
traceplot_bool(y)
```

```{r}
y
```

As expected with correlated draws, there are long runs of the same value being produced in the chain. This leads to poor mixing and thus higher MCMC standard errors for the same number of iterations.

## Anticorrelated draws

```{r, engine='tikz', fig.ext="pdf", fig.width = 3, fig.cap="State diagram for anticorrelated draws."}
\begin{tikzpicture}[->, auto, node distance=1.75cm, font=\footnotesize]
\node[circle,draw,semithick] (A) {{\bf 0}};
\node[circle,draw,semithick] (B) [right of=A] {{\bf 1}};
\path (A) edge [bend left] node {0.9} (B);
\path (B) edge [bend left] node {0.9} (A);
\path (A) edge [loop above] node {0.1} (A);
\path (B) edge [loop above] node {0.1} (B);
\end{tikzpicture}
```

To generate anticorrelated draws, the transition matrix favors remaining in the current state over transitioning to a new state. In this particular case,

$$
p(y_{m} | y_{m-1}) =
\begin{cases}
0.1 & \mbox{if } y_m = y_{m-1}
\\[4pt]
0.9 & \mbox{otherwise}
\end{cases}
$$

```{r fig.cap="Traceplot of chain with correlated draws."}
y <- sample_chain(M, 1, matrix(c(0.1, 0.9,
0.9, 0.1), 2, 2))
```


```{r fig.cap="Traceplot of chain with anticorrelated draws."}
traceplot_bool(y)
```

```{r}
y
```

The draws form a dramatic sawtooth pattern as as they alternate between zero and one.

Now let's see how quickly they converge in a side-by-side comparison. A single
chain is enough to illustrate the dramatic differences.

```{r}
M <- 1e4
y_corr <- cumsum(sample_chain(M, 1, corr_trans, seed = 1234)) / (1:M)
y_ind <- cumsum(sample_chain(M, 1, ind_trans, seed = 1234)) / (1:M)
y_anti <- cumsum(sample_chain(M, 1, anti_trans, seed = 1234)) / (1:M)
```

```{r fig.cap = "Correlated draws."}
mcmc_corr_plot <- ggplot(data.frame(y = y_corr, x = 1:M), aes(x = x, y = y)) +
geom_line() +
geom_hline(yintercept = 0.5, color = "red") +
scale_x_log10(lim = c(10, M), breaks = c(1e1, 1e2, 1e3, 1e4)) +
scale_y_continuous(lim = c(.25, .75)) +
ggtheme_tufte()
mcmc_corr_plot
```

```{r fig.cap = "Independent draws."}
mcmc_ind_plot <- ggplot(data.frame(y = y_ind, x = 1:M), aes(x = x, y = y)) +
geom_line() +
geom_hline(yintercept = 0.5, color = "red") +
scale_x_log10(lim = c(10, M), breaks = c(1e1, 1e2, 1e3, 1e4)) +
scale_y_continuous(lim = c(.25, .75)) +
ggtheme_tufte()
mcmc_ind_plot
```

```{r fig.cap = "Anticorrelated draws."}
mcmc_anti_plot <- ggplot(data.frame(y = y_anti, x = 1:M), aes(x = x, y = y)) +
geom_line() +
geom_hline(yintercept = 0.5, color = "red") +
scale_x_log10(lim = c(10, M), breaks = c(1e1, 1e2, 1e3, 1e4)) +
scale_y_continuous(lim = c(.25, .75)) +
ggtheme_tufte()
mcmc_anti_plot
```
4 changes: 2 additions & 2 deletions normal.Rmd
Expand Up @@ -9,7 +9,7 @@ summam terminorum binomii*.] With 10 trials and a probability of
success of 0.9 in each trial, the distribution is clearly asymmetric
(i.e., skewed).

```{r fig.cap = 'Probability of number of successes in $N = 10$ independent trials, each with a 90% chance of success. The result is $\\mbox{binomial}(y \\mid 10, 0.9)$ by construction. With only ten trials, the distribution is highly asymmetric, with skew (longer tails) to the left.'}
```{r fig.cap = 'Probability of number of successes in $N = 10$ independent trials, each with a 90 percent chance of success. The result is $\\mbox{binomial}(y \\mid 10, 0.9)$ by construction. With only ten trials, the distribution is highly asymmetric, with skew (longer tails) to the left.'}
N <- 10
x <- 0:10
Expand All @@ -27,7 +27,7 @@ binomial_limit_plot
But when we increase the number of trials by a factor of 100 to
$N = 1\,000$ without changing the 0.9 probability of success, the result is a nearly symmetric bell shape.

```{r fig.cap = 'Probability of number of successes in $N = 1000$ independent trials, each with a 90% chance of success. The familiar bell-shaped curve arises as $N$ grows.'}
```{r fig.cap = 'Probability of number of successes in $N = 1000$ independent trials, each with a 90 percent chance of success. The familiar bell-shaped curve arises as $N$ grows.'}
x <- 860:940
y <- dbinom(x, 1000, 0.9)
Expand Down
6 changes: 3 additions & 3 deletions predictive-inference.Rmd
Expand Up @@ -182,7 +182,7 @@ the next twenty observations and calculate the probability for each
possible $\tilde{y} \in 0:\tilde{N}$. Then we'll plot them in a bar
chart to inspect the distribution.

```{r fig.cap = "Probablity mass function for the posterior predictive distribution of the number of boys $\\tilde{y}$ in a subsequent group of $\\tilde{N} = 20$ births based on observing 3 boys in 10 births with no prior information. Although it is peaked near the 30% level observed in the data, it would not be that surprising to see more boys than girls in the next 20 births."}
```{r fig.cap = 'Probablity mass function for the posterior predictive distribution of the number of boys, $\\theta^{*} = \\frac{y}{N} = 0.3$, into the sampling distribution, to yield $\\mbox{binomial}(\\tilde{y} \\mid \\tilde{N}, \\theta^{*})$. Compared to the full posterior taking into account estimation uncertinaty in $\\theta$, this plug-in estimate makes it seem very unlikely there will be more boys than girls born in the next 20 births.'}
set.seed(1234)
log_sum_exp <- function(u) max(u) + log(sum(exp(u - max(u))))
Expand Down Expand Up @@ -218,7 +218,7 @@ boys.^[Inference with a single point estimate of one or more
parameters is common in live, real-time applications, where the cost
of averaging over the posterior may be prohibitive.]

```{r fig.cap = "Probability mass function for predictions made by plugging the proportion of boy births observed in the data, $\\theta^* = \\frac{y}{N} = 0.3$, into the sampling distribution, to yield $\\mbox{binomial}(\\tilde{y} \\mid \\tilde{N}, \\theta^*)$. Compared to the full posterior taking into account estimation uncertinaty in $\\theta$, this plug-in estimate makes it seem very unlikely there will be more boys than girls born in the next 20 births."}
```{r fig.cap = 'Probability mass function for predictions made by plugging the proportion of boy births observed in the data $\\tilde{y}$ in a subsequent group of $\\tilde{N} = 20$ births based on observing 3 boys in 10 births with no prior information. Although it is peaked near the 30 percent level observed in the data, it would not be that surprising to see more boys than girls in the next 20 births.'}
binom_df <- data.frame(y_pred = 0:20, log_p = dbinom(0:20, 20, 0.3))
binom_plot <-
Expand All @@ -244,7 +244,7 @@ Consider the following plot, in which the same proportion of boys is
provided (30%), but the sample size continues to grow (quadrupling
each time).

```{r out.width = "100%", fig.width = 9, fig.asp=0.4, fig.cap = "Illustration of convergence of posterior to binomial prediction based on proportion of boys as number of observations $y$ and $N$ grows, with proportion $\\frac{y}{N}$ fixed. The central limit theorem tells us that each quadrupling of the data cuts the uncertainty in parameter estimation in half until all that is left is the sampling uncertainty in the binomial, as represented in the final plot, where $\\theta = 0.3$ is fixed."}
```{r out.width = "100%", fig.width = 9, fig.asp=0.4, fig.cap = 'Illustration of convergence of posterior to binomial prediction based on proportion of boys as number of observations $y$ and $N$ grows, with proportion $\\frac{y}{N}$ fixed. The central limit theorem tells us that each quadrupling of the data cuts the uncertainty in parameter estimation in half until all that is left is the sampling uncertainty in the binomial, as represented in the final plot, where $\\theta = 0.3$ is fixed.'}
M <- 1000
Expand Down
4 changes: 2 additions & 2 deletions rejection-sampling.Rmd
Expand Up @@ -219,7 +219,7 @@ vertical position $u^{(m)} \sim \mbox{uniform}(0, 5)$. The points
whose value for $u$ falls below the density at the value for $\theta$
are retained.
```{r fig.cap = 'A simple instance of rejection sampling from a bounded $\\mbox{beta}(6.8, 1.2)$ distribution, whose density is shown as a solid line. Points $(\\theta, u)$ are drawn uniformly from the rectangle, then accepted as a draw of $\\theta$ if $u$ falls below the density at $\\theta$. The accepted draws are rendered as plus signs and the rejected ones as circles. The acceptance rate here is roughly 20%.'}
```{r fig.cap = 'A simple instance of rejection sampling from a bounded $\\mbox{beta}(6.8, 1.2)$ distribution, whose density is shown as a solid line. Points $(\\theta, u)$ are drawn uniformly from the rectangle, then accepted as a draw of $\\theta$ if $u$ falls below the density at $\\theta$. The accepted draws are rendered as plus signs and the rejected ones as circles. The acceptance rate here is roughly 20 percent.'}
M <- 500
alpha <- 6.8
Expand Down Expand Up @@ -294,7 +294,7 @@ while (true)
Let's run this algorithm for $M = 100\,000$ iterations and see what the
histogram looks like.
```{r, fig.cap="Histogram of $M = 100\\,000$ Draws from $\\mbox{beta}(6.8, 1.2)$ made via rejection sampling. The true density is plotted over the histogram as a line. The acceptance rate for draws was roughly 20%."}
```{r, fig.cap="Histogram of $M = 100\\,000$ Draws from $\\mbox{beta}(6.8, 1.2)$ made via rejection sampling. The true density is plotted over the histogram as a line. The acceptance rate for draws was roughly 20 percent."}
set.seed(1234)
M <- 100000
Expand Down

0 comments on commit 78fb877

Please sign in to comment.