pdf formatting and correlated mcmc

bob-carpenter · Feb 19, 2019 · 78fb877 · 78fb877
1 parent 1070bc3
commit 78fb877
Show file tree

Hide file tree

Showing 8 changed files with 225 additions and 34 deletions.
diff --git a/before-chapter.R b/before-chapter.R
@@ -9,7 +9,7 @@ knitr::opts_chunk$set(
   comment = "  ",
   dev = "png",
   dpi = 300,
-  echo = FALSE,
+  echo = TRUE,
   fig.align = "center",
   fig.width = 7,
   fig.asp = 0.618,

diff --git a/continuous-random-variables.Rmd b/continuous-random-variables.Rmd
@@ -178,32 +178,30 @@ $${10 \choose 1} \times {10 \choose 5} \times \left(\frac{1}{10}\right)^5 \times
 \left(\frac{9}{10}\right)^5 \approx 0.015.$$]  As usual, we turn to
 repetition and sizing to see what's going on.
 
-```{r}
-set.seed(1234)
-df_runif <- data.frame()
-for (N in 2^c(2, 4, 6, 8, 10, 12))
-  df_runif <- rbind(df_runif,
-                    data.frame(y = runif(N),
-                               size = rep(N, N)))
-```
+```{r  fig.cap="Histograms for uniform(0, 1) samples of increasing sizes.  The proportion of draws falling into each bin becomes more uniform as the sample size increases.  With each sample plotted to the same height, the vertical count axis varies in scale among the plots."}
 
-```{r fig.asp = 1.25, fig.cap="Histograms for uniform(0, 1) samples of increasing sizes.  The proportion of draws falling into each bin becomes more uniform as the sample size increases.  With each sample plotted to the same height, the vertical count axis varies in scale among the plots."}
+set.seed(1234)
+df_unif <- data.frame()
+for (N in 2^c(2, 4, 6, 8, 10, 12)) {
+  df_unif <- rbind(df_unif,
+             data.frame(y = runif(N),
+                        size = rep(paste("N=", N), N)))
+}
 
-plot_runif <-
-  ggplot(df_runif, aes(y)) +
-  facet_wrap(size ~ .) +
-  geom_histogram(binwidth = 0.1, center = 0.05, color = "black",
-                 fill="#ffffe6", size = 0.25) +
-  scale_x_continuous("y", breaks = seq(0, 1, 0.2), limits = c(0, 1),
-                     expand = c(0, 0.02)) +
-  scale_y_continuous("proportion of draws", breaks = c(),
-                     expand = c(0.02, 0)) +
-  coord_fixed(ratio = 0.5) +
+plot <- ggplot(df_unif, aes(y)) +
+  facet_wrap(size ~ ., scales = "free") +
+  geom_histogram(binwidth = 0.1, center = 0.05, color = "black", fill="#ffffe6") +
+  scale_x_continuous("y", breaks = c(0, 0.5, 1), labels = c("0.0", "0.5", "1.0"), limits = c(0, 1), expand = c(0, 0.02)) +
+  scale_y_continuous("proportion of draws", breaks = c(), expand = c(0.02, 0)) +
+#  coord_fixed() +
   theme(panel.spacing.y = unit(24, "pt")) +
+  theme(panel.spacing.x = unit(24, "pt")) +
   ggtheme_tufte()
-plot_runif
+  theme(aspect.ratio = 0.5)
+plot
 ```
 
+
 ## Calculating $\pi$ via simulation
 
 Now that we have a continuous random number generator, there are all
@@ -225,7 +223,7 @@ Simulations $x^{(m)}, y^{(m)}$ of these variables pick out a point on
 the plane within a square bounded by -1 and 1 in both dimensions. Here
 is a plot of the square in which the draws will fall. Also shown is a
 circle of unit radius inscribed within that square.  Draws may or may
-not fall within the circle.[Technically, the
+not fall within the circle.^[Technically, the
 bearer of area is a disc and the line around its border a circle.
 Mathematicians are picky because, topologically speaking, a disc has
 two dimensions whereas a circle has but one.]

diff --git a/index.Rmd b/index.Rmd
@@ -3,7 +3,7 @@ knit: 'bookdown::render_book("index.Rmd", "tufte_html_book")'
 title: 'Probability and Statistics'
 subtitle: 'a simulation-based introduction'
 author: 'Bob Carpenter'
-date: '2018'
+date: '2019'
 output:
   bookdown::tufte_html_book:
     split_by: chapter

diff --git a/inference.Rmd b/inference.Rmd
@@ -602,7 +602,7 @@ printf('estimated Pr[theta > 0.5] = %3.2f\n', sum(theta > 0.5) / M)
 
 Now let's overlay the median and central 90% interval.
 
-```{r fig.cap = "Histogram of $1\\,000\\,000$ draws from the posterior $$p(\\theta \\mid y, N) \\propto \\mbox{binomial}(y \\mid N, \\theta),$$ given $N = 10, y = 3$.  The median (50% quantile) is indicated with a dashed line and the boundaries of the central 90% interval (5% and 95% quantiles) are picked out with dotted lines.  The proportion of the total area shaded to the right of 0.5 represents the posterior probability that $\\theta > 0.5,$ which is about 11%."}
+```{r fig.cap = 'Histogram of $1\\,000\\,000$ draws from the posterior $$p(\\theta \\mid y, N) \\propto \\mbox{binomial}(y \\mid N, \\theta),$$ given $N = 10, y = 3$.  The median (50 percent quantile) is indicated with a dashed line and the boundaries of the central 90 percent interval (5 percent and 95 percent quantiles) are picked out with dotted lines.  The proportion of the total area shaded to the right of 0.5 represents the posterior probability that $\\theta > 0.5,$ which is about 11 percent.'}
 
 q05 <- quantile(binom_post_df2$theta, 0.05)
 q50 <- quantile(binom_post_df2$theta, 0.50)

diff --git a/mcmc.Rmd b/mcmc.Rmd
@@ -10,13 +10,206 @@ high dimensions.
 
 ## Markov chains
 
-A sequence of random variables $Y = Y_1, Y_2, \ldots $ forms a *Markov
-chain* if each element of the sequence is conditionally independent
-given the previous element,
+A sequence of random variables $Y = Y_1, Y_2, \ldots$ forms a *Markov
+chain* if each elements  of the sequence is conditionally independent
+of the other elements given the previous element,
 
 $$
 p(y_n \mid y_1, y_2, \ldots, y_{n-1})
 =
 p(y_n \mid y_{n-1}).
 $$
 
+In all the examples of simulation we've seen up to now, each simulated
+draw was independent of the others.  In this chapter, we will consider
+sequences of draws that make up a Markov chain, which are potentially
+correlated (or anti-correlated) with each other.
+
+## Finite Markov chains
+
+The following function draws a Markov chain $y$ of length $M$ from the specified initial state $y_1$ given a transition matrix $\theta$ that indicates the probabilities $y_m$ given $y_{m-1}$.  The initial state is included in the final result $y = y_1, \ldots, y_M$.
+
+```{r}
+sample_chain <- function(M, init_state, theta, seed = 1234) {
+  set.seed(seed)
+  y <- c(init_state)
+  for (m in 2:M)
+    y[m] <- sample(x = 0:1, size = 1, prob = theta[y[m - 1] + 1, ])
+  return(y)
+}
+M <- 100
+```
+
+Here, `y[m - 1]` is the value in the previous iteration (`0` or `1`).  If `y[m - 1]` is `0`, then `theta[y[m - 1] + 1, ]` is the first row of `theta`;  if `y[m - 1]` is `1`, it is the second row.
+
+The base R function `sample()` draws a single value (`n = 1`) from the sequence `c(0, 1)`.
+
+
+## Independent draws
+
+```{r, engine="tikz", fig.ext="pdf", fig.width=1.5, fig.cap="State diagram for independent draws."}
+\begin{tikzpicture}[->, auto, node distance=1.75cm, font=\footnotesize]
+\node[circle,draw,semithick] (A) {{\bf 0}};
+\node[circle,draw,semithick] (B) [right of=A] {{\bf 1}};
+\path (A) edge [bend left] node {0.5} (B);
+\path (B) edge [bend left] node {0.5} (A);
+\path (A) edge [loop above] node {0.5} (A);
+\path (B) edge [loop above] node {0.5} (B);
+\end{tikzpicture}
+```
+
+This transition diagram has two nodes representing outputs, which are labeled $0$ and $1$.  The edges are labeled with the probability of a transition.  For example, the edge from $0$ to $1$ labeled $0.5$ indicates that
+
+$$
+\mbox{Pr}[y_m = 1 \mid y_{m - 1} = 0] \ = \ 0.5.
+$$
+
+Because the transition probabilities are the same out of each state, $y_n$ and $y_{n-1}$ are independent, i.e.,
+
+$$
+p(y_m) \ = \ p(y_m \mid y_{m - 1}) \ = \ 0.5.
+$$
+
+```{r}
+y <- sample_chain(M, 1, matrix(c(0.5, 0.5,
+                                   0.5, 0.5), 2, 2))
+```
+
+
+```{r fig.height=2, fig.cap = "Traceplot of chain with independent draws."}
+traceplot_bool <- function (y) {
+  df <- data.frame(iteration = 1:length(y), draw = y)
+  plot <- ggplot(df, aes(x = iteration, y = draw)) +
+    geom_line(aes(y = y)) +
+    scale_x_continuous("iteration", breaks = c(0, 50, 100), expand = c(0, 0)) +
+    scale_y_continuous("y", breaks = c(0, 1), expand = c(0.2, 0)) +
+    ggtheme_tufte()
+  return(plot)
+}
+traceplot_bool(y)
+```
+
+```{r}
+y
+```
+
+
+## Correlated draws
+
+```{r, engine='tikz', fig.ext="pdf", fig.width=1.5, fig.cap = "State diagram for correlated draws."}
+\begin{tikzpicture}[->, auto, node distance=1.75cm, font=\footnotesize]
+\node[circle,draw,semithick] (A) {{\bf 0}};
+\node[circle,draw,semithick] (B) [right of=A] {{\bf 1}};
+\path (A) edge [bend left] node {0.1} (B);
+\path (B) edge [bend left] node {0.1} (A);
+\path (A) edge [loop above] node {0.9} (A);
+\path (B) edge [loop above] node {0.9} (B);
+\end{tikzpicture}
+```
+
+To generate correlated draws, the transition matrix favors remaining in the current state over transitioning to a new state.  In this particular case,
+
+$$
+p(y_{m} | y_{m-1}) =
+\begin{cases}
+0.9 & \mbox{if } y_m = y_{m-1}
+\\[4pt]
+0.1 & \mbox{otherwise}
+\end{cases}
+$$
+
+
+```{r}
+y <- sample_chain(M, 1, matrix(c(0.9, 0.1,
+                                 0.1, 0.9), 2, 2))
+```
+
+```{r fig.height=2, fig.cap = "Traceplot for chain with correlated draws."}
+traceplot_bool(y)
+```
+
+```{r}
+y
+```
+
+As expected with correlated draws, there are long runs of the same value being produced in the chain.  This leads to poor mixing and thus higher MCMC standard errors for the same number of iterations.
+
+## Anticorrelated draws
+
+```{r, engine='tikz', fig.ext="pdf", fig.width = 3, fig.cap="State diagram for anticorrelated draws."}
+\begin{tikzpicture}[->, auto, node distance=1.75cm, font=\footnotesize]
+\node[circle,draw,semithick] (A) {{\bf 0}};
+\node[circle,draw,semithick] (B) [right of=A] {{\bf 1}};
+\path (A) edge [bend left] node {0.9} (B);
+\path (B) edge [bend left] node {0.9} (A);
+\path (A) edge [loop above] node {0.1} (A);
+\path (B) edge [loop above] node {0.1} (B);
+\end{tikzpicture}
+```
+
+To generate anticorrelated draws, the transition matrix favors remaining in the current state over transitioning to a new state.  In this particular case,
+
+$$
+p(y_{m} | y_{m-1}) =
+\begin{cases}
+0.1 & \mbox{if } y_m = y_{m-1}
+\\[4pt]
+0.9 & \mbox{otherwise}
+\end{cases}
+$$
+
+```{r  fig.cap="Traceplot of chain with correlated draws."}
+y <- sample_chain(M, 1, matrix(c(0.1, 0.9,
+                                 0.9, 0.1), 2, 2))
+```
+
+
+```{r fig.cap="Traceplot of chain with anticorrelated draws."}
+traceplot_bool(y)
+```
+
+```{r}
+y
+```
+
+The draws form a dramatic sawtooth pattern as as they alternate between zero and one.
+
+Now let's see how quickly they converge in a side-by-side comparison.  A single
+chain is enough to illustrate the dramatic differences.
+
+```{r}
+M <- 1e4
+y_corr <- cumsum(sample_chain(M, 1, corr_trans, seed = 1234)) / (1:M)
+y_ind <- cumsum(sample_chain(M, 1, ind_trans, seed = 1234)) / (1:M)
+y_anti <- cumsum(sample_chain(M, 1, anti_trans, seed = 1234)) / (1:M)
+```
+
+```{r fig.cap = "Correlated draws."}
+mcmc_corr_plot <- ggplot(data.frame(y = y_corr, x = 1:M), aes(x = x, y = y)) +
+  geom_line() +
+  geom_hline(yintercept = 0.5, color = "red") +
+  scale_x_log10(lim = c(10, M), breaks = c(1e1, 1e2, 1e3, 1e4)) +
+  scale_y_continuous(lim = c(.25, .75)) +
+  ggtheme_tufte()
+mcmc_corr_plot
+```
+
+```{r fig.cap = "Independent draws."}
+mcmc_ind_plot <- ggplot(data.frame(y = y_ind, x = 1:M), aes(x = x, y = y)) +
+  geom_line() +
+  geom_hline(yintercept = 0.5, color = "red") +
+  scale_x_log10(lim = c(10, M), breaks = c(1e1, 1e2, 1e3, 1e4)) +
+  scale_y_continuous(lim = c(.25, .75)) +
+  ggtheme_tufte()
+mcmc_ind_plot
+```
+
+```{r fig.cap = "Anticorrelated draws."}
+mcmc_anti_plot <- ggplot(data.frame(y = y_anti, x = 1:M), aes(x = x, y = y)) +
+  geom_line() +
+  geom_hline(yintercept = 0.5, color = "red") +
+  scale_x_log10(lim = c(10, M), breaks = c(1e1, 1e2, 1e3, 1e4)) +
+  scale_y_continuous(lim = c(.25, .75)) +
+  ggtheme_tufte()
+mcmc_anti_plot
+```
diff --git a/normal.Rmd b/normal.Rmd
@@ -9,7 +9,7 @@ summam terminorum binomii*.]  With 10 trials and a probability of
 success of 0.9 in each trial, the distribution is clearly asymmetric
 (i.e., skewed).
 
-```{r fig.cap = 'Probability of number of successes in $N = 10$ independent trials, each with a 90% chance of success.  The result is $\\mbox{binomial}(y \\mid 10, 0.9)$ by construction. With only ten trials, the distribution is highly asymmetric, with skew (longer tails) to the left.'}
+```{r fig.cap = 'Probability of number of successes in $N = 10$ independent trials, each with a 90 percent chance of success.  The result is $\\mbox{binomial}(y \\mid 10, 0.9)$ by construction. With only ten trials, the distribution is highly asymmetric, with skew (longer tails) to the left.'}
 
 N <- 10
 x <- 0:10
@@ -27,7 +27,7 @@ binomial_limit_plot
 But when we increase the number of trials by a factor of 100 to
 $N = 1\,000$ without changing the 0.9 probability of success, the result is a nearly symmetric bell shape.
 
-```{r fig.cap = 'Probability of number of successes in $N = 1000$ independent trials, each with a 90% chance of success. The familiar bell-shaped curve arises as $N$ grows.'}
+```{r fig.cap = 'Probability of number of successes in $N = 1000$ independent trials, each with a 90 percent chance of success. The familiar bell-shaped curve arises as $N$ grows.'}
 
 x <- 860:940
 y <- dbinom(x, 1000, 0.9)

diff --git a/predictive-inference.Rmd b/predictive-inference.Rmd
@@ -182,7 +182,7 @@ the next twenty observations and calculate the probability for each
 possible $\tilde{y} \in 0:\tilde{N}$.  Then we'll plot them in a bar
 chart to inspect the distribution.
 
-```{r fig.cap = "Probablity mass function for the posterior predictive distribution of the number of boys $\\tilde{y}$ in a subsequent group of $\\tilde{N} = 20$ births based on observing 3 boys in 10 births with no prior information.  Although it is peaked near the 30% level observed in the data, it would not be that surprising to see more boys than girls in the next 20 births."}
+```{r fig.cap = 'Probablity mass function for the posterior predictive distribution of the number of boys, $\\theta^{*} = \\frac{y}{N} = 0.3$, into the sampling distribution, to yield $\\mbox{binomial}(\\tilde{y} \\mid \\tilde{N}, \\theta^{*})$.  Compared to the full posterior taking into account estimation uncertinaty in $\\theta$, this plug-in estimate makes it seem very unlikely there will be more boys than girls born in the next 20 births.'}
 
 set.seed(1234)
 log_sum_exp <- function(u) max(u) + log(sum(exp(u - max(u))))
@@ -218,7 +218,7 @@ boys.^[Inference with a single point estimate of one or more
 parameters is common in live, real-time applications, where the cost
 of averaging over the posterior may be prohibitive.]
 
-```{r fig.cap = "Probability mass function for predictions made by plugging the proportion of boy births observed in the data, $\\theta^* = \\frac{y}{N} = 0.3$, into the sampling distribution, to yield $\\mbox{binomial}(\\tilde{y} \\mid \\tilde{N}, \\theta^*)$.  Compared to the full posterior taking into account estimation uncertinaty in $\\theta$, this plug-in estimate makes it seem very unlikely there will be more boys than girls born in the next 20 births."}
+```{r fig.cap = 'Probability mass function for predictions made by plugging the proportion of boy births observed in the data  $\\tilde{y}$ in a subsequent group of $\\tilde{N} = 20$ births based on observing 3 boys in 10 births with no prior information.  Although it is peaked near the 30 percent level observed in the data, it would not be that surprising to see more boys than girls in the next 20 births.'}
 
 binom_df <- data.frame(y_pred = 0:20, log_p = dbinom(0:20, 20, 0.3))
 binom_plot <-
@@ -244,7 +244,7 @@ Consider the following plot, in which the same proportion of boys is
 provided (30%), but the sample size continues to grow (quadrupling
 each time).  
 
-```{r out.width = "100%", fig.width = 9, fig.asp=0.4, fig.cap = "Illustration of convergence of posterior to binomial prediction based on proportion of boys as number of observations $y$ and $N$ grows, with proportion $\\frac{y}{N}$ fixed.  The central limit theorem tells us that each quadrupling of the data cuts the uncertainty in parameter estimation in half until all that is left is the sampling uncertainty in the binomial, as represented in the final plot, where $\\theta = 0.3$ is fixed."}
+```{r out.width = "100%", fig.width = 9, fig.asp=0.4, fig.cap = 'Illustration of convergence of posterior to binomial prediction based on proportion of boys as number of observations $y$ and $N$ grows, with proportion $\\frac{y}{N}$ fixed.  The central limit theorem tells us that each quadrupling of the data cuts the uncertainty in parameter estimation in half until all that is left is the sampling uncertainty in the binomial, as represented in the final plot, where $\\theta = 0.3$ is fixed.'}
 
 M <- 1000
 

diff --git a/rejection-sampling.Rmd b/rejection-sampling.Rmd
@@ -219,7 +219,7 @@ vertical position $u^{(m)} \sim \mbox{uniform}(0, 5)$. The points
 whose value for $u$ falls below the density at the value for $\theta$
 are retained.
 
-```{r fig.cap = 'A simple instance of rejection sampling from a bounded $\\mbox{beta}(6.8, 1.2)$ distribution, whose density is shown as a solid line.  Points $(\\theta, u)$ are drawn uniformly from the rectangle, then accepted as a draw of $\\theta$ if $u$ falls below the density at $\\theta$.  The accepted draws are rendered as plus signs and the rejected ones as circles.  The acceptance rate here is roughly 20%.'}
+```{r fig.cap = 'A simple instance of rejection sampling from a bounded $\\mbox{beta}(6.8, 1.2)$ distribution, whose density is shown as a solid line.  Points $(\\theta, u)$ are drawn uniformly from the rectangle, then accepted as a draw of $\\theta$ if $u$ falls below the density at $\\theta$.  The accepted draws are rendered as plus signs and the rejected ones as circles.  The acceptance rate here is roughly 20 percent.'}
 
 M <- 500
 alpha <- 6.8
@@ -294,7 +294,7 @@ while (true)
 Let's run this algorithm for $M = 100\,000$ iterations and see what the
 histogram looks like.
 
-```{r, fig.cap="Histogram of $M = 100\\,000$ Draws from $\\mbox{beta}(6.8, 1.2)$ made via rejection sampling.  The true density is plotted over the histogram as a line.  The acceptance rate for draws was roughly 20%."}
+```{r, fig.cap="Histogram of $M = 100\\,000$ Draws from $\\mbox{beta}(6.8, 1.2)$ made via rejection sampling.  The true density is plotted over the histogram as a line.  The acceptance rate for draws was roughly 20 percent."}
 
 set.seed(1234)
 M <- 100000