lecture12-appendix.Rmd

# Generating Continuous Data

```{r, warning = FALSE, message = FALSE, echo = FALSE}
library(tidyverse)
knitr::opts_chunk$set(echo = FALSE, fig.align = "center")
```


Until now, we've sidestepped the actual procedure for how a random outcome is actually generated. For the discrete case, we could get by with the "drawing from a hat" analogy. But this won't get us far in the continuous case, because each outcome has 0 probability of occuring.

The idea is to convert a random number between 0 and 1 into an outcome. Going back to the discrete case, using the Mario Kart example, we can break the interval [0, 1] into sub-intervals with widths equal to their probabilities. Visually, this might look like the following:

```{r, fig.height = 3, fig.width = 8}
mario <- tibble(
	item = c("Banana", "Bob-omb", "Coin", "Horn", "Shell"),
	prob = c(0.12, 0.05, 0.75, 0.03, 0.05)
) %>% 
	mutate(item = fct_reorder(item, prob)) %>% 
	arrange(desc(item)) %>% 
	mutate(right = cumsum(prob),
		   left  = lag(right) %>% replace_na(0)) %>% 
	gather(key = "position", value = "step", left, right) %>% 
	group_by(item) %>% 
	mutate(middle = mean(step))
ggplot(mario, aes(x = step, y = item, group = item)) +
	geom_line() +
	geom_point() +
	geom_text(aes(label = prob, x = middle), position = position_nudge(y = 0.25)) +
	theme_minimal() +
	labs(x = "Random number", y = "") +
	scale_x_reverse(breaks = seq(0, 1, by = 0.25),
					labels = seq(0, 1, by = 0.25) %>% rev())
```

We can make a similar plot for a Poisson(3) random variable (the y-axis is truncated because we can't plot all infinite outcomes):

```{r, fig.height = 3, fig.width = 8}
ggplot(tibble(x = 0:1), aes(x)) + 
	stat_function(fun = function(x) qpois(x, lambda = 3), n = 1000) +
	theme_minimal() +
	scale_y_continuous("Outcome", breaks = 0:8, limits = c(0, 8)) +
	xlab("Random number")
```

Indeed, this plot is nothing other than the quantile function! This idea extends to all random variables. If we want to generate an observation of a random variable $Y$ with quantile function $Q_Y$, just follow these two steps:

1. Generate a number $U$ completely at random between 0 and 1. 
2. Calculate the observation as $Y = Q_Y(U)$.

For continuous random variables only, the opposite of this result also has important implications: if $Y$ is a continuous random variable with cdf $F_Y$, then $$F_Y(Y) \sim \text{Unif}(0,1).$$ This is important for p-values in hypothesis testing (DSCI 552+), transformations, and copulas (optional question on your lab assignment).