# Generating Continuous Data
```{r, warning = FALSE, message = FALSE, echo = FALSE}
library(tidyverse)
knitr::opts_chunk$set(echo = FALSE, fig.align = "center")
```
Until now, we've sidestepped the question of how a random outcome is actually generated. For the discrete case, we could get by with the "drawing from a hat" analogy. But this won't get us far in the continuous case, because each individual outcome has probability 0 of occurring.

The idea is to convert a random number between 0 and 1 into an outcome. Going back to the discrete Mario Kart example, we can break the interval [0, 1] into one sub-interval per item, each with a width equal to that item's probability. The generated item is then whichever item's sub-interval the random number lands in. Visually, this might look like the following:
```{r, fig.height = 3, fig.width = 8}
mario <- tibble(
item = c("Banana", "Bob-omb", "Coin", "Horn", "Shell"),
prob = c(0.12, 0.05, 0.75, 0.03, 0.05)
) %>%
mutate(item = fct_reorder(item, prob)) %>%
arrange(desc(item)) %>%
mutate(right = cumsum(prob),
left = lag(right) %>% replace_na(0)) %>%
gather(key = "position", value = "step", left, right) %>%
group_by(item) %>%
mutate(middle = mean(step))
ggplot(mario, aes(x = step, y = item, group = item)) +
geom_line() +
geom_point() +
geom_text(aes(label = prob, x = middle), position = position_nudge(y = 0.25)) +
theme_minimal() +
labs(x = "Random number", y = "") +
scale_x_reverse(breaks = seq(0, 1, by = 0.25),
labels = seq(0, 1, by = 0.25) %>% rev())
```
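The picture translates fairly directly into code. The sketch below uses a hypothetical helper, `draw_item()`, that stores the left endpoint of each item's sub-interval and uses base R's `findInterval()` to find the sub-interval containing a uniform random number. It lists the items alphabetically rather than in the plot's order. Any ordering should work, as long as each sub-interval's width equals the item's probability.

```{r, echo = TRUE}
# Mario Kart item probabilities (same values as in the plot above)
items <- c("Banana", "Bob-omb", "Coin", "Horn", "Shell")
probs <- c(0.12, 0.05, 0.75, 0.03, 0.05)

# Left endpoints of the sub-intervals of [0, 1]: 0, 0.12, 0.17, 0.92, 0.95
item_left <- c(0, head(cumsum(probs), -1))

# Hypothetical helper: findInterval() returns the index i with
# item_left[i] <= u < item_left[i + 1] (or the last index if u >= 0.95)
draw_item <- function(u) {
  items[findInterval(u, item_left)]
}

set.seed(2024)  # arbitrary seed, for reproducibility
draws <- draw_item(runif(10000))

# Proportion of draws for each item; these should be close to probs
table(draws) / length(draws)
```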
We can make a similar plot for a Poisson(3) random variable (the y-axis is truncated because we can't plot infinitely many outcomes):
```{r, fig.height = 3, fig.width = 8}
ggplot(tibble(x = 0:1), aes(x)) +
stat_function(fun = function(x) qpois(x, lambda = 3), n = 1000) +
theme_minimal() +
scale_y_continuous("Outcome", breaks = 0:8, limits = c(0, 8)) +
xlab("Random number")
```
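This plot connects back to the Mario Kart picture. Outcome $k$ owns the sub-interval from $F(k-1)$ to $F(k)$, where $F$ is the Poisson(3) cdf, so the width of that sub-interval is exactly $P(Y = k)$. The quick check below uses `ppois()`, `dpois()`, and `qpois()` on the nine outcomes shown in the plot. It is a sketch for illustration, not part of the original lecture code.

```{r, echo = TRUE}
k <- 0:8  # the outcomes shown in the plot

# Outcome k owns the sub-interval (F(k - 1), F(k)], where F is the Poisson(3) cdf
upper <- ppois(k, lambda = 3)
lower <- c(0, head(upper, -1))

# The widths of the sub-intervals are the Poisson(3) probabilities P(Y = k)
all.equal(upper - lower, dpois(k, lambda = 3))  # should be TRUE

# A number strictly inside outcome k's sub-interval is mapped to k
qpois((lower + upper) / 2, lambda = 3)  # should return 0, 1, ..., 8
```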
Indeed, this plot is exactly the quantile function of the Poisson(3) distribution, which is what `qpois()` computes! This idea extends to all random variables. If we want to generate an observation of a random variable $Y$ with quantile function $Q_Y$, just follow these two steps:

1. Generate a number $U$ uniformly at random between 0 and 1, that is, $U \sim \text{Unif}(0, 1)$.
2. Calculate the observation as $Y = Q_Y(U)$.
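
Here is a sketch of the two steps for a continuous variable. The example, $Y \sim \text{Exponential}(\lambda)$, is chosen only for illustration. Its cdf is $F_Y(y) = 1 - e^{-\lambda y}$ for $y \ge 0$. Solving $u = F_Y(y)$ for $y$ gives the quantile function $Q_Y(u) = -\log(1 - u)/\lambda$. The helper `q_exp_manual()` is hypothetical. R's built-in `qexp()` computes the same function, and the code checks that the two agree.

```{r, echo = TRUE}
lambda <- 2  # rate parameter, chosen arbitrarily

# Hand-derived quantile function of Exponential(rate)
q_exp_manual <- function(u, rate) -log(1 - u) / rate

set.seed(2024)  # arbitrary seed, for reproducibility
u <- runif(10000)                    # Step 1: U ~ Unif(0, 1)
y <- q_exp_manual(u, rate = lambda)  # Step 2: Y = Q_Y(U)

# The hand-derived quantile function agrees with qexp()
all.equal(y, qexp(u, rate = lambda))

# The sample mean should be close to the true mean 1 / lambda = 0.5
mean(y)
```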

For continuous random variables only, the reverse direction also holds, and it has important implications: if $Y$ is a continuous random variable with cdf $F_Y$, then $$F_Y(Y) \sim \text{Unif}(0,1).$$ This result is important for p-values in hypothesis testing (DSCI 552+), transformations, and copulas (optional question on your lab assignment).
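
The following simulation sketches this result, again using an exponential variable chosen for illustration. Applying the cdf `pexp()` to exponential draws should give values that look Unif(0, 1). Here `ks.test()` compares them against the uniform cdf `punif`. The same trick applied to a Poisson(3) variable (a discrete one) produces only a handful of distinct values, so $F_Y(Y)$ cannot be uniform in that case.

```{r, echo = TRUE}
set.seed(2024)  # arbitrary seed, for reproducibility

# Continuous case: F_Y(Y) should behave like Unif(0, 1)
y_cont <- rexp(10000, rate = 2)
u_cont <- pexp(y_cont, rate = 2)
ks.test(u_cont, "punif")  # a large p-value is consistent with uniformity
c(mean = mean(u_cont), var = var(u_cont))  # Unif(0, 1) has mean 1/2, variance 1/12

# Discrete case: F_Y(Y) can only take as many values as Y does
y_disc <- rpois(10000, lambda = 3)
u_disc <- ppois(y_disc, lambda = 3)
length(unique(u_disc))  # a small number of distinct values, so not Unif(0, 1)
```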