# CBS Week 7 Exercise: Bayesian Inference


In [None]:
suppressPackageStartupMessages({
    library(tidyverse)
    library(testthat)
    library(knitr)
    library(kableExtra)
    library(IRdisplay)  
})

# a function for displaying tables
show_table <- function(d) {
    kable(d, "html", align="c")  %>% 
        as.character()  %>% 
        display_html()
}

This notebook is designed to give you practice working with joint probability distributions. Once you have a joint distribution over a set of variables, you can make just about any inference you might be interested in, so it's important to get comfortable working with joint distributions.

We're going to follow the tabular approach to inference described in class. Instead of writing code, please solve the problems in this notebook by drawing up tables on paper. The solutions provided will be implemented using code, but what matters for this week is the concepts, not the code, and manipulating tables using code is a little bit cumbersome. *After* you've solved the problems by hand, you could try going back and writing code to replicate what you did on paper. But please solve the problems on paper first!

For each problem, please draw up a table in which the rows represent all possible settings of the variables of interest. The joint distribution over these variables assigns a probability to each row, and can be represented as a column of the table.

The code-based solutions given in this notebook represent each probability table as a data frame. To work with these data frames, we're going to use functions from the tidyverse, including `mutate()`, `filter()`, `pull()`, `left_join()` and the pipe operator `%>%`. If you're not familiar with the tidyverse or would like to refresh your memory, please take a look at [R for Psychological Science](https://psyr.djnavarro.net/index.html) by Danielle Navarro. The following sections discuss core elements of the tidyverse:
* https://psyr.djnavarro.net/prelude-to-data.html
* https://psyr.djnavarro.net/describing-data.html
* https://psyr.djnavarro.net/manipulating-data.html
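As a quick refresher on three of the verbs named above, here is a minimal sketch on a small hypothetical joint table over two binary variables `U` and `V`. The table, its column names and its probabilities are made up for illustration and are not part of the exercises. `mutate()` adds a column, `filter()` keeps a single row, and `pull()` extracts a column as a plain number.

```r
library(dplyr)  # provides tibble(), mutate(), filter(), pull() and %>%

# Hypothetical joint distribution P(u, v) over two binary variables,
# using the notebook's convention of 1 = FALSE and 2 = TRUE
toy <- tibble(U = c(1, 1, 2, 2),
              V = c(1, 2, 1, 2),
              p_u_v = c(0.4, 0.1, 0.2, 0.3))

# mutate() adds a new column computed from existing ones
toy <- toy %>% mutate(p_percent = 100 * p_u_v)

# filter() keeps the matching row(s); pull() returns one column as a vector
p_u2_v2 <- toy %>%
    filter(U == 2, V == 2) %>%
    pull(p_u_v)

p_u2_v2  # 0.3
```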


## Probability Theory

We'll get started with an example inspired by the work of Judea Pearl (you'll hear more about him in Week 9). Suppose that you live on the West Coast of the USA. On any given day, your house may or may not be robbed, an earthquake may or may not occur, and your house alarm may or may not sound. We'll use three binary variables `R` (robbery), `E` (earthquake) and `A` (alarm) to keep track of the three possible events.  Let's use 1 for FALSE and 2 (which rhymes with TRUE) for TRUE. So R = 2 means that your house is robbed and R = 1 means that your house is not robbed.

We'll start by setting up a joint distribution over the three binary variables. Because each of the three variables has two possible values, there are $2^3 = 8$ possible settings of the variables. We'll directly specify a joint probability distribution `P(R,E,A)` over these settings.
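If you'd rather not type out the eight settings by hand, `tidyr::expand_grid()` can enumerate them. This is a minimal sketch of an alternative way to build the `R`, `E` and `A` columns, not how the cell that defines `d1` does it; `settings` is a hypothetical name.

```r
library(tidyr)  # expand_grid() enumerates every combination of its inputs

# All 2^3 = 8 settings of the binary variables R, E and A (1 = FALSE, 2 = TRUE).
# expand_grid() varies the first column slowest and the last column fastest,
# which matches the row order used for d1.
settings <- expand_grid(R = 1:2, E = 1:2, A = 1:2)

nrow(settings)  # 8
```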

In [None]:
d1 <- tibble(R = c(1,1,1,1,2,2,2,2), 
             E = c(1,1,2,2,1,1,2,2), 
             A = c(1,2,1,2,1,2,1,2), 
             p_r_e_a = c(0.84, 0.01, 0.07, 0.03, 0.003, 0.04, 0.001, 0.006) )

show_table(d1)

### Exercise 1 (1 point)
What is the most likely setting of the three variables? And what is the least likely?


YOUR ANSWER HERE


### Exercise 2 (1 point)

To make sense as a probability distribution, `d1$p_r_e_a` needs to sum to 1. Please check that this condition is satisfied.
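One caution applies here and in later exercises. Decimal probabilities such as 0.1 or 0.07 are not stored exactly in floating point, so a sum that ought to be 1 can miss it by a tiny rounding error, and a test with `==` may fail. Below is a minimal sketch of a tolerant check, written as a hypothetical helper `is_distribution()` and applied to made-up vectors rather than to `d1`.

```r
# Hypothetical helper: TRUE if p is a valid probability distribution,
# i.e. every entry is non-negative and the entries sum to 1 up to a small
# numerical tolerance (all.equal() uses about 1.5e-8 by default)
is_distribution <- function(p) {
    all(p >= 0) && isTRUE(all.equal(sum(p), 1))
}

is_distribution(c(0.1, 0.2, 0.7))   # TRUE
is_distribution(c(0.5, 0.6))        # FALSE: sums to 1.1
is_distribution(c(1.2, -0.2))       # FALSE: has a negative entry
```

The visible test later in this notebook uses testthat's `expect_equal()`, which likewise compares numbers with a small tolerance.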


In [None]:
# YOUR CODE HERE
stop('No Answer Given!')

Having the joint distribution `d1` allows you to compute distributions over any subset of variables given observations of any other subset of variables. We'll try a few examples. In all cases we'll use a tabular approach to inference: we'll add columns to the table `d1` as needed, then extract the values in these columns that correspond to the quantities we're interested in.

### Exercise 3 (1 point)
What's the probability that R, E and A all equal 2? We'll use `p_r2_e2_a2` to denote $P(R=2,E=2,A=2)$.

In [None]:
# compute P(R=2,E=2,A=2)
p_r2_e2_a2 <-
# YOUR CODE HERE
stop('No Answer Given!')

In [None]:
# this cell contains some hidden tests! You can leave it empty except for this comment

The handout for Week 7 includes equations for Marginalization and Conditional Probability that apply when there are just two variables `a` and `b`. Let's try out similar ideas for the three variable case. 
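For reference, the standard two-variable forms of these rules are below; the handout's notation may differ. Marginalization sums the joint distribution over the variable being removed, and a conditional probability divides a joint probability by the marginal probability of the variable being conditioned on:

$$P(a) = \sum_{b} P(a, b)$$

$$P(a \mid b) = \frac{P(a, b)}{P(b)}, \quad \text{provided } P(b) > 0$$

In the three-variable case, the sum runs over every variable being removed, and the denominator is the marginal over every variable being conditioned on.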

### Exercise 4 (1 point)
We'll now compute $P(A=2)$, or the probability that the alarm sounds on any given day. We'll use `p_a2` to denote this probability. 

As a first step towards computing `p_a2`, add a column to `d1` called `p_a` that specifies the marginal distribution $P(a)$ on $A$. On your first pass through this notebook, please do this on paper.

If you're going through the notebook a second time and writing code, the cleanest way to add the `p_a` column is to use `group_by()` then `mutate()`. Fix the `mutate()` statement below so that `p_a` is defined properly (currently it is set to a constant).
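To see what `group_by()` followed by `mutate()` does, here is a minimal sketch on a hypothetical two-variable table (`toy`, `U`, `V` and `p_u_v` are illustrative names, not part of the exercise). Inside a grouped `mutate()`, a summary such as `sum()` is computed separately within each group and then copied to every row of that group, which is what a marginal column in the tabular approach needs.

```r
library(dplyr)

# Hypothetical joint distribution P(u, v) over two binary variables
toy <- tibble(U = c(1, 1, 2, 2),
              V = c(1, 2, 1, 2),
              p_u_v = c(0.4, 0.1, 0.2, 0.3))

# Marginalize out U: within each value of V, add up P(u, v) over u.
# Every row in a group receives the same group total.
toy <- toy %>%
    group_by(V) %>%
    mutate(p_v = sum(p_u_v)) %>%
    ungroup()

toy$p_v  # 0.6 0.4 0.6 0.4
```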

In [None]:
d1 <-  d1 %>%
    group_by(A) %>% 
    mutate(p_a = 1)  %>%  # Fix this line so that p_a specifies the marginal distribution P(a)
    ungroup()  %>% 
    arrange(R,E,A)

# YOUR CODE HERE
stop('No Answer Given!')
show_table(d1)

In [None]:
# this cell contains some hidden tests! You can leave it empty except for this comment

We've introduced some redundancy here. For example, the odd-numbered rows all have $A = 1$, so `p_a` takes the same value in all four of those rows.

Note that the column `p_a` does NOT specify a probability distribution over the 8 rows of the table. For a start, this column does not sum to 1.
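One way to see both points at once is to collapse the redundant rows with `distinct()`. In the hypothetical sketch below (a made-up `toy` table whose marginal column `p_v` is built with `group_by()` and `mutate()`), the repeated column sums to more than 1 across all four rows, but keeping one row per value of `V` recovers a proper distribution that sums to 1.

```r
library(dplyr)

# Hypothetical joint table with a marginal column p_v added by group_by() + mutate()
toy <- tibble(U = c(1, 1, 2, 2),
              V = c(1, 2, 1, 2),
              p_u_v = c(0.4, 0.1, 0.2, 0.3)) %>%
    group_by(V) %>%
    mutate(p_v = sum(p_u_v)) %>%
    ungroup()

sum(toy$p_v)         # 2: each marginal value appears once per value of U

# Keep one row per setting of V to remove the redundancy
marginal_v <- toy %>% distinct(V, p_v)
sum(marginal_v$p_v)  # 1
```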


### Exercise 5 (1 point)
Now use the new column `p_a` to identify the value of $P(A=2)$:

In [None]:
p_a2 <- 
# YOUR CODE HERE
stop('No Answer Given!')

In [None]:
# this cell contains some hidden tests! You can leave it empty except for this comment

### Exercise 6 (1 point)
Let's now compute $P(R=2,E=2)$, or the probability that R and E both equal 2.

As a first step, add a column to `d1` called `p_r_e` that specifies the marginal distribution $P(r,e)$ on $R$ and $E$. 

In [None]:
d1 <-
# YOUR CODE HERE
stop('No Answer Given!')
show_table(d1)

In [None]:
# this cell contains some hidden tests! You can leave it empty except for this comment

### Exercise 7 (1 point)
Now use the new column `p_r_e` to identify the value of $P(R=2, E=2)$:

In [None]:
p_r2_e2 <-
# YOUR CODE HERE
stop('No Answer Given!')

In [None]:
# this cell contains some hidden tests! You can leave it empty except for this comment

### Exercise 8 (1 point)
Now we'll compute $P(A=2|R=2,E=2)$, or the probability that A=2 given that R and E both equal 2. We'll use `p_a2_given_r2_e2` as the variable name for this conditional probability.

As a first step, add a column to `d1` called `p_a_given_r_e` that captures the conditional distribution $P(a|r,e)$.
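In the tabular approach, a conditional column is a joint column divided, row by row, by a marginal column. Here is a minimal sketch on a hypothetical two-variable `toy` table (illustrative names only), computing $P(u|v)$ as `p_u_v / p_v`.

```r
library(dplyr)

# Hypothetical joint distribution P(u, v)
toy <- tibble(U = c(1, 1, 2, 2),
              V = c(1, 2, 1, 2),
              p_u_v = c(0.4, 0.1, 0.2, 0.3))

toy <- toy %>%
    group_by(V) %>%
    mutate(p_v = sum(p_u_v)) %>%       # marginal P(v), repeated within each group
    ungroup() %>%
    mutate(p_u_given_v = p_u_v / p_v)  # conditional P(u | v) = P(u, v) / P(v)

toy %>% filter(U == 2, V == 2) %>% pull(p_u_given_v)  # 0.75 (= 0.3 / 0.4)
```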

In [None]:
d1 <- 
# YOUR CODE HERE
stop('No Answer Given!')
show_table(d1)

In [None]:
# this cell contains some hidden tests! You can leave it empty except for this comment

### Exercise 9 (1 point)
Now use the new column `p_a_given_r_e` to identify the value of $P(A=2 | R=2, E=2)$:

In [None]:
p_a2_given_r2_e2 <- 
# YOUR CODE HERE
stop('No Answer Given!')

In [None]:
# this cell contains some hidden tests! You can leave it empty except for this comment

### Exercise 10 (1 point)
Now compute $P(A=2|R=2)$, or the probability that A equals 2 given that R equals 2. We'll use `p_a2_given_r2` as the name for this conditional probability.

As a first step, add columns to `d1` called `p_r_a`, `p_r` and `p_a_given_r` that capture $P(R,A)$, $P(R)$ and $P(A|R)$ respectively.
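A useful check for any conditional column, including `p_a_given_r`: for each fixed value of the conditioning variable(s), the conditional probabilities over the distinct settings of the other variable must sum to 1. In a table like `d1`, where `p_a_given_r` repeats across the two values of `E`, keep one row per combination of the relevant variables with `distinct()` first; otherwise each total will be doubled. Here is a minimal sketch on a hypothetical `toy` table, using `summarise()` (a tidyverse verb not otherwise used in this notebook) to total $P(u|v)$ within each value of $V$.

```r
library(dplyr)

# Hypothetical joint distribution P(u, v) with its marginal and conditional columns
toy <- tibble(U = c(1, 1, 2, 2),
              V = c(1, 2, 1, 2),
              p_u_v = c(0.4, 0.1, 0.2, 0.3)) %>%
    group_by(V) %>%
    mutate(p_v = sum(p_u_v)) %>%
    ungroup() %>%
    mutate(p_u_given_v = p_u_v / p_v)

# For each value of V, P(u | v) summed over u should be 1
check <- toy %>%
    distinct(U, V, p_u_given_v) %>%   # no-op here; drops repeated rows in bigger tables
    group_by(V) %>%
    summarise(total = sum(p_u_given_v))

check$total  # 1 1
```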

In [None]:
d1 <- 
# YOUR CODE HERE
stop('No Answer Given!')
show_table(d1)

In [None]:
# this cell contains some hidden tests! You can leave it empty except for this comment

### Exercise 11 (1 point)
Now use the new column `p_a_given_r` to identify the value of $P(A=2 | R=2)$:

In [None]:
p_a2_given_r2 <- 
# YOUR CODE HERE
stop('No Answer Given!')

In [None]:
# this cell contains some hidden tests! You can leave it empty except for this comment

Now let's think about another distribution on three binary variables X, Y and Z. This time we're not explicitly given the joint distribution `P(x,y,z)` -- instead we are given the distributions `P(x)`, `P(y|x)`, and `P(z|x,y)`.

Here's the distribution `P(x)`:


In [None]:
xtab <- tibble(X = c(1,2), p_x = c(0.2,0.8) )  
show_table(xtab)

Here's the conditional probability distribution `P(y|x)`:

In [None]:
ytab <- tibble(X = c(1,1,2,2), Y = c(1,2,1,2), p_y_given_x = c(0.9,0.1,0.1, 0.9) )
show_table(ytab)

And here's the conditional probability distribution `P(z|x,y)`:

In [None]:
ztab <- tibble(X = c(1,1,1,1,2,2,2,2),
               Y = c(1,1,2,2,1,1,2,2),
               Z = c(1,2,1,2,1,2,1,2),
               p_z_given_x_y = c(1,0,0.5,0.5,0.5,0.5,0,1))
show_table(ztab)

We can combine these three elements into a larger table `d2` as follows.

In [None]:
d2 <- xtab  %>% 
    left_join(ytab, by = c("X"))  %>% 
    left_join(ztab, by = c("X", "Y"))  %>% 
    relocate(X,Y,Z)
show_table(d2)

Let's pick just a single row in the table: the third row. The entries in this row tell us that $P(X = 1) = 0.2$, that $P(Y = 2 | X = 1) = 0.1$, and that $P(Z = 1|X = 1, Y = 2) = 0.5$.

The handout for Week 7 (available on Canvas) includes an equation for the Chain Rule that covers the two-variable case. We can use a similar idea here to compute the joint distribution over the three variables `x`, `y` and `z`.
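For reference, the two-variable chain rule in its standard form is below (the handout's notation may differ). It rewrites a joint probability as a marginal times a conditional:

$$P(a, b) = P(a)\,P(b \mid a)$$

Because $P(a, b)$ is itself a joint distribution, the same factorization can be applied again when a third variable is added.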

### Exercise 12 (1 point)

Add a column `p_x_y_z` to `d2` that specifies a joint distribution over the 8 possible settings of the three variables.

In [None]:
d2 <- d2  %>% 
# YOUR CODE HERE
stop('No Answer Given!')
show_table(d2)

In [None]:
# this cell contains one visible test in addition to some hidden tests -- you don't need to edit it
expect_equal(sum(d2$p_x_y_z), 1)

Now that we've computed the joint distribution, we can use it to compute distributions over any subset of variables given observations over any other subset of variables. We'll do just one example.

### Exercise 13 (1 point)

Let's compute $P(X=2|Z=2)$, or the probability that $X = 2$ given that $Z = 2$. Following the same tabular approach used earlier, we'll start by adding columns `p_x_z`, `p_z`, and `p_x_given_z` to `d2` that capture the marginal distributions $P(X,Z)$ and $P(Z)$ and the conditional distribution $P(X|Z)$.
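This computation is an instance of Bayes' rule. Conditioning on the observation $Z = 2$ means dividing a joint probability by the total probability of that observation, which is itself obtained by marginalizing over $X$:

$$P(X=2 \mid Z=2) = \frac{P(X=2, Z=2)}{P(Z=2)} = \frac{P(X=2, Z=2)}{\sum_{x} P(X=x, Z=2)}$$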

In [None]:
d2 <- 
# YOUR CODE HERE
stop('No Answer Given!')
show_table(d2)

In [None]:
# this cell contains some hidden tests! You can leave it empty except for this comment

### Exercise 14 (1 point)
Now use the new column `p_x_given_z` to pull out the value of $P(X=2 | Z=2)$:

In [None]:
p_x2_given_z2 <- 
# YOUR CODE HERE
stop('No Answer Given!')

In [None]:
# this cell contains some hidden tests! You can leave it empty except for this comment

Good work -- you've finished everything! If this is your first time through the notebook, you could now return to the top and try to write code to replicate the solutions you figured out by hand.