# CBS Week 7 Exercise:  Bayesian Inference


In [8]:
suppressPackageStartupMessages({
    library(tidyverse)
    library(testthat)
    library(knitr)
    library(kableExtra)
    library(IRdisplay)  
})

# a function for displaying tables
show_table <- function(d) {
    kable(d, "html", align="c")  %>% 
        as.character()  %>% 
        display_html()
}

This notebook is designed to give you practice working with joint probability distributions. Once you have a joint distribution over a set of variables, you can make just about any inference you might be interested in, so it's important to get comfortable working with joint distributions.

We're going to follow the tabular approach to inference described in class. For each problem, we'll create a dataframe in which the rows represent all possible settings of the variables of interest. The joint distribution over these variables assigns a probability to each row, and can be represented as a column of the dataframe.


## Probability Theory

We'll get started with an example inspired by the work of Judea Pearl (you'll hear more about him in Week 9). Suppose that you live on the West Coast of the USA. On any given day, your house may or may not be robbed, an earthquake may or may not occur, and your house alarm may or may not sound. We'll use three binary variables `R` (robbery), `E` (earthquake) and `A` (alarm) to keep track of the three possible events.  Let's use 1 for FALSE and 2 (which rhymes with TRUE) for TRUE. So R = 2 means that your house is robbed and R = 1 means that your house is not robbed.

We'll start by setting up a joint distribution over three binary variables. Because there are three binary variables, there are $2^3 = 8$ possible settings of the variables. We'll directly specify a joint probability distribution `P(R,E,A)` over these settings.

In [9]:
d1 <- tibble(R = c(1,1,1,1,2,2,2,2), 
             E = c(1,1,2,2,1,1,2,2), 
             A = c(1,2,1,2,1,2,1,2), 
             p_r_e_a = c(0.84, 0.01, 0.07, 0.03, 0.003, 0.04, 0.001, 0.006) )

show_table(d1)

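| R | E | A | p_r_e_a |
|:-:|:-:|:-:|:-:|
| 1 | 1 | 1 | 0.84 |
| 1 | 1 | 2 | 0.01 |
| 1 | 2 | 1 | 0.07 |
| 1 | 2 | 2 | 0.03 |
| 2 | 1 | 1 | 0.003 |
| 2 | 1 | 2 | 0.04 |
| 2 | 2 | 1 | 0.001 |
| 2 | 2 | 2 | 0.006 |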


### Exercise 1 (1 point)
What is the most likely setting of the three variables? And what is the least likely?

=== BEGIN MARK SCHEME === 

The most likely possibility is that all variables take value 1 (i.e., no robbery occurs, there is no earthquake, and your alarm does not go off).

The least likely possibility is that a robbery and earthquake both occur and your alarm does not go off.

=== END MARK SCHEME ===
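
The same answer can also be read off programmatically. The following is a minimal sketch, assuming the joint table defined above (it is re-created here only so that the snippet runs on its own). The names `most_likely` and `least_likely` are illustrative and not part of the exercise.

```r
library(dplyr)

# Joint distribution P(R,E,A), copied from the notebook
d1 <- tibble(R = c(1,1,1,1,2,2,2,2),
             E = c(1,1,2,2,1,1,2,2),
             A = c(1,2,1,2,1,2,1,2),
             p_r_e_a = c(0.84, 0.01, 0.07, 0.03, 0.003, 0.04, 0.001, 0.006))

# Keep the row(s) whose joint probability is largest / smallest
most_likely  <- d1 %>% filter(p_r_e_a == max(p_r_e_a))
least_likely <- d1 %>% filter(p_r_e_a == min(p_r_e_a))
```

Filtering on the maximum (rather than sorting and taking the first row) keeps all tied rows, if there are any.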


### Exercise 2 (1 point)

To make sense as a probability distribution, `d1$p_r_e_a` needs to sum to 1. Please check that this condition is satisfied. Provide your answer by writing some code in the next cell.

=== BEGIN MARK SCHEME === 

Full credit for any computation that demonstrates that `d1$p_r_e_a` sums to 1, e.g.,

`print(sum(d1$p_r_e_a))`

=== END MARK SCHEME ===

Having the joint distribution `d1` allows you to compute distributions over any subset of variables given observations over any other subset of variables. We'll try a few examples. In all cases we'll use a tabular approach to inference: we'll add columns to the dataframe `d1` as needed, then extract the values in these columns that correspond to the quantities we're interested in.

### Exercise 3 (1 point)
What's the probability that R, E and A all equal 2? We'll use `p_r2_e2_a2` to denote $P(R=2,E=2,A=2)$.

In [10]:
# compute P(R=2,E=2,A=2)
p_r2_e2_a2 <-
### BEGIN SOLUTION
    d1  %>% 
    filter(R==2, E==2, A==2)  %>% 
    pull(p_r_e_a)
### END SOLUTION

In [5]:
# this cell contains some hidden tests! You can leave it empty except for this comment
### BEGIN HIDDEN TESTS
expect_equal(p_r2_e2_a2, 0.006)
### END HIDDEN TESTS

The handout for Week 7 includes equations for Marginalization and Conditional Probability that apply when there are just two variables `a` and `b`. Let's try out similar ideas for the three-variable case.
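
The handout's exact notation is not reproduced here, but assuming the standard definitions, the two-variable rules extend to three variables as follows. To marginalize, sum the joint distribution over every variable you want to remove. To condition, divide the joint distribution by the marginal of the variables you are conditioning on.

$$
P(a) = \sum_{r}\sum_{e} P(r,e,a), \qquad P(r,e) = \sum_{a} P(r,e,a)
$$

$$
P(a|r,e) = \frac{P(r,e,a)}{P(r,e)}, \qquad P(a|r) = \frac{P(r,a)}{P(r)} = \frac{\sum_{e} P(r,e,a)}{\sum_{e}\sum_{a} P(r,e,a)}
$$

In the tabular approach, each sum corresponds to grouping the rows by the variables that remain and summing `p_r_e_a` within each group.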


### Exercise 4 (1 point)
We'll now compute $P(A=2)$, or the probability that the alarm sounds on any given day. We'll use `p_a2` to denote this probability. 

As a first step towards computing `p_a2`, add a column to `d1` called `p_a` that specifies the marginal distribution $P(a)$ on $A$.
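
One way to add this column is sketched below. It is an illustration under the assumption that `d1` still holds only the original four columns; the table is re-created so that the snippet runs on its own.

```r
library(dplyr)

# Joint distribution P(R,E,A), copied from the notebook
d1 <- tibble(R = c(1,1,1,1,2,2,2,2),
             E = c(1,1,2,2,1,1,2,2),
             A = c(1,2,1,2,1,2,1,2),
             p_r_e_a = c(0.84, 0.01, 0.07, 0.03, 0.003, 0.04, 0.001, 0.006))

# Marginalize over R and E: within each value of A, sum the joint probabilities
d1 <- d1 %>%
    group_by(A) %>%
    mutate(p_a = sum(p_r_e_a)) %>%   # every row gets P(a) for its own value of A
    ungroup()
```

Using `mutate()` rather than `summarise()` keeps all 8 rows, so the marginal sits alongside the joint distribution in the same table.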

We've introduced some redundancy here: for example, the odd-numbered rows all have $A = 1$, which means that `p_a` is identical for all four of these rows.

Note that column `p_a` does NOT specify a probability distribution over the 8 rows of the table. For a start, this column does not sum to 1.


### Exercise 5 (1 point)
Now use the new column `p_a` to pull out the value of $P(A=2)$:

In [11]:
p_a2 <- 
###  BEGIN SOLUTION 
    d1  %>% 
    filter(A == 2)  %>% 
    pull(p_a) %>% 
    first()
### END SOLUTION 


In [None]:
# this cell contains some hidden tests! You can leave it empty except for this comment
### BEGIN HIDDEN TESTS
expect_equal(p_a2, 0.086)
### END HIDDEN TESTS

### Exercise 6 (1 point)
Let's now compute $P(R=2,E=2)$, or the probability that R and E both equal 2.

As a first step, add a column to `d1` called `p_r_e` that specifies the marginal distribution $P(r,e)$ on $R$ and $E$. 
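
A possible way to add this column is sketched below. It follows the same grouping pattern as the other marginals in this notebook and is an illustration rather than the official solution. The `exists()` guard is an addition that re-creates `d1` only when the snippet is run outside the notebook.

```r
library(dplyr)

# Fallback so the snippet also runs outside the notebook, where `d1` is undefined
if (!exists("d1")) {
    d1 <- tibble(R = c(1,1,1,1,2,2,2,2),
                 E = c(1,1,2,2,1,1,2,2),
                 A = c(1,2,1,2,1,2,1,2),
                 p_r_e_a = c(0.84, 0.01, 0.07, 0.03, 0.003, 0.04, 0.001, 0.006))
}

# Marginalize over A: within each (R, E) pair, sum the joint probabilities
d1 <- d1 %>%
    group_by(R, E) %>%
    mutate(p_r_e = sum(p_r_e_a)) %>%
    ungroup()
```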

In [None]:
### BEGIN HIDDEN TESTS
d1_sorted <- d1  %>% arrange(R,E,A)
expect_equal(d1_sorted$p_r_e, c(0.85, 0.85, 0.1, 0.1, 0.043, 0.043, 0.007, 0.007))
### END HIDDEN TESTS

### Exercise 7 (1 point)
Now use the new column `p_r_e` to pull out the value of $P(R=2, E=2)$:

In [None]:
p_r2_e2 <-
### BEGIN SOLUTION
    d1  %>% 
    filter(R == 2, E==2)  %>% 
    pull(p_r_e) %>% 
    first()
### END SOLUTION

In [None]:
# this cell contains some hidden tests! You can leave it empty except for this comment
### BEGIN HIDDEN TESTS
expect_equal(p_r2_e2, 0.007)
### END HIDDEN TESTS

### Exercise 8 (1 point)
Now we'll compute $P(A=2|R=2,E=2)$, or the probability that A=2 given that R and E both equal 2. We'll use `p_a2_given_r2_e2` as the variable name for this conditional probability.

As a first step, add a column to `d1` called `p_a_given_r_e` that captures the conditional distribution $P(a|r,e)$.

In [None]:
d1 <- 
### BEGIN SOLUTION
    d1  %>% 
    mutate(p_a_given_r_e = p_r_e_a / p_r_e) # add a column for the conditional distribution P(a|r,e)
### END SOLUTION
show_table(d1)

In [None]:
### BEGIN HIDDEN TESTS
d1_sorted <- d1  %>% arrange(R,E,A)
expect_equal(d1_sorted$p_a_given_r_e, c(0.988, 0.012, 0.7, 0.3, 0.070, 0.930, 0.143, 0.857), tolerance = 0.001)
### END HIDDEN TESTS

### Exercise 9 (1 point)
Now use the new column `p_a_given_r_e` to pull out the value of $P(A=2 | R=2, E=2)$:

In [None]:
p_a2_given_r2_e2 <- 
### BEGIN SOLUTION
    d1  %>% 
    filter(R == 2, E==2, A == 2)  %>% 
    pull(p_a_given_r_e) %>% 
    first()
### END SOLUTION

In [None]:
# this cell contains some hidden tests! You can leave it empty except for this comment
### BEGIN HIDDEN TESTS
expect_equal(p_a2_given_r2_e2, 0.857, tolerance = 0.001)
### END HIDDEN TESTS

### Exercise 10 (1 point)
Now compute $P(A=2|R=2)$, or the probability that A equals 2 given that R equals 2. We'll use `p_a2_given_r2` as the name for this conditional probability.

As a first step, add columns to `d1` called `p_r_a`, `p_r` and `p_a_given_r` that capture $P(R,A)$, $P(R)$ and $P(A|R)$ respectively.

In [None]:
d1 <- 
### BEGIN SOLUTION
    d1  %>% 
    group_by(R, A)  %>% 
    mutate(p_r_a = sum(p_r_e_a))  %>%  # add a column for the marginal distribution P(r,a)
    ungroup()  %>% 
    group_by(R)  %>% 
    mutate(p_r = sum(p_r_e_a))  %>%  # add a column for the marginal distribution P(r)
    ungroup()  %>% 
    mutate(p_a_given_r = p_r_a / p_r)
### END SOLUTION
show_table(d1)

In [None]:
### BEGIN HIDDEN TESTS
d1_sorted <- d1  %>% arrange(R,E,A)
expect_equal(d1_sorted$p_r_a, c(0.91, 0.04, 0.91, 0.04, 0.004, 0.046, 0.004, 0.046))
expect_equal(d1_sorted$p_r, c(0.95, 0.95, 0.95, 0.95, 0.05, 0.05, 0.05, 0.05))
expect_equal(d1_sorted$p_a_given_r, c(0.958, 0.042, 0.958, 0.042, 0.08, 0.92, 0.08, 0.92), tolerance = 0.001)
### END HIDDEN TESTS

### Exercise 11 (1 point)
Now use the new column `p_a_given_r` to pull out the value of $P(A=2 | R=2)$:
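
A sketch of one way to do this, following the same `filter()` / `pull()` / `first()` pattern as Exercises 5, 7 and 9. The guard at the top is an addition that rebuilds the required columns only when the snippet is run on its own, where `d1` or its `p_a_given_r` column is missing.

```r
library(dplyr)

# Fallback so the snippet runs standalone: rebuild the table and the
# Exercise 10 columns if `d1` or `p_a_given_r` is not available
if (!exists("d1") || !("p_a_given_r" %in% names(d1))) {
    d1 <- tibble(R = c(1,1,1,1,2,2,2,2),
                 E = c(1,1,2,2,1,1,2,2),
                 A = c(1,2,1,2,1,2,1,2),
                 p_r_e_a = c(0.84, 0.01, 0.07, 0.03, 0.003, 0.04, 0.001, 0.006)) %>%
        group_by(R, A) %>%
        mutate(p_r_a = sum(p_r_e_a)) %>%   # marginal P(r,a)
        ungroup() %>%
        group_by(R) %>%
        mutate(p_r = sum(p_r_e_a)) %>%     # marginal P(r)
        ungroup() %>%
        mutate(p_a_given_r = p_r_a / p_r)  # conditional P(a|r)
}

# Two rows match (one per value of E); both hold the same value, so take the first
p_a2_given_r2 <- d1 %>%
    filter(R == 2, A == 2) %>%
    pull(p_a_given_r) %>%
    first()
```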

In [None]:
# this cell contains some hidden tests! You can leave it empty except for this comment
### BEGIN HIDDEN TESTS
expect_equal(p_a2_given_r2, 0.92)
### END HIDDEN TESTS

Now let's think about another distribution on three binary variables X, Y and Z. This time we're not explicitly given the joint distribution `P(x,y,z)` -- instead we are given the distributions `P(x)`, `P(y|x)`, and `P(z|x,y)`.

Here's the distribution `P(x)`:


In [None]:
xtab <- tibble(X = c(1,2), p_x = c(0.2,0.8) )  
show_table(xtab)

Here's the conditional probability distribution `P(y|x)`:

In [None]:
ytab <- tibble(X = c(1,1,2,2), Y = c(1,2,1,2), p_y_given_x = c(0.9,0.1,0.1, 0.9) )
show_table(ytab)

And here's the conditional probability distribution `P(z|x,y)`:

In [None]:
ztab <- tibble(X = c(1,1,1,1,2,2,2,2), Y = c(1,1,2,2,1,1,2,2), Z = c(1,2,1,2,1,2,1,2), p_z_given_x_y = c(1,0,0.5,0.5,0.5,0.5,0,1) )
show_table(ztab)

We can combine these three elements into a larger table `d2` as follows.

In [None]:
d2 <- xtab  %>% 
    left_join(ytab, by = c("X"))  %>% 
    left_join(ztab, by = c("X", "Y"))  %>% 
    relocate(X,Y,Z)
show_table(d2)

Let's pick just a single row in the table -- the third row. The entries in this row tell us  that $P(X = 1) = 0.2$, that  $P(Y = 2 | X = 1) = 0.1$, and that  $P(Z = 1|X = 1, Y = 2) = 0.5$.  

The handout for Week 7 (available on Canvas) includes an equation for the Chain Rule that covers the two-variable case. We can use a similar idea here to compute the joint distribution over the three variables `x`, `y` and `z`.
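
Assuming the handout's two-variable Chain Rule has the usual form $P(a,b) = P(a)P(b|a)$, applying it twice gives a three-variable version:

$$
P(x,y,z) = P(x,y)\,P(z|x,y) = P(x)\,P(y|x)\,P(z|x,y)
$$

For the third row of `d2`, for instance, this gives $P(X=1,Y=2,Z=1) = 0.2 \times 0.1 \times 0.5 = 0.01$.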

### Exercise 12 (1 point)

Add a column `p_x_y_z` to `d2` that specifies a joint distribution over the 8 possible settings of the three variables.

In [None]:
d2 <- d2  %>% 
### BEGIN SOLUTION
    mutate(p_x_y_z = p_x * p_y_given_x * p_z_given_x_y)
### END SOLUTION
show_table(d2)

In [None]:
expect_equal(sum(d2$p_x_y_z), 1)
### BEGIN HIDDEN TESTS
d2_sorted <- d2  %>% arrange(X,Y,Z)
expect_equal(d2_sorted$p_x_y_z, c(0.18, 0, 0.01, 0.01, 0.04, 0.04, 0, 0.72))
### END HIDDEN TESTS

Now that we've computed the joint distribution, we can use it to compute distributions over any subset of variables given observations over any other subset of variables. We'll do just one example.

### Exercise 13 (1 point)

Let's compute $P(X=2|Z=2)$, or the probability that $X = 2$ given that $Z = 2$. Following the same tabular approach used earlier, we'll start by adding columns `p_x_z`, `p_z`, and `p_x_given_z` to `d2` that capture the marginal distributions $P(X,Z)$ and $P(Z)$ and the conditional distribution $P(X|Z)$.
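
Written out as equations, the three columns correspond to the quantities below. This is a sketch in the notation of this notebook, using the joint column `p_x_y_z` from Exercise 12.

$$
P(x,z) = \sum_{y} P(x,y,z), \qquad P(z) = \sum_{x}\sum_{y} P(x,y,z), \qquad P(x|z) = \frac{P(x,z)}{P(z)}
$$

For the case of interest, $P(X=2,Z=2) = 0.04 + 0.72 = 0.76$ and $P(Z=2) = 0 + 0.01 + 0.04 + 0.72 = 0.77$, so $P(X=2|Z=2) = 0.76 / 0.77 \approx 0.987$.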

In [None]:
d2 <- 
### BEGIN SOLUTION
    d2  %>% 
    group_by(X,Z)  %>% 
    mutate(p_x_z = sum(p_x_y_z))  %>% 
    ungroup()  %>% 
    group_by(Z)  %>% 
    mutate(p_z = sum(p_x_y_z)) %>% 
    ungroup()  %>% 
    mutate(p_x_given_z = p_x_z / p_z)
### END SOLUTION
show_table(d2)

In [None]:
### BEGIN HIDDEN TESTS
d2_sorted <- d2  %>% arrange(X,Y,Z)
expect_equal(d2_sorted$p_z, c(0.23, 0.77, 0.23, 0.77, 0.23, 0.77, 0.23, 0.77))
expect_equal(d2_sorted$p_x_z, c(0.19, 0.01, 0.19, 0.01, 0.04, 0.76, 0.04, 0.76))
expect_equal(d2_sorted$p_x_given_z, c(0.826, 0.013, 0.826, 0.013, 0.174, 0.987, 0.174, 0.987), tolerance = 0.001)
### END HIDDEN TESTS

### Exercise 14 (1 point)
Now use the new column `p_x_given_z` to pull out the value of $P(X=2 | Z=2)$:

In [None]:
p_x2_given_z2 <- 
### BEGIN SOLUTION
    d2  %>% 
    filter(X==2, Z==2)  %>% 
    pull(p_x_given_z)  %>% 
    first()
### END SOLUTION

In [None]:
# this cell contains some hidden tests! You can leave it empty except for this comment
### BEGIN HIDDEN TESTS
expect_equal(p_x2_given_z2, 0.987013, tolerance=1e-4)
### END HIDDEN TESTS