# CBS Week 9 Notebook: Bayesian Networks
## Semester 2, 2022

In [1]:
suppressPackageStartupMessages({
    library(tidyverse)
    library(testthat) 
})

There is no tutorial this week, but we encourage you to work through this notebook to practice what you've learned about Bayesian networks (also known as Bayes nets or directed graphical models). Bayes nets provide a natural way to capture and compute with joint probability distributions. In general, a joint distribution defined over $n$ binary variables requires $2^n - 1$ numbers to specify. A Bayes net can allow this distribution to be specified using far fewer numbers. The savings arise because a Bayes net allows a high-dimensional joint distribution to be expressed as a product of lower-dimensional distributions, each of which captures a modular piece of a situation.
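Written out, the product takes the standard form below. The symbol $\mathrm{pa}(X_i)$ is our own shorthand for the parents of node $X_i$ in the graph and is not used elsewhere in this notebook.

\begin{equation}
P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{pa}(X_i))
\end{equation}

Each factor is the conditional distribution attached to one node, so a node with few parents contributes only a few numbers to the total.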

Here we'll work with a variant of an example introduced by Judea Pearl. Pearl lives in Los Angeles, and suppose that he's just had a new alarm installed. The alarm reliably detects robbers, but also occasionally malfunctions and sounds for no apparent reason. 

Recall that in the tutorial for Week 7 we worked with a joint probability distribution over three variables: Robbery (R), Earthquake (E) and Alarm (A). There we simply enumerated the entire joint distribution, but here we formulate a joint distribution over the variables using a Bayes net.

We'll get started by considering just two of the variables: Robbery and Alarm. A Bayes net that captures a joint distribution over these two variables is shown below. We've assumed that on any given day the probability of a robbery is low (0.05). This number keeps things simple but is too high to be realistic: even in LA, burglaries don't occur once every twenty days!

Following the convention from previous weeks, we'll use 1 for FALSE and 2 (which rhymes with TRUE) for TRUE. So A = 2 means that the alarm sounds and A = 1 means that the alarm does not sound.


<figure>
  <img src="images/alarm_2node.png" alt="alarm_2node" style="width:50%">
  <figcaption  class="figure-caption text-center">Figure 1: Bayes net specifying the relationship between the occurrence of a Robbery and the sounding of the Alarm.</figcaption>
</figure>


Whenever you develop a Bayes net you need to think carefully about the numbers that go into the conditional probability distributions (or CPDs for short). Here we've used a noisy-OR function with a background cause. We've assumed that $w_R = 0.94$, where $w_R$ is the "causal strength" of the relationship between robbery and the alarm sounding. This means that if no other causes of the alarm sounding were present, then the alarm sounds with probability 0.94 if there is a robbery.

We've also assumed that the alarm may sound because of some Background cause other than robbery. The Background cause is always present and has a causal strength of $w_B = 0.01$ (which means that the alarm sounds with probability 0.01 when no causes other than the Background cause are present).

To make the meaning of $w_R$ and $w_B$ clear we can explicitly add a variable $B$ to the network that represents the Background cause. The noisy-OR approach assumes that the two possible causes ($R$ and $B$) act independently of one another. Assuming that causes act independently allows the approach to scale to situations where there are many possible causes.


<figure>
  <img src="images/alarm_2node_background.png" alt="alarm_2node_background" style="width:50%">
  <figcaption  class="figure-caption text-center">
       Figure 2: Figure 1 extended with a catch-all node that represents all Background causes of the alarm sounding.
 </figcaption>
</figure>

### Exercise 1

Figure 2 indicates that $P(A=2|R=2, B=2) = 1 - (1-w_R)(1-w_B)$. Your friend Eustace doesn't understand where this equation comes from -- explain to him why this equation makes sense.


=== BEGIN MARK SCHEME ===

Given that $R$ and $B$ are both present, the alarm fails to sound only if both $R$ and $B$ fail to produce $A$. $w_R$ is the causal strength of $R$, which means that the probability that $R$ *fails* to produce $A$ is $(1-w_R)$. Similarly, $(1-w_B)$ is the probability that $B$ *fails* to produce $A$. The noisy-OR approach assumes that $R$ and $B$ act independently, so the probability that both fail to produce $A$ is $(1-w_R)(1-w_B)$. So the probability that $A$ *is* produced (by either $R$ or $B$ or both) is $1 - (1-w_R)(1-w_B)$.
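
The same reasoning can be sketched as a small R function. This is only an illustration: the name `noisy_or` and its arguments `weights` and `present` are our own choices and do not come from the notebook.

```r
# Noisy-OR: probability that the effect occurs, given which causes are present.
# `weights` holds the causal strength of each cause; `present` is a logical
# vector of the same length marking the causes that are switched on.
noisy_or <- function(weights, present) {
    stopifnot(length(weights) == length(present))
    # Each present cause independently fails with probability (1 - w), and the
    # effect is absent only if every present cause fails.
    1 - prod(1 - weights[present])
}

w <- c(R = 0.94, B = 0.01)
noisy_or(w, c(TRUE, TRUE))    # Robbery and Background both present
noisy_or(w, c(FALSE, TRUE))   # Background only
noisy_or(w, c(FALSE, FALSE))  # no cause present, so the alarm cannot sound
```

With both causes present the function evaluates $1 - 0.06 \times 0.99$, and with no causes present the empty product is 1, so the result is 0.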

=== END MARK SCHEME ===

 
When $w_B =0.01$ and $w_R = 0.94$, the networks in Figures 1 and 2 are equivalent in the sense that they capture the same joint distribution $P(R,A)$ over the Robbery and Alarm variables.

### Exercise 2

Use Figure 2 to show that $P(A = 2 | R = 2) = 0.9406$ when $w_B = 0.01$ and $w_R = 0.94$.

=== BEGIN MARK SCHEME ===

We'll show every step here, but once you get comfortable working with probability distributions you probably won't need to write down every step -- you'll be able to skip directly to the 6th line of the derivation.

\begin{align}
P(A=2 |R = 2) &= \sum_B P(A = 2, B | R = 2)  & \text{(marginalization)}\\
              &= P(A = 2, B = 1 | R = 2) + P(A = 2, B = 2|R = 2) & \\
              &= P(A = 2 |B = 1, R = 2)P(B=1|R=2) + P(A = 2|B = 2,R = 2)P(B=2|R=2) & \text{(chain rule)} \\
              &= P(A = 2 |B = 1, R = 2)P(B=1) + P(A = 2|B = 2,R = 2)P(B=2) & \text{(because $B$ and $R$ are independent)} \\
              &= w_R\times 0 + (1-(1-w_R) \times (1-w_B)) \times 1 & \\
              &= 1 - (0.06 \times 0.99)  & \\
              &= 0.9406
\end{align}


=== END MARK SCHEME ===

Thinking about the strengths of all possible causes (including the Background cause) is a good way to figure out what numbers should go into a conditional probability distribution. For example, I came up with the CPD in Figure 1 by first figuring out what values for $w_B$ and $w_R$ might be reasonable and then using these numbers to compute $P(A|R)$.
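
As a rough sketch of that last step, the R snippet below turns the two causal strengths into a CPD for the Alarm node. The data frame `cpd` and its column names are illustrative choices of ours, not part of the notebook.

```r
w_B <- 0.01  # causal strength of the Background cause (always present)
w_R <- 0.94  # causal strength of Robbery

# Coding convention from this notebook: 1 = FALSE, 2 = TRUE.
cpd <- data.frame(R = c(1, 2))

# Robbery can only fail to trigger the alarm if it is present; when R = 1
# it contributes a factor of 1 to the product of failure probabilities.
fail_R <- ifelse(cpd$R == 2, 1 - w_R, 1)
fail_B <- 1 - w_B

cpd$p_A2 <- 1 - fail_R * fail_B   # P(A = 2 | R)
cpd$p_A1 <- 1 - cpd$p_A2          # P(A = 1 | R)
cpd
```

The `p_A2` column reproduces the values 0.01 and 0.9406 used in the exercises.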

Now that we've used a Bayes net to specify the joint distribution $P(R,A)$, we can use this distribution to answer queries about the variables in the network. Please skip the following question (Exercise 5) for now, but come back to it after you've finished the rest of the notebook.

### Exercise 5

Given that the alarm sounds ($A = 2$), what is the probability that a robbery has occurred? Answer this question by computing $P(R = 2 | A = 2)$ by hand.

=== BEGIN MARK SCHEME ===

There are different ways to compute this conditional probability. The approach presented in Week 7 would involve creating a table that specifies the entire joint distribution $P(R,A)$, then using this table to compute $P(R|A)$. In principle, enumerating the joint distribution will always work, although in practice the table may be so big that it is unwieldy or impossible to write down.

Here we use a more direct approach.

\begin{align}
P(R=2|A=2) &= \frac{ P(A=2|R=2)P(R=2) }{P(A=2)} & \text{(Bayes rule)}\\
           &= \frac{ 0.9406 \times 0.05}{P(A=2)} & \\
           &= \frac{0.04703}{P(A=2)} 
\end{align}
and
\begin{align}
P(A=2) &= \sum_R{ P(A=2,R) } & \text{(marginalization)}\\
       &=  P(A=2,R=1) + P(A=2,R=2) & \\
       &=  P(A=2|R=1)P(R=1) + P(A=2|R=2)P(R=2) & \text{(chain rule)} \\
       &=  0.01 \times (1 - 0.05) + 0.9406 \times 0.05 & \\
       &= 0.05653 & 
\end{align}
       
so
\begin{equation}
P(R=2|A=2) = \frac{0.04703}{0.05653} \approx 0.8319
\end{equation}
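
As a cross-check on the hand calculation, here is a minimal R sketch of the enumeration approach mentioned at the start of this answer. The variable names (`joint`, `p_A2`, `posterior`) are our own, and the CPD values are the ones derived earlier in the notebook.

```r
p_R2 <- 0.05                      # P(R = 2)
p_A2_given_R <- c(0.01, 0.9406)   # P(A = 2 | R), indexed by R = 1, 2

# Enumerate all four (R, A) combinations; 1 = FALSE, 2 = TRUE.
joint <- expand.grid(R = 1:2, A = 1:2)
p_R <- ifelse(joint$R == 2, p_R2, 1 - p_R2)
p_A_given_R <- ifelse(joint$A == 2,
                      p_A2_given_R[joint$R],
                      1 - p_A2_given_R[joint$R])
joint$p <- p_R * p_A_given_R      # chain rule: P(R, A) = P(R) P(A | R)

# Marginalize over R to get P(A = 2), then condition on A = 2.
p_A2 <- sum(joint$p[joint$A == 2])
posterior <- joint$p[joint$R == 2 & joint$A == 2] / p_A2
posterior
```

Both routes should agree: `p_A2` matches the marginal computed above and `posterior` matches $P(R=2|A=2)$ to four decimal places.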

=== END MARK SCHEME ===

# Extending the network
Let's extend the network to allow for the fact that the alarm can be triggered by earthquakes. We'll assume that the probability of an earthquake occurring is 0.1. We'll continue to assume that the causes of the Alarm variable combine according to a noisy-OR function, and will assume that the causal strength of the Earthquake cause is $w_E = 0.29$. This means that if no other causes of the alarm sounding were present (i.e. there is no robbery and the Background cause is disabled), the alarm would sound with probability 0.29 when there is an earthquake.

As before we include a catch-all Background cause with strength $w_B = 0.01$ that stands for causes other than Robbery and Earthquake.

If desired we can leave the Background cause implicit:

<figure>
  <img src="images/alarm_3node.png" alt="alarm_3node" style="width:50%">
      <figcaption class="figure-caption text-center">
        Figure 3: Bayes net that includes Earthquake along with Robbery as a cause of the Alarm sounding.
      </figcaption>
</figure>

or show it explicitly:

<figure>
  <img src="images/alarm_3node_background.png" alt="alarm_3node_background" style="width:70%">
      <figcaption class="figure-caption text-center">
        Figure 4: Figure 3 extended with a Background node.
      </figcaption>
</figure>

Either way, we've defined the CPD for the Alarm node using a noisy-OR function with a background cause. This function assumes that all of the potential causes of Alarm (here Robbery, Earthquake and the Background) operate independently of each other, and that just one of these causes is enough to activate the alarm. Making this assumption means that the CPD for a binary node with $n$ binary parents can be specified using $n$ parameters (one causal strength per parent) rather than the $2^n$ parameters (one per combination of parent values) that an unconstrained CPD would need.
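
To make the parameter saving concrete, the sketch below expands the three causal strengths into a full table with one row per combination of parent values. The object names (`w`, `cpd`, `present`) are illustrative and do not come from the notebook.

```r
# Causal strengths for the three causes of Alarm.
w <- c(R = 0.94, E = 0.29, B = 0.01)

# One row per combination of parent values; 1 = FALSE, 2 = TRUE.
cpd <- expand.grid(R = 1:2, E = 1:2, B = 1:2)

# Logical matrix marking which causes are present in each row.
present <- as.matrix(cpd[, names(w)]) == 2

# Noisy-OR: the alarm stays silent only if every present cause fails.
cpd$p_A2 <- apply(present, 1, function(on) 1 - prod(1 - w[on]))
cpd
```

Three numbers generate all eight rows. Because the Background cause is always present, only the four rows with `B == 2` matter in practice.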

Please look carefully at the CPD for node A in Figure 4, and make sure you understand where all of the entries in the column labeled $P(A=2|R,E,B)$ come from.

###  Exercise 3

Use Figure 4 to show that $P(A=2 | R=2, E=2) = 0.957826$ when $w_B = 0.01$, $w_R = 0.94$ and $w_E = 0.29$.

=== BEGIN MARK SCHEME ===

This time we'll write down a derivation that skips some steps:
\begin{align}
P(A=2|R=2,E=2) &= P(A=2|R=2,E=2,B=1)P(B=1) + P(A=2|R=2,E=2,B=2)P(B=2) \\
               &= [1 - (1-w_R)(1-w_E)] \times 0 + [1 - (1-w_R)(1-w_E)(1-w_B)] \times 1 \\
               &= [1 - 0.06 \times 0.71 \times 0.99] \\
               &= 0.957826
\end{align}

=== END MARK SCHEME ===

# Extending the network again

Bayes nets are modular and therefore easy to extend. Suppose that Pearl has two neighbours, Jan and Kim, who call to let him know when the alarm sounds, and who also sometimes call for other reasons. Jan calls very reliably when the alarm sounds ($P(J=2|A=2) = 0.9$) and Kim is a bit less reliable ($P(K=2|A=2) = 0.7$). The situation is captured by the following network.


<figure>
  <img src="images/alarm_5node.png" alt="alarm_5node" style="width:60%">
  <figcaption  class="figure-caption text-center">Figure 5: Alarm network with nodes for two neighbours (Jan and Kim) who often call when the alarm sounds.</figcaption>
</figure>

###  Exercise 4

In general a joint distribution over 5 binary variables may require 31 numbers to specify. How many different numbers did we need in order to define the joint distribution using the network in Figure 5? 

=== BEGIN MARK SCHEME ===

We needed 9 numbers ($P(R=2)$, $P(E=2)$, $w_R$, $w_E$, $w_B$ and two numbers each for Jan and Kim). So the Bayes net let us specify the joint distribution relatively compactly.
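
The sketch below shows how those 9 numbers expand into the full 32-entry joint distribution. Note the assumptions: the text of this notebook does not state $P(J=2|A=1)$ or $P(K=2|A=1)$, so the values 0.05 and 0.01 used here are hypothetical placeholders chosen only for illustration, and the helper `bern` is our own.

```r
# Numbers stated in the notebook text.
p_R2 <- 0.05; p_E2 <- 0.1
w_R <- 0.94; w_E <- 0.29; w_B <- 0.01

# P(J = 2 | A) and P(K = 2 | A), indexed by A = 1, 2.
p_J2_given_A <- c(0.05, 0.9)  # first entry (A = 1) is an ASSUMED placeholder
p_K2_given_A <- c(0.01, 0.7)  # first entry (A = 1) is an ASSUMED placeholder

# Probability of value x (1 = FALSE, 2 = TRUE) when P(x = 2) = p_true.
bern <- function(x, p_true) ifelse(x == 2, p_true, 1 - p_true)

joint <- expand.grid(R = 1:2, E = 1:2, A = 1:2, J = 1:2, K = 1:2)

# Noisy-OR CPD for Alarm, with the Background cause always present.
p_A2 <- 1 - (1 - w_B) *
    ifelse(joint$R == 2, 1 - w_R, 1) *
    ifelse(joint$E == 2, 1 - w_E, 1)

# The joint is the product of one factor per node in the network.
joint$p <- bern(joint$R, p_R2) * bern(joint$E, p_E2) * bern(joint$A, p_A2) *
    bern(joint$J, p_J2_given_A[joint$A]) * bern(joint$K, p_K2_given_A[joint$A])

nrow(joint)   # number of entries in the joint table
sum(joint$p)  # a valid joint distribution sums to 1
```

Whatever values are chosen for the two placeholder entries, the table has 32 rows and its probabilities sum to 1.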

=== END MARK SCHEME ===

Now that you've finished everything else, please return to Exercise 5 above and complete that too.