# AAI Workshop 3

## EXAMPLE: Sprinkler

Let's consider the scenario in the figure, where the WetGrass can be caused by a Sprinkler or the Rain, both of which depend on the Cloudy weather:

![Sprinkler example](sprinkler_dag.jpg)


We want to compute the probability distribution of the wet grass given that it is cloudy.
Let's start by exploiting the structure of the Bayesian Network to decompose the query:

$$
\begin{align}
{\bf P}(W | c) &= \alpha {\bf P}(W, c) \nonumber\\
&= \alpha \sum_{R,S} {\bf P}(W, c, R, S) \nonumber\\
&= \alpha \sum_{R,S} {\bf P}(W | R, S) {\bf P}(R | c) {\bf P}(S | c) P(c) \nonumber\\
&= \alpha P(c) \sum_R {\bf P}(R | c) \sum_S {\bf P}(W | R, S)  {\bf P}(S | c)  \nonumber
\end{align}
$$
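The last line of the derivation can be read directly as two nested loops: the inner one sums out $S$, the outer one sums out $R$, and $\alpha$ is applied at the end. The following is a minimal pure-Python sketch of that reading. It uses the CPT values given in the code cells below, but stores each CPT as a dictionary keyed by truth values; this layout (and the helper names `p_W_given_RS` and `query_W_given_c`) is a hypothetical choice for illustration, not the representation used in the rest of the workshop.

```python
# CPTs for the Sprinkler network, keyed by Python booleans (True means r, s, w, ...).
# The numbers match the workshop cells; the dict layout is only illustrative.
P_c = 0.5
P_R_given_c = {True: 0.8, False: 0.2}   # P(R | c)
P_S_given_c = {True: 0.1, False: 0.9}   # P(S | c)
P_w_given_RS = {                        # P(w | R, S), i.e. probability that W is true
    (True, True): 0.95, (True, False): 0.90,
    (False, True): 0.90, (False, False): 0.10,
}

def p_W_given_RS(w, r, s):
    """P(W = w | R = r, S = s), derived from the P(w | R, S) table."""
    p = P_w_given_RS[(r, s)]
    return p if w else 1.0 - p

def query_W_given_c():
    """Evaluate alpha * P(c) * sum_R P(R|c) * sum_S P(W|R,S) P(S|c)."""
    unnormalised = {}
    for w in (True, False):
        outer = 0.0
        for r in (True, False):
            # inner sum: eliminate S for this value of R
            inner = sum(p_W_given_RS(w, r, s) * P_S_given_c[s] for s in (True, False))
            # outer sum: weight by P(R | c) and accumulate, eliminating R
            outer += P_R_given_c[r] * inner
        unnormalised[w] = P_c * outer
    # alpha is the reciprocal of the total unnormalised mass
    alpha = 1.0 / sum(unnormalised.values())
    return {w: alpha * p for w, p in unnormalised.items()}

result = query_W_given_c()
print(result)
```

Ordering the sums this way means the inner sum over $S$ is computed once per value of $R$, rather than re-multiplying every factor for every full assignment of the hidden variables.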

We have four quantities. The first is the simple probability $P(c)$:

```python
P_c = 0.5
```

The second is the distribution ${\bf P}(R|c)$:

```python
import numpy as np

# 'true' and 'false' indexes
t, f = 0, 1

P_R_c = np.array([0.8, 0.2])
# this is a two-element vector, the elements of which can be accessed as follows
print('P(r|c) = ', P_R_c[t])
print('P(¬r|c) = ', P_R_c[f])
```

```
P(r|c) =  0.8
P(¬r|c) =  0.2
```


The third is the distribution ${\bf P}(W | R, S)$:

```python
P_W_RS = np.array([[[0.95, 0.90],[0.90, 0.10]],
                   [[0.05, 0.10],[0.10, 0.90]]])
# this is a 2x2x2 array indexed as [W, R, S], the elements of which can be accessed as follows
print('P(w|¬r,s) = ', P_W_RS[t,f,t])
```

```
P(w|¬r,s) =  0.9
```
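Because the first index of `P_W_RS` is $W$, every slice `P_W_RS[:, r, s]` is a distribution over $W$ and must sum to one. A quick check along axis 0 can catch transcription mistakes when copying a CPT from a diagram. The snippet below is a minimal sketch that repeats the array so it runs on its own.

```python
import numpy as np

t, f = 0, 1  # 'true' and 'false' indexes, as in the cells above

P_W_RS = np.array([[[0.95, 0.90], [0.90, 0.10]],
                   [[0.05, 0.10], [0.10, 0.90]]])  # indexed as [W, R, S]

# Summing over W (axis 0) must give 1 for every combination of R and S.
col_sums = P_W_RS.sum(axis=0)
print(col_sums)
assert np.allclose(col_sums, 1.0), "P(W | R, S) is not normalised"

# Equivalently, P(¬w | R, S) = 1 - P(w | R, S) for every (R, S).
assert np.allclose(P_W_RS[f], 1.0 - P_W_RS[t])
```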


The last one is the distribution ${\bf P}(S | c)$:

```python
# P(s|c) and P(¬s|c)
P_S_c = np.array([0.1, 0.9])
```

Starting from the innermost sum on the right-hand side of the last line, we can sum out $S$ by computing ${\bf P}(W | R, s) P(s | c) + {\bf P}(W | R, \neg s) P(\neg s | c)$:

```python
Phi_S = P_W_RS[:,:,t] * P_S_c[t] + P_W_RS[:,:,f] * P_S_c[f]
print(Phi_S)
```

```
[[0.905 0.18 ]
 [0.095 0.82 ]]
```
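The same elimination of $S$ can be written as a single tensor contraction over the last axis of `P_W_RS`. In NumPy, `P_W_RS @ P_S_c` treats the 2x2x2 array as a stack of 2x2 matrices and multiplies each by the vector ${\bf P}(S|c)$, which yields $\Phi_S(W,R)$ directly; `np.einsum` expresses the same contraction with explicit index labels. This is a sketch of an alternative formulation, not the approach the workshop requires.

```python
import numpy as np

t, f = 0, 1  # 'true' and 'false' indexes

P_W_RS = np.array([[[0.95, 0.90], [0.90, 0.10]],
                   [[0.05, 0.10], [0.10, 0.90]]])  # indexed as [W, R, S]
P_S_c = np.array([0.1, 0.9])                       # P(S | c)

# Explicit sum over the two values of S, as in the cell above
Phi_S_explicit = P_W_RS[:, :, t] * P_S_c[t] + P_W_RS[:, :, f] * P_S_c[f]

# Matrix product: contracts the last axis (S) of P_W_RS with P_S_c
Phi_S_matmul = P_W_RS @ P_S_c

# The same contraction written with explicit index labels
Phi_S_einsum = np.einsum('wrs,s->wr', P_W_RS, P_S_c)

print(Phi_S_matmul)
assert np.allclose(Phi_S_explicit, Phi_S_matmul)
assert np.allclose(Phi_S_explicit, Phi_S_einsum)
```

The `einsum` form generalises more easily when a factor has more than three variables, because the summed-out index is simply omitted from the output labels.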


We are left with a 2x2 matrix indexed by $W$ and $R$, which we call $\Phi_S(W, R)$ for short. We can then proceed by summing out $R$ with $P(r | c) \Phi_S(W,r) + P(\neg r | c) \Phi_S(W,\neg r)$:

```python
Phi_R = P_R_c[t] * Phi_S[:,t] + P_R_c[f] * Phi_S[:,f]
print(Phi_R)
```

```
[0.76 0.24]
```
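Since the rows of $\Phi_S$ are indexed by $W$ and its columns by $R$, summing out $R$ weighted by ${\bf P}(R|c)$ is just a matrix-vector product. A minimal sketch, with $\Phi_S$ copied from the output above so the snippet runs on its own:

```python
import numpy as np

t, f = 0, 1  # 'true' and 'false' indexes

Phi_S = np.array([[0.905, 0.18],
                  [0.095, 0.82]])   # Phi_S(W, R) from the previous step
P_R_c = np.array([0.8, 0.2])        # P(R | c)

# Explicit sum over the two values of R, as in the cell above
Phi_R_explicit = P_R_c[t] * Phi_S[:, t] + P_R_c[f] * Phi_S[:, f]

# Matrix-vector product: row w gives sum_r Phi_S[w, r] * P(r | c)
Phi_R_matvec = Phi_S @ P_R_c

print(Phi_R_matvec)
assert np.allclose(Phi_R_explicit, Phi_R_matvec)
```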


The result is a two-element vector indexed only by $W$, which we call $\Phi_R(W)$. Multiplying it by $P(c)$ and normalising, we obtain the final distribution:

```python
P_W_c = P_c * Phi_R
P_W_c = P_W_c / P_W_c.sum()
print('P(W|c) = ', P_W_c)
```

```
P(W|c) =  [0.76 0.24]
```


Notice that the last step does not change the result: $\Phi_R(W)$ was already normalised (its entries sum to 1), and $P(c)$ is just a constant that could be absorbed into $\alpha$.
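As a cross-check on the variable-elimination result, one can also evaluate the second line of the derivation literally: compute the joint ${\bf P}(W, c, R, S)$ for every combination of $R$ and $S$, add them up, then normalise. This brute-force enumeration grows exponentially with the number of hidden variables, which is what reordering the sums avoids, but on a network this small it is a convenient sanity check. The sketch below assumes the same arrays and index convention as the cells above.

```python
import itertools
import numpy as np

t, f = 0, 1  # 'true' and 'false' indexes

P_c = 0.5
P_R_c = np.array([0.8, 0.2])                       # P(R | c)
P_S_c = np.array([0.1, 0.9])                       # P(S | c)
P_W_RS = np.array([[[0.95, 0.90], [0.90, 0.10]],
                   [[0.05, 0.10], [0.10, 0.90]]])  # P(W | R, S), indexed as [W, R, S]

# Unnormalised P(W, c): accumulate the full joint over every (w, r, s) assignment
P_Wc = np.zeros(2)
for w, r, s in itertools.product((t, f), repeat=3):
    P_Wc[w] += P_W_RS[w, r, s] * P_R_c[r] * P_S_c[s] * P_c

# Normalisation plays the role of alpha
P_W_c_enum = P_Wc / P_Wc.sum()
print('P(W|c) = ', P_W_c_enum)

# Summing the factorised expression without P(c) or alpha already gives a
# normalised vector, as noted above
P_W_c_einsum = np.einsum('wrs,r,s->w', P_W_RS, P_R_c, P_S_c)
assert np.allclose(P_W_c_enum, P_W_c_einsum)
```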

---

## EXERCISE: Broad Street cholera outbreak

The following is a simplified version of an example in Judea Pearl's *The Book of Why*. It refers to a cholera epidemic, caused by contaminated water, which killed hundreds of people in London between 1853 and 1854. The diagram below illustrates some of the key factors behind this epidemic; in particular:
- $X$ indicates whether the water company's intake was downstream of London's sewers;
- $W$ indicates whether the water was contaminated or not;
- $Z$ indicates the presence of other external factors (e.g. poverty, miasma, etc.);
- $Y$ indicates the outbreak of cholera.

(Please note that the probabilities in the diagram are fictitious.)

![Cholera outbreak](cholera_dag.jpg)

> - Formalise the problem using appropriate mathematical notation and derive an expression for computing the probability distribution of cholera given that the water company's intake is upstream (i.e. what is the query? How can it be decomposed?)
> - Write a Python program that computes the actual probabilities of the above distribution using the information from the given CPTs.

Write a short document (PDF, max 1 page) or Jupyter Notebook file (preferred) describing your solution and send it to **nbellotto@lincoln.ac.uk** with subject *AAI Workshop 3 - NAME SURNAME*. Please submit your work by the <u>3rd Nov 2020</u>. **It will not be graded, but only used by the lecturer to check the progress of the class**.