# Broad Street Cholera Outbreak

The following is a simplified version of an example in Judea Pearl's *The Book of Why*. It refers to a cholera epidemic, caused by contaminated water, that killed hundreds of people in London between 1853 and 1854. The diagram below illustrates some of the key factors explaining this epidemic, in particular:
- $X$ indicates whether the water company's intake was downstream of London's sewers;
- $W$ indicates whether the water was contaminated or not;
- $Z$ indicates the presence of other external factors (e.g. poverty, miasma, etc.);
- $Y$ indicates the outbreak of cholera.

(Please note that the probabilities in the diagram are fake.)

![Cholera outbreak](cholera_dag.jpg)

> - Formalise the problem using appropriate mathematical notation and derive an expression for the probability distribution of cholera given that the water company's intake is upstream (i.e. what is the query, and how can it be decomposed?)
> - Write a Python program that computes the actual probabilities of the above distribution using the information from the given CPTs.

In [56]:
import numpy as np

# Indices for true and false: every distribution is stored as [P(true), P(false)]
t, f = 0, 1

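class BroadStreetCholera:
    # Binary network in which W depends on X and Z, and Y depends on W and Z
    def __init__(self,p_x,p_z,p_w_xz):
        self.p_x = p_x        # P(X)
        self.p_z = p_z        # P(Z)
        self.p_w_xz = p_w_xz  # P(W|X,Z), indexed [w,x,z]
        # P(Y|W,Z), indexed [y,w,z]. It is not a constructor argument: it is read
        # from the notebook's global scope, so it must be defined before instantiation.
        self.p_y_wz = p_y_wz
        self.calculate_p_w()
        self.calculate_p_y()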
    
    def calculate_p_w(self):
        # P(W) = sum over x and z of P(W|x,z) P(x) P(z)
        p_w = [0,0]

        p_w_xz = self.p_w_xz
        p_x = self.p_x
        p_z = self.p_z
        
        p_w[t] += p_w_xz[t,t,t]*p_x[t]*p_z[t] 
        p_w[t] += p_w_xz[t,f,t]*p_x[f]*p_z[t]
        p_w[t] += p_w_xz[t,t,f]*p_x[t]*p_z[f]
        p_w[t] += p_w_xz[t,f,f]*p_x[f]*p_z[f]

        p_w[f] = 1 - p_w[t]
        
        self.p_w = p_w
    
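    def calculate_p_y(self):
        # Combines P(Y|W,Z) with the marginals P(W) and P(Z), i.e. it treats
        # W and Z as independent even though Z is a parent of W
        p_y = [0,0]
        
        p_y_wz = self.p_y_wz
        p_w = self.p_w
        p_z = self.p_z
        
        p_y[t] += p_y_wz[t,t,t]*p_w[t]*p_z[t]
        p_y[t] += p_y_wz[t,f,t]*p_w[f]*p_z[t]
        p_y[t] += p_y_wz[t,t,f]*p_w[t]*p_z[f]
        p_y[t] += p_y_wz[t,f,f]*p_w[f]*p_z[f]
        
        # Complement of P(Y = true), not of P(W = true)
        p_y[f] = 1 - p_y[t]
        
        self.p_y = p_y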

In [57]:
#P(X)
p_x = np.array([0.5,0.5])

#P(Z)
p_z = np.array([0.25,0.75])

#P(W|X,Z), indexed [w,x,z]
p_w_xz = np.array([[[0.9,0.85],[0.1,0.02]],
                   [[0.1,0.15],[0.9,0.98]]])

#P(Y|W,Z), indexed [y,w,z]
p_y_wz = np.array([[[0.8,0.75],[0.15,0.05]],
                   [[0.2,0.25],[0.85,0.95]]])
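
Since the CPTs are typed in by hand, a quick sanity check can catch transcription mistakes. The sketch below assumes the layout used above, with the child variable on axis 0, and the helper name `is_normalised` is hypothetical. It only verifies that each conditional distribution sums to 1.

```python
import numpy as np

# CPT values copied from the cell above; axis 0 is the child variable.
p_w_xz = np.array([[[0.9, 0.85], [0.1, 0.02]],
                   [[0.1, 0.15], [0.9, 0.98]]])   # P(W|X,Z), indexed [w, x, z]
p_y_wz = np.array([[[0.8, 0.75], [0.15, 0.05]],
                   [[0.2, 0.25], [0.85, 0.95]]])  # P(Y|W,Z), indexed [y, w, z]


def is_normalised(cpt):
    """Hypothetical helper: True if the CPT sums to 1 over the child axis (axis 0)."""
    return bool(np.allclose(cpt.sum(axis=0), 1.0))


print(is_normalised(p_w_xz), is_normalised(p_y_wz))
```

Both checks should pass for the values above, because every true/false pair sharing the same parent configuration adds up to 1.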

The probability of cholera given that the intake is upstream, i.e. $X$ is false, which is encoded by setting ${\bf P}(X) = [0, 1]$:

$$
\begin{align}
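{\bf P}(Y| x) &= \alpha {\bf P}(Y, x) \nonumber\\
&= \alpha \sum_{W,Z} {\bf P}(Y, x, W, Z) \nonumber\\
&= \alpha \sum_{W,Z} {\bf P}(Y | W, Z) {\bf P}(W | x, Z) {\bf P}(x) {\bf P}(Z) \nonumber\\
&= \sum_{W,Z} {\bf P}(Y | W, Z) {\bf P}(W | x, Z) {\bf P}(Z) \qquad \text{since } \alpha = 1/{\bf P}(x) \nonumber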
\end{align}
$$
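
The sum over $W$ and $Z$ can also be evaluated directly by enumeration. The following is a minimal sketch, not the notebook's own implementation: the helper name `p_y_given_x` is hypothetical, and the CPT values are copied from the cells above. It keeps ${\bf P}(W | x, Z)$ tied to each value of $Z$. The `BroadStreetCholera` class instead multiplies the marginals of $W$ and $Z$, so the two results can differ slightly.

```python
import numpy as np

t, f = 0, 1  # same convention as above: index 0 is true, index 1 is false

p_z = np.array([0.25, 0.75])                      # P(Z)
p_w_xz = np.array([[[0.9, 0.85], [0.1, 0.02]],
                   [[0.1, 0.15], [0.9, 0.98]]])   # P(W|X,Z), indexed [w, x, z]
p_y_wz = np.array([[[0.8, 0.75], [0.15, 0.05]],
                   [[0.2, 0.25], [0.85, 0.95]]])  # P(Y|W,Z), indexed [y, w, z]


def p_y_given_x(x):
    """Hypothetical helper: P(Y | X=x) = sum_{w,z} P(Y|w,z) P(w|x,z) P(z)."""
    # p_w_xz[:, x, :] has axes (w, z); einsum sums over w and z and keeps y.
    return np.einsum("ywz,wz,z->y", p_y_wz, p_w_xz[:, x, :], p_z)


p_y_upstream = p_y_given_x(f)  # intake upstream means X is false
print(p_y_upstream)
```

With these (fake) CPTs the enumeration gives roughly 0.10175 for cholera when the intake is upstream, and the two entries of the result sum to 1 without any extra normalisation step.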


In [58]:
# Intake upstream means X is False, so P(X) = [P(true), P(false)] = [0,1]
BSC_XFalse = BroadStreetCholera(np.array([0,1]),p_z,p_w_xz)

print("the probability of cholera given X is False is: {}".format(BSC_XFalse.p_y[t]))

the probability of cholera given X is False is: 0.10250000000000001


Marginal probabilities of $X$, $Y$, $Z$ and $W$ for the Broad Street cholera network:

In [51]:
BSC = BroadStreetCholera(p_x,p_z,p_w_xz)
print("p(X): ",BSC.p_x)
print("p(Y): ",BSC.p_y)
print("p(Z): ",BSC.p_z)
print("p(W): ",BSC.p_w)

p(X):  [0.5 0.5]
p(Y):  [0.385234375, 0.614765625]
p(Z):  [0.25 0.75]
p(W):  [0.45125, 0.5487500000000001]
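
As a cross-check, the marginals can also be obtained from the full joint distribution. The sketch below assumes the factorisation ${\bf P}(X){\bf P}(Z){\bf P}(W|X,Z){\bf P}(Y|W,Z)$ implied by the CPTs, and the variable names `joint`, `p_w_exact` and `p_y_exact` are illustrative. It builds the 2×2×2×2 joint table with `np.einsum` and sums out the unwanted variables.

```python
import numpy as np

t, f = 0, 1  # index 0 is true, index 1 is false

p_x = np.array([0.5, 0.5])                        # P(X)
p_z = np.array([0.25, 0.75])                      # P(Z)
p_w_xz = np.array([[[0.9, 0.85], [0.1, 0.02]],
                   [[0.1, 0.15], [0.9, 0.98]]])   # P(W|X,Z), indexed [w, x, z]
p_y_wz = np.array([[[0.8, 0.75], [0.15, 0.05]],
                   [[0.2, 0.25], [0.85, 0.95]]])  # P(Y|W,Z), indexed [y, w, z]

# Joint P(X, Z, W, Y) with axes ordered (x, z, w, y).
joint = np.einsum("x,z,wxz,ywz->xzwy", p_x, p_z, p_w_xz, p_y_wz)

p_w_exact = joint.sum(axis=(0, 1, 3))  # sum out x, z and y
p_y_exact = joint.sum(axis=(0, 1, 2))  # sum out x, z and w

print("p(W): ", p_w_exact)
print("p(Y): ", p_y_exact)
```

The enumerated ${\bf P}(W)$ agrees with the value printed above (0.45125 for true). The enumerated ${\bf P}(Y = \text{true})$ comes out at about 0.384625, slightly below 0.385234375. The difference arises because the class multiplies the marginals of $W$ and $Z$, whereas the joint table keeps their dependence.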
