# Homework 2

## Directed Graphical Models

In the previous homework, you computed the posterior probabilities that the cook (C), the butler (B), or both were the murderer, given the choice of weapon (K = knife, P = poison). In this exercise you will construct a directed Bayesian graphical model, or belief network, for the available evidence.

As Inspector Markov has continued her investigation, she has discovered an unexplained set of footprints, evidence that a third person may have been involved in the crime. Given that there is no evidence of a break-in, she realizes that if a third person was involved, they must have had assistance from the cook, the butler, or both. In other words, the cook and the butler may be guilty even if they did not commit the actual killing with the knife.

The inspector also learns that further tests on the body have confirmed Dr Turing's conclusion that a knife was the sole cause of death.

Given this evidence, Inspector Markov must update her beliefs. 

As a first step in creating the belief network, import the packages you will need for this analysis.

```python
from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
```

The joint probability distribution is:

$$p(B,C,W,BW,CW,M)$$   
where the letters indicate the following variables:   
$B = $ butler committed the crime,   
$C = $ cook committed the crime,   
$W = $ weapon (K = knife, P = poison),   
$BW = $ butler committed the crime with a particular weapon, conditional on butler and weapon,   
$CW = $ cook committed the crime with a particular weapon, conditional on cook and weapon,   
$M = $ who committed the crime (the butler, the cook, a third party, or any combination of two or all three), conditional on $BW$ and $CW$.    

Notice that because of the evidence provided by Dr Turing, we can neglect any conditional distribution where the weapon was poison. Also, it is possible that the cook and butler are guilty without having actually used the knife: $p(BW_0,CW_0) \ne 0$.

Given the independencies, this distribution can be factorized in the following manner:

$$p(B,C,W,BW,CW,M) = p(B)\ p(C)\ p(W)\ p(BW\ |\ B,W)\ p(CW\ |\ C,W)\ p(M\ |\ BW,CW)$$
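
As a rough illustration of how a factorization like this maps onto a graph, the sketch below reads the conditioning set of each factor as that variable's parents (taking the weapon variable $W$ as the second parent of $BW$ and $CW$). The names `parents` and `edges` are illustrative only and do not come from pgmpy.

```python
# Parents of each variable, read off the factorization term by term.
# `parents` and `edges` are illustrative names, not part of any library.
parents = {
    "B": [],             # p(B)
    "C": [],             # p(C)
    "W": [],             # p(W)
    "BW": ["B", "W"],    # p(BW | B, W)
    "CW": ["C", "W"],    # p(CW | C, W)
    "M": ["BW", "CW"],   # p(M | BW, CW)
}

# Each (parent, child) pair is one directed edge of the graph.
edges = [(p, child) for child, ps in parents.items() for p in ps]
print(edges)
```

A list of `(parent, child)` tuples in this form is the kind of edge list that the `BayesianModel` constructor accepts, so it can be compared against a paper sketch of the graph.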

Now you will need to define the skeleton of the graph. Given the independence relationships implied by the factorized probability distribution, define the skeleton of the model (`m_model`) using the `BayesianModel` class.

>**Hint:** Using paper and pencil make a sketch of the graph before you commit your skeleton structure to code. 

Your next step in creating your model is to define the CPD for each independent variable using the `TabularCPD` class. The tables for these variables are:    


$p(B)$   

| Case | p |
|---|---|
|$B_0$ | 0.4 |
|$B_1$ | 0.6 |    

$p(C)$   

| Case | p |
|---|---|
|$C_0$ | 0.7 |
|$C_1$ | 0.3 |

$p(W)$   

| Case | p |
|---|---|
|$W_0$ | 1.0 |

Notice that since the Inspector is sure the weapon was a knife, $W$ has a cardinality of 1, with $p(K) = 1.0$. This fact reduces the cardinality of other variables, as you will see.

Using the above tables, define the CPDs. Make sure you use variable names consistent with your model.
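
Before handing tables to a library, it can help to check them by hand. The following is a minimal sketch in plain Python, assuming the row-per-state, column-per-parent-configuration layout that `TabularCPD` uses for its `values` argument. The helper `columns_sum_to_one` and the names `p_B`, `p_C`, and `p_W` are hypothetical.

```python
import math

# Marginal tables from the text: one inner list per state (row) and a
# single column, because these variables have no parents.
p_B = [[0.4], [0.6]]   # rows: B_0, B_1
p_C = [[0.7], [0.3]]   # rows: C_0, C_1
p_W = [[1.0]]          # single row: W_0 (knife)

def columns_sum_to_one(table):
    """Return True if every column of the table sums to 1."""
    n_cols = len(table[0])
    return all(
        math.isclose(sum(row[j] for row in table), 1.0)
        for j in range(n_cols)
    )

for name, table in [("B", p_B), ("C", p_C), ("W", p_W)]:
    # The number of rows is the cardinality of the variable.
    print(name, "cardinality:", len(table), "valid:", columns_sum_to_one(table))
```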

Next, define the variables $BW$ and $CW$, the conditional probabilities that the butler or the cook is guilty, given the murder weapon. You need not consider cases of poison, $P$, as their probabilities are zero, which reduces the number of states with non-zero probabilities. Thus, the probability tables for these variables are:

$$p(BW\ |\ B,W)$$

| | p | p |
|---|---|---|
| | $B_0$ | $B_1$ |
| | $W_0$ | $W_0$ |
|$BW_0$ | 1.0 | 1.0 |

$$p(CW\ |\ C,W)$$

| | p | p |
|---|---|---|
| | $C_0$ | $C_1$ |
| | $W_0$ | $W_0$ |
|$CW_0$ | 1.0 | 1.0 |

There are two odd aspects of these tables. First, convention is broken by having the positive case of a knife labeled as 0. Second, probabilities are all 1.0 since a knife was used and this fact is independent of the perpetrator. 

Given the above tables, define the CPDs.

**Question:** If $p(Poison) \ne 0$, how many possible states would each of these CPDs have?  

ANS: 

Finally, you must define the CPD for the murder variable $M$, whose states are coded:

- **M0:** Named party (cook or butler or both) alone with no third unnamed party, 
- **M1:** Only the third unnamed party alone (not cook and not butler), 
- **M2:** Named party (butler or cook or both) and unnamed party together. 

This CPD is conditional on both $BW$ and $CW$. Since $M$ has three possible states, there are 12 possible states in total: $N_{BW} * N_{CW} * N_{M} = 2 * 2 * 3 = 12$, as shown here:

| | p | p | p | p |
|---|---|---|---|---|
| | $CW_0$ | $CW_0$ | $CW_1$ | $CW_1$ |
| | $BW_0$ | $BW_1$ | $BW_0$ | $BW_1$ |
|$M_0$| 0.0 | 0.7 | 0.1 | 0.3 |
|$M_1$| 1.0 | 0.1 | 0.7 | 0.3 |
|$M_2$| 0.0 | 0.2 | 0.2 | 0.4 |

Where:

- $CW_0$ = not cook
- $BW_0$ = not butler
- $CW_1$ = cook with weapon
- $BW_1$ = butler with weapon
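
The entry count and the normalization of the table above can be checked with a short sketch. It is plain Python rather than pgmpy, and it assumes the column order shown in the table, with $CW$ varying slowest and $BW$ fastest. The names `p_M`, `parent_configs`, and `n_entries` are illustrative.

```python
import math
from itertools import product

# Rows are M_0, M_1, M_2. Columns follow the table:
# (CW, BW) = (0, 0), (0, 1), (1, 0), (1, 1).
p_M = [
    [0.0, 0.7, 0.1, 0.3],  # M_0
    [1.0, 0.1, 0.7, 0.3],  # M_1
    [0.0, 0.2, 0.2, 0.4],  # M_2
]

# All (CW, BW) parent configurations, in the same order as the columns.
parent_configs = list(product([0, 1], repeat=2))

# Number of table entries: 3 states of M times 4 parent configurations.
n_entries = len(p_M) * len(parent_configs)
print(n_entries)  # 12

# Each column is a distribution over M and must sum to 1.
for j, (cw, bw) in enumerate(parent_configs):
    column = [row[j] for row in p_M]
    assert math.isclose(sum(column), 1.0)
    print(f"CW_{cw}, BW_{bw}:", column)
```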

**Question:** There are 12 possible states of the variable $M$. If $p(Poison) \ne 0$ how many possible states would there be?

ANS: 

To complete your belief network, add the CPDs to the model with the `add_cpds` method.

> **Hint:** Before going any further make sure you apply the `check_model` method to your complete model. 

Next investigate the independencies of all the variables in your model using the `local_independencies` method. 
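
For intuition about what a local independence statement says, here is a hedged, library-free sketch of the local Markov property: each variable is independent of its non-descendants given its parents. The structure is the one implied by the factorization, and the function names `descendants` and `local_independencies` are hypothetical helpers, not the pgmpy methods.

```python
# Parent lists implied by the factorization (illustrative, not pgmpy).
parents = {
    "B": [], "C": [], "W": [],
    "BW": ["B", "W"],
    "CW": ["C", "W"],
    "M": ["BW", "CW"],
}

def descendants(node):
    """All nodes reachable from `node` by following edges forward."""
    children = {n: [c for c, ps in parents.items() if n in ps] for n in parents}
    found, stack = set(), [node]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def local_independencies(node):
    """Return (non-descendants excluding parents, parents) for `node`."""
    others = set(parents) - {node} - descendants(node) - set(parents[node])
    return sorted(others), sorted(parents[node])

for node in parents:
    indep, given = local_independencies(node)
    if indep:
        condition = ", ".join(given) if given else "nothing"
        print(f"{node} _|_ {', '.join(indep)} given {condition}")
```

Comparing this printout with what the `local_independencies` method reports for your own model is one way to confirm that the skeleton was entered as intended.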

**Question:** Is this graphical model an I-map of the distribution discussed at the start of this lab and why?

ANS: 

Determine which trails are not active between the following pairs of variables:

- B and C
- B and W
- C and W
- C and CW
- B and BW
- BW and CW
- B and M
- C and M

Create and execute the code using the `is_active_trail` method on the model object. 

**Question:** How can you best explain the blocked trails given the evidence variables and V-structures in the graph? Which trails with V-structures are blocked?

ANS: