# Homework 1

## Directed Graphical Models

In the practice homework, you computed the posterior probabilities of either the cook (C) or the butler (B) being the murderer, given the choice of weapon (K = knife, P = poison). In this exercise you will construct a directed Bayesian graphical model, or belief network, for the available evidence.

It is a cliche that a criminal must have opportunity, $OP$, means, $W$, and a motive, $MO$:   
- Since both the cook and butler were at the house and had access to the victim when the murder was committed, we can say that $P(OP) = 1.0$. Any model can be simplified by eliminating this variable.  
- The Inspector has already established that the means was either a knife, $K$, or poison, $P$.    
- Upon questioning the suspects, Inspector Markov believes that inheriting part of the victim's fortune is a likely motive. She suspects that the butler may be due an inheritance, but doubts that the cook is.

Further, as Inspector Markov has continued her investigation, she has discovered an unexplained set of footprints, evidence that a third person may have been involved in the crime. Since there is no evidence linking the cook or the butler to any other possible perpetrator, the model can neglect the possibility of collaboration with a third party. In other words, $p(C,B\ |\ third\ party) = 0$, $p(third\ party\ |\ cook) = 0$, and $p(third\ party\ |\ butler) = 0$.  

We will also continue to assume that the butler and the cook did not influence each other to commit the crime. Therefore, $p(B\ |\ C) = p(B)$ and $p(C\ |\ B) = p(C)$.

As a first step in creating the belief network, import the packages you will need for this analysis.

In [None]:
from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD

The joint probability distribution is:

$$p(B,C,W,MO,M)$$   
where the letters indicate the following variables:   
$B = $ unconditional probability that the butler committed the crime,   
$C = $ unconditional probability that the cook committed the crime,   
$W = $ the probability of the weapon, K = knife, P = poison, conditional on B and C.   
$MO = $ the probability of a motive, conditional on C and B.    
$M = $ the probability that the third party, the cook, C, or the butler, B, committed the crime, conditional on B, C, W, and MO.    

Given the independencies, this distribution can be factorized in the following manner:

$$p(B,C,W,MO,M) = p(B)\ p(C)\ p(W\ |\ B, C)\ p(MO\ |\ B,C)\ p(M\ |\ B, C, W, MO)$$
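
The conditioning set of each factor lists that variable's parents in the graph, so the directed edges can be read off the factorization. The following is a minimal plain-Python sketch of that bookkeeping. It does not use pgmpy, and the names `parents` and `edges` are illustrative only.

```python
# Parents of each variable, copied from the conditioning set of its factor
# in p(B) p(C) p(W | B, C) p(MO | B, C) p(M | B, C, W, MO).
parents = {
    "B": [],
    "C": [],
    "W": ["B", "C"],
    "MO": ["B", "C"],
    "M": ["B", "C", "W", "MO"],
}

# One directed edge (parent, child) per entry in a conditioning set.
edges = [(parent, child) for child, ps in parents.items() for parent in ps]

for edge in edges:
    print(edge)
print("number of edges:", len(edges))
```

An edge list of `(parent, child)` tuples like this is a common way to describe a directed graph, and it is easy to compare against a pencil-and-paper sketch.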

Now you will define the skeleton of the graph. Given the independence relationships of the factorized probability distribution, define the skeleton of the model (`m_model`) using the `BayesianModel` function.

>**Hint:** Using paper and pencil make a sketch of the graph before you commit your skeleton structure to code. 

In [None]:
## Define the network structure.


Your next step in creating your model is to define the conditional probability tables (CPTs) for each independent variable using the `TabularCPD` function. The tables for these variables are:    


$p(B)$   

| Case | p |
|---|---|
|$B_0$ | 0.4 |
|$B_1$ | 0.6 |    

$p(C)$   

| Case | p |
|---|---|
|$C_0$ | 0.7 |
|$C_1$ | 0.3 |
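
Because the butler and the cook are assumed not to influence each other, the joint prior over the pair is the product of the two tables above. The following is a minimal plain-Python sketch of that product; the names `p_B`, `p_C`, and `joint_BC` are illustrative and not part of pgmpy.

```python
# Prior tables from the text, indexed by state (0 or 1).
p_B = [0.4, 0.6]
p_C = [0.7, 0.3]

# Each prior must be a valid distribution.
assert abs(sum(p_B) - 1.0) < 1e-12
assert abs(sum(p_C) - 1.0) < 1e-12

# Independence: p(B, C) = p(B) p(C).
joint_BC = {(b, c): p_B[b] * p_C[c] for b in range(2) for c in range(2)}

for (b, c), p in joint_BC.items():
    print(f"p(B_{b}, C_{c}) = {p:.2f}")
```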


Using the above tables define and print the CPTs. Make sure you use variable names consistent with your model.

In [None]:
## Define the independent variables


Next, define the variables $W$ and $MO$, the conditional probabilities of weapon choice and motive given the butler and the cook. The conditional probability tables for these variables are:

$$p(W\ |\ B, C)$$

| Case | B0, C0 | B0, C1 | B1, C0 | B1, C1 |
|---|---|---|--|---|
|$W_0$ | 0.1 | 0.5 | 0.4 | 0.7 |
|$W_1$ | 0.9 | 0.5 | 0.6 | 0.3 |

Where $W_0$ is poison and $W_1$ is knife.   

$$p(MO\ |\ B, C)$$

| Case | B0, C0 | B0, C1 | B1, C0 | B1, C1 |
|---|---|---|--|---|
|$MO_0$ | 1.0 | 0.7 | 0.1 | 0.3 |
|$MO_1$ | 0.0 | 0.3 | 0.9 | 0.7 |

Where $MO_0$ is no motive and $MO_1$ is motive.
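
Each column of a conditional probability table is a distribution over the child variable for one configuration of the parents, so every column must sum to 1. Weighting the columns by the parent probabilities and summing gives the marginal of the child. The following is a minimal plain-Python sketch using the tables above, with columns in the order (B0, C0), (B0, C1), (B1, C0), (B1, C1). The helper `marginal` is a hypothetical name, not a pgmpy function.

```python
p_B = [0.4, 0.6]
p_C = [0.7, 0.3]

# Rows are child states; columns are (B0,C0), (B0,C1), (B1,C0), (B1,C1).
cpt_W = [[0.1, 0.5, 0.4, 0.7],   # W_0: poison
         [0.9, 0.5, 0.6, 0.3]]   # W_1: knife
cpt_MO = [[1.0, 0.7, 0.1, 0.3],  # MO_0: no motive
          [0.0, 0.3, 0.9, 0.7]]  # MO_1: motive

# Weight of each column: p(B_b) p(C_c), with C varying fastest.
weights = [p_B[b] * p_C[c] for b in range(2) for c in range(2)]

def marginal(cpt):
    """Check the column sums, then marginalize the parents out of the CPT."""
    for col in zip(*cpt):
        assert abs(sum(col) - 1.0) < 1e-12, "each column must sum to 1"
    return [sum(p * w for p, w in zip(row, weights)) for row in cpt]

p_W = marginal(cpt_W)
p_MO = marginal(cpt_MO)
print("p(W)  =", [round(p, 3) for p in p_W])
print("p(MO) =", [round(p, 3) for p in p_MO])
```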

Given the above tables, define and print these CPTs. 

**Question:** If poison is ruled out, $p(Poison) = 0$, how many possible states would each of these CPTs have?  

ANS: 

Finally, you must define a CPT for the conditional probability of the murderer. The marginal distribution of this CPT will be the probabilities of each of the suspects having committed the crime. The three cases are coded as follows:

- **M0:** The murder is committed by a third unnamed party, 
- **M1:** the cook is a murderer, and
- **M2:** the butler is a murderer. 

This CPT is conditional on $B$, $C$, $W$, and $MO$. Since there are three possible guilty parties (a cardinality of 3), there are 48 possible states: $N_{B} * N_{C} * N_{W} * N_{MO} * N_{M} = 2 * 2 * 2 * 2 * 3 = 48$, as shown here:

| | p | p | p | p| p | p | p | p| p | p | p | p| p | p | p | p|  
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|   
|| $CW_0$ | $CW_0$ | $CW_0$ | $CW_0$| $CW_1$ | $CW_1$ | $CW_1$ | $CW_1$| $CW_0$ | $CW_0$ | $CW_0$ | $CW_0$| $CW_1$ | $CW_1$ | $CW_1$ | $CW_1$|     
|| $BW_0$ | $BW_0$ | $BW_0$ | $BW_0$ | $BW_0$ | $BW_0$ | $BW_0$ | $BW_0$ | $BW_1$ | $BW_1$ | $BW_1$ | $BW_1$ | $BW_1$ | $BW_1$ | $BW_1$ | $BW_1$ |    
|| $W_0$ | $W_0$ | $W_1$ | $W_1$ | $W_0$ | $W_0$ | $W_1$ | $W_1$ | $W_0$ | $W_0$ | $W_1$ | $W_1$ | $W_0$ | $W_0$ | $W_1$ | $W_1$ | $W_0$ | $W_0$ | $W_1$ | $W_1$ |    
|| $MO_0$ | $MO_1$ | $MO_0$ | $MO_1$ | $MO_0$ | $MO_1$ | $MO_0$ | $MO_1$ | $MO_0$ | $MO_1$ | $MO_0$ | $MO_1$ | $MO_0$ | $MO_1$ | $MO_0$ | $MO_1$ | 
|$M_0$| 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0. | 0.0 | 0.0 | 0.0 | 0.0 |     
|$M_1$| 0.0  | 0.0 | 0.0 | 0.0 | 1.0  | 1.0 | 1.0 | 1.0 | 0.0  | 0.0 | 0.0 | 0.0 | 0.5  | 0.5 | 0.5 | 0.5 |     
|$M_2$| 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 |0.5 | 0.5 | 0.5 | 0.5 |

Where:     
$M_0$ = The third party     
$M_1$ = The cook     
$M_2$ = The butler     
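
The table above follows a simple rule: the guilty party depends only on $B$ and $C$, and the values repeat across the weapon and motive columns. The following is a minimal plain-Python sketch that counts the states and rebuilds the table from that rule, assuming the column order $B$, $C$, $W$, $MO$ with $MO$ varying fastest, as in the table. The names `rule`, `cpt_M`, and `p_M` are illustrative and not part of pgmpy.

```python
from itertools import product
from math import prod

# Cardinalities of the four parents and of M itself.
card = {"B": 2, "C": 2, "W": 2, "MO": 2, "M": 3}
n_states = prod(card.values())
print("number of states:", n_states)

# p(M | B, C) as [third party, cook, butler], keyed by (b, c).
rule = {
    (0, 0): [1.0, 0.0, 0.0],  # neither suspect: third party
    (0, 1): [0.0, 1.0, 0.0],  # cook only
    (1, 0): [0.0, 0.0, 1.0],  # butler only
    (1, 1): [0.0, 0.5, 0.5],  # both suspects: split evenly
}

# Columns in the order B, C, W, MO, with MO varying fastest.
columns = [rule[(b, c)] for b, c, w, mo in product(range(2), repeat=4)]
cpt_M = [list(row) for row in zip(*columns)]  # 3 rows by 16 columns

# Because M ignores W and MO here, its marginal needs only p(B) and p(C).
p_B = [0.4, 0.6]
p_C = [0.7, 0.3]
p_M = [sum(rule[(b, c)][m] * p_B[b] * p_C[c]
           for b in range(2) for c in range(2)) for m in range(3)]
print("p(M) =", [round(p, 2) for p in p_M])
```

Building the columns from a rule rather than typing 48 numbers by hand makes it easier to keep the column order consistent with the order of the parent variables.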

Create and print this CPT. 

**Question:** If $p(Poison) = 0$ how many possible states would there be in this CPT?

ANS:

To complete your belief network, use the `add_cpds` method. 

> **Hint:** Before going any further make sure you apply the `check_model` method to your complete model. 

Next investigate the independencies of all the variables in your model using the `local_independencies` method. Be sure to include all the variables in your list. 
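
Local independencies follow from the local Markov property: each node is independent of its non-descendants given its parents. The following is a minimal plain-Python sketch that derives these statements from the parent sets of the factorization given earlier, as a cross-check on what the library reports. It does not use pgmpy, and `descendants` and `local_markov` are hypothetical helper names.

```python
# Parent sets taken from the factorization of p(B, C, W, MO, M).
parents = {
    "B": set(),
    "C": set(),
    "W": {"B", "C"},
    "MO": {"B", "C"},
    "M": {"B", "C", "W", "MO"},
}

def descendants(node):
    """All nodes reachable from `node` by following directed edges."""
    children = {c for c, ps in parents.items() if node in ps}
    found = set(children)
    for child in children:
        found |= descendants(child)
    return found

def local_markov(node):
    """Non-descendants of `node`, excluding the node itself and its parents."""
    return set(parents) - {node} - descendants(node) - parents[node]

for node in parents:
    others = local_markov(node)
    if others:
        given = ", ".join(sorted(parents[node])) or "nothing"
        print(f"{node} is independent of {sorted(others)} given {given}")
```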

**Question:** Is this graphical model an I-map of the distribution discussed at the start of this lab and why?

ANS: 

**Question:** Is the graphical model a perfect map of the distribution, and why? 

ANS:

Next, you will determine which of all possible trails in the graph are active. Create and execute the code using the `is_active_trail` method on the model object. Make sure you account for all possible pairs of variables. 

**Question:** How can you best explain the blocked trails given the independent variables and V-structures in the graph? Which trails with V-structures are blocked? **Hint:** Be careful, as there can be several paths between a pair of variables.  

ANS: 

**Question:** Does {W,MO} D-separate {C,B} from {M}, and why? 

ANS:

**Question:** What are the Markov blankets of the node W and the node MO?

ANS: