# Bayesian Models
We are now going to dig further into one type of **Probabilistic Graphical Model**: **Bayesian Networks**. We will discuss the following:
1. What are Bayesian Models?
2. Independencies in Bayesian Networks
3. How a Bayesian Model encodes the Joint Distribution
4. How we do inference from Bayesian Models

---

## 1. What are Bayesian Models? 
A Bayesian Network is a probabilistic graphical model (a type of statistical model) that represents a set of **random variables** and their **conditional dependencies** via a **directed acyclic graph** (DAG). Bayesian networks are often used when we want to represent *causal relationships* between the random variables. They are parameterized by using **Conditional Probability Distributions** (CPD). Each node in the network is parameterized using:

#### $$P(node|Pa(node))$$
where $Pa(node)$ represents the parents of the node in the network. We can dig into this further by looking at the following student model:

<img src="images/student_full_param.png">

If we use the library **pgmpy**, then we create the above model as follows:
> 1. Define the network structure (or learn it from data)
2. Define the CPDs between nodes (random variables)
3. Associate the CPDs with the structure

We can see this implemented below.

### 1.1 Implementation

In [6]:
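# Imports needed from pgmpy
# Note: newer pgmpy releases rename BayesianModel (to BayesianNetwork, and later
# DiscreteBayesianNetwork); adjust this import to match your installed version.
from pgmpy.models import BayesianModel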
from pgmpy.factors.discrete import TabularCPD

### 1.1.1 Set the Structure
With our imports taken care of, we start by defining the model structure. We do this by passing in a list of edges. Note that these edges are *directional*; for example, the tuple `('difficulty', 'grade')` means that `difficulty` influences `grade`.

In [55]:
student_model = BayesianModel([('difficulty', 'grade'), 
                       ('intelligence', 'grade'), 
                       ('grade', 'letter'), 
                       ('intelligence', 'sat')])
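To make the "directed acyclic graph" requirement concrete, here is a minimal sketch that does not use pgmpy at all. It builds the parent sets $Pa(node)$ from the same edge list and uses the standard library's `graphlib` (Python 3.9+) to confirm that the edges contain no cycle. The names `edges`, `parents`, and `is_dag` are illustrative and not part of pgmpy.

```python
from graphlib import TopologicalSorter, CycleError

# Same edge list as the model above; each tuple is (parent, child).
edges = [('difficulty', 'grade'),
         ('intelligence', 'grade'),
         ('grade', 'letter'),
         ('intelligence', 'sat')]

# Map every node to the set of its parents, Pa(node).
parents = {}
for parent, child in edges:
    parents.setdefault(parent, set())
    parents.setdefault(child, set()).add(parent)

# TopologicalSorter takes {node: predecessors} and raises CycleError
# when the edges do not form a DAG.
try:
    order = list(TopologicalSorter(parents).static_order())
    is_dag = True
except CycleError:
    order, is_dag = [], False

print(parents['grade'])   # the two parents of grade
print(is_dag)
```

In any valid ordering, parents come before their children, which is the order in which the CPDs can be evaluated.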

### 1.1.2 Setup the relationships (CPDs)
We then want to set up our relationships in the form of CPDs. A few things to note:
> 1. `variable_card`: the number of discrete states that the random variable can take on.
2. `evidence`: the parents of the random variable, i.e. $Pa(node)$; `evidence_card` lists the number of states of each parent, in the same order.


In [56]:
difficulty_cpd = TabularCPD(variable='difficulty',
                       variable_card=2,
                       # One row per state; recent pgmpy releases expect shape (variable_card, 1)
                       values=[[0.6], [0.4]])

In [57]:
intelligence_cpd = TabularCPD(variable='intelligence',
                              variable_card=2,
                              # One row per state, as a single column
                              values=[[0.7], [0.3]])

In [58]:
grade_cpd = TabularCPD(variable='grade', 
                       variable_card=3, 
                       values=[[0.3, 0.05, 0.9,  0.5],
                               [0.4, 0.25, 0.08, 0.3],
                               [0.3, 0.7,  0.02, 0.2]],
                      evidence=['intelligence', 'difficulty'],
                      evidence_card=[2, 2])
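The `values` table for `grade` has one row per grade state and one column per combination of parent states. The following pure-Python sketch (no pgmpy involved; `columns` and `p_grade` are hypothetical helper names) spells out the column order, which matches the table that pgmpy prints for this CPD: the last variable in `evidence` changes fastest.

```python
from itertools import product

evidence = ['intelligence', 'difficulty']
evidence_card = [2, 2]
grade_values = [[0.3, 0.05, 0.9,  0.5],
                [0.4, 0.25, 0.08, 0.3],
                [0.3, 0.7,  0.02, 0.2]]

# One column per combination of parent states; the last evidence
# variable (difficulty) changes fastest.
columns = list(product(*(range(card) for card in evidence_card)))

def p_grade(grade, intelligence, difficulty):
    """Look up P(grade | intelligence, difficulty) in the table."""
    return grade_values[grade][columns.index((intelligence, difficulty))]

for col, (i, d) in enumerate(columns):
    print(f"column {col}: intelligence_{i}, difficulty_{d}")
```

For example, `p_grade(0, 1, 0)` reads row `grade_0` in the column for `intelligence_1, difficulty_0`.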

In [59]:
letter_cpd = TabularCPD(variable='letter', variable_card=2, 
                   values=[[0.1, 0.4, 0.99],
                           [0.9, 0.6, 0.01]],
                   evidence=['grade'],
                   evidence_card=[3])

In [60]:
sat_cpd = TabularCPD(variable='sat', variable_card=2,
                   values=[[0.95, 0.2],
                           [0.05, 0.8]],
                   evidence=['intelligence'],
                   evidence_card=[2])

### 1.1.3 Add the relationships (CPDs) to the Model
The next step is to add our CPDs to the model. The way in which pgmpy specifies models is highly modular, which allows us to add and remove individual CPDs very easily.

In [61]:
student_model.add_cpds(difficulty_cpd, intelligence_cpd, grade_cpd, letter_cpd, sat_cpd)

At this point we can check our model. `check_model()` inspects the network structure and the CPDs, and verifies that the CPDs are correctly defined and that their probabilities sum to 1.

In [62]:
student_model.check_model()

True
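As a rough illustration of the "sum to 1" part of that check, the sketch below verifies that every column of each table is a valid distribution. This is not pgmpy's implementation, only a hand-written approximation; `columns_sum_to_one` and `tables` are hypothetical names, and the root-node tables are written as single columns.

```python
import math

def columns_sum_to_one(values):
    """True if every column of a CPD table sums to 1 (within tolerance)."""
    return all(math.isclose(sum(col), 1.0) for col in zip(*values))

# Rows are states of the variable, columns are parent configurations.
tables = {
    'difficulty':   [[0.6], [0.4]],
    'intelligence': [[0.7], [0.3]],
    'grade':        [[0.3, 0.05, 0.9,  0.5],
                     [0.4, 0.25, 0.08, 0.3],
                     [0.3, 0.7,  0.02, 0.2]],
    'letter':       [[0.1, 0.4, 0.99],
                     [0.9, 0.6, 0.01]],
    'sat':          [[0.95, 0.2],
                     [0.05, 0.8]],
}

results = {name: columns_sum_to_one(values) for name, values in tables.items()}
print(results)

# A table whose first column sums to 0.9 fails the check.
bad_table = [[0.5, 0.4],
             [0.4, 0.6]]
print(columns_sum_to_one(bad_table))
```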

### 1.1.4 Examine the Structure of the Graph
We can see our model with the respective CPD's incorporated:

In [63]:
student_model.get_cpds()

[<TabularCPD representing P(difficulty:2) at 0x11352e198>,
 <TabularCPD representing P(intelligence:2) at 0x11352e128>,
 <TabularCPD representing P(grade:3 | intelligence:2, difficulty:2) at 0x1134fa7f0>,
 <TabularCPD representing P(letter:2 | grade:3) at 0x1134fcbe0>,
 <TabularCPD representing P(sat:2 | intelligence:2) at 0x1134fc710>]

And we can examine specific nodes to ensure that the corresponding distributions are correct. 

In [64]:
print(student_model.get_cpds('difficulty'))

╒══════════════╤═════╕
│ difficulty_0 │ 0.6 │
├──────────────┼─────┤
│ difficulty_1 │ 0.4 │
╘══════════════╧═════╛


In [65]:
print(student_model.get_cpds('intelligence'))

╒════════════════╤═════╕
│ intelligence_0 │ 0.7 │
├────────────────┼─────┤
│ intelligence_1 │ 0.3 │
╘════════════════╧═════╛


In [66]:
print(student_model.get_cpds('grade'))

╒══════════════╤════════════════╤════════════════╤════════════════╤════════════════╕
│ intelligence │ intelligence_0 │ intelligence_0 │ intelligence_1 │ intelligence_1 │
├──────────────┼────────────────┼────────────────┼────────────────┼────────────────┤
│ difficulty   │ difficulty_0   │ difficulty_1   │ difficulty_0   │ difficulty_1   │
├──────────────┼────────────────┼────────────────┼────────────────┼────────────────┤
│ grade_0      │ 0.3            │ 0.05           │ 0.9            │ 0.5            │
├──────────────┼────────────────┼────────────────┼────────────────┼────────────────┤
│ grade_1      │ 0.4            │ 0.25           │ 0.08           │ 0.3            │
├──────────────┼────────────────┼────────────────┼────────────────┼────────────────┤
│ grade_2      │ 0.3            │ 0.7            │ 0.02           │ 0.2            │
╘══════════════╧════════════════╧════════════════╧════════════════╧════════════════╛
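Because each node is parameterized by $P(node|Pa(node))$, multiplying the five CPDs gives the joint distribution over all variables. The sketch below does this by hand for the student model, without pgmpy; the dictionaries simply restate the CPD values above, and the names `P_D`, `P_I`, `P_G`, `P_L`, `P_S`, and `joint` are illustrative.

```python
from itertools import product

P_D = [0.6, 0.4]                                   # P(difficulty)
P_I = [0.7, 0.3]                                   # P(intelligence)
P_G = {(0, 0): [0.3, 0.4, 0.3],                    # P(grade | i, d), key is (i, d)
       (0, 1): [0.05, 0.25, 0.7],
       (1, 0): [0.9, 0.08, 0.02],
       (1, 1): [0.5, 0.3, 0.2]}
P_L = {0: [0.1, 0.9], 1: [0.4, 0.6], 2: [0.99, 0.01]}   # P(letter | grade)
P_S = {0: [0.95, 0.05], 1: [0.2, 0.8]}                  # P(sat | intelligence)

def joint(d, i, g, l, s):
    """P(d, i, g, l, s) as the product of the five CPD entries."""
    return P_D[d] * P_I[i] * P_G[(i, d)][g] * P_L[g][l] * P_S[i][s]

# Summing over all 2 * 2 * 3 * 2 * 2 = 48 assignments should give 1.
states = product(range(2), range(2), range(3), range(2), range(2))
total = sum(joint(d, i, g, l, s) for d, i, g, l, s in states)

print(joint(0, 0, 0, 0, 0))
print(total)
```

Storing five small tables instead of one 48-entry joint table is the compactness that the network structure buys us.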


---

## 2. Independencies in Bayesian Networks 
The independencies implied by the structure of our Bayesian network fall into 2 types:
> 1. **Local Independencies:** Every variable in the network is independent of its non-descendants given its parents. Mathematically it can be written as:<br>
<br>
$$X \perp NonDesc(X)|Pa(X)$$
where $NonDesc(X)$ is the set of variables which are not descendants of $X$ and $Pa(X)$ is the set of variables which are parents of $X$. 
2. **Global Independencies:** To discuss global independencies in Bayesian networks we need to look at the various network structures possible. Starting with the case of 2 nodes, there are only 2 possible ways for them to be connected:

<img src="images/two_nodes.png">

In both of the above cases it is clear that a change in either node will affect the other. For the first case we can take the example of $difficulty \rightarrow grade$. If we increase the difficulty of the course, the probability of getting a higher grade decreases. For the second case we can take the example of $ SAT \leftarrow Intel $. If we observe a good SAT score, it becomes more likely that the student is intelligent, which increases the probability of $ i_1 $. Therefore, in both cases a change in one variable leads to a change in the other.
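The $ SAT \leftarrow Intel $ case can be checked numerically with Bayes' rule, using the CPD values defined earlier. This is a hand calculation rather than a pgmpy call, and it assumes that state 1 of `sat` denotes a good score and state 1 of `intelligence` denotes $ i_1 $; the source does not label the states explicitly.

```python
p_i = [0.7, 0.3]                  # P(intelligence)
p_s_given_i = [[0.95, 0.05],      # P(sat | intelligence_0)
               [0.2, 0.8]]        # P(sat | intelligence_1)

# Marginal: P(sat_1) = sum over i of P(i) * P(sat_1 | i)
p_s1 = sum(p_i[i] * p_s_given_i[i][1] for i in range(2))

# Bayes' rule: P(i_1 | sat_1) = P(i_1) * P(sat_1 | i_1) / P(sat_1)
p_i1_given_s1 = p_i[1] * p_s_given_i[1][1] / p_s1

print(p_s1)
print(p_i1_given_s1)
```

The posterior (roughly 0.87) is much higher than the prior of 0.3, so influence flows against the direction of the edge as well.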

With 3 nodes, there are four possible ways to connect them:

<img src="images/three_nodes.png">


For each of these structures we will look at how influence flows from $ A $ to $ C $.

1. **Causal**: In the general case, a change in the variable $ A $ has an effect on variable $ B $ (as we discussed above), and this change in $ B $ in turn changes $ C $. The other possible case is when $ B $ is observed, i.e. we know the value of $ B $. In this case a change in $ A $ can't affect $ B $, since we already know its value, and hence there won't be any change in $ C $, as it depends only on $ B $. Mathematically we can say that: 
$$ (A \perp C | B) $$
2. **Evidential**: Similarly, in this case observing $ B $ renders $ C $ independent of $ A $. When $ B $ is not observed, the influence flows from $ A $ to $ C $. Hence:
$$ (A \perp C | B) $$
3. **Common Cause**: The influence flows from $ A $ to $ C $ when $ B $ is not observed. But when $ B $ is observed, a change in $ A $ doesn't affect $ C $, since $ C $ depends only on $ B $. Hence here also:
$$ ( A \perp C | B) $$
4. **Common Evidence**: This case is a bit different from the others. When $ B $ is not observed, a change in $ A $ is reflected in $ B $ but not in $ C $. Let's take the example of $ D \rightarrow G \leftarrow I $. If we increase the difficulty of the course, the probability of getting a higher grade decreases, but this has no effect on the intelligence of the student. Now suppose $ B $ is observed: say we know that the student got a good grade. If we then increase the difficulty of the course, the probability that the student is intelligent increases, since we already know that they got a good grade. Hence in this case 
$$ (A \perp C) $$ 
and 
$$ ( A \not\perp C | B) $$
This structure is also commonly known as **V structure**. 
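The V structure $ D \rightarrow G \leftarrow I $ can be checked by hand with the CPD values from the student model. The sketch below is a plain-Python calculation, not a pgmpy query, and it assumes that `grade_0` is the good grade and that state 1 means "hard" for difficulty and "intelligent" for intelligence; the source does not label the states, so treat those readings as assumptions.

```python
P_D = [0.6, 0.4]                       # P(difficulty)
P_I = [0.7, 0.3]                       # P(intelligence)
P_G0 = {(0, 0): 0.3, (0, 1): 0.05,     # P(grade_0 | i, d), key is (i, d)
        (1, 0): 0.9, (1, 1): 0.5}

def weight(i, d):
    """Joint probability P(i, d, grade_0)."""
    return P_I[i] * P_D[d] * P_G0[(i, d)]

# Grade not observed: difficulty tells us nothing about intelligence.
p_i1_given_d1 = (P_I[1] * P_D[1]) / sum(P_I[i] * P_D[1] for i in range(2))

# Grade observed (grade_0): difficulty now changes our belief.
p_i1_given_g0 = (sum(weight(1, d) for d in range(2))
                 / sum(weight(i, d) for i in range(2) for d in range(2)))
p_i1_given_g0_d1 = weight(1, 1) / sum(weight(i, 1) for i in range(2))

print(p_i1_given_d1)      # equals the prior P(i_1)
print(p_i1_given_g0)      # belief after seeing the grade
print(p_i1_given_g0_d1)   # belief after also learning the course was hard
```

Under these assumptions the belief in intelligence rises from about 0.61 to about 0.81 once we also learn that the course was hard, which is the dependence that appears only when $ B $ is observed.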

We can see this in greater detail by utilizing pgmpy.

### 2.1 Find Local Independencies
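We can look at the independencies for specific nodes. Read the output below with the definition in mind: only non-descendants belong on the left of the bar, so for `difficulty` the definition gives (difficulty ⊥ intelligence, sat). Descendants such as `grade` and `letter` showing up there most likely reflect a quirk of the pgmpy version that produced this output.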

In [67]:
student_model.local_independencies('difficulty')

(difficulty _|_ intelligence, letter, sat, grade)

In [68]:
student_model.local_independencies('grade')

(grade _|_ letter, sat | intelligence, difficulty)

In [69]:
student_model.local_independencies(['difficulty', 'intelligence', 'sat', 'grade', 'letter'])

(difficulty _|_ intelligence, letter, sat, grade)
(intelligence _|_ letter, difficulty, sat, grade)
(sat _|_ difficulty, letter, grade | intelligence)
(grade _|_ letter, sat | intelligence, difficulty)
(letter _|_ intelligence, difficulty, sat | grade)
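For comparison, the local independencies can be derived directly from the definition $X \perp NonDesc(X)|Pa(X)$ with a few lines of plain Python. This sketch does not call pgmpy; `descendants` and `local_independence` are hypothetical helpers. Note that, following the definition, descendants never appear in the independent set, so the result for root nodes such as `difficulty` is smaller than the printed pgmpy output above.

```python
edges = [('difficulty', 'grade'),
         ('intelligence', 'grade'),
         ('grade', 'letter'),
         ('intelligence', 'sat')]

nodes = {n for edge in edges for n in edge}
parents = {n: set() for n in nodes}
children = {n: set() for n in nodes}
for p, c in edges:
    parents[c].add(p)
    children[p].add(c)

def descendants(node):
    """All nodes reachable from `node` by following edges forward."""
    seen, stack = set(), [node]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def local_independence(node):
    """Return (non-descendants excluding parents, parents) for `node`."""
    non_desc = nodes - descendants(node) - {node} - parents[node]
    return non_desc, parents[node]

for node in sorted(nodes):
    independent_of, given = local_independence(node)
    print(node, '_|_', sorted(independent_of), '|', sorted(given))
```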

In [70]:
student_model.get_independencies()

(difficulty _|_ intelligence, sat)
(difficulty _|_ sat | intelligence)
(difficulty _|_ intelligence | sat)
(difficulty _|_ letter | grade)
(difficulty _|_ sat | intelligence, letter)
(difficulty _|_ letter, sat | intelligence, grade)
(difficulty _|_ letter | sat, grade)
(difficulty _|_ sat | intelligence, letter, grade)
(difficulty _|_ letter | intelligence, sat, grade)
(grade _|_ sat | intelligence)
(grade _|_ sat | intelligence, letter)
(grade _|_ sat | intelligence, difficulty)
(grade _|_ sat | difficulty, letter, intelligence)
(intelligence _|_ difficulty)
(intelligence _|_ difficulty | sat)
(intelligence _|_ letter | grade)
(intelligence _|_ letter | difficulty, grade)
(intelligence _|_ letter | sat, grade)
(intelligence _|_ letter | difficulty, sat, grade)
(letter _|_ sat | intelligence)
(letter _|_ intelligence, difficulty, sat | grade)
(letter _|_ sat | intelligence, difficulty)
(letter _|_ difficulty, sat | intelligence, grade)
(letter _|_ intelligence, sat | difficulty, grade)

### 2.2 Find Active Trail Nodes
We can also look for **active trail nodes**. We can think of active trails as paths of influence: which variables can give you information about another variable?

In [43]:
student_model.active_trail_nodes('difficulty')

{'difficulty': {'difficulty', 'grade', 'letter'}}

In [44]:
student_model.active_trail_nodes('grade')

{'grade': {'difficulty', 'grade', 'intelligence', 'letter', 'sat'}}

Notice that for `grade` every node is returned. With nothing observed, there is an active trail between `grade` and every other variable, so each of them can provide information about `grade`.

We can also see how the active trails to `difficulty` change when we observe `grade`.

In [78]:
student_model.active_trail_nodes('difficulty')

{'difficulty': {'difficulty', 'grade', 'letter'}}

In [79]:
student_model.active_trail_nodes('difficulty', observed='grade')

{'difficulty': {'difficulty', 'intelligence', 'sat'}}
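To see why observing `grade` changes the result, here is a minimal pure-Python sketch of a reachability search over active trails, based on the four three-node rules described earlier. It is not pgmpy's implementation; the function below is a hypothetical stand-in that happens to share the method's name, and it returns a plain set rather than a dictionary.

```python
def active_trail_nodes(edges, start, observed=()):
    """Nodes reachable from `start` via an active trail given `observed`."""
    nodes = {n for edge in edges for n in edge}
    parents = {n: set() for n in nodes}
    children = {n: set() for n in nodes}
    for p, c in edges:
        parents[c].add(p)
        children[p].add(c)

    observed = set(observed)
    # Observed nodes plus their ancestors: a V structure is active only
    # if its middle node belongs to this set.
    ancestors, stack = set(), list(observed)
    while stack:
        node = stack.pop()
        if node not in ancestors:
            ancestors.add(node)
            stack.extend(parents[node])

    # 'up' means we arrived from a child, 'down' means from a parent.
    reachable, visited = set(), set()
    stack = [(start, 'up')]
    while stack:
        node, direction = stack.pop()
        if (node, direction) in visited:
            continue
        visited.add((node, direction))
        if node not in observed:
            reachable.add(node)
        if direction == 'up' and node not in observed:
            stack.extend((p, 'up') for p in parents[node])
            stack.extend((c, 'down') for c in children[node])
        elif direction == 'down':
            if node not in observed:      # causal trail continues downward
                stack.extend((c, 'down') for c in children[node])
            if node in ancestors:         # V structure activated by evidence
                stack.extend((p, 'up') for p in parents[node])
    return reachable

edges = [('difficulty', 'grade'), ('intelligence', 'grade'),
         ('grade', 'letter'), ('intelligence', 'sat')]

print(sorted(active_trail_nodes(edges, 'difficulty')))
print(sorted(active_trail_nodes(edges, 'difficulty', observed=['grade'])))
```

Observing `grade` blocks the causal trail to `letter` but activates the V structure at `grade`, which opens the trail to `intelligence` and, through it, to `sat`.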