# CS486 - Artificial Intelligence
## Lesson 25 - D-Separation

A Bayes' Net compactly represents the joint distribution over a set of random variables. It is compact because it stores conditional distributions only for nodes that directly influence each other. Direct influence is indicated by an edge in the graph, but observations can change the dependency relationships between variables.

Today we'll look closely at the structure of a Bayes' Net and introduce the **D-Separation** algorithm that determines which variables are dependent given the structure of the network and observations. 

In [1]:
from helpers import *

### Causal Chains

Consider the following example of a causal chain in which a *Weather Report* has some chance of predicting a *Low Pressure* system that might cause *Rain* which may cause *Traffic*. Each edge in the chain indicates direct influence:

<center><img src="images/chain.png" width="300"></center>
    
The Chain Rule says that we carry the givens up the chain when we factor the joint distribution:

$$P(W,L,R,T) = P(W)P(L\mid{W})P(R\mid{W,L})P(T\mid{W,L,R})$$

In a Bayes' Net, we assume that every variable is conditionally independent of its non-descendants given *only* its parents, so we can rewrite this more compactly as:

$$P(W,L,R,T) = P(W)P(L\mid{W})P(R\mid{L})P(T\mid{R})$$
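Counting free parameters makes the compactness concrete. The sketch below assumes every variable is binary (an assumption for illustration; the lesson does not fix the domains), and the function names are hypothetical:

```python
def full_joint_parameters(n):
    """Free parameters in an unrestricted joint over n binary variables."""
    return 2 ** n - 1


def chain_parameters(n):
    """Free parameters for a chain of n binary variables.

    The root needs one number for its prior; every other node needs
    two numbers, one for each value of its single parent.
    """
    return 1 + 2 * (n - 1)


# Four variables: Weather Report, Low Pressure, Rain, Traffic
print(full_joint_parameters(4), chain_parameters(4))  # 15 7
```

The gap widens quickly: the full joint grows exponentially with the number of variables, while the chain grows linearly.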

Since, in this example, the only influence on a node comes from its parent, observing a node *separates* the nodes on its left from the nodes on its right. In our example, *Traffic* is independent of *Low Pressure* and the *Weather Report* once we have directly observed *Rain*. You can think of the observation as snipping the path that runs through the observed variable:

<center><img src="images/chain_separated.png" width="300"></center>

The path between *Low Pressure* and *Traffic* is **inactive**. If there is no active path between two variables in the Bayes' Net, then those variables are independent. The *dependency-separation*, or **D-Separation**, algorithm determines when paths are active or inactive. 
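The separation can be checked numerically without any library. The sketch below builds the chain's joint distribution by hand; the probabilities are hypothetical numbers chosen only for illustration, and `bern`, `joint`, and `prob_traffic` are illustrative helper names, not part of the course's `helpers` module:

```python
from itertools import product

# Hypothetical CPTs (illustrative numbers only, not from the lesson).
P_W = 0.3                        # P(W = True)
P_L = {True: 0.8, False: 0.1}    # P(L = True | W)
P_R = {True: 0.7, False: 0.05}   # P(R = True | L)
P_T = {True: 0.9, False: 0.2}    # P(T = True | R)


def bern(p_true, value):
    """Probability of a boolean value given P(value = True)."""
    return p_true if value else 1.0 - p_true


def joint(w, l, r, t):
    """P(W, L, R, T) using the chain factorization."""
    return bern(P_W, w) * bern(P_L[w], l) * bern(P_R[l], r) * bern(P_T[r], t)


def prob_traffic(evidence):
    """P(T = True | evidence), where evidence maps 'W', 'L', 'R' to booleans."""
    numerator = denominator = 0.0
    for w, l, r, t in product([True, False], repeat=4):
        world = {'W': w, 'L': l, 'R': r, 'T': t}
        if any(world[name] != value for name, value in evidence.items()):
            continue  # this world contradicts the evidence
        p = joint(w, l, r, t)
        denominator += p
        if t:
            numerator += p
    return numerator / denominator


# Without observing Rain, Low Pressure changes our belief about Traffic.
print(prob_traffic({'L': True}), prob_traffic({'L': False}))

# With Rain observed, Low Pressure no longer matters.
print(prob_traffic({'R': True, 'L': True}), prob_traffic({'R': True, 'L': False}))
```

The first pair of values should differ, and the second pair should agree up to floating-point error, which is what an inactive path means.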

### Triplets

Every path between two nodes can be decomposed into a sequence of overlapping three-node paths, or triplets. The table below enumerates the three types of triplet and the observations that make them active or inactive:

<center><img src="images/triplets.png" width="600"></center>
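The rules for the three triplet types, as this lesson describes them, can also be written as a small function. This is a minimal sketch; the function name and the string labels for the triplet types are hypothetical:

```python
def triplet_is_active(kind, middle_observed, descendant_observed=False):
    """Return True if influence can flow through a three-node path.

    kind                -- 'causal_chain', 'common_cause', or 'common_effect'
    middle_observed     -- whether the middle node of the triplet is observed
    descendant_observed -- whether any descendant of the middle node is
                           observed (only matters for a common effect)
    """
    if kind in ('causal_chain', 'common_cause'):
        # Observing the middle node blocks the path.
        return not middle_observed
    if kind == 'common_effect':
        # Blocked by default; observing the effect (or one of its
        # descendants) opens the path.
        return middle_observed or descendant_observed
    raise ValueError(f'unknown triplet type: {kind!r}')


for kind in ('causal_chain', 'common_cause', 'common_effect'):
    print(kind, triplet_is_active(kind, False), triplet_is_active(kind, True))
```

Note the asymmetry: an observation blocks the first two triplet types but unblocks the third.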

We've already talked about the **Causal Chain** instance. **Common Cause** and **Common Effect** both appear in the Alarm Network:

<img src="images/bayes_net.jpg" width="400">

### Common Cause

Consider the common cause triplet between *JohnCalls* and *MaryCalls*, whose shared parent is *Alarm*. According to our table above, the path is active when *Alarm* is unobserved, so *JohnCalls* and *MaryCalls* are dependent. Let's instantiate the network in AIMA to explore:

In [10]:
alarm_network = (BayesNet()
    .add('Burglary', [], 0.001)
    .add('Earthquake', [], 0.002)
    .add('Alarm', ['Burglary', 'Earthquake'], {(T, T): 0.95, (T, F): 0.94, (F, T): 0.29, (F, F): 0.001})
    .add('JohnCalls', ['Alarm'], {T: 0.90, F: 0.05})
    .add('MaryCalls', ['Alarm'], {T: 0.70, F: 0.01}))  

# so we can reference the variables outside the instance
globals().update(alarm_network.lookup)

First, let's confirm that *JohnCalls* and *MaryCalls* are dependent. We'll use the `query` method which builds and queries the full joint distribution for the variables. 

In [14]:
print( query(JohnCalls, {MaryCalls: F}, alarm_network) )
print( query(JohnCalls, {MaryCalls: T}, alarm_network) )

{F: 0.9493506867254092, T: 0.0506493132745907}
{F: 0.8224233999127043, T: 0.17757660008729567}


Yep. There is a dependency. But why? If Mary is calling, then the alarm is more likely to be going off. If the alarm is going off, John is also more likely to call. According to our table, if we observe *Alarm*, the path will become inactive. Let's see:

In [15]:
print( query(JohnCalls, {Alarm: T, MaryCalls: F}, alarm_network) )
print( query(JohnCalls, {Alarm: T, MaryCalls: T}, alarm_network) )

{F: 0.09999999999999998, T: 0.9}
{F: 0.09999999999999998, T: 0.9}
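The same numbers can be reproduced without the course helpers by summing the joint distribution directly. The sketch below copies the probabilities from the network definition above; `bern`, `joint`, and `posterior` are hypothetical names for this example and say nothing about how `query` is implemented internally:

```python
from itertools import product

# Probabilities copied from the Alarm Network definition.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm | Burglary, Earthquake)
P_J = {True: 0.90, False: 0.05}                      # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}                      # P(MaryCalls | Alarm)

NAMES = ('Burglary', 'Earthquake', 'Alarm', 'JohnCalls', 'MaryCalls')


def bern(p_true, value):
    """Probability of a boolean value given P(value = True)."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """Joint probability of one complete assignment."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))


def posterior(target, evidence):
    """P(target = True | evidence) by enumerating all 32 assignments."""
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[name] != value for name, value in evidence.items()):
            continue  # inconsistent with the evidence
        p = joint(*values)
        denominator += p
        if world[target]:
            numerator += p
    return numerator / denominator


# Alarm unobserved: Mary's call changes our belief about John's call.
print(posterior('JohnCalls', {'MaryCalls': False}))
print(posterior('JohnCalls', {'MaryCalls': True}))

# Alarm observed: Mary's call no longer matters.
print(posterior('JohnCalls', {'Alarm': True, 'MaryCalls': False}))
print(posterior('JohnCalls', {'Alarm': True, 'MaryCalls': True}))
```

Once *Alarm* is observed, both queries reduce to the table entry for *JohnCalls* given *Alarm*, which is why the two results agree.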


### Common Effect

Consider the common effect triplet between *Burglary* and *Earthquake*, whose shared child is *Alarm*. According to our table above, the path is inactive when *Alarm* is unobserved, so the two variables are independent. Let's see:

In [18]:
print( query(Burglary, {Earthquake: F}, alarm_network) )
print( query(Burglary, {Earthquake: T}, alarm_network) )

{F: 0.999, T: 0.0010000000000000002}
{F: 0.9990000000000001, T: 0.0010000000000000002}


According to the table, observing *Alarm* will activate the path and introduce a dependency between the two variables:

In [19]:
print( query(Burglary, {Alarm: T, Earthquake: F}, alarm_network) )
print( query(Burglary, {Alarm: T, Earthquake: T}, alarm_network) )

{F: 0.5152140278494068, T: 0.4847859721505931}
{F: 0.996731576412303, T: 0.003268423587696965}


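What's happening here? If the alarm is going off, then something is causing it. Once we know that one cause, the *Earthquake*, is happening, it already accounts for the alarm, so the other cause, the *Burglary*, becomes much less likely. If we know there is no earthquake, a burglary becomes the most plausible remaining explanation. This effect is often called *explaining away*.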
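The two posteriors can be checked by hand with Bayes' rule. Because *Burglary* and *Earthquake* are independent before the alarm is observed, the prior for *Burglary* is unchanged by the earthquake evidence. A minimal sketch using the probabilities from the network definition (the function name is hypothetical):

```python
P_B = 0.001  # P(Burglary = True)

# P(Alarm = True | Burglary, Earthquake), copied from the network definition.
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}


def p_burglary_given_alarm(earthquake):
    """P(Burglary = True | Alarm = True, Earthquake = earthquake)."""
    with_burglary = P_B * P_A[(True, earthquake)]
    without_burglary = (1 - P_B) * P_A[(False, earthquake)]
    return with_burglary / (with_burglary + without_burglary)


print(p_burglary_given_alarm(False))  # no earthquake: burglary is a likely explanation
print(p_burglary_given_alarm(True))   # earthquake explains the alarm away
```

With an earthquake, the alarm is already likely without a burglary (0.29), so the denominator is dominated by the no-burglary term. Without one, a false alarm is rare (0.001), so the burglary term carries about half the weight.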

It is important to note that **an observation of any descendant of the common effect node will also activate the path**. For example:

In [20]:
print( query(Burglary, {MaryCalls: T, Earthquake: T}, alarm_network) )
print( query(Burglary, {MaryCalls: T, Earthquake: F}, alarm_network) )

{F: 0.9968393116490958, T: 0.0031606883509043226}
{F: 0.9419116927193812, T: 0.058088307280618735}


Since a call from Mary indicates that the alarm might be going off, it is tantamount, as far as path activation is concerned, to observing the alarm itself.

### D-Separation

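To check if two variables are independent, check *every* path between them. A path is active only if every triplet along it is active. If there is an active path between the nodes, they are **not** guaranteed to be independent; if every path is inactive, they are independent given the observations.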
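The procedure can be sketched in plain Python. This is one straightforward way to implement the check, assuming the network is given as a dictionary that maps each node to a list of its parents. It enumerates every simple path, which is fine for small networks but not how an efficient implementation would do it. All function names are hypothetical and independent of the course's `helpers` module:

```python
def descendants(node, children):
    """All nodes reachable from node by following edges forward."""
    seen, stack = set(), [node]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def triplet_active(a, b, c, parents, children, observed):
    """Is the three-node path a - b - c active given the observed set?"""
    if a in parents[b] and c in parents[b]:
        # Common effect (a -> b <- c): active only if b or one of its
        # descendants is observed.
        return b in observed or bool(descendants(b, children) & observed)
    # Causal chain or common cause: active only if b is unobserved.
    return b not in observed


def simple_paths(start, goal, neighbors, path=None):
    """Yield every path from start to goal that never revisits a node."""
    path = path or [start]
    if start == goal:
        yield path
        return
    for nxt in neighbors[start]:
        if nxt not in path:
            yield from simple_paths(nxt, goal, neighbors, path + [nxt])


def d_separated(x, y, observed, parents):
    """True if every path between x and y is inactive given observed."""
    observed = set(observed)
    children = {node: [] for node in parents}
    for node, node_parents in parents.items():
        for parent in node_parents:
            children[parent].append(node)
    neighbors = {n: set(parents[n]) | set(children[n]) for n in parents}
    for path in simple_paths(x, y, neighbors):
        triplets = zip(path, path[1:], path[2:])
        if all(triplet_active(a, b, c, parents, children, observed)
               for a, b, c in triplets):
            return False  # found an active path
    return True


# Structure of the Alarm Network: node -> list of parents.
ALARM_PARENTS = {
    'Burglary': [], 'Earthquake': [],
    'Alarm': ['Burglary', 'Earthquake'],
    'JohnCalls': ['Alarm'], 'MaryCalls': ['Alarm'],
}

print(d_separated('JohnCalls', 'MaryCalls', [], ALARM_PARENTS))
print(d_separated('JohnCalls', 'MaryCalls', ['Alarm'], ALARM_PARENTS))
print(d_separated('Burglary', 'Earthquake', [], ALARM_PARENTS))
print(d_separated('Burglary', 'Earthquake', ['MaryCalls'], ALARM_PARENTS))
```

The four calls mirror the queries made earlier in this lesson, so the results should agree with what the probabilities showed. Only the graph structure is needed here: the function never looks at a probability table.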