# ECEN760 Homework 2 : Bayesian Network

   **Name:**  "Anil B Murthy"                      
   
   **UIN:**  "525006147"
   

## Installation instructions:

**Step-1: Set Up the Anaconda Environment**

   We're going to use Python 2.7 (and a bunch of helpful libraries) through the Anaconda platform. Anaconda advertises itself as "the leading open data science platform powered by Python."
 
Let's start by downloading the latest version of Anaconda:

https://www.continuum.io/downloads
 
Make sure to grab the Python 2.7 version (not the Python 3.5 one). You should find installers for Windows, Mac, and Linux.
Once Anaconda is installed, the path to your Python installation should look like the one shown below:
 
#Open a terminal window and type the following

#Windows users may skip the "which python" command

> which python

#You'll see something like this on macOS or Linux:

/Users/Nagaraj/anaconda/bin/python
 
 
#Create environment for our class

> conda create --name pgm760 python=2.7

#To see list of our environments type

> conda info --envs

#Install packages into this environment

> conda install --name pgm760 matplotlib
> conda install --name pgm760 jupyter
> conda install --name pgm760 pyparsing

#plus many others that we'll install later in the semester.

#switch over to our new environment before you start coding

> source activate pgm760

#In Windows, the commands are "activate pgm760" and "deactivate"

#you will see (pgm760) now


#To open the Jupyter Notebook:

> jupyter notebook 
 
#When you are done and have closed Jupyter, you can deactivate the current environment

> source deactivate

#You can find more info on conda here: http://conda.pydata.org/docs/using/pkgs.html

**Step-2: Download and Install "pgmpy" library**
 
Download:
 
 If you already have git set up, you can clone the repository with
 
> git clone https://github.com/pgmpy/pgmpy

or download and unzip from: https://github.com/pgmpy/pgmpy/archive/dev.zip
 
Install:
 
 To install the dependencies, switch to the pgmpy directory using:
 
> cd /[path-to]/pgmpy
 
 Open the requirements.txt file, change "pyparsing==2.1.8" to "pyparsing", and save it.
 Remember to activate the "pgm760" environment first.
 Now install all the necessary requirements using conda:
 
> conda install --file requirements.txt
 
 Then install the library using
 
> sudo python setup.py install  

#Windows users don't need the keyword "sudo"
 
**Step-3: Validate the installation**

Open jupyter notebook from the "pgmpy" directory:

> cd /path/to/pgmpy
> jupyter notebook

Now open the file "examples/Creating a Bayesian Network.ipynb"

Check if you can run the program without errors.
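
Before opening the notebook, a quick sanity check can confirm that the active interpreter and the required packages are the ones you expect. The sketch below is only an illustration: the helper name `check_modules` is hypothetical, and the module list simply mirrors the packages installed in Steps 1 and 2.

```python
import importlib
import sys


def check_modules(names):
    """Try to import each module and record whether the import succeeded."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


# The course environment targets Python 2.7, so report the running version.
print("Python %d.%d" % (sys.version_info[0], sys.version_info[1]))

# Module names assumed from the install steps above.
report = check_modules(["matplotlib", "pyparsing", "pgmpy"])
for name, ok in sorted(report.items()):
    print("%-12s %s" % (name, "OK" if ok else "MISSING"))
```

If any line reports MISSING, make sure the "pgm760" environment is active and repeat the corresponding install step.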



# General Instructions:

In this assignment, you will write a Python program to build a Bayesian model using the pgmpy module. 

* Since this is your first programming assignment using Python, you're provided with a sketch of the program. You only need to complete the cells that say " # Write your code here ". 

* Feel free to organize your code into multiple cells for better readability. 
* You may also create sub-functions/definitions.
* You may also import more standard libraries in addition to the libraries that are already imported in the preprocessing part.


## Bayesian Network:

Consider the following example from Koller's book, which models a student's chance of getting a recommendation letter.

![PGM](koller_example.png)


In [23]:
from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

Build your network here and assign the CPDs to the model.

In [24]:
student_recoLetter_model = BayesianModel([('Difficulty', 'Grade'),
                                         ('Intelligence', 'Grade'),
                                         ('Intelligence', 'SAT'),
                                         ('Grade', 'Letter')])

# Root nodes: a single column with one row per state.
cpd_diff = TabularCPD(variable='Difficulty', variable_card=2, values=[[0.6], [0.4]])
cpd_intel = TabularCPD(variable='Intelligence', variable_card=2, values=[[0.7], [0.3]])

# Conditional CPDs: one row per state of the variable, one column per parent
# configuration. With evidence=['Difficulty', 'Intelligence'] the columns are
# read as (D0, I0), (D0, I1), (D1, I0), (D1, I1).
cpd_grade = TabularCPD(variable='Grade', variable_card=3,
                       values=[[0.3, 0.05, 0.9, 0.5],
                               [0.4, 0.25, 0.08, 0.3],
                               [0.3, 0.7, 0.02, 0.2]],
                       evidence=['Difficulty', 'Intelligence'],
                       evidence_card=[2, 2])
cpd_SAT = TabularCPD(variable='SAT', variable_card=2,
                     values=[[0.95, 0.2],
                             [0.05, 0.8]],
                     evidence=['Intelligence'],
                     evidence_card=[2])
cpd_letter = TabularCPD(variable='Letter', variable_card=2,
                        values=[[0.1, 0.4, 0.99],
                                [0.9, 0.6, 0.01]],
                        evidence=['Grade'],
                        evidence_card=[3])

student_recoLetter_model.add_cpds(cpd_diff, cpd_intel, cpd_grade, cpd_SAT, cpd_letter)

student_recoLetter_model.check_model()

True
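
The order of the entries in `evidence` determines how the columns of `values` are interpreted: pgmpy reads them with the first evidence variable varying slowest and the last varying fastest. The sketch below, in plain Python with no pgmpy dependency, lists the column labels this implies for `cpd_grade` so that you can compare them with the table in the figure. The helper name `column_labels` is hypothetical.

```python
from itertools import product


def column_labels(evidence, evidence_card):
    """List parent-state combinations in column order (first variable slowest)."""
    state_ranges = [range(card) for card in evidence_card]
    return [
        ", ".join("%s_%d" % (var, state) for var, state in zip(evidence, combo))
        for combo in product(*state_ranges)
    ]


# Same evidence order and cardinalities as cpd_grade above.
labels = column_labels(["Difficulty", "Intelligence"], [2, 2])
for index, label in enumerate(labels):
    print("column %d: %s" % (index, label))
```

With this ordering, the second column of `cpd_grade` (0.05, 0.25, 0.7) is read as the distribution of Grade given Difficulty_0 and Intelligence_1. If the table you are copying from lists Intelligence first, either swap the order in `evidence` or reorder the columns accordingly.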

Print a summary of the network: its nodes and edges. Also check that the CPDs are consistent with the network. 

In [25]:
student_recoLetter_model.nodes()

['Grade', 'Difficulty', 'SAT', 'Letter', 'Intelligence']

In [26]:
student_recoLetter_model.edges()

[('Grade', 'Letter'),
 ('Difficulty', 'Grade'),
 ('Intelligence', 'Grade'),
 ('Intelligence', 'SAT')]

In [27]:
student_recoLetter_model.check_model()

True
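
Among other things, `check_model()` verifies that every column of each CPD sums to 1. The sketch below reproduces only that one check in plain Python, using the CPD values entered above. The helper name `columns_sum_to_one` and its tolerance are assumptions made for illustration; this is not pgmpy's implementation.

```python
def columns_sum_to_one(values, tol=1e-6):
    """Return True if every column of a row-major CPD table sums to 1."""
    column_sums = [sum(column) for column in zip(*values)]
    return all(abs(total - 1.0) <= tol for total in column_sums)


grade_values = [[0.3, 0.05, 0.9, 0.5],
                [0.4, 0.25, 0.08, 0.3],
                [0.3, 0.7, 0.02, 0.2]]
letter_values = [[0.1, 0.4, 0.99],
                 [0.9, 0.6, 0.01]]

print(columns_sum_to_one(grade_values))
print(columns_sum_to_one(letter_values))

# A deliberately broken table: the second column sums to 0.9.
print(columns_sum_to_one([[0.5, 0.5], [0.5, 0.4]]))
```

The first two calls print `True` and the last prints `False`. Note that a CPD can pass this check and still have its columns in the wrong order, so it does not replace comparing the table against the figure.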

## Questions:

**Qn 1.** Find the marginal distribution for each of the 5 variables

In [36]:
student_recoLetter_model.get_cpds()

[<TabularCPD representing P(Difficulty:2) at 0xa32ec50L>,
 <TabularCPD representing P(Intelligence:2) at 0xa32ecc0L>,
 <TabularCPD representing P(Grade:3 | Difficulty:2, Intelligence:2) at 0xa32ecf8L>,
 <TabularCPD representing P(SAT:2 | Intelligence:2) at 0xa32ed30L>,
 <TabularCPD representing P(Letter:2 | Grade:3) at 0xa32ed68L>]

In [37]:
student_infer = VariableElimination(student_recoLetter_model)
q = student_infer.query(variables=['Difficulty'])
print(q['Difficulty'])

+--------------+-------------------+
| Difficulty   |   phi(Difficulty) |
|--------------+-------------------|
| Difficulty_0 |            0.6000 |
| Difficulty_1 |            0.4000 |
+--------------+-------------------+


In [38]:
q = student_infer.query(variables=['Intelligence'])
print(q['Intelligence'])

+----------------+---------------------+
| Intelligence   |   phi(Intelligence) |
|----------------+---------------------|
| Intelligence_0 |              0.7000 |
| Intelligence_1 |              0.3000 |
+----------------+---------------------+


In [39]:
q = student_infer.query(variables=['SAT'])
print(q['SAT'])

+-------+------------+
| SAT   |   phi(SAT) |
|-------+------------|
| SAT_0 |     0.7250 |
| SAT_1 |     0.2750 |
+-------+------------+


In [40]:
q = student_infer.query(variables=['Grade'])
print(q['Grade'])

+---------+--------------+
| Grade   |   phi(Grade) |
|---------+--------------|
| Grade_0 |       0.4470 |
| Grade_1 |       0.2714 |
| Grade_2 |       0.2816 |
+---------+--------------+


In [41]:
q = student_infer.query(variables=['Letter'])
print(q['Letter'])

+----------+---------------+
| Letter   |   phi(Letter) |
|----------+---------------|
| Letter_0 |        0.4320 |
| Letter_1 |        0.5680 |
+----------+---------------+
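
These marginals can be cross-checked by hand, because each one is a sum over the states of the parents: P(Grade) is the sum over d and i of P(d) P(i) P(Grade | d, i), P(SAT) is the sum over i of P(i) P(SAT | i), and P(Letter) is the sum over g of P(g) P(Letter | g). The sketch below does this arithmetic in plain Python with the CPD values entered above, reading the `cpd_grade` columns with Difficulty varying slowest. It is a check on the numbers, not a replacement for `VariableElimination`.

```python
p_diff = [0.6, 0.4]
p_intel = [0.7, 0.3]
# Rows: Grade_0..Grade_2; columns: (D0, I0), (D0, I1), (D1, I0), (D1, I1).
p_grade_given = [[0.3, 0.05, 0.9, 0.5],
                 [0.4, 0.25, 0.08, 0.3],
                 [0.3, 0.7, 0.02, 0.2]]
# Rows: SAT_0, SAT_1; columns: Intelligence_0, Intelligence_1.
p_sat_given = [[0.95, 0.2],
               [0.05, 0.8]]
# Rows: Letter_0, Letter_1; columns: Grade_0..Grade_2.
p_letter_given = [[0.1, 0.4, 0.99],
                  [0.9, 0.6, 0.01]]

# Probability of each parent combination, in the same column order as above.
parent_weights = [d * i for d in p_diff for i in p_intel]

p_grade = [sum(w * p for w, p in zip(parent_weights, row)) for row in p_grade_given]
p_sat = [sum(w * p for w, p in zip(p_intel, row)) for row in p_sat_given]
p_letter = [sum(w * p for w, p in zip(p_grade, row)) for row in p_letter_given]

print([round(p, 4) for p in p_grade])   # [0.447, 0.2714, 0.2816]
print([round(p, 4) for p in p_sat])     # [0.725, 0.275]
print([round(p, 4) for p in p_letter])  # [0.432, 0.568]
```

The rounded values agree with the tables printed by the queries above.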


**Qn 2.** Is the trail from Difficulty to SAT active given Letter?

In [42]:
student_recoLetter_model.is_active_trail('Difficulty', 'SAT', observed=['Letter'])

True
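
This result follows from the d-separation rules. The only trail between the two nodes is Difficulty -> Grade <- Intelligence -> SAT. Grade is a collider on that trail, and observing its descendant Letter opens the v-structure, while the fork at Intelligence stays open because Intelligence is not observed. The sketch below applies these rules to a single, explicitly given trail in plain Python. The helper names are hypothetical, the trail is assumed to exist in the graph, and this is not how pgmpy implements `is_active_trail`.

```python
# Edges of the student network, copied from the model definition above.
edges = [('Difficulty', 'Grade'), ('Intelligence', 'Grade'),
         ('Intelligence', 'SAT'), ('Grade', 'Letter')]


def descendants(node, edges):
    """Return every node reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def trail_is_active(trail, edges, observed):
    """Apply the d-separation rules to each consecutive triple on one trail."""
    edge_set = set(edges)
    observed = set(observed)
    for left, mid, right in zip(trail, trail[1:], trail[2:]):
        is_collider = (left, mid) in edge_set and (right, mid) in edge_set
        if is_collider:
            # A v-structure is open only if the collider or a descendant is observed.
            if not ({mid} | descendants(mid, edges)) & observed:
                return False
        elif mid in observed:
            # Chains and forks are blocked by observing the middle node.
            return False
    return True


trail = ['Difficulty', 'Grade', 'Intelligence', 'SAT']
print(trail_is_active(trail, edges, observed=['Letter']))
print(trail_is_active(trail, edges, observed=[]))
print(trail_is_active(trail, edges, observed=['Letter', 'Intelligence']))
```

The three calls print `True`, `False`, and `False`: the trail is active given Letter, blocked with no evidence (the collider is closed), and blocked again once Intelligence is also observed.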

**Qn 3.** List all the conditional independencies that are satisfied by the Bayesian network
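
As a reference point for the list printed below, recall the local Markov property of a Bayesian network: each node is independent of its non-descendants given its parents. Each such statement also holds under d-separation, so the local statements make a convenient spot check. The sketch below derives them from the edge list in plain Python; the helper names are hypothetical and the code does not use pgmpy.

```python
edges = [('Difficulty', 'Grade'), ('Intelligence', 'Grade'),
         ('Intelligence', 'SAT'), ('Grade', 'Letter')]
nodes = sorted({node for edge in edges for node in edge})


def parents(node):
    """Return the set of direct parents of `node`."""
    return {p for p, c in edges if c == node}


def descendants(node):
    """Collect all nodes reachable from `node` along directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for p, c in edges:
            if p == current and c not in found:
                found.add(c)
                stack.append(c)
    return found


def local_independencies():
    """Map each node to (independent nodes, conditioning set)."""
    result = {}
    for node in nodes:
        given = parents(node)
        independent = set(nodes) - {node} - descendants(node) - given
        if independent:
            result[node] = (sorted(independent), sorted(given))
    return result


for node, (independent, given) in sorted(local_independencies().items()):
    condition = " | " + ", ".join(given) if given else ""
    print("(%s _|_ %s%s)" % (node, ", ".join(independent), condition))
```

For this network the sketch yields five statements, for example `(Difficulty _|_ Intelligence, SAT)` and `(Grade _|_ SAT | Difficulty, Intelligence)`, both of which appear in the output of `get_independencies()`.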

In [46]:
student_recoLetter_model.get_independencies()

(Grade _|_ SAT | Intelligence)
(Grade _|_ SAT | Difficulty, Intelligence)
(Grade _|_ SAT | Intelligence, Letter)
(Grade _|_ SAT | Difficulty, Letter, Intelligence)
(Difficulty _|_ Intelligence, SAT)
(Difficulty _|_ Letter | Grade)
(Difficulty _|_ SAT | Intelligence)
(Difficulty _|_ Intelligence | SAT)
(Difficulty _|_ Letter, SAT | Grade, Intelligence)
(Difficulty _|_ Letter | Grade, SAT)
(Difficulty _|_ SAT | Intelligence, Letter)
(Difficulty _|_ SAT | Grade, Intelligence, Letter)
(Difficulty _|_ Letter | Grade, Intelligence, SAT)
(SAT _|_ Difficulty)
(SAT _|_ Letter | Grade)
(SAT _|_ Grade, Difficulty, Letter | Intelligence)
(SAT _|_ Letter | Grade, Difficulty)
(SAT _|_ Difficulty, Letter | Grade, Intelligence)
(SAT _|_ Grade, Letter | Difficulty, Intelligence)
(SAT _|_ Grade, Difficulty | Intelligence, Letter)
(SAT _|_ Letter | Grade, Difficulty, Intelligence)
(SAT _|_ Difficulty | Grade, Intelligence, Letter)
(SAT _|_ Grade | Difficulty, Letter, Intelligence)
(Letter _|_ Difficulty,