# Practical 5: Introduction to Bayesian Networks

The aim of this lab is to provide an overview of the main concepts of Bayesian networks through a worked example.

The analysis in this notebook uses the open-source __[pgmpy toolbox](http://pgmpy.org)__.

<hr style="border:1px solid black"> </hr>

### Step 1 Install toolbox

In [None]:
pip install pgmpy==0.1.14


<hr style="border:1px solid black"> </hr>

### Step 2 Defining a Structure with nodes and edges

**Bayesian Networks have 2 main components:**
-  The graph structure (conditional independence assumptions)
-  The numerical probabilities (for each variable given its parents), the Conditional Probability Tables 
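
Together, these two components define the full joint distribution. By the chain rule for Bayesian networks, the joint factorises into one term per variable, each conditioned on that variable's parents in the graph. In generic notation (the symbols $X_i$ and $\mathrm{Pa}(X_i)$ are introduced here for illustration only):

$$P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\big(X_i \mid \mathrm{Pa}(X_i)\big)$$

A node with no parents contributes just its prior $P(X_i)$.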

We start by defining the graph structure for the following network:

![Picture%201.png](attachment:Picture%201.png)

**Question:**
-  Given the above graph, what is the joint probability P(X,D,C,S,P)?

**Start by defining the network structure**

In [None]:
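# BayesianModel is the class name in pgmpy 0.1.14 (the version installed above);
# later pgmpy releases rename it to BayesianNetwork.
from pgmpy.models import BayesianModel

# Each tuple is a directed edge (parent, child).
cancer_model = BayesianModel([('Pollution', 'Cancer'), 
                              ('Smoker', 'Cancer'),
                              ('Cancer', 'Xray'),
                              ('Cancer', 'Dyspnoea')])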

Check the network structure: does it match the graph?

In [None]:
print("Nodes in the model:", cancer_model.nodes())
print("Edges in the model:", cancer_model.edges())

**Now enter the Conditional Probability Tables into the model**

Note: In pgmpy, the `values` table of a CPD has one row for each state of the child variable and one column for each combination of parent (evidence) states, or a single column if the variable has no parents. States are referred to by index (0, 1, ...) in the order they are declared. Columns follow the order of the `evidence` list, with the last evidence variable changing fastest: for `evidence=['Smoker', 'Pollution']` the columns are (Smoker=0, Pollution=0), (Smoker=0, Pollution=1), (Smoker=1, Pollution=0), (Smoker=1, Pollution=1). Each column must sum to 1.

In [None]:
from pgmpy.factors.discrete import TabularCPD

cpd_poll = TabularCPD(variable='Pollution', variable_card=2,
                      values=[[0.9], [0.1]])
cpd_smoke = TabularCPD(variable='Smoker', variable_card=2,
                       values=[[0.3], [0.7]])
cpd_cancer = TabularCPD(variable='Cancer', variable_card=2,
                        values=[[0.03, 0.05, 0.001, 0.02],
                                [0.97, 0.95, 0.999, 0.98]],
                        evidence=['Smoker', 'Pollution'],
                        evidence_card=[2, 2])
cpd_xray = TabularCPD(variable='Xray', variable_card=2,
                      values=[[0.9, 0.2], [0.1, 0.8]],
                      evidence=['Cancer'], evidence_card=[2])
cpd_dysp = TabularCPD(variable='Dyspnoea', variable_card=2,
                      values=[[0.65, 0.3], [0.35, 0.7]],
                      evidence=['Cancer'], evidence_card=[2])

**Note**  The parameters of `TabularCPD` are defined as follows:
- `variable` (int, string (any hashable python object)) – The variable whose CPD is defined.
- `variable_card` (integer) – cardinality (number of states) of the variable
- `values` (2d array, 2d list or 2d tuple) – values of the CPD table
- `evidence` (array-like) – the evidence (parent) variables, if any, with respect to which the CPD is defined
- `evidence_card` (integer, array-like) – cardinalities (numbers of states) of the evidence variables, if any
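
The column order of `values` is easy to get wrong when a node has several parents. In pgmpy's layout, `values` has one row per state of `variable` and one column per combination of evidence states, with the last variable in `evidence` changing fastest. The sketch below uses only the standard library to enumerate that order for `cpd_cancer`; the helper name `evidence_columns` is hypothetical and not part of pgmpy.

```python
from itertools import product

def evidence_columns(evidence, evidence_card):
    """Return the evidence-state combination for each column, left to right."""
    ranges = [range(card) for card in evidence_card]
    # itertools.product varies the last position fastest.
    return [dict(zip(evidence, states)) for states in product(*ranges)]

columns = evidence_columns(['Smoker', 'Pollution'], [2, 2])
cancer_row0 = [0.03, 0.05, 0.001, 0.02]  # first row of cpd_cancer's values

for col, p in zip(columns, cancer_row0):
    print(col, "-> P(Cancer = state 0) =", p)
```

Reading the printed pairs against the network figure is a quick way to confirm that each probability sits under the intended parent combination.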

<hr style="border:1px solid black"> </hr>

### Step 3 Associating Conditional probabilities with the Bayesian Structure

Associate the parameters with the model structure.

In [None]:
cancer_model.add_cpds(cpd_poll, cpd_smoke, cpd_cancer, cpd_xray, cpd_dysp)

Check that the CPDs are valid for the model.

In [None]:
cancer_model.check_model()
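
`check_model()` returns `True` when the model passes pgmpy's consistency checks; among other things, the probabilities in each column of every CPD must sum to 1. A minimal stand-alone sketch of just that one check, using the values entered in Step 2 (the function `columns_sum_to_one` is a hypothetical helper, not a pgmpy call):

```python
import math

def columns_sum_to_one(values, tol=1e-9):
    """True if every column of a row-major CPD table sums to 1 (within tol)."""
    return all(math.isclose(sum(col), 1.0, abs_tol=tol) for col in zip(*values))

# The `values` arguments from Step 2, keyed by variable name.
cpd_values = {
    'Pollution': [[0.9], [0.1]],
    'Smoker': [[0.3], [0.7]],
    'Cancer': [[0.03, 0.05, 0.001, 0.02],
               [0.97, 0.95, 0.999, 0.98]],
    'Xray': [[0.9, 0.2], [0.1, 0.8]],
    'Dyspnoea': [[0.65, 0.3], [0.35, 0.7]],
}

for name, values in cpd_values.items():
    print(name, columns_sum_to_one(values))
```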

Run some simple queries on the model and check that the results match the given network.

In [None]:
cancer_model.is_active_trail('Pollution', 'Smoker')

In [None]:
cancer_model.is_active_trail('Pollution', 'Smoker', observed=['Cancer'])
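
These two calls probe the v-structure Pollution → Cancer ← Smoker: with nothing observed, the trail between the two parents is blocked, while observing their common child `Cancer` activates it. The effect can also be checked numerically. The sketch below is plain Python (no pgmpy) and assumes the CPD values from Step 2, with states referred to by index; `pollution_posterior` is a hypothetical helper.

```python
# Priors and the Cancer CPT from Step 2, indexed by state number.
p_poll = [0.9, 0.1]
p_smoke = [0.3, 0.7]
# p_cancer0[s][p] = P(Cancer = 0 | Smoker = s, Pollution = p)
p_cancer0 = [[0.03, 0.05], [0.001, 0.02]]

def p_cancer(c, s, p):
    return p_cancer0[s][p] if c == 0 else 1.0 - p_cancer0[s][p]

def pollution_posterior(smoker=None, cancer=None):
    """P(Pollution | evidence) by summing the joint over unobserved variables.

    Xray and Dyspnoea are never observed here, so they sum out to 1
    and are left out of the enumeration.
    """
    weights = []
    for p in (0, 1):
        total = 0.0
        for s in (0, 1):
            if smoker is not None and s != smoker:
                continue
            for c in (0, 1):
                if cancer is not None and c != cancer:
                    continue
                total += p_poll[p] * p_smoke[s] * p_cancer(c, s, p)
        weights.append(total)
    norm = sum(weights)
    return [w / norm for w in weights]

print(pollution_posterior())                    # no evidence
print(pollution_posterior(smoker=0))            # Smoker observed only
print(pollution_posterior(cancer=0))            # Cancer observed only
print(pollution_posterior(smoker=0, cancer=0))  # both observed
```

Observing `Smoker` alone leaves the distribution over `Pollution` at its prior, whereas once `Cancer` is observed, additionally observing `Smoker` shifts it.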

In [None]:
print(cancer_model.get_cpds('Smoker'))

In [None]:
print(cancer_model.get_cpds('Pollution'))

In [None]:
print(cancer_model.get_cpds('Cancer'))

In [None]:
print(cancer_model.get_cpds('Xray'))

In [None]:
print(cancer_model.get_cpds('Dyspnoea'))

<hr style="border:1px solid black"> </hr>

### Step 4 Check the local independencies

In [None]:
cancer_model.get_independencies()
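
`get_independencies()` derives its independence statements from the graph structure alone. The local Markov property behind them states that each node is independent of its non-descendants given its parents. A minimal sketch that lists these local statements for the edge list from Step 2, in plain Python (the function names are hypothetical; pgmpy has its own implementation):

```python
edges = [('Pollution', 'Cancer'), ('Smoker', 'Cancer'),
         ('Cancer', 'Xray'), ('Cancer', 'Dyspnoea')]
nodes = sorted({n for edge in edges for n in edge})

def parents(node):
    return {u for u, v in edges if v == node}

def descendants(node):
    """All nodes reachable from `node` by following directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for u, v in edges:
            if u == current and v not in found:
                found.add(v)
                stack.append(v)
    return found

def local_independencies():
    """Map each node to (non-descendants excluding parents, parents)."""
    result = {}
    for node in nodes:
        others = set(nodes) - {node} - descendants(node) - parents(node)
        if others:  # skip nodes with no non-trivial statement
            result[node] = (others, parents(node))
    return result

for node, (others, given) in local_independencies().items():
    print(node, "is independent of", sorted(others), "given", sorted(given))
```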

<hr style="border:1px solid black"> </hr>

### Step 5 Inference with the Bayesian Network

In [None]:
# Doing exact inference using Variable Elimination
from pgmpy.inference import VariableElimination
cancer_infer = VariableElimination(cancer_model)

In [None]:
# Probability of Cancer given Smoker = state 1 (the second row of cpd_smoke).
q = cancer_infer.query(variables=['Cancer'], evidence={'Smoker': 1})
print(q)

In [None]:
# Probability of Cancer given Smoker = state 0 (the first row of cpd_smoke).
q = cancer_infer.query(variables=['Cancer'], evidence={'Smoker': 0})
print(q)

In [None]:
# Probability of Cancer given Smoker = state 1 and Pollution = state 1.
q = cancer_infer.query(variables=['Cancer'], evidence={'Smoker': 1,'Pollution': 1})
print(q)
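
The three queries above can be cross-checked by hand. `Xray` and `Dyspnoea` are unobserved descendants of `Cancer`, so they sum out to 1, and `Pollution` is independent of `Smoker` when `Cancer` is not observed, so it only needs to be weighted by its prior. The sketch below does this in plain Python, assuming the CPD values from Step 2 and the same state indices as in the `evidence` dictionaries; `query_cancer` is a hypothetical helper, not a pgmpy function.

```python
p_poll = [0.9, 0.1]
# p_cancer0[s][p] = P(Cancer = 0 | Smoker = s, Pollution = p)
p_cancer0 = [[0.03, 0.05], [0.001, 0.02]]

def query_cancer(smoker, pollution=None):
    """Return [P(Cancer=0 | evidence), P(Cancer=1 | evidence)]."""
    if pollution is not None:
        # Both parents observed: read the CPT entry directly.
        p0 = p_cancer0[smoker][pollution]
    else:
        # Marginalise Pollution out using its prior.
        p0 = sum(p_poll[p] * p_cancer0[smoker][p] for p in (0, 1))
    return [p0, 1.0 - p0]

print(query_cancer(smoker=1))               # compare with evidence={'Smoker': 1}
print(query_cancer(smoker=0))               # compare with evidence={'Smoker': 0}
print(query_cancer(smoker=1, pollution=1))  # compare with both observed
```

If the values printed by `cancer_infer.query` differ from these, a likely cause is a mismatch in the column order of `cpd_cancer`.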

<hr style="border:1px solid black"> </hr>

## Exercise

Repeat the analysis above for the following network:

![Asia.png](attachment:Asia.png)

In [None]:
<hr style="border:1px solid black"> </hr>