## Bayesian Network

Naive Bayes relies on the assumption that features are conditionally independent given the class. This assumption is usually too strict to hold in practice, and correlation between features limits the performance of Naive Bayes. The Bayesian network is another Bayesian method, one that relaxes the conditional independence assumption.

A Bayesian network consists of a directed acyclic graph (DAG) and a conditional probability table for each node. The DAG is composed of nodes and directed edges: nodes represent feature attributes or random variables, and directed edges represent direct dependencies between variables. An important property of a Bayesian network is that, once the values of a node's parents are given, the node is conditionally independent of its non-descendants, in particular of all its indirect ancestors. This property makes it convenient to compute the joint probability distribution of the variables.

In general, the joint probability distribution of several dependent random variables follows from the chain rule:

$$
P(x_{1}, x_{2}, \cdots , x_{n}) = P(x_{1})P(x_{2}|x_{1})P(x_{3}|x_{1}, x_{2}) \cdots P(x_{n}|x_{1}, x_{2}, \cdots , x_{n-1})
$$

With the property above, each factor only needs to condition on the parents of the node, so the formula simplifies to:
$$
P(x_{1}, x_{2}, \cdots , x_{n}) = \prod_{i=1}^{n} P(x_{i}|Parents(x_{i}))
$$

The following example uses pgmpy to construct a Bayesian network, taking the DAG and probability tables below as an example.

<img src="/Users/imchengliang/Downloads/Code/ML/Bayesian Network/1.png" />

In [11]:
from pgmpy.factors.discrete import TabularCPD
from pgmpy.models import BayesianModel  # newer pgmpy releases rename this class to BayesianNetwork
from pgmpy.estimators import MaximumLikelihoodEstimator, BayesianEstimator

In [4]:
# define the directed edges of the DAG as (parent, child) pairs
student_model = BayesianModel([('D', 'G'), ('I', 'G'), ('G', 'L'), ('I', 'S')])
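
As a worked instance of the simplified formula, the edges above give D and I no parents, G the parents D and I, L the parent G, and S the parent I. Under that reading of the graph, the joint distribution should factorize as:

$$
P(D, I, G, L, S) = P(D)P(I)P(G|D, I)P(L|G)P(S|I)
$$

Assuming the cardinalities used in this example (3 states for G, 2 for every other variable), the full joint table would have $2 \cdot 2 \cdot 3 \cdot 2 \cdot 2 = 48$ entries, i.e. 47 free parameters, while the factorized form needs only $1 + 1 + 8 + 3 + 2 = 15$.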



In [5]:
# define the conditional probability distribution (CPD) of each node
grade_cpd = TabularCPD(
    variable='G',  # node name
    variable_card=3,  # number of states of this node
    # one row per state of G; columns are ordered
    # (I=0, D=0), (I=0, D=1), (I=1, D=0), (I=1, D=1)
    values=[[0.3, 0.05, 0.9, 0.5],
            [0.4, 0.25, 0.08, 0.3],
            [0.3, 0.7, 0.02, 0.2]],
    evidence=['I', 'D'],  # parent nodes
    evidence_card=[2, 2]  # number of states of each parent node
)

difficulty_cpd = TabularCPD(
    variable='D',
    variable_card=2,
    values=[[0.6], [0.4]]
)

intel_cpd = TabularCPD(
    variable='I',
    variable_card=2,
    values=[[0.7], [0.3]]
)

letter_cpd = TabularCPD(
    variable='L',
    variable_card=2,
    values=[[0.1, 0.4, 0.99],
            [0.9, 0.6, 0.01]],
    evidence=['G'],
    evidence_card=[3]
)

sat_cpd = TabularCPD(
    variable='S',
    variable_card=2,
    values=[[0.95, 0.2],
            [0.05, 0.8]],
    evidence=['I'],
    evidence_card=[2]
)
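
The CPDs defined above fully determine the joint distribution through the product formula. The following is a minimal sketch in plain Python, independent of pgmpy, that copies the numbers into dictionaries and evaluates the product for one assignment. The dictionary layout and the helper name `joint` are assumptions made for illustration, not part of pgmpy.

```python
from itertools import product

# CPDs copied from the tables above, stored as plain dictionaries.
P_D = {0: 0.6, 1: 0.4}
P_I = {0: 0.7, 1: 0.3}
# P(G | I, D): key is (i, d), value is [P(G=0), P(G=1), P(G=2)]
P_G = {(0, 0): [0.3, 0.4, 0.3], (0, 1): [0.05, 0.25, 0.7],
       (1, 0): [0.9, 0.08, 0.02], (1, 1): [0.5, 0.3, 0.2]}
# P(L | G): key is g, value is [P(L=0), P(L=1)]
P_L = {0: [0.1, 0.9], 1: [0.4, 0.6], 2: [0.99, 0.01]}
# P(S | I): key is i, value is [P(S=0), P(S=1)]
P_S = {0: [0.95, 0.05], 1: [0.2, 0.8]}


def joint(d, i, g, l, s):
    """Joint probability as the product of each node's CPD given its parents."""
    return P_D[d] * P_I[i] * P_G[(i, d)][g] * P_L[g][l] * P_S[i][s]


# Probability of one full assignment: 0.6 * 0.3 * 0.9 * 0.9 * 0.8
p = joint(d=0, i=1, g=0, l=1, s=1)

# Sanity check: the joint over all 48 assignments should sum to 1.
states = product(range(2), range(2), range(3), range(2), range(2))
total = sum(joint(d, i, g, l, s) for d, i, g, l, s in states)

print(p, total)
```

The sum over all assignments is a quick way to catch a CPD whose columns do not sum to one.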

In [7]:
# attach the CPDs to the model to complete the network
student_model.add_cpds(
    grade_cpd, difficulty_cpd, intel_cpd, letter_cpd, sat_cpd
)

# list the CPDs attached to the model
student_model.get_cpds()

[<TabularCPD representing P(G:3 | I:2, D:2) at 0x7fe91138d3a0>,
 <TabularCPD representing P(D:2) at 0x7fe9105b4fa0>,
 <TabularCPD representing P(I:2) at 0x7fe911108d00>,
 <TabularCPD representing P(L:2 | G:3) at 0x7fe911224fa0>,
 <TabularCPD representing P(S:2 | I:2) at 0x7fe91117c760>]

In [8]:
# list the conditional independencies implied by the DAG
student_model.get_independencies()

(D ⟂ S, I)
(D ⟂ I | S)
(D ⟂ S | I)
(D ⟂ L | G)
(D ⟂ S | L, I)
(D ⟂ L | S, G)
(D ⟂ L, S | I, G)
(D ⟂ S | L, I, G)
(D ⟂ L | S, I, G)
(L ⟂ S | I)
(L ⟂ D, I, S | G)
(L ⟂ S | I, D)
(L ⟂ S, I | D, G)
(L ⟂ S, D | I, G)
(L ⟂ D, I | S, G)
(L ⟂ S | I, D, G)
(L ⟂ I | S, D, G)
(L ⟂ D | S, I, G)
(S ⟂ D)
(S ⟂ L, D, G | I)
(S ⟂ L | G)
(S ⟂ D, G | L, I)
(S ⟂ L, G | I, D)
(S ⟂ L | D, G)
(S ⟂ L, D | I, G)
(S ⟂ G | L, I, D)
(S ⟂ D | L, I, G)
(S ⟂ L | I, D, G)
(I ⟂ D)
(I ⟂ D | S)
(I ⟂ L | G)
(I ⟂ L | S, G)
(I ⟂ L | D, G)
(I ⟂ L | S, D, G)
(G ⟂ S | I)
(G ⟂ S | L, I)
(G ⟂ S | I, D)
(G ⟂ S | L, I, D)
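
Some of the statements listed above can be checked numerically by brute force. The sketch below, in plain Python with the CPD values from this example copied into dictionaries, checks that D and I are marginally independent, that I and L are independent given G, and that D and I become dependent once G is observed (which is why `(D ⟂ I | G)` does not appear in the list). The helpers `joint` and `prob` are hypothetical names introduced for illustration.

```python
from itertools import product

# CPD values copied from the example network.
P_D = {0: 0.6, 1: 0.4}
P_I = {0: 0.7, 1: 0.3}
P_G = {(0, 0): [0.3, 0.4, 0.3], (0, 1): [0.05, 0.25, 0.7],
       (1, 0): [0.9, 0.08, 0.02], (1, 1): [0.5, 0.3, 0.2]}  # key: (i, d)
P_L = {0: [0.1, 0.9], 1: [0.4, 0.6], 2: [0.99, 0.01]}        # key: g
P_S = {0: [0.95, 0.05], 1: [0.2, 0.8]}                       # key: i


def joint(d, i, g, l, s):
    return P_D[d] * P_I[i] * P_G[(i, d)][g] * P_L[g][l] * P_S[i][s]


def prob(**fixed):
    """Probability that the named variables take the given values."""
    total = 0.0
    for d, i, g, l, s in product(range(2), range(2), range(3), range(2), range(2)):
        assignment = {'D': d, 'I': i, 'G': g, 'L': l, 'S': s}
        if all(assignment[k] == v for k, v in fixed.items()):
            total += joint(d, i, g, l, s)
    return total


# (I ⟂ D): P(D, I) equals P(D) P(I)
marg_lhs = prob(D=1, I=1)
marg_rhs = prob(D=1) * prob(I=1)

# (I ⟂ L | G): P(I, L | G) equals P(I | G) P(L | G)
pg = prob(G=0)
cond_lhs = prob(I=1, L=1, G=0) / pg
cond_rhs = (prob(I=1, G=0) / pg) * (prob(L=1, G=0) / pg)

# D and I given G: the two sides differ, so they are not independent given G
dep_lhs = prob(D=1, I=1, G=0) / pg
dep_rhs = (prob(D=1, G=0) / pg) * (prob(I=1, G=0) / pg)

print(marg_lhs, marg_rhs)
print(cond_lhs, cond_rhs)
print(dep_lhs, dep_rhs)
```

Only one value of each variable is tested here; a full check would loop over every combination of states.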

In [9]:
from pgmpy.inference import VariableElimination

student_infer = VariableElimination(student_model)
# query the grade distribution for a good student (I=1) facing an easy exam (D=0)
prob_G = student_infer.query(variables=['G'], evidence={'I': 1, 'D': 0})
print(prob_G)


+------+----------+
| G    |   phi(G) |
+======+==========+
| G(0) |   0.9000 |
+------+----------+
| G(1) |   0.0800 |
+------+----------+
| G(2) |   0.0200 |
+------+----------+
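
The query result above can be reproduced without variable elimination by summing the joint distribution over the unobserved variables and normalizing. The following is a minimal sketch of this inference by enumeration in plain Python. It is exponential in the number of variables and is meant only to show what the query computes; `posterior_G` is a hypothetical helper, not a pgmpy function.

```python
from itertools import product

# CPD values copied from the example network.
P_D = {0: 0.6, 1: 0.4}
P_I = {0: 0.7, 1: 0.3}
P_G = {(0, 0): [0.3, 0.4, 0.3], (0, 1): [0.05, 0.25, 0.7],
       (1, 0): [0.9, 0.08, 0.02], (1, 1): [0.5, 0.3, 0.2]}  # key: (i, d)
P_L = {0: [0.1, 0.9], 1: [0.4, 0.6], 2: [0.99, 0.01]}        # key: g
P_S = {0: [0.95, 0.05], 1: [0.2, 0.8]}                       # key: i


def joint(d, i, g, l, s):
    return P_D[d] * P_I[i] * P_G[(i, d)][g] * P_L[g][l] * P_S[i][s]


def posterior_G(evidence):
    """P(G | evidence) by summing the joint over all unobserved variables."""
    scores = [0.0, 0.0, 0.0]
    for d, i, g, l, s in product(range(2), range(2), range(3), range(2), range(2)):
        assignment = {'D': d, 'I': i, 'G': g, 'L': l, 'S': s}
        # keep only the assignments that agree with the evidence
        if all(assignment[k] == v for k, v in evidence.items()):
            scores[g] += joint(d, i, g, l, s)
    z = sum(scores)  # probability of the evidence, used to normalize
    return [x / z for x in scores]


post = posterior_G({'I': 1, 'D': 0})
print([round(x, 4) for x in post])
```

Because both parents of G are observed in this query, the posterior coincides with the corresponding column of the CPD of G.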





In [10]:
# generate 1000 random samples; every column, including G, only takes the values 0 and 1
import numpy as np
import pandas as pd

raw_data = np.random.randint(low=0, high=2, size=(1000, 5))
data = pd.DataFrame(raw_data, columns=['D', 'I', 'G', 'L', 'S'])
data.head()

|   | D | I | G | L | S |
|---|---|---|---|---|---|
| 0 | 0 | 1 | 1 | 1 | 1 |
| 1 | 0 | 1 | 0 | 1 | 1 |
| 2 | 0 | 0 | 0 | 0 | 0 |
| 3 | 1 | 0 | 1 | 1 | 1 |
| 4 | 1 | 1 | 0 | 0 | 0 |
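
For a discrete node, maximum likelihood estimation of a CPD amounts to counting: the estimate of P(child | parent) is the number of samples with that (parent, child) pair divided by the number of samples with that parent value. The sketch below illustrates this for P(L | G) in plain Python. The toy samples and the helper `mle_cpd` are invented for illustration and are not the data generated above.

```python
from collections import Counter

# Hypothetical toy samples of (G, L) pairs, invented for illustration only.
samples = [(0, 1), (0, 1), (0, 0), (1, 1), (1, 0), (1, 0), (1, 0), (0, 1)]


def mle_cpd(samples):
    """Return {(l, g): estimate of P(L=l | G=g)} from normalized counts."""
    pair_counts = Counter(samples)                   # counts of each (g, l) pair
    parent_counts = Counter(g for g, _ in samples)   # counts of each g
    return {(l, g): pair_counts[(g, l)] / parent_counts[g]
            for g in parent_counts for l in (0, 1)}


cpd = mle_cpd(samples)
for (l, g), p in sorted(cpd.items()):
    print(f"P(L={l} | G={g}) = {p}")
```

Since the data generated above is drawn uniformly and independently for every column, estimates obtained this way should all land close to 0.5.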


In [13]:
# fit the CPDs to the generated data with maximum likelihood estimation (MLE)
student_model.fit(data, estimator=MaximumLikelihoodEstimator) 
for cpd in student_model.get_cpds():
    print("CPD of {variable}:".format(variable=cpd.variable)) 
    print(cpd)

CPD of G:
+------+--------------------+-----+--------------------+
| D    | D(0)               | ... | D(1)               |
+------+--------------------+-----+--------------------+
| I    | I(0)               | ... | I(1)               |
+------+--------------------+-----+--------------------+
| G(0) | 0.4743083003952569 | ... | 0.5103734439834025 |
+------+--------------------+-----+--------------------+
| G(1) | 0.525691699604743  | ... | 0.4896265560165975 |
+------+--------------------+-----+--------------------+
CPD of D:
+------+-------+
| D(0) | 0.521 |
+------+-------+
| D(1) | 0.479 |
+------+-------+
CPD of I:
+------+-------+
| I(0) | 0.491 |
+------+-------+
| I(1) | 0.509 |
+------+-------+
CPD of L:
+------+--------------------+---------------------+
| G    | G(0)               | G(1)                |
+------+--------------------+---------------------+
| L(0) | 0.4896694214876033 | 0.46511627906976744 |
+------+--------------------+---------------------+
| L(1) | 0.510330