We are building a Bayesian network to predict a student's performance from factors such as intelligence, attendance, course difficulty, and grade. We then perform exact inference to compute the probability of good performance.

In [53]:
pip install pgmpy



pgmpy is a Python library for probabilistic graphical models. It is mainly used to work with Bayesian networks, Markov models, and other graphical models.

In [54]:
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

We import the necessary classes from the pgmpy library: BayesianNetwork, TabularCPD, and VariableElimination.

In [55]:
model = BayesianNetwork([('Intelligence','Grade'),
                         ('Difficulty','Grade'),
                         ('Attendence','Performance'),
                         ('Grade','Performance')])

Here, we create a Bayesian network and define its structure as a list of (parent, child) edges. In our case, there are 5 variables: Intelligence, Difficulty, Attendence, Grade, and Performance.

- Intelligence and Difficulty influence Grade.

- Attendence and Grade influence Performance.
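The edge list above can also be read as a map from each node to its parents. The following is a minimal sketch in plain Python (standard library only, not pgmpy) that builds that map and checks that the graph is acyclic. The names edges, parents, and order are illustrative and do not come from the notebook.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Same (parent, child) edges as passed to BayesianNetwork above.
edges = [('Intelligence', 'Grade'),
         ('Difficulty', 'Grade'),
         ('Attendence', 'Performance'),
         ('Grade', 'Performance')]

# Map every node to the set of its parents (root nodes get an empty set).
parents = {}
for parent, child in edges:
    parents.setdefault(parent, set())
    parents.setdefault(child, set()).add(parent)

# static_order() raises CycleError if the graph is not a DAG;
# otherwise every node is listed after all of its parents.
order = list(TopologicalSorter(parents).static_order())

print(sorted(parents['Grade']))
print(sorted(parents['Performance']))
print(order)
```

Intelligence, Difficulty, and Attendence have no parents, so they need only prior distributions. Grade and Performance need conditional tables indexed by their parents.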


In [72]:
cpd_intelligence = TabularCPD(variable='Intelligence', variable_card=2, values=[[0.2], [0.8]])

We define the prior probability for Intelligence. The variable is binary (0 = not intelligent, 1 = intelligent), with P(Intelligence=0) = 0.2 and P(Intelligence=1) = 0.8.

In [57]:
print(cpd_intelligence)

+-----------------+-----+
| Intelligence(0) | 0.2 |
+-----------------+-----+
| Intelligence(1) | 0.8 |
+-----------------+-----+


CPD for Intelligence.

In [73]:
cpd_difficulty = TabularCPD(variable='Difficulty', variable_card=2, values=[[0.8], [0.2]])

We define the prior probability for Difficulty. The variable is binary (0 = not difficult, 1 = difficult), with P(Difficulty=0) = 0.8 and P(Difficulty=1) = 0.2.

In [59]:
print(cpd_difficulty)

+---------------+-----+
| Difficulty(0) | 0.8 |
+---------------+-----+
| Difficulty(1) | 0.2 |
+---------------+-----+


CPD for Difficulty

In [60]:
cpd_attendence = TabularCPD(variable='Attendence', variable_card=2, values=[[0.2], [0.8]])

We define the prior probability for Attendence. The variable is binary (0 = not attended, 1 = attended).

In [61]:
print(cpd_attendence)

+---------------+-----+
| Attendence(0) | 0.2 |
+---------------+-----+
| Attendence(1) | 0.8 |
+---------------+-----+


CPD for Attendence

In [62]:
cpd_grade = TabularCPD(variable='Grade', variable_card=3,
                       values=[[0.8,0.5,0.9,0.6],
                               [0.15,0.3,0.08,0.3],
                               [0.05,0.2,0.02,0.1]],
                       evidence=['Intelligence','Difficulty'], evidence_card=[2,2])

Grade depends on both Intelligence and Difficulty. Each column of values is a distribution over Grade for one combination of parent states, so each column must sum to 1.

- variable_card=3: Grade can take 3 values (A, B, C).
- evidence=['Intelligence', 'Difficulty']: the parent nodes of Grade.
- evidence_card=[2, 2]: Intelligence and Difficulty are both binary, so each has 2 possible values.
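To make the column layout of the Grade table concrete, here is a small sketch in plain Python (no pgmpy) that labels each column with its parent states. It assumes the ordering visible in the printed table for this CPD, where the last evidence variable (Difficulty) changes fastest. The names grade_values and columns are illustrative.

```python
from itertools import product

# The same numbers passed as `values` for Grade: one row per Grade state.
grade_values = [[0.8, 0.5, 0.9, 0.6],
                [0.15, 0.3, 0.08, 0.3],
                [0.05, 0.2, 0.02, 0.1]]
evidence = ['Intelligence', 'Difficulty']
evidence_card = [2, 2]

# product() varies the last variable fastest, giving the column order
# (I=0, D=0), (I=0, D=1), (I=1, D=0), (I=1, D=1).
columns = list(product(*(range(card) for card in evidence_card)))

for col, states in enumerate(columns):
    label = ', '.join(f'{name}={state}' for name, state in zip(evidence, states))
    distribution = [row[col] for row in grade_values]
    print(f'column {col}: {label} -> P(Grade) = {distribution}')
```

Reading the second column, for example, gives the distribution over Grade for a student with Intelligence=0 facing Difficulty=1.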

In [63]:
print(cpd_grade)

+--------------+-----------------+-----------------+-----------------+-----------------+
| Intelligence | Intelligence(0) | Intelligence(0) | Intelligence(1) | Intelligence(1) |
+--------------+-----------------+-----------------+-----------------+-----------------+
| Difficulty   | Difficulty(0)   | Difficulty(1)   | Difficulty(0)   | Difficulty(1)   |
+--------------+-----------------+-----------------+-----------------+-----------------+
| Grade(0)     | 0.8             | 0.5             | 0.9             | 0.6             |
+--------------+-----------------+-----------------+-----------------+-----------------+
| Grade(1)     | 0.15            | 0.3             | 0.08            | 0.3             |
+--------------+-----------------+-----------------+-----------------+-----------------+
| Grade(2)     | 0.05            | 0.2             | 0.02            | 0.1             |
+--------------+-----------------+-----------------+-----------------+-----------------+


CPD for Grade

In [70]:
cpd_performance = TabularCPD(variable='Performance', variable_card=2,
                             values=[[0.1, 0.3, 0.6, 0.2, 0.5, 0.9],  # Performance=False
                                     [0.9, 0.7, 0.4, 0.8, 0.5, 0.1]], # Performance=True
                             evidence=['Attendence', 'Grade'], evidence_card=[2, 3])

Performance depends on Attendence and Grade.

- variable_card=2: Performance is binary (0 = False, 1 = True).
- evidence=['Attendence', 'Grade']: the parent nodes of Performance.
- evidence_card=[2, 3]: Attendence is binary, while Grade has 3 possible values (A, B, C).

In [65]:
print(cpd_performance)

+----------------+---------------+-----+---------------+---------------+
| Attendence     | Attendence(0) | ... | Attendence(1) | Attendence(1) |
+----------------+---------------+-----+---------------+---------------+
| Grade          | Grade(0)      | ... | Grade(1)      | Grade(2)      |
+----------------+---------------+-----+---------------+---------------+
| Performance(0) | 0.1           | ... | 0.5           | 0.9           |
+----------------+---------------+-----+---------------+---------------+
| Performance(1) | 0.9           | ... | 0.5           | 0.1           |
+----------------+---------------+-----+---------------+---------------+


CPD for Performance
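The printed table elides its middle columns with "...". As a rough aid, the sketch below looks up any entry of the Performance table directly from the values list. It is plain Python rather than a pgmpy call, and the helper p_performance and the constant GRADE_CARD are hypothetical names introduced for illustration. It assumes that Attendence, the first evidence variable, changes slowest across columns.

```python
# The same numbers passed as `values` for Performance.
performance_values = [[0.1, 0.3, 0.6, 0.2, 0.5, 0.9],   # Performance=0
                      [0.9, 0.7, 0.4, 0.8, 0.5, 0.1]]   # Performance=1
GRADE_CARD = 3  # number of Grade states


def p_performance(performance, attendence, grade):
    """Return P(Performance=performance | Attendence=attendence, Grade=grade)."""
    # Attendence varies slowest, Grade fastest, so the column index is
    # attendence * (number of Grade states) + grade.
    column = attendence * GRADE_CARD + grade
    return performance_values[performance][column]


# Print all six columns, including the ones hidden behind "...".
for attendence in range(2):
    for grade in range(GRADE_CARD):
        print(f'Attendence={attendence}, Grade={grade}: '
              f'P(0)={p_performance(0, attendence, grade)}, '
              f'P(1)={p_performance(1, attendence, grade)}')
```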

In [75]:
model.add_cpds(cpd_intelligence, cpd_difficulty, cpd_attendence, cpd_grade, cpd_performance)



This line adds all the conditional probability distributions (CPDs) to the Bayesian network model.

In [76]:
assert model.check_model()


This line checks that the model structure and the CPDs are consistent: every node has a CPD whose parents match the graph, and the probabilities in each column of every CPD sum to 1.
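The normalization part of that check can be illustrated by hand. The sketch below is plain Python and is not how pgmpy implements check_model. The helper columns_sum_to_one and its tolerance are assumptions chosen for illustration.

```python
import math

# The `values` arguments of the five CPDs defined above.
cpd_tables = {
    'Intelligence': [[0.2], [0.8]],
    'Difficulty': [[0.8], [0.2]],
    'Attendence': [[0.2], [0.8]],
    'Grade': [[0.8, 0.5, 0.9, 0.6],
              [0.15, 0.3, 0.08, 0.3],
              [0.05, 0.2, 0.02, 0.1]],
    'Performance': [[0.1, 0.3, 0.6, 0.2, 0.5, 0.9],
                    [0.9, 0.7, 0.4, 0.8, 0.5, 0.1]],
}


def columns_sum_to_one(values, tol=1e-8):
    """Return True if every column of the table sums to 1 within tol."""
    n_columns = len(values[0])
    return all(
        math.isclose(sum(row[col] for row in values), 1.0, abs_tol=tol)
        for col in range(n_columns)
    )


for name, values in cpd_tables.items():
    print(name, columns_sum_to_one(values))

# A prior such as 0.2 / 0.7 is not a valid distribution and fails the check.
print(columns_sum_to_one([[0.2], [0.7]]))
```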

In [77]:
infer = VariableElimination(model)

Here, we create an inference object that uses the VariableElimination algorithm. It performs exact inference on the Bayesian network, computing exact probabilities given the observed evidence.
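"Exact" here means the answer equals what summing over the full joint distribution would give. Variable elimination reaches the same numbers with less work by summing variables out one at a time. As a point of comparison, the sketch below computes the marginal P(Performance) with no evidence by brute-force enumeration of all 48 joint states. It is plain Python rather than pgmpy, and the short names p_i, p_d, p_a, p_g, p_p, and joint are illustrative.

```python
from itertools import product

p_i = [0.2, 0.8]                      # P(Intelligence)
p_d = [0.8, 0.2]                      # P(Difficulty)
p_a = [0.2, 0.8]                      # P(Attendence)
p_g = [[0.8, 0.5, 0.9, 0.6],          # P(Grade | Intelligence, Difficulty)
       [0.15, 0.3, 0.08, 0.3],
       [0.05, 0.2, 0.02, 0.1]]
p_p = [[0.1, 0.3, 0.6, 0.2, 0.5, 0.9],  # P(Performance | Attendence, Grade)
       [0.9, 0.7, 0.4, 0.8, 0.5, 0.1]]


def joint(i, d, a, g, p):
    """Joint probability from the factorization implied by the graph."""
    return p_i[i] * p_d[d] * p_a[a] * p_g[g][i * 2 + d] * p_p[p][a * 3 + g]


# Sum the joint over every variable except Performance.
marginal = [0.0, 0.0]
for i, d, a, g, p in product(range(2), range(2), range(2), range(3), range(2)):
    marginal[p] += joint(i, d, a, g, p)

print(marginal)
```

Enumeration grows exponentially with the number of variables, which is why an elimination-based method is preferred on larger networks.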

In [78]:
posterior = infer.query(variables=['Performance'], evidence={'Difficulty': 1, 'Attendence': 0})

This line queries the posterior distribution of the Performance variable given the observed evidence:

- Difficulty=1 (the exam is difficult)

- Attendence=0 (the student did not attend classes)

In [79]:
print(posterior)

+----------------+--------------------+
| Performance    |   phi(Performance) |
+================+====================+
| Performance(0) |             0.2200 |
+----------------+--------------------+
| Performance(1) |             0.7800 |
+----------------+--------------------+


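This line prints the resulting probabilities for Performance. Given a difficult exam and no class attendance, the model assigns probability 0.78 to Performance=1 (good performance) and 0.22 to Performance=0.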
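The printed posterior can be reproduced by hand, which shows what the elimination steps amount to for this query. The sketch below is plain Python, not pgmpy's implementation. It sums out Intelligence first and then Grade, with both evidence values fixed. Because Difficulty and Attendence are root nodes, no final renormalization is needed in this particular case. The names p_grade and posterior are illustrative.

```python
p_i = [0.2, 0.8]                      # P(Intelligence)
p_g = [[0.8, 0.5, 0.9, 0.6],          # P(Grade | Intelligence, Difficulty)
       [0.15, 0.3, 0.08, 0.3],
       [0.05, 0.2, 0.02, 0.1]]
p_p = [[0.1, 0.3, 0.6, 0.2, 0.5, 0.9],  # P(Performance | Attendence, Grade)
       [0.9, 0.7, 0.4, 0.8, 0.5, 0.1]]

difficulty, attendence = 1, 0  # observed evidence from the query

# Step 1: sum out Intelligence to get P(Grade | Difficulty=1).
p_grade = [
    sum(p_i[i] * p_g[g][i * 2 + difficulty] for i in range(2))
    for g in range(3)
]

# Step 2: sum out Grade to get P(Performance | Difficulty=1, Attendence=0).
posterior = [
    sum(p_grade[g] * p_p[p][attendence * 3 + g] for g in range(3))
    for p in range(2)
]

print([round(x, 4) for x in p_grade])    # [0.58, 0.3, 0.12]
print([round(x, 4) for x in posterior])  # [0.22, 0.78]
```

The second line matches the table printed by the query: 0.58 * 0.1 + 0.30 * 0.3 + 0.12 * 0.6 = 0.22 for Performance=0, and the remainder 0.78 for Performance=1.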