In [1]:
!pip install numpy
!pip install pandas
!pip install pgmpy

Defaulting to user installation because normal site-packages is not writeable



[notice] A new release of pip is available: 25.1.1 -> 25.3
[notice] To update, run: C:\Program Files\Python313\python.exe -m pip install --upgrade pip



In [3]:
import numpy as np
import pandas as pd
from pgmpy.models import DiscreteBayesianNetwork as BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# Read the Cleveland Heart Disease dataset
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?', np.nan)

print(f"Few examples from the dataset:\n\n{heartDisease.head()}")
print("\nColumns in dataset:", heartDisease.columns.tolist())

# If the target column is named 'target', rename it for consistency
if 'target' in heartDisease.columns:
    heartDisease.rename(columns={'target': 'heartdisease'}, inplace=True)

# Model Bayesian Network (using renamed column)
model = BayesianModel([
    ('age', 'trestbps'),
    ('age', 'fbs'),
    ('sex', 'trestbps'),
    ('exang', 'trestbps'),
    ('trestbps', 'heartdisease'),
    ('fbs', 'heartdisease'),
    ('heartdisease', 'restecg'),
    ('heartdisease', 'thalach'),
    ('heartdisease', 'chol')
])

print('\nLearning CPD using Maximum likelihood estimators...')
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

# Inferencing
print('\nInferencing with Bayesian Network:')
HeartDisease_infer = VariableElimination(model)

print('\n1. Probability of HeartDisease given Age=38')
q1 = HeartDisease_infer.query(variables=['heartdisease'], evidence={'age': 38})
print(q1)

print('\n2. Probability of HeartDisease given Cholesterol=212')
q2 = HeartDisease_infer.query(variables=['heartdisease'], evidence={'chol': 212})
print(q2)


INFO:pgmpy: Datatype (N=numerical, C=Categorical Unordered, O=Categorical Ordered) inferred from data: 
 {'age': 'N', 'sex': 'N', 'cp': 'N', 'trestbps': 'N', 'chol': 'N', 'fbs': 'N', 'restecg': 'N', 'thalach': 'N', 'exang': 'N', 'oldpeak': 'N', 'slope': 'N', 'ca': 'N', 'thal': 'N', 'heartdisease': 'N'}


Few examples from the dataset:

   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  \
0   63    1   0       145   233    1        2      150      0      2.3      2   
1   67    1   3       160   286    0        2      108      1      1.5      1   
2   67    1   3       120   229    0        2      129      1      2.6      1   
3   37    1   2       130   250    0        0      187      0      3.5      2   
4   41    0   1       130   204    0        2      172      0      1.4      0   

   ca  thal  target  
0   0     2       0  
1   3     1       1  
2   2     3       1  
3   0     1       0  
4   0     1       0  

Columns in dataset: ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target']

Learning CPD using Maximum likelihood estimators...

Inferencing with Bayesian Network:

1. Probability of HeartDisease given Age=38
+-----------------+---------------------+
| heartdisease    |   phi(hea

Purpose of the Practical
The purpose is to build a probabilistic model that represents "cause and effect" (or more accurately, "dependency") between different variables.

Unlike a model that only outputs a single prediction, a Bayesian Network lets you ask "what if?" questions: you fix the value of any variable as evidence and see how the probabilities of the others change.

For example, you're not just predicting if a patient has heart disease. You are building a model to answer questions like:

"What is the probability this patient has heart disease, given that their age is 38?"

"If we find out their cholesterol is 212, how does that change the probability?"

It's a natural way to reason under uncertainty: the model updates its belief as new evidence arrives, much as a doctor revises a diagnosis after each test result.

🧠 Core Theory (How it Works)
A Bayesian Network (also called a Bayes Net) has two parts:

The Structure (The Graph): A Directed Acyclic Graph (DAG) where each node is a variable (like 'age', 'sex', 'heartdisease'). The arrows (edges) show the dependencies.

The arrow (age, trestbps) means "Age has a direct influence on Resting Blood Pressure (trestbps)."

'Age' is the parent and 'trestbps' is the child.

This structure is defined by you (the programmer) based on domain knowledge.
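To make the parent/child idea concrete, here is a minimal sketch that takes the same edge list used in the notebook and works out each node's parents, the root nodes, and whether the graph really is acyclic. It uses only the Python standard library (graphlib), not pgmpy, so the variable names parents, roots and order are just illustrative.

```python
from graphlib import TopologicalSorter

# Edge list copied from the model definition: (parent, child).
edges = [
    ('age', 'trestbps'), ('age', 'fbs'), ('sex', 'trestbps'),
    ('exang', 'trestbps'), ('trestbps', 'heartdisease'),
    ('fbs', 'heartdisease'), ('heartdisease', 'restecg'),
    ('heartdisease', 'thalach'), ('heartdisease', 'chol'),
]

# Map every node to the set of its parents.
parents = {}
for parent, child in edges:
    parents.setdefault(parent, set())
    parents.setdefault(child, set()).add(parent)

# Root nodes are the ones with no parents.
roots = sorted(node for node, ps in parents.items() if not ps)

# static_order() raises graphlib.CycleError if the graph contains a cycle,
# so getting an ordering back confirms the structure is a valid DAG.
order = list(TopologicalSorter(parents).static_order())

print("Roots:", roots)
print("Parents of heartdisease:", sorted(parents['heartdisease']))
```

Every parent appears before its children in order, which is also the order in which the CPTs can be read from cause to effect.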

The Probabilities (The CPTs): Each node has a Conditional Probability Table (CPT).

Root nodes (with no parents, like 'age') have a simple unconditional probability distribution (e.g., P(age)).

Child nodes (with parents, like 'heartdisease') have a conditional probability (e.g., P(heartdisease | trestbps, fbs), because 'trestbps' and 'fbs' are its parents in this model). This table stores the probability of heart disease for every possible combination of its parents' values.
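A CPT estimated by maximum likelihood is just a table of relative frequencies. The sketch below shows that idea for a single parent and child using plain Python. The records are made-up toy values (they are not taken from heart.csv), and the names records and cpt are only for illustration.

```python
from collections import Counter

# Toy records (invented for illustration): (fbs, heartdisease)
records = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (0, 0), (1, 1)]

# Count how often each (parent, child) pair and each parent value occurs.
joint_counts = Counter(records)
parent_counts = Counter(fbs for fbs, _ in records)

# P(heartdisease = hd | fbs) = count(fbs, hd) / count(fbs)
cpt = {
    (fbs, hd): joint_counts[(fbs, hd)] / parent_counts[fbs]
    for fbs in sorted(parent_counts)
    for hd in (0, 1)
}

for (fbs, hd), prob in cpt.items():
    print(f"P(heartdisease={hd} | fbs={fbs}) = {prob:.2f}")
```

For each parent value the probabilities over the child's values sum to 1, which is what makes each row of the table a valid distribution.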

The program's "learning" step (model.fit) estimates all of these CPTs from your dataset. "Inference" (HeartDisease_infer.query) then combines those tables using the rules of probability (marginalization and Bayes' theorem) to compute the probability you asked for.
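The second query in the notebook gives evidence on a child node ('chol') and asks about its parent ('heartdisease'). That direction is where Bayes' theorem comes in. Here is a minimal two-node sketch of the calculation. All the probabilities are invented for illustration, and "high_chol" is a hypothetical yes/no simplification, not a column in the dataset.

```python
# Prior: P(disease)
p_disease = {1: 0.4, 0: 0.6}

# Likelihood: P(high_chol | disease), indexed as [disease][high_chol]
p_chol_given_disease = {
    1: {1: 0.7, 0: 0.3},
    0: {1: 0.2, 0: 0.8},
}

def posterior(high_chol_observed):
    """Return P(disease | high_chol = observed) using Bayes' theorem."""
    # Numerator of Bayes' theorem for each disease state.
    joint = {
        d: p_disease[d] * p_chol_given_disease[d][high_chol_observed]
        for d in p_disease
    }
    # Denominator: total probability of the evidence.
    evidence_prob = sum(joint.values())
    return {d: joint[d] / evidence_prob for d in joint}

result = posterior(1)
print(f"P(disease=1 | high_chol=1) = {result[1]:.2f}")
```

With these toy numbers the belief in disease rises from the prior of 0.40 to about 0.70 once high cholesterol is observed.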

📋 Step-by-Step Code Explanation
Import Libraries:

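numpy and pandas: To load and clean the heart.csv data (numpy supplies np.nan for marking missing values).

BayesianModel: The main class used to create the structure of your network. Here it is an alias for pgmpy's DiscreteBayesianNetwork (see the import line).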

MaximumLikelihoodEstimator: This is the "learning" algorithm. It will look at your data and calculate all the CPTs.

VariableElimination: This is the "inference" algorithm. It's the tool you use to ask questions (.query()).

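Load Data: The code loads heart.csv into a pandas DataFrame, replaces any '?' placeholders with NaN, and renames the 'target' column to 'heartdisease' so it matches the node name used in the model.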
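If your copy of the file does contain '?' placeholders, the cleaning step can be sketched like this. The inline CSV is a made-up four-row sample so the example runs without heart.csv, and using na_values in read_csv is an alternative to the replace call in the notebook, not what the notebook itself does.

```python
import io
import pandas as pd

# Small invented sample standing in for heart.csv.
raw = io.StringIO(
    "age,ca,thal,target\n"
    "63,0,2,0\n"
    "67,?,1,1\n"
    "37,0,?,0\n"
    "41,0,1,0\n"
)

# Treat '?' as missing while reading, so the columns stay numeric.
df = pd.read_csv(raw, na_values='?')

# Match the node name used in the model.
df = df.rename(columns={'target': 'heartdisease'})

n_missing = int(df.isna().sum().sum())
clean = df.dropna().reset_index(drop=True)

print("Missing cells:", n_missing)
print("Rows kept:", len(clean))
```

Dropping incomplete rows is the simplest choice. Whether it is acceptable depends on how many rows you lose.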

Define Model Structure:

model = BayesianModel([...]): This is where you define the graph structure.

The list of tuples, like ('age', 'trestbps') and ('fbs', 'heartdisease'), tells pgmpy which arrows to draw between which nodes. This is the "expert knowledge" part of the practical.

Train the Model (Learning):

model.fit(heartDisease, ...): This line "trains" the model. It uses the MaximumLikelihoodEstimator to go through the heartDisease DataFrame and calculate all the CPTs based on the structure you defined.

Create Inference Engine:

HeartDisease_infer = VariableElimination(model): This line prepares the model to answer questions.
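Variable elimination answers a query by summing out the variables that are neither asked about nor given as evidence. The sketch below does that by hand for a three-node chain (age group -> blood pressure -> heart disease). It is a simplified illustration in plain Python, not pgmpy's implementation, and the categories and probabilities are invented.

```python
# P(bp | age_group)
p_bp_given_age = {
    'young': {'normal': 0.8, 'high': 0.2},
    'old':   {'normal': 0.5, 'high': 0.5},
}

# P(heartdisease | bp)
p_hd_given_bp = {
    'normal': {0: 0.9, 1: 0.1},
    'high':   {0: 0.6, 1: 0.4},
}

def query_heartdisease(age_group):
    """P(heartdisease | age_group), summing out the hidden bp variable."""
    result = {0: 0.0, 1: 0.0}
    for bp, p_bp in p_bp_given_age[age_group].items():
        for hd, p_hd in p_hd_given_bp[bp].items():
            # Each path age -> bp -> hd contributes P(bp | age) * P(hd | bp).
            result[hd] += p_bp * p_hd
    return result

young = query_heartdisease('young')
old = query_heartdisease('old')
print(f"P(hd=1 | young) = {young[1]:.2f}")
print(f"P(hd=1 | old)   = {old[1]:.2f}")
```

The first query in the notebook has the same shape: 'age' is the evidence, and 'trestbps' and 'fbs' sit between it and 'heartdisease', so they have to be summed out.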

Perform Inference (Querying):

HeartDisease_infer.query(variables=['heartdisease'], evidence={'age': 38}): This is the key part. You are asking the model:

variables=['heartdisease']: "What is the probability of 'heartdisease'..."

evidence={'age': 38}: "...given the evidence that the 'age' is 38?"

The code then does the same thing for evidence={'chol': 212}.

The output is the probability table showing the final answer (e.g., 28.59% chance of heart disease for age=38).
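One thing to keep in mind: the model here is fitted on the raw numeric columns, so each distinct value (such as age 38 or cholesterol 212) is likely treated as its own state, and a query is only meaningful for values that actually occur in the data. A common alternative is to group continuous values into ranges before fitting. This sketch shows that with pandas. The bin edges and labels are arbitrary choices for illustration, not something the notebook does.

```python
import pandas as pd

# A few example ages (invented sample values).
ages = pd.Series([29, 38, 45, 54, 63, 77], name='age')

# Bins are right-inclusive by default: (0, 40], (40, 55], (55, 120].
age_group = pd.cut(
    ages,
    bins=[0, 40, 55, 120],
    labels=['<=40', '41-55', '>55'],
)

print(age_group.tolist())
```

With grouped values the evidence would be given as a range label (for example '<=40') instead of an exact number, and each CPT entry is estimated from more rows.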

🛠️ Key Libraries & Functions
pgmpy: The main library for "Probabilistic Graphical Models" in Python.

BayesianModel(list_of_edges): The class used to define the structure (the graph) of your network. In this notebook it is an alias for pgmpy's DiscreteBayesianNetwork.

model.fit(data, estimator=...): The function that "learns" the probabilities (CPTs) from your data.

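MaximumLikelihoodEstimator: The specific method of learning you are using. It is the simplest one: each CPT entry is the relative frequency of that outcome in the data, so combinations that appear only rarely get unreliable estimates.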

VariableElimination(model): The class used to create an "inference engine" that can answer questions.

.query(variables=..., evidence=...): The function you call on the inference engine to ask for a specific probability.