# Install pgmpy into the running kernel's environment (pip reports "Requirement already satisfied" if it is present)
%pip install pgmpy

# Import required libraries
import pandas as pd 
from pgmpy.models import BayesianNetwork 
from pgmpy.estimators import MaximumLikelihoodEstimator 
from pgmpy.inference import VariableElimination 
from sklearn.preprocessing import LabelEncoder 
import numpy as np

# Load the dataset
df = pd.read_csv("heart.csv")
print("Data preview:")
print(df.head())  # Display first few rows of the dataset

# Check for missing values
print("Missing values in each column:")
print(df.isnull().sum())

# Discretize 'age' and 'chol' into three equal-width bins each (pd.cut with bins=3)
df['age'] = pd.cut(df['age'], bins=3, labels=['Young', 'Middle', 'Old'])
df['chol'] = pd.cut(df['chol'], bins=3, labels=['Low', 'Normal', 'High'])

# Display dataset after discretization
print("Data after discretizing 'age' and 'chol':")
print(df.head())

# Define Bayesian Network structure
model = BayesianNetwork([('age', 'target'),
                         ('chol', 'target'),
                         ('cp', 'target'),
                         ('target', 'thalach')])
print("Edges in the model:", model.edges())

# Fit the model using Maximum Likelihood Estimator
model.fit(df, estimator=MaximumLikelihoodEstimator)

# Perform inference
infer = VariableElimination(model)

# Query the model with specified evidence
result1 = infer.query(variables=['target'], evidence={'age': 'Old', 'chol': 'High'})
print("Query result with evidence {'age': 'Old', 'chol': 'High'}:")
print(result1)

result2 = infer.query(variables=['target'], evidence={'age': 'Middle', 'chol': 'Normal', 'cp': 2})
print("Query result with evidence {'age': 'Middle', 'chol': 'Normal', 'cp': 2}:")
print(result2)




Data preview:
   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  \
0   52    1   0       125   212    0        1      168      0      1.0      2   
1   53    1   0       140   203    1        0      155      1      3.1      0   
2   70    1   0       145   174    0        1      125      1      2.6      0   
3   61    1   0       148   203    0        1      161      0      0.0      2   
4   62    0   0       138   294    1        1      106      0      1.9      1   

   ca  thal  target  
0   2     3       0  
1   0     3       0  
2   0     3       0  
3   1     3       0  
4   3     2       0  
Missing values in each column:
age         0
sex         0
cp          0
trestbps    0
chol        0
fbs         0
restecg     0
thalach     0
exang       0
oldpeak     0
slope       0
ca          0
thal        0
target      0
dtype: int64
Data after discretizing 'age' and 'chol':
      age  sex  cp  trestbps    chol  fbs  restecg  thalach  exang  oldpeak  \
0  Mid

Here's a step-by-step summary of what each part of the code is doing:

1. **Install pgmpy**: Installs the `pgmpy` library, which is used for probabilistic graphical models like Bayesian Networks.

2. **Import Libraries**: Imports the required libraries:
   - `pandas` for data manipulation,
   - `pgmpy` for Bayesian Network modeling and inference,
   - `sklearn.preprocessing.LabelEncoder` for label encoding (imported but not used in the final code),
   - `numpy` for numerical operations (also imported but not used).

3. **Load the Dataset**: Reads the `heart.csv` file into a DataFrame (`df`) and previews the first few rows of the dataset to understand its structure.

4. **Check for Missing Values**: Prints the number of missing values in each column to check data completeness before processing.

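5. **Discretize 'age' and 'chol' Columns**:
   - `pd.cut(..., bins=3)` splits the observed range of each column into three equal-width intervals and labels them 'Young', 'Middle', 'Old' (age) and 'Low', 'Normal', 'High' (cholesterol). The labels reflect position within the observed range, not clinical thresholds.
   - This step is needed because the Maximum Likelihood Estimator fits tabular conditional probability distributions over discrete states; left continuous, every distinct value would become its own state.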

6. **Define Bayesian Network Structure**:
   - Creates a Bayesian Network model with specified edges (connections).
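   - The structure encodes the assumption that `age`, `chol`, and `cp` (chest-pain type) directly influence `target`, and that `target` influences `thalach` (maximum heart rate achieved). Because `thalach` is not discretized, each distinct heart-rate value becomes a separate state of that node.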

7. **Display Edges**: Prints out the edges of the Bayesian Network to confirm the model structure.

8. **Train the Model (Fit)**:
   - Fits the Bayesian Network to the dataset with the Maximum Likelihood Estimator, which estimates each node's conditional probability table as the relative frequency of each of its states within every combination of its parents' states.

9. **Set Up Inference (Variable Elimination)**:
   - Creates a `VariableElimination` object for the fitted model; variable elimination performs exact inference by summing out every variable that is neither queried nor observed.
   - This allows querying the model for probability distributions under given conditions.

10. **Query the Model**:
    - Queries the model for the probability distribution of `target` given specific evidence:
      - First query: `{'age': 'Old', 'chol': 'High'}`,
      - Second query: `{'age': 'Middle', 'chol': 'Normal', 'cp': 2}`.
    - Prints the resulting posterior distribution of `target` for each set of evidence. In the first query `cp` is not observed, so it is summed out; in both queries the unobserved `thalach` is summed out as well.
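Step 5 depends on how `pd.cut` chooses its bins. As a minimal sketch, using a hypothetical, deliberately skewed set of ages rather than `heart.csv`, the block below prints the edges `pd.cut(..., bins=3)` picks and how many values land in each bin, alongside `pd.qcut`, an equal-frequency alternative that the notebook itself does not use:

```python
import pandas as pd

# Hypothetical, deliberately skewed ages (not taken from heart.csv)
ages = pd.Series([29, 44, 50, 52, 54, 56, 58, 60, 77])
labels = ["Young", "Middle", "Old"]

# Equal-width bins, as in the notebook; retbins=True also returns the bin edges
width_bins, edges = pd.cut(ages, bins=3, labels=labels, retbins=True)

# Equal-frequency bins: edges placed at the 1/3 and 2/3 quantiles
freq_bins, q_edges = pd.qcut(ages, q=3, labels=labels, retbins=True)

print("pd.cut edges: ", edges.round(2))
print("pd.cut counts: ", width_bins.value_counts(sort=False).to_dict())
print("pd.qcut edges:", q_edges.round(2))
print("pd.qcut counts:", freq_bins.value_counts(sort=False).to_dict())
```

With these values the equal-width edges fall at roughly 29, 45, 61 and 77, so six of the nine ages land in 'Middle', whereas `pd.qcut` puts three in each bin. Which scheme suits the real data is a modeling choice; the notebook uses equal width.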

This sequence builds and trains a Bayesian Network, prepares it for inference, and performs probability queries based on observed values.
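To make the queries less of a black box, here is a pandas-only sketch (no pgmpy) of what the first query computes, assuming the Maximum Likelihood Estimator yields unsmoothed relative frequencies. In this structure `age`, `chol` and `cp` are root nodes, so P(target | age=a, chol=c) = sum over cp of P(cp) * P(target | a, c, cp); `thalach` is an unobserved child of `target` and sums out to 1. The function `posterior_target` and the toy data are hypothetical, and giving unseen parent combinations a uniform distribution is an assumption that may not match every pgmpy version.

```python
import pandas as pd


def posterior_target(df: pd.DataFrame, age, chol) -> pd.Series:
    """P(target | age, chol) for the edges age->target, chol->target,
    cp->target, target->thalach, with CPDs estimated by counting (MLE)."""
    states = sorted(df["target"].unique())
    # MLE of the root node cp: relative frequency of each chest-pain type
    p_cp = df["cp"].value_counts(normalize=True)
    posterior = pd.Series(0.0, index=states)
    for cp_value, p in p_cp.items():
        rows = df[(df["age"] == age) & (df["chol"] == chol) & (df["cp"] == cp_value)]
        if rows.empty:
            # Assumption: an unseen parent combination gets a uniform distribution
            cond = pd.Series(1.0 / len(states), index=states)
        else:
            # MLE of P(target | age, chol, cp): relative frequencies within this combination
            cond = rows["target"].value_counts(normalize=True).reindex(states, fill_value=0.0)
        # Sum out the unobserved cp, weighting each term by P(cp)
        posterior += p * cond
    return posterior


# Hypothetical toy data with the same discretized columns (thalach omitted, it sums out)
toy = pd.DataFrame({
    "age": ["Old", "Old", "Old", "Young"],
    "chol": ["High", "High", "High", "Low"],
    "cp": [0, 0, 1, 1],
    "target": [0, 1, 1, 0],
})

print(posterior_target(toy, "Old", "High"))  # expected: 0.25 for target 0, 0.75 for target 1
```

On the discretized `df` from the notebook, `posterior_target(df, 'Old', 'High')` should match `result1` if those assumptions hold. For the second query, where `cp` is observed, no summation is needed: the answer is the relative frequency of each `target` value among rows with `age` 'Middle', `chol` 'Normal' and `cp` 2.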