In [1]:
import numpy as np
import pandas as pd
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianNetwork
from pgmpy.inference import VariableElimination

In [2]:
df = pd.read_csv('heartdisease.csv')
df.head()

   age  Gender  Family  diet  Lifestyle  cholestrol  heartdisease
0    0       0       1     1          3           0             1
1    0       1       1     1          3           0             1
2    1       0       0     0          2           1             1
3    4       0       1     1          3           2             0
4    3       1       1     0          0           2             0


In [7]:
print('Attributes\tDatatypes')
df.dtypes

Attributes	Datatypes


age             int64
Gender          int64
Family          int64
diet            int64
Lifestyle       int64
cholestrol      int64
heartdisease    int64
dtype: object

In [24]:
model = BayesianNetwork([('age', 'Lifestyle'),
                         ('Gender', 'Lifestyle'),
                         ('Family', 'heartdisease'),
                         ('diet', 'cholestrol'),
                         ('Lifestyle', 'diet'),
                         ('cholestrol', 'heartdisease')])

In [25]:
model.fit(df, estimator=MaximumLikelihoodEstimator)

In [21]:
df_inference = VariableElimination(model)

In [29]:
new_data_instance = {
    'age': 1,
    'Gender': 1,
    'Family': 1,
    'diet': 1,
} # Add or remove parameters as required

# Perform inference for diagnosis
result = df_inference.query(variables=['heartdisease'], evidence=new_data_instance)
print(result)

+-----------------+---------------------+
| heartdisease    |   phi(heartdisease) |
+=================+=====================+
| heartdisease(0) |              0.5241 |
+-----------------+---------------------+
| heartdisease(1) |              0.4759 |
+-----------------+---------------------+


## Explanation
The Bayesian network defined above is a directed graphical model that represents conditional dependencies among a set of random variables. Each node in the graph represents a random variable, and each directed edge indicates that the child variable depends directly on its parent. In this model:

- `age` and `Gender` influence the `Lifestyle` of a person.
- `Family` influences the likelihood of `heartdisease`.
- `diet` influences `cholestrol`.
- `Lifestyle` influences `diet`.
- `cholestrol` influences the likelihood of `heartdisease`.

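The dependencies listed above determine which conditional distribution each node needs. As a minimal sketch in plain Python (no `pgmpy` required), the parent set of every node can be read off the edge list; the `edges`, `parents`, and `nodes` names are illustrative only.

```python
from collections import defaultdict

# Edge list copied from the model definition, with the repeated
# ('diet', 'cholestrol') edge listed once.
edges = [
    ('age', 'Lifestyle'),
    ('Gender', 'Lifestyle'),
    ('Family', 'heartdisease'),
    ('diet', 'cholestrol'),
    ('Lifestyle', 'diet'),
    ('cholestrol', 'heartdisease'),
]

parents = defaultdict(list)  # child -> list of its parents
nodes = []                   # nodes in order of first appearance
for parent, child in edges:
    for node in (parent, child):
        if node not in nodes:
            nodes.append(node)
    parents[child].append(parent)

# Each node contributes one factor: P(node | its parents).
for node in nodes:
    given = ', '.join(parents[node])
    print(f"P({node} | {given})" if given else f"P({node})")
```

The product of the printed factors is the joint distribution that the network encodes.
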
Here's an explanation of how it works:

1. **Model Structure**:
   - The nodes in the network represent random variables, here the integer-coded columns of the dataset. For example, `age` takes one of several category codes, and the distribution of `Lifestyle` depends on a person's `age` and `Gender`.

2. **Conditional Dependencies**:
   - The edges (arrows) between nodes represent direct conditional dependencies. For example, the edges from `age` and `Gender` into `Lifestyle` mean that the distribution of `Lifestyle` is specified conditionally on both parents, that is, as P(`Lifestyle` | `age`, `Gender`).

3. **Learning Parameters**:
   - Before the network can answer queries, the conditional probability distribution of each node given its parents must be learned from the dataset. The structure fixes which parents each distribution conditions on; `model.fit` with `MaximumLikelihoodEstimator` then estimates the probabilities from the relative frequencies in the data. For example, it learns the conditional probability distribution of `cholestrol` given `diet`.

4. **Inference**:
   - After learning the model's parameters, you can perform inference, such as making predictions or diagnoses. You can query the network to find the probability of a specific variable given evidence on other variables.
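
To make the parameter-learning step concrete, here is a minimal sketch of maximum likelihood estimation for a single conditional distribution, P(`cholestrol` | `diet`), computed by counting. The `rows` list is hypothetical toy data, not taken from `heartdisease.csv`, and this is only an illustration of the idea, not how `pgmpy` is implemented internally.

```python
from collections import Counter

# Hypothetical (diet, cholestrol) observations; NOT from heartdisease.csv.
rows = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 2), (1, 2), (1, 2), (0, 1)]

pair_counts = Counter(rows)                      # N(diet = d, cholestrol = c)
diet_counts = Counter(diet for diet, _ in rows)  # N(diet = d)

# Maximum likelihood estimate:
#   P(cholestrol = c | diet = d) = N(d, c) / N(d)
cpd = {
    (d, c): n / diet_counts[d]
    for (d, c), n in pair_counts.items()
}

for (d, c), p in sorted(cpd.items()):
    print(f"P(cholestrol={c} | diet={d}) = {p:.2f}")
```

For each value of `diet`, the estimated probabilities sum to 1, as a conditional distribution must.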

For example, you can use this network to answer questions like:
- Given a person's age, gender, and family history, what is the probability of them having heart disease (`heartdisease`)?
- Given a person's lifestyle and diet, what is the probability distribution over their cholesterol level (`cholestrol`)?
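
A query of this kind can also be worked out by hand. Under the structure above, once `diet` is observed, `heartdisease` no longer depends on `age`, `Gender`, or `Lifestyle`, so P(`heartdisease` | `Family`, `diet`) reduces to a sum over the unobserved `cholestrol`. The sketch below does this by enumeration; every probability in it is a hypothetical value chosen for illustration, not a parameter learned from the data.

```python
# Hypothetical CPD entries (illustration only, not learned from the dataset).
# P(cholestrol = c | diet = 1)
p_chol_given_diet1 = {0: 0.2, 1: 0.5, 2: 0.3}
# P(heartdisease = 1 | Family = 1, cholestrol = c)
p_hd1_given_family1_chol = {0: 0.3, 1: 0.5, 2: 0.8}

# Marginalize out the hidden variable cholestrol:
#   P(hd = 1 | Family = 1, diet = 1)
#     = sum_c P(c | diet = 1) * P(hd = 1 | Family = 1, c)
p_hd1 = sum(
    p_chol_given_diet1[c] * p_hd1_given_family1_chol[c]
    for c in p_chol_given_diet1
)

posterior = {0: 1 - p_hd1, 1: p_hd1}
print(posterior)
```

Variable elimination computes the same kind of sum, but orders the summations so that large networks remain tractable.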

In practice, a library that supports Bayesian networks, such as `pgmpy` in Python, handles all of these steps: you define the structure, learn the conditional probability distributions from data, and then perform inference to make predictions or diagnoses based on the learned model.