# Bayesian Network for Medical Diagnosis

## Introduction
This project develops a Bayesian Network model to infer the probability of various diseases from patient attributes such as Age, Gender, and Blood Type. The model demonstrates the application of Bayesian Networks in healthcare for diagnostic purposes.

### Dataset Overview
The dataset contains patient data including Age, Gender, Blood Type, and Medical Condition. It will be used to train the Bayesian Network for medical diagnosis prediction.

### Objectives
- Construct a Bayesian Network that models the relationship between patient attributes and medical conditions.
- Perform probabilistic inference to predict medical conditions.
- Evaluate the effectiveness of the Bayesian Network in a medical diagnostic context.


In [11]:
# Importing necessary libraries
import pandas as pd
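# BayesianModel was deprecated (and later removed) in pgmpy in favor of BayesianNetwork;
# the alias keeps the rest of the notebook unchanged
from pgmpy.models import BayesianNetwork as BayesianModel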
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# Load the dataset (replace the placeholder with the actual path to healthcare_dataset.csv)
file_path = r"directory to \healthcare_dataset.csv"
data = pd.read_csv(file_path)



In [12]:
# Display the first few rows of the dataset
data.head()


| | Name | Age | Gender | Blood Type | Medical Condition | Date of Admission | Doctor | Hospital | Insurance Provider | Billing Amount | Room Number | Admission Type | Discharge Date | Medication | Test Results |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Tiffany Ramirez | 81 | Female | O- | Diabetes | 2022-11-17 | Patrick Parker | Wallace-Hamilton | Medicare | 37490.983364 | 146 | Elective | 2022-12-01 | Aspirin | Inconclusive |
| 1 | Ruben Burns | 35 | Male | O+ | Asthma | 2023-06-01 | Diane Jackson | Burke, Griffin and Cooper | UnitedHealthcare | 47304.064845 | 404 | Emergency | 2023-06-15 | Lipitor | Normal |
| 2 | Chad Byrd | 61 | Male | B- | Obesity | 2019-01-09 | Paul Baker | Walton LLC | Medicare | 36874.896997 | 292 | Emergency | 2019-02-08 | Lipitor | Normal |
| 3 | Antonio Frederick | 49 | Male | B- | Asthma | 2020-05-02 | Brian Chandler | Garcia Ltd | Medicare | 23303.322092 | 480 | Urgent | 2020-05-03 | Penicillin | Abnormal |
| 4 | Mrs. Brandy Flowers | 51 | Male | O- | Arthritis | 2021-07-09 | Dustin Griffin | Jones, Brown and Murray | UnitedHealthcare | 18086.344184 | 477 | Urgent | 2021-08-02 | Paracetamol | Normal |


## Data Preprocessing

Before constructing the Bayesian Network, it's essential to preprocess the data. This step includes selecting relevant features, encoding categorical variables, and handling missing data.


In [15]:
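# Features used by the network
relevant_columns = ['Age', 'Gender', 'Blood Type', 'Medical Condition']

# Select the features, drop rows with missing values, and copy so the original DataFrame is not modified
data_selected = data[relevant_columns].dropna().copy()

# Encode categorical variables as integer codes (categories are numbered in sorted order)
data_selected['Gender'] = data_selected['Gender'].astype('category').cat.codes
data_selected['Blood Type'] = data_selected['Blood Type'].astype('category').cat.codes

# Display the processed data
data_selected.head()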


| | Age | Gender | Blood Type | Medical Condition |
|---|---|---|---|---|
| 0 | 81 | 0 | 7 | Diabetes |
| 1 | 35 | 1 | 6 | Asthma |
| 2 | 61 | 1 | 5 | Obesity |
| 3 | 49 | 1 | 5 | Asthma |
| 4 | 51 | 1 | 7 | Arthritis |
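`cat.codes` numbers the categories of each column in sorted order, so the codes can be decoded by sorting the distinct labels. The stdlib sketch below uses a hypothetical helper, `category_codes`, and assumes that all eight ABO/Rh blood types occur in the data. The resulting mapping agrees with the rows shown above (O- maps to 7, O+ to 6, B- to 5).

```python
def category_codes(labels):
    """Number distinct labels in sorted order, the same order pandas
    uses for .astype('category').cat.codes on string columns."""
    return {label: code for code, label in enumerate(sorted(set(labels)))}

# The eight ABO/Rh blood types (assumed to all occur in the dataset)
blood_codes = category_codes(["O-", "O+", "B-", "B+", "AB-", "AB+", "A-", "A+"])
gender_codes = category_codes(["Female", "Male"])

print(blood_codes)
# {'A+': 0, 'A-': 1, 'AB+': 2, 'AB-': 3, 'B+': 4, 'B-': 5, 'O+': 6, 'O-': 7}
print(gender_codes)
# {'Female': 0, 'Male': 1}
```

Under this mapping, the evidence `Gender: 1, Blood Type: 2` used in the inference step describes a male patient with blood type AB+. In pandas itself, `data['Blood Type'].astype('category').cat.categories` lists the labels in code order.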


## Constructing the Bayesian Network

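The Bayesian Network models the dependencies between Age, Gender, Blood Type, and Medical Condition. The structure is specified by hand, with each patient attribute as a direct parent of Medical Condition. Only the parameters (the conditional probability distributions) are learned from the data, using maximum likelihood estimation.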
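In this structure, Age, Gender, and Blood Type are root nodes with no edges among them, and each is a direct parent of Medical Condition. The joint distribution therefore factorizes as shown below. Here $A$, $G$, $B$, $M$ are shorthand introduced for Age, Gender, Blood Type, and Medical Condition.

$$
P(A, G, B, M) = P(A)\,P(G)\,P(B)\,P(M \mid A, G, B)
$$

When all three parents are given as evidence, the prior terms cancel. The posterior over $M$ is then exactly one row of the learned CPD:

$$
P(M \mid A=a, G=g, B=b) = \frac{P(a)\,P(g)\,P(b)\,P(M \mid a, g, b)}{\sum_{m'} P(a)\,P(g)\,P(b)\,P(m' \mid a, g, b)} = P(M \mid a, g, b)
$$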


In [16]:
# Constructing the Bayesian Network
model = BayesianModel([
    ('Age', 'Medical Condition'),
    ('Gender', 'Medical Condition'),
    ('Blood Type', 'Medical Condition')
])

# Learning Parameters using Maximum Likelihood Estimator
model.fit(data_selected, estimator=MaximumLikelihoodEstimator)

# Displaying the learned CPDs (Conditional Probability Distributions)
for cpd in model.get_cpds():
    print("CPD of {variable}:".format(variable=cpd.variable))
    print(cpd, "\n")


CPD of Age:
+---------+--------+
| Age(18) | 0.0164 |
+---------+--------+
| Age(19) | 0.0132 |
+---------+--------+
| Age(20) | 0.0169 |
+---------+--------+
| Age(21) | 0.0153 |
+---------+--------+
| Age(22) | 0.0123 |
+---------+--------+
| Age(23) | 0.0155 |
+---------+--------+
| Age(24) | 0.0136 |
+---------+--------+
| Age(25) | 0.0149 |
+---------+--------+
| Age(26) | 0.0153 |
+---------+--------+
| Age(27) | 0.0125 |
+---------+--------+
| Age(28) | 0.0153 |
+---------+--------+
| Age(29) | 0.0162 |
+---------+--------+
| Age(30) | 0.0129 |
+---------+--------+
| Age(31) | 0.0172 |
+---------+--------+
| Age(32) | 0.0139 |
+---------+--------+
| Age(33) | 0.0146 |
+---------+--------+
| Age(34) | 0.0125 |
+---------+--------+
| Age(35) | 0.0169 |
+---------+--------+
| Age(36) | 0.0161 |
+---------+--------+
| Age(37) | 0.0148 |
+---------+--------+
| Age(38) | 0.0159 |
+---------+--------+
| Age(39) | 0.0147 |
+---------+--------+
| Age(40) | 0.0138 |
+---------+--------+
... (remaining output truncated)
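Each CPD printed above is a table of relative frequencies. For Medical Condition, the estimate for a given parent configuration is the fraction of matching training rows that have each condition: $N(\text{configuration}, \text{condition}) / N(\text{configuration})$. The stdlib sketch below illustrates that estimate. It uses a hypothetical helper, `mle_cpd`, and made-up rows, and it mirrors the idea behind `MaximumLikelihoodEstimator`, not pgmpy's actual implementation.

```python
from collections import Counter

def mle_cpd(rows, child, parents):
    """Maximum-likelihood CPD as relative frequencies:
    P(child = v | parents = c) = N(c, v) / N(c).
    Parent configurations never seen in `rows` get no entry."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    parent_totals = Counter(tuple(r[p] for p in parents) for r in rows)
    return {
        (config, value): n / parent_totals[config]
        for (config, value), n in joint.items()
    }

# Hypothetical toy rows (encoded as in data_selected), not taken from the dataset
rows = [
    {"Age": 40, "Gender": 1, "Blood Type": 2, "Medical Condition": "Asthma"},
    {"Age": 40, "Gender": 1, "Blood Type": 2, "Medical Condition": "Asthma"},
    {"Age": 40, "Gender": 1, "Blood Type": 2, "Medical Condition": "Cancer"},
    {"Age": 35, "Gender": 0, "Blood Type": 6, "Medical Condition": "Diabetes"},
]

cpd = mle_cpd(rows, "Medical Condition", ["Age", "Gender", "Blood Type"])
print(cpd[((40, 1, 2), "Asthma")])  # 0.6666666666666666
```

Age alone has more than twenty distinct values in the CPD above. Crossed with 2 genders and 8 blood types, that gives a large number of parent configurations, and each one is estimated only from the rows that match it exactly. As a result, a single CPD row can rest on very few patients.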

## Inference

Using the trained Bayesian Network, we can perform inference to predict the probability of different medical conditions given certain patient attributes.


In [17]:
# Setting up the inference
infer = VariableElimination(model)

# Example: predict the medical condition for a specific patient
# Evidence uses the encoded values: Gender 1 = Male, Blood Type 2 = AB+ (cat.codes numbers categories in sorted order)
evidence = {'Age': 40, 'Gender': 1, 'Blood Type': 2}
prediction = infer.query(variables=['Medical Condition'], evidence=evidence)

# Display the prediction
print(prediction)



+---------------------------------+--------------------------+
| Medical Condition               |   phi(Medical Condition) |
+=================================+==========================+
| Medical Condition(Arthritis)    |                   0.2222 |
+---------------------------------+--------------------------+
| Medical Condition(Asthma)       |                   0.2222 |
+---------------------------------+--------------------------+
| Medical Condition(Cancer)       |                   0.2222 |
+---------------------------------+--------------------------+
| Medical Condition(Diabetes)     |                   0.1111 |
+---------------------------------+--------------------------+
| Medical Condition(Hypertension) |                   0.0000 |
+---------------------------------+--------------------------+
| Medical Condition(Obesity)      |                   0.2222 |
+---------------------------------+--------------------------+





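- Equal probabilities: Arthritis, Asthma, Cancer, and Obesity each receive 22.22%. Because all three parents of Medical Condition are observed, the query returns the maximum-likelihood CPD row P(Medical Condition | Age=40, Gender=Male, Blood Type=AB+). That row is the relative frequency of each condition among training patients with exactly these attributes. The values are consistent with multiples of 1/9, which points to very few matching patients. The tie therefore reflects small counts rather than genuinely equal risk.
- Lower probability for Diabetes: Diabetes receives 11.11%, half the probability of the four leading conditions.
- Hypertension at 0.00%: no matching training patient had Hypertension. Under maximum likelihood, a condition never observed for a parent configuration gets probability zero. This is a sparse-data artifact, not evidence that Hypertension is impossible for such a patient.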
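A common remedy for such zero probabilities is additive (Laplace) smoothing, which adds a pseudo-count α to every state before normalizing. In pgmpy, the analogous option is fitting with `BayesianEstimator` and a Dirichlet or BDeu prior instead of `MaximumLikelihoodEstimator`. The stdlib sketch below is not part of the original analysis, and the helper `smoothed_cpd_row` is hypothetical. The counts are also hypothetical: they were chosen only to be consistent with the printed probabilities (2/9, 1/9, 0), because the notebook does not show the real counts.

```python
def smoothed_cpd_row(counts, states, alpha=1.0):
    """Additive (Laplace) smoothing of one CPD row.

    counts: mapping state -> number of matching training rows (missing = 0)
    states: every possible state of the child variable
    alpha:  pseudo-count added to each state (alpha=0 gives plain MLE,
            which requires at least one matching row)
    """
    total = sum(counts.get(s, 0) for s in states) + alpha * len(states)
    return {s: (counts.get(s, 0) + alpha) / total for s in states}

states = ["Arthritis", "Asthma", "Cancer", "Diabetes", "Hypertension", "Obesity"]
# Hypothetical counts consistent with the printed 2/9, 1/9 and 0 probabilities
counts = {"Arthritis": 2, "Asthma": 2, "Cancer": 2, "Diabetes": 1, "Obesity": 2}

mle = smoothed_cpd_row(counts, states, alpha=0.0)
laplace = smoothed_cpd_row(counts, states, alpha=1.0)
for s in states:
    print(f"{s:<13} MLE={mle[s]:.4f}  Laplace={laplace[s]:.4f}")
```

With α = 1, Hypertension moves from 0 to 1/15 (about 0.0667), and the other conditions shrink toward a uniform distribution. A larger α smooths more aggressively, which helps sparse rows but blurs real differences in well-populated ones.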

## Conclusion

This project illustrates the use of Bayesian Networks in medical diagnosis prediction. The model shows how patient attributes are associated with medical conditions in the training data, showcasing the potential of probabilistic graphical models in healthcare.
