## Heart Disease Prediction Using Bayes Theorem

In [4]:
import pandas as pd
import numpy as np

In [6]:
data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/processed.cleveland.data"
columns = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 
           'exang', 'oldpeak', 'slope', 'ca', 'thal', 'target']

In [8]:
data = pd.read_csv(data_url, names=columns, na_values="?")
data = data.dropna()

In [10]:
data.head()

Unnamed: 0,age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target
0,63.0,1.0,1.0,145.0,233.0,1.0,2.0,150.0,0.0,2.3,3.0,0.0,6.0,0
1,67.0,1.0,4.0,160.0,286.0,0.0,2.0,108.0,1.0,1.5,2.0,3.0,3.0,2
2,67.0,1.0,4.0,120.0,229.0,0.0,2.0,129.0,1.0,2.6,2.0,2.0,7.0,1
3,37.0,1.0,3.0,130.0,250.0,0.0,0.0,187.0,0.0,3.5,3.0,0.0,3.0,0
4,41.0,0.0,2.0,130.0,204.0,0.0,2.0,172.0,0.0,1.4,1.0,0.0,3.0,0


In [12]:
data = data[['age','cp', 'trestbps', 'chol', 'target']]

Bayes' Theorem for Multiple Evidence Variables:

$$
P(H|E) = \frac{P(H) \cdot P(E_1|H) \cdot P(E_2|H) \cdots P(E_n|H)}{P(E)}
$$

Where:
- \( H \) is the hypothesis (e.g., the patient has heart disease).
- \( E_1, E_2, \ldots, E_n \) are the individual pieces of evidence (features).

Marginal Probability:

$$
P(E) = P(E|H) \cdot P(H) + P(E|¬H) \cdot P(¬H)
$$

For a new patient with observed features \( E_1, E_2, E_3 \):

$$
P(H|E) \propto P(H) \cdot P(E_1|H) \cdot P(E_2|H) \cdots P(E_n|H)
$$

And for the alternative hypothesis:

$$
P(¬H|E) = P(¬H) \cdot P(E_1|¬H) \cdot P(E_2|¬H) \cdots P(E_n|¬H)
$$

In [14]:
def calculate_conditional_probabilities(data, feature, value, target_class):
    subset = data[data[feature] == value]
    target_subset = subset[subset['target'] == target_class]
    return len(target_subset) / len(subset) if len(subset) > 0 else 0

In [16]:
def calculate_prior(data, target_class):
    return len(data[data['target'] == target_class]) / len(data)

In [18]:
def predict_heart_disease(cp, trestbps, chol):
    
    prior_disease = calculate_prior(data, 1)
    prior_no_disease = calculate_prior(data, 0)

    # Calculate likelihood for target = 1
    prob_cp_given_disease = calculate_conditional_probabilities(data, 'cp', cp, 1)
    prob_trestbps_given_disease = calculate_conditional_probabilities(data, 'trestbps', trestbps, 1)
    prob_chol_given_disease = calculate_conditional_probabilities(data, 'chol', chol, 1)
    likelihood_disease = prob_cp_given_disease * prob_trestbps_given_disease * prob_chol_given_disease

    # Calculate likelihood for target = 0
    prob_cp_given_no_disease = calculate_conditional_probabilities(data, 'cp', cp, 0)
    prob_trestbps_given_no_disease = calculate_conditional_probabilities(data, 'trestbps', trestbps, 0)
    prob_chol_given_no_disease = calculate_conditional_probabilities(data, 'chol', chol, 0)
    likelihood_no_disease = prob_cp_given_no_disease * prob_trestbps_given_no_disease * prob_chol_given_no_disease

    # Posterior probabilities
    posterior_disease = likelihood_disease * prior_disease
    posterior_no_disease = likelihood_no_disease * prior_no_disease

    
    if posterior_disease > posterior_no_disease:
        return 1  
    else:
        return 0  

In [20]:
test_cases = [
    {'cp': 2, 'trestbps': 140, 'chol': 230},
    {'cp': 1, 'trestbps': 120, 'chol': 200},
    {'cp': 3, 'trestbps': 150, 'chol': 260},
    {'cp': 2, 'trestbps': 130, 'chol': 220},
    {'cp': 1, 'trestbps': 160, 'chol': 300}
]

In [22]:
for i, test in enumerate(test_cases, start=1):
    prediction = predict_heart_disease(test['cp'], test['trestbps'], test['chol'])
    print(f"Test Case {i}: cp={test['cp']}, trestbps={test['trestbps']}, chol={test['chol']} -> Prediction: {prediction}")

Test Case 1: cp=2, trestbps=140, chol=230 -> Prediction: 1
Test Case 2: cp=1, trestbps=120, chol=200 -> Prediction: 0
Test Case 3: cp=3, trestbps=150, chol=260 -> Prediction: 0
Test Case 4: cp=2, trestbps=130, chol=220 -> Prediction: 0
Test Case 5: cp=1, trestbps=160, chol=300 -> Prediction: 1
