<a href="https://colab.research.google.com/github/Dinesh18092006/dinesh_project1/blob/main/Disease_prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
"""
===========================================
DISEASE PREDICTION USING BAYES' THEOREM
Foundations of AI - Mini Project
===========================================

OBJECTIVE:
To predict the probability of a person having a disease based on observed symptoms
using Bayes' Theorem and Naive Bayes Classifier.

THEORY - BAYES' THEOREM:
P(Disease|Symptoms) = P(Symptoms|Disease) * P(Disease) / P(Symptoms)

Where:
- P(Disease|Symptoms) = Probability of having disease given the symptoms (POSTERIOR)
- P(Symptoms|Disease) = Probability of symptoms given the disease (LIKELIHOOD)
- P(Disease) = Prior probability of disease (PRIOR)
- P(Symptoms) = Probability of symptoms occurring (EVIDENCE)
"""

import pandas as pd
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import warnings
warnings.filterwarnings('ignore')

print("=" * 60)
print("DISEASE PREDICTION SYSTEM USING BAYES' THEOREM")
print("=" * 60)

# ============================================================================
# PART 1: MANUAL IMPLEMENTATION OF BAYES' THEOREM
# ============================================================================

print("\n" + "=" * 60)
print("PART 1: MANUAL BAYES' THEOREM CALCULATION")
print("=" * 60)

# Create a sample dataset manually
# Disease: 0 = No Disease, 1 = Has Disease
# Symptoms (0 = absent, 1 = present): Fever, Cough, Fatigue, Breathing_Difficulty

data = {
    'Fever': [1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0,
              1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1],
    'Cough': [1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0,
              1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1],
    'Fatigue': [1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1,
                1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1],
    'Breathing_Difficulty': [1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0,
                             1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1],
    'Disease': [1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0,
                1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
}

df = pd.DataFrame(data)

print("\nDataset Sample (First 10 rows):")
print(df.head(10))
print(f"\nTotal Records: {len(df)}")
print(f"Patients with Disease: {df['Disease'].sum()}")
print(f"Patients without Disease: {len(df) - df['Disease'].sum()}")

# Manual Bayes' Theorem Calculation
def manual_bayes_prediction(df, symptoms):
    """
    Manually calculate probability using Bayes' Theorem

    Parameters:
    df: DataFrame with training data
    symptoms: dict with symptom values {symptom_name: value}
    """
    print("\n" + "-" * 60)
    print("MANUAL BAYES' THEOREM CALCULATION")
    print("-" * 60)

    # Calculate Prior Probabilities
    total_patients = len(df)
    disease_patients = df['Disease'].sum()
    no_disease_patients = total_patients - disease_patients

    # P(Disease = 1)
    prior_disease = disease_patients / total_patients
    # P(Disease = 0)
    prior_no_disease = no_disease_patients / total_patients

    print(f"\nPRIOR PROBABILITIES:")
    print(f"P(Disease) = {prior_disease:.4f}")
    print(f"P(No Disease) = {prior_no_disease:.4f}")

    # Calculate Likelihoods for each symptom
    print(f"\nLIKELIHOODS:")

    # Add-one (Laplace) smoothing keeps every likelihood above zero. In this
    # dataset Fever and Breathing_Difficulty match Disease exactly, so raw
    # frequencies can make both class likelihoods 0 and the normalization 0/0.

    # For Disease = 1
    likelihood_disease = 1.0
    disease_data = df[df['Disease'] == 1]

    print("\nGiven Disease:")
    for symptom, value in symptoms.items():
        # P(Symptom|Disease), smoothed over the two possible values (0 and 1)
        prob = ((disease_data[symptom] == value).sum() + 1) / (len(disease_data) + 2)
        likelihood_disease *= prob
        print(f"  P({symptom}={value}|Disease) = {prob:.4f}")

    print(f"\nCombined Likelihood P(Symptoms|Disease) = {likelihood_disease:.4f}")

    # For Disease = 0
    likelihood_no_disease = 1.0
    no_disease_data = df[df['Disease'] == 0]

    print("\nGiven No Disease:")
    for symptom, value in symptoms.items():
        # P(Symptom|No Disease), smoothed the same way
        prob = ((no_disease_data[symptom] == value).sum() + 1) / (len(no_disease_data) + 2)
        likelihood_no_disease *= prob
        print(f"  P({symptom}={value}|No Disease) = {prob:.4f}")

    print(f"\nCombined Likelihood P(Symptoms|No Disease) = {likelihood_no_disease:.4f}")

    # Calculate Posterior Probabilities (unnormalized)
    posterior_disease = likelihood_disease * prior_disease
    posterior_no_disease = likelihood_no_disease * prior_no_disease

    # Normalize
    total_posterior = posterior_disease + posterior_no_disease

    posterior_disease_normalized = posterior_disease / total_posterior
    posterior_no_disease_normalized = posterior_no_disease / total_posterior

    print(f"\n" + "=" * 60)
    print("BAYES' THEOREM RESULT:")
    print("=" * 60)
    print(f"P(Disease|Symptoms) = {posterior_disease_normalized:.4f} ({posterior_disease_normalized*100:.2f}%)")
    print(f"P(No Disease|Symptoms) = {posterior_no_disease_normalized:.4f} ({posterior_no_disease_normalized*100:.2f}%)")

    if posterior_disease_normalized > 0.5:
        print(f"\n✓ PREDICTION: Patient LIKELY HAS the disease")
    else:
        print(f"\n✗ PREDICTION: Patient LIKELY DOES NOT HAVE the disease")

    return posterior_disease_normalized

# Test case: patient with cough, fatigue, and breathing difficulty, but no fever
test_symptoms = {
    'Fever': 0,
    'Cough': 1,
    'Fatigue': 1,
    'Breathing_Difficulty': 1
}

print(f"\nTEST PATIENT SYMPTOMS:")
for symptom, value in test_symptoms.items():
    print(f"  {symptom}: {'Yes' if value == 1 else 'No'}")

manual_prob = manual_bayes_prediction(df, test_symptoms)





DISEASE PREDICTION SYSTEM USING BAYES' THEOREM

PART 1: MANUAL BAYES' THEOREM CALCULATION

Dataset Sample (First 10 rows):
    Fever  Cough  Fatigue  Breathing_Difficulty  Disease
0       1      1        1                     1        1
1       1      1        1                     1        1
2       0      0        1                     0        0
3       1      1        0                     1        1
4       0      0        1                     0        0
5       1      1        1                     1        1
6       1      0        1                     1        1
7       0      0        0                     0        0
8       1      1        1                     1        1
9       1      1        1                     1        1

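In this dataset the `Fever` and `Breathing_Difficulty` columns match `Disease` row for row, so raw frequency estimates give P(Fever=0|Disease) = 0 and P(Breathing_Difficulty=1|No Disease) = 0. For a patient with no fever but breathing difficulty, both class likelihoods then collapse to zero and the normalization becomes 0/0. The sketch below shows one common remedy, add-one (Laplace) smoothing, in plain Python. The function name `smoothed_posterior`, the `alpha` parameter, and the six-row toy table are illustrative assumptions, not part of the notebook.

```python
def smoothed_posterior(rows, labels, patient, alpha=1.0):
    """Return P(label=1 | patient) for binary features with add-alpha smoothing."""
    scores = {}
    for cls in (0, 1):
        cls_rows = [r for r, y in zip(rows, labels) if y == cls]
        # Prior: share of training rows that belong to this class
        score = len(cls_rows) / len(rows)
        for j, value in enumerate(patient):
            matches = sum(1 for r in cls_rows if r[j] == value)
            # Each feature has two possible values (0 or 1), hence 2 * alpha
            score *= (matches + alpha) / (len(cls_rows) + 2 * alpha)
        scores[cls] = score
    # Normalize the two unnormalized class scores
    return scores[1] / (scores[0] + scores[1])

# Toy table (hypothetical): columns are [fever, cough]; fever equals the label
rows = [[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]]
labels = [1, 1, 1, 0, 0, 0]

# Fever=0 never occurs with label 1, yet the smoothed posterior stays above zero
p = smoothed_posterior(rows, labels, patient=[0, 1])
print(f"P(Disease | fever=0, cough=1) = {p:.4f}")
```

With `alpha=0` the same function falls back to raw frequencies, and the unseen combination drives the disease score to exactly zero.
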
In [None]:
# ============================================================================
# PART 2: NAIVE BAYES CLASSIFIER (SKLEARN)
# ============================================================================

print("\n\n" + "=" * 60)
print("PART 2: SKLEARN NAIVE BAYES CLASSIFIER")
print("=" * 60)

# Prepare data for sklearn
X = df[['Fever', 'Cough', 'Fatigue', 'Breathing_Difficulty']]
y = df['Disease']

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

print(f"\nTraining Set Size: {len(X_train)}")
print(f"Testing Set Size: {len(X_test)}")

# Create and train Naive Bayes model
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)

print("\n✓ Model Training Complete!")

# Make predictions on test set
y_pred = nb_model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"\nModel Accuracy on Test Set: {accuracy*100:.2f}%")

# Confusion Matrix
print("\nConfusion Matrix:")
cm = confusion_matrix(y_test, y_pred)
print(cm)
print("\n  [[True Negatives  False Positives]")
print("   [False Negatives True Positives]]")

# Classification Report
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=['No Disease', 'Disease']))

# Predict for our test patient (a DataFrame keeps the feature names used in training)
test_patient = pd.DataFrame([test_symptoms], columns=X.columns)

prediction = nb_model.predict(test_patient)
probability = nb_model.predict_proba(test_patient)

print("\n" + "=" * 60)
print("SKLEARN MODEL PREDICTION FOR TEST PATIENT:")
print("=" * 60)
print(f"Prediction: {'Has Disease' if prediction[0] == 1 else 'No Disease'}")
print(f"Probability of No Disease: {probability[0][0]:.4f} ({probability[0][0]*100:.2f}%)")
print(f"Probability of Disease: {probability[0][1]:.4f} ({probability[0][1]*100:.2f}%)")




PART 2: SKLEARN NAIVE BAYES CLASSIFIER

Training Set Size: 28
Testing Set Size: 12

✓ Model Training Complete!

Model Accuracy on Test Set: 100.00%

Confusion Matrix:
[[4 0]
 [0 8]]

  [[True Negatives  False Positives]
   [False Negatives True Positives]]

Classification Report:
              precision    recall  f1-score   support

  No Disease       1.00      1.00      1.00         4
     Disease       1.00      1.00      1.00         8

    accuracy                           1.00        12
   macro avg       1.00      1.00      1.00        12
weighted avg       1.00      1.00      1.00        12


SKLEARN MODEL PREDICTION FOR TEST PATIENT:
Prediction: Has Disease
Probability of No Disease: 0.0509 (5.09%)
Probability of Disease: 0.9491 (94.91%)
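
The figures in the classification report follow directly from the confusion matrix. A short sketch of that arithmetic in plain Python, using the matrix printed above; the helper name `report_from_confusion` is an assumption for illustration.

```python
def report_from_confusion(cm):
    """Compute accuracy, precision, recall and F1 for the positive class.

    cm uses the layout [[TN, FP], [FN, TP]].
    """
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

metrics = report_from_confusion([[4, 0], [0, 8]])
print(metrics)
```

With only 12 test records, a single misclassification would move accuracy by roughly 8.3 percentage points, so the score is best read alongside the test-set size.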


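Gaussian Naive Bayes scores each class by multiplying a normal density per feature rather than counting 0/1 frequencies. The sketch below illustrates that idea in plain Python under stated assumptions: it is not scikit-learn's code, the `eps` variance floor only loosely stands in for the library's `var_smoothing` parameter, and the toy data is hypothetical.

```python
import math

def gaussian_nb_posterior(rows, labels, x, eps=1e-9):
    """Return {class: P(class | x)} using per-class Gaussian likelihoods."""
    log_scores = {}
    for cls in sorted(set(labels)):
        cls_rows = [r for r, y in zip(rows, labels) if y == cls]
        # Start from the log prior
        log_score = math.log(len(cls_rows) / len(rows))
        for j, xj in enumerate(x):
            values = [r[j] for r in cls_rows]
            mean = sum(values) / len(values)
            # eps keeps the variance positive when a feature is constant in a class
            var = sum((v - mean) ** 2 for v in values) / len(values) + eps
            log_score += (-0.5 * math.log(2 * math.pi * var)
                          - (xj - mean) ** 2 / (2 * var))
        log_scores[cls] = log_score
    # Normalize in log space for numerical stability
    top = max(log_scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {c: s / total for c, s in exp_scores.items()}

# Toy table (hypothetical): columns are [fever, cough]; fever equals the label
rows = [[1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0]]
labels = [1, 1, 1, 0, 0, 0]
posterior = gaussian_nb_posterior(rows, labels, x=[1, 1], eps=0.01)
print(posterior)
```

Because the features here are binary, a Bernoulli-style model that counts frequencies is arguably the more natural fit; the Gaussian variant is shown only to mirror the classifier used in Part 2.
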
In [None]:
# ============================================================================
# PART 3: INTERACTIVE PREDICTION
# ============================================================================

print("\n\n" + "=" * 60)
print("PART 3: INTERACTIVE DISEASE PREDICTION")
print("=" * 60)

def predict_disease_interactive():
    """Interactive function to predict disease for custom symptoms"""
    print("\nEnter patient symptoms (1 for Yes, 0 for No):")

    try:
        fever = int(input("Fever (0/1): "))
        cough = int(input("Cough (0/1): "))
        fatigue = int(input("Fatigue (0/1): "))
        breathing = int(input("Difficulty Breathing (0/1): "))

        if all(s in [0, 1] for s in [fever, cough, fatigue, breathing]):
            # A DataFrame keeps the feature names the model was trained with
            patient_data = pd.DataFrame([[fever, cough, fatigue, breathing]], columns=X.columns)
            prediction = nb_model.predict(patient_data)
            probability = nb_model.predict_proba(patient_data)

            print("\n" + "-" * 60)
            print("PREDICTION RESULT:")
            print("-" * 60)
            print(f"Disease Status: {'POSITIVE' if prediction[0] == 1 else 'NEGATIVE'}")
            print(f"Confidence: {max(probability[0])*100:.2f}%")
            print(f"Probability of Disease: {probability[0][1]:.4f}")
            print(f"Probability of No Disease: {probability[0][0]:.4f}")
        else:
            print("\n✗ Invalid input! Please enter only 0 or 1.")
    except ValueError:
        print("\n✗ Invalid input! Please enter only 0 or 1.")

# Run the interactive prediction (comment out the line below to skip the prompts)
predict_disease_interactive()




PART 3: INTERACTIVE DISEASE PREDICTION

Enter patient symptoms (1 for Yes, 0 for No):
Fever (0/1): 0
Cough (0/1): 1
Fatigue (0/1): 1
Difficulty Breathing (0/1): 1

------------------------------------------------------------
PREDICTION RESULT:
------------------------------------------------------------
Disease Status: POSITIVE
Confidence: 94.91%
Probability of Disease: 0.9491
Probability of No Disease: 0.0509
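
The 0/1 check in `predict_disease_interactive` can be factored into a small prompt helper that asks again instead of giving up on the first bad entry. A minimal sketch; `ask_binary` and its `read` and `max_tries` parameters are hypothetical names introduced here so the helper can be exercised without a keyboard.

```python
def ask_binary(prompt, read=input, max_tries=3):
    """Prompt until the user enters 0 or 1; raise ValueError after max_tries."""
    for _ in range(max_tries):
        answer = read(prompt).strip()
        if answer in ("0", "1"):
            return int(answer)
        print("Invalid input! Please enter only 0 or 1.")
    raise ValueError(f"no valid 0/1 answer after {max_tries} tries")

# Demo with scripted answers instead of live keyboard input
scripted = iter(["yes", "2", "1"])
fever = ask_binary("Fever (0/1): ", read=lambda _prompt: next(scripted))
print(fever)
```

Passing the reader as a parameter is a design choice that keeps the default behaviour (`input`) while making the validation logic easy to test.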


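Written out for the four symptoms used in Parts 1 and 2, the naive independence assumption turns the likelihood into a product of per-symptom terms, and the evidence becomes the sum of the two unnormalized class scores. The symbols are shorthand chosen here: D for Disease, F for Fever, C for Cough, T for Fatigue, and B for Breathing_Difficulty.

```latex
\begin{aligned}
P(D \mid F, C, T, B) &= \frac{P(D)\, P(F \mid D)\, P(C \mid D)\, P(T \mid D)\, P(B \mid D)}{P(F, C, T, B)}, \\
P(F, C, T, B) &= \sum_{d \in \{0, 1\}} P(D = d)\, P(F \mid D = d)\, P(C \mid D = d)\, P(T \mid D = d)\, P(B \mid D = d).
\end{aligned}
```
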
In [None]:
# ============================================================================
# PROJECT SUMMARY AND CONCLUSION
# ============================================================================

print("\n\n" + "=" * 60)
print("PROJECT SUMMARY")
print("=" * 60)

print("""
OBJECTIVE:
To build a disease prediction system using Bayes' Theorem that calculates
the probability of a patient having a disease based on symptoms.

METHOD:
1. Created a dataset with 40 patient records containing 4 symptoms
2. Implemented manual Bayes' Theorem calculation from scratch
3. Used Sklearn's Gaussian Naive Bayes classifier for comparison
4. Evaluated model performance using accuracy and confusion matrix

KEY FORMULAS USED:
• Bayes' Theorem: P(Disease|Symptoms) = P(Symptoms|Disease) × P(Disease) / P(Symptoms)
• Naive Bayes assumes independence between features (symptoms)

RESULTS:
• Manual calculation and sklearn model show consistent predictions
• Model achieved 100% accuracy on the 12-record test set
• System successfully predicts disease probability given symptoms

CONCLUSION:
Bayes' Theorem provides a probabilistic framework for disease prediction.
The Naive Bayes classifier is efficient, interpretable, and works well even
with small datasets. It's particularly useful in medical diagnosis systems
where we need to quantify uncertainty in predictions.

REAL-WORLD APPLICATIONS:
• Medical diagnosis support systems
• Disease outbreak prediction
• Patient risk assessment
• Symptom-based triage systems
""")

print("=" * 60)
print("PROJECT COMPLETE!")
print("=" * 60)


