# Medical Diagnosis with Naive Bayes

# 1. Data Exploration:

a. Load and explore the medical dataset using Python libraries like pandas. Describe the features, labels, and the distribution of diagnoses.

In [1]:
import pandas as pd

# Load the dataset
data = pd.read_csv('Disease.csv')

data.head()

Unnamed: 0,Disease,Fever,Cough,Fatigue,Difficulty Breathing,Age,Gender,Blood Pressure,Cholesterol Level,Outcome Variable
0,Influenza,Yes,No,Yes,Yes,19,Female,Low,Normal,Positive
1,Common Cold,No,Yes,Yes,No,25,Female,Normal,Normal,Negative
2,Eczema,No,Yes,Yes,No,25,Female,Normal,Normal,Negative
3,Asthma,Yes,Yes,No,Yes,25,Male,Normal,Normal,Positive
4,Asthma,Yes,Yes,No,Yes,25,Male,Normal,Normal,Positive
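
Question (a) also asks for the features, the label, and the distribution of diagnoses, which `head()` alone does not show. The following is a minimal sketch of that exploration. It assumes the column names shown in the output above; the five-row `sample` frame is only a stand-in built from that output so the snippet runs without `Disease.csv`, and the names `sample`, `label_col`, and `feature_cols` are illustrative.

```python
import pandas as pd

# Stand-in for pd.read_csv('Disease.csv'): the five rows shown by data.head().
sample = pd.DataFrame({
    "Disease": ["Influenza", "Common Cold", "Eczema", "Asthma", "Asthma"],
    "Fever": ["Yes", "No", "No", "Yes", "Yes"],
    "Cough": ["No", "Yes", "Yes", "Yes", "Yes"],
    "Fatigue": ["Yes", "Yes", "Yes", "No", "No"],
    "Difficulty Breathing": ["Yes", "No", "No", "Yes", "Yes"],
    "Age": [19, 25, 25, 25, 25],
    "Gender": ["Female", "Female", "Female", "Male", "Male"],
    "Blood Pressure": ["Low", "Normal", "Normal", "Normal", "Normal"],
    "Cholesterol Level": ["Normal"] * 5,
    "Outcome Variable": ["Positive", "Negative", "Negative", "Positive", "Positive"],
})

# Every column except the label is treated as a feature.
label_col = "Outcome Variable"
feature_cols = [c for c in sample.columns if c != label_col]

print(sample.dtypes)                 # which columns are text and which are numeric
print(sample["Age"].describe())      # summary of the only numeric feature

outcome_counts = sample[label_col].value_counts()   # label distribution
disease_counts = sample["Disease"].value_counts()   # distribution of diagnoses
print(outcome_counts)
print(disease_counts)
```

On the full dataset the same calls would be applied to `data` instead of `sample`.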


# 2. Data Preprocessing:

a. Explain the necessary data preprocessing steps for preparing the medical data. This may include handling missing values, normalizing or scaling features, and encoding categorical variables.

b. Calculate the prior probabilities P(Condition) and P(No Condition) based on the class distribution.

In [2]:
# Encode each categorical column as integer codes (labels are numbered in sorted order)
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()

categorical_cols = ['Disease', 'Fever', 'Cough', 'Fatigue', 'Difficulty Breathing',
                    'Gender', 'Blood Pressure', 'Cholesterol Level', 'Outcome Variable']
for col in categorical_cols:
    data[col] = le.fit_transform(data[col])

data.head()

Unnamed: 0,Disease,Fever,Cough,Fatigue,Difficulty Breathing,Age,Gender,Blood Pressure,Cholesterol Level,Outcome Variable
0,56,1,0,1,1,19,0,1,2,1
1,24,0,1,1,0,25,0,2,2,0
2,37,0,1,1,0,25,0,2,2,0
3,6,1,1,0,1,25,1,2,2,1
4,6,1,1,0,1,25,1,2,2,1
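
The cell above refits a single `LabelEncoder` on every column, so after it runs only the mapping for the last column is still available. One possible alternative, sketched below, keeps one fitted encoder per column so each mapping can be inspected or inverted later. The small `df` frame and the `encoders` dictionary are illustrative, and the "High" category is an assumption inferred from the codes in the output above (Low is 1 and Normal is 2).

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Small stand-in frame; the notebook itself would use `data` from Disease.csv.
df = pd.DataFrame({
    "Fever": ["Yes", "No", "No", "Yes"],
    "Blood Pressure": ["Low", "Normal", "High", "Normal"],
    "Outcome Variable": ["Positive", "Negative", "Negative", "Positive"],
})

# One fitted encoder per column, so no mapping is overwritten.
encoders = {}
for col in df.columns:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

# LabelEncoder numbers the labels in sorted order.
bp_mapping = {label: code for code, label in enumerate(encoders["Blood Pressure"].classes_)}
print(bp_mapping)

# Recover the original strings from the integer codes.
decoded = list(encoders["Blood Pressure"].inverse_transform(df["Blood Pressure"]))
print(decoded)
```

Keeping the encoders also makes it possible to encode new patient records with exactly the same codes that were used for training.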


In [3]:
# Back-fill missing values (fillna(method="bfill") is deprecated in recent pandas), then confirm none remain
data=data.bfill()
data.isna().sum()

Disease                 0
Fever                   0
Cough                   0
Fatigue                 0
Difficulty Breathing    0
Age                     0
Gender                  0
Blood Pressure          0
Cholesterol Level       0
Outcome Variable        0
dtype: int64

In [4]:
# Count each class (the encoded label is 1 for Positive and 0 for Negative)
n_positive = data['Outcome Variable'].sum()
n_negative = len(data) - n_positive

# Prior probabilities: the proportion of positive and negative cases
Positive = n_positive / len(data)
Negative = n_negative / len(data)

print("Class Distribution:")
print(f"Positive: {Positive} ({Positive * 100:.2f}%)")
print(f"Negative: {Negative} ({Negative * 100:.2f}%)")

Class Distribution:
Positive: 0.5329512893982808 (53.30%)
Negative: 0.4670487106017192 (46.70%)


# 3. Feature Engineering:

a. Describe how to convert the medical test results and patient information into suitable features for the Naive Bayes model.

b. Discuss the importance of feature selection or dimensionality reduction in medical diagnosis.
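
One way to approach question (b) is to score each encoded feature by how much information it carries about the outcome. The sketch below uses scikit-learn's `mutual_info_classif` on synthetic data. It is not computed from `Disease.csv`: the column names are borrowed from the dataset only for readability, and the 80% agreement rate is an arbitrary assumption.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)

# Synthetic stand-in: "Fever" agrees with the outcome about 80% of the time,
# while "Gender" is generated independently of it.
flip = rng.random(n) < 0.2
X = pd.DataFrame({
    "Fever": np.where(flip, 1 - y, y),
    "Gender": rng.integers(0, 2, size=n),
})

# Mutual information between each discrete feature and the label (in nats).
mi = mutual_info_classif(X, y, discrete_features=True, random_state=0)
scores = pd.Series(mi, index=X.columns).sort_values(ascending=False)
print(scores)
```

Features with scores near zero are candidates for removal, although with a dataset this small any ranking should be treated as a rough guide.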

# 4. Implementing Naive Bayes:

a. Choose the appropriate Naive Bayes variant (e.g., Gaussian, Multinomial, or Bernoulli Naive Bayes) for the medical diagnosis task and implement the classifier using Python libraries like scikit-learn.

b. Split the dataset into training and testing sets.

In [5]:
x=data.iloc[:,:-1]
x.head(2)

Unnamed: 0,Disease,Fever,Cough,Fatigue,Difficulty Breathing,Age,Gender,Blood Pressure,Cholesterol Level
0,56,1,0,1,1,19,0,1,2
1,24,0,1,1,0,25,0,2,2


In [6]:
y=data.iloc[:,-1:]
y.head()

Unnamed: 0,Outcome Variable
0,1
1,0
2,0
3,1
4,1


In [9]:
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hold out 20% of the rows for testing
xtrain,xtest,ytrain,ytest=train_test_split(x,y,test_size=0.2,random_state=2)

# y is a one-column DataFrame; flatten it to the 1-D shape that fit() expects
gauss_nb=GaussianNB()
gauss_nb.fit(xtrain,ytrain.values.ravel())
print('Training completed')
print()
ypred=gauss_nb.predict(xtest)
print('Predicted label for the input samples:\n',ypred)
print('Testing is done')

Training completed

Predicted label for the input samples:
 [1 0 1 0 1 1 0 1 1 1 1 1 1 1 1 1 0 0 1 0 1 0 1 1 1 1 0 0 1 1 0 0 0 1 1 1 1
 0 0 0 0 1 0 0 0 0 0 0 1 0 1 1 0 0 0 1 0 1 1 0 1 1 1 1 0 0 0 0 1 1]
Testing is done


# 5. Model Training:

a. Train the Naive Bayes model using the feature-engineered dataset. Explain the probability estimation process in Naive Bayes for medical diagnosis.

# 6. Model Evaluation:

a. Assess the performance of the medical diagnosis model using relevant evaluation metrics, such as accuracy, precision, recall, and F1-score.

b. Interpret the results and discuss the model's ability to accurately classify medical conditions.

In [10]:
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

print('****Naive Bayes Model****')
print('Accuracy Score:\n',accuracy_score(ytest,ypred))
print()
print('='*80)
print('Confusion matrix:\n',confusion_matrix(ytest,ypred))
print()
print('='*80)
print('Classification report: \n',classification_report(ytest,ypred))
print('='*80)

****Naive Bayes Model****
Accuracy Score:
 0.6428571428571429

Confusion matrix:
 [[25 18]
 [ 7 20]]

Classification report: 
               precision    recall  f1-score   support

           0       0.78      0.58      0.67        43
           1       0.53      0.74      0.62        27

    accuracy                           0.64        70
   macro avg       0.65      0.66      0.64        70
weighted avg       0.68      0.64      0.65        70



# 7. Laplace Smoothing:

a. Explain the concept of Laplace (add-one) smoothing and discuss its potential application in the context of medical diagnosis.

b. Discuss the impact of Laplace smoothing on model performance.
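
A small worked example may help here. Laplace smoothing adds a pseudo-count `alpha` to every category so that a value never observed for a class does not get probability zero, which would otherwise zero out the whole product of likelihoods. The helper `smoothed_likelihoods` and the observations below are hypothetical and are not drawn from the dataset.

```python
from collections import Counter

def smoothed_likelihoods(values, categories, alpha=1.0):
    """Estimate P(value | class) with add-alpha smoothing from one class's observations."""
    counts = Counter(values)
    total = len(values) + alpha * len(categories)
    return {c: (counts[c] + alpha) / total for c in categories}

# Hypothetical "Blood Pressure" values among Positive patients: "Low" never appears.
bp_positive = ["High", "High", "Normal", "High"]
categories = ["High", "Low", "Normal"]

unsmoothed = smoothed_likelihoods(bp_positive, categories, alpha=0.0)
smoothed = smoothed_likelihoods(bp_positive, categories, alpha=1.0)
print(unsmoothed)   # "Low" gets probability 0
print(smoothed)     # "Low" gets 1/7, and the probabilities still sum to 1
```

Note that `GaussianNB`, the variant used in this notebook, does not apply Laplace smoothing; in scikit-learn the `alpha` parameter belongs to `CategoricalNB`, `MultinomialNB`, and `BernoulliNB`, while `GaussianNB` has a separate `var_smoothing` parameter for its variances.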


# 8. Real-World Application:

a. Describe the importance of accurate medical diagnosis in healthcare and research.

b. Discuss the practical implications of implementing a diagnostic system based on Naive Bayes.

# 9. Model Limitations:

a. Identify potential limitations of the Naive Bayes approach to medical diagnosis and discuss scenarios in which it may not perform well.
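
One limitation that is easy to demonstrate is the independence assumption: correlated features are counted as separate evidence, so the predicted probabilities become overconfident. The sketch below uses synthetic data (not the medical dataset) and duplicates a single feature, which adds no information but still pushes the predicted probability further from 0.5. The query value 1.5 is an arbitrary choice.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
n = 200
# Balanced synthetic data: one noisy feature whose mean differs between the classes.
y = np.repeat([0, 1], n // 2)
x = rng.normal(loc=y.astype(float), scale=1.0).reshape(-1, 1)

single = GaussianNB().fit(x, y)

# Duplicate the column: no new information, but NB treats it as independent evidence.
x_dup = np.hstack([x, x])
duplicated = GaussianNB().fit(x_dup, y)

query = np.array([[1.5]])
p_single = single.predict_proba(query)[0, 1]
p_dup = duplicated.predict_proba(np.hstack([query, query]))[0, 1]
print(f"P(class 1) with one copy: {p_single:.3f}, with two copies: {p_dup:.3f}")
```

In medical data, symptoms such as fever and fatigue often occur together, so a similar double counting can occur even without exact duplicates.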