# Naive Bayes Classifier for Health and Diet Prediction
**Authors:** Magudeshwaran and Senthilkumaran

**Goal:** Build a Naive Bayes model to predict diet habits based on health and exercise data.

### Step 1: Import Libraries
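We need `numpy` for arrays, `matplotlib` and `seaborn` for plotting, `sklearn.model_selection` for splitting data, `sklearn.naive_bayes` for our model, `sklearn.metrics` for evaluation, and `sklearn.preprocessing` for encoding categorical data. `pandas` is imported as well, although the main steps work directly on NumPy arrays.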

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
from sklearn.preprocessing import LabelEncoder

### Step 2: Create the Dataset
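We use a small sample dataset of ten records with categorical information about health, exercise, and diet. In this sample, every person with good health also exercises and eats a healthy diet, so the classes are perfectly separable.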

In [None]:
health = np.array(['Good', 'Poor', 'Good', 'Good', 'Poor', 'Good', 'Poor', 'Good', 'Poor', 'Good'])
exercise = np.array(['Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes'])
diet = np.array(['Healthy', 'Unhealthy', 'Healthy', 'Healthy', 'Unhealthy', 'Healthy', 'Unhealthy', 'Healthy', 'Unhealthy', 'Healthy'])
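
Before encoding, it can help to check how a feature relates to the target. The sketch below assumes the same `health` and `diet` arrays as above and uses `pandas.crosstab`, which is not part of the original steps, to count each health/diet combination.

```python
import numpy as np
import pandas as pd

health = np.array(['Good', 'Poor', 'Good', 'Good', 'Poor', 'Good', 'Poor', 'Good', 'Poor', 'Good'])
diet = np.array(['Healthy', 'Unhealthy', 'Healthy', 'Healthy', 'Unhealthy', 'Healthy', 'Unhealthy', 'Healthy', 'Unhealthy', 'Healthy'])

# Count how often each health value occurs together with each diet value.
table = pd.crosstab(health, diet, rownames=['health'], colnames=['diet'])
print(table)
```

Every 'Good' row is 'Healthy' and every 'Poor' row is 'Unhealthy', so any reasonable classifier should fit this sample easily; real data would rarely be this clean.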

### Step 3: Encode Categorical Data
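Machine learning models work with numbers, so we convert our text labels (like 'Good', 'Poor') into integers using `LabelEncoder`. The encoder numbers the classes in sorted order, so 'Good' becomes 0 and 'Poor' becomes 1. We keep one encoder per column so that each can be reused later to encode new inputs and decode predictions.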

In [None]:
le_health = LabelEncoder()
le_exercise = LabelEncoder()
le_diet = LabelEncoder()

health_encoded = le_health.fit_transform(health)
exercise_encoded = le_exercise.fit_transform(exercise)
diet_encoded = le_diet.fit_transform(diet)

print("Encoded Health:", health_encoded)
print("Encoded Exercise:", exercise_encoded)
print("Encoded Diet (Target):", diet_encoded)
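
To see which integer stands for which label, the fitted encoder's `classes_` attribute can be inspected. This is a minimal sketch on a shortened copy of the `health` column; the `mapping` dictionary is only an illustrative name.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

health = np.array(['Good', 'Poor', 'Good'])

le = LabelEncoder()
encoded = le.fit_transform(health)

# classes_ stores the original labels in sorted order; the index is the code.
mapping = {str(label): code for code, label in enumerate(le.classes_)}
print(mapping)  # {'Good': 0, 'Poor': 1}

# inverse_transform maps the integer codes back to the original labels.
decoded = le.inverse_transform(encoded)
print(decoded)
```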

### Step 4: Prepare Features and Target
We combine our encoded features (`health` and `exercise`) into a single input array `X` and define our target variable `y` (`diet`).

In [None]:
X = np.column_stack((health_encoded, exercise_encoded))
y = diet_encoded
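
As an alternative sketch, scikit-learn's `OrdinalEncoder` can encode several feature columns in one call, since `LabelEncoder` is documented as a target encoder. This is not what the notebook does; the names `raw_features`, `encoder`, and `X_alt` are illustrative.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

health = np.array(['Good', 'Poor', 'Good', 'Good', 'Poor', 'Good', 'Poor', 'Good', 'Poor', 'Good'])
exercise = np.array(['Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes'])

# Stack the raw string columns into a (10, 2) array, then encode both at once.
raw_features = np.column_stack((health, exercise))
encoder = OrdinalEncoder()
X_alt = encoder.fit_transform(raw_features).astype(int)

print(X_alt.shape)
print(encoder.categories_)  # sorted categories per column
```

Both encoders sort the categories, so the resulting integers should match the `X` built above.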

### Step 5: Split Data into Training and Testing Sets
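We split our data to train the model on one part and test its performance on unseen data. With ten samples and `test_size=0.2`, eight samples go to training and two to testing. `stratify=y` keeps the class proportions as similar as possible in both parts, and `random_state=42` makes the split reproducible.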

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

print(f"Training samples: {len(X_train)}")
print(f"Testing samples: {len(X_test)}")
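
To check what stratification does, the class counts in each part can be tallied with `np.bincount`. The sketch below hard-codes the encoded arrays (0 = Good/No/Healthy, 1 = Poor/Yes/Unhealthy, following the sorted-label encoding) so that it runs on its own.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Encoded copy of the sample dataset: columns are (health, exercise).
X = np.array([[0, 1], [1, 0], [0, 1], [0, 1], [1, 0],
              [0, 1], [1, 0], [0, 1], [1, 0], [0, 1]])
y = np.array([0, 1, 0, 0, 1, 0, 1, 0, 1, 0])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Count samples per class (index 0 = Healthy, index 1 = Unhealthy).
train_counts = np.bincount(y_train, minlength=2)
test_counts = np.bincount(y_test, minlength=2)
print("Train class counts:", train_counts)
print("Test class counts:", test_counts)
```

With six 'Healthy' and four 'Unhealthy' records, a stratified two-sample test set contains one record of each class.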

### Step 6: Train the Naive Bayes Model
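We use `GaussianNB` (Gaussian Naive Bayes), which models each feature as normally distributed within each class and is therefore designed for continuous data. It still runs on integer-encoded categories, which is enough for this small demonstration; scikit-learn's `CategoricalNB` is the more natural choice for purely categorical features. We fit the model on the training data.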

In [None]:
model = GaussianNB()
model.fit(X_train, y_train)

print("Model trained successfully!")
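
For comparison, here is a minimal sketch of the same task with `CategoricalNB`, which estimates a probability for each category instead of fitting a Gaussian. It is a swapped-in alternative, not the notebook's method, and for brevity it fits on all ten encoded records rather than on the training split.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

# Encoded copy of the sample dataset: columns are (health, exercise).
X = np.array([[0, 1], [1, 0], [0, 1], [0, 1], [1, 0],
              [0, 1], [1, 0], [0, 1], [1, 0], [0, 1]])
y = np.array([0, 1, 0, 0, 1, 0, 1, 0, 1, 0])  # 0 = Healthy, 1 = Unhealthy

# alpha=1.0 (the default) applies Laplace smoothing to the category counts.
cat_model = CategoricalNB(alpha=1.0)
cat_model.fit(X, y)

# Good health + exercise, then poor health + no exercise.
pred = cat_model.predict(np.array([[0, 1], [1, 0]]))
print(pred)
```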

### Step 7: Make Predictions
Now, we use our trained model to predict the diet habits for the test data.

In [None]:
y_pred = model.predict(X_test)

print("Predictions:", y_pred)
print("Actual values:", y_test)
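
Besides hard labels, `GaussianNB` can report class probabilities through `predict_proba`. The sketch below refits a model on the full encoded dataset (an assumption made to keep the example self-contained) and prints the probability of each class for two inputs.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Encoded copy of the sample dataset: columns are (health, exercise).
X = np.array([[0, 1], [1, 0], [0, 1], [0, 1], [1, 0],
              [0, 1], [1, 0], [0, 1], [1, 0], [0, 1]])
y = np.array([0, 1, 0, 0, 1, 0, 1, 0, 1, 0])  # 0 = Healthy, 1 = Unhealthy

model = GaussianNB().fit(X, y)

# One row per input, one column per class in the order of model.classes_.
proba = model.predict_proba(np.array([[0, 1], [1, 0]]))
print("Class order:", model.classes_)
print(np.round(proba, 3))
```

Because the sample features separate the two classes perfectly, the probabilities come out very close to 0 and 1; on noisier data they would be less extreme.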

### Step 8: Evaluate the Model
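We check the model's accuracy and use a confusion matrix to see how many predictions were correct and incorrect. With only two test samples, accuracy can only be 0%, 50%, or 100%, so treat these numbers as an illustration of the workflow rather than a reliable estimate.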

In [None]:
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy:.2%}")
print("\nConfusion Matrix:")
print(conf_matrix)
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=le_diet.classes_))
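
One way to get a somewhat steadier estimate from so little data is cross-validation. The following is a sketch under stated assumptions: it uses `StratifiedKFold` with four folds (the smallest class has only four members, so more folds are not possible) and the encoded arrays hard-coded for self-containment.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Encoded copy of the sample dataset: columns are (health, exercise).
X = np.array([[0, 1], [1, 0], [0, 1], [0, 1], [1, 0],
              [0, 1], [1, 0], [0, 1], [1, 0], [0, 1]])
y = np.array([0, 1, 0, 0, 1, 0, 1, 0, 1, 0])  # 0 = Healthy, 1 = Unhealthy

# Each fold serves once as the test set while the rest is used for training.
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)
scores = cross_val_score(GaussianNB(), X, y, cv=cv, scoring='accuracy')

print("Fold accuracies:", scores)
print(f"Mean accuracy: {scores.mean():.2%}")
```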

### Step 9: Visualize the Confusion Matrix
A heatmap of the confusion matrix helps us easily see the model's performance.

In [None]:
plt.figure(figsize=(8, 6))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', cbar=False,
            xticklabels=le_diet.classes_, yticklabels=le_diet.classes_)
plt.title('Confusion Matrix - Diet Prediction', fontsize=14, fontweight='bold')
plt.xlabel('Predicted Label', fontsize=12)
plt.ylabel('True Label', fontsize=12)
plt.tight_layout()
plt.show()
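
The heatmap's tick labels assume a 2x2 matrix. If a test set happened to contain only one class, `confusion_matrix` would return a smaller matrix unless the `labels` argument is passed. The sketch below uses made-up `y_true` and `y_hat` lists to show the difference.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical test set where only class 0 (Healthy) appears.
y_true = [0, 0]
y_hat = [0, 0]

# Without labels, the matrix only covers the classes that were seen.
cm_default = confusion_matrix(y_true, y_hat)

# With labels, rows (true) and columns (predicted) always cover both classes.
cm_fixed = confusion_matrix(y_true, y_hat, labels=[0, 1])

print(cm_default.shape)
print(cm_fixed)
```

The stratified split used here puts both classes in the test set, so the original call works as written; passing `labels` is simply a safeguard for other splits.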

### Step 10: Test with New Data (Optional)
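Let's use the trained model to predict diet habits for two new people. The new inputs must be encoded with the same fitted encoders that were used for the training data, and the predicted integers are decoded back to labels with `le_diet.inverse_transform`.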

In [None]:
# Example predictions with new data
new_data = np.array([
    [le_health.transform(['Good'])[0], le_exercise.transform(['Yes'])[0]],
    [le_health.transform(['Poor'])[0], le_exercise.transform(['No'])[0]]
])

new_predictions = model.predict(new_data)

print("New Predictions:")
print("Person 1 (Good health, Exercises):", le_diet.inverse_transform([new_predictions[0]])[0])
print("Person 2 (Poor health, No exercise):", le_diet.inverse_transform([new_predictions[1]])[0])
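
The encode, predict, decode sequence can be wrapped in a small function. `predict_diet` below is a hypothetical helper, not part of the original notebook, and the sketch fits on all ten records instead of the training split to stay short.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder

health = np.array(['Good', 'Poor', 'Good', 'Good', 'Poor', 'Good', 'Poor', 'Good', 'Poor', 'Good'])
exercise = np.array(['Yes', 'No', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'No', 'Yes'])
diet = np.array(['Healthy', 'Unhealthy', 'Healthy', 'Healthy', 'Unhealthy', 'Healthy', 'Unhealthy', 'Healthy', 'Unhealthy', 'Healthy'])

le_health, le_exercise, le_diet = LabelEncoder(), LabelEncoder(), LabelEncoder()
X = np.column_stack((le_health.fit_transform(health),
                     le_exercise.fit_transform(exercise)))
y = le_diet.fit_transform(diet)
model = GaussianNB().fit(X, y)


def predict_diet(health_label, exercise_label):
    """Hypothetical helper: encode one person's labels, predict, decode."""
    row = np.array([[le_health.transform([health_label])[0],
                     le_exercise.transform([exercise_label])[0]]])
    return str(le_diet.inverse_transform(model.predict(row))[0])


print(predict_diet('Good', 'Yes'))
print(predict_diet('Poor', 'No'))
```

`LabelEncoder.transform` raises a `ValueError` for a label it did not see during fitting, so inputs such as 'Average' would need to be handled before calling the helper.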