
# **Experiment – 5**  
**Date:**  
**Roll No.: 24201013**  
**Title:** *Naive Bayes Classification Algorithm*

---

## **Theory**

Naive Bayes is a **supervised classification algorithm** based on **Bayes’ Theorem**.  
It predicts the class of a data point by calculating the probability of each class and selecting the one with the **highest posterior probability**.

It is called *“Naive”* because the algorithm assumes that all features are **conditionally independent** of each other given the class, an assumption that rarely holds in real-world data.  
Despite this assumption, Naive Bayes often performs well in text-based applications such as:

- Email spam detection  
- Sentiment analysis  
- Document classification  

---

## **Bayes' Theorem**

<br>

$$
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
$$

<br>

Where:  
- \(P(A \mid B)\) = Posterior probability  
- \(P(B \mid A)\) = Likelihood  
- \(P(A)\) = Prior probability  
- \(P(B)\) = Evidence  

---

## **Naive Bayes Classification Formula**

For a class \(C\) with features \(x_1, x_2, \dots, x_n\):

<br>

$$
P(C \mid x_1, x_2, \dots, x_n) =
\frac{P(C)\, P(x_1 \mid C)\, P(x_2 \mid C)\dots P(x_n \mid C)}
{P(x_1, x_2, \dots, x_n)}
$$

<br>

Since the denominator does not depend on \(C\), it is the same for every class and can be dropped when comparing classes:

<br>

$$
P(C \mid X) \propto P(C)\; \prod_{i=1}^{n} P(x_i \mid C)
$$

<br>

The class with the largest value of this product is the final prediction.
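
The proportional form can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `naive_bayes_scores` and `normalize` and all the probabilities are hypothetical. Summing logarithms instead of multiplying probabilities is a common implementation choice that avoids numerical underflow when there are many features; it does not change which class scores highest.

```python
import math

def naive_bayes_scores(priors, likelihoods):
    """Return log P(C) + sum_i log P(x_i | C) for every class C."""
    return {
        c: math.log(priors[c]) + sum(math.log(p) for p in likelihoods[c])
        for c in priors
    }

def normalize(log_scores):
    """Rescale unnormalized log scores into posteriors that sum to 1."""
    top = max(log_scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {c: v / total for c, v in exp_scores.items()}

# Hypothetical values: P(C) and P(x_i | C) for two observed features
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {"spam": [0.8, 0.5], "ham": [0.1, 0.5]}

log_scores = naive_bayes_scores(priors, likelihoods)
posteriors = normalize(log_scores)
prediction = max(log_scores, key=log_scores.get)

print(prediction)   # class with the largest score
print(posteriors)   # normalized posterior for each class
```

Normalizing is only needed when actual probabilities are wanted; the prediction itself depends only on which score is largest.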

---

## **Step-wise Explanation of Naive Bayes**

### **Step 1 — Calculate Prior Probability**
Probability of each class occurring in the dataset.

### **Step 2 — Calculate Likelihood**
Probability of each feature value given each class.

### **Step 3 — Apply Naive Independence**
Multiply the likelihoods together, assuming the features are independent given the class.

### **Step 4 — Apply Bayes’ Theorem**
Compute the posterior for each class.

### **Step 5 — Choose the Class with Highest Posterior**
The class with the maximum posterior probability becomes the final prediction.
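
The five steps can be traced by hand on the PlayTennis data used in this experiment's code. The sketch below is a minimal illustration that uses plain counting with no smoothing, so its probabilities need not match those of scikit-learn's `CategoricalNB`, which smooths counts by default. The helper names `fit` and `posterior_scores` are hypothetical.

```python
from collections import Counter

# Rows are (Outlook, Temperature, Humidity, Wind), integer-coded as in the experiment
rows = [
    (1, 1, 1, 1), (1, 1, 1, 2), (2, 1, 1, 1), (3, 2, 1, 1), (3, 3, 2, 1),
    (3, 3, 2, 2), (2, 3, 2, 2), (1, 2, 1, 1), (1, 3, 2, 1), (3, 2, 2, 1),
    (1, 2, 2, 2), (2, 2, 1, 2), (2, 1, 2, 1), (3, 2, 1, 2),
]
labels = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]  # 1 = Yes, 0 = No

def fit(rows, labels):
    """Steps 1 and 2: count classes and (feature, value, class) co-occurrences."""
    class_counts = Counter(labels)
    priors = {c: k / len(labels) for c, k in class_counts.items()}
    value_counts = Counter()
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, v, c)] += 1
    return priors, value_counts, class_counts

def posterior_scores(x, priors, value_counts, class_counts):
    """Steps 3 and 4: prior times the product of per-feature likelihoods."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, v in enumerate(x):
            score *= value_counts[(i, v, c)] / class_counts[c]  # unsmoothed estimate
        scores[c] = score
    return scores

priors, value_counts, class_counts = fit(rows, labels)
scores = posterior_scores((1, 1, 1, 2), priors, value_counts, class_counts)

# Step 5: pick the class with the highest score
prediction = max(scores, key=scores.get)
print(scores)
print("Prediction:", "Yes" if prediction == 1 else "No")
```

For the example `(1, 1, 1, 2)`, the score for class 0 is (5/14)(3/5)(2/5)(4/5)(3/5) ≈ 0.0411 and the score for class 1 is (9/14)(2/9)(2/9)(3/9)(3/9) ≈ 0.0035, so the sketch predicts "No".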

---

## **Advantages**
- Fast and efficient  
- Works well for text and categorical data  
- Can work with relatively little training data  
- Very easy to implement  

## **Disadvantages**
- Assumes features are independent (rare in real data)  
- Performs poorly when strong feature dependency exists  

---


In [1]:
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.metrics import accuracy_score

# ---------------------------------------
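# Step 1: Create Dataset (PlayTennis)
# Columns: Outlook, Temperature, Humidity, Wind (integer-coded).
# Assumed coding, consistent with the standard PlayTennis table:
#   Outlook 1=Sunny, 2=Overcast, 3=Rain; Temperature 1=Hot, 2=Mild, 3=Cool
#   Humidity 1=High, 2=Normal; Wind 1=Weak, 2=Strong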
# ---------------------------------------
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 2],
    [2, 1, 1, 1],
    [3, 2, 1, 1],
    [3, 3, 2, 1],
    [3, 3, 2, 2],
    [2, 3, 2, 2],
    [1, 2, 1, 1],
    [1, 3, 2, 1],
    [3, 2, 2, 1],
    [1, 2, 2, 2],
    [2, 2, 1, 2],
    [2, 1, 2, 1],
    [3, 2, 1, 2]
]

# Labels: 1 = Yes, 0 = No
target = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]

attributes = ['Outlook', 'Temperature', 'Humidity', 'Wind']

# Convert to DataFrame
data_df = pd.DataFrame(data, columns=attributes)
target_series = pd.Series(target, dtype="category")

# ---------------------------------------
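# Step 2: Train Naive Bayes Classifier
# Note: CategoricalNB's documentation assumes each feature is coded 0..n-1;
# the 1-based codes used here still run as-is.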
# ---------------------------------------
model = CategoricalNB()
model.fit(data_df, target_series)

# ---------------------------------------
# Step 3: Predict on Training Data
# (accuracy on the rows used for training, not a held-out estimate)
# ---------------------------------------
predicted_labels = model.predict(data_df)
accuracy = accuracy_score(target_series, predicted_labels) * 100

print(f"Accuracy on training dataset: {accuracy:.2f}%")

# ---------------------------------------
# Step 4: Predict a New Example
# Example: [Outlook=1, Temperature=1, Humidity=1, Wind=2]
# ---------------------------------------
new_data = pd.DataFrame([[1, 1, 1, 2]], columns=attributes)
prediction = model.predict(new_data)[0]

print("Prediction for new data:", "Yes" if prediction == 1 else "No")


Accuracy on training dataset: 92.86%
Prediction for new data: No
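
`CategoricalNB` applies additive (Laplace) smoothing by default (`alpha=1.0`), so a feature value that never occurs with a class still gets a small non-zero likelihood instead of zeroing out the whole product. The sketch below illustrates the idea on one case from the dataset above: Outlook=2 never appears with label 0. The helper `smoothed_likelihood` is hypothetical, and taking `n_categories=3` for Outlook is an assumption of this sketch; the library infers the number of categories from the encoded values, so its internal numbers may differ.

```python
def smoothed_likelihood(count, class_total, n_categories, alpha=1.0):
    """Additive smoothing: (count + alpha) / (class_total + alpha * n_categories)."""
    return (count + alpha) / (class_total + alpha * n_categories)

# Outlook=2 occurs in 0 of the 5 rows labelled 0 ("No")
unsmoothed = 0 / 5
smoothed = smoothed_likelihood(count=0, class_total=5, n_categories=3)

print(unsmoothed)  # 0.0 would force the posterior for "No" to zero
print(smoothed)    # 0.125 keeps the class possible
```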
