<a href="https://colab.research.google.com/github/Ambuj-Tiwari/ml-45-day-diary/blob/main/Day24.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Naive Bayes Algorithm


## What Is the Naive Bayes Algorithm?

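Naive Bayes is a **supervised learning** algorithm based on **Bayes' Theorem**.

It is used for **classification** problems and rests on the "naive" assumption that the input features are **independent** of each other given the class. It often works well in practice even when that assumption holds only approximately.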

---

## Real-World Example

Imagine you're building a spam filter. You want to classify emails as:

📩 **Spam** or ✅ **Not Spam**

Based on words in the email:

* If it contains "free", "win", "cash" → more likely **spam**
* If it contains "project", "meeting", "report" → more likely **not spam**

---

## Bayes' Theorem

The algorithm is based on this formula:

$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
$$

Where:

* **P(A|B)** = Probability of class A given feature B (posterior)
* **P(B|A)** = Probability of feature B given class A (likelihood)
* **P(A)** = Probability of class A (prior)
* **P(B)** = Probability of feature B (evidence)
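
To make the formula concrete, here is a small worked example in Python. The email counts are hypothetical and chosen only for illustration; `A` stands for "the email is spam" and `B` for "the email contains the word free".

```python
# Hypothetical counts, chosen only to illustrate the formula.
total_emails = 100
spam_emails = 40
spam_with_free = 30   # spam emails that contain "free"
ham_with_free = 6     # non-spam emails that contain "free"

p_spam = spam_emails / total_emails                        # P(A): prior
p_free_given_spam = spam_with_free / spam_emails           # P(B|A): likelihood
p_free = (spam_with_free + ham_with_free) / total_emails   # P(B): evidence

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(round(p_spam_given_free, 3))  # 0.833
```

The result matches the direct count: 30 of the 36 emails containing "free" are spam.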

---

## 🧪 How Naive Bayes Works (Simplified Steps)

1. **Calculate Prior Probabilities**
   Example:

   ```
   P(Spam) = No. of spam emails / Total emails
   P(Not Spam) = No. of not spam emails / Total emails
   ```

2. **Calculate Likelihood for Each Feature**
   Example:

   ```
   P("free" | Spam) = No. of spam emails with "free" / Total spam emails
   ```

3. **Apply Bayes' Theorem** to get `P(Class | Features)` for each class.

4. **Choose the class with the highest probability**.
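
The four steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation. The toy emails, the `train` and `posterior` helper names, and the add-one smoothing (`alpha`) are assumptions introduced here. Smoothing is not part of the steps above; it is added so that a word never seen in a class does not force that class's probability to zero.

```python
from collections import Counter

# Hypothetical toy data: (set of words in the email, label).
emails = [
    ({"free", "win"}, "spam"),
    ({"free", "cash"}, "spam"),
    ({"win", "cash", "free"}, "spam"),
    ({"project", "meeting"}, "not spam"),
    ({"report", "meeting"}, "not spam"),
    ({"free", "report"}, "not spam"),
]

def train(data):
    """Count emails per class and, per class, emails containing each word."""
    class_counts = Counter(label for _, label in data)
    word_counts = {label: Counter() for label in class_counts}
    for words, label in data:
        word_counts[label].update(words)
    return class_counts, word_counts

def posterior(words, class_counts, word_counts, alpha=1):
    """Return P(class | words) for every class."""
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # Step 1: prior
        for word in words:
            # Step 2: likelihood, with add-alpha smoothing (an assumption here)
            score *= (word_counts[label][word] + alpha) / (n + 2 * alpha)
        scores[label] = score  # Step 3: numerator of Bayes' theorem
    evidence = sum(scores.values())  # normalizing constant
    return {label: s / evidence for label, s in scores.items()}

class_counts, word_counts = train(emails)
probs = posterior({"free", "cash"}, class_counts, word_counts)
prediction = max(probs, key=probs.get)  # Step 4: most probable class
print(prediction, round(probs[prediction], 3))  # spam 0.857
```

For simplicity the sketch only multiplies likelihoods for words that are present in the new email; it ignores absent words.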

---

## ✅ Types of Naive Bayes

| Type               | Use Case                                                                    |
| ------------------ | --------------------------------------------------------------------------- |
| **Gaussian NB**    | When features are continuous and follow Gaussian distribution (bell curve). |
| **Multinomial NB** | For count features such as word counts in text (e.g., spam detection).      |
| **Bernoulli NB**   | When features are binary (0/1).                                             |
| **Categorical NB** | For categorical features (e.g., sunny, rainy).                              |
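
As a rough sketch of how the table maps to scikit-learn, each variant has its own estimator in `sklearn.naive_bayes`. The tiny arrays below are hypothetical and exist only to show which kind of input suits which class.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, CategoricalNB, GaussianNB, MultinomialNB

# Hypothetical labels shared by all four toy datasets.
y = np.array([0, 0, 1, 1])

X_continuous = np.array([[1.0, 2.1], [1.2, 1.9], [3.8, 4.2], [4.1, 3.9]])
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 2, 1], [0, 3, 2]])  # e.g. word counts
X_binary = (X_counts > 0).astype(int)                              # word present or not
X_categories = np.array([[0, 1], [0, 0], [1, 2], [1, 1]])          # integer-encoded categories

models = {
    "GaussianNB": (GaussianNB(), X_continuous),
    "MultinomialNB": (MultinomialNB(), X_counts),
    "BernoulliNB": (BernoulliNB(), X_binary),
    "CategoricalNB": (CategoricalNB(), X_categories),
}

# Fit each model on its matching feature type and predict on the same rows.
predictions = {name: model.fit(X, y).predict(X) for name, (model, X) in models.items()}
for name, pred in predictions.items():
    print(name, pred)
```

All four estimators share the same `fit` / `predict` interface, so switching variants is mostly a matter of preparing the features in the right form.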

---

## 💡 Key Assumption

> Naive Bayes assumes that all features are **independent** of each other, given the class.
> That's why it's called **naive**.
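
Written out, this assumption lets the joint likelihood factor into a product of per-feature terms. For a class $C$ and features $x_1, \dots, x_n$ (symbols introduced here for illustration):

$$
P(C \mid x_1, \dots, x_n) \propto P(C) \prod_{i=1}^{n} P(x_i \mid C)
$$

The predicted class is the one that maximizes the right-hand side. The evidence term $P(x_1, \dots, x_n)$ is dropped because it is the same for every class.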

---


## ✅ Advantages

* Simple and fast
* Scales well to large datasets, since training mostly amounts to counting (or computing per-class means and variances)
* Performs well with text classification (e.g., spam detection)

## ❌ Limitations

* Assumes features are independent
* Can perform poorly when features are strongly correlated, or when the correct class depends on interactions between features
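
A small numeric sketch shows why correlated features are a problem. The probabilities and the `posterior_spam` helper are hypothetical. Because the model multiplies per-feature likelihoods, a feature that merely duplicates another one is counted as fresh evidence and pushes the posterior toward overconfidence.

```python
def posterior_spam(prior_spam, likelihoods_spam, likelihoods_ham):
    """Naive Bayes posterior for 'spam' from per-feature likelihoods."""
    score_spam = prior_spam
    score_ham = 1 - prior_spam
    for p_spam, p_ham in zip(likelihoods_spam, likelihoods_ham):
        score_spam *= p_spam
        score_ham *= p_ham
    return score_spam / (score_spam + score_ham)

# Hypothetical numbers: one informative feature.
once = posterior_spam(0.5, [0.8], [0.2])

# The same feature included twice (a perfectly correlated copy).
twice = posterior_spam(0.5, [0.8, 0.8], [0.2, 0.2])

print(round(once, 3), round(twice, 3))  # 0.8 0.941
```

The duplicated feature adds no new information, yet the estimated probability of spam rises from 0.8 to about 0.94.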






In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score, classification_report


In [None]:
df = pd.read_csv("naive_bayes_dataset.csv")
df.head()

In [None]:
le_weather = LabelEncoder()
le_temp = LabelEncoder()
le_play = LabelEncoder()

In [None]:
df['Weather'] = le_weather.fit_transform(df['Weather'])
df['Temperature'] = le_temp.fit_transform(df['Temperature'])
df['Play'] = le_play.fit_transform(df['Play'])

In [None]:
df.head()

In [None]:
X = df[['Weather', 'Temperature']]
y = df['Play']


In [None]:
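# Example: how a train/test split divides the rows
#
#  X1 | X2 | Y
# --------------
#  3    5    9   <- training rows: the model learns from these
#  3    5    9
#  3    5    9
#  3    5    9
#  3    5    9
# --------------
#  3    5    9   <- test rows: held out to evaluate the model
#  3    5    9
#  3    5    9
#  3    5    9
#
# A common choice is 80% training / 20% testing.
# The split below uses 70% / 30% (test_size=0.3).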

In [None]:
# Step 4: Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=23)



In [None]:
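# Step 5: Train Naive Bayes model
# min_categories tells the model how many categories each feature has, so a
# category that appears only in the test split does not raise an IndexError.
model = CategoricalNB(
    min_categories=[len(le_weather.classes_), len(le_temp.classes_)]
)

model.fit(X_train, y_train)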

In [None]:
# Step 6: Make predictions
y_pred = model.predict(X_test)
y_pred

In [None]:
y_test

In [None]:
# Step 7: Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
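
Once the model is trained, the fitted encoders can translate a new observation into integers and translate the prediction back into a readable label. The sketch below is self-contained and uses a hypothetical stand-in for `naive_bayes_dataset.csv`; the category values ("Sunny", "Hot", "Yes", and so on) are assumptions, since the real file's contents are not shown here. It also skips the train/test split for brevity.

```python
import pandas as pd
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for naive_bayes_dataset.csv.
df = pd.DataFrame({
    "Weather": ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Overcast", "Sunny", "Rainy"],
    "Temperature": ["Hot", "Mild", "Hot", "Mild", "Cool", "Cool", "Cool", "Hot"],
    "Play": ["No", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
})

# One encoder per column, as in the cells above.
le_weather, le_temp, le_play = LabelEncoder(), LabelEncoder(), LabelEncoder()
df["Weather"] = le_weather.fit_transform(df["Weather"])
df["Temperature"] = le_temp.fit_transform(df["Temperature"])
df["Play"] = le_play.fit_transform(df["Play"])

model = CategoricalNB()
model.fit(df[["Weather", "Temperature"]], df["Play"])

# Encode a new observation with the same encoders used for training.
new_day = pd.DataFrame({
    "Weather": le_weather.transform(["Sunny"]),
    "Temperature": le_temp.transform(["Mild"]),
})

encoded_pred = model.predict(new_day)
predicted_label = le_play.inverse_transform(encoded_pred)[0]  # back to "Yes" / "No"
print("Play?", predicted_label)
```

Reusing the fitted encoders matters: fitting a fresh `LabelEncoder` on the new data could assign different integers to the same category.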

# Naive Bayes Example 2

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report

In [None]:
from sklearn.datasets import load_iris
data = load_iris()
data

In [None]:
X = data.data
y = data.target

In [None]:
y

In [None]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=67)

In [None]:
# Create a Gaussian Naive Bayes model
model = GaussianNB()

In [None]:
# Train the model
model.fit(X_train, y_train)

In [None]:
# Make predictions on the test set
y_pred = model.predict(X_test)
y_pred

In [None]:
y_test

In [None]:
# Evaluate the model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))