# Machine Learning 36: Naive Bayes in ML
## **1. Basic Concept & Intuition**

Naive Bayes is a **probabilistic classifier** based on **Bayes’ Theorem**. It’s called *“naive”* because it assumes that **all features are independent of each other given the class**. This assumption is rarely true in real-world data, but the classifier works surprisingly well in practice.

**Intuition:**
Imagine you want to classify emails as **Spam** or **Not Spam**. Naive Bayes looks at each feature (such as the words in the email) and calculates the probability of the email being spam given those words. It assumes that, within each class, the presence of one word does **not** depend on the presence of any other word (the naive assumption). It then picks the class with the **highest probability**.


## **2. Bayes Theorem**

The core formula of Naive Bayes is **Bayes’ Theorem**:

$$
P(C|X) = \frac{P(X|C) \cdot P(C)}{P(X)}
$$

Where:

* $P(C|X)$ = Posterior probability of class $C$ given features $X$ (what we want to compute).
* $P(C)$ = Prior probability of class $C$ (based on training data).
* $P(X|C)$ = Likelihood of features $X$ given class $C$.
* $P(X)$ = Probability of features $X$ (can be ignored during classification as it’s the same for all classes).

**Stepwise reasoning:**

1. Compute **prior probability** of each class.
2. Compute **likelihood** of each feature for each class.
3. Multiply likelihoods assuming feature independence.
4. Multiply by the class prior.
5. Pick the class with the **highest posterior probability**.
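The stepwise reasoning can be sketched in a few lines of Python. This minimal sketch takes the likelihoods $P(X|C)$ as given (their naive factorization is the subject of Section 3); the helper name `bayes_posterior` and all numbers are hypothetical. It computes $P(X)$ explicitly to show why it can be dropped: it rescales every class by the same factor, so the ranking of classes does not change.

```python
def bayes_posterior(priors, likelihoods):
    """Return P(C|X) for every class C via Bayes' theorem.

    priors:      dict mapping class -> P(C)
    likelihoods: dict mapping class -> P(X|C)
    """
    # Numerator of Bayes' theorem for each class: P(X|C) * P(C)
    joint = {c: likelihoods[c] * priors[c] for c in priors}
    # Evidence P(X): sum of P(X|C) * P(C) over all classes (law of total probability)
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}


# Hypothetical numbers for one email X (not taken from any dataset)
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {"spam": 0.05, "ham": 0.01}

posterior = bayes_posterior(priors, likelihoods)
print(round(posterior["spam"], 3))  # 0.769

# Step 5: the class with the highest posterior; ranking by the unnormalized
# product P(X|C) * P(C) gives the same answer because P(X) is shared
best = max(posterior, key=posterior.get)
best_unnormalized = max(priors, key=lambda c: likelihoods[c] * priors[c])
print(best, best_unnormalized)  # spam spam
```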


## **3. Assumption of Feature Independence**

Naive Bayes assumes:

$$
P(X_1, X_2, \dots, X_n | C) = P(X_1|C) \cdot P(X_2|C) \cdot \dots \cdot P(X_n|C)
$$

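* This simplifies computation: instead of estimating one joint distribution over all $n$ features for each class, the model only estimates $n$ one-dimensional distributions per class.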
* Even if features are correlated, Naive Bayes often performs surprisingly well.
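The factorization above is just a product of per-feature probabilities. The minimal sketch below (hypothetical helper names) computes it directly and in log space. Many implementations, including scikit-learn's, work with log probabilities, because multiplying many small probabilities underflows to exactly 0.0 in floating point while summing their logarithms stays finite.

```python
import math


def naive_likelihood(per_feature_probs):
    """P(X_1, ..., X_n | C) under the independence assumption: the product of P(X_i | C)."""
    return math.prod(per_feature_probs)


def naive_log_likelihood(per_feature_probs):
    """The same quantity in log space: the sum of log P(X_i | C)."""
    return sum(math.log(p) for p in per_feature_probs)


# Two features, using P(Sunny|Yes) = 2/9 and P(Cool|Yes) = 3/9 from the worked example below
print(naive_likelihood([2 / 9, 3 / 9]))  # about 0.0741

# Hypothetical: 400 features that each have probability 0.01 under the class
many = [0.01] * 400
print(naive_likelihood(many))      # underflows to 0.0 in double precision
print(naive_log_likelihood(many))  # stays finite (about -1842.07)
```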


## **4. Types of Naive Bayes**

1. **Gaussian Naive Bayes**

   * Assumes features are **continuous** and follow a **Gaussian (normal) distribution**.
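   * The likelihood of feature $x_i$ under class $C$ is computed as:

   $$
   P(x_i|C) = \frac{1}{\sqrt{2\pi\sigma_{i,C}^2}} \exp\Big(-\frac{(x_i-\mu_{i,C})^2}{2\sigma_{i,C}^2}\Big)
   $$

   where $\mu_{i,C}$ and $\sigma_{i,C}^2$ are the mean and variance of feature $i$ over the training samples of class $C$.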

   * Used for continuous numerical features.

2. **Multinomial Naive Bayes**

   * Used for **discrete count data**, like **word counts in text classification**.
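   * The likelihood of each feature is estimated from how often it occurs in each class, usually with smoothing: $P(x_i|C) = \frac{N_{C,i} + \alpha}{N_C + \alpha n}$, where $N_{C,i}$ is the total count of feature $i$ in class $C$, $N_C$ is the total count of all features in class $C$, $n$ is the number of features, and $\alpha$ is the smoothing parameter.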

3. **Bernoulli Naive Bayes**

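   * Used for **binary/boolean features** (0 or 1).
   * Example: word presence/absence in documents.
   * In scikit-learn, `BernoulliNB` binarizes its inputs with the `binarize` threshold (default `0.0`): values above the threshold become 1 and all others become 0.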
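The three likelihood models can be sketched directly from their formulas. This is a minimal NumPy illustration with hypothetical function names, not scikit-learn's implementation; the multinomial estimate assumes the common add-alpha smoothing $(N_{C,i} + \alpha)/(N_C + \alpha n)$. Note the design difference it exposes: the Bernoulli model multiplies in $(1 - p_i)$ for every absent feature, so a missing word counts as evidence, whereas in the multinomial model an absent feature contributes nothing.

```python
import numpy as np


def gaussian_likelihood(x, mu, var):
    """Gaussian NB: P(x_i | C) from the class-conditional mean and variance of feature i."""
    return np.exp(-((x - mu) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def multinomial_feature_probs(feature_counts, alpha=1.0):
    """Multinomial NB: smoothed P(feature i | C) from the per-feature counts summed
    over all training samples of class C: (N_Ci + alpha) / (N_C + alpha * n)."""
    counts = np.asarray(feature_counts, dtype=float)
    return (counts + alpha) / (counts.sum() + alpha * counts.size)


def bernoulli_likelihood(x_binary, p_present):
    """Bernoulli NB: P(x | C) = prod_i p_i^x_i * (1 - p_i)^(1 - x_i) for 0/1 features."""
    x = np.asarray(x_binary)
    p = np.asarray(p_present, dtype=float)
    # Present features contribute p_i, absent features contribute (1 - p_i)
    return float(np.prod(np.where(x == 1, p, 1.0 - p)))


# Hypothetical inputs, for illustration only
print(gaussian_likelihood(0.0, mu=0.0, var=1.0))  # standard normal density at 0
print(multinomial_feature_probs([3, 0, 1]))       # (3+1, 0+1, 1+1) / (4 + 3)
print(bernoulli_likelihood([1, 0], [0.8, 0.3]))   # 0.8 * (1 - 0.3)
```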


## **5. Step-by-Step Example**

Let’s do a small example. Suppose we want to predict if someone plays tennis based on weather:

| Outlook  | Temperature | PlayTennis |
| -------- | ----------- | ---------- |
| Sunny    | Hot         | No         |
| Sunny    | Hot         | No         |
| Overcast | Hot         | Yes        |
| Rain     | Mild        | Yes        |
| Rain     | Cool        | Yes        |
| Rain     | Cool        | No         |
| Overcast | Cool        | Yes        |
| Sunny    | Mild        | No         |
| Sunny    | Cool        | Yes        |
| Rain     | Mild        | Yes        |
| Sunny    | Mild        | Yes        |
| Overcast | Mild        | Yes        |
| Overcast | Hot         | Yes        |
| Rain     | Mild        | No         |

We want to predict: **PlayTennis = ?** if **Outlook = Sunny** and **Temperature = Cool**.


### **Step 1: Compute Prior Probabilities**

* $P(Yes) = \frac{9}{14}$
* $P(No) = \frac{5}{14}$


### **Step 2: Compute Likelihoods**

* $P(Outlook=Sunny | Yes) = \frac{2}{9}$
* $P(Temperature=Cool | Yes) = \frac{3}{9}$
* $P(Outlook=Sunny | No) = \frac{3}{5}$
* $P(Temperature=Cool | No) = \frac{1}{5}$


### **Step 3: Compute Posterior Probabilities**

$$
P(Yes|Sunny,Cool) \propto P(Yes) \cdot P(Sunny|Yes) \cdot P(Cool|Yes)
= \frac{9}{14} \cdot \frac{2}{9} \cdot \frac{3}{9} = \frac{1}{21} \approx 0.0476
$$

$$
P(No|Sunny,Cool) \propto P(No) \cdot P(Sunny|No) \cdot P(Cool|No)
= \frac{5}{14} \cdot \frac{3}{5} \cdot \frac{1}{5} = \frac{3}{70} \approx 0.0429
$$

* Since $0.0476 > 0.0429$, **predict Yes**.
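The hand computation above can be reproduced from the table with exact fractions. This is a minimal pure-Python sketch; the helper name `posterior_scores` is hypothetical. It also normalizes the two scores, which is the step that dividing by $P(X)$ would perform.

```python
from collections import Counter
from fractions import Fraction

# The 14 rows of the PlayTennis table: (Outlook, Temperature, PlayTennis)
data = [
    ("Sunny", "Hot", "No"), ("Sunny", "Hot", "No"), ("Overcast", "Hot", "Yes"),
    ("Rain", "Mild", "Yes"), ("Rain", "Cool", "Yes"), ("Rain", "Cool", "No"),
    ("Overcast", "Cool", "Yes"), ("Sunny", "Mild", "No"), ("Sunny", "Cool", "Yes"),
    ("Rain", "Mild", "Yes"), ("Sunny", "Mild", "Yes"), ("Overcast", "Mild", "Yes"),
    ("Overcast", "Hot", "Yes"), ("Rain", "Mild", "No"),
]


def posterior_scores(rows, outlook, temperature):
    """Unnormalized P(C) * P(Outlook|C) * P(Temperature|C) for every class C."""
    class_counts = Counter(label for _, _, label in rows)
    scores = {}
    for c, n_c in class_counts.items():
        prior = Fraction(n_c, len(rows))  # Step 1: prior
        # Step 2: likelihoods as relative frequencies within class c
        n_outlook = sum(1 for o, _, label in rows if label == c and o == outlook)
        n_temp = sum(1 for _, t, label in rows if label == c and t == temperature)
        # Step 3: multiply prior and likelihoods (naive independence assumption)
        scores[c] = prior * Fraction(n_outlook, n_c) * Fraction(n_temp, n_c)
    return scores


scores = posterior_scores(data, "Sunny", "Cool")
print(scores["Yes"], scores["No"])  # 1/21 3/70
print(max(scores, key=scores.get))  # Yes

# Dividing by the sum of the scores (i.e. by P(X)) gives proper posteriors
total = sum(scores.values())
print(scores["Yes"] / total)  # 10/19
```

The normalized posterior is $P(Yes \mid Sunny, Cool) = 10/19 \approx 0.53$, so the prediction is Yes, but with only modest confidence.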


## **6. Advantages of Naive Bayes**

* Simple and easy to implement.
* Works well with **high-dimensional data** (like text classification).
* Fast training and prediction.
* Performs well even with small datasets.


## **7. Disadvantages**

* Assumes **feature independence** — rarely true in real life.
* If a feature value never appears with a class in the training data, its estimated likelihood is **0**, which zeroes out the whole product for that class (this can be handled with **Laplace smoothing**).
* Less accurate than more complex models like Random Forest or XGBoost for some tasks.
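The zero-probability problem already appears in the PlayTennis table from Section 5: *Overcast* never occurs with *No*, so $P(Outlook=Overcast \mid No) = 0/5$, and any query with Outlook = Overcast would get a score of exactly 0 for *No*. Laplace (add-one) smoothing adds $\alpha$ to every count, $P(x \mid C) = \frac{\text{count}(x, C) + \alpha}{N_C + \alpha k}$, where $k$ is the number of distinct values the feature can take. A minimal sketch with exact fractions (the helper name is hypothetical):

```python
from fractions import Fraction


def smoothed_likelihood(value_count, class_count, n_values, alpha=1):
    """Add-alpha estimate of P(feature = value | C); alpha=0 gives the raw frequency."""
    return Fraction(value_count + alpha, class_count + alpha * n_values)


# Outlook counts within class No in the PlayTennis table (Outlook has 3 values)
outlook_counts_no = {"Sunny": 3, "Overcast": 0, "Rain": 2}
n_no = 5

raw = smoothed_likelihood(outlook_counts_no["Overcast"], n_no, 3, alpha=0)
smoothed = {v: smoothed_likelihood(k, n_no, 3) for v, k in outlook_counts_no.items()}

print(raw)                     # 0
print(smoothed["Overcast"])    # 1/8
print(sum(smoothed.values()))  # 1 (still a valid distribution)
```

In scikit-learn, the naive Bayes classifiers for discrete data (`MultinomialNB`, `BernoulliNB`, `CategoricalNB`) expose this as the `alpha` parameter, which defaults to `1.0`.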


## **8. Typical Use Cases**

* **Text Classification**: Spam detection, sentiment analysis.
* **Medical Diagnosis**: Predicting diseases based on symptoms.
* **Recommendation Systems**: Classifying items based on user preferences.
* **Real-time Prediction**: Quick, lightweight classifier for streaming data.
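For the text-classification use case, a multinomial model over word counts is the usual starting point. The sketch below uses scikit-learn's `CountVectorizer` and `MultinomialNB` on a tiny **made-up** corpus; the documents and labels are hypothetical and only illustrate the API, not real spam data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus, made up for illustration only
train_docs = [
    "win a free prize now",
    "free money offer",
    "claim your free prize",
    "meeting at noon tomorrow",
    "project update and meeting notes",
    "lunch tomorrow at noon",
]
train_labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Word counts -> multinomial likelihoods with Laplace smoothing (alpha=1.0 by default)
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(train_docs, train_labels)

print(spam_filter.predict(["free prize offer", "meeting tomorrow at noon"]))  # ['spam' 'ham']
```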

    
## **9. Summary**
Naive Bayes is a **probabilistic, easy-to-use classifier** that works surprisingly well, especially for **text data**. Its simplicity and efficiency make it a great starting point for classification problems.


```python
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.datasets import load_iris
```


```python
# Load the dataset
iris = load_iris()
X = iris.data
y = iris.target
```

```python
# Split the data into training (70%) and test (30%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```

## Gaussian Naive Bayes

```python
# Initialize the model
gnb = GaussianNB()
```

```python
# Train the model
gnb.fit(X_train, y_train)
```

```python
# Predict the test data
y_pred = gnb.predict(X_test)
```

```python
# Evaluate the model
print("Accuracy Score: ", accuracy_score(y_test, y_pred))
print("Confusion Matrix: \n", confusion_matrix(y_test, y_pred))
print("Classification Report: \n", classification_report(y_test, y_pred))
```

```
Accuracy Score:  0.9777777777777777
Confusion Matrix: 
 [[19  0  0]
 [ 0 12  1]
 [ 0  0 13]]
Classification Report: 
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      0.92      0.96        13
           2       0.93      1.00      0.96        13

    accuracy                           0.98        45
   macro avg       0.98      0.97      0.97        45
weighted avg       0.98      0.98      0.98        45
```



## Multinomial Naive Bayes

```python
# Initialize the model
mnb = MultinomialNB()

# Train the model
mnb.fit(X_train, y_train)

# Predict the test data
y_pred = mnb.predict(X_test)

# Evaluate the model
print("Accuracy Score: ", accuracy_score(y_test, y_pred))
print("Confusion Matrix: \n", confusion_matrix(y_test, y_pred))
print("Classification Report: \n", classification_report(y_test, y_pred))
```

```
Accuracy Score:  0.9555555555555556
Confusion Matrix: 
 [[19  0  0]
 [ 0 12  1]
 [ 0  1 12]]
Classification Report: 
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       0.92      0.92      0.92        13
           2       0.92      0.92      0.92        13

    accuracy                           0.96        45
   macro avg       0.95      0.95      0.95        45
weighted avg       0.96      0.96      0.96        45
```
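`MultinomialNB` is designed for counts; it accepts the iris measurements because they are non-negative and effectively treats them as pseudo-counts. The sketch below checks that the fitted per-class feature probabilities are the smoothed frequencies $(N_{C,i} + \alpha)/(N_C + \alpha n)$ with the default $\alpha = 1$. It repeats the split above so it runs on its own; `feature_count_`, `feature_log_prob_`, and `alpha` are attributes of the fitted scikit-learn estimator.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Same data and split as above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
mnb = MultinomialNB().fit(X_train, y_train)  # alpha defaults to 1.0

# feature_count_[c, i] is N_{C,i}: feature i summed over the training rows of class c
per_class_sums = np.stack([X_train[y_train == c].sum(axis=0) for c in mnb.classes_])
print(np.allclose(mnb.feature_count_, per_class_sums))  # True

# Smoothed estimate (N_{C,i} + alpha) / (N_C + alpha * n); the denominator is
# simply the row sum of the smoothed counts
smoothed = mnb.feature_count_ + mnb.alpha
manual_probs = smoothed / smoothed.sum(axis=1, keepdims=True)
print(np.allclose(np.log(manual_probs), mnb.feature_log_prob_))  # True
```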



## Bernoulli Naive Bayes

```python
# Initialize the model
bnb = BernoulliNB()

# Train the model
bnb.fit(X_train, y_train)

# Predict the test data
y_pred = bnb.predict(X_test)

# Evaluate the model; zero_division=0 reports 0.0 for classes that are never
# predicted instead of emitting an UndefinedMetricWarning
print("Accuracy Score: ", accuracy_score(y_test, y_pred))
print("Confusion Matrix: \n", confusion_matrix(y_test, y_pred))
print("Classification Report: \n", classification_report(y_test, y_pred, zero_division=0))
```

```
Accuracy Score:  0.28888888888888886
Confusion Matrix: 
 [[ 0 19  0]
 [ 0 13  0]
 [ 0 13  0]]
Classification Report: 
               precision    recall  f1-score   support

           0       0.00      0.00      0.00        19
           1       0.29      1.00      0.45        13
           2       0.00      0.00      0.00        13

    accuracy                           0.29        45
   macro avg       0.10      0.33      0.15        45
weighted avg       0.08      0.29      0.13        45
```
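`BernoulliNB` scores about 0.29 here, which is exactly the share of class 1 in the test set (13 of 45): the confusion matrix shows it predicts class 1 for every row. A likely cause is the default `binarize=0.0`: every iris measurement is greater than 0, so every sample becomes the same all-ones vector and the model has nothing to separate the classes with. The sketch below checks this and then tries one hypothetical fix, thresholding each feature at its own training-set median (an illustrative choice, not a recommendation from the original notebook).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# Same data and split as above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# With the default binarize=0.0, values > 0 become 1; every iris value is positive
print(np.unique((X_train > 0.0).astype(int), axis=0))  # [[1 1 1 1]]

# Consequence: the default model predicts a single class for every test row
baseline = BernoulliNB().fit(X_train, y_train).predict(X_test)
print(np.unique(baseline))

# Hypothetical fix: threshold each feature at its own training-set median
medians = np.median(X_train, axis=0)
Xb_train = (X_train > medians).astype(int)
Xb_test = (X_test > medians).astype(int)

# binarize=None tells BernoulliNB the inputs are already 0/1
bnb_median = BernoulliNB(binarize=None).fit(Xb_train, y_train)
print("Accuracy Score: ", accuracy_score(y_test, bnb_median.predict(Xb_test)))
```

Median thresholding still discards most of the information in continuous measurements, so for iris the Gaussian model remains the more natural choice; `BernoulliNB` is best suited to features that are genuinely binary.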
