# Naive Bayes Classifier
Welcome to this notebook on Naive Bayes classification. We cover the theory, a manual calculation example, and an implementation using `sklearn`. It is aimed at beginners.

## 📌 Bayes’ Theorem
Bayes’ Theorem is the foundation of the Naive Bayes Classifier:

$$ P(Y|X) = \frac{P(X|Y) \cdot P(Y)}{P(X)} $$

Where:
- $P(Y|X)$: Posterior
- $P(X|Y)$: Likelihood
- $P(Y)$: Prior
- $P(X)$: Evidence
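
The "naive" part of the classifier is the assumption that the features $X_1, \dots, X_n$ are conditionally independent given the class $Y$. Written out as a sketch under that assumption, the likelihood factorizes into one term per feature, and since the evidence $P(X)$ is the same for every class it can be dropped when comparing classes:

$$ P(Y|X_1, \dots, X_n) \propto P(Y) \prod_{i=1}^{n} P(X_i|Y) $$

The predicted class is the value of $Y$ that maximizes the right-hand side.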

## Example Dataset (Play Tennis)

In [3]:

import pandas as pd

# Small dataset
data = pd.DataFrame({
    'Outlook': ['Sunny', 'Sunny', 'Overcast', 'Rainy', 'Rainy'],
    'Temp': ['Hot', 'Mild', 'Hot', 'Mild', 'Cool'],
    'PlayTennis': ['No', 'No', 'Yes', 'Yes', 'Yes']
})

data


|   | Outlook  | Temp | PlayTennis |
|---|----------|------|------------|
| 0 | Sunny    | Hot  | No         |
| 1 | Sunny    | Mild | No         |
| 2 | Overcast | Hot  | Yes        |
| 3 | Rainy    | Mild | Yes        |
| 4 | Rainy    | Cool | Yes        |


## Manual Probability Calculation 
We will calculate the probability of PlayTennis = Yes/No for new input: **Outlook = Sunny**, **Temp = Cool**.

In [2]:

from collections import Counter

# Count priors
prior_counts = Counter(data['PlayTennis'])
total = len(data)
print("Prior Probabilities:")
for label in prior_counts:
    print(f"P({label}) = {prior_counts[label]}/{total} "
          f"= {prior_counts[label]/total:.2f}")

# Conditional probabilities with Laplace smoothing.
# num_values is the number of distinct values the feature can take
# (3 for Outlook, 3 for Temp), not the number of classes.
def laplace_smooth(count, total, num_values):
    return (count + 1) / (total + num_values)

# For PlayTennis = Yes
yes_data = data[data['PlayTennis'] == 'Yes']
P_Sunny_Yes = laplace_smooth(len(yes_data[yes_data['Outlook'] == 'Sunny']), len(yes_data), 3)
P_Cool_Yes = laplace_smooth(len(yes_data[yes_data['Temp'] == 'Cool']), len(yes_data), 3)
P_Yes = prior_counts['Yes'] / total

# For PlayTennis = No
no_data = data[data['PlayTennis'] == 'No']
P_Sunny_No = laplace_smooth(len(no_data[no_data['Outlook'] == 'Sunny']), len(no_data), 3)
P_Cool_No = laplace_smooth(len(no_data[no_data['Temp'] == 'Cool']), len(no_data), 3)
P_No = prior_counts['No'] / total

# Compute unnormalized posteriors: prior * likelihoods.
# The evidence P(X) is the same for both classes, so it is left out.
P_yes_given_X = P_Yes * P_Sunny_Yes * P_Cool_Yes
P_no_given_X = P_No * P_Sunny_No * P_Cool_No

print(f"Unnormalized posterior for Yes: {P_yes_given_X:.5f}")
print(f"Unnormalized posterior for No: {P_no_given_X:.5f}")

prediction = 'Yes' if P_yes_given_X > P_no_given_X else 'No'
print(f"🔮 Predicted: PlayTennis = {prediction}")


Prior Probabilities:
P(No) = 2/5 = 0.40
P(Yes) = 3/5 = 0.60
Unnormalized posterior for Yes: 0.03333
Unnormalized posterior for No: 0.04800
🔮 Predicted: PlayTennis = No
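
The products of prior and smoothed likelihoods in the manual calculation are only proportional to the posterior, because the evidence $P(X)$ is never divided out. The following minimal sketch shows one way to turn them into probabilities that sum to 1. It hard-codes the counts from the five-row Play Tennis table, and the helper `smooth` and the variable names are chosen here for illustration only.

```python
# Counts are read off the five-row Play Tennis table above.
# Laplace smoothing: (count + 1) / (class_total + number_of_feature_values)
def smooth(count, class_total, num_values=3):
    return (count + 1) / (class_total + num_values)

# Unnormalized scores: P(class) * P(Sunny | class) * P(Cool | class)
score_yes = (3 / 5) * smooth(0, 3) * smooth(1, 3)
score_no = (2 / 5) * smooth(2, 2) * smooth(0, 2)

# Dividing by the sum of the scores plays the role of the evidence P(X)
evidence = score_yes + score_no
posterior_yes = score_yes / evidence
posterior_no = score_no / evidence

print(f"P(Yes | Sunny, Cool) = {posterior_yes:.3f}")
print(f"P(No  | Sunny, Cool) = {posterior_no:.3f}")
```

Normalizing does not change which class wins; it only makes the two numbers readable as probabilities.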

## Using `sklearn.naive_bayes`
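Now, let's use scikit-learn to train a Naive Bayes classifier on a similar Play Tennis sample. Note that the cell below redefines `data` with four rows, so it is not identical to the dataset used in the manual calculation.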

In [13]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import CategoricalNB

# Sample data (as an example)
data = pd.DataFrame({
    'Outlook': ['Sunny', 'Overcast', 'Rain', 'Sunny'],
    'Temp': ['Hot', 'Mild', 'Cool', 'Cool'],
    'PlayTennis': ['No', 'Yes', 'Yes', 'No']
})

# Encode categorical data using separate encoders
df_encoded = data.copy()

encoders = {}
for col in ['Outlook', 'Temp', 'PlayTennis']:
    le = LabelEncoder()
    df_encoded[col] = le.fit_transform(df_encoded[col])
    encoders[col] = le  # store the encoder

# Features and target
X = df_encoded[['Outlook', 'Temp']]
y = df_encoded['PlayTennis']

# Train Naive Bayes model
model = CategoricalNB()
model.fit(X, y)

# New sample: Outlook = Sunny, Temp = Cool
sample = pd.DataFrame({
    'Outlook': [encoders['Outlook'].transform(['Sunny'])[0]],
    'Temp': [encoders['Temp'].transform(['Cool'])[0]]
})

# Predict
pred = model.predict(sample)
print("Predicted class (0=No, 1=Yes):", pred[0])


Predicted class (0=No, 1=Yes): 0
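
The integer output above has to be mapped back to a label by hand. As an alternative sketch, the block below encodes only the feature columns with `OrdinalEncoder` (scikit-learn documents `LabelEncoder` as intended for target values) and passes the string targets directly, so `predict` returns the label itself and `predict_proba` gives the class probabilities. The names `rows`, `targets`, `feature_encoder`, and `clf` are introduced here for illustration and are not part of the notebook.

```python
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Same four rows as the cell above, as [Outlook, Temp] pairs
rows = [['Sunny', 'Hot'], ['Overcast', 'Mild'], ['Rain', 'Cool'], ['Sunny', 'Cool']]
targets = ['No', 'Yes', 'Yes', 'No']

# OrdinalEncoder maps every feature column to integer codes in one step
feature_encoder = OrdinalEncoder(dtype=int)
X_codes = feature_encoder.fit_transform(rows)

# String targets can be passed as-is; the labels are stored in clf.classes_
clf = CategoricalNB()
clf.fit(X_codes, targets)

# New sample: Outlook = Sunny, Temp = Cool
new_codes = feature_encoder.transform([['Sunny', 'Cool']])
predicted_label = clf.predict(new_codes)[0]
probabilities = dict(zip(clf.classes_, clf.predict_proba(new_codes)[0]))

print("Predicted label:", predicted_label)
print("Class probabilities:", probabilities)
```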


## Variants of Naive Bayes Algorithms in `sklearn`
Scikit-learn provides different types of Naive Bayes classifiers based on the data:

- `GaussianNB` – for continuous features (assumes normal distribution)
- `MultinomialNB` – for discrete count features (like word counts)
- `BernoulliNB` – for binary features (0 or 1)
- `CategoricalNB` – for categorical features (added in scikit-learn 0.22)

Let’s see examples of each where applicable.

### GaussianNB

In [6]:
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

# Load the Iris dataset (continuous numerical features)
data = load_iris()
X, y = data.data, data.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

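# Train the Gaussian Naïve Bayes model
# `priors` fixes the class prior probabilities by hand (one per class, summing to 1);
# omit it to estimate the priors from the training labels
gnb = GaussianNB(priors=[0.3, 0.5, 0.2])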
gnb.fit(X_train, y_train)

# Make predictions
y_pred = gnb.predict(X_test)

# Print Accuracy
print("Gaussian Naïve Bayes Accuracy:", accuracy_score(y_test, y_pred))


Gaussian Naïve Bayes Accuracy: 1.0
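
As a rough illustration of what a Gaussian Naive Bayes model computes, the sketch below fits a mean and variance per class for a single feature and scores a new value with the log of the normal density plus the log prior. The toy measurements, the class names `short` and `long`, the equal priors, and the helper names are all made up for this example. scikit-learn also adds a small variance-smoothing term, which is left out here.

```python
import math
from statistics import fmean, pvariance

# Hypothetical one-feature training data, grouped by class
train = {
    'short': [1.4, 1.3, 1.5, 1.7],
    'long': [4.7, 4.5, 5.1, 4.9],
}
priors = {'short': 0.5, 'long': 0.5}  # assumed equal class priors

def gaussian_log_pdf(x, mean, var):
    # Log of the normal density with the given mean and variance
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def log_scores(x):
    # log P(class) + log P(x | class) for every class
    scores = {}
    for label, values in train.items():
        mean, var = fmean(values), pvariance(values)
        scores[label] = math.log(priors[label]) + gaussian_log_pdf(x, mean, var)
    return scores

def predict(x):
    scores = log_scores(x)
    return max(scores, key=scores.get)

print(predict(1.6), predict(4.0))
```

Working in log space avoids multiplying many small numbers together, which matters once there are more features than in this toy case.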


### MultinomialNB Example (Word Counts)

In [7]:

from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# Simple spam classification example
texts = ["Buy cheap now", "Limited offer", "Hi friend", "Let’s catch up", "Free discount"]
labels = [1, 1, 0, 0, 1]  # 1 = spam, 0 = ham

vec = CountVectorizer()
X = vec.fit_transform(texts)

model = MultinomialNB()
model.fit(X, labels)
print("Prediction for 'free offer':", model.predict(vec.transform(["free offer"]))[0])


Prediction for 'free offer': 1
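
To connect this result back to the manual calculation, here is a minimal pure-Python sketch of the multinomial model on the same five texts: word counts per class, additive smoothing with `alpha = 1`, and a comparison of log scores. The `tokenize`, `log_score`, and `predict` helpers are written for this illustration and only approximate what `CountVectorizer` and `MultinomialNB` do internally.

```python
import math
import re
from collections import Counter

texts = ["Buy cheap now", "Limited offer", "Hi friend", "Let’s catch up", "Free discount"]
labels = [1, 1, 0, 0, 1]  # 1 = spam, 0 = ham

def tokenize(text):
    # Lowercase and keep tokens of two or more word characters,
    # mirroring CountVectorizer's default token pattern
    return re.findall(r"\b\w\w+\b", text.lower())

# Word counts per class
word_counts = {0: Counter(), 1: Counter()}
for text, label in zip(texts, labels):
    word_counts[label].update(tokenize(text))

vocabulary = set(word_counts[0]) | set(word_counts[1])
class_counts = Counter(labels)

def log_score(text, label, alpha=1.0):
    total_words = sum(word_counts[label].values())
    score = math.log(class_counts[label] / len(labels))  # log prior
    for word in tokenize(text):
        if word not in vocabulary:
            continue  # words outside the fitted vocabulary are ignored
        smoothed = (word_counts[label][word] + alpha) / (total_words + alpha * len(vocabulary))
        score += math.log(smoothed)
    return score

def predict(text):
    return max((0, 1), key=lambda label: log_score(text, label))

print("Prediction for 'free offer':", predict("free offer"))
```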


In [8]:
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# Sample text data (emails labeled as spam=1 or not spam=0)
text_data = ["Buy cheap medicines online", "Congratulations! You won a lottery",
             "Meeting at 3 PM", "Schedule for next week", "Discounts on your favorite items"]
labels = [1, 1, 0, 0, 1]  # 1 = Spam, 0 = Not Spam

# Convert text into a bag-of-words representation.
# CountVectorizer is a scikit-learn class that converts text into numerical feature vectors.
# By default it lowercases the text, tokenizes it (splits it into words, dropping punctuation),
# and builds a vocabulary of all unique words in the dataset.
# It then counts how many times each word appears in each document.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(text_data)

# Train the Multinomial Naïve Bayes model
mnb = MultinomialNB()
mnb.fit(X, labels)

# Make predictions on a new text
new_text = ["Win a free iPhone now", "Meeting rescheduled to 5 PM"]
X_new = vectorizer.transform(new_text)
predictions = mnb.predict(X_new)

print("Predictions:", predictions)  # Output: [1, 0] (Spam, Not Spam)


Predictions: [1 0]


### BernoulliNB Example (Binary Features)

In [9]:

from sklearn.naive_bayes import BernoulliNB

# Binary features (e.g., presence/absence)
X = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
y = [1, 1, 0, 0]  # 1 = spam, 0 = ham

bnb = BernoulliNB()
bnb.fit(X, y)
print("Prediction for [1, 0, 0]:", bnb.predict([[1, 0, 0]])[0])


Prediction for [1, 0, 0]: 1


In [10]:
from sklearn.naive_bayes import BernoulliNB

# Sample binary data (e.g., whether a customer buys a product based on three features)
X = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1], [0, 0, 0]]
y = [1, 0, 1, 1, 0]  # 1 = Buys, 0 = Doesn't buy

# Train the Bernoulli Naïve Bayes model
bnb = BernoulliNB()
bnb.fit(X, y)

# Make predictions
new_data = [[1, 0, 0], [0, 1, 1]]
predictions = bnb.predict(new_data)

print("Predictions:", predictions)  # Output: [0 1] (first customer doesn't buy, second buys)


Predictions: [0 1]
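
The feature that sets the Bernoulli model apart is that an absent feature (a 0) also counts as evidence, contributing $1 - P(x_j = 1|Y)$ to the product. The pure-Python sketch below reworks the customer example above by hand, assuming additive smoothing with `alpha = 1`. The helper names `bernoulli_log_score` and `predict` are illustrative and not part of scikit-learn.

```python
import math

X = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1], [0, 0, 0]]
y = [1, 0, 1, 1, 0]  # 1 = Buys, 0 = Doesn't buy

def bernoulli_log_score(sample, label, alpha=1.0):
    rows = [row for row, target in zip(X, y) if target == label]
    score = math.log(len(rows) / len(X))  # log prior
    for j, value in enumerate(sample):
        ones = sum(row[j] for row in rows)
        # Smoothed P(feature j = 1 | class); the 2 is the number of outcomes (0 or 1)
        p = (ones + alpha) / (len(rows) + 2 * alpha)
        # An absent feature contributes (1 - p) instead of being skipped
        score += math.log(p if value == 1 else 1 - p)
    return score

def predict(sample):
    return max((0, 1), key=lambda label: bernoulli_log_score(sample, label))

print([predict(sample) for sample in [[1, 0, 0], [0, 1, 1]]])
```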


## Tuning Possibilities with Naive Bayes
Although Naive Bayes has few hyperparameters, you can still explore:

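- **`alpha`**: Additive (Laplace/Lidstone) smoothing parameter in `MultinomialNB` and `BernoulliNB`; `alpha=1.0` is Laplace smoothing
- **`fit_prior`**: Whether to learn class priors from the training data; if `False`, a uniform prior is used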

You can use **GridSearchCV** or **manual tuning**.

In [11]:

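from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB

# Use the word-count features from the MultinomialNB example;
# the name `X` now refers to the BernoulliNB data from the cells above.
# With only five documents the cross-validated score is very noisy.
X_text = vectorizer.transform(text_data)

params = {'alpha': [0.5, 1.0, 1.5], 'fit_prior': [True, False]}
grid = GridSearchCV(MultinomialNB(), param_grid=params, cv=3)
grid.fit(X_text, labels)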

print("Best parameters:", grid.best_params_)
print("Best score:", grid.best_score_)


Best parameters: {'alpha': 0.5, 'fit_prior': True}
Best score: 0.3333333333333333
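
To get a feel for what `alpha` changes, the short sketch below applies additive smoothing to a feature value that never occurs in a class. The function `smoothed_probability` and the counts are assumptions made for illustration: 0 occurrences among 3 training rows, with 3 possible feature values.

```python
def smoothed_probability(count, class_total, num_values, alpha):
    # Additive (Lidstone) smoothing; alpha = 1 is Laplace smoothing
    return (count + alpha) / (class_total + alpha * num_values)

# A feature value seen 0 times among 3 rows of a class, with 3 possible values
for alpha in [0.5, 1.0, 1.5]:
    p = smoothed_probability(count=0, class_total=3, num_values=3, alpha=alpha)
    print(f"alpha={alpha}: P(value | class) = {p:.3f}")
```

Larger values of `alpha` pull every estimate toward the uniform value of 1/3, so unseen values are penalized less and the observed counts matter less.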




## Summary
- Naive Bayes is based on Bayes’ Theorem.
- It works well with small datasets and is fast.
- Assumes features are conditionally independent.
- Used widely in spam filtering, sentiment analysis, and text classification.