# **Naive Bayes**




### **1️⃣ What is Naive Bayes?**

* A **probabilistic classifier** based on **Bayes’ Theorem**.
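* Assumes the **features are conditionally independent given the class** (the "naive" assumption), so $P(X|C)$ factorizes into a product of per-feature terms.
* Predicts class probabilities with Bayes' rule: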

$P(C|X) = \frac{P(X|C) \cdot P(C)}{P(X)}$


**Where:**

* **Prior:** $P(C)$ → probability of the class before seeing the features
* **Likelihood:** $P(X|C)$ → probability of the features given the class
* **Posterior:** $P(C|X)$ → probability of the class after seeing the features
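
To make the formula concrete, here is a small worked example in plain Python. The class names, the 30% spam prior, and the per-word likelihoods are invented purely for illustration; the point is only to show how the naive assumption turns $P(X|C)$ into a product of per-feature terms before normalizing by $P(X)$.

```python
# Hypothetical numbers, chosen only for illustration.
prior = {"spam": 0.3, "ham": 0.7}

# P(word present | class) for two words, treated as independent given the class.
likelihood = {
    "spam": {"free": 0.6, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.5},
}

observed = ["free", "meeting"]  # the features X seen in one message

# Unnormalized posterior: P(C) * product of P(x_i | C)
joint = {}
for c in prior:
    score = prior[c]
    for word in observed:
        score *= likelihood[c][word]
    joint[c] = score

evidence = sum(joint.values())  # P(X), the normalizing constant
posterior = {c: joint[c] / evidence for c in joint}
print(posterior)
```

With these made-up numbers the message is more likely ham than spam, even though "free" on its own points toward spam.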

---

### **2️⃣ Advantages**

* Simple, fast, and memory-efficient.
* Works well for large datasets and high-dimensional data.
* Handles categorical and discrete data naturally.

### **3️⃣ Limitations**

* Assumes feature independence, which is often unrealistic.
* Correlated features are counted as separate pieces of evidence, so predicted probabilities become overconfident and accuracy can suffer.
* Cannot capture interactions between features.
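
The effect of correlated features can be seen in a small experiment. The sketch below uses synthetic data (an assumption, not part of the original notes) and duplicates a single feature, which is the extreme case of correlation. The copy adds no information, but the model treats it as independent evidence.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Synthetic, balanced two-class data with one informative feature.
X_single = np.concatenate(
    [rng.normal(0.0, 1.0, 50), rng.normal(1.0, 1.0, 50)]
).reshape(-1, 1)
y = np.array([0] * 50 + [1] * 50)

# Duplicate the feature: the two columns are perfectly correlated.
X_dup = np.hstack([X_single, X_single])

p_single = GaussianNB().fit(X_single, y).predict_proba(X_single)
p_dup = GaussianNB().fit(X_dup, y).predict_proba(X_dup)

# Same information, but the duplicated column pushes probabilities toward 0 or 1.
print("mean confidence, one column :", p_single.max(axis=1).mean())
print("mean confidence, two columns:", p_dup.max(axis=1).mean())
```

The predicted class does not change here, only the confidence, which is why Naive Bayes probabilities are often treated as rankings rather than calibrated estimates.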

---

### **4️⃣ Types of Naive Bayes**

| Type          | Feature type       | Assumed distribution | Key parameters      | Example                                      |
| ------------- | ------------------ | -------------------- | ------------------- | -------------------------------------------- |
| GaussianNB    | Continuous numeric | Normal (per class)   | `var_smoothing`     | Disease prediction from numeric measurements |
| MultinomialNB | Count data         | Multinomial          | `alpha`             | Text classification, spam detection          |
| BernoulliNB   | Binary 0/1         | Bernoulli            | `alpha`, `binarize` | Spam detection from word presence/absence    |
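
The `alpha` parameter in the table is additive (Laplace) smoothing: it keeps a word that never appeared in a class from receiving probability zero. A minimal sketch on a toy count matrix (the data is hypothetical) reproduces the smoothed estimate by hand and compares it with what `MultinomialNB` stores in `feature_log_prob_`.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy word counts: rows are documents, columns are three vocabulary words.
X = np.array([[2, 1, 0],
              [1, 0, 0],
              [0, 0, 3]])
y = np.array([0, 0, 1])

alpha = 1.0
model = MultinomialNB(alpha=alpha).fit(X, y)

# Manual estimate per class:
# (count of word in class + alpha) / (total words in class + alpha * vocabulary size)
manual = []
for c in np.unique(y):
    counts = X[y == c].sum(axis=0)
    manual.append((counts + alpha) / (counts.sum() + alpha * X.shape[1]))
manual = np.array(manual)

print(manual)
print(np.exp(model.feature_log_prob_))
```

Word 2 never occurs in class 0, yet its smoothed probability is small but non-zero, so a single unseen word cannot veto a class.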

---

### **5️⃣ When to Use Naive Bayes**

* **Text classification**: spam detection, sentiment analysis
* **Medical diagnosis**: disease prediction
* **Recommendation systems**: lightweight predictions with categorical features
* **Real-time or large-scale applications**: fast and memory-efficient


### **a) Gaussian Naive Bayes**

* For continuous numeric features (e.g., the Iris dataset); each feature is modeled as normally distributed within each class.

In [1]:
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

# Load the Iris dataset (continuous numerical features)
data = load_iris()
X, y = data.data, data.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train the Gaussian Naïve Bayes model
gnb = GaussianNB()  # optionally fix the class priors, e.g. GaussianNB(priors=[0.3, 0.5, 0.2])
gnb.fit(X_train, y_train)

# Make predictions
y_pred = gnb.predict(X_test)

# Print Accuracy
print("Gaussian Naïve Bayes Accuracy:", accuracy_score(y_test, y_pred))


Gaussian Naïve Bayes Accuracy: 0.9777777777777777
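
The commented-out `priors` argument in the cell above deserves a short illustration. By default `GaussianNB` estimates the priors from class frequencies in the training data; passing `priors` overrides them. The values below are the illustrative ones from the comment, not a recommendation, and `gnb_default` / `gnb_custom` are names introduced here.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Default: priors are estimated from the training labels.
gnb_default = GaussianNB().fit(X_train, y_train)

# Explicit priors: one value per class, summing to 1 (illustrative values).
gnb_custom = GaussianNB(priors=[0.3, 0.5, 0.2]).fit(X_train, y_train)

print("estimated priors:", gnb_default.class_prior_)
print("supplied priors :", gnb_custom.class_prior_)

# Per-class posterior probabilities for the first three test samples
proba = gnb_custom.predict_proba(X_test[:3])
print(proba.round(3))
```

Fixing the priors is mainly useful when the training set's class balance is known to differ from the population the model will be used on.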



### **b) Multinomial Naive Bayes**

* For **count data** such as word frequencies, commonly used for **text classification**. (For purely categorical features, scikit-learn provides a separate `CategoricalNB`.)

In [2]:
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# Sample text data (emails labeled as spam=1 or not spam=0)
text_data = ["Buy cheap medicines online", "Congratulations! You won a lottery. Congratulations",
             "Meeting at 3 PM", "Schedule for next week", "Discounts on your favorite items"]
labels = [1, 1, 0, 0, 1]  # 1 = Spam, 0 = Not Spam

# Convert text into a bag-of-words representation
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(text_data)

# Train the Multinomial Naïve Bayes model
mnb = MultinomialNB()
mnb.fit(X, labels)

# Make predictions on a new text
new_text = ["Meeting rescheduled to 5 PM"]
X_new = vectorizer.transform(new_text)
predictions = mnb.predict(X_new)

print("Predictions:", predictions)  # 0 = Not Spam, 1 = Spam


Predictions: [0]


In [3]:
import pandas as pd

# Get feature (word) names
feature_names = vectorizer.get_feature_names_out()

# Convert to DataFrame for better readability
df = pd.DataFrame(X.toarray(), columns=feature_names)
df


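|   | at | buy | cheap | congratulations | discounts | favorite | for | items | lottery | medicines | meeting | next | on | online | pm | schedule | week | won | you | your |
| - | -- | --- | ----- | --------------- | --------- | -------- | --- | ----- | ------- | --------- | ------- | ---- | -- | ------ | -- | -------- | ---- | --- | --- | ---- |
| 0 | 0  | 1   | 1     | 0               | 0         | 0        | 0   | 0     | 0       | 1         | 0       | 0    | 0  | 1      | 0  | 0        | 0    | 0   | 0   | 0    |
| 1 | 0  | 0   | 0     | 2               | 0         | 0        | 0   | 0     | 1       | 0         | 0       | 0    | 0  | 0      | 0  | 0        | 0    | 1   | 1   | 0    |
| 2 | 1  | 0   | 0     | 0               | 0         | 0        | 0   | 0     | 0       | 0         | 1       | 0    | 0  | 0      | 1  | 0        | 0    | 0   | 0   | 0    |
| 3 | 0  | 0   | 0     | 0               | 0         | 0        | 1   | 0     | 0       | 0         | 0       | 1    | 0  | 0      | 0  | 1        | 1    | 0   | 0   | 0    |
| 4 | 0  | 0   | 0     | 0               | 1         | 1        | 0   | 1     | 0       | 0         | 0       | 0    | 1  | 0      | 0  | 0        | 0    | 0   | 0   | 1    |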


### **c) Bernoulli Naive Bayes**

* For **binary features** (0/1). Unlike MultinomialNB, the absence of a feature also contributes to the likelihood.

In [4]:
from sklearn.naive_bayes import BernoulliNB

# Sample binary data (e.g., whether a customer buys a product based on three features)
X = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1], [0, 0, 0]]
y = [1, 0, 1, 1, 0]  # 1 = Buys, 0 = Doesn't buy

# Train the Bernoulli Naïve Bayes model
bnb = BernoulliNB()
bnb.fit(X, y)

# Make predictions
new_data = [[1, 0, 0]]
predictions = bnb.predict(new_data)

print("Predictions:", predictions)  # 0 = Doesn't buy, 1 = Buys


Predictions: [0]
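
The prediction of 0 can be reproduced by hand. The sketch below is not scikit-learn's internal code; it is a direct NumPy computation that assumes the default smoothing `alpha=1.0` and priors estimated from class frequencies, applied to the same five training rows.

```python
import numpy as np

X = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 1, 1], [0, 0, 0]])
y = np.array([1, 0, 1, 1, 0])
x_new = np.array([1, 0, 0])
alpha = 1.0  # additive smoothing, assumed to match BernoulliNB's default

scores = {}
for c in (0, 1):
    Xc = X[y == c]
    prior = len(Xc) / len(X)
    # Smoothed P(feature = 1 | class); "+ 2 * alpha" covers the two outcomes 0 and 1
    p = (Xc.sum(axis=0) + alpha) / (len(Xc) + 2 * alpha)
    # Bernoulli likelihood: use p where the feature is present, 1 - p where absent
    likelihood = np.prod(np.where(x_new == 1, p, 1 - p))
    scores[c] = prior * likelihood

total = sum(scores.values())
posterior = {c: s / total for c, s in scores.items()}
print(posterior)
```

Class 0 wins mainly because the third feature is absent in the new sample, while it is present in every training row of class 1.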


# Assignment

[Dataset](https://drive.google.com/drive/folders/1bRuM2RJ3CGD5_PtnIr7jVbC5kOJW5o6x)