# 🌟 Naïve Bayes Algorithm in Machine Learning

## 🔹 What is Naïve Bayes?  
Naïve Bayes is a **probabilistic machine learning algorithm** based on **Bayes’ Theorem**.  
It is mainly used for **classification tasks** like spam detection, sentiment analysis, and document categorization.  

It’s called **“Naïve”** because it assumes that **all features are conditionally independent of each other given the class**. This is rarely true in real data, yet the classifier still works surprisingly well in practice.

---

## 🔹 Bayes’ Theorem
The algorithm is based on Bayes’ rule of probability:

\[
P(Y|X) = \frac{P(X|Y) \cdot P(Y)}{P(X)}
\]

Where:  
- **P(Y|X)** → Posterior probability (probability of class Y given the observed features X)  
- **P(X|Y)** → Likelihood (probability of observing features X given class Y)  
- **P(Y)** → Prior probability (probability of class Y before seeing any features)  
- **P(X)** → Evidence (probability of observing features X)  

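👉 In classification, we choose the class with the **highest posterior probability**. Since P(X) is the same for every class, this is equivalent to picking the class that maximizes P(X|Y) · P(Y).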
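To make the formula concrete, here is a minimal sketch with made-up numbers (the priors and likelihoods below are hypothetical, not estimated from any dataset). The single feature X is "the email contains the word *free*", and there are two classes.

```python
# Hypothetical priors P(Y) and likelihoods P(X | Y) for X = "email contains 'free'"
prior = {"spam": 0.4, "not_spam": 0.6}
likelihood = {"spam": 0.5, "not_spam": 0.05}

# Evidence P(X) = sum over classes of P(X | Y) * P(Y); it normalizes the posteriors
evidence = sum(likelihood[c] * prior[c] for c in prior)

# Posterior P(Y | X) = P(X | Y) * P(Y) / P(X)
posterior = {c: likelihood[c] * prior[c] / evidence for c in prior}

# Choose the class with the highest posterior
predicted = max(posterior, key=posterior.get)
print(posterior)
print("Predicted class:", predicted)
```

Here P(X) = 0.5 · 0.4 + 0.05 · 0.6 = 0.23, so P(spam | free) = 0.20 / 0.23 ≈ 0.87 and the email is classified as spam.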

---

## 🔹 Why “Naïve”?  
Because it **assumes that features are conditionally independent given the class**.  
Example: In spam detection, words like *“buy”* and *“cheap”* tend to appear together in spam, so they are not independent, but Naïve Bayes treats them as if they were.
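Under this assumption, the likelihood of a whole email factorizes into a product of per-word likelihoods:

\[
P(x_1, \dots, x_n \mid Y) = \prod_{i=1}^{n} P(x_i \mid Y)
\]

The sketch below scores a hypothetical email with invented per-word probabilities. It sums logarithms instead of multiplying raw probabilities, a common trick to avoid floating-point underflow when an email has many words; `log_score` is a helper written only for this illustration.

```python
import math

# Hypothetical class priors and per-word likelihoods P(word | class)
priors = {"spam": 0.4, "not_spam": 0.6}
word_probs = {
    "spam": {"buy": 0.08, "cheap": 0.06, "meeting": 0.001},
    "not_spam": {"buy": 0.01, "cheap": 0.005, "meeting": 0.03},
}

def log_score(words, cls):
    """log P(Y) + sum_i log P(x_i | Y): the naive Bayes score, up to the constant log P(X)."""
    return math.log(priors[cls]) + sum(math.log(word_probs[cls][w]) for w in words)

email = ["buy", "cheap"]
scores = {cls: log_score(email, cls) for cls in priors}
print(scores)
print("Predicted class:", max(scores, key=scores.get))
```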

---

## 🔹 Types of Naïve Bayes Classifiers
1. **Gaussian Naïve Bayes** – for continuous features, assumes each feature follows a normal distribution within each class.  
2. **Multinomial Naïve Bayes** – for discrete features (e.g., word counts in text classification).  
3. **Bernoulli Naïve Bayes** – for binary/boolean features (e.g., word present or not).
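A minimal sketch of how each variant is matched to its feature type in scikit-learn. The tiny arrays are invented purely for illustration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 0, 1, 1])

# Continuous measurements -> GaussianNB (one normal distribution per feature per class)
X_cont = np.array([[1.0, 2.1], [1.2, 1.9], [5.0, 6.2], [5.3, 5.8]])
# Non-negative counts (e.g. word counts) -> MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
# Binary present/absent flags -> BernoulliNB
X_binary = np.array([[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]])

gnb = GaussianNB().fit(X_cont, y)
mnb = MultinomialNB().fit(X_counts, y)
bnb = BernoulliNB().fit(X_binary, y)

print(gnb.predict([[1.1, 2.0], [5.1, 6.0]]))
print(mnb.predict([[4, 0, 0], [0, 5, 1]]))
print(bnb.predict([[1, 0, 0], [0, 1, 1]]))
```

All three expose the same `fit` / `predict` interface; what differs is the per-feature distribution each one estimates.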

---

## 🔹 Example (Spam Detection)
Suppose we want to classify an email as **Spam** or **Not Spam**.  

- Features: words like “buy”, “free”, “discount”.  
- Naïve Bayes calculates:  
  \[
  P(Spam|words) \quad \text{and} \quad P(NotSpam|words)
  \]  
- The class with higher probability is chosen.
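A minimal end-to-end sketch of this example with scikit-learn, using `CountVectorizer` to turn text into word counts and `MultinomialNB` to classify them. The six training emails and their labels are hypothetical toy data, far too small for a real filter.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy training emails and labels
emails = [
    "buy cheap watches now",
    "free discount buy now",
    "get a free discount today",
    "meeting moved to monday",
    "lunch with the team today",
    "project report attached for review",
]
labels = ["spam", "spam", "spam", "not_spam", "not_spam", "not_spam"]

vectorizer = CountVectorizer()          # turns each email into a vector of word counts
X = vectorizer.fit_transform(emails)

clf = MultinomialNB(alpha=1.0)          # alpha=1.0 is Laplace smoothing
clf.fit(X, labels)

new_emails = ["free discount on watches", "team meeting on monday"]
X_new = vectorizer.transform(new_emails)  # words unseen in training (e.g. "on") are ignored
print(clf.predict(X_new))
print(clf.predict_proba(X_new).round(3))  # columns follow clf.classes_
```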

---

## 🔹 Advantages
✅ Simple and fast to train  
✅ Works well with high-dimensional data (e.g., text classification)  
✅ Often performs well even when the independence assumption is violated  
✅ Needs relatively little training data  

---

## 🔹 Disadvantages
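❌ Assumes conditional independence of features (rarely true in practice)  
❌ Struggles with highly correlated features: they are effectively counted more than once, which can make probability estimates overconfident  
❌ If a word never appears with a class in the training data, its estimated likelihood for that class is 0, which zeroes out the whole product (fixed with **Laplace Smoothing**)  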
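Additive smoothing adds a pseudo-count α to every word before normalizing, where |V| is the vocabulary size; α = 1 is Laplace smoothing:

\[
\hat{P}(w \mid c) = \frac{\text{count}(w, c) + \alpha}{\sum_{w'} \text{count}(w', c) + \alpha\,|V|}
\]

A minimal pure-Python sketch with invented counts; `word_likelihood` is a hypothetical helper written for this illustration.

```python
from collections import Counter

# Hypothetical word counts observed in "not_spam" training emails.
# "free" is in the vocabulary but never occurred in this class.
not_spam_counts = Counter({"meeting": 5, "report": 3, "monday": 2})
vocabulary = ["meeting", "report", "monday", "free"]

def word_likelihood(word, counts, vocab, alpha=1.0):
    """Estimate P(word | class) with additive smoothing (Laplace smoothing when alpha=1)."""
    total = sum(counts[w] for w in vocab)  # total word count for the class
    return (counts[word] + alpha) / (total + alpha * len(vocab))

unsmoothed = word_likelihood("free", not_spam_counts, vocabulary, alpha=0.0)
smoothed = word_likelihood("free", not_spam_counts, vocabulary, alpha=1.0)
print("Unsmoothed P(free | not_spam):", unsmoothed)  # 0 / 10
print("Laplace P(free | not_spam):", smoothed)       # 1 / (10 + 4)
print("Smoothed probabilities sum to:",
      sum(word_likelihood(w, not_spam_counts, vocabulary) for w in vocabulary))
```

In scikit-learn, the same pseudo-count is the `alpha` parameter of `MultinomialNB` and `BernoulliNB` (default `1.0`).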

---

## 🔹 Applications
- 📧 Spam filtering  
- 😀 Sentiment analysis (positive/negative reviews)  
- 📄 Document categorization  
- 🏥 Medical diagnosis  

---


```python
# Import libraries
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
```


```python
# Load the dataset (example: Iris)
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # Features: 4 measurements per flower
y = iris.target  # Target: class labels 0, 1, 2

print("Shape of Features:", X.shape)
print("Shape of Target:", y.shape)
```

```
Shape of Features: (150, 4)
Shape of Target: (150,)
```


```python
# Split the dataset: 70% training, 30% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
```

```python
# Train the Naïve Bayes model
# GaussianNB suits continuous features such as the Iris measurements
model = GaussianNB()
model.fit(X_train, y_train)
```

```python
# Make predictions
y_pred = model.predict(X_test)

print("Predictions:", y_pred[:10])  # first 10 predictions
```

```
Predictions: [1 0 2 1 1 0 1 2 1 1]
```


```python
# Evaluate the model
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred)

print("✅ Accuracy:", acc)
print("\n📊 Confusion Matrix:\n", cm)
print("\n📄 Classification Report:\n", report)
```

```
✅ Accuracy: 0.9777777777777777

📊 Confusion Matrix:
 [[19  0  0]
 [ 0 12  1]
 [ 0  0 13]]

📄 Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        19
           1       1.00      0.92      0.96        13
           2       0.93      1.00      0.96        13

    accuracy                           0.98        45
   macro avg       0.98      0.97      0.97        45
weighted avg       0.98      0.98      0.98        45
```
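The 97.8% accuracy above comes from a single 70/30 split of only 150 samples, so it may shift with a different `random_state`. A minimal sketch of a less split-dependent estimate using stratified 5-fold cross-validation (the fold count and seed are arbitrary choices for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# 5 stratified folds: every sample is used for testing exactly once
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(GaussianNB(), X, y, cv=cv, scoring="accuracy")

print("Fold accuracies:", scores.round(3))
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```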



## ✅ Conclusion

Naïve Bayes is a **simple yet powerful classification algorithm** based on probability and Bayes’ Theorem.  
Despite its **naïve assumption that features are conditionally independent given the class**, it performs surprisingly well in many real-world tasks such as **spam filtering, sentiment analysis, and text classification**.  

- It is **fast, efficient, and works well with small datasets**.  
- It is especially effective in **high-dimensional data** like Natural Language Processing (NLP).  
- However, it struggles when **features are strongly correlated** or when data distribution does not match the model assumptions.  

👉 In summary, Naïve Bayes is an excellent **baseline model** for classification problems and often provides good results with minimal training data and computation.
