# **Naive Bayes Algorithm**

## **1. Introduction**

The **Naive Bayes algorithm** is a probabilistic classification algorithm based on **Bayes' Theorem**. It is called "Naive" because it assumes that the features are independent of each other given the class label, which is rarely true in real-world scenarios.

- **Key Idea**: Use probabilities to predict the class of a given input.
- **Applications**: Spam detection, sentiment analysis, document classification, and medical diagnosis.

<br><br>
![Application of naive bayes.png](../images/appli_of_naive_bayes.png)

---

## **2. Key Mathematical Concepts**

### **2.1 Bayes' Theorem**

$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
$$

- \(P(A|B)\): Posterior probability of event \(A\) (class) given event \(B\) (data).
- \(P(B|A)\): Likelihood, the probability of event \(B\) given event \(A\).
- \(P(A)\): Prior probability of event \(A\) (class).
- \(P(B)\): Marginal probability of event \(B\) (data), also called the evidence.
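
As a quick numeric check of the formula, the sketch below applies Bayes' Theorem to a made-up spam example. The three input probabilities are illustrative assumptions rather than measured values, and `bayes_posterior` is a hypothetical helper name.

```python
def bayes_posterior(likelihood, prior, evidence):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Assumed (illustrative) numbers for a spam filter
p_spam = 0.2               # P(A): prior probability that an email is spam
p_word_given_spam = 0.6    # P(B|A): the word "free" appears in a spam email
p_word_given_ham = 0.05    # P(B|not A): the word "free" appears in a non-spam email

# P(B) via the law of total probability
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

p_spam_given_word = bayes_posterior(p_word_given_spam, p_spam, p_word)
print(f"P(spam | 'free') = {p_spam_given_word:.3f}")
```

Under these assumed numbers, seeing the word raises the probability of spam from the 20% prior to 75%.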

### **2.2 Naive Bayes Assumption**

For a dataset with features \(X = (x_1, x_2, \dots, x_n)\), the algorithm assumes that features are conditionally independent given the class:

$$
P(X|C) = P(x_1|C) \cdot P(x_2|C) \cdot \dots \cdot P(x_n|C)
$$

Thus, the **posterior probability** is:

$$
P(C|X) = \frac{P(C) \cdot \prod_{i=1}^{n} P(x_i|C)}{P(X)}
$$

Where:
- \(C\): Class label.
- \(P(C)\): Prior probability of the class.
- \(P(x_i|C)\): Conditional probability of feature \(x_i\) given class \(C\).
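
Because \(P(X)\) is the same for every class, a classifier only needs to compare the numerators, and this is usually done in log space to avoid numerical underflow when many small probabilities are multiplied. The following is a minimal sketch of that idea; the priors and per-feature likelihoods are assumed toy values, and `log_joint` is a hypothetical helper name.

```python
import math

# Assumed toy model: two classes and three already-observed feature values
priors = {"spam": 0.4, "ham": 0.6}        # P(C)
likelihoods = {                           # P(x_i | C) for the observed x_i
    "spam": [0.8, 0.5, 0.1],
    "ham": [0.2, 0.4, 0.3],
}

def log_joint(cls):
    """log P(C) + sum_i log P(x_i | C), the log of the posterior's numerator."""
    return math.log(priors[cls]) + sum(math.log(p) for p in likelihoods[cls])

scores = {c: log_joint(c) for c in priors}

# Normalise to obtain P(C|X); the denominator P(X) is the sum of the numerators
evidence = sum(math.exp(s) for s in scores.values())
posteriors = {c: math.exp(s) / evidence for c, s in scores.items()}

# The predicted class is the one with the largest (log) numerator
prediction = max(scores, key=scores.get)
print(prediction, posteriors)
```

The prediction depends only on `scores`; the normalisation step is needed only when the actual posterior values are of interest.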

---

## **3. Types of Naive Bayes Classifiers**

### **3.1 Gaussian Naive Bayes**

Used for **`continuous data`**, assuming features follow a normal (Gaussian) distribution:

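$$
P(x_i|C) = \frac{1}{\sqrt{2\pi\sigma_{i,C}^2}} \exp\left(-\frac{(x_i - \mu_{i,C})^2}{2\sigma_{i,C}^2}\right)
$$

- \(\mu_{i,C}\): Mean of feature \(x_i\) over the training samples of class \(C\).
- \(\sigma_{i,C}^2\): Variance of feature \(x_i\) over the training samples of class \(C\).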
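
The Gaussian likelihood can be sketched in a few lines of plain Python. The sample values below are assumed for illustration, `gaussian_likelihood` is a hypothetical helper name, and the mean and variance are estimated from the samples of a single class.

```python
import math

def gaussian_likelihood(x, mean, var):
    """Density of a normal distribution N(mean, var) evaluated at x."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * var)
    return coeff * math.exp(-((x - mean) ** 2) / (2.0 * var))

# Assumed values of one feature for the training samples of one class
values = [1.4, 1.3, 1.5, 1.7, 1.4]
mean = sum(values) / len(values)
var = sum((v - mean) ** 2 for v in values) / len(values)  # maximum-likelihood variance

print(f"mean={mean:.3f}, var={var:.4f}")
print(f"Likelihood at x=1.5: {gaussian_likelihood(1.5, mean, var):.3f}")
print(f"Likelihood at x=3.0: {gaussian_likelihood(3.0, mean, var):.3e}")
```

Note that the result is a probability density, not a probability, so it can exceed 1 when the variance is small. Only the relative sizes across classes matter for classification.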

### **3.2 Multinomial Naive Bayes**

Used for **`discrete data`**, especially for text classification and document categorization (e.g., bag-of-words).

$$
P(x_i|C) = \frac{\text{Count}(x_i, C) + \alpha}{\text{Total Words in } C + \alpha \cdot |V|}
$$

- \(\alpha\): Smoothing parameter (\(\alpha = 1\) gives Laplace smoothing).
- \(|V|\): Vocabulary size.
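
To make the estimate concrete, the sketch below computes the smoothed word probabilities for one class from a tiny, assumed corpus. `smoothed_word_probs` is a hypothetical helper name, and the documents are invented for illustration.

```python
from collections import Counter

def smoothed_word_probs(docs, vocabulary, alpha=1.0):
    """Estimate P(word | C) from the tokenised documents of a single class."""
    counts = Counter(word for doc in docs for word in doc)
    total = sum(counts[w] for w in vocabulary)       # total words in the class
    denom = total + alpha * len(vocabulary)          # add alpha once per vocabulary word
    return {w: (counts[w] + alpha) / denom for w in vocabulary}

# Assumed toy corpus for the "positive" class
positive_docs = [["love", "python"], ["python", "is", "great"]]
vocabulary = ["love", "python", "is", "great", "bugs"]

probs = smoothed_word_probs(positive_docs, vocabulary, alpha=1.0)
for word, p in probs.items():
    print(f"P({word} | positive) = {p:.2f}")
```

The word "bugs" never occurs in this class, yet it still receives a small non-zero probability, and the probabilities over the vocabulary sum to 1.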

### **3.3 Bernoulli Naive Bayes**

Used for **`binary data`**. It models the presence or absence of features:

$$
P(x_i|C) = P(x_i = 1|C)^x_i \cdot P(x_i = 0|C)^{1-x_i}
$$
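
A minimal sketch of this likelihood for a whole binary feature vector follows. The per-feature probabilities are assumed values and `bernoulli_likelihood` is a hypothetical helper name.

```python
def bernoulli_likelihood(x, p):
    """P(x | C) for a binary vector x, where p[i] = P(x_i = 1 | C)."""
    result = 1.0
    for x_i, p_i in zip(x, p):
        # A present feature contributes p_i, an absent one contributes (1 - p_i)
        result *= (p_i ** x_i) * ((1.0 - p_i) ** (1 - x_i))
    return result

# Assumed values of P(x_i = 1 | C) for three features of one class
p_given_class = [0.9, 0.2, 0.5]
x = [1, 0, 1]

print(bernoulli_likelihood(x, p_given_class))
```

Unlike the multinomial model, an absent feature (\(x_i = 0\)) still contributes a factor to the product.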

---

## **4. Implementation**

### **4.1 Gaussian Naive Bayes Example**

Dataset: Iris Classification

```python
# Import Libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, classification_report

# Load Dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split into Training and Test Sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize Gaussian Naive Bayes Classifier
gnb = GaussianNB()

# Train the Model
gnb.fit(X_train, y_train)

# Predict and Evaluate
y_pred = gnb.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Gaussian Naive Bayes Accuracy: {accuracy:.2f}")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
```

---

### **4.2 Multinomial Naive Bayes Example**

Dataset: Text Classification

```python
# Import Libraries
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Sample Text Data
texts = ["I love programming", "Python is great", "I hate bugs", "Debugging is fun", "I enjoy learning"]
labels = [1, 1, 0, 0, 1]  # 1: Positive, 0: Negative

# Convert Text to Feature Vectors
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Split into Training and Test Sets
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.3, random_state=42)

# Initialize Multinomial Naive Bayes Classifier
mnb = MultinomialNB()

# Train the Model
mnb.fit(X_train, y_train)

# Predict and Evaluate
y_pred = mnb.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"Multinomial Naive Bayes Accuracy: {accuracy:.2f}")
```

---

### **4.3 Bernoulli Naive Bayes Example**

Dataset: Binary Features

```python
# Import Libraries
from sklearn.naive_bayes import BernoulliNB
import numpy as np

# Binary Feature Dataset (random values, seeded so the run is reproducible)
np.random.seed(42)
X = np.random.randint(2, size=(10, 5))
y = np.random.randint(2, size=10)

# Initialize Bernoulli Naive Bayes Classifier
bnb = BernoulliNB()

# Train the Model
bnb.fit(X, y)

# Predict on the training data (illustration only, not an evaluation)
y_pred = bnb.predict(X)

print("Predictions:", y_pred)
```

---

## **5. Advantages and Disadvantages**

### **Advantages**

- **Fast and Efficient**: Training needs only a single pass over the data to collect counts or means and variances, so it scales well to large datasets.

- **Simple Implementation**: Easy to understand and use.

- **Effective for Text Data**: Performs well in document classification tasks.

- **Probabilistic Output**: Provides class probabilities for better interpretability.
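
As an illustration of the probabilistic output, the sketch below calls `predict_proba` on a fitted `GaussianNB` model. It assumes scikit-learn is installed, as in the examples in Section 4, and fits on the full Iris data purely for demonstration.

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
gnb = GaussianNB().fit(iris.data, iris.target)

# One row per sample, one column per class; each row sums to 1
proba = gnb.predict_proba(iris.data[:3])
for row in proba:
    print({str(name): round(float(p), 3) for name, p in zip(iris.target_names, row)})
```

These values are useful for ranking classes, but Naive Bayes probabilities are often poorly calibrated, so they are best treated as scores rather than exact confidence levels.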

### **Disadvantages**

- **Feature Independence Assumption**: Real-world data often violates this assumption.

- **Sensitive to Zero Probabilities**: Requires smoothing techniques to handle zero probabilities.

- **Not Suitable for Complex Relationships**: Cannot model interactions between features, so accuracy may degrade when features are heavily correlated.

---

## **6. Extensions**

### **Laplace Smoothing**

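To avoid zero probabilities in Multinomial Naive Bayes when a feature never occurs with a class in the training data, add a constant \(\alpha\) to every count: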

$$
P(x_i|C) = \frac{\text{Count}(x_i, C) + \alpha}{\text{Total Count in Class } C + \alpha \cdot |V|}
$$
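
The effect of smoothing is easiest to see on a document that contains a word the class has never seen. The sketch below uses assumed counts for a single class and a hypothetical helper `word_prob`, and compares no smoothing (\(\alpha = 0\)) with \(\alpha = 1\).

```python
def word_prob(count, total, vocab_size, alpha):
    """Smoothed estimate (count + alpha) / (total + alpha * |V|)."""
    return (count + alpha) / (total + alpha * vocab_size)

# Assumed counts for one class: "bugs" never appeared in its training documents
counts = {"love": 3, "python": 5, "bugs": 0}
total = sum(counts.values())
vocab_size = len(counts)

document = ["love", "python", "bugs"]

for alpha in (0.0, 1.0):
    likelihood = 1.0
    for word in document:
        likelihood *= word_prob(counts[word], total, vocab_size, alpha)
    print(f"alpha={alpha}: P(document | C) = {likelihood:.5f}")
```

Without smoothing, the single unseen word forces the whole product to zero regardless of the other words; with \(\alpha = 1\) the class keeps a small but non-zero likelihood.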

### **Hybrid Models**

Combine Naive Bayes with other algorithms (e.g., Naive Bayes + SVM), which can improve performance on some tasks.

---

## **7. Conclusion**

- Naive Bayes is a simple yet effective algorithm for probabilistic classification.

- It works well for high-dimensional data, especially text-based datasets.

- While it has limitations due to its independence assumption, it remains a popular choice for fast and scalable applications.

---