## Theoretical Foundation: Bayes' Theorem


At the core of Naive Bayes lies Bayes’ Theorem, a fundamental concept in probability theory:

$$
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
$$
Where:

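- $P(A \mid B)$: **Posterior probability**, the probability of class $A$ given that $B$ was observed
- $P(B \mid A)$: **Likelihood**, the probability of observing $B$ given class $A$
- $P(A)$: **Prior probability**, the probability of class $A$ before any evidence is seen
- $P(B)$: **Evidence**, the overall probability of observing $B$ (it is the same for every class, so it only acts as a normalizing constant)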
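To make the four terms concrete, here is a small worked computation. The numbers are hypothetical and chosen only for illustration: assume 20% of emails are spam, and that the word "free" appears in 50% of spam emails and in 5% of non-spam emails. The evidence $P(B)$ is obtained with the law of total probability.

```python
# Hypothetical numbers, for illustration only
p_spam = 0.20             # prior P(A): fraction of emails that are spam
p_free_given_spam = 0.50  # likelihood P(B | A): "free" appears in a spam email
p_free_given_ham = 0.05   # P(B | not A): "free" appears in a non-spam email

# Evidence P(B) via the law of total probability
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

# Posterior P(A | B) via Bayes' theorem
p_spam_given_free = p_free_given_spam * p_spam / p_free

print(f"P(free)        = {p_free:.3f}")             # 0.140
print(f"P(spam | free) = {p_spam_given_free:.3f}")  # 0.714
```

Observing "free" raises the probability of spam from the 0.20 prior to about 0.71.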

### 🔹 Naive Bayes Classifier: Definition
A Naive Bayes classifier is a probabilistic supervised learning algorithm based on the assumption that features are conditionally independent given the class label.

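“Naive” refers to this simplifying assumption: once the class is known, every feature is treated as independent of every other feature. It is conditional independence given the class, not full mutual independence of the features.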
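Writing the class as $y$ (playing the role of $A$) and the features as $x_1, \dots, x_n$ (playing the role of $B$), the assumption lets the likelihood in Bayes' theorem factor into one term per feature. Because the evidence does not depend on $y$, it can be dropped when comparing classes, and in practice the product is evaluated as a sum of logarithms to avoid numerical underflow:

$$
P(y \mid x_1, \dots, x_n) = \frac{P(y)\prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)} \;\propto\; P(y)\prod_{i=1}^{n} P(x_i \mid y)
$$

$$
\hat{y} = \arg\max_{y} \Big[\log P(y) + \sum_{i=1}^{n} \log P(x_i \mid y)\Big]
$$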

### 🔹 When to Use Naive Bayes
| Scenario                                                  | Relevance                                                                                              |
| --------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ |
| High-dimensional data                                     | Each feature adds only a few parameters per class, so it scales well to many features                  |
| Real-time predictions                                     | Fast training and inference                                                                            |
| Text classification / spam filtering / sentiment analysis | Word counts fit the multinomial/Bernoulli variants; often works well even though words are not truly independent |
| Small training datasets                                   | Few parameters to estimate, so it can perform reasonably with limited data                             |

### 🔹 Types of Naive Bayes
| Variant                     | Description                                                                      |
| --------------------------- | -------------------------------------------------------------------------------- |
| **Gaussian Naive Bayes**    | Continuous features, assumed normally (Gaussian) distributed within each class   |
| **Multinomial Naive Bayes** | Discrete counts, e.g., word counts in document classification                    |
| **Bernoulli Naive Bayes**   | Binary/boolean features, e.g., whether a word is present or not                  |
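A minimal sketch contrasting the two discrete variants on a tiny corpus. The documents and labels are hypothetical, made up for illustration. It uses scikit-learn's `CountVectorizer` to turn text into word counts; `MultinomialNB` models those counts directly, while `BernoulliNB` with `binarize=0.0` reduces each count to a present/absent flag.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB

# Hypothetical toy corpus
train_docs = [
    "free money now",
    "win free prize",
    "meeting at noon",
    "project meeting notes",
]
train_labels = ["spam", "spam", "ham", "ham"]
test_docs = ["free prize now", "notes from meeting"]

# Convert each document into a vector of word counts
vectorizer = CountVectorizer()
X_train_counts = vectorizer.fit_transform(train_docs)
X_test_counts = vectorizer.transform(test_docs)  # words unseen in training ("from") are ignored

# Multinomial NB: models how often each word occurs in a class
multinomial = MultinomialNB(alpha=1.0).fit(X_train_counts, train_labels)

# Bernoulli NB: any count > 0 becomes 1, then models word presence/absence
bernoulli = BernoulliNB(alpha=1.0, binarize=0.0).fit(X_train_counts, train_labels)

multinomial_pred = multinomial.predict(X_test_counts).tolist()
bernoulli_pred = bernoulli.predict(X_test_counts).tolist()
print("Multinomial:", multinomial_pred)
print("Bernoulli:  ", bernoulli_pred)
```

On this toy data both variants label the first test document spam and the second ham. On real data they can disagree, because Bernoulli NB also takes into account which vocabulary words are absent from a document, while Multinomial NB looks only at the words that occur.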


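```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load dataset (150 iris flowers, 4 continuous features, 3 classes)
X, y = load_iris(return_X_y=True)

# Split data: 70% train, 30% test.
# No random_state is set, so the split (and the accuracy) varies between runs;
# pass random_state=<int> for reproducible results.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Instantiate model
model = GaussianNB()

# Train model: estimates a prior per class and a mean and variance per (class, feature)
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
```

Output from one run:

```
Accuracy: 0.9333333333333333
```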


### 🔹 Real-World Use Cases
| Domain          | Application                         |
| --------------- | ----------------------------------- |
| Email Filtering | Spam vs Non-spam                    |
| Banking         | Fraud detection                     |
| Healthcare      | Disease diagnosis based on symptoms |
| Marketing       | Customer churn prediction           |
| Social Media    | Sentiment analysis of reviews/posts |


### 🔹 Key Advantages
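- **Scalable:** training only needs per-class counts (or means and variances), gathered in a single pass over the data, so it handles large datasets efficiently
- **Fast:** prediction is a sum of per-feature log-probabilities for each class, so inference is computationally cheap
- **Performs well:** especially in text-based applications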
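The scalability point can be illustrated with scikit-learn's `partial_fit`, which lets the count-based variants learn from data in batches (useful, for example, when the data does not fit in memory). This sketch uses random, hypothetical count data and checks that streaming 100 rows at a time produces the same per-class counts, and therefore the same model, as a single `fit` call.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.default_rng(0)
# Hypothetical count data: 1,000 samples, 20 count features, 3 classes
X = rng.poisson(lam=2.0, size=(1000, 20))
y = rng.integers(0, 3, size=1000)

# Fit once on the full dataset
full = MultinomialNB().fit(X, y)

# Fit incrementally, 100 rows at a time; the full set of classes must be declared up front
streamed = MultinomialNB()
for start in range(0, len(X), 100):
    streamed.partial_fit(X[start:start + 100], y[start:start + 100], classes=np.arange(3))

print("Same feature counts:", np.array_equal(full.feature_count_, streamed.feature_count_))
print("Same log-probabilities:", np.allclose(full.feature_log_prob_, streamed.feature_log_prob_))
```

Because the model only accumulates sums, each batch can be discarded once it has been counted.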

### 🔹 Limitations
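- **Strong independence assumption:** features are rarely conditionally independent in real-world data, so the predicted probabilities can be poorly calibrated even when the predicted class is correct
- **Zero-probability problem:** if a feature value (e.g., a word) never occurs with a class in the training data, its estimated likelihood is 0, and the product of likelihoods for that class becomes 0 regardless of the other features. This is addressed by **Laplace (additive) smoothing**, which adds a pseudo-count $\alpha$ (commonly 1) to every count before normalizing.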
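For count features, write $N_{y,i}$ for the total count of feature $i$ in class $y$, $N_y = \sum_i N_{y,i}$, and $n$ for the number of features (notation introduced here). Laplace smoothing then estimates $\hat{P}(x_i \mid y) = \frac{N_{y,i} + \alpha}{N_y + \alpha n}$. The sketch below applies this by hand to hypothetical word counts and checks the result against `MultinomialNB`, whose `alpha` parameter performs the same additive smoothing.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word counts; columns are the words ["free", "meeting", "prize"]
X = np.array([
    [3, 0, 1],  # spam
    [2, 0, 2],  # spam
    [0, 2, 0],  # ham
    [0, 3, 0],  # ham
])
y = np.array([1, 1, 0, 0])  # 1 = spam, 0 = ham

# Per-class word counts N_{y,i}; row 0 = ham, row 1 = spam
counts = np.vstack([X[y == c].sum(axis=0) for c in (0, 1)])

# Unsmoothed estimate: "free" never occurs in ham, so P(free | ham) = 0
unsmoothed = counts / counts.sum(axis=1, keepdims=True)
print("Unsmoothed P(word | class):\n", unsmoothed)

# Laplace smoothing: add alpha to every count and alpha * n_features to each denominator
alpha = 1.0
smoothed = (counts + alpha) / (counts.sum(axis=1, keepdims=True) + alpha * X.shape[1])
print("Smoothed P(word | class):\n", smoothed)

# MultinomialNB stores log-probabilities; its classes_ are sorted, so row 0 is ham here too
model = MultinomialNB(alpha=alpha).fit(X, y)
print("Matches MultinomialNB:", np.allclose(np.exp(model.feature_log_prob_), smoothed))
```

After smoothing, every word has a nonzero probability in every class, so a single unseen word can no longer veto a class.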