## NAIVE BAYES CLASSIFICATION

Naïve Bayes is a probabilistic machine learning algorithm based on Bayes' Theorem, used primarily for classification tasks. It is called "naïve" because it assumes that features are conditionally independent of each other given the class, which is rarely true in real-world data but greatly simplifies computation.

![image.png](attachment:image.png)
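
Written out, with $y$ for the class and $x_1, \dots, x_n$ for the features (notation assumed here for illustration), Bayes' Theorem gives:

$$
P(y \mid x_1, \dots, x_n) = \frac{P(y)\, P(x_1, \dots, x_n \mid y)}{P(x_1, \dots, x_n)}
$$

Under the naïve independence assumption the likelihood factorizes into per-feature terms. Since the denominator is the same for every class, the predicted class is:

$$
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
$$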

### TYPES OF NAIVE BAYES CLASSIFIERS

#### Gaussian Naive Bayes (GNB)

* Assumes features follow a normal (Gaussian) distribution.

* Used when features are continuous.

* Example: Spam classification based on word frequencies expressed as continuous proportions rather than raw counts.
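
As a rough illustration of the Gaussian variant, the sketch below fits scikit-learn's `GaussianNB` on a tiny made-up dataset with two continuous features. The data values and variable names are assumptions chosen for the example, not part of any real dataset.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical toy data: two continuous features per sample
X_cont = np.array([
    [1.0, 2.1], [1.2, 1.9], [0.9, 2.0],   # class 0
    [3.0, 4.2], [3.1, 3.9], [2.9, 4.1],   # class 1
])
y_cont = np.array([0, 0, 0, 1, 1, 1])

gnb = GaussianNB()
gnb.fit(X_cont, y_cont)

# theta_ holds the per-class mean of each feature (n_classes x n_features)
print(gnb.theta_)

# Classify two new points, one near each cluster
gnb_pred = gnb.predict(np.array([[1.1, 2.0], [3.0, 4.0]]))
print(gnb_pred)
```

The model only stores a mean and a variance per feature and class, which is why training is so cheap.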

#### Multinomial Naive Bayes (MNB)

* Used for discrete features such as word counts in text classification.

* Example: Document classification (e.g., news articles, spam detection).

#### Bernoulli Naive Bayes (BNB)

* Used for binary features (0/1).

* Example: Sentiment analysis, where words are either present or absent.
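
A minimal sketch of the Bernoulli variant, assuming a hypothetical three-word vocabulary where each column records whether a word is present (1) or absent (0) in a review. The vocabulary and labels are invented for illustration.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical vocabulary; columns = ["great", "terrible", "refund"]
X_bin = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
])
y_bin = np.array([1, 1, 0, 0])  # 1 = positive, 0 = negative

bnb = BernoulliNB()  # alpha=1.0 smoothing by default
bnb.fit(X_bin, y_bin)

# A review containing only "great", and one containing "terrible" and "refund"
bnb_pred = bnb.predict(np.array([[1, 0, 0], [0, 1, 1]]))
print(bnb_pred)
```

Unlike the multinomial variant, `BernoulliNB` also penalizes a class for words that are absent from the input, which can matter for short texts.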

### When to Use Naïve Bayes?

* Fast classification: It is computationally efficient, even for large datasets.

* Text classification: Works well for spam filtering, sentiment analysis, and document categorization.

* Multiclass classification: Can handle multiple class labels efficiently.

* Real-time applications: Due to its simplicity and speed, it is used in real-time decision-making systems.

* When feature independence is reasonable: Performs best when features are not highly correlated.

### When NOT to Use Naïve Bayes?

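* When features are highly dependent on one another (e.g., pixels in complex images, or other tasks usually handled with deep learning), because the independence assumption is then badly violated.

* When the dataset is small and some classes are under-represented, so class priors and per-class feature probabilities cannot be estimated reliably.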
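
To see why strongly dependent features are a problem, the sketch below compares a `GaussianNB` model trained on one feature with a model trained on three identical copies of that feature. The synthetic data, the random seed, and the query point are assumptions made for the demonstration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# One informative feature; the two classes overlap substantially
x_class0 = rng.normal(0.0, 1.0, size=200)
x_class1 = rng.normal(1.0, 1.0, size=200)
x = np.concatenate([x_class0, x_class1]).reshape(-1, 1)
y_dep = np.array([0] * 200 + [1] * 200)

# Perfectly correlated features: the same column repeated three times
X_single = x
X_dup = np.hstack([x, x, x])

point = np.array([[2.0]])
p_single = GaussianNB().fit(X_single, y_dep).predict_proba(point)[0, 1]
p_dup = GaussianNB().fit(X_dup, y_dep).predict_proba(np.hstack([point] * 3))[0, 1]

print(f"P(class 1) with one feature:      {p_single:.3f}")
print(f"P(class 1) with three duplicates: {p_dup:.3f}")
```

The duplicated columns carry no new information, yet the model counts the same evidence three times, so the predicted probability becomes more extreme than the data justifies.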


### Applications of Naïve Bayes

* Spam Filtering: Classify emails as spam or not.

* Sentiment Analysis: Analyze text sentiment (positive/negative).

* Medical Diagnosis: Identify diseases based on symptoms.

* Credit Risk Prediction: Categorize loan applicants as low/high risk.

* Recommendation Systems: Suggest products based on user preferences.

* Fraud Detection: Identify fraudulent transactions.

### Key Parameters in Naïve Bayes

1. Alpha (α) - Laplace Smoothing

* Used in MultinomialNB and BernoulliNB to avoid zero probabilities.

* Default is α = 1, which adds 1 to every feature count (e.g., word count) within each class before the probabilities are estimated.
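
The effect of smoothing can be checked by hand. The sketch below assumes a hypothetical three-word vocabulary in which the word "meeting" never appears in a spam email, and compares the smoothed estimate stored by `MultinomialNB` with the formula (count + α) / (total count + α × vocabulary size).

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word counts; columns = ["lottery", "meeting", "free"]
X_counts = np.array([
    [2, 0, 1],   # spam
    [1, 0, 0],   # spam
    [0, 3, 0],   # not spam
])
y_counts = np.array([1, 1, 0])  # 1 = spam, 0 = not spam

alpha = 1.0
smoothed = MultinomialNB(alpha=alpha).fit(X_counts, y_counts)

# classes_ is sorted as [0, 1], so row 1 is the spam class; column 1 is "meeting"
p_meeting_given_spam = np.exp(smoothed.feature_log_prob_[1, 1])

# Manual calculation: "meeting" occurs 0 times among 4 spam words, vocabulary size 3
p_manual = (0 + alpha) / (4 + alpha * 3)

print(p_meeting_given_spam, p_manual)
```

Without smoothing the estimate would be exactly zero, and any email containing "meeting" could never be classified as spam regardless of its other words.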

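2. Variance Smoothing (var_smoothing)

* Used in GaussianNB to keep the computation stable when a feature's variance is very small or zero.

* Adds a small fraction (default 1e-9) of the largest feature variance to every variance, which prevents division by zero errors.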

3. Prior Probabilities (class_prior)

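* If None, the model estimates class priors from the class frequencies in the training data.

* If set manually, the supplied priors replace the estimated ones, which shifts predictions toward the classes given a higher prior. (In GaussianNB the equivalent parameter is named priors.)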
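
The difference between learned and manually set priors can be inspected directly. The sketch below uses an invented, imbalanced dataset (three samples of class 0, one of class 1) and compares the priors `MultinomialNB` stores in each case.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical imbalanced data: three samples of class 0, one of class 1
X_prior = np.array([[1, 0], [1, 0], [1, 0], [0, 1]])
y_prior = np.array([0, 0, 0, 1])

# Priors estimated from class frequencies (class_prior=None)
learned = MultinomialNB().fit(X_prior, y_prior)
learned_priors = np.exp(learned.class_log_prior_)

# Priors fixed by hand to be uniform
manual = MultinomialNB(class_prior=[0.5, 0.5]).fit(X_prior, y_prior)
manual_priors = np.exp(manual.class_log_prior_)

print("Learned priors:", learned_priors)
print("Manual priors: ", manual_priors)
```

Setting uniform priors is one way to keep a rare class from being ignored when the training set does not reflect the class balance expected in production.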

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Step 1: Sample Dataset (Emails & Labels)
emails = [
    "Win a lottery now",              # Spam
    "Get cheap tickets today",        # Spam
    "Lottery prize waiting for you",  # Spam
    "Meeting schedule update",        # Not Spam
    "Project deadline update",        # Not Spam
    "Schedule your meeting now",      # Not Spam
]

labels = [1, 1, 1, 0, 0, 0]  # 1 = Spam, 0 = Not Spam

# Step 2: Convert Text Data to Feature Vectors (Bag of Words)
# Note: the vocabulary is built from all emails here for simplicity;
# in practice, fit the vectorizer on the training split only.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)  # Sparse matrix of word counts

# Step 3: Split Data into Training & Testing Sets
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, random_state=42
)

# Step 4: Train Naïve Bayes Model
model = MultinomialNB()
model.fit(X_train, y_train)

# Step 5: Predict on the Held-Out Test Set
y_pred = model.predict(X_test)

# Step 6: Calculate Accuracy
# With only two held-out emails, this estimate is very noisy.
accuracy = accuracy_score(y_test, y_pred)

# Step 7: Predict a New Email
new_email = ["Congratulations! You won a free lottery ticket"]
new_email_vectorized = vectorizer.transform(new_email)  # Reuse the fitted vocabulary
prediction = model.predict(new_email_vectorized)

print(f"Predicted Class: {'Spam' if prediction[0] == 1 else 'Not Spam'}")
```

```
Predicted Class: Spam
```
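
One possible refinement, sketched below, is to wrap the vectorizer and the classifier in a scikit-learn pipeline so that the vocabulary is always learned from the same data the model is trained on. The names `train_emails`, `train_labels`, and `spam_clf` are assumptions for this example, and all six sample emails are used for training purely to keep the sketch short.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Same toy emails as above, used here as the training set
train_emails = [
    "Win a lottery now",
    "Get cheap tickets today",
    "Lottery prize waiting for you",
    "Meeting schedule update",
    "Project deadline update",
    "Schedule your meeting now",
]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = Spam, 0 = Not Spam

# The pipeline fits the vectorizer and the classifier in one step
spam_clf = make_pipeline(CountVectorizer(), MultinomialNB())
spam_clf.fit(train_emails, train_labels)

new_email = ["Congratulations! You won a free lottery ticket"]
pipeline_pred = spam_clf.predict(new_email)
pipeline_proba = spam_clf.predict_proba(new_email)  # columns follow classes_ order: [0, 1]

print(f"Predicted Class: {'Spam' if pipeline_pred[0] == 1 else 'Not Spam'}")
print(f"Estimated P(Spam): {pipeline_proba[0, 1]:.2f}")
```

Because the pipeline accepts raw strings, the same object can be passed to `train_test_split` outputs or cross-validation helpers without a separate vectorization step.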
