# Multinomial Naive Bayes Classifier

This notebook trains and evaluates a **Multinomial Naive Bayes** classifier on a four-category subset of the **20 Newsgroups** text dataset.
We cover:
- Text preprocessing with a bag-of-words `CountVectorizer`
- Model training and evaluation on a held-out split
- Visualization of the confusion matrix


In [None]:
# Plotting
import seaborn as sns
import matplotlib.pyplot as plt

# Data, feature extraction, model, and metrics
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

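# Load four of the 20 newsgroups; strip headers, footers, and quoted replies
# so the model learns from message bodies rather than metadata.
categories = ['alt.atheism', 'talk.religion.misc', 'comp.graphics', 'sci.space']
data = fetch_20newsgroups(subset='train', categories=categories, remove=('headers', 'footers', 'quotes'))
X = data.data
y = data.target

# Split the raw text first so held-out documents do not influence the vocabulary.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the bag-of-words vocabulary on the training split only, then reuse it for the test split.
vectorizer = CountVectorizer(stop_words='english')
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)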

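# Train Multinomial Naive Bayes with scikit-learn's defaults (Laplace smoothing, alpha=1.0).
model = MultinomialNB()
model.fit(X_train, y_train)

# Predict newsgroup labels for the held-out 20% split.
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")

# Per-class precision, recall, and F1.
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=data.target_names))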

cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8, 6))
sns.heatmap(
    cm, annot=True, fmt='d', cmap='Greens',
    xticklabels=data.target_names, yticklabels=data.target_names
)
plt.title("Confusion Matrix - Multinomial Naive Bayes")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.grid(False)
plt.show()

![Confusion Matrix - Multinomial Naive Bayes](Multinomial.png)
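
The heatmap above shows raw counts, which are hard to compare when classes have different numbers of test documents. A minimal sketch of row-normalizing the matrix so that each diagonal cell becomes that class's recall follows. The labels `y_true_toy` and `y_pred_toy` are hypothetical and are not results from this notebook. The `normalize='true'` option assumes scikit-learn 0.22 or later.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical labels, for illustration only; not results from the notebook.
y_true_toy = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred_toy = np.array([0, 0, 1, 1, 1, 2, 2, 0, 2])

# Raw counts: rows are actual classes, columns are predicted classes.
cm_counts = confusion_matrix(y_true_toy, y_pred_toy)

# normalize='true' divides each row by the number of actual samples in that class.
cm_recall = confusion_matrix(y_true_toy, y_pred_toy, normalize='true')

# The diagonal of the row-normalized matrix is the per-class recall.
per_class_recall = np.diag(cm_recall)

print(cm_counts)
print(np.round(cm_recall, 2))
print(np.round(per_class_recall, 2))
```

To apply this to the notebook's own predictions, the same `normalize='true'` argument could be passed to `confusion_matrix(y_test, y_pred)`, with `fmt='.2f'` in `sns.heatmap` so the fractions render correctly.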

## Conclusion: Multinomial Naive Bayes

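The Multinomial Naive Bayes classifier was applied to four categories of the 20 Newsgroups dataset (`alt.atheism`, `talk.religion.misc`, `comp.graphics`, `sci.space`), with 20% of the documents held out for evaluation.
- This variant models discrete, non-negative features such as word counts, which is why it pairs naturally with `CountVectorizer`.
- Overall performance is summarized by the printed accuracy, and per-class precision, recall, and F1 by the classification report.
- The confusion matrix shows, for each actual class, how many test documents were assigned to each predicted class; off-diagonal cells are the misclassifications.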
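
To make the word-count point concrete, the sketch below recomputes the smoothed per-class word probabilities by hand on a tiny count matrix and compares them with what `MultinomialNB` learns. The matrix `X_toy`, the labels `y_toy`, and all `manual_*` names are hypothetical and chosen only for illustration. The smoothing value `alpha = 1.0` matches scikit-learn's default.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word-count matrix: rows are documents, columns are vocabulary terms.
X_toy = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 1],
    [0, 0, 3, 1],
    [0, 1, 1, 2],
])
y_toy = np.array([0, 0, 1, 1])
alpha = 1.0  # Laplace (add-one) smoothing

classes = np.unique(y_toy)

# Total count of each term within each class, plus the smoothing constant.
class_term_counts = np.vstack([X_toy[y_toy == c].sum(axis=0) for c in classes])
smoothed = class_term_counts + alpha

# log P(term | class): smoothed counts normalized within each class.
manual_feature_log_prob = np.log(smoothed / smoothed.sum(axis=1, keepdims=True))

# log P(class): fraction of training documents in each class.
manual_log_prior = np.log(np.bincount(y_toy) / len(y_toy))

# Score each document as log prior plus count-weighted sum of term log-probabilities.
manual_scores = X_toy @ manual_feature_log_prob.T + manual_log_prior
manual_pred = classes[manual_scores.argmax(axis=1)]

toy_model = MultinomialNB(alpha=alpha).fit(X_toy, y_toy)
print(np.allclose(manual_feature_log_prob, toy_model.feature_log_prob_))
print(np.array_equal(manual_pred, toy_model.predict(X_toy)))
```

Because each document's score is a sum of term log-probabilities weighted by the term's count, the model needs count-like features. Smoothing keeps a term that never appears in a class from driving that class's probability to zero.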
