# ***Engr. Muhammad Javed***

# 1. Naive Bayes

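Naive Bayes is a probabilistic classifier based on Bayes' theorem. It is "naive" because it assumes the features are conditionally independent given the class.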
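As a brief sketch of the underlying rule (the symbols $c$ for a class and $x_1, \dots, x_n$ for the features are notation introduced here, not taken from the notebook), Bayes' theorem gives the posterior probability of a class, and the independence assumption turns the likelihood into a product over features:

$$P(c \mid x_1, \dots, x_n) = \frac{P(c)\, P(x_1, \dots, x_n \mid c)}{P(x_1, \dots, x_n)} \propto P(c) \prod_{i=1}^{n} P(x_i \mid c)$$

The predicted class is the one with the highest posterior:

$$\hat{y} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)$$

The variants differ only in how they model $P(x_i \mid c)$.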

## Variants
1. **Multinomial NB:** Good for word counts/TF-IDF (most common in NLP).
2. **Bernoulli NB:** Good for binary/boolean features.
3. **Gaussian NB:** Assumes each feature is normally distributed within each class (rarely used for text).
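To illustrate how the first two variants see the same text differently, here is a minimal sketch on a tiny made-up corpus. The texts, labels, and variable names (`toy_texts`, `mnb_toy`, `bnb_toy`, and so on) are hypothetical and are not part of the dataset used in this notebook. `CountVectorizer` yields word counts for Multinomial NB, and `CountVectorizer(binary=True)` yields presence/absence features for Bernoulli NB.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB, BernoulliNB

# Hypothetical toy corpus, invented for illustration only
toy_texts = [
    "happy happy joy",
    "so happy today",
    "sad and lonely",
    "sad sad tears",
]
toy_labels = ["joy", "joy", "sadness", "sadness"]

# Multinomial NB models how often each word occurs
count_vec = CountVectorizer()
X_counts = count_vec.fit_transform(toy_texts)
mnb_toy = MultinomialNB().fit(X_counts, toy_labels)

# Bernoulli NB models only whether each word is present (1) or absent (0)
binary_vec = CountVectorizer(binary=True)
X_binary = binary_vec.fit_transform(toy_texts)
bnb_toy = BernoulliNB().fit(X_binary, toy_labels)

query = ["happy tears happy"]
mnb_pred = mnb_toy.predict(count_vec.transform(query))[0]
bnb_pred = bnb_toy.predict(binary_vec.transform(query))[0]

print("Largest count feature:", X_counts.max())
print("Largest binary feature:", X_binary.max())
print("Multinomial NB prediction:", mnb_pred)
print("Bernoulli NB prediction:", bnb_pred)
```

In the count matrix a repeated word contributes more evidence, while in the binary matrix a word counts once no matter how often it appears.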

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB, BernoulliNB
from sklearn.metrics import accuracy_score, classification_report

# Load data: each line is "<text>;<emotion>", with no header row
df_train = pd.read_csv('../Dataset/train.txt', sep=';', names=['text', 'emotion'])
df_test = pd.read_csv('../Dataset/test.txt', sep=';', names=['text', 'emotion'])

# Preprocessing: TF-IDF features, vocabulary capped at the 5,000 most frequent terms.
# The vectorizer is fitted on the training text only and reused for the test text.
vectorizer = TfidfVectorizer(max_features=5000)
X_train = vectorizer.fit_transform(df_train['text'])
X_test = vectorizer.transform(df_test['text'])
y_train = df_train['emotion']
y_test = df_test['emotion']

# 1. Multinomial NB
mnb = MultinomialNB()
mnb.fit(X_train, y_train)
y_pred_mnb = mnb.predict(X_test)
print("Multinomial NB Accuracy:", accuracy_score(y_test, y_pred_mnb))
print(classification_report(y_test, y_pred_mnb))

# 2. Bernoulli NB (binarizes the input: any TF-IDF value > 0 counts as "word present")
bnb = BernoulliNB()
bnb.fit(X_train, y_train)
y_pred_bnb = bnb.predict(X_test)
print("Bernoulli NB Accuracy:", accuracy_score(y_test, y_pred_bnb))
print(classification_report(y_test, y_pred_bnb))
```

## Gaussian NB Note
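scikit-learn's `GaussianNB` requires dense arrays and does not accept the sparse matrix that `TfidfVectorizer` produces. Converting that matrix to a dense array (for example with `X_train.toarray()`) can consume too much memory when the vocabulary is large. We therefore skip Gaussian NB for this large text dataset; the alternative is to apply dimensionality reduction first.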
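One possible way to do the dimensionality reduction is sketched below. It assumes `TruncatedSVD` as the reduction step (the notebook does not name a specific method) and runs on a tiny made-up corpus so that it is self-contained. The texts, labels, `n_components=2`, and names such as `gnb_pipeline` are all illustrative assumptions.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus, invented for illustration only
toy_texts = [
    "i feel happy and joyful",
    "what a happy wonderful day",
    "so much joy today",
    "i feel sad and lonely",
    "such a sad gloomy day",
    "tears and sadness today",
]
toy_labels = ["joy", "joy", "joy", "sadness", "sadness", "sadness"]

# TF-IDF produces a sparse matrix; TruncatedSVD accepts sparse input and
# returns a small dense array, which GaussianNB can then consume.
gnb_pipeline = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=2, random_state=0),  # assumed size, tune on real data
    GaussianNB(),
)
gnb_pipeline.fit(toy_texts, toy_labels)

# Inspect the dense, reduced representation that GaussianNB was trained on
reduced = gnb_pipeline[:-1].transform(toy_texts)
toy_pred = gnb_pipeline.predict(["a happy day"])

print("Reduced shape:", reduced.shape)
print("Gaussian NB prediction:", toy_pred[0])
```

On the real dataset the same pipeline could be fitted on `df_train['text']` and `y_train`. The number of components would need tuning, and the result may or may not match the accuracy of the sparse-friendly variants above.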