# Task 1: Theory Questions

Ques.1. What is the core assumption of Naive Bayes?

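Ans.1. The core assumption of Naive Bayes is that all features are conditionally independent of one another given the class label. Once the class is known, the value of one feature carries no additional information about any other, so the joint likelihood factorizes into a product of per-feature terms: $P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)$. This is what simplifies the probability computation.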
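
To make the independence assumption concrete, here is a minimal sketch that computes class posteriors by multiplying per-feature likelihoods. The priors and likelihoods are hypothetical numbers chosen only for illustration, and `posterior` is an illustrative helper, not part of any library.

```python
# Hypothetical, hand-picked probabilities (illustration only).
priors = {"spam": 0.4, "ham": 0.6}

# P(word is present | class) for two binary features.
likelihoods = {
    "spam": {"free": 0.8, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.5},
}


def posterior(observed):
    """Return P(class | observed), treating features as conditionally independent given the class."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for word, present in observed.items():
            p = likelihoods[label][word]
            # Multiply in P(feature value | class) one feature at a time.
            score *= p if present else (1.0 - p)
        scores[label] = score
    total = sum(scores.values())
    # Normalize so the posteriors sum to 1.
    return {label: s / total for label, s in scores.items()}


result = posterior({"free": True, "meeting": False})
print(result)
```

Each feature contributes one factor to the product, which is exactly what the independence assumption permits.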

Ques.2. Differentiate between GaussianNB, MultinomialNB, and BernoulliNB.

Ans.2. 

GaussianNB assumes that features follow a normal (Gaussian) distribution and is best suited for continuous data.

MultinomialNB works with discrete count data, such as word frequencies in text classification.

BernoulliNB is used when features are binary (0 or 1), indicating the presence or absence of a feature, for example whether a given word occurs in an email in a spam filter.
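
As a rough illustration of the three variants, the sketch below fits each one on the kind of data it expects. The tiny arrays are made-up toy data (an assumption for illustration), chosen so the two classes are clearly separated.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Toy labels shared by all three examples (made-up data).
y = np.array([0, 0, 1, 1])

# Continuous measurements -> GaussianNB
X_continuous = np.array([[1.0, 2.1], [1.2, 1.9], [3.0, 4.1], [3.2, 3.9]])

# Non-negative counts (e.g. word frequencies) -> MultinomialNB
X_counts = np.array([[3, 0], [2, 0], [0, 4], [0, 2]])

# Presence/absence indicators derived from the counts -> BernoulliNB
X_binary = (X_counts > 0).astype(int)

predictions = {
    "GaussianNB": GaussianNB().fit(X_continuous, y).predict(X_continuous),
    "MultinomialNB": MultinomialNB().fit(X_counts, y).predict(X_counts),
    "BernoulliNB": BernoulliNB().fit(X_binary, y).predict(X_binary),
}

for name, pred in predictions.items():
    print(name, pred.tolist())
```

The estimator API is identical across the three classes; only the assumed per-feature distribution differs.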

Ques.3. Why is Naive Bayes considered suitable for high-dimensional data?

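Ans.3. Naive Bayes is computationally efficient and scales well to high-dimensional data because the independence assumption lets it estimate parameters for each feature separately. The number of parameters and the training cost therefore grow linearly with the number of features, rather than requiring a joint distribution over all of them. This makes it especially useful for tasks such as text classification, where the feature space (for example, the vocabulary size) can be very large.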
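
One way to see this scaling is to inspect the fitted parameters directly. The sketch below trains `MultinomialNB` on randomly generated counts (synthetic data, assumed purely for illustration) with far more features than samples and checks that the model stores one log-probability per class per feature.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Synthetic count data: many more features than samples (illustration only).
rng = np.random.default_rng(0)
n_samples, n_features = 50, 10_000
X = rng.integers(0, 3, size=(n_samples, n_features))

# Alternate labels so that both classes are present.
y = np.array([0, 1] * (n_samples // 2))

clf = MultinomialNB().fit(X, y)

# One row per class, one column per feature: parameters grow linearly with n_features.
param_shape = clf.feature_log_prob_.shape
print(param_shape)
```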

# Task 2: Spam Detection using MultinomialNB

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

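# Load the tab-separated file; the first column is the label, the second the message text
df = pd.read_csv("email.csv", sep='\t', header=None, names=["label", "message"])

# Drop rows that have no message text
df.dropna(subset=['message'], inplace=True)

# Encode labels as integers: ham -> 0, spam -> 1 (spam is the positive class)
df['label'] = df['label'].map({'ham': 0, 'spam': 1})

# Hold out 20% of the messages for testing
X_train, X_test, y_train, y_test = train_test_split(df['message'], df['label'], test_size=0.2, random_state=42)

# Build the vocabulary on the training set only, then reuse it for the test set
vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Train Multinomial Naive Bayes on the word counts
clf = MultinomialNB()
clf.fit(X_train_vec, y_train)

# Predict labels for the held-out messages
y_pred = clf.predict(X_test_vec)

# Precision and recall are computed for the spam class (label 1)
acc = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred)
rec = recall_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)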

print(f"Accuracy: {acc:.4f}")
print(f"Precision: {prec:.4f}")
print(f"Recall: {rec:.4f}")
print("Confusion Matrix:")
print(cm)


Accuracy: 1.0000
Precision: 1.0000
Recall: 1.0000
Confusion Matrix:
[[132   0]
 [  0  16]]
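
Once a model like this is trained, new messages have to pass through the same vectorizer before prediction. A minimal sketch of that step, using a scikit-learn pipeline and a small made-up corpus (the messages below are hypothetical and are not taken from `email.csv`):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus (illustration only): 1 = spam, 0 = ham.
messages = [
    "win free prize now",
    "free cash offer click now",
    "claim your free prize",
    "are we meeting for lunch today",
    "see you at the meeting tomorrow",
    "can you call me later",
]
labels = [1, 1, 1, 0, 0, 0]

# The pipeline applies the fitted vocabulary to any new text automatically.
spam_model = make_pipeline(CountVectorizer(), MultinomialNB())
spam_model.fit(messages, labels)

new_messages = ["free prize waiting claim now", "lunch meeting tomorrow"]
new_predictions = spam_model.predict(new_messages).tolist()
print(new_predictions)
```

Bundling the vectorizer and classifier in one object avoids accidentally refitting the vocabulary on new data.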


# Task 3: GaussianNB with Iris or Wine Dataset

In [4]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

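# Load the Iris dataset: 150 samples, 4 continuous features, 3 classes
iris = load_iris()
X, y = iris.data, iris.target

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Gaussian Naive Bayes: models each feature as a per-class normal distribution
gnb = GaussianNB()
gnb.fit(X_train, y_train)
y_pred_nb = gnb.predict(X_test)

print("GaussianNB Evaluation:")
print("Accuracy:", accuracy_score(y_test, y_pred_nb))
print(classification_report(y_test, y_pred_nb, target_names=iris.target_names))

# Logistic regression baseline (max_iter raised so the solver converges)
lr = LogisticRegression(max_iter=200)
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)

# Decision tree baseline; fixing random_state makes the run reproducible
dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train, y_train)
y_pred_dt = dt.predict(X_test)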

print("Model Accuracy Comparison:")
print(f"GaussianNB Accuracy: {accuracy_score(y_test, y_pred_nb):.4f}")
print(f"Logistic Regression Accuracy: {accuracy_score(y_test, y_pred_lr):.4f}")
print(f"Decision Tree Accuracy: {accuracy_score(y_test, y_pred_dt):.4f}")


GaussianNB Evaluation:
Accuracy: 1.0
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00         9
   virginica       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

Model Accuracy Comparison:
GaussianNB Accuracy: 1.0000
Logistic Regression Accuracy: 1.0000
Decision Tree Accuracy: 1.0000
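
With only 30 test samples, a single split cannot separate models that all score 1.0. As a possible follow-up, the sketch below compares the same three models with 5-fold cross-validation; the choice of `cv=5` and the fixed `random_state` are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "GaussianNB": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=200),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
}

cv_means = {}
for name, model in models.items():
    # Accuracy on each of 5 stratified folds, using all 150 samples.
    scores = cross_val_score(model, X, y, cv=5)
    cv_means[name] = scores.mean()
    print(f"{name}: {scores.mean():.4f} +/- {scores.std():.4f}")
```

Averaging over folds uses every sample for testing once, so the comparison is less dependent on one particular split.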


