**1. What is the core assumption of Naive Bayes?**

- Naive Bayes assumes that all features (input variables) are conditionally independent of one another given the class label. In other words, once the class is known, the value of one feature carries no additional information about any other feature.
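
To make the assumption concrete, the sketch below scores each class by multiplying its prior with one likelihood per feature. That product is only valid if the features are treated as conditionally independent given the class. The priors and per-word likelihoods are made-up numbers for illustration, not estimates from any dataset, and `posterior` is a hypothetical helper.

```python
# Hypothetical class priors and per-word likelihoods P(word | class)
priors = {"spam": 0.4, "ham": 0.6}
likelihoods = {
    "spam": {"free": 0.30, "meeting": 0.02},
    "ham": {"free": 0.05, "meeting": 0.20},
}

def posterior(words):
    """Return P(class | words) under the naive independence assumption."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for word in words:
            # Independence given the class: multiply per-feature likelihoods
            score *= likelihoods[label][word]
        scores[label] = score
    total = sum(scores.values())
    # Normalize so the class scores sum to 1
    return {label: score / total for label, score in scores.items()}

one_word = posterior(["free"])              # spam gets about 0.8
two_words = posterior(["free", "meeting"])  # ham now becomes more likely
print(one_word)
print(two_words)
```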

**2. Differentiate between GaussianNB, MultinomialNB, and BernoulliNB:**

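- GaussianNB is used when features are continuous and are assumed to follow a normal (Gaussian) distribution within each class.
- MultinomialNB is used for discrete counts, such as word frequencies in text data. The scikit-learn documentation notes that fractional counts such as TF-IDF values may also work in practice.
- BernoulliNB is used when features are binary (0 or 1), such as whether a word appears in a document.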
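
As a rough illustration of the three input types, the sketch below fits each variant on a tiny hand-made array of the kind it expects. The arrays and labels are invented for this example and are far too small to say anything about accuracy.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])  # hypothetical class labels

# Continuous measurements -> GaussianNB
X_cont = np.array([[5.1, 3.5], [4.9, 3.0], [6.7, 3.1], [6.3, 2.5]])
gnb = GaussianNB().fit(X_cont, y)

# Non-negative counts (e.g. word counts per document) -> MultinomialNB
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
mnb = MultinomialNB().fit(X_counts, y)

# Presence/absence indicators -> BernoulliNB
X_binary = (X_counts > 0).astype(int)
bnb = BernoulliNB().fit(X_binary, y)

for name, model, X in [("GaussianNB", gnb, X_cont),
                       ("MultinomialNB", mnb, X_counts),
                       ("BernoulliNB", bnb, X_binary)]:
    print(name, model.predict(X))
```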

**3. Why is Naive Bayes considered suitable for high-dimensional data?**

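- Because of the independence assumption, Naive Bayes estimates each feature's distribution separately instead of modeling interactions between features. The number of parameters and the training cost therefore grow only linearly with the number of features, so it stays fast and works well even when the feature count is very large, as in text classification.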
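
A quick way to see the scaling argument is to count parameters. The sketch below compares a Gaussian Naive Bayes model (one mean and one variance per feature per class) with a Gaussian model that keeps a full covariance matrix per class. Both helper functions are hypothetical, written only for this comparison, and class priors are left out of the counts.

```python
def naive_gaussian_params(n_features, n_classes):
    # One mean and one variance per feature, per class
    return n_classes * 2 * n_features

def full_gaussian_params(n_features, n_classes):
    # One mean vector plus a symmetric covariance matrix, per class
    covariance_entries = n_features * (n_features + 1) // 2
    return n_classes * (n_features + covariance_entries)

for d in (10, 1000, 10000):
    print(d, naive_gaussian_params(d, 2), full_gaussian_params(d, 2))
```

The naive count grows linearly with the number of features, while the full-covariance count grows quadratically.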

Task 2: Spam Detection using MultinomialNB

- Load a text dataset (e.g., SMS Spam Collection or any public text dataset).
- Preprocess using CountVectorizer or TfidfVectorizer.
- Train a MultinomialNB classifier.
- Evaluate:
  - Accuracy
  - Precision
  - Recall
  - Confusion Matrix
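
One possible way to arrange these steps is to split the raw text first and wrap the vectorizer and classifier in a scikit-learn `Pipeline`, so the vectorizer only ever sees training messages. The sketch below is a minimal version under that assumption. The toy messages and labels are invented so that it runs without a download, and the scores it prints are not meaningful.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy messages; a real run would load a dataset such as the SMS Spam Collection
messages = [
    "win a free prize now", "free cash claim your reward",
    "urgent call to claim free entry", "you have won a free phone",
    "claim your free ticket today", "free offer text win now",
    "are we still meeting for lunch", "see you at the office tomorrow",
    "can you call me later", "thanks for the notes from class",
    "i will be home by six", "let me know when you arrive",
]
labels = [1] * 6 + [0] * 6  # 1 = spam, 0 = ham

# Split the raw text before any vectorization
train_text, test_text, train_y, test_y = train_test_split(
    messages, labels, test_size=0.25, stratify=labels, random_state=42
)

# The pipeline fits TF-IDF on the training text only, then trains the classifier
spam_clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
spam_clf.fit(train_text, train_y)
preds = spam_clf.predict(test_text)

acc = accuracy_score(test_y, preds)
prec = precision_score(test_y, preds, zero_division=0)
rec = recall_score(test_y, preds, zero_division=0)
cm = confusion_matrix(test_y, preds, labels=[0, 1])
print("Accuracy:", acc)
print("Precision:", prec)
print("Recall:", rec)
print("Confusion Matrix:\n", cm)
```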

In [5]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix

# Load the SMS Spam Collection (tab-separated, no header row: label, message)
data = pd.read_csv("https://raw.githubusercontent.com/justmarkham/pycon-2016-tutorial/master/data/sms.tsv", sep='\t', header=None, names=['label', 'message'])
texts = data['message']
labels = data['label'].map({'ham': 0, 'spam': 1})  # encode ham as 0, spam as 1

# Convert the messages to TF-IDF features.
# Note: the vectorizer is fit on the full dataset before the split, so the
# vocabulary and IDF weights also reflect the test messages.
vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(texts)

# Hold out 20% of the messages for testing
trainX, testX, trainY, testY = train_test_split(features, labels, test_size=0.2, random_state=42)

# Train the classifier and predict on the held-out messages
model = MultinomialNB()
model.fit(trainX, trainY)
preds = model.predict(testX)

# Precision and recall treat spam (label 1) as the positive class
print("Accuracy--", accuracy_score(testY, preds))
print("Precision-", precision_score(testY, preds))
print("Recall", recall_score(testY, preds))
print("Confusion Matrix:\n", confusion_matrix(testY, preds))



Accuracy-- 0.9668161434977578
Precision- 1.0
Recall 0.7516778523489933
Confusion Matrix:
 [[966   0]
 [ 37 112]]
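
The three scores can be recomputed by hand from the confusion matrix, which also shows where the errors are: no ham message was flagged as spam, but 37 spam messages were missed. The short check below uses only the four counts printed above, reading rows as true labels and columns as predictions (scikit-learn's convention).

```python
# Counts taken from the confusion matrix printed above
tn, fp = 966, 0    # true ham: predicted ham, predicted spam
fn, tp = 37, 112   # true spam: predicted ham, predicted spam

total = tn + fp + fn + tp
accuracy = (tp + tn) / total   # share of all messages classified correctly
precision = tp / (tp + fp)     # share of predicted spam that really is spam
recall = tp / (tp + fn)        # share of actual spam that was caught

print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
```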


Task 3: GaussianNB with Iris or Wine Dataset

- Train a GaussianNB classifier on a numeric dataset.
- Split data into train/test sets.
- Evaluate model performance.
- Compare with Logistic Regression or Decision Tree briefly.

In [6]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset (150 samples, 4 numeric features, 3 classes)
iris = load_iris()
features = iris.data
targets = iris.target

# Split the dataset into training (70%) and testing (30%) sets
trainX, testX, trainY, testY = train_test_split(features, targets, test_size=0.3, random_state=42)

# Gaussian Naive Bayes
nb = GaussianNB()
nb.fit(trainX, trainY)
predNB = nb.predict(testX)
print("GaussianNB Accu:", accuracy_score(testY, predNB))

# Logistic Regression for comparison (max_iter raised from the default of 100)
lr = LogisticRegression(max_iter=200)
lr.fit(trainX, trainY)
predLR = lr.predict(testX)
print("Logistic Regression Accuracy:-", accuracy_score(testY, predLR))

# Decision Tree for comparison (no random_state is set, so results may vary between runs)
dt = DecisionTreeClassifier()
dt.fit(trainX, trainY)
predDT = dt.predict(testX)
print("Decision Tree Accu:", accuracy_score(testY, predDT))

GaussianNB Accu: 0.9777777777777777
Logistic Regression Accuracy:- 1.0
Decision Tree Accu: 1.0
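
With a 30% split of 150 samples, the test set holds only 45 flowers, so the gap between 0.978 and 1.0 is a single misclassified sample. For a somewhat steadier comparison, one option is to average accuracy over several folds. The sketch below assumes 5-fold cross-validation and a fixed `random_state` for the tree; both are choices made for this example and are not part of the original task.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    "GaussianNB": GaussianNB(),
    "LogisticRegression": LogisticRegression(max_iter=200),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
}

cv_means = {}
for name, model in models.items():
    # Accuracy on each of 5 stratified folds
    scores = cross_val_score(model, X, y, cv=5)
    cv_means[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```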
