# **Problem Statement**
The SMS Spam Collection is a set of tagged SMS messages collected for SMS spam research. It contains 5,574 English SMS messages, each tagged as either ham (legitimate) or spam.



# **Load the Necessary Libraries**

In [2]:
import pandas as pd
import numpy as np

# Load the dataset

In [3]:
df = pd.read_csv('/content/spam.csv', encoding='latin-1')
df.head()

| | v1 | v2 | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 |
|---|---|---|---|---|---|
| 0 | ham | Go until jurong point, crazy.. Available only ... | | | |
| 1 | ham | Ok lar... Joking wif u oni... | | | |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... | | | |
| 3 | ham | U dun say so early hor... U c already then say... | | | |
| 4 | ham | Nah I don't think he goes to usf, he lives aro... | | | |


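# Select only the 'v1' and 'v2' columns

In [4]:
# .copy() makes df an independent frame, so the later column assignment
# does not trigger pandas' SettingWithCopyWarning
df = df[['v1', 'v2']].copy()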

# Rename columns for clarity

In [5]:
df.columns = ['label', 'text']

# Convert label column to numerical values (0 for ham, 1 for spam)

In [6]:
df['label'] = df['label'].map({'ham': 0, 'spam': 1})
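`Series.map` with a dictionary returns `NaN` for any value that is not a key, so a label with stray whitespace or different casing would silently become missing. The sketch below shows one way to check for that. It runs on a small made-up frame (`toy`, with a deliberately malformed third label), not on the real dataset, and the clean-up step is an assumption about what malformed labels might look like.

```python
import pandas as pd

# Hypothetical toy data standing in for the real frame;
# the third label is malformed on purpose.
toy = pd.DataFrame({
    "label": ["ham", "spam", "Spam "],
    "text": ["see you soon", "win a prize", "free entry"],
})

label_map = {"ham": 0, "spam": 1}

# Values that are not keys of label_map become NaN.
mapped = toy["label"].map(label_map)
n_unmapped = int(mapped.isna().sum())
print("Unmapped labels:", n_unmapped)

# Normalising whitespace and case before mapping removes the gap here.
cleaned = toy["label"].str.strip().str.lower().map(label_map)
n_unmapped_after = int(cleaned.isna().sum())
print("Unmapped labels after clean-up:", n_unmapped_after)
```

On the real data, the same check is `df['label'].isna().sum()` right after the mapping; a result of zero means every row was labelled `ham` or `spam`.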

# Split the dataset into training and testing sets

In [7]:
X = df['text']
y = df['label']

In [8]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
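Spam is the minority class, so a purely random split can leave the train and test sets with slightly different spam rates. A possible variant, not used in this notebook, is to pass `stratify` so both sets keep the same class proportions. The sketch below uses made-up lists (`texts`, `labels`) with a 20% spam rate chosen only for illustration.

```python
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 messages, 20 of them spam (label 1).
texts = [f"message {i}" for i in range(100)]
labels = [1] * 20 + [0] * 80

# stratify=labels keeps the spam proportion the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

print("Train spam rate:", sum(y_tr) / len(y_tr))
print("Test spam rate:", sum(y_te) / len(y_te))
```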

# Convert text data to numerical features using CountVectorizer

In [9]:
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(X_train)
X_test = vectorizer.transform(X_test)
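The vocabulary is learned from the training messages only; `transform` then counts just those known words, and any word that appears only in the test messages is ignored. A minimal sketch of that behaviour on made-up sentences (`train_docs` and `test_docs` are illustrative, not taken from the dataset):

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy documents.
train_docs = ["free prize now", "call me now"]
test_docs = ["free call tomorrow"]

vec = CountVectorizer()
train_counts = vec.fit_transform(train_docs)  # learns the vocabulary
test_counts = vec.transform(test_docs)        # reuses it, learns nothing new

# Column order of the count matrix follows the sorted vocabulary.
vocab = sorted(vec.vocabulary_)
print(vocab)
print(test_counts.toarray())
```

Here "tomorrow" never occurs in `train_docs`, so it contributes no column and no count to `test_counts`. Fitting on the training set alone is what keeps information from the test set out of the features.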


# Train a Multinomial Naive Bayes classifier

In [10]:
from sklearn.naive_bayes import MultinomialNB

clf = MultinomialNB()
clf.fit(X_train, y_train)


# Make predictions on the test set

In [11]:
y_pred = clf.predict(X_test)
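To classify new raw messages, the vectorizer and the classifier have to be applied in the same order every time. One way to bundle them, shown here as a sketch rather than as part of this notebook's workflow, is scikit-learn's `make_pipeline`. The training sentences and labels below are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy training set: 1 = spam, 0 = ham.
train_texts = [
    "win free prize now",
    "free entry win cash",
    "are you coming home",
    "see you at lunch",
]
train_labels = [1, 1, 0, 0]

# The pipeline vectorizes and classifies in one call.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# predict() accepts raw strings; vectorization happens inside the pipeline.
preds = model.predict(["free cash prize", "see you at home"])
print(preds)
```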

# Calculate performance metrics

In [12]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
confusion_mat = confusion_matrix(y_test, y_pred)


In [13]:
# Print the performance metrics
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
print("Confusion Matrix:", confusion_mat)

Accuracy: 0.9838565022421525
Precision: 0.9852941176470589
Recall: 0.8933333333333333
F1 Score: 0.9370629370629371
Confusion Matrix: [[963   2]
 [ 16 134]]
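The four scores can be re-derived by hand from the confusion matrix, which helps when reading them. scikit-learn lays the matrix out with true classes as rows and predicted classes as columns, so for labels 0 (ham) and 1 (spam) it reads `[[TN, FP], [FN, TP]]`. The sketch below plugs in the counts printed above; the variable names are chosen for illustration.

```python
# Counts taken from the confusion matrix printed above.
tn, fp = 963, 2    # ham messages: correctly kept, wrongly flagged as spam
fn, tp = 16, 134   # spam messages: missed, correctly caught

acc = (tp + tn) / (tp + tn + fp + fn)  # share of all messages classified correctly
prec = tp / (tp + fp)                  # of messages flagged as spam, share that were spam
rec = tp / (tp + fn)                   # of actual spam, share that was caught
f1_manual = 2 * prec * rec / (prec + rec)

print("Accuracy:", acc)
print("Precision:", prec)
print("Recall:", rec)
print("F1 Score:", f1_manual)
```

Read this way, the lower recall comes from the 16 spam messages that were classified as ham, while only 2 legitimate messages were flagged as spam.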
