# SMSSpamDetector 

SMS spam detection filters out unwanted and potentially harmful messages.
Naive Bayes is a widely used machine learning algorithm for classifying text data,
which makes it a suitable choice for SMS spam detection.
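
To make the idea concrete, here is a minimal sketch of multinomial Naive Bayes with add-one smoothing, written in plain Python. The four training messages are invented for illustration, and the helper names (`log_score`, `predict`) are hypothetical. This is not scikit-learn's implementation, only the same basic idea: pick the class with the highest prior times per-word likelihood.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only.
train = [
    ("win free prize now", "spam"),
    ("free entry win cash", "spam"),
    ("are we meeting for lunch", "ham"),
    ("call me when you are home", "ham"),
]

# Count words per class and messages per class.
word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])


def log_score(text, label):
    """Log prior plus summed log likelihoods, with add-one smoothing."""
    total_words = sum(word_counts[label].values())
    score = math.log(class_counts[label] / sum(class_counts.values()))
    for word in text.split():
        if word not in vocabulary:
            continue  # words never seen in training are ignored
        count = word_counts[label][word]
        score += math.log((count + 1) / (total_words + len(vocabulary)))
    return score


def predict(text):
    """Return the class whose log score is highest."""
    return max(("spam", "ham"), key=lambda label: log_score(text, label))


print(predict("free cash prize"))
print(predict("lunch at home"))
```

Working in log space avoids multiplying many small probabilities together, and the smoothing keeps a word that is absent from one class from driving that class's score to zero.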

Importing Dependencies

In [1]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import pickle
import re

Dataset Preview

In [2]:
# Load the dataset; the encoding is set explicitly to ISO-8859-1.
data = pd.read_csv('spam.csv', encoding="ISO-8859-1")
data

Unnamed: 0,v1,v2,Unnamed: 2,Unnamed: 3,Unnamed: 4
0,ham,"Go until jurong point, crazy.. Available only ...",,,
1,ham,Ok lar... Joking wif u oni...,,,
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...,,,
3,ham,U dun say so early hor... U c already then say...,,,
4,ham,"Nah I don't think he goes to usf, he lives aro...",,,
...,...,...,...,...,...
5567,spam,This is the 2nd time we have tried 2 contact u...,,,
5568,ham,Will Ì_ b going to esplanade fr home?,,,
5569,ham,"Pity, * was in mood for that. So...any other s...",,,
5570,ham,The guy did some bitching but I acted like i'd...,,,


In [3]:
# v2 holds the message text and v1 the ham/spam label.
messages = data['v2']
labels = data['v1'].tolist()
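
Before training, it can help to drop the unused columns and check how the two classes are distributed. The sketch below does this on a hand-typed copy of the first five preview rows, so the proportions it prints say nothing about the full dataset. The column names `label` and `message` are illustrative choices, not part of the original notebook.

```python
import pandas as pd

# First five rows of the preview above, retyped by hand for illustration.
data = pd.DataFrame({
    "v1": ["ham", "ham", "spam", "ham", "ham"],
    "v2": [
        "Go until jurong point, crazy.. Available only ...",
        "Ok lar... Joking wif u oni...",
        "Free entry in 2 a wkly comp to win FA Cup fina...",
        "U dun say so early hor... U c already then say...",
        "Nah I don't think he goes to usf, he lives aro...",
    ],
    "Unnamed: 2": [None] * 5,
})

# Keep only the two useful columns and give them descriptive names.
clean = data[["v1", "v2"]].rename(columns={"v1": "label", "v2": "message"})

# Share of each class; a heavily skewed split makes accuracy alone misleading.
class_share = clean["label"].value_counts(normalize=True)
print(class_share)
```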

Preprocessing and Feature Extraction

In [4]:
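def preprocess_message(message):
    """Lowercase the text, strip punctuation, and collapse whitespace."""
    message = message.lower()
    message = re.sub(r'[^\w\s]', '', message)  # drop punctuation
    message = re.sub(r'\s+', ' ', message)     # collapse runs of whitespace
    return message

preprocessed_messages = [preprocess_message(message) for message in messages]

# Convert each message into a vector of word counts.
# Note: the vocabulary is learned from all messages, including those that
# end up in the test set.
vectorizer = CountVectorizer()
feature_matrix = vectorizer.fit_transform(preprocessed_messages)

# Hold out 20% of the messages for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    feature_matrix, labels, test_size=0.2, random_state=42
)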
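
In the cell above, the vectorizer is fitted on every message before the split, so the vocabulary also reflects the test messages. One possible alternative, sketched here under the assumption that a scikit-learn `Pipeline` is acceptable, is to split the raw text first and let the pipeline fit the vectorizer on the training part only. The eight messages are made up, and this is not the notebook's own method.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def preprocess_message(message):
    """Same cleaning steps as in the notebook."""
    message = message.lower()
    message = re.sub(r'[^\w\s]', '', message)
    return re.sub(r'\s+', ' ', message)


# Made-up messages, for illustration only.
texts = [
    "Win a FREE prize now!", "Free entry, win cash!!",
    "Claim your free reward today", "Urgent: you have won cash",
    "Are we meeting for lunch?", "Call me when you are home",
    "See you at the station", "Thanks, talk to you later",
]
labels = ["spam"] * 4 + ["ham"] * 4

# Split the raw text first so the test messages stay unseen.
train_texts, test_texts, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# The pipeline fits the vectorizer on the training texts only.
pipeline = make_pipeline(
    CountVectorizer(preprocessor=preprocess_message),
    MultinomialNB(),
)
pipeline.fit(train_texts, y_train)
predictions = pipeline.predict(test_texts)
print(list(zip(test_texts, predictions)))
```

A side benefit of this layout is that the pipeline accepts raw strings directly, so cleaning, vectorizing, and classifying happen in one `predict` call.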

Model Training

In [5]:
model = MultinomialNB()
model.fit(X_train, y_train)

Model Evaluation

In [6]:
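y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
# pos_label="ham" makes ham the positive class, so the three scores below
# measure how well legitimate messages are recognised, not spam.
precision = precision_score(y_test, y_pred, pos_label="ham")
recall = recall_score(y_test, y_pred, pos_label="ham")
f1 = f1_score(y_test, y_pred, pos_label="ham")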

In [7]:
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", f1)

Accuracy: 0.97847533632287
Precision: 0.9875647668393782
Recall: 0.9875647668393782
F1-score: 0.9875647668393782
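
The scores above are computed with `pos_label="ham"`. For a spam filter it is often useful to also look at the spam class and at the confusion matrix. The sketch below shows how the same scikit-learn metrics change with `pos_label`; the labels in `y_true` and `y_pred` are invented for illustration and are not the notebook's results.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Invented labels: eight ham and two spam messages, one spam missed.
y_true = ["ham"] * 8 + ["spam"] * 2
y_pred = ["ham"] * 8 + ["ham", "spam"]

# Rows are true classes, columns are predicted classes, in the given order.
cm = confusion_matrix(y_true, y_pred, labels=["ham", "spam"])
print(cm)

# The same predictions scored from each class's point of view.
for positive in ("ham", "spam"):
    p = precision_score(y_true, y_pred, pos_label=positive)
    r = recall_score(y_true, y_pred, pos_label=positive)
    print(f"{positive}: precision={p:.3f} recall={r:.3f}")
```

In this toy case ham recall is perfect while spam recall is only 0.5, which is the kind of gap that scores for the majority class alone can hide.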


API

In [8]:
def classify_message(message):
    """Print whether a raw SMS message is classified as spam or ham."""
    preprocessed_message = preprocess_message(message)
    features = vectorizer.transform([preprocessed_message])
    # model.classes_ is sorted, so index 0 is "ham" and index 1 is "spam".
    prediction = model.predict_proba(features)[0]
    if prediction[0] < prediction[1]:
        print("This message is spam.")
    else:
        print("This message is ham.")

classify_message("England v Macedonia - dont miss the goals/team news. Txt ur national team to 87077 eg ENGLAND to 87077 Try:WALES, SCOTLAND 4txt/£1.20 POBOXox36504W45WQ 16+")

This message is spam.
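
A variant of `classify_message` that returns its result instead of printing it can be easier to reuse from other code. The sketch below looks up the spam column through `model.classes_` and exposes a `threshold` parameter; both the return shape and the `threshold` argument are assumptions added for illustration, and the model is trained on four made-up messages.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up training set, for illustration only.
texts = [
    "win free prize now", "free entry win cash",
    "are we meeting for lunch", "call me when you are home",
]
labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(texts), labels)


def classify_message(message, threshold=0.5):
    """Return (label, spam_probability) for a raw message."""
    features = vectorizer.transform([message])
    # Find the spam column by name instead of relying on its position.
    spam_index = list(model.classes_).index("spam")
    spam_probability = float(model.predict_proba(features)[0][spam_index])
    label = "spam" if spam_probability >= threshold else "ham"
    return label, spam_probability


print(classify_message("win a free prize"))
print(classify_message("lunch at home"))
```

Raising `threshold` makes the filter more conservative about flagging messages as spam, at the cost of letting more spam through.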


Export Model

In [9]:
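# Serialise the trained classifier. The fitted vectorizer is not saved here,
# but it is also needed to classify new messages.
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)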
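
Because new messages must be transformed with the same fitted vectorizer, one option is to pickle the vectorizer and the model together and reload them as a pair. The sketch below assumes a dictionary bundle and a hypothetical file name `spam_bundle.pkl` written to a temporary directory; the training messages are made up.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up training set, for illustration only.
texts = [
    "win free prize now", "free entry win cash",
    "are we meeting for lunch", "call me when you are home",
]
labels = ["spam", "spam", "ham", "ham"]

vectorizer = CountVectorizer()
model = MultinomialNB().fit(vectorizer.fit_transform(texts), labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    bundle_path = Path(tmp_dir) / "spam_bundle.pkl"  # hypothetical file name

    # Save both fitted objects in one file.
    with open(bundle_path, "wb") as f:
        pickle.dump({"vectorizer": vectorizer, "model": model}, f)

    # Load them back, as a separate process would.
    with open(bundle_path, "rb") as f:
        bundle = pickle.load(f)

new_message = ["win a free prize"]
restored = bundle["model"].predict(bundle["vectorizer"].transform(new_message))
original = model.predict(vectorizer.transform(new_message))
print(restored[0], original[0])
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.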