## Spam Detection Using Multinomial Naive Bayes Classifier

### Introduction:

Detecting spam in emails and messages is a core feature that every big tech company tries to improve for its customers.

Apple’s official messaging app and Google’s Gmail are good examples of applications where spam detection works well to protect users from unwanted messages.

Whenever you share your email address or contact number with a platform, that platform can easily market its products to you by sending emails or text messages.
This results in lots of spam alerts and notifications in your inbox, and this is where the task of spam detection comes in.

Spam detection means identifying spam messages or emails from their text content, so that you are only notified about messages or emails that matter to you.
Messages identified as spam are automatically moved to a spam folder, and you are not notified about them.
This improves the user experience, since a steady stream of spam alerts is a nuisance for most users.

The dataset I’m using can be downloaded from here:
https://github.com/JeanGermain/Data_Science_Projects/blob/main/Machine_Learning_Projects/Spam_Detection_System/Spam_Detection_Data.csv

Now let’s see how to train a machine learning model for detecting spam using Python, pandas, NumPy, and scikit-learn’s multinomial Naive Bayes classifier (MultinomialNB).
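Before using the library, it may help to see roughly what a multinomial Naive Bayes classifier computes. The following is a minimal from-scratch sketch, not the scikit-learn implementation: the tiny corpus is invented for illustration, the helper names `fit`, `log_scores` and `predict` are hypothetical, and tokenization is a plain whitespace split. Each class gets a score equal to the log of its prior plus the sum of the log probabilities of the message's words, with add-one (Laplace) smoothing.

```python
import math
from collections import Counter

# Hypothetical toy corpus, invented for illustration (not taken from the dataset).
train = [
    ("spam", "win cash prize now"),
    ("spam", "free cash offer"),
    ("ham", "see you at lunch"),
    ("ham", "call me now"),
]

def fit(examples):
    """Count documents per class and word occurrences per class."""
    doc_counts = Counter()
    word_counts = {}
    for label, text in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return doc_counts, word_counts, vocab

def log_scores(text, doc_counts, word_counts, vocab, alpha=1.0):
    """Return log P(class) + sum of log P(word | class) for each class."""
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            if word not in vocab:
                continue  # words never seen in training are ignored
            count = word_counts[label][word]
            # add-alpha smoothing keeps unseen (word, class) pairs above zero
            score += math.log((count + alpha) / (total_words + alpha * len(vocab)))
        scores[label] = score
    return scores

def predict(text, model):
    """Pick the class with the highest log score."""
    scores = log_scores(text, *model)
    return max(scores, key=scores.get)

model = fit(train)
print(predict("free cash prize", model))
print(predict("see you at lunch", model))
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow for long messages.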


In [19]:
# Import the necessary Python libraries and load the dataset needed for this task
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

data = pd.read_csv("Spam_Detection_Data.csv", encoding="latin-1")
data.head()

| Unnamed: 0 | class | message | Unnamed: 2 | Unnamed: 3 | Unnamed: 4 |
|---|---|---|---|---|---|
| 0 | ham,"Go until jurong point, crazy.. Available ... | | | | |
| 1 | ham | Ok lar... Joking wif u oni... | | | |
| 2 | spam | Free entry in 2 a wkly comp to win FA Cup fina... | | | |
| 3 | ham | U dun say so early hor... U c already then say... | | | |
| 4 | ham,"Nah I don't think he goes to usf, he live... | | | | |


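From this dataset, we only need two columns to train a machine learning model for spam detection: message (the input text) and class (the label). Some rows, such as rows 0 and 4 above, were parsed with the label and the message fused into the class field, which leaves message empty. So let’s select these two columns as the new dataset and drop the rows with missing values: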

In [24]:
data = data[["class", "message"]]
# Drop all rows with NaN values, then reset the index
data = data.dropna().reset_index(drop=True)
print(data)

     class                                            message
0      ham                      Ok lar... Joking wif u oni...
1     spam  Free entry in 2 a wkly comp to win FA Cup fina...
2      ham  U dun say so early hor... U c already then say...
3      ham  Even my brother is not like to speak with me. ...
4      ham  As per your request 'Melle Melle (Oru Minnamin...
...    ...                                                ...
3950   ham  Why don't you wait 'til at least wednesday to ...
3951   ham                                       Huh y lei...
3952   ham              Will ?_ b going to esplanade fr home?
3953   ham  The guy did some bitching but I acted like i'd...
3954   ham                         Rofl. Its true to its name

[3955 rows x 2 columns]
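It can be worth checking how many rows the cleaning step removes and how the two classes are balanced afterwards. The sketch below uses a small hypothetical DataFrame (`toy`, with abbreviated made-up values) that mimics the malformed rows; the same calls should apply to the real `data` frame.

```python
import numpy as np
import pandas as pd

# Hypothetical rows mimicking the problem: in two of them the label and the
# message landed in one field, so "message" is missing.
toy = pd.DataFrame({
    "class": ['ham,"Go until ...', "ham", "spam", "ham", 'ham,"Nah I ...'],
    "message": [np.nan, "Ok lar...", "Free entry ...", "U dun say ...", np.nan],
})

# Missing values per column, before cleaning
print(toy.isna().sum())

# Drop incomplete rows and renumber the index from 0
clean = toy.dropna().reset_index(drop=True)
dropped = len(toy) - len(clean)
print("rows dropped:", dropped)

# Class balance after cleaning
print(clean["class"].value_counts())
```

If one class heavily outnumbers the other, plain accuracy can look good even when the model rarely flags spam, so the class counts are useful context for any later evaluation.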


Now let’s convert the messages into word-count vectors, split the dataset into training and test sets, and train the model to detect spam messages:

In [25]:
x = np.array(data["message"])
y = np.array(data["class"])

# CountVectorizer turns each message into a vector of word counts
cv = CountVectorizer()
# Learn the vocabulary and vectorize every message
# (note: the vocabulary is learned from all messages, including those that end up in the test set)
X = cv.fit_transform(x)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

clf = MultinomialNB()  # clf: classifier
clf.fit(X_train, y_train)

MultinomialNB()
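The test set created above is not used in this walkthrough. One possible way to use it, sketched here on hypothetical toy messages rather than the real dataset, is to split the raw text first, wrap the vectorizer and the classifier in a scikit-learn pipeline so the vocabulary is learned from the training messages only, and then score the held-out messages. This is an alternative arrangement, not the code used in this notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy messages, invented for illustration.
messages = [
    "win a free cash prize now", "free entry to win cash",
    "claim your free prize today", "urgent you won cash claim now",
    "are we meeting for lunch today", "ok see you at home",
    "call me when you are free", "i will be late for dinner",
]
labels = ["spam"] * 4 + ["ham"] * 4

# Split the raw text first so the vocabulary comes from training data only
train_msgs, test_msgs, y_train, y_test = train_test_split(
    messages, labels, test_size=0.25, random_state=42, stratify=labels
)

# The pipeline applies CountVectorizer, then MultinomialNB
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_msgs, y_train)

# Evaluate on messages the model has not seen
y_pred = model.predict(test_msgs)
print("accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, zero_division=0))
```

With only a handful of toy messages the numbers are meaningless; on the real data, the per-class precision and recall in the report say more than accuracy alone.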

Now let’s test this model by taking a message as user input and predicting whether it is spam or not:

In [26]:
sample = input('Enter a message:')
# Reuse the fitted vectorizer; a new name avoids overwriting the DataFrame `data`
features = cv.transform([sample]).toarray()
print(clf.predict(features))

Enter a message:You won $40 cash price
['spam']
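A bare label hides how confident the model is. As a rough sketch, `predict_proba` can report the estimated probability of each class alongside the prediction. The toy training messages and the `classify` helper below are hypothetical, used only to keep the example self-contained; with the notebook's own `cv` and `clf` the same two calls should work.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy messages, invented for illustration.
messages = [
    "win a free cash prize now", "free entry to win cash",
    "claim your free prize today",
    "are we meeting for lunch today", "ok see you at home",
    "see you after dinner",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

cv = CountVectorizer()
clf = MultinomialNB().fit(cv.fit_transform(messages), labels)

def classify(message):
    """Return the predicted label and the estimated probability of each class."""
    features = cv.transform([message])  # the sparse matrix can be passed directly
    probs = clf.predict_proba(features)[0]
    return clf.predict(features)[0], dict(zip(clf.classes_, probs))

label, probs = classify("You won a cash prize")
print(label, probs)
```

Naive Bayes probabilities tend to be pushed toward 0 or 1, so they are better read as a ranking signal than as calibrated confidence.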


### Summary:

So this is how you can train a machine learning model to detect whether an email or a message is spam. A spam detector classifies messages or emails from their text content, so that you are only notified about the ones that matter to you.