<a href="https://colab.research.google.com/github/alikhan1129/EMAIL-SPAM-DETECTION-WITH-MACHINE-LEARNING/blob/main/Task3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# **EMAIL SPAM DETECTION WITH MACHINE LEARNING**
---


We have all received spam emails. Spam, or junk mail, is email sent to a very large
number of recipients at once, frequently containing cryptic messages, scams, or, most
dangerously, phishing content.



In this project, we use Python to build an email spam detector, then use machine learning
to train it to classify emails as spam or non-spam. Let’s get started!

In [80]:
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
import string
import nltk

nltk.download('stopwords', quiet=True)  # fetch the stop-word corpus if it is not already installed
from nltk.corpus import stopwords
from sklearn.metrics import accuracy_score, classification_report
from sklearn.svm import SVC





In [81]:
# Load the dataset
df = pd.read_csv("/content/drive/MyDrive/csv_files/spam.csv", encoding='latin')


In [82]:
df

,v1,v2,Unnamed: 2,Unnamed: 3,Unnamed: 4
0,ham,"Go until jurong point, crazy.. Available only ...",,,
1,ham,Ok lar... Joking wif u oni...,,,
2,spam,Free entry in 2 a wkly comp to win FA Cup fina...,,,
3,ham,U dun say so early hor... U c already then say...,,,
4,ham,"Nah I don't think he goes to usf, he lives aro...",,,
...,...,...,...,...,...
5567,spam,This is the 2nd time we have tried 2 contact u...,,,
5568,ham,Will Ì_ b going to esplanade fr home?,,,
5569,ham,"Pity, * was in mood for that. So...any other s...",,,
5570,ham,The guy did some bitching but I acted like i'd...,,,


In [83]:
df = df.rename(columns={'v1': 'Spam/Not_Spam', 'v2': 'message'})

In [84]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5572 entries, 0 to 5571
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Spam/Not_Spam  5572 non-null   object
 1   message        5572 non-null   object
 2   Unnamed: 2     50 non-null     object
 3   Unnamed: 3     12 non-null     object
 4   Unnamed: 4     6 non-null      object
dtypes: object(5)
memory usage: 217.8+ KB


In [85]:
df.groupby('Spam/Not_Spam').describe()

,message,message,message,message,Unnamed: 2,Unnamed: 2,Unnamed: 2,Unnamed: 2,Unnamed: 3,Unnamed: 3,Unnamed: 3,Unnamed: 3,Unnamed: 4,Unnamed: 4,Unnamed: 4,Unnamed: 4
Spam/Not_Spam,count,unique,top,freq,count,unique,top,freq,count,unique,top,freq,count,unique,top,freq
ham,4825,4516,"Sorry, I'll call later",30,45,39,"bt not his girlfrnd... G o o d n i g h t . . .@""",3,10,9,GE,2,6,5,"GNT:-)""",2.0
spam,747,653,Please call our customer service representativ...,4,5,4,PO Box 5249,2,2,1,"MK17 92H. 450Ppw 16""",2,0,0,,
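
The counts in the summary above show that the two classes are far from balanced, which matters when reading any accuracy figure later on. The short sketch below only redoes the arithmetic on the numbers printed in that table (4825 ham, 747 spam, and their `unique` counts); the variable names are illustrative and not part of the notebook.

```python
# Counts copied from the groupby().describe() output above.
ham_total, ham_unique = 4825, 4516
spam_total, spam_unique = 747, 653

total = ham_total + spam_total

# Share of messages labelled spam.
spam_share = spam_total / total

# Accuracy of a trivial classifier that labels every message "ham".
majority_baseline = ham_total / total

# Messages whose text repeats an earlier message within the same class.
duplicate_messages = (ham_total - ham_unique) + (spam_total - spam_unique)

print(f"total messages:     {total}")
print(f"spam share:         {spam_share:.3f}")
print(f"majority baseline:  {majority_baseline:.3f}")
print(f"duplicate messages: {duplicate_messages}")
```

Because roughly 87% of the messages are ham, a model has to beat that baseline by a clear margin before its accuracy says much about spam detection.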


In [86]:
df_copy = df['message'].copy()

In [87]:
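# Build the stop-word set once; calling stopwords.words() for every word is very slow.
STOP_WORDS = set(stopwords.words('english'))  # requires the NLTK 'stopwords' corpus

def text_preprocess(text):
    # Strip punctuation, then drop English stop words (case-insensitive match)
    text = text.translate(str.maketrans('', '', string.punctuation))
    words = [word for word in text.split() if word.lower() not in STOP_WORDS]
    return " ".join(words)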

In [88]:
df_copy = df_copy.apply(text_preprocess)

In [89]:
df_copy

0       Go jurong point crazy Available bugis n great ...
1                                 Ok lar Joking wif u oni
2       Free entry 2 wkly comp win FA Cup final tkts 2...
3                     U dun say early hor U c already say
4             Nah dont think goes usf lives around though
                              ...                        
5567    2nd time tried 2 contact u U å£750 Pound prize...
5568                          Ì b going esplanade fr home
5569                          Pity mood Soany suggestions
5570    guy bitching acted like id interested buying s...
5571                                       Rofl true name
Name: message, Length: 5572, dtype: object
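
The output above contains the token `Soany` in row 5569. This appears to be a side effect of deleting punctuation outright: `So...any` collapses into one word, which then no longer matches any stop word. The sketch below contrasts deletion with replacing punctuation by spaces. It is an illustration under assumptions: `DEMO_STOP_WORDS` is a tiny hypothetical stand-in for the NLTK list so that the code runs without a corpus download, and the message text is modelled on row 5569 rather than read from the dataset.

```python
import string

# Hypothetical, deliberately tiny stop-word set (NOT the NLTK list).
DEMO_STOP_WORDS = {"was", "in", "for", "that", "so", "any", "other"}

def strip_punctuation(text):
    """Delete punctuation characters, as text_preprocess does."""
    return text.translate(str.maketrans('', '', string.punctuation))

def space_punctuation(text):
    """Replace every punctuation character with a space instead."""
    table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
    return text.translate(table)

def drop_stop_words(text, stop_words):
    """Keep only the words whose lowercase form is not a stop word."""
    return " ".join(w for w in text.split() if w.lower() not in stop_words)

# Message modelled on row 5569 (assumed wording).
message = "Pity, * was in mood for that. So...any other suggestions?"

deleted = drop_stop_words(strip_punctuation(message), DEMO_STOP_WORDS)
spaced = drop_stop_words(space_punctuation(message), DEMO_STOP_WORDS)

print(deleted)
print(spaced)
```

Whether the spaced variant improves the classifier is an empirical question; it simply avoids fusing neighbouring words.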

In [90]:
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(df_copy)
y = df['Spam/Not_Spam']
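
For readers who want to see what the vectorizer computes, here is a minimal standard-library sketch of TF-IDF weighting. It assumes the scikit-learn defaults (raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, L2 normalisation per document) and simplifies tokenisation to a whitespace split; the three toy documents are made up for illustration.

```python
import math
from collections import Counter

# Made-up, already preprocessed toy documents.
docs = ["free entry win prize", "ok lar joking", "win free prize now"]
tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Vocabulary and document frequency (number of documents containing each term).
vocab = sorted({term for doc in tokenized for term in doc})
doc_freq = Counter(term for doc in tokenized for term in set(doc))

# Smoothed inverse document frequency.
idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}

def tfidf_vector(tokens):
    """Return the L2-normalised TF-IDF vector of one tokenised document."""
    counts = Counter(tokens)
    raw = [counts[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm if norm else 0.0 for v in raw]

vectors = [tfidf_vector(doc) for doc in tokenized]

for term in ("free", "ok"):
    print(f"idf[{term!r}] = {idf[term]:.3f}")
```

Terms that occur in few documents (here `ok`) receive a larger IDF than terms shared across documents (here `free`), so rare words carry more weight in the feature matrix.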

In [91]:
# Split the data into training (90%) and test (10%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=2)

In [92]:
# Model selection and training (SVM)
model_svm = SVC(kernel='linear')
model_svm.fit(X_train, y_train)
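
In the cells above, the TF-IDF vectorizer is fitted on all messages before the split, so the test messages influence the vocabulary and IDF weights. One possible alternative, sketched here, is to split the raw text first and wrap the vectorizer and the SVM in a scikit-learn `Pipeline`, which fits the vectorizer on the training portion only. The toy messages, the `stratify` argument, and the name `spam_pipeline` are assumptions for illustration, and the `text_preprocess` step is left out for brevity.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Hypothetical toy data standing in for df['message'] and df['Spam/Not_Spam'].
texts = [
    "win a free prize now", "free entry claim your cash prize",
    "urgent call to claim free cash", "you won a free holiday prize",
    "are we still meeting for lunch", "ok see you at home later",
    "can you call me after class", "sorry I will be late tonight",
]
labels = ["spam"] * 4 + ["ham"] * 4

# Split the raw text first; stratify keeps the class ratio in both parts.
texts_train, texts_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=2, stratify=labels
)

# The pipeline fits TF-IDF on the training text only, then trains the SVM.
spam_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svm", SVC(kernel="linear")),
])
spam_pipeline.fit(texts_train, y_train)

# Raw strings go straight in; the pipeline vectorises them itself.
predictions = spam_pipeline.predict(texts_test)
print(list(zip(texts_test, predictions)))
```

A side benefit of this layout is that several new messages can be classified in a single `predict` call on a list of strings.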

In [93]:
y_pred_svm = model_svm.predict(X_test)

In [94]:
accuracy_svm = accuracy_score(y_test, y_pred_svm)
report_svm = classification_report(y_test, y_pred_svm)

print("SVM Model Accuracy:", accuracy_svm)
print("SVM Classification Report:\n", report_svm)

SVM Model Accuracy: 0.9713261648745519
SVM Classification Report:
               precision    recall  f1-score   support

         ham       0.97      1.00      0.98       490
        spam       1.00      0.76      0.87        68

    accuracy                           0.97       558
   macro avg       0.98      0.88      0.93       558
weighted avg       0.97      0.97      0.97       558
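
The report above can be unpacked into raw counts. With 558 test messages and an accuracy of 0.9713, 542 predictions are correct and 16 are wrong; the rounded spam precision of 1.00 and recall of 0.76 are only consistent with all 16 errors being spam messages labelled as ham. The sketch below recomputes the spam metrics from those counts. The counts are inferred from the printed report, not printed by the notebook itself.

```python
# Confusion-matrix counts inferred from the classification report (spam = positive).
tp, fn = 52, 16   # spam caught / spam missed (support 68)
fp, tn = 0, 490   # ham flagged as spam / ham passed through (support 490)

total = tp + fn + fp + tn

accuracy = (tp + tn) / total
spam_precision = tp / (tp + fp)   # of messages flagged as spam, how many are spam
spam_recall = tp / (tp + fn)      # of actual spam, how much was caught
spam_f1 = 2 * spam_precision * spam_recall / (spam_precision + spam_recall)

print(f"accuracy:       {accuracy:.4f}")
print(f"spam precision: {spam_precision:.2f}")
print(f"spam recall:    {spam_recall:.2f}")
print(f"spam f1-score:  {spam_f1:.2f}")
```

Read this way, the model never flags a legitimate message in this test split, but it lets roughly one spam message in four through.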



In [99]:
# Uses the trained `model_svm` and the fitted `tfidf` vectorizer from the cells above
sample_email = "Congratulations, you've won a free iPhone! Click here to claim your prize."
# Other messages to try (a bare string on its own line is not part of sample_email):
# "Get rich quick! Earn $1,000 a day from home with our amazing system."
# "Hi there, I'm a Nigerian prince, and I need your help transferring a large sum of money."

# Preprocess the sample email
sample_email = text_preprocess(sample_email)

# Transform the sample email using the same TF-IDF vectorizer
sample_email_vector = tfidf.transform([sample_email])

# Use the trained model to predict
prediction = model_svm.predict(sample_email_vector)

if prediction[0] == 'spam':
    print("The sample email is classified as SPAM.")
else:
    print("The sample email is classified as NOT SPAM.")


The sample email is classified as SPAM.
