## Spam Email Detection

In [1]:
import pandas as pd

## Reading Dataset

In [2]:
data = pd.read_csv('emails.csv')
data

Unnamed: 0,text,spam
0,Subject: naturally irresistible your corporate...,1
1,Subject: the stock trading gunslinger fanny i...,1
2,Subject: unbelievable new homes made easy im ...,1
3,Subject: 4 color printing special request add...,1
4,"Subject: do not have money , get software cds ...",1
...,...,...
5723,Subject: re : research and development charges...,0
5724,"Subject: re : receipts from visit jim , than...",0
5725,Subject: re : enron case study update wow ! a...,0
5726,"Subject: re : interest david , please , call...",0


In [3]:
data.spam.value_counts()

0    4360
1    1368
Name: spam, dtype: int64
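
The counts above show that the classes are imbalanced, which is worth quantifying before reading any accuracy figure. The short sketch below only uses the two counts printed above; the variable names are illustrative and not part of the original notebook.

```python
# Class counts copied from the value_counts() output above.
counts = {0: 4360, 1: 1368}

total = sum(counts.values())
spam_share = counts[1] / total

# Accuracy of a trivial model that always predicts the majority class (not spam).
majority_baseline = max(counts.values()) / total

print(f"total emails: {total}")
print(f"spam share: {spam_share:.3f}")
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
```

Any classifier trained on this data should be judged against that baseline rather than against 50%.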

In [4]:
data.isnull().sum()

text    0
spam    0
dtype: int64

In [5]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    data['text'], data['spam'],
    test_size=0.3, random_state=0, shuffle=True, stratify=data['spam'])
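
As a rough check of what `stratify` does, the sketch below splits synthetic labels that have the same class counts as this dataset and compares the spam share before and after the split. The placeholder features and variable names are assumptions made for illustration; the real notebook splits the email text itself.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels with the same class counts as the dataset (4360 not spam, 1368 spam).
y = np.array([0] * 4360 + [1] * 1368)
X = np.arange(len(y))  # placeholder features; only the labels matter here

_, _, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, shuffle=True, stratify=y)

print(f"spam share overall: {y.mean():.4f}")
print(f"spam share in train: {y_tr.mean():.4f}")
print(f"spam share in test:  {y_te.mean():.4f}")
```

With stratification the three shares should stay very close to each other, so the test set reflects the same imbalance as the full data.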

In [6]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier

In [7]:
tfidf = TfidfVectorizer()

In [8]:
classifier = RandomForestClassifier(n_estimators=100, n_jobs=1)

## Creating Pipeline

In [9]:
# Chain the TF-IDF vectorizer and the classifier so both are fitted on the training text only.
model = Pipeline([('tfidf', tfidf),
                  ('RandomForestClassifier', classifier)])

In [10]:
model.fit(X_train, y_train)

Pipeline(memory=None,
         steps=[('tfidf',
                 TfidfVectorizer(analyzer='word', binary=False,
                                 decode_error='strict',
                                 dtype=<class 'numpy.float64'>,
                                 encoding='utf-8', input='content',
                                 lowercase=True, max_df=1.0, max_features=None,
                                 min_df=1, ngram_range=(1, 1), norm='l2',
                                 preprocessor=None, smooth_idf=True,
                                 stop_words=None, strip_accents=None,
                                 sublinear_tf=False,
                                 token_pattern='...
                 RandomForestClassifier(bootstrap=True, ccp_alpha=0.0,
                                        class_weight=None, criterion='gini',
                                        max_depth=None, max_features='auto',
                                        max_leaf_nodes=None, max_samples=None

In [11]:
data.isna().sum()

text    0
spam    0
dtype: int64

In [12]:
y_pred = model.predict(X_test)

In [13]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

## Classification Report

In [14]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.96      1.00      0.98      1308
           1       1.00      0.87      0.93       411

    accuracy                           0.97      1719
   macro avg       0.98      0.94      0.96      1719
weighted avg       0.97      0.97      0.97      1719



## Confusion Matrix

In [15]:
confusion_matrix(y_test, y_pred)

array([[1308,    0],
       [  52,  359]], dtype=int64)

## Accuracy Score

In [16]:
accuracy_score(y_test,y_pred)

0.9697498545666084
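
The accuracy, precision and recall reported above can all be recovered from the confusion matrix. The sketch below redoes that arithmetic with the four counts printed earlier; the variable names are illustrative, and the row/column layout follows scikit-learn's convention (rows are true classes, columns are predicted classes).

```python
import numpy as np

# Confusion matrix copied from the output above.
# Rows: true class (0 = not spam, 1 = spam); columns: predicted class.
cm = np.array([[1308, 0],
               [52, 359]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
spam_precision = tp / (tp + fp)   # predicted spam that really is spam
spam_recall = tp / (tp + fn)      # real spam that was caught
ham_precision = tn / (tn + fn)    # predicted not-spam that really is not spam

print(f"accuracy:       {accuracy:.4f}")
print(f"spam precision: {spam_precision:.2f}")
print(f"spam recall:    {spam_recall:.2f}")
print(f"ham precision:  {ham_precision:.2f}")
```

Read this way, the model never flags a legitimate email as spam (no false positives), but 52 of the 411 spam emails in the test set pass through undetected.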

## Predicting Spam or Not Spam

In [17]:
email = input("Paste the email:")
# predict() returns one label per input text; take the single label for this email.
detect = model.predict([email])[0]
if detect == 0:
    print("\n\n\n\nIt is not a spam email.")
else:
    print("\n\n\n\nIt is a spam email.")

Paste the email:your online sales are low because you don _ t have enough visitors ?  submitting your website in search engines may increase  your online sales dramatically .  lf you invested time and money into your website , you  simply must submit your website  oniine otherwise it wili be invisibie virtualiy , which means efforts spent in vain .  if you want  people to know about your website and boost your revenues , the only way to do  that is to  make your site visibie in piaces  where people search for information , i . e .  submit your  website in muitiple search engines .  submit your website oniine  and watch visitors stream to your e - business .  best regards ,  kenethmckenzie _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ not interested .




It is a spam email.
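
For reuse outside an interactive cell, the prediction step above could be wrapped in a small function. The sketch below is one possible shape, not part of the original notebook: `describe_email` is a hypothetical helper, and the four training texts are invented toy data that stand in for `emails.csv` only so the block runs on its own.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline


def describe_email(model, text):
    """Return a readable verdict for one email, given a fitted pipeline."""
    label = int(model.predict([text])[0])
    return "It is a spam email." if label == 1 else "It is not a spam email."


# Toy data invented for illustration; the real notebook trains on emails.csv.
toy_texts = [
    "win free money now claim your prize",
    "cheap software cds limited offer click here",
    "meeting agenda for the research project tomorrow",
    "please review the attached report before the call",
]
toy_labels = [1, 1, 0, 0]

toy_model = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("forest", RandomForestClassifier(n_estimators=10, random_state=0)),
]).fit(toy_texts, toy_labels)

verdict = describe_email(toy_model, "claim your free prize now")
print(verdict)
```

In the notebook itself, the fitted `model` would be passed in place of `toy_model`.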
