# Email Spam Classifier (Supervised Learning)

### Step 1: Import Libraries

In [11]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
    

### Step 2: Load Dataset

In [12]:

# Load dataset from CSV
df = pd.read_csv("emails.csv")

# Display first few rows
df.head()
    

| Unnamed: 0 | label | text |
| --- | --- | --- |
| 0 | 1 | ounce feather bowl hummingbird opec moment ala... |
| 1 | 1 | wulvob get your medircations online qnb ikud v... |
| 2 | 0 | computer connection from cnn com wednesday es... |
| 3 | 1 | university degree obtain a prosperous future m... |
| 4 | 0 | thanks for all your answers guys i know i shou... |
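Before splitting the data, it can help to check the class balance and look for missing text, since TfidfVectorizer raises an error on NaN entries. The sketch below uses a small hypothetical DataFrame (`toy_df`) as a stand-in for emails.csv, so the rows and counts are illustrative only. It assumes the same `label` and `text` columns, with 1 meaning spam as in Step 5.

```python
import pandas as pd

# Toy stand-in for emails.csv (hypothetical rows, not taken from the real file)
toy_df = pd.DataFrame({
    "label": [1, 1, 0, 1, 0],
    "text": [
        "win a prize now",
        "cheap meds online",
        "meeting at noon",
        None,  # simulates a row with missing text
        "thanks for your answers",
    ],
})

# Count messages per class (1 = spam, 0 = not spam)
label_counts = toy_df["label"].value_counts()

# Count rows whose text is missing
missing_text = int(toy_df["text"].isna().sum())

# Drop rows with missing text before vectorizing
clean_df = toy_df.dropna(subset=["text"])

print(label_counts.to_dict())
print("Missing text rows:", missing_text)
print("Rows after cleaning:", len(clean_df))
```

On the real dataset the same three checks would run against `df` instead of `toy_df`.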


### Step 3: Data Preprocessing

In [13]:

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(df['text'], df['label'], test_size=0.2, random_state=42)

# Convert text data into numerical format using TF-IDF Vectorizer
vectorizer = TfidfVectorizer(stop_words='english')
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
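The cell above fits the vectorizer on the training split only and then reuses it on the test split. A minimal sketch of why this matters, using made-up sentences (`train_docs`, `test_docs`, and `vec` are hypothetical names): the vocabulary is learned from the training texts, so words that appear only in the test texts are ignored and both matrices end up with the same number of columns.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up documents for illustration only
train_docs = [
    "free prize claim now",
    "meeting agenda for monday",
    "claim your free offer",
]
test_docs = ["free lottery prize"]  # "lottery" never appears in train_docs

vec = TfidfVectorizer(stop_words="english")

# fit_transform learns the vocabulary and IDF weights from the training texts
train_matrix = vec.fit_transform(train_docs)

# transform reuses that vocabulary; unseen words such as "lottery" are dropped
test_matrix = vec.transform(test_docs)

print("Train shape:", train_matrix.shape)
print("Test shape:", test_matrix.shape)
print("'lottery' in vocabulary:", "lottery" in vec.vocabulary_)
```

Fitting on the full dataset before splitting would let test-set word statistics leak into the features, which can make the evaluation look better than it should.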
    

### Step 4: Train Naïve Bayes Classifier

In [14]:

# Train Naïve Bayes model
model = MultinomialNB()
model.fit(X_train_tfidf, y_train)

# Predict on test data
y_pred = model.predict(X_test_tfidf)

# Evaluate model
print("Accuracy:", accuracy_score(y_test, y_pred))

Accuracy: 0.9768723786698622


In [15]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.96      0.99      0.98      7938
           1       0.99      0.96      0.98      8752

    accuracy                           0.98     16690
   macro avg       0.98      0.98      0.98     16690
weighted avg       0.98      0.98      0.98     16690
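Step 1 imports confusion_matrix, but the notebook never calls it. As a rough sketch of how it could complement the report above, the example below computes a confusion matrix on small made-up label lists (`toy_true` and `toy_pred` are hypothetical, not the notebook's `y_test` and `y_pred`) and unpacks the four cells.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration (1 = spam, 0 = not spam)
toy_true = [0, 0, 0, 1, 1, 1, 1, 0]
toy_pred = [0, 0, 1, 1, 1, 0, 1, 0]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(toy_true, toy_pred)

# For binary labels, ravel() yields tn, fp, fn, tp in that order
tn, fp, fn, tp = cm.ravel()

print(cm)
print("Legitimate mail flagged as spam (false positives):", fp)
print("Spam that slipped through (false negatives):", fn)
```

With the real predictions, `confusion_matrix(y_test, y_pred)` could be passed to `sns.heatmap(cm, annot=True, fmt="d")` to draw it with the seaborn import from Step 1.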



### Step 5: Test with Custom Message

In [16]:
# Custom test case
def predict_message(msg):
    msg_tfidf = vectorizer.transform([msg])
    prediction = model.predict(msg_tfidf)[0]
    return "Spam" if prediction == 1 else "Not Spam"
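The function above returns only a hard label. One possible variation, sketched here under assumptions, also reports the estimated spam probability through MultinomialNB's predict_proba. The tiny training set, the names `toy_vectorizer`, `toy_model`, and `predict_with_probability`, and the `threshold` parameter are all hypothetical and exist only to make the example self-contained.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Tiny made-up training set (1 = spam, 0 = not spam)
toy_texts = [
    "claim your free prize now",
    "limited offer click here",
    "free offer ends today",
    "interview scheduled for monday",
    "please review the attached report",
    "thanks for your answers",
]
toy_labels = [1, 1, 1, 0, 0, 0]

toy_vectorizer = TfidfVectorizer(stop_words="english")
toy_model = MultinomialNB()
toy_model.fit(toy_vectorizer.fit_transform(toy_texts), toy_labels)

def predict_with_probability(msg, threshold=0.5):
    """Return a label and the estimated probability that msg is spam."""
    msg_tfidf = toy_vectorizer.transform([msg])
    # Find which column of predict_proba corresponds to class 1 (spam)
    spam_index = list(toy_model.classes_).index(1)
    spam_prob = float(toy_model.predict_proba(msg_tfidf)[0][spam_index])
    label = "Spam" if spam_prob >= threshold else "Not Spam"
    return label, spam_prob

label, prob = predict_with_probability("free prize offer")
print(label, round(prob, 3))
```

Raising `threshold` above 0.5 would trade some missed spam for fewer legitimate emails flagged, which may matter for messages like interview invitations.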

In [21]:
# Test with new messages
print(predict_message(input("Enter your email: ")))

Enter your email:  Hi Jeswin, Your interview has been scheduled for Eucloid Data Solutions. Interview Time- 01 Apr 2:00 PM IST JD Name- Senior Python Developer - Generative AI Applications Call Type- Video (Conference will be recorded) Coding during interview - Mandatory, Laptop is required.  To join online conference(Room-14-GM), to see JD and to write code during interview, please visit https://www.cangra.com/?cnd=pfgson3g2xtcurx4296596  To add this interview to your Calendar, please press 'Add to Calendar' /'Accept' button in mail.  How to join (Video Tutorial): https://www.cangra.com/tutorial  Joining this conference will be considered as your consent for recording. Please 'reply to all' in this mail if you have any concern.  This email is being sent on behalf of Eucloid Data Solutions If you would like to verify the legitimacy of the source, please contact the recruiter in CC with domain name @eucloid.com.  Regards, CANGRA Talents Interview Services


Not Spam


Sample messages used for testing:

Hey Jeswin,
 
I gotta tell you something crazy…
 
A few years ago, I was just like you—curious about cybersecurity but overwhelmed by all the technical jargon. Everyone said, "You need a degree, years of experience, and certifications to even get started."
 
Yeah, right.
 
Fast forward to today, and I've trained thousands of students to become ethical hackers—and many of them had zero prior experience. Some even landed high-paying jobs right after completing my course!
 
So, what's the secret?
 
Simple. You just need the right roadmap.
 
And I've put it all inside my Certified Cyber Warrior Course—a step-by-step program designed to take you from beginner to pro, even if you've never written a line of code before.
 
And here's the kicker…
 
Today I have launched Nav Varsha Special 0ffer.
For the next 24 hours only, you can grab it for 98% 0ff (yep, you read that right).
 
👉 Click here to claim your access now
 
But once this timer hits zero, the price goes back up. So if you've ever thought about getting into cybersecurity, this is your chance.
 
 
To Your Success,
Gautam Kumawat
Founder, HackingFlix
 
 
P.S. Don't wait too long—the last time I ran this deal, spots filled up FAST. Click here to secure yours now
 
ENROLL NOW
Ending at 11 PM Today!
 
Alert: Prices will increase after 0ffer ends.

Hi Jeswin,
Your interview has been scheduled for Eucloid Data Solutions.
Interview Time- 01 Apr 2:00 PM IST
JD Name- Senior Python Developer - Generative AI Applications
Call Type- Video (Conference will be recorded)
Coding during interview - Mandatory, Laptop is required.

To join online conference(Room-14-GM), to see JD and to write code during interview, please visit
https://www.cangra.com/?cnd=pfgson3g2xtcurx4296596

To add this interview to your Calendar, please press 'Add to Calendar' /'Accept' button in mail.

How to join (Video Tutorial): https://www.cangra.com/tutorial

Joining this conference will be considered as your consent for recording. Please 'reply to all' in this mail if you have any concern.

This email is being sent on behalf of Eucloid Data Solutions If you would like to verify the legitimacy of the source, please contact the recruiter in CC with domain name @eucloid.com.

Regards,
CANGRA Talents Interview Services