# Human vs Machine: Who Detects Phishing Emails Better?

This notebook explores a simple machine learning model that classifies emails as either **phishing** or **legit**, and compares its performance to that of human participants.

We use a small dataset of email examples and train a basic model using `scikit-learn`. The goal is to investigate whether an AI-based tool can outperform students at identifying suspicious emails.


### Step 1: Load the Email Dataset

We begin by loading two CSV files, `train_emails.csv` and `test_emails.csv`, which together contain a small collection of legitimate and phishing emails. Each entry includes:

- `subject`: the email subject line
- `body`: the email message
- `label`: either `phishing` or `legit`


In [486]:
# Step 1: Load the training and test datasets
import pandas as pd

train_df = pd.read_csv("train_emails.csv")
test_df = pd.read_csv("test_emails.csv")

# Preview the datasets (optional)
print("Training set:")
display(train_df.head())

print("Test set:")
display(test_df.head())


Training set:


Unnamed: 0,subject,body,label
0,Your PayPal account has been suspended,"Dear Customer, We have temporarily suspended y...",phishing
1,Google Docs ­– Document shared with you,"Hello, You have been granted access to a docum...",phishing
2,Follow-up: Q2 Project Meeting,"Hi Team, Thanks for joining today’s meeting. A...",legit
3,Lunch next week?,"Hey Jane, Are you free for lunch sometime next...",legit
4,Updated Remote Work Policy,"Dear all, HR has updated the remote work polic...",legit


Test set:


Unnamed: 0,subject,body,label
0,LinkedIn Professional Network: Security Review...,"Dear [User], Our automated security system has...",phishing
1,Google Workspace Security Alert,Important: A new sign-in was detected on your ...,legit
2,E-ZPass Notice: Outstanding Balance Review,Notice: Our records indicate an unpaid toll ba...,phishing
3,Starbucks Rewards: Your July Member Bonus,Hello Rewards Member! As part of our Summer Ap...,phishing
4,Microsoft 365 Business: Action Required,Your organization's Microsoft 365 Business acc...,legit


### Step 2: Clean and Combine the Text

We now combine the email subject and body into a single field and convert all the text to lowercase.

This makes it easier for the machine learning model to process and identify patterns in the language.


In [487]:
# Step 2: Combine subject + body and lowercase everything

train_df['text'] = (train_df['subject'] + " " + train_df['body']).str.lower()
test_df['text'] = (test_df['subject'] + " " + test_df['body']).str.lower()

# Drop original columns to keep things clean (optional)
train_df = train_df.drop(columns=['subject', 'body'])
test_df = test_df.drop(columns=['subject', 'body'])

# Preview cleaned data
train_df.head()


Unnamed: 0,label,text
0,phishing,your paypal account has been suspended dear cu...
1,phishing,"google docs ­– document shared with you hello,..."
2,legit,"follow-up: q2 project meeting hi team, thanks ..."
3,legit,"lunch next week? hey jane, are you free for lu..."
4,legit,"updated remote work policy dear all, hr has up..."


### Step 3: Convert Text to Vectors Using TF-IDF

Since machine learning models work with numbers, we use `TfidfVectorizer` to transform each email into a numerical vector of TF-IDF (Term Frequency-Inverse Document Frequency) scores.

TF-IDF gives higher weight to words that appear in only a few emails, such as "verify" or "click", and lower weight to words that appear in almost every email, such as "and" or "the".


In [488]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Step 3: Create and apply TF-IDF vectorizer
vectorizer = TfidfVectorizer()

# IMPORTANT: fit only on training data
X_train = vectorizer.fit_transform(train_df['text'])
X_test = vectorizer.transform(test_df['text'])  # use the same vectorizer!

# Show the shape of the resulting matrix
print(f"TF-IDF matrix shape: {X_train.shape}")


TF-IDF matrix shape: (46, 779)


### Step 4: Train the Naive Bayes Classifier

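We convert the labels (`phishing`, `legit`) into numbers. The data already comes split into `train_emails.csv` and `test_emails.csv`, so no further split is needed. We also rebuild the TF-IDF vectorizer, this time removing English stop words and including bigrams.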

Then, we train a **Naive Bayes** classifier — a simple, fast model commonly used in spam detection.
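As a small illustration of the label conversion, the sketch below runs `LabelEncoder` on a hypothetical list of four labels. `LabelEncoder` sorts class names, so `legit` should map to 0 and `phishing` to 1; the variable names here are for illustration only.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of labels, in the same format as the `label` column
labels = ["phishing", "legit", "legit", "phishing"]

le_demo = LabelEncoder()
encoded = le_demo.fit_transform(labels)

# classes_ lists the original names in the order of their integer codes
mapping = {cls: code for code, cls in enumerate(le_demo.classes_)}
print(mapping)
print(encoded.tolist())
```

Knowing this order matters when reading the confusion matrix later: row and column 0 refer to `legit`, row and column 1 to `phishing`.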


In [489]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
from sklearn.naive_bayes import MultinomialNB


# Fit the vectorizer only on training data
vectorizer = TfidfVectorizer(
    stop_words='english',  # Remove common words
    ngram_range=(1, 2),    # Include unigrams and bigrams
)
X_train = vectorizer.fit_transform(train_df['text'])
X_test = vectorizer.transform(test_df['text'])  # Use same vectorizer

# Encode labels as integers (LabelEncoder sorts classes alphabetically: legit=0, phishing=1)
le = LabelEncoder()
y_train = le.fit_transform(train_df['label'])
y_test = le.transform(test_df['label'])  # Apply same encoder


# Train the model.
# class_prior=[0.5, 0.5] gives phishing and legit equal prior weight,
# regardless of which class is more frequent in the training data.
modelNB = MultinomialNB(class_prior=[0.5, 0.5])

modelNB.fit(X_train, y_train)


0,1,2
,alpha,1.0
,force_alpha,True
,fit_prior,True
,class_prior,"[0.5, 0.5]"


### Step 5: Evaluate the Model

We now test the model on unseen emails using several performance metrics:

- **Accuracy**: How many emails it got right overall.
- **Confusion Matrix**: Breakdown of true/false positives/negatives.
- **Precision/Recall**: How well it identifies phishing emails without making too many mistakes.


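This model was trained on `train_emails.csv` and evaluated on a separate, unseen file, `test_emails.csv`. Because the vectorizer and the label encoder are fitted on the training data only, the evaluation simulates real-world use and avoids leaking test information into training.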


In [490]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Predict on test set and evaluate
y_pred_nb = modelNB.predict(X_test)

# Accuracy
print(f"Accuracy: {accuracy_score(y_test, y_pred_nb) * 100:.2f}%")

# Confusion Matrix
print("=== Confusion Matrix: ===")
print(confusion_matrix(y_test, y_pred_nb))

# Classification Report
print("\n=== Classification Report: ===")
print(classification_report(y_test, y_pred_nb, target_names=le.classes_))


Accuracy: 92.50%
=== Confusion Matrix: ===
[[20  2]
 [ 1 17]]

=== Classification Report: ===
              precision    recall  f1-score   support

       legit       0.95      0.91      0.93        22
    phishing       0.89      0.94      0.92        18

    accuracy                           0.93        40
   macro avg       0.92      0.93      0.92        40
weighted avg       0.93      0.93      0.93        40
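The figures in the report can be recomputed by hand from the confusion matrix. The sketch below hard-codes the matrix printed above (rows are true labels, columns are predictions, with `legit` = 0 and `phishing` = 1) and derives accuracy, precision, recall, and F1 for the phishing class.

```python
import numpy as np

# Confusion matrix copied from the output above
cm = np.array([[20, 2],
               [1, 17]])

# For binary labels, ravel() yields: true neg, false pos, false neg, true pos
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # share of all emails classified correctly
precision = tp / (tp + fp)        # of emails flagged as phishing, how many were phishing
recall = tp / (tp + fn)           # of actual phishing emails, how many were caught
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here the model missed 1 of 18 phishing emails and wrongly flagged 2 of 22 legitimate ones.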



### Step 6: Compare Different Machine Learning Models

We'll compare three different machine learning approaches for phishing detection:

1. **Naive Bayes (NB)**:
   - Uses probability theory
   - Fast and works well with text
   - Good baseline for text classification

2. **Logistic Regression (LR)**:
   - Linear model that's effective for binary classification
   - Easy to interpret which words influence decisions
   - Works well when classes can be separated by a linear boundary (a hyperplane in TF-IDF feature space)

3. **Support Vector Machine (SVM)**:
   - Finds the best boundary between classes
   - Can capture more complex relationships
   - Often performs well in text classification tasks

All models will be:
- Trained on the same TF-IDF vectorized data
- Evaluated using the same metrics 
- Tested on the same unseen test set
- Validated using cross-validation for reliability

This comparison will help us understand which algorithm performs best for phishing detection on this small dataset.

In [491]:
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score

# Initialize and train both models (NB is already trained)
modelLR = LogisticRegression(class_weight='balanced', max_iter=1000)
modelSVM = SVC(kernel='linear', class_weight='balanced', probability=True)

# Train both models
modelLR.fit(X_train, y_train)
modelSVM.fit(X_train, y_train)

# Get predictions and calculate accuracy
y_pred_lr = modelLR.predict(X_test)
y_pred_svm = modelSVM.predict(X_test)
acc_nb = accuracy_score(y_test, y_pred_nb)
acc_lr = accuracy_score(y_test, y_pred_lr)
acc_svm = accuracy_score(y_test, y_pred_svm)

print("=== Model Accuracies ===")
print(f"Naive Bayes: {acc_nb * 100:.2f}%")
print(f"Logistic Regression: {acc_lr * 100:.2f}%")
print(f"Support Vector Machine: {acc_svm * 100:.2f}%")

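# Perform 5-fold cross-validation on the training set.
# Note: X_train was vectorized on the full training set, so the TF-IDF
# vocabulary and IDF statistics are shared across folds.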
cv_scores_nb = cross_val_score(modelNB, X_train, y_train, cv=5)
cv_scores_lr = cross_val_score(modelLR, X_train, y_train, cv=5)
cv_scores_svm = cross_val_score(modelSVM, X_train, y_train, cv=5)

print("\n=== Cross-Validation Scores ===")
print(f"Naive Bayes: {cv_scores_nb.mean():.3f} (+/- {cv_scores_nb.std() * 2:.3f})")
print(f"Logistic Regression: {cv_scores_lr.mean():.3f} (+/- {cv_scores_lr.std() * 2:.3f})")
print(f"Support Vector Machine: {cv_scores_svm.mean():.3f} (+/- {cv_scores_svm.std() * 2:.3f})")

=== Model Accuracies ===
Naive Bayes: 92.50%
Logistic Regression: 90.00%
Support Vector Machine: 90.00%

=== Cross-Validation Scores ===
Naive Bayes: 0.978 (+/- 0.089)
Logistic Regression: 0.913 (+/- 0.165)
Support Vector Machine: 0.891 (+/- 0.141)


### Understanding Cross-Validation

Cross-validation helps us check that a model's performance is reliable rather than the result of one lucky split. Here's how it works:

1. The training data is split into 5 parts
2. The model is trained 5 times:
   - Each time using 4 parts for training
   - And 1 part for testing
   - Using a different test part each time
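The splitting procedure can be sketched directly. When `cross_val_score` receives a classifier and an integer `cv`, scikit-learn uses `StratifiedKFold`, which keeps the class ratio roughly equal in every part. The example below assumes 46 emails (the training-set size shown earlier) with a hypothetical 23/23 class split, since the real class balance is not shown in this notebook.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical labels: 46 emails with an assumed even class split
y_demo = np.array([0] * 23 + [1] * 23)
X_demo = np.zeros((46, 1))  # placeholder features; only the row indices matter

skf = StratifiedKFold(n_splits=5)
fold_sizes = []
seen = []
for fold, (train_idx, val_idx) in enumerate(skf.split(X_demo, y_demo), start=1):
    fold_sizes.append(len(val_idx))
    seen.extend(val_idx.tolist())
    print(f"Fold {fold}: train on {len(train_idx)} emails, validate on {len(val_idx)}")
```

Every email lands in the validation part exactly once, so each one is used for testing once and for training four times.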

The score format `0.XXX (+/- 0.YYY)` means:
- `0.XXX` is the average accuracy across all 5 tests
- `+/- 0.YYY` is two standard deviations of the 5 fold scores, which shows how much they varied
  - Small variation = consistent performance
  - Large variation = unstable performance

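This gives us more confidence in our results than a single test score.

In other words, we repeatedly retrain the model on a slightly smaller subset of the training set and test it on the held-out remainder. With only 46 training emails, each held-out part contains about 9 emails, so individual fold scores can swing noticeably.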
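One refinement worth considering: in the cell above, `cross_val_score` receives `X_train`, which was vectorized using the whole training set, so each held-out part has already influenced the TF-IDF vocabulary and weights. Wrapping the vectorizer and classifier in a scikit-learn pipeline refits both inside every fold. The sketch below shows the pattern on ten hypothetical emails (invented for illustration); it is not a rerun of the notebook's experiment, and its scores say nothing about the real dataset.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy emails, not from train_emails.csv: 5 phishing, 5 legit
texts = [
    "verify your account now",
    "click here to reset your password",
    "your account is suspended verify now",
    "urgent action required click the link",
    "confirm your password to avoid suspension",
    "meeting notes from today",
    "lunch next week with the team",
    "updated remote work policy",
    "project agenda for the meeting",
    "team lunch and project update",
]
labels = np.array([1] * 5 + [0] * 5)  # 1 = phishing, 0 = legit

# The pipeline fits TF-IDF on the training part of each fold only
pipe = make_pipeline(TfidfVectorizer(), MultinomialNB())
scores = cross_val_score(pipe, texts, labels, cv=5)

print(f"{scores.mean():.3f} (+/- {scores.std() * 2:.3f})")
```

In the notebook itself, the equivalent call would pass `train_df['text']` and `y_train` to a pipeline built with the same vectorizer and model settings used in Step 4.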