<a href="https://colab.research.google.com/github/RiverGumSecurity/AILabs/blob/main/Lab02_Testing_Pretrained_Phishing_Model.ipynb" target="_new"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 02: Testing a Pre-trained Phishing Detection Model

In this lab, we'll evaluate a pre-trained AI model that detects phishing emails. This demonstrates a key concept in modern AI: **transfer learning**, which means reusing a model that has already been trained on a large dataset and adapting (fine-tuning) it for a specific task.

## Learning Objectives

1. Load and use a pre-trained BERT model from Hugging Face
2. Test the model against real phishing email samples
3. Evaluate how well the model performs at classification

## About the Model

We'll use the `ealvaradob/bert-finetuned-phishing` model from Hugging Face. This model:
- Is based on Google's BERT (Bidirectional Encoder Representations from Transformers)
- Was fine-tuned specifically for phishing email detection
- Classifies text as either **"phishing"** or **"benign"**

**Claimed Performance:**
- Accuracy: 97.17%
- Precision: 96.58%
- Recall: 96.70%
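
As a reminder of how these three metrics are defined, here is a minimal sketch that computes them from confusion-matrix counts, treating "phishing" as the positive class. The counts are hypothetical and chosen only for illustration; they are not the model's actual evaluation results.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, precision, and recall with 'phishing' as the positive class."""
    total = tp + fp + tn + fn
    return {
        # Share of all emails that were labeled correctly
        'accuracy': (tp + tn) / total if total else 0.0,
        # Share of emails flagged as phishing that really were phishing
        'precision': tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of real phishing emails that were flagged
        'recall': tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical counts (assumption, not taken from the model's evaluation)
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f'{name}: {value:.2%}')
```

Accuracy alone can hide an imbalance between the two error types, which is why precision and recall are reported alongside it.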

## Part 1: Environment Check

The required packages should already be installed in this environment. The verification cell in this part checks for each one and installs any that are missing.
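
It can also help to record which versions are installed, since library behavior can differ between releases. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and it assumes each package's distribution name matches its import name, which holds for the three packages listed here.

```python
from importlib import metadata

def installed_version(package: str):
    """Return the installed version string for a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

for pkg in ['transformers', 'torch', 'pandas']:
    version = installed_version(pkg)
    print(f'{pkg}: {version if version else "not installed"}')
```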

In [None]:
# Verify required packages are available
import importlib

packages = ['transformers', 'torch', 'pandas']
missing = []

for pkg in packages:
    try:
        importlib.import_module(pkg)
        print(f'[+] {pkg} is available')
    except ImportError:
        missing.append(pkg)
        print(f'[!] {pkg} is missing')

if missing:
    print(f'\n[!] Installing missing packages: {", ".join(missing)}')
    import subprocess
    import sys
    # Use the running interpreter's pip so packages land in this kernel's environment
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-q'] + missing)
    print('[+] Installation complete!')
else:
    print('\n[+] All required packages are available!')

## Part 2: Import Libraries and Load the Model

Now we'll import our libraries and download the pre-trained phishing detection model.

In [None]:
import pandas as pd
import torch
import transformers
import warnings
warnings.filterwarnings('ignore')

# Detect available hardware (GPU speeds up processing significantly)
device = 'cpu'
if torch.cuda.is_available():
    device = 'cuda'
    print('[+] NVIDIA GPU detected - using CUDA')
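# Guard the attribute lookup: older PyTorch builds have no torch.backends.mps
elif getattr(torch.backends, 'mps', None) is not None and torch.backends.mps.is_available():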
    device = 'mps'
    print('[+] Apple Silicon GPU detected - using MPS')
else:
    print('[*] No GPU detected - using CPU (this will be slower)')

# Load the pre-trained phishing detection model
print('\n[*] Loading pre-trained phishing detection model...')
model_name = "ealvaradob/bert-finetuned-phishing"

tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
model = transformers.AutoModelForSequenceClassification.from_pretrained(model_name)

# Create a prediction pipeline - this simplifies making predictions
classifier = transformers.pipeline(
    'text-classification', 
    model=model, 
    tokenizer=tokenizer, 
    device=device,
    truncation=True
)

print('[+] Model loaded successfully!')

## Part 3: Load the Phishing Email Dataset

We'll use the same Kaggle phishing email dataset from Lab 01. This dataset contains real examples of phishing and legitimate (safe) emails.

In [None]:
import os

# Load the phishing email dataset
# Check for local copy first (pre-downloaded in Docker), otherwise download from URL
local_path = '/home/jovyan/datasets/Phishing_Email.csv.gz'
remote_url = 'https://raw.githubusercontent.com/RiverGumSecurity/Datasets/refs/heads/main/Kaggle/Phishing_Email.csv.gz'

print('[*] Loading phishing email dataset...')

if os.path.exists(local_path):
    print(f'[+] Using pre-downloaded dataset from {local_path}')
    df = pd.read_csv(local_path)
else:
    print('[*] Downloading dataset from remote URL...')
    df = pd.read_csv(remote_url)

# Clean up the data
df = df.drop(['Unnamed: 0'], axis=1, errors='ignore')  # Remove index column if present
df = df.dropna()  # Remove rows with missing values
df = df.drop_duplicates()  # Remove duplicate entries

# Show dataset statistics
print(f'[+] Dataset loaded: {len(df)} emails')
print('\nEmail Type Distribution:')
print(df['Email Type'].value_counts())

## Part 4: Select Random Test Samples

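Let's randomly select 10 emails to test the model: 5 phishing emails and 5 safe emails. This gives us a balanced test set, although one this small can only serve as a rough sanity check rather than a reliable accuracy estimate.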

In [None]:
# Select 5 random phishing emails and 5 random safe emails
phishing_samples = df[df['Email Type'] == 'Phishing Email'].sample(n=5, random_state=42)
safe_samples = df[df['Email Type'] == 'Safe Email'].sample(n=5, random_state=42)

# Combine into our test set
test_samples = pd.concat([phishing_samples, safe_samples]).reset_index(drop=True)

# Shuffle the samples so they're not in order
test_samples = test_samples.sample(frac=1, random_state=123).reset_index(drop=True)

print(f'Selected {len(test_samples)} test emails:')
print(f'  - Phishing: {len(test_samples[test_samples["Email Type"] == "Phishing Email"])}')
print(f'  - Safe: {len(test_samples[test_samples["Email Type"] == "Safe Email"])}')

## Part 5: Preview the Test Emails

Let's look at snippets of our test emails to understand what the model will be analyzing.

In [None]:
# Show a preview of each test email
print('=' * 80)
print('TEST EMAIL PREVIEWS')
print('=' * 80)

for idx, row in test_samples.iterrows():
    email_text = row['Email Text']
    actual_label = row['Email Type']
    
    # Show the first 200 characters of each email, with newlines flattened to spaces
    flat_text = email_text.replace('\n', ' ')
    preview = (flat_text[:200] + '...') if len(flat_text) > 200 else flat_text
    
    print(f'\nEmail #{idx + 1} [Actual: {actual_label}]')
    print('-' * 60)
    print(preview)

## Part 6: Run the Model and Evaluate Results

Now let's run each email through the BERT model and see how well it classifies them!
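
For intuition about the `label` and `score` values read in the cell that follows: a sequence classifier produces one raw score (a logit) per class, and for a two-class model like this one the pipeline by default applies a softmax to turn those scores into probabilities, then reports the highest. The sketch below reproduces that step in plain Python. The logit values and the label order are hypothetical, chosen only for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical values (assumptions for illustration only)
labels = ['benign', 'phishing']
logits = [-1.2, 2.3]

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability
prediction = {'label': labels[best], 'score': probs[best]}
print(prediction)
```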

In [None]:
# Run predictions on all test emails
print('[*] Running phishing detection model on test emails...\n')

results = []
correct = 0
total = len(test_samples)

for idx, row in test_samples.iterrows():
    email_text = row['Email Text']
    actual_label = row['Email Type']
    
    # Get model prediction
    prediction = classifier(email_text)[0]
    predicted_label = prediction['label'].lower()  # expected: 'phishing' or 'benign'
    confidence = prediction['score']
    
    # Convert actual label to match model output format
    actual_simple = 'phishing' if actual_label == 'Phishing Email' else 'benign'
    
    # Check if prediction is correct
    is_correct = (predicted_label == actual_simple)
    if is_correct:
        correct += 1
    
    # Store result
    results.append({
        'Email #': idx + 1,
        'Actual': actual_simple,
        'Predicted': predicted_label,
        'Confidence': f'{confidence:.1%}',
        'Correct': '✓' if is_correct else '✗'
    })

# Display results as a table
results_df = pd.DataFrame(results)
print(results_df.to_string(index=False))

# Show accuracy
accuracy = correct / total * 100
print(f'\n{"=" * 50}')
print(f'RESULTS: {correct}/{total} correct ({accuracy:.0f}% accuracy)')
print(f'{"=" * 50}')

## Part 7: Detailed Analysis

Let's take a closer look at any emails the model got wrong and consider why.
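
Misclassifications come in two kinds: false positives (safe emails flagged as phishing) and false negatives (phishing emails that slip through). The sketch below tallies both from records shaped like the entries in the `results` list built in Part 6. The helper name `confusion_counts` and the sample records are hypothetical, used only for illustration.

```python
from collections import Counter

def confusion_counts(records, positive='phishing'):
    """Tally TP/FP/TN/FN from dicts that have 'Actual' and 'Predicted' keys."""
    counts = Counter(TP=0, FP=0, TN=0, FN=0)
    for r in records:
        actual_pos = r['Actual'] == positive
        predicted_pos = r['Predicted'] == positive
        if predicted_pos:
            counts['TP' if actual_pos else 'FP'] += 1
        else:
            counts['FN' if actual_pos else 'TN'] += 1
    return dict(counts)

# Hypothetical records (assumption), shaped like the entries in `results`
sample_records = [
    {'Actual': 'phishing', 'Predicted': 'phishing'},
    {'Actual': 'phishing', 'Predicted': 'benign'},
    {'Actual': 'benign', 'Predicted': 'benign'},
    {'Actual': 'benign', 'Predicted': 'phishing'},
    {'Actual': 'benign', 'Predicted': 'benign'},
]
counts = confusion_counts(sample_records)
print(counts)
```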

In [None]:
# Find misclassified emails
misclassified = [r for r in results if r['Correct'] == '✗']

if misclassified:
    print(f'The model misclassified {len(misclassified)} email(s):\n')
    
    for miss in misclassified:
        email_num = miss['Email #']
        email_text = test_samples.iloc[email_num - 1]['Email Text']
        
        print(f'Email #{email_num}:')
        print(f'  Actual: {miss["Actual"]}')
        print(f'  Predicted: {miss["Predicted"]} ({miss["Confidence"]} confidence)')
        print(f'  Content preview: {email_text[:300]}...\n')
else:
    print('The model correctly classified all test emails!')
    print("\nThis is consistent with the model's claimed 97% accuracy, but 10 emails are too few to confirm it.")

## Part 8: Try Your Own Text

Now it's your turn! Modify the text below and run the cell to test the model with your own examples.

In [None]:
# Try your own email text!
# Replace the text below with any email content you want to test

test_text = """
URGENT: Your account has been compromised!

Dear valued customer,

We have detected suspicious activity on your account. 
Please click the link below immediately to verify your identity 
and prevent unauthorized access:

http://secure-verify-account.com/login

If you do not verify within 24 hours, your account will be suspended.

Best regards,
Security Team
"""

# Run the model
result = classifier(test_text)[0]

print('Your Test Results:')
print('=' * 40)
print(f'Classification: {result["label"].upper()}')
print(f'Confidence: {result["score"]:.1%}')
print('=' * 40)

## Conclusion

In this lab, we:

1. **Loaded a pre-trained BERT model** specifically fine-tuned for phishing detection
2. **Tested it against real emails** from a labeled dataset
3. **Evaluated its performance** and saw how well it distinguishes phishing from legitimate emails

### Key Takeaways

- **Transfer learning** allows us to use powerful models without training from scratch
- Pre-trained models can achieve high accuracy on specific tasks
- Even high-accuracy models make mistakes; no model is perfect
- The confidence score is the probability the model assigns to its predicted label; it reflects the model's own certainty and does not guarantee that the prediction is correct
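
To put a number on how little a 10-email test can tell us, the sketch below computes a 95% Wilson score interval for an observed accuracy. The `correct` and `total` values are hypothetical; substitute your own results from Part 6.

```python
import math

def wilson_interval(correct: int, total: int, z: float = 1.96):
    """Return the (low, high) Wilson score interval for a proportion; z=1.96 gives about 95%."""
    if total <= 0:
        raise ValueError('total must be positive')
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    margin = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    # Clamp to the valid range for a proportion
    return max(0.0, center - margin), min(1.0, center + margin)

# Hypothetical outcome (assumption): every one of 10 test emails classified correctly
low, high = wilson_interval(correct=10, total=10)
print(f'10/10 correct -> 95% interval: {low:.1%} to {high:.1%}')
```

Even a perfect 10 out of 10 is compatible with a true accuracy as low as roughly 72%, so a much larger sample is needed before comparing against the claimed figure.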

### Discussion Questions

1. What types of emails might be hardest for the model to classify correctly?
2. How might an attacker try to craft phishing emails that evade this detection?
3. Should we trust a 97% accuracy rate for critical security decisions?
4. What are the risks of false positives (legitimate emails marked as phishing) vs false negatives (phishing emails marked as safe)?
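
One way to explore the last two questions is to see how the share of flagged emails that are truly phishing changes with how common phishing is in a mailbox. The sketch below applies Bayes' rule using the recall claimed earlier (96.70%). The false positive rate and the prevalence values are assumptions picked for illustration, not measured figures.

```python
def flagged_precision(recall: float, false_positive_rate: float, prevalence: float) -> float:
    """Probability that a flagged email is really phishing (Bayes' rule)."""
    true_alerts = recall * prevalence
    false_alerts = false_positive_rate * (1 - prevalence)
    flagged = true_alerts + false_alerts
    return true_alerts / flagged if flagged else 0.0

RECALL = 0.9670              # claimed recall from the "About the Model" section
FALSE_POSITIVE_RATE = 0.03   # assumption for illustration only

for prevalence in (0.50, 0.10, 0.01):  # hypothetical share of phishing in the mailbox
    precision = flagged_precision(RECALL, FALSE_POSITIVE_RATE, prevalence)
    print(f'prevalence {prevalence:.0%}: {precision:.1%} of flagged emails are phishing')
```

Under these assumptions, the fraction of alerts that are real drops sharply as phishing becomes rarer, which is why a headline accuracy figure says little on its own about alert quality in production.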