<a href="https://colab.research.google.com/github/RiverGumSecurity/AILabs/blob/main/Lab03_LLM_Generated_Phishing/LLM_Generated_Phishing.ipynb" target="_new"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 03: LLM-Generated Phishing Detection

In this lab, we explore the intersection of generative AI and cybersecurity by investigating whether Large Language Models (LLMs) can generate phishing emails that evade detection by machine learning-based classifiers.

## Learning Objectives

1. Understand how to interact with LLM APIs (OpenAI and Anthropic)
2. Generate synthetic phishing content using prompt engineering
3. Evaluate generated content against a pre-trained phishing detection model
4. Analyze the implications for cybersecurity defenses

## Background

In Lab 01, we trained statistical machine learning models to detect phishing emails. In Lab 02, we used a pre-trained BERT model fine-tuned for phishing detection. Now we will explore the **adversarial** side: can an attacker use generative AI to craft phishing emails that bypass these defenses?

This lab demonstrates a critical concept in AI security: the **arms race** between offensive and defensive AI capabilities. As defenders build better detection models, attackers may leverage the same AI technologies to generate more convincing attacks.

**Ethical Note**: This lab is for educational purposes only. The techniques demonstrated here should only be used in authorized security testing, research, and defensive planning contexts.

## Part 1: Environment Setup

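The lab environment should already have the required packages installed; the cell below verifies this and installs any that are missing. We need:
- `openai` - OpenAI API client
- `anthropic` - Anthropic API client
- `transformers` - HuggingFace transformers for the BERT model
- `torch` - PyTorch for model inference
- `huggingface_hub` - HuggingFace Hub client that `transformers` uses to fetch models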

In [None]:
# Verify required packages are available
import importlib

packages = ['openai', 'anthropic', 'transformers', 'torch', 'huggingface_hub']
missing = []

for pkg in packages:
    try:
        importlib.import_module(pkg)
        print(f'[+] {pkg} is available')
    except ImportError:
        missing.append(pkg)
        print(f'[!] {pkg} is missing')

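if missing:
    print(f'\n[!] Installing missing packages: {", ".join(missing)}')
    import subprocess
    import sys
    # Use the running interpreter's pip so packages land in this kernel's environment
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-q'] + missing)
    print('[+] Installation complete!')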
else:
    print('\n[+] All required packages are available!')

In [None]:
#################################################
## Lab 03: LLM-Generated Phishing Detection
## AI for Cybersecurity Professionals
#################################################
import os
import sys
import pathlib
import torch
import huggingface_hub
import transformers
import pandas as pd
import matplotlib.pyplot as plt
from typing import Optional

# LLM API clients
import openai
import anthropic

print(f'Python version: {sys.version}')
print(f'PyTorch version: {torch.__version__}')
print(f'Transformers version: {transformers.__version__}')

## Part 2: API Key Configuration

You will need API keys for the services you want to use:

- **OpenAI API Key**: Get one at https://platform.openai.com/api-keys
- **Anthropic API Key**: Get one at https://console.anthropic.com/
- **HuggingFace API Key**: Get one at https://huggingface.co/settings/tokens

You can configure these in several ways:
1. Environment variables (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `HF_TOKEN`)
2. Files in your home directory (`.openai_key`, `.anthropic_key`, `.hfkey`)
3. Google Colab secrets (if running in Colab)
4. Manual entry in the optional cell at the end of this part

In [None]:
def get_api_key(key_name: str, env_var: str, file_name: str) -> Optional[str]:
    """
    Retrieve an API key from various sources.
    Priority: Environment variable > File > Colab secrets > None
    """
    # Check environment variable first
    key = os.environ.get(env_var)
    if key:
        print(f'[+] {key_name} loaded from environment variable')
        return key
    
    # Check file in home directory
    key_file = pathlib.Path.home() / file_name
    if key_file.exists():
        with open(key_file) as f:
            key = f.read().strip()
            if key:
                print(f'[+] {key_name} loaded from ~/{file_name}')
                return key
    
    # Check Google Colab secrets
    if 'google.colab' in sys.modules:
        try:
            from google.colab import userdata
            key = userdata.get(env_var)
            if key:
                print(f'[+] {key_name} loaded from Colab secrets')
                return key
        except Exception:
            pass
    
    print(f'[-] {key_name} not found')
    return None

# Load API keys
OPENAI_API_KEY = get_api_key('OpenAI API Key', 'OPENAI_API_KEY', '.openai_key')
ANTHROPIC_API_KEY = get_api_key('Anthropic API Key', 'ANTHROPIC_API_KEY', '.anthropic_key')
HF_API_KEY = get_api_key('HuggingFace API Key', 'HF_TOKEN', '.hfkey')

# Determine which LLM provider to use
LLM_PROVIDER = None
if OPENAI_API_KEY:
    LLM_PROVIDER = 'openai'
elif ANTHROPIC_API_KEY:
    LLM_PROVIDER = 'anthropic'

if LLM_PROVIDER:
    print(f'\n[*] Will use {LLM_PROVIDER.upper()} for text generation')
else:
    print('\n[!] WARNING: No LLM API key found. Please set OPENAI_API_KEY or ANTHROPIC_API_KEY')

### Manual API Key Entry (Optional)

If your API keys were not automatically detected, you can enter them manually below. **Do not commit notebooks with API keys to version control!**

In [None]:
# Uncomment and fill in if needed:

# OPENAI_API_KEY = 'sk-...'  # Your OpenAI API key
# ANTHROPIC_API_KEY = 'sk-ant-...'  # Your Anthropic API key
# HF_API_KEY = 'hf_...'  # Your HuggingFace API key

# Update provider selection if you set keys manually
if OPENAI_API_KEY and not LLM_PROVIDER:
    LLM_PROVIDER = 'openai'
    print('[*] Will use OPENAI for text generation')
elif ANTHROPIC_API_KEY and not LLM_PROVIDER:
    LLM_PROVIDER = 'anthropic'
    print('[*] Will use ANTHROPIC for text generation')
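
Pasting a key into a cell risks saving it with the notebook. As an alternative, here is a minimal sketch that asks for the key interactively using the standard library's `getpass`, which does not echo what you type. `prompt_for_key` is a hypothetical helper, not part of the lab code, and the `prompt_fn` parameter exists only so the function can be exercised without a terminal.

```python
import getpass
from typing import Callable, Optional


def prompt_for_key(key_name: str,
                   prompt_fn: Callable[[str], str] = getpass.getpass) -> Optional[str]:
    """Ask for a key interactively; return None if the answer is blank.

    prompt_fn is injectable so the helper can be tested without a terminal.
    """
    value = prompt_fn(f'Enter {key_name} (leave blank to skip): ').strip()
    return value or None


# Example (commented out so the cell does not block when run non-interactively):
# OPENAI_API_KEY = OPENAI_API_KEY or prompt_for_key('OpenAI API Key')
```

If you use this, re-run the provider selection logic afterwards so `LLM_PROVIDER` reflects the newly entered key.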

## Part 3: Load the Phishing Detection Model

We'll use the same BERT-based phishing detection model from Lab 02: `ealvaradob/bert-finetuned-phishing`

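This model is pre-downloaded in the lab environment for faster loading. In other environments, such as Colab, `from_pretrained` downloads it from the HuggingFace Hub on first use.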

**Model Performance:**
- **Accuracy**: 97.17%
- **Precision**: 96.58%
- **Recall**: 96.70%
- **False Positive Rate**: 2.49%
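
To make these percentages concrete, the short calculation below converts the reported recall and false positive rate into expected error counts. It is a back-of-the-envelope sketch: it assumes the reported rates carry over unchanged to the emails being scored, which may not hold for LLM-generated text, and `expected_errors` is a hypothetical helper introduced only for this illustration.

```python
# Metrics copied from the model performance list above.
RECALL = 0.9670
FALSE_POSITIVE_RATE = 0.0249


def expected_errors(n_phishing: int, n_benign: int):
    """Return (missed phishing, false alarms) expected at the reported rates."""
    missed = n_phishing * (1 - RECALL)
    false_alarms = n_benign * FALSE_POSITIVE_RATE
    return missed, false_alarms


missed, false_alarms = expected_errors(n_phishing=1_000, n_benign=1_000)
print(f'Missed phishing per 1,000 phishing emails: {missed:.1f}')
print(f'False alarms per 1,000 benign emails: {false_alarms:.1f}')
```

At these rates, roughly 33 of every 1,000 phishing emails would slip through and about 25 of every 1,000 benign emails would be flagged.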

In [None]:
# Detect available compute device
device = 'cpu'
if torch.cuda.is_available():
    device = 'cuda'
elif torch.backends.mps.is_available():
    device = 'mps'
print(f'[*] Using device: {device}')

# Load the phishing detection model (pre-downloaded in Docker environment)
model_name = "ealvaradob/bert-finetuned-phishing"
print(f'[*] Loading model: {model_name}')

# Create tokenizer, model, and prediction pipeline
# The model is cached locally, so this will be fast
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
model = transformers.AutoModelForSequenceClassification.from_pretrained(model_name)
phishing_detector = transformers.pipeline(
    'text-classification', 
    model=model, 
    tokenizer=tokenizer, 
    device=device,
    truncation=True
)

print('[+] Phishing detection model loaded successfully!')

### Test the Phishing Detector

Let's verify the model works by testing it with some sample emails.

In [None]:
# Test samples
test_emails = [
    "Hi John, just wanted to follow up on our meeting yesterday. Let me know if you have any questions about the project timeline.",
    "URGENT: Your account has been compromised! Click here immediately to verify your identity: http://secure-bank-login.com/verify",
    "Dear valued customer, we noticed suspicious activity on your PayPal account. Please confirm your details within 24 hours or your account will be suspended.",
    "Thanks for the great presentation today! I'll send over the slides by end of day."
]

print('Testing phishing detector with sample emails:\n')
print('-' * 80)
for i, email in enumerate(test_emails, 1):
    result = phishing_detector(email)[0]
    label = result['label']
    score = result['score']
    print(f'Email {i}: {email[:60]}...')
    print(f'  -> Classification: {label.upper()} (confidence: {score:.2%})')
    print()
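
The detector pipeline is created with `truncation=True`, so any input longer than the model's maximum sequence length (512 tokens for standard BERT checkpoints) is cut off and only the beginning is classified. One way to examine long emails is to score overlapping chunks and flag the email if any chunk is flagged. The sketch below is an assumption-laden illustration, not part of the lab code: `chunk_words` and `any_chunk_flagged` are hypothetical helpers, and word counts are only a rough proxy for token counts.

```python
from typing import Callable, Dict, List


def chunk_words(text: str, max_words: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping word windows (a rough proxy for token windows)."""
    if not 0 <= overlap < max_words:
        raise ValueError('overlap must be non-negative and smaller than max_words')
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(' '.join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def any_chunk_flagged(text: str, classify: Callable[[str], Dict],
                      phishing_label: str = 'phishing') -> bool:
    """Return True if the classifier labels at least one chunk as phishing."""
    return any(classify(chunk)['label'] == phishing_label
               for chunk in chunk_words(text))


# With the detector from this lab (assumes `phishing_detector` is already loaded):
# any_chunk_flagged(long_email, lambda t: phishing_detector(t)[0])
```

A window of 200 words usually stays below 512 tokens for English prose, but URLs and unusual strings can tokenize into many pieces, so keeping `truncation=True` on the pipeline remains a sensible safeguard.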

## Part 4: LLM Text Generation Functions

Now we'll create functions to generate text using either OpenAI or Anthropic APIs. These functions will be used to generate synthetic phishing emails.

### Understanding the APIs

- **OpenAI API**: Uses the Chat Completions endpoint with models like `gpt-4.1-mini`
- **Anthropic API**: Uses the Messages endpoint with models like `claude-sonnet-4-20250514`

Both APIs follow a similar pattern: you send a system prompt (optional) and user message, and receive a generated response.

In [None]:
def generate_with_openai(prompt: str, system_prompt: Optional[str] = None, model: str = "gpt-4.1-mini") -> str:
    """
    Generate text using OpenAI's API.
    
    Args:
        prompt: The user prompt/instruction
        system_prompt: Optional system-level instructions
        model: The OpenAI model to use
    
    Returns:
        Generated text response
    """
    client = openai.OpenAI(api_key=OPENAI_API_KEY)
    
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})
    
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        max_tokens=1024,
        temperature=0.7
    )
    
    return response.choices[0].message.content


def generate_with_anthropic(prompt: str, system_prompt: Optional[str] = None, model: str = "claude-sonnet-4-20250514") -> str:
    """
    Generate text using Anthropic's API.
    
    Args:
        prompt: The user prompt/instruction
        system_prompt: Optional system-level instructions
        model: The Anthropic model to use
    
    Returns:
        Generated text response
    """
    client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)
    
    kwargs = {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}]
    }
    
    if system_prompt:
        kwargs["system"] = system_prompt
    
    response = client.messages.create(**kwargs)
    
    return response.content[0].text


def generate_text(prompt: str, system_prompt: Optional[str] = None) -> str:
    """
    Generate text using the configured LLM provider.
    """
    if LLM_PROVIDER == 'openai':
        return generate_with_openai(prompt, system_prompt)
    elif LLM_PROVIDER == 'anthropic':
        return generate_with_anthropic(prompt, system_prompt)
    else:
        raise ValueError("No LLM provider configured. Please set an API key.")

print(f'[+] LLM generation functions defined (provider: {LLM_PROVIDER})')
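
API calls can fail transiently, for example because of rate limits or timeouts, and a failed call costs a sample in a small experiment. The sketch below wraps any zero-argument callable in a simple exponential backoff loop. `with_retries` is a hypothetical helper, not part of either SDK, and the `sleep` parameter is injectable only so the function can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar('T')


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(); after a failure wait base_delay * 2**n seconds and try again."""
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            # Broad catch for brevity; in practice, narrow this to the client's
            # rate-limit and timeout errors so bad API keys fail immediately.
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)


# Example with the lab's generator (assumes generate_text and SYSTEM_PROMPT exist):
# email = with_retries(lambda: generate_text(prompt, SYSTEM_PROMPT))
```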

## Part 5: Generating Phishing Emails with LLMs

Now we'll use the LLM to generate synthetic phishing emails. We'll explore different prompting strategies and see how well the generated emails evade the phishing detector.

### Ethical Context

This exercise simulates what an attacker might do to test and improve their phishing campaigns. Understanding these techniques helps security professionals:
1. Build better detection systems
2. Create more realistic training data
3. Develop security awareness training materials

In [None]:
# Define prompts for generating phishing emails
# These prompts are designed for security research and awareness training

SYSTEM_PROMPT = """You are a cybersecurity researcher creating phishing email examples 
for security awareness training purposes. Generate realistic but clearly educational 
examples that demonstrate common phishing tactics."""

phishing_prompts = [
    # Basic urgency-based phishing
    """For a security awareness training module, write an example phishing email 
    that pretends to be from a bank warning about suspicious account activity. 
    Include typical phishing elements like urgency and a call to action.""",
    
    # Tech support scam
    """Create an example phishing email for training purposes that impersonates 
    IT support, claiming the recipient's password is expiring and they need to 
    update it immediately.""",
    
    # Package delivery scam
    """Write a sample phishing email for security training that pretends to be 
    from a delivery company about a package that couldn't be delivered.""",
    
    # Invoice/payment scam
    """Generate an example business email compromise (BEC) phishing email for 
    awareness training that appears to be an invoice requiring urgent payment.""",
    
    # Social media account security
    """Create a sample phishing email for training that impersonates a social 
    media platform warning about a security issue with the account."""
]

print(f'Defined {len(phishing_prompts)} phishing generation prompts')

In [None]:
# Generate phishing emails and test against the detector
if LLM_PROVIDER:
    results = []
    
    print('Generating and analyzing phishing emails...\n')
    print('=' * 80)
    
    for i, prompt in enumerate(phishing_prompts, 1):
        print(f'\n[Generating Email {i}/{len(phishing_prompts)}]')
        
        try:
            # Generate the phishing email
            generated_email = generate_text(prompt, SYSTEM_PROMPT)
            
            # Classify with the phishing detector
            detection_result = phishing_detector(generated_email)[0]
            
            # Store results
            results.append({
                'prompt_id': i,
                'generated_email': generated_email,
                'detection_label': detection_result['label'],
                'detection_score': detection_result['score']
            })
            
            # Display results
            print(f'\nGenerated Email:')
            print('-' * 40)
            print(generated_email[:500] + ('...' if len(generated_email) > 500 else ''))
            print('-' * 40)
            print(f'Detection Result: {detection_result["label"].upper()} '
                  f'(confidence: {detection_result["score"]:.2%})')
            
            # Check if phishing was detected
            if detection_result['label'] == 'phishing':
                print('>> DETECTED as phishing')
            else:
                print('>> EVADED detection (classified as benign)')
            
        except Exception as e:
            print(f'Error generating email: {e}')
            results.append({
                'prompt_id': i,
                'generated_email': f'ERROR: {e}',
                'detection_label': 'error',
                'detection_score': 0
            })
    
    print('\n' + '=' * 80)
    print('Generation complete!')
else:
    print('No LLM provider configured. Please set an API key to generate emails.')
    results = []
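
One caveat when reading these results: the detector scores the entire LLM response. Because the prompts ask for training examples, the response may contain commentary in addition to the email itself (for instance a list of red flags), and that commentary can influence the verdict. The sketch below trims such commentary before classification. It is a heuristic under stated assumptions: `strip_annotations` is a hypothetical helper, and the marker patterns are guesses that should be adjusted after inspecting real output.

```python
import re

# Hypothetical markers for commentary that follows the email body.
ANNOTATION_MARKERS = [
    r'^\s*\**\s*red flags',
    r'^\s*\**\s*(training|educational) notes?',
    r'^\s*\**\s*key phishing (indicators|elements)',
]
_MARKER_RE = re.compile('|'.join(ANNOTATION_MARKERS), re.IGNORECASE | re.MULTILINE)


def strip_annotations(text: str) -> str:
    """Return the text before the first annotation marker (all of it if none)."""
    match = _MARKER_RE.search(text)
    return (text[:match.start()] if match else text).strip()


sample = (
    "Subject: Account alert\n\n"
    "Dear customer, verify now.\n\n"
    "**Red Flags in this email:**\n- Urgency\n"
)
print(strip_annotations(sample))
```

Classifying both the raw response and the trimmed text shows how much the commentary affects the score.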

## Part 6: Analysis and Visualization

Let's analyze the results to understand how well the BERT model detects LLM-generated phishing emails.

In [None]:
if results:
    # Create a DataFrame for analysis
    df = pd.DataFrame(results)
    
    # Filter out errors
    df_valid = df[df['detection_label'] != 'error']
    
    print('Summary of Generated Phishing Email Detection:\n')
    print(df_valid[['prompt_id', 'detection_label', 'detection_score']].to_string(index=False))
    
    # Calculate statistics
    total = len(df_valid)
    detected = len(df_valid[df_valid['detection_label'] == 'phishing'])
    evaded = total - detected
    
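    print(f'\n--- Statistics ---')
    print(f'Total emails generated: {total}')
    if total:
        print(f'Detected as phishing: {detected} ({detected/total:.1%})')
        print(f'Evaded detection: {evaded} ({evaded/total:.1%})')
    else:
        # Every generation failed, so there is nothing to compute rates from
        print('No emails were generated successfully; rates are undefined.')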
else:
    print('No results to analyze. Please run the generation cell first.')

In [None]:
if results and len(df_valid) > 0:
    # Visualization
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    
    # Pie chart of detection results
    detection_counts = df_valid['detection_label'].value_counts()
    colors = ['red' if label == 'phishing' else 'green' for label in detection_counts.index]
    axes[0].pie(detection_counts, labels=detection_counts.index, autopct='%1.1f%%', 
                colors=colors, startangle=90)
    axes[0].set_title('Detection Results for LLM-Generated Phishing Emails')
    
    # Bar chart of confidence scores
    colors_bar = ['red' if label == 'phishing' else 'green' for label in df_valid['detection_label']]
    axes[1].bar(df_valid['prompt_id'], df_valid['detection_score'], color=colors_bar)
    axes[1].set_xlabel('Email #')
    axes[1].set_ylabel('Detection Confidence')
    axes[1].set_title('Detection Confidence by Email')
    axes[1].set_ylim(0, 1)
    axes[1].axhline(y=0.5, color='gray', linestyle='--', label='50% threshold')
    axes[1].legend()  # without this call the axhline label is never drawn
    
    plt.tight_layout()
    plt.show()
else:
    print('No results to visualize.')
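
A note on reading the bar chart: the pipeline reports the score of the predicted label, so a benign verdict at 0.80 and a phishing verdict at 0.80 mean opposite things. Assuming a two-class model with a softmax output, the top score is also never below 0.5. Converting each result to a single "probability of phishing" puts all emails on one scale. The sketch below assumes the label names `phishing` and `benign` used elsewhere in this lab; `phishing_probability` is a hypothetical helper.

```python
from typing import Dict


def phishing_probability(result: Dict, phishing_label: str = 'phishing') -> float:
    """Convert a top-label result into P(phishing), assuming exactly two classes."""
    score = float(result['score'])
    if result['label'].lower() == phishing_label:
        return score
    return 1.0 - score  # the other class holds the remaining probability mass


# Example with results shaped like the pipeline output:
print(phishing_probability({'label': 'phishing', 'score': 0.9}))
print(phishing_probability({'label': 'benign', 'score': 0.8}))
```

Plotting this value instead of the raw score makes a 0.5 reference line meaningful.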

## Part 7: Evasion Techniques (Advanced)

Let's explore whether we can craft prompts that generate emails more likely to evade detection. This helps us understand the limitations of ML-based detection.

**Note**: Understanding evasion techniques is crucial for building robust defenses.

In [None]:
# More sophisticated prompts that might evade detection
evasion_prompts = [
    # Conversational/personal tone
    """For security research, write a phishing email that uses a friendly, 
    conversational tone like it's from a colleague. It should subtly request 
    the recipient to click a link to review a shared document.""",
    
    # Minimal urgency, professional
    """Create a very professional, formal business email for training purposes 
    that requests verification of account details. Avoid urgent language and 
    make it sound like routine business communication.""",
    
    # Context-aware/personalized
    """Write a phishing email example that appears to be a follow-up to a 
    previous conversation about a project. Include natural language and 
    references to work topics."""
]

print(f'Defined {len(evasion_prompts)} evasion-style prompts')

In [None]:
# Test evasion prompts
if LLM_PROVIDER:
    evasion_results = []
    
    print('Testing evasion-style phishing emails...\n')
    print('=' * 80)
    
    for i, prompt in enumerate(evasion_prompts, 1):
        print(f'\n[Evasion Email {i}/{len(evasion_prompts)}]')
        
        try:
            generated_email = generate_text(prompt, SYSTEM_PROMPT)
            detection_result = phishing_detector(generated_email)[0]
            
            evasion_results.append({
                'prompt_id': i,
                'generated_email': generated_email,
                'detection_label': detection_result['label'],
                'detection_score': detection_result['score']
            })
            
            print(f'\nGenerated Email:')
            print('-' * 40)
            print(generated_email[:500] + ('...' if len(generated_email) > 500 else ''))
            print('-' * 40)
            print(f'Detection: {detection_result["label"].upper()} ({detection_result["score"]:.2%})')
            
            if detection_result['label'] == 'benign':
                print('>> SUCCESS: Evaded detection!')
            else:
                print('>> Detected as phishing')
                
        except Exception as e:
            print(f'Error: {e}')
    
    # Summary
    if evasion_results:
        evaded = sum(1 for r in evasion_results if r['detection_label'] == 'benign')
        print(f'\n--- Evasion Results ---')
        print(f'Evaded detection: {evaded}/{len(evasion_results)}')
else:
    print('No LLM provider configured.')
    evasion_results = []
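
To judge whether the evasion-style prompts actually help, compare their evasion rate with the rate for the basic prompts from Part 5. The sketch below computes the rate from records shaped like the `results` and `evasion_results` lists. `evasion_rate` is a hypothetical helper, and the records in the example are synthetic placeholders, not results from this lab.

```python
import math
from typing import Dict, List


def evasion_rate(records: List[Dict]) -> float:
    """Fraction of successfully classified emails that were labelled benign."""
    valid = [r for r in records if r['detection_label'] != 'error']
    if not valid:
        return math.nan  # nothing to measure
    evaded = sum(r['detection_label'] == 'benign' for r in valid)
    return evaded / len(valid)


# Synthetic records for illustration only (not lab results).
baseline_demo = [
    {'detection_label': 'phishing'},
    {'detection_label': 'phishing'},
    {'detection_label': 'benign'},
    {'detection_label': 'error'},
]
evasion_demo = [
    {'detection_label': 'benign'},
    {'detection_label': 'phishing'},
]
print(f'Baseline evasion rate: {evasion_rate(baseline_demo):.1%}')
print(f'Evasion-style rate:    {evasion_rate(evasion_demo):.1%}')

# With the lab's data: evasion_rate(results) vs. evasion_rate(evasion_results)
```

With only a handful of emails per group, differences between the two rates should be treated as anecdotal.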

## Part 8: Interactive Testing

Use this section to test your own prompts and emails against the detector.

In [None]:
# Test your own prompt
custom_prompt = """
For security awareness training, write a phishing email that appears to be 
from HR about updating direct deposit information for payroll.
"""

if LLM_PROVIDER:
    print('Generating email from custom prompt...\n')
    custom_email = generate_text(custom_prompt, SYSTEM_PROMPT)
    
    print('Generated Email:')
    print('-' * 60)
    print(custom_email)
    print('-' * 60)
    
    result = phishing_detector(custom_email)[0]
    print(f'\nDetection: {result["label"].upper()} (confidence: {result["score"]:.2%})')
else:
    print('No LLM provider configured.')

In [None]:
# Test any email text directly against the detector
test_email = """
Hi team,

I hope this email finds you well. I wanted to share a quick update on the 
Q4 project timeline. Please review the attached document and let me know 
if you have any questions.

Click here to access the shared folder: [link]

Best regards,
John
"""

result = phishing_detector(test_email)[0]
print(f'Test email classification: {result["label"].upper()}')
print(f'Confidence: {result["score"]:.2%}')

## Part 9: Conclusions and Discussion

### Key Findings

From this lab, we observed:

1. **LLMs can generate convincing phishing content**: Modern language models can produce realistic phishing emails that mimic various attack styles.

2. **Detection effectiveness varies**: The BERT-based detector catches many generated phishing emails, but some may evade detection depending on the writing style and approach.

3. **Style matters for evasion**: Emails with more conversational, professional tones may be harder to detect than obvious urgent/threatening messages.

### Implications for Cybersecurity

**Offensive Perspective:**
- Attackers can use LLMs to generate large volumes of varied phishing content
- Personalized, context-aware phishing becomes easier to create at scale
- Traditional signature-based detection may struggle with novel generated content

**Defensive Perspective:**
- ML-based detection needs to be trained on diverse, evolving threats
- LLM-generated phishing should be included in training datasets
- Multi-layered detection (content + metadata + behavioral) is essential
- User education remains critical as the last line of defense

### Discussion Questions

1. How might organizations improve their phishing detection to handle LLM-generated content?

2. What ethical considerations should guide the use of AI in both offensive and defensive cybersecurity?

3. How does this lab change your perspective on security awareness training?

4. What additional signals (beyond text content) could help detect AI-generated phishing?

## Challenge Exercise

Try the following exercises to deepen your understanding:

1. **Prompt Engineering Challenge**: Create a prompt that generates a phishing email that evades the detector. What characteristics make it successful?

2. **Defense Analysis**: Test the same generated emails against the statistical models from Lab 01 (Logistic Regression, SVM, etc.). Do they perform differently?

3. **Volume Testing**: Generate 50+ phishing emails with varied prompts and calculate the overall evasion rate.

4. **Comparative Analysis**: Compare outputs from OpenAI and Anthropic models. Do they produce different detection results?
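
For the volume testing exercise, an evasion rate is more informative when reported with an uncertainty range. The sketch below computes a Wilson score interval for a proportion using only the standard library. `wilson_interval` is a hypothetical helper, and the default `z = 1.96` corresponds to an approximate 95% interval.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion successes / n."""
    if n <= 0:
        raise ValueError('n must be positive')
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes
    return max(0.0, center - half), min(1.0, center + half)


for evaded, total in [(0, 5), (0, 50)]:
    low, high = wilson_interval(evaded, total)
    print(f'{evaded}/{total} evaded -> 95% interval: {low:.1%} to {high:.1%}')
```

With 0 evasions out of 5 emails, the interval still extends to roughly 43%, which is why the exercise asks for 50 or more samples before drawing conclusions.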

In [None]:
# Space for challenge exercises
# Your code here...
