# QR Code Phishing Detection using Deep Learning

**Project:** End-of-Workshop Assignment - Deep Learning with PyTorch  
**Dataset:** QR Code Images (Benign vs Malicious)  
**Model:** Convolutional Neural Network (CNN)  
**Task:** Binary Classification


## 1. Problem Statement

### Objective
Detect phishing attempts by classifying QR codes as either **benign** or **malicious** using deep learning.

### Motivation
- QR codes are increasingly used in phishing attacks ("quishing")
- An automated detection system is needed
- Visual patterns in QR codes may indicate malicious intent

### Dataset
- **Benign QR codes**: ~430,000 images
- **Malicious QR codes**: ~576,000 images
- **Total**: ~1,006,000 QR code images

### Task
Binary classification: Predict if a QR code is benign (0) or malicious (1)
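
To put later accuracy numbers in context, it helps to know what a trivial classifier would score. The sketch below uses only the approximate class counts listed in the Dataset section and computes the accuracy of always predicting the majority class. The variable names are illustrative and not part of the project code.

```python
# Approximate class counts quoted in the Dataset section above.
benign = 430_000
malicious = 576_000
total = benign + malicious

# Share of each class in the full dataset.
benign_share = benign / total
malicious_share = malicious / total

# A classifier that always predicts the majority class (malicious, label 1)
# reaches an accuracy equal to the majority share.
majority_baseline_acc = max(benign_share, malicious_share)

print(f"Benign share:      {benign_share:.1%}")
print(f"Malicious share:   {malicious_share:.1%}")
print(f"Majority baseline: {majority_baseline_acc:.1%}")
```

On the full dataset the baseline is about 57.3%. The notebook samples the same number of images per class, so on the sampled splits the baseline should be close to 50%.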


## 2. Setup and Imports


In [None]:
import os
import time
import yaml
import torch
import torch.nn as nn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path
from PIL import Image
from tqdm import tqdm
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, classification_report
)

# Set style
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Import project modules
from dataset import QRCodeDataset, get_transforms
from data_utils import create_data_splits, create_dataloaders
from model import create_model, create_optimizer, create_scheduler
from train import train
from test import evaluate


## 3. Data Analysis (EDA)

### 3.1 Dataset Overview


In [None]:
# Load configuration
with open('config.yaml', 'r') as f:
    config = yaml.safe_load(f)

print("Dataset Configuration:")
print(f"  - Benign directory: {config['data']['benign_dir']}")
print(f"  - Malicious directory: {config['data']['malicious_dir']}")
print(f"  - Sample size per class: {config['data']['sample_size']}")
print(f"  - Image size: {config['data']['image_size']}x{config['data']['image_size']}")
print(f"  - Train/Val/Test split: {config['data']['train_ratio']:.0%}/{config['data']['val_ratio']:.0%}/{config['data']['test_ratio']:.0%}")
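
The cells in this notebook read a fixed set of keys from `config.yaml`. As a rough sketch, the dictionary below mirrors those key names and a small, hypothetical `validate_config` helper checks that they are present and that the split ratios sum to 1. The directory paths, `image_size`, `dropout`, and `batch_size` values are placeholders, not the project's actual settings.

```python
import math

# Hypothetical values; only the key names are taken from the notebook cells.
example_config = {
    "data": {
        "benign_dir": "data/benign",        # placeholder path
        "malicious_dir": "data/malicious",  # placeholder path
        "sample_size": 20000,
        "image_size": 128,                  # placeholder
        "train_ratio": 0.7,
        "val_ratio": 0.15,
        "test_ratio": 0.15,
        "seed": 42,
    },
    "model": {"num_classes": 2, "dropout": 0.5},
    "training": {"epochs": 10, "batch_size": 64, "learning_rate": 0.0001},
}

REQUIRED_KEYS = {
    "data": {"benign_dir", "malicious_dir", "sample_size", "image_size",
             "train_ratio", "val_ratio", "test_ratio", "seed"},
    "model": {"num_classes", "dropout"},
    "training": {"epochs", "batch_size", "learning_rate"},
}


def validate_config(cfg):
    """Return a list of problems found in cfg (an empty list means none found)."""
    problems = []
    for section, keys in REQUIRED_KEYS.items():
        missing = keys - set(cfg.get(section, {}))
        if missing:
            problems.append(f"{section}: missing {sorted(missing)}")

    # The three split ratios should cover the whole sample.
    data = cfg.get("data", {})
    ratios = [data.get(k, 0.0) for k in ("train_ratio", "val_ratio", "test_ratio")]
    if not math.isclose(sum(ratios), 1.0, abs_tol=1e-6):
        problems.append(f"data: split ratios sum to {sum(ratios):.3f}, expected 1.0")
    return problems


print(validate_config(example_config) or "config looks usable")
```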


### 3.2 Check Available Data


In [None]:
from pathlib import Path

benign_dir = Path(config['data']['benign_dir'])
malicious_dir = Path(config['data']['malicious_dir'])

# Count available images (files only, case-insensitive extension match)
def count_images(directory):
    extensions = {'.png', '.jpg', '.jpeg'}
    return sum(
        1 for f in directory.iterdir()
        if f.is_file() and f.suffix.lower() in extensions
    )

benign_count = count_images(benign_dir)
malicious_count = count_images(malicious_dir)

print(f"Available Images:")
print(f"  - Benign: {benign_count:,} images")
print(f"  - Malicious: {malicious_count:,} images")
print(f"  - Total: {benign_count + malicious_count:,} images")
print(f"\nUsing: {config['data']['sample_size']:,} per class = {config['data']['sample_size'] * 2:,} total")


### 3.3 Visualize Sample Images


In [None]:
# Load a few sample images for visualization
sample_size = 100  # Small sample for EDA
train_dataset_eda, _, _ = create_data_splits(
    benign_dir=str(benign_dir),
    malicious_dir=str(malicious_dir),
    sample_size=sample_size,
    train_ratio=0.7,
    val_ratio=0.15,
    test_ratio=0.15,
    image_size=config['data']['image_size'],
    seed=42
)

# Visualize sample images
fig, axes = plt.subplots(2, 4, figsize=(12, 6))
axes = axes.flatten()

# Denormalization constants (ImageNet statistics; assumes get_transforms() normalizes with them)
mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

for i in range(8):
    image, label = train_dataset_eda[i]

    # Undo normalization and clamp to the displayable [0, 1] range
    img_vis = torch.clamp(image * std + mean, 0, 1)

    axes[i].imshow(img_vis.permute(1, 2, 0).numpy())
    axes[i].set_title('Benign' if label == 0 else 'Malicious')
    axes[i].axis('off')

plt.suptitle('Sample QR Code Images', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()


### 3.4 Class Distribution


In [None]:
# Create full dataset for analysis
train_dataset_full, val_dataset_full, test_dataset_full = create_data_splits(
    benign_dir=str(benign_dir),
    malicious_dir=str(malicious_dir),
    sample_size=config['data']['sample_size'],
    train_ratio=config['data']['train_ratio'],
    val_ratio=config['data']['val_ratio'],
    test_ratio=config['data']['test_ratio'],
    image_size=config['data']['image_size'],
    seed=config['data']['seed']
)

# Get class distribution
train_dist = train_dataset_full.get_class_distribution()
val_dist = val_dataset_full.get_class_distribution()
test_dist = test_dataset_full.get_class_distribution()

# Visualize
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

for idx, (name, dist) in enumerate([('Train', train_dist), ('Validation', val_dist), ('Test', test_dist)]):
    labels = ['Benign', 'Malicious']
    sizes = [dist['benign'], dist['malicious']]
    colors = ['#66b3ff', '#ff9999']
    
    axes[idx].pie(sizes, labels=labels, autopct='%1.1f%%', colors=colors, startangle=90)
    axes[idx].set_title(f'{name} Set\n({dist["total"]} images)')

plt.suptitle('Class Distribution Across Splits', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print(f"\nClass Distribution Summary:")
print(f"Train: {train_dist['benign']} benign, {train_dist['malicious']} malicious")
print(f"Val:   {val_dist['benign']} benign, {val_dist['malicious']} malicious")
print(f"Test:  {test_dist['benign']} benign, {test_dist['malicious']} malicious")
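
The pie charts above are only balanced if the split keeps the class proportions in each subset. `create_data_splits` lives in the project's `data_utils` module and its internals are not shown here, so the following is just a minimal sketch of one way to do a stratified split with the standard library. The function name `stratified_split` and the toy labels are illustrative.

```python
import random


def stratified_split(labels, train_ratio, val_ratio, seed=42):
    """Split sample indices into train/val/test, class by class.

    Each class is shuffled and cut separately, so every subset keeps roughly
    the same class proportions. Whatever remains after train and val goes to test.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = int(len(indices) * train_ratio)
        n_val = int(len(indices) * val_ratio)
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test


# Toy example: 100 benign (0) and 100 malicious (1) samples.
labels = [0] * 100 + [1] * 100
train_idx, val_idx, test_idx = stratified_split(labels, train_ratio=0.7, val_ratio=0.15)
print(len(train_idx), len(val_idx), len(test_idx))
```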


## 4. Model Architecture


In [None]:
# Create model
model = create_model(
    num_classes=config['model']['num_classes'],
    dropout=config['model']['dropout'],
    device=device
)

print("Model Architecture:")
print(model)
print(f"\nModel Statistics:")
print(f"  - Total parameters: {model.count_parameters():,}")
print(f"  - Model size: {model.get_model_size_mb():.2f} MB")

# Test forward pass
dummy_input = torch.randn(1, 3, config['data']['image_size'], config['data']['image_size']).to(device)
with torch.no_grad():
    output = model(dummy_input)
print(f"  - Input shape: {dummy_input.shape}")
print(f"  - Output shape: {output.shape}")
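
The real network is built by `create_model` in the project's `model` module. For readers without that file, here is a hypothetical stand-in with the same overall shape described later in the summary (3 convolutional layers followed by 2 fully connected layers). The channel widths, pooling choices, and the class name `SketchCNN` are assumptions made for illustration; the two helper methods only mirror the names used in the cell above.

```python
import torch
import torch.nn as nn


class SketchCNN(nn.Module):
    """Hypothetical 3-conv + 2-FC classifier; layer widths are illustrative."""

    def __init__(self, num_classes=2, dropout=0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),  # fixed 4x4 feature map for any input size
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 4 * 4, 128), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

    def count_parameters(self):
        # Number of trainable parameters.
        return sum(p.numel() for p in self.parameters() if p.requires_grad)

    def get_model_size_mb(self):
        # Parameter memory only (buffers and optimizer state are not counted).
        n_bytes = sum(p.numel() * p.element_size() for p in self.parameters())
        return n_bytes / (1024 ** 2)


sketch = SketchCNN(num_classes=2, dropout=0.5).eval()
with torch.no_grad():
    logits = sketch(torch.randn(1, 3, 64, 64))
print(logits.shape, sketch.count_parameters(), f"{sketch.get_model_size_mb():.2f} MB")
```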


## 5. Training


In [None]:
print("Starting Training...")
print("=" * 60)
print(f"Configuration:")
print(f"  - Epochs: {config['training']['epochs']}")
print(f"  - Batch size: {config['training']['batch_size']}")
print(f"  - Learning rate: {config['training']['learning_rate']}")
print(f"  - Sample size: {config['data']['sample_size']} per class")
print("=" * 60)

# Train model
trained_model, history = train(
    config_path='config.yaml',
    save_plots=True
)


### 5.1 Training History Visualization


In [None]:
# Plot training history
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

epochs = range(1, len(history['train_loss']) + 1)

# Loss plot
axes[0].plot(epochs, history['train_loss'], 'b-', label='Train Loss', linewidth=2)
axes[0].plot(epochs, history['val_loss'], 'r-', label='Val Loss', linewidth=2)
axes[0].set_xlabel('Epoch', fontsize=12)
axes[0].set_ylabel('Loss', fontsize=12)
axes[0].set_title('Training and Validation Loss', fontsize=14, fontweight='bold')
axes[0].legend(fontsize=11)
axes[0].grid(True, alpha=0.3)

# Accuracy plot
axes[1].plot(epochs, history['train_acc'], 'b-', label='Train Acc', linewidth=2)
axes[1].plot(epochs, history['val_acc'], 'r-', label='Val Acc', linewidth=2)
axes[1].set_xlabel('Epoch', fontsize=12)
axes[1].set_ylabel('Accuracy (%)', fontsize=12)
axes[1].set_title('Training and Validation Accuracy', fontsize=14, fontweight='bold')
axes[1].legend(fontsize=11)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Print metrics from the epoch with the lowest validation loss
best_idx = int(np.argmin(history['val_loss']))
best_val_epoch = best_idx + 1
print(f"\nBest Validation Performance:")
print(f"  - Epoch: {best_val_epoch}")
print(f"  - Val Loss: {history['val_loss'][best_idx]:.4f}")
print(f"  - Val Accuracy: {history['val_acc'][best_idx]:.2f}%")


## 6. Evaluation


In [None]:
# Evaluate on test set
results = evaluate(
    config_path='config.yaml',
    model_path=None,  # Uses best model from training
    save_plots=True
)


### 6.1 Confusion Matrix


In [None]:
# Plot confusion matrix
cm = results['confusion_matrix']
class_names = ['Benign', 'Malicious']

plt.figure(figsize=(8, 6))
sns.heatmap(
    cm,
    annot=True,
    fmt='d',
    cmap='Blues',
    xticklabels=class_names,
    yticklabels=class_names,
    cbar_kws={'label': 'Count'}
)
plt.title('Confusion Matrix', fontsize=14, fontweight='bold')
plt.ylabel('True Label', fontsize=12)
plt.xlabel('Predicted Label', fontsize=12)
plt.tight_layout()
plt.show()

# Print metrics
print(f"\nTest Set Performance:")
print(f"  - Accuracy: {results['accuracy']*100:.2f}%")
print(f"  - Precision: {results['precision']*100:.2f}%")
print(f"  - Recall: {results['recall']*100:.2f}%")
print(f"  - F1-Score: {results['f1']*100:.2f}%")

print(f"\nPer-Class Metrics:")
for i, class_name in enumerate(class_names):
    print(f"  {class_name}:")
    print(f"    Precision: {results['precision_per_class'][i]:.4f}")
    print(f"    Recall: {results['recall_per_class'][i]:.4f}")
    print(f"    F1-Score: {results['f1_per_class'][i]:.4f}")
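
The reported metrics all follow from the four cells of the confusion matrix. The sketch below recomputes them by hand, assuming the layout used in the heatmap above (rows are true labels, columns are predicted labels) and treating malicious (1) as the positive class. The counts in `toy_cm` are made up for illustration and are not results from this project; how `evaluate` averages its own `precision`, `recall`, and `f1` values is not shown in this notebook.

```python
def binary_metrics(cm):
    """Compute metrics from a 2x2 confusion matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    total = tn + fp + fn + tp

    accuracy = (tp + tn) / total
    # Precision: of everything flagged malicious, how much really was.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all truly malicious codes, how many were caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy counts for illustration only.
toy_cm = [[80, 20],
          [10, 90]]
metrics = binary_metrics(toy_cm)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For a phishing detector, recall on the malicious class is often the number to watch, since a missed malicious code (false negative) is usually costlier than a false alarm.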


### 6.2 Efficiency Metrics


In [None]:
if 'inference_time' in results:
    timing = results['inference_time']
    
    print(f"\nEfficiency Metrics:")
    print(f"  - Average inference time: {timing['avg_sample_time_ms']:.2f} ms/sample")
    print(f"  - Throughput: {timing['samples_per_second']:.2f} samples/sec")
    print(f"  - Model parameters: {trained_model.count_parameters():,}")
    print(f"  - Model size: {trained_model.get_model_size_mb():.2f} MB")
    
    # Visualize inference time
    fig, ax = plt.subplots(figsize=(8, 5))
    metrics = ['Inference Time\n(ms/sample)', 'Throughput\n(samples/sec)']
    values = [timing['avg_sample_time_ms'], timing['samples_per_second']]
    
    bars = ax.bar(metrics, values, color=['#66b3ff', '#99ff99'])
    ax.set_ylabel('Value', fontsize=12)
    ax.set_title('Model Efficiency Metrics', fontsize=14, fontweight='bold')
    
    # Add value labels on bars
    for bar, val in zip(bars, values):
        height = bar.get_height()
        ax.text(bar.get_x() + bar.get_width()/2., height,
                f'{val:.2f}', ha='center', va='bottom', fontsize=11)
    
    plt.tight_layout()
    plt.show()
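
The two efficiency numbers are two views of the same measurement: throughput in samples/sec is 1000 divided by the average time per sample in ms. A minimal sketch of how such numbers could be measured is shown below. `measure_throughput` and `dummy_predict` are hypothetical stand-ins; the project's `evaluate` function may time inference differently.

```python
import time


def measure_throughput(predict_fn, batches):
    """Time predict_fn over all batches and derive per-sample statistics."""
    n_samples = 0
    start = time.perf_counter()
    for batch in batches:
        predict_fn(batch)
        n_samples += len(batch)
    elapsed = time.perf_counter() - start

    return {
        "avg_sample_time_ms": 1000.0 * elapsed / n_samples,
        "samples_per_second": n_samples / elapsed,
    }


def dummy_predict(batch):
    # Stand-in for a model forward pass.
    return [sum(x) > 0 for x in batch]


# 50 batches of 32 fake samples each.
batches = [[[0.1, -0.2, 0.3]] * 32 for _ in range(50)]
timing = measure_throughput(dummy_predict, batches)
print(f"{timing['avg_sample_time_ms']:.4f} ms/sample, "
      f"{timing['samples_per_second']:.0f} samples/sec")
```

When timing a PyTorch model on a GPU, CUDA calls run asynchronously, so calling `torch.cuda.synchronize()` before reading the clock is usually needed for the numbers to be meaningful.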


## 7. Results Summary


In [None]:
print("=" * 60)
print("FINAL RESULTS SUMMARY")
print("=" * 60)

print(f"\nDataset:")
print(f"  - Training samples: {len(train_dataset_full):,}")
print(f"  - Validation samples: {len(val_dataset_full):,}")
print(f"  - Test samples: {len(test_dataset_full):,}")

print(f"\nModel:")
print(f"  - Architecture: CNN (3 Conv + 2 FC layers)")
print(f"  - Parameters: {trained_model.count_parameters():,}")
print(f"  - Size: {trained_model.get_model_size_mb():.2f} MB")

print(f"\nPerformance:")
print(f"  - Test Accuracy: {results['accuracy']*100:.2f}%")
print(f"  - Test Precision: {results['precision']*100:.2f}%")
print(f"  - Test Recall: {results['recall']*100:.2f}%")
print(f"  - Test F1-Score: {results['f1']*100:.2f}%")

if 'inference_time' in results:
    print(f"\nEfficiency:")
    print(f"  - Inference time: {results['inference_time']['avg_sample_time_ms']:.2f} ms/sample")
    print(f"  - Throughput: {results['inference_time']['samples_per_second']:.2f} samples/sec")

print("=" * 60)


## 8. Conclusion

### What Worked
- CNN architecture successfully learned patterns from QR code images
- Data augmentation improved model generalization
- Gradient clipping prevented training instability
- WandB integration provided excellent visualization of training progress

### What Didn't Work Initially
- Initial learning rate (0.001) was too high, causing poor convergence
- Model was too large initially, making training difficult
- Insufficient data (5K per class) limited model learning
- Weight decay (0.01) was too aggressive

### Solutions Applied
1. **Reduced learning rate**: 0.001 → 0.0005 → 0.0001
2. **Increased dataset size**: 5K → 20K per class
3. **Added gradient clipping**: Prevents exploding gradients
4. **Reduced weight decay**: 0.01 → 0.001
5. **Better initialization**: Small init for final layer
6. **More epochs**: 10 → 30, then back to 10 because of time constraints
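
Several of the fixes above combine in a single training step. The sketch below shows one such step on a tiny stand-in model with random data. The learning rate (0.0001) and weight decay (0.001) are the final values from the list; the choice of `AdamW`, the clipping threshold `max_norm = 1.0`, and the `std=0.01` initialization are assumptions, since the project's `create_optimizer` and `train` code is not shown in this notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in model; the real network comes from create_model().
model = nn.Sequential(
    nn.Flatten(), nn.Linear(3 * 8 * 8, 16), nn.ReLU(), nn.Linear(16, 2)
)

# Small init for the final layer: near-zero weights, zero bias (assumed std).
final_layer = model[-1]
nn.init.normal_(final_layer.weight, mean=0.0, std=0.01)
nn.init.zeros_(final_layer.bias)

# Final hyperparameters from the list above; AdamW itself is an assumption.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-3)
criterion = nn.CrossEntropyLoss()

# Random batch standing in for QR code images and labels.
images = torch.randn(4, 3, 8, 8)
labels = torch.tensor([0, 1, 0, 1])

optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()

# Rescale gradients so their global L2 norm is at most max_norm (assumed value).
max_norm = 1.0
norm_before = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=max_norm)
norm_after = torch.sqrt(sum(p.grad.pow(2).sum() for p in model.parameters()))

optimizer.step()
print(f"loss={loss.item():.4f}  grad norm {norm_before.item():.4f} -> {norm_after.item():.4f}")
```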

### Future Improvements
1. **Transfer Learning**: Use pretrained models (ResNet, EfficientNet)
2. **Hybrid Approach**: Combine image features with URL features from CSV files
3. **Ensemble Methods**: Combine multiple models for better accuracy
4. **Hyperparameter Tuning**: Use WandB Sweeps for automated tuning
5. **More Data**: Use full dataset (1M images) instead of sampling
6. **Different Architecture**: Try Vision Transformers or attention mechanisms
7. **Data Quality**: Investigate if QR codes have visual differences or if URL analysis is needed
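
For the hybrid approach (item 2), the URL encoded in each QR code could contribute simple lexical features alongside the image. The sketch below computes a few such features with the standard library. The feature set and the function name `url_features` are illustrative assumptions, not the columns of the project's CSV files, and decoding a QR image into its URL would need a separate library that is not covered here.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url):
    """Extract a few lexical features often used in phishing-URL heuristics."""
    parsed = urlparse(url)
    host = parsed.hostname or ""

    # A raw IP address instead of a domain name is a common warning sign.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),
        "uses_https": parsed.scheme == "https",
        "host_is_ip": host_is_ip,
        "num_host_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
    }


for example in ("https://example.com/", "http://192.168.0.1/login"):
    print(example, url_features(example))
```

Features like these could be concatenated with the CNN's penultimate-layer output before the final classifier, which is one possible way to realize the hybrid model.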

### Key Learnings
- QR codes are visually very similar, making classification challenging
- Data size and quality are crucial for model performance
- Proper hyperparameter tuning is essential for convergence
- Monitoring validation metrics during training helps detect overfitting early
- Gradient clipping helps stabilize training
- In this project, lower learning rates (down to 0.0001) converged more reliably than the initial 0.001

### Final Notes
This project demonstrates a complete deep learning pipeline for binary image classification. The model can be further improved with more data, better architectures, or hybrid approaches combining visual and URL features.
