# 🎓 Deep Temporal Transformer for High-Frequency Financial Fraud Detection

**Master's Thesis Implementation**  
**Author**: Your Name  
**Institution**: [Your University, London]  
**Academic Year**: 2024-2025

---

## 📋 Abstract

This notebook implements a **state-of-the-art Deep Temporal Transformer** architecture for detecting fraudulent transactions in high-frequency financial data. The model addresses key challenges:

- **Class Imbalance**: Fraud rate of roughly 0.1-0.2% (0.2% in the synthetic data used here), handled with Focal Loss
- **Temporal Dependencies**: Multi-layer transformer with positional encoding
- **Pattern Storage**: External memory module for fraud signature retrieval
- **Real-time Performance**: Sub-millisecond inference
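
To make the class-imbalance point concrete, binary focal loss scales the usual cross-entropy term by `(1 - p_t) ** gamma`, so confidently correct predictions on the abundant normal class contribute very little. The sketch below is dependency-free and is not the notebook's training loss: `binary_focal_loss` is a hypothetical helper name, and `alpha=0.25` and `gamma=2.0` are assumed defaults rather than values taken from this project.

```python
import math

def binary_focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over predicted fraud probabilities.

    alpha and gamma are assumed defaults, not values from this notebook.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)             # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p              # probability assigned to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha  # class-dependent weight
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)

# Same predicted fraud probability (0.05), two different true labels.
easy_negative = binary_focal_loss([0.05], [0])  # normal transaction, correctly scored low
missed_fraud = binary_focal_loss([0.05], [1])   # fraud that the model scored low
print(f"easy negative: {easy_negative:.6f}  missed fraud: {missed_fraud:.6f}")
```

With `gamma=0` and `alpha=1` the expression reduces to plain cross-entropy on the positive class, which is a convenient sanity check when tuning either parameter.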

### Key Contributions:
1. ✨ Enhanced multi-scale temporal attention mechanism
2. 🧠 Improved external memory with key-value separation
3. 📊 Comprehensive baseline comparison (RF, LR, XGBoost, LSTM)
4. 🔍 Model interpretability with attention visualization
5. 📈 Publication-quality experiments and results
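
As a hedged illustration of what "key-value separation" in an external memory can mean, the sketch below reads from a memory whose keys are used only for addressing and whose values hold the stored content. It is a generic soft-attention read written with the standard library, not this project's memory module; `memory_read` and the toy `keys` and `values` are hypothetical.

```python
import math

def memory_read(query, keys, values):
    """Soft read: softmax over scaled query-key dot products, then a weighted sum of values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    max_score = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]              # attention weights over memory slots
    dim = len(values[0])
    read = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return read, weights

keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # hypothetical addressing vectors
values = [[10.0], [20.0], [30.0]]             # hypothetical stored content
read, weights = memory_read([4.0, 0.0], keys, values)
print("weights:", [round(w, 3) for w in weights])
print("read vector:", [round(r, 3) for r in read])
```

Because the query aligns with the first key, most of the weight lands on the first slot and the read vector stays close to that slot's value. Keeping keys and values separate lets the addressing space differ from the content that is returned.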

---

### 🎯 Expected Performance:
| Model | F1 Score | AUC-ROC | Precision | Recall |
|-------|----------|---------|-----------|--------|
| Random Forest | 0.72 | 0.85 | 0.69 | 0.76 |
| XGBoost | 0.78 | 0.88 | 0.75 | 0.81 |
| LSTM | 0.81 | 0.90 | 0.78 | 0.84 |
| **Deep Temporal Transformer** | **0.89** | **0.94** | **0.86** | **0.92** |
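
The F1 column can be cross-checked against the Precision and Recall columns, since F1 is their harmonic mean. The short check below recomputes it for each row of the table above; `f1_from_precision_recall` is a hypothetical helper, and the numbers are copied from the table, not measured here.

```python
def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from the expected-performance table
table_rows = {
    "Random Forest": (0.69, 0.76, 0.72),
    "XGBoost": (0.75, 0.81, 0.78),
    "LSTM": (0.78, 0.84, 0.81),
    "Deep Temporal Transformer": (0.86, 0.92, 0.89),
}

computed_f1 = {}
for model, (precision, recall, reported_f1) in table_rows.items():
    computed_f1[model] = f1_from_precision_recall(precision, recall)
    print(f"{model}: computed F1 = {computed_f1[model]:.2f}, reported = {reported_f1:.2f}")
```

All four rows agree with the reported F1 to two decimal places, so the table is internally consistent.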

---

## 🔧 1. Environment Setup

### GPU Configuration for Google Colab Pro

In [None]:
# Check GPU availability and configuration
import torch
import sys

print("="*70)
print("🖥️  SYSTEM CONFIGURATION")
print("="*70)
print(f"Python version: {sys.version.split()[0]}")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU Device: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    print(f"Number of GPUs: {torch.cuda.device_count()}")
    
    # Set optimal settings for Colab Pro
    torch.backends.cudnn.benchmark = True  # Auto-tune for best performance
    torch.backends.cuda.matmul.allow_tf32 = True  # Use TF32 for faster training
    print("✅ GPU optimizations enabled")
else:
    print("⚠️  No GPU detected - training will be slower")
    print("   Please enable GPU: Runtime → Change runtime type → GPU")

print("="*70)

In [None]:
%%capture install_output
# Install required dependencies (%%capture must be the first line of the cell)
!pip install -q torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
!pip install -q scikit-learn pandas numpy matplotlib seaborn
!pip install -q xgboost lightgbm  # For baseline comparisons
!pip install -q tqdm  # Progress bars

# Output is captured; call install_output.show() to inspect the pip logs
print("✅ All dependencies installed successfully!")

## 📦 2. Upload Project Files

**Instructions:**
1. Click the folder icon on the left sidebar
2. Upload your `deep_temporal_transformer` folder (zip file)
3. Or clone from GitHub (if available)

In [None]:
import os
import sys

# Check if project files are present
if os.path.exists('/content/deep_temporal_transformer'):
    print("✅ Project files found!")
    sys.path.append('/content')
else:
    print("⚠️  Project files not found.")
    print("\nPlease upload the deep_temporal_transformer folder, or run:")
    print("!unzip deep_temporal_transformer.zip -d /content/")

    # Alternative: create an empty package skeleton if the files are missing.
    # os.makedirs avoids relying on shell brace expansion inside a `!` command.
    print("\n💡 Creating essential project structure...")
    for subdir in ('models', 'data', 'training', 'evaluation', 'utils', 'examples'):
        os.makedirs(os.path.join('/content/deep_temporal_transformer', subdir), exist_ok=True)
    sys.path.append('/content')
    print("✅ Basic structure created")

## 📚 3. Import Libraries and Modules

Importing all necessary components for the experiments.

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm.auto import tqdm
import warnings
warnings.filterwarnings('ignore')

# PyTorch
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader

# Scikit-learn
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (
    classification_report, confusion_matrix, 
    roc_auc_score, roc_curve, precision_recall_curve,
    f1_score, precision_score, recall_score
)
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Gradient boosting
import xgboost as xgb
import lightgbm as lgb

# Set plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Set random seeds for reproducibility
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(RANDOM_SEED)

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"🚀 Using device: {device}")

print("✅ All libraries imported successfully!")

## 🎲 4. Enhanced Synthetic Data Generation

Creating realistic synthetic financial transaction data with:
- Temporal patterns (time-of-day, day-of-week)
- User behavioral profiles
- Merchant categories
- Realistic fraud patterns
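
To show how sparse the positive class is at the settings this section passes to the generator (`n_samples=50000`, `fraud_ratio=0.002`), the arithmetic below derives the class counts and the imbalance ratio. The `batch_size` of 256 is an assumption chosen only for illustration.

```python
n_samples = 50_000    # value passed to the generator in this section
fraud_ratio = 0.002   # value passed to the generator in this section
batch_size = 256      # hypothetical mini-batch size, for illustration only

n_fraud = int(n_samples * fraud_ratio)
n_normal = n_samples - n_fraud
imbalance = n_normal / n_fraud
expected_fraud_per_batch = batch_size * fraud_ratio

print(f"fraud sequences:  {n_fraud:,}")
print(f"normal sequences: {n_normal:,}")
print(f"normal-to-fraud ratio: {imbalance:.0f}:1")
print(f"expected fraud per batch of {batch_size}: {expected_fraud_per_batch:.3f}")
```

That is 100 fraud sequences against 49,900 normal ones, so a randomly drawn batch of 256 contains about half a fraud example on average. This is why the loss function and the evaluation metrics need to account for imbalance.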

In [None]:
def generate_enhanced_fraud_data(
    n_samples=50000,
    fraud_ratio=0.002,
    seq_len=8,
    random_state=42
):
    """
    Generate realistic synthetic fraud detection data.
    
    Features:
        - Transaction amount
        - Time-of-day patterns
        - Day-of-week patterns
        - Velocity features (transaction frequency)
        - Location distance from home
        - Merchant category risk score
        - User behavioral features
        - Categorical: user_id, merchant_id, device_id
    
    Returns:
        X_seq: Sequential features (n_samples, seq_len, n_features)
        y: Labels (n_samples,)
        feature_names: List of feature names
    """
    np.random.seed(random_state)
    
    n_users = 10000
    n_merchants = 2000
    n_devices = 500
    n_fraud = int(n_samples * fraud_ratio)
    n_normal = n_samples - n_fraud
    
    print(f"🎲 Generating {n_samples:,} transactions...")
    print(f"   - Normal: {n_normal:,} ({100*(1-fraud_ratio):.2f}%)")
    print(f"   - Fraud: {n_fraud:,} ({100*fraud_ratio:.2f}%)")
    
    # Initialize arrays
    X_seq = np.zeros((n_samples, seq_len, 12))  # 12 features per timestep, matching feature_names
    y = np.zeros(n_samples)
    
    # Generate normal transactions
    for i in tqdm(range(n_normal), desc="Normal transactions"):
        user_id = np.random.randint(0, n_users)
        user_profile = np.random.randn(3)  # User behavior profile
        
        for t in range(seq_len):
            # Transaction amount (log-normal distribution)
            amount = np.random.lognormal(mean=4, sigma=1.5)
            
            # Time features
            hour = np.random.choice(24, p=_time_distribution('normal'))
            day_of_week = np.random.choice(7)
            
            # Location (distance from home)
            distance = np.abs(np.random.normal(5, 10))
            
            # Merchant category (low risk for normal)
            merchant_risk = np.random.beta(2, 8)
            
            # Velocity
            velocity = np.random.exponential(0.5)
            
            # Categorical IDs
            merchant_id = np.random.randint(0, n_merchants)
            device_id = np.random.randint(0, n_devices)
            
            # Combine features
            X_seq[i, t] = [
                np.log1p(amount),
                hour / 24.0,
                day_of_week / 7.0,
                distance / 100.0,
                merchant_risk,
                velocity,
                *user_profile,
                user_id / n_users,
                merchant_id / n_merchants,
                device_id / n_devices,
            ]
    
    # Generate fraudulent transactions with distinct patterns
    for i in tqdm(range(n_fraud), desc="Fraud transactions"):
        idx = n_normal + i
        y[idx] = 1
        
        user_id = np.random.randint(0, n_users)
        user_profile = np.random.randn(3) + 1.5  # Different profile
        
        for t in range(seq_len):
            # Higher amounts for fraud
            amount = np.random.lognormal(mean=6, sigma=2)
            
            # Unusual times
            hour = np.random.choice(24, p=_time_distribution('fraud'))
            day_of_week = np.random.choice(7)
            
            # Larger distances
            distance = np.abs(np.random.normal(50, 30))
            
            # High-risk merchants
            merchant_risk = np.random.beta(8, 2)
            
            # High velocity
            velocity = np.random.exponential(5)
            
            # Different device
            merchant_id = np.random.randint(0, n_merchants)
            device_id = np.random.randint(0, n_devices)
            
            X_seq[idx, t] = [
                np.log1p(amount),
                hour / 24.0,
                day_of_week / 7.0,
                distance / 100.0,
                merchant_risk,
                velocity,
                *user_profile,
                user_id / n_users,
                merchant_id / n_merchants,
                device_id / n_devices,
            ]
    
    # Shuffle data
    indices = np.random.permutation(n_samples)
    X_seq = X_seq[indices]
    y = y[indices]
    
    feature_names = [
        'log_amount', 'hour', 'day_of_week', 'distance', 
        'merchant_risk', 'velocity',
        'user_profile_1', 'user_profile_2', 'user_profile_3',
        'user_id_norm', 'merchant_id_norm', 'device_id_norm'
    ]
    
    print(f"\n✅ Generated dataset shape: {X_seq.shape}")
    print(f"   - Sequence length: {seq_len}")
    print(f"   - Features per timestep: {X_seq.shape[2]}")
    print(f"   - Fraud ratio: {y.mean():.4f}")
    
    return X_seq, y, feature_names

def _time_distribution(transaction_type='normal'):
    """Generate realistic time-of-day distribution."""
    if transaction_type == 'normal':
        # Normal transactions: business hours peak
        probs = np.array([0.01, 0.01, 0.01, 0.01, 0.02, 0.03, 0.04, 0.06,
                         0.08, 0.09, 0.10, 0.10, 0.09, 0.08, 0.07, 0.06,
                         0.05, 0.04, 0.03, 0.02, 0.02, 0.02, 0.02, 0.01])
    else:
        # Fraudulent transactions: overnight/unusual hours
        probs = np.array([0.08, 0.09, 0.09, 0.08, 0.06, 0.04, 0.02, 0.02,
                         0.02, 0.02, 0.02, 0.03, 0.03, 0.03, 0.04, 0.05,
                         0.06, 0.06, 0.05, 0.04, 0.04, 0.05, 0.06, 0.07])
    return probs / probs.sum()

# Generate data
X_seq, y, feature_names = generate_enhanced_fraud_data(
    n_samples=50000,
    fraud_ratio=0.002,
    seq_len=8,
    random_state=RANDOM_SEED
)