Abdullah Shakir

22I-1138

Messam Raza

22I-1194

Data Mining (CS-A)


# *Transformer-Based Anomaly Detection Framework*

## Overview
This notebook implements an anomaly detection system for multivariate time series data. It combines four components:
 
 1. **Transformer Autoencoder** - For temporal feature extraction and reconstruction
 2. **Contrastive Learning** - To distinguish normal vs. anomalous patterns
 3. **Generative Adversarial Network (GAN)** - To handle training data contamination
 4. **Geometric Masking** - Data augmentation for robustness
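
To make the augmentation item concrete, here is a minimal NumPy sketch of mask-based augmentation on a batch of windows. The notebook's own masking routine is not shown in this section, so the function name `mask_and_jitter`, the independent per-step and per-feature masks, and the default values (0.1, 0.1, 0.01) are assumptions for illustration. A span-based "geometric" scheme may choose the masked positions differently.

```python
import numpy as np

def mask_and_jitter(windows, time_mask_prob=0.1, feature_mask_prob=0.1,
                    noise_std=0.01, rng=None):
    """Return (augmented, keep) for `windows` of shape (batch, time, features).

    Hypothetical helper: whole time steps and whole feature channels are
    dropped independently per sample, and Gaussian noise is added to the
    entries that are kept. Dropped entries are set to zero.
    """
    rng = np.random.default_rng() if rng is None else rng
    batch, steps, feats = windows.shape

    # One keep/drop decision per (sample, time step) and per (sample, feature).
    keep_time = rng.random((batch, steps, 1)) >= time_mask_prob
    keep_feat = rng.random((batch, 1, feats)) >= feature_mask_prob
    keep = keep_time & keep_feat  # broadcasts to (batch, steps, feats)

    noise = rng.normal(0.0, noise_std, size=windows.shape)
    augmented = np.where(keep, windows + noise, 0.0)
    return augmented, keep
```

Returning the mask alongside the augmented batch leaves room for a reconstruction loss that is restricted to, or weighted toward, the masked positions.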
 
## Dataset
 **Credit Card Fraud Detection Dataset**
 - Source: Kaggle/ULB Machine Learning Group
 - Features: 28 PCA-transformed features + Time + Amount
 - Task: Detect fraudulent transactions (highly imbalanced)
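
Since the task is described as highly imbalanced, it helps to quantify the imbalance before training. The sketch below counts classes in a 0/1 label array. `fraud_rate` is a hypothetical helper, the labels are synthetic, and the assumption is that the real labels come from the CSV's `Class` column (1 = fraud).

```python
import numpy as np

def fraud_rate(labels):
    """Return (n_normal, n_fraud, fraud_fraction) for a 0/1 label array."""
    labels = np.asarray(labels).astype(int)
    counts = np.bincount(labels, minlength=2)
    n_normal, n_fraud = int(counts[0]), int(counts[1])
    return n_normal, n_fraud, n_fraud / labels.size

# Synthetic labels for illustration only: 2 positives in 1,000 rows.
demo_labels = np.zeros(1000, dtype=int)
demo_labels[[10, 500]] = 1

n_normal, n_fraud, rate = fraud_rate(demo_labels)
print(f"normal={n_normal}, fraud={n_fraud}, fraud rate={rate:.2%}")
```

With a positive class this rare, accuracy alone says little, which is one reason to report precision-recall metrics next to ROC AUC.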



## 📚 Section 1: Imports and Configuration



In [2]:

# Core libraries
import os
import math
import random
import warnings
warnings.filterwarnings("ignore")

# Data processing
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (
    roc_auc_score, 
    precision_recall_curve, 
    auc, 
    roc_curve,
    classification_report,
    confusion_matrix
)
from sklearn.model_selection import train_test_split

# Deep learning
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Set style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("✓ All libraries imported successfully")


✓ All libraries imported successfully



## ⚙️ Section 2: Hyperparameters and Configuration



In [3]:

# ============================================================================
# CONFIGURATION PARAMETERS
# ============================================================================

# Data paths
DATA_PATH = "./data/creditcard.csv"

# Sequence generation
WINDOW_SIZE = 10          # Sliding window size for time series
STEP = 1                  # Step size for sliding window

# Model architecture
D_MODEL = 64              # Transformer embedding dimension
NHEAD = 4                 # Number of attention heads
NUM_ENCODER_LAYERS = 2    # Number of transformer encoder layers
LATENT_DIM = 64           # Latent space dimensionality

# Training hyperparameters
BATCH_SIZE = 256
EPOCHS = 20
LEARNING_RATE = 5e-4      # Initial learning rate

# Loss weights
W_RECON = 1.0             # Reconstruction loss weight
W_CONTRAST = 0.3          # Contrastive loss weight
W_ADV = 0.01              # Adversarial loss weight

# GAN training stability
N_CRITIC = 5              # Update discriminator every N batches
CLIP_VALUE = 0.01         # Weight clipping for WGAN

# Data augmentation
TIME_MASK_PROB = 0.1      # Probability of masking time steps
FEATURE_MASK_PROB = 0.1   # Probability of masking features
NOISE_STD = 0.01          # Gaussian noise standard deviation

# System
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
SEED = 42

# Set random seeds for reproducibility
np.random.seed(SEED)
torch.manual_seed(SEED)
random.seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(SEED)

print("✓ Configuration loaded")
print(f"✓ Device: {DEVICE}")
print(f"✓ Random seed: {SEED}")


✓ Configuration loaded
✓ Device: cpu
✓ Random seed: 42
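
The `WINDOW_SIZE` and `STEP` settings imply that the transaction table is cut into overlapping sequences before it reaches the model. The windowing code is not shown in this section, so the following is only a sketch. `make_windows` is a hypothetical helper, and labeling a window as anomalous when any of its rows is anomalous is an assumed rule, not necessarily the one used in this notebook.

```python
import numpy as np

def make_windows(values, labels, window_size=10, step=1):
    """Slice a (rows, features) array into overlapping windows.

    Returns `windows` with shape (n_windows, window_size, features) and
    `window_labels` with shape (n_windows,).
    Assumed labeling rule: a window is anomalous if any row in it is.
    """
    values = np.asarray(values, dtype=np.float32)
    labels = np.asarray(labels)
    if len(values) < window_size:
        raise ValueError("need at least `window_size` rows")

    # Start index of every window; the last window ends at the final row
    # or earlier, so no padding is required.
    starts = range(0, len(values) - window_size + 1, step)
    windows = np.stack([values[s:s + window_size] for s in starts])
    window_labels = np.array([labels[s:s + window_size].max() for s in starts])
    return windows, window_labels
```

For `N` rows this yields `(N - window_size) // step + 1` windows, so with `WINDOW_SIZE = 10` and `STEP = 1` almost every transaction starts its own window.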
