# Video Anomaly Detection using Multi-Layer Reconstruction Autoencoder

### Key Features
- **3D Convolutional Autoencoder** for spatiotemporal feature learning
- **Variance Attention Mechanism** to focus on important regions
- **Multi-Layer Reconstruction** for improved anomaly detection
- **PyTorch Lightning** for efficient training
- **Early Stopping & Checkpointing** for optimal results


---
## 1️⃣ Import Libraries

Import all necessary dependencies for deep learning, data processing, and visualization.

In [None]:
# Core libraries
import os
import glob
import numpy as np
import pandas as pd
from PIL import Image
import warnings
warnings.filterwarnings('ignore')

# PyTorch and PyTorch Lightning
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping, ModelCheckpoint

print("✓ All libraries imported successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

---
## 2️⃣ Configuration

### 📁 Dataset Paths
Update these paths according to your Kaggle dataset location.

### ⚙️ Hyperparameters
- **Image Size**: frames are resized to 448 × 256 (width × height), which roughly preserves the aspect ratio
- **Sequence Length**: 8 frames per clip
- **Batch Size**: 4
- **Max Epochs**: 80 (with early stopping)
- **Learning Rate**: 0.0001
- **Early Stopping Patience**: 6 epochs

In [None]:
class Config:
    # ==================== DATASET PATHS ====================
    # Update these paths for your environment
    BASE_PATH = '/kaggle/input/pixel-play-26/Avenue_Corrupted-20251221T112159Z-3-001/Avenue_Corrupted/Dataset'
    TRAIN_PATH = f'{BASE_PATH}/training_videos'  # Training .jpg frames
    TEST_PATH = f'{BASE_PATH}/testing_videos'    # Testing .jpg frames
    
    # ==================== MODEL PARAMETERS ====================
    IMG_HEIGHT = 256  # Frames are resized to this height; divisible by 16 for the four stride-2 stages
    IMG_WIDTH = 448
    SEQUENCE_LENGTH = 8  # Number of frames per sequence
    CHANNELS = 3         # RGB channels
    
    # ==================== TRAINING PARAMETERS ====================
    BATCH_SIZE = 4
    MAX_EPOCHS = 80      
    LEARNING_RATE = 0.0001
    WEIGHT_DECAY = 0.0001
    PATIENCE = 6         # Early stopping patience
    
    # ==================== LOSS WEIGHTS ====================
    ALPHA_1 = 1.0   # Appearance loss weight
    ALPHA_2 = 25.0  # Motion loss weight (higher = more focus on motion)
    
    # ==================== SCORE WEIGHTS ====================
    # These are tuned for the Avenue dataset
    BETA_1 = -0.1   # Appearance score weight
    BETA_2 = 2.0    # Motion score weight
    
    # ==================== MULTI-LAYER RECONSTRUCTION ====================
    RECON_LAYERS = [0, 2]  
    
    # ==================== OUTPUT ====================
    OUTPUT_CSV = 'anomaly_scores.csv'
    CHECKPOINT_DIR = 'checkpoints'

# Initialize config
config = Config()
print("✓ Configuration loaded successfully!")

---
## 3️⃣ Early Dataset Validation

Before training, let's verify that our dataset is correctly loaded by counting frames in both training and testing sets.

In [None]:
# Define and run dataset validation function
def get_dataset_frame_counts(train_path, test_path):
    """
    Retrieve the number of frames in train and test datasets.
    This validates data availability before training begins.
    """
    print("\n" + "="*60)
    print("EARLY DATASET VALIDATION - Counting Frames")
    print("="*60)
    
    results = {
        'train': {'videos': 0, 'total_frames': 0, 'frames_per_video': []},
        'test': {'videos': 0, 'total_frames': 0, 'frames_per_video': []}
    }
    
    # Count training frames
    if os.path.exists(train_path):
        train_videos = sorted([d for d in os.listdir(train_path) 
                              if os.path.isdir(os.path.join(train_path, d))])
        results['train']['videos'] = len(train_videos)
        
        for video_folder in train_videos:
            video_path = os.path.join(train_path, video_folder)
            frames = glob.glob(os.path.join(video_path, '*.jpg'))
            frame_count = len(frames)
            results['train']['frames_per_video'].append(frame_count)
            results['train']['total_frames'] += frame_count
        
        print(f"\n📊 TRAINING DATASET:")
        print(f"   - Number of videos: {results['train']['videos']}")
        print(f"   - Total frames: {results['train']['total_frames']}")
        print(f"   - Average frames per video: {np.mean(results['train']['frames_per_video']):.1f}")
        print(f"   - Min frames: {min(results['train']['frames_per_video']) if results['train']['frames_per_video'] else 0}")
        print(f"   - Max frames: {max(results['train']['frames_per_video']) if results['train']['frames_per_video'] else 0}")
    else:
        print(f"\n⚠️  WARNING: Training path not found: {train_path}")
    
    # Count testing frames
    if os.path.exists(test_path):
        test_videos = sorted([d for d in os.listdir(test_path) 
                             if os.path.isdir(os.path.join(test_path, d))])
        results['test']['videos'] = len(test_videos)
        
        for video_folder in test_videos:
            video_path = os.path.join(test_path, video_folder)
            frames = glob.glob(os.path.join(video_path, '*.jpg'))
            frame_count = len(frames)
            results['test']['frames_per_video'].append(frame_count)
            results['test']['total_frames'] += frame_count
        
        print(f"\n📊 TESTING DATASET:")
        print(f"   - Number of videos: {results['test']['videos']}")
        print(f"   - Total frames: {results['test']['total_frames']}")
        print(f"   - Average frames per video: {np.mean(results['test']['frames_per_video']):.1f}")
        print(f"   - Min frames: {min(results['test']['frames_per_video']) if results['test']['frames_per_video'] else 0}")
        print(f"   - Max frames: {max(results['test']['frames_per_video']) if results['test']['frames_per_video'] else 0}")
    else:
        print(f"\n⚠️  WARNING: Testing path not found: {test_path}")
    
    print("\n" + "="*60)
    print("✓ Frame count validation completed")
    print("="*60 + "\n")
    
    return results

# Run validation immediately
frame_counts = get_dataset_frame_counts(config.TRAIN_PATH, config.TEST_PATH)

---
## 4️⃣ Dataset Class

Custom PyTorch Dataset for loading video frame sequences.

### 🔑 Key Features:
- Loads `.jpg` frames from video directories
- Creates sequences of 8 consecutive frames
- **Training mode**: Non-overlapping sequences (faster)
- **Testing mode**: Sliding window with stride=1 (complete coverage)
- Normalizes pixel values to [0, 1]
- Returns tensors in shape: `(C, T, H, W)` = `(3, 8, 256, 448)`
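
The two windowing modes can be sketched in isolation as follows. This is a minimal illustration that mirrors the `range(0, len(frames) - sequence_length + 1, stride)` loop used in the dataset class; the helper name `window_starts` is hypothetical and not part of the notebook.

```python
def window_starts(num_frames, sequence_length=8, mode="train"):
    """Return the start index of every clip (hypothetical helper, for illustration)."""
    # Train: non-overlapping clips; test: sliding window with stride 1
    stride = sequence_length if mode == "train" else 1
    return list(range(0, num_frames - sequence_length + 1, stride))


# A 20-frame video with 8-frame clips
train_starts = window_starts(20, mode="train")
test_starts = window_starts(20, mode="test")
print(train_starts)      # [0, 8]
print(len(test_starts))  # 13
```

Note that in training mode the trailing frames that do not fill a complete clip (frames 16 to 19 in this example) are dropped.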

In [None]:
class AvenueDataset(Dataset):
    """Dataset for Avenue video frames (.jpg files)"""
    
    def __init__(self, root_dir, sequence_length=8, mode='train', transform=None):
        self.root_dir = root_dir
        self.sequence_length = sequence_length
        self.mode = mode
        self.transform = transform
        
        # Get all video folders
        self.video_folders = sorted([d for d in os.listdir(root_dir) 
                                     if os.path.isdir(os.path.join(root_dir, d))])
        
        # Build frame sequences
        self.sequences = []
        self.video_ids = []
        
        for video_id, video_folder in enumerate(self.video_folders):
            video_path = os.path.join(root_dir, video_folder)
            frames = sorted(glob.glob(os.path.join(video_path, '*.jpg')))
            
            # Create sequences
            # Train: non-overlapping (stride = sequence_length)
            # Test: sliding window (stride = 1)
            stride = sequence_length if mode == 'train' else 1
            for i in range(0, len(frames) - sequence_length + 1, stride):
                seq_frames = frames[i:i + sequence_length]
                if len(seq_frames) == sequence_length:
                    self.sequences.append(seq_frames)
                    self.video_ids.append(video_id)
        
        print(f"{mode.upper()} dataset: {len(self.sequences)} sequences from {len(self.video_folders)} videos")
    
    def __len__(self):
        return len(self.sequences)
    
    def __getitem__(self, idx):
        frame_paths = self.sequences[idx]
        video_id = self.video_ids[idx]
        
        # Load frames
        frames = []
        for frame_path in frame_paths:
            img = Image.open(frame_path).convert('RGB')
            img = img.resize((config.IMG_WIDTH, config.IMG_HEIGHT))
            img = np.array(img) / 255.0  # Normalize to [0, 1]
            frames.append(img)
        
        # Stack frames: (T, H, W, C)
        frames = np.stack(frames, axis=0)
        
        # Convert to torch tensor: (C, T, H, W)
        frames = torch.from_numpy(frames).float().permute(3, 0, 1, 2)
        
        return {
            'frames': frames,
            'video_id': video_id,
            'frame_idx': idx
        }

print("✓ Dataset class defined!")

---
## 5️⃣ Model Architecture

### 🏗️ 3D Convolutional Autoencoder

The model consists of:
1. **Encoder**: 4 downsampling blocks (3→64→128→256→512 channels)
2. **Decoder**: 4 upsampling blocks (512→256→128→64→3 channels)
3. **Multi-layer outputs**: Stores intermediate features for reconstruction

### 🔍 Why 3D Convolutions?
- Captures **spatial** patterns (what's in the frame)
- Captures **temporal** patterns (how things move)
- Better suited to video than 2D convolutions, which process each frame independently
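
To see what the four stages do to a clip, the sketch below traces the `(T, H, W)` shape through the encoder and decoder. It assumes, as in the blocks defined in this section, that each downsampling block ends with a kernel-1, stride-2 convolution and each upsampling block doubles every axis; the helper names `conv_out` and `trace_shapes` are hypothetical.

```python
def conv_out(n, kernel=1, stride=2, padding=0):
    """Output length of a convolution along one axis (standard formula)."""
    return (n + 2 * padding - kernel) // stride + 1


def trace_shapes(t=8, h=256, w=448, stages=4):
    """Return (encoder_shapes, decoder_shapes) as lists of (T, H, W) tuples."""
    enc = []
    shape = (t, h, w)
    for _ in range(stages):
        # Each downsampling block ends with a kernel-1, stride-2 convolution
        shape = tuple(conv_out(n) for n in shape)
        enc.append(shape)
    dec = []
    for _ in range(stages):
        # Each upsampling block doubles every axis (scale_factor=2)
        shape = tuple(2 * n for n in shape)
        dec.append(shape)
    return enc, dec


enc_shapes, dec_shapes = trace_shapes()
print(enc_shapes)  # [(4, 128, 224), (2, 64, 112), (1, 32, 56), (1, 16, 28)]
print(dec_shapes)  # [(2, 32, 56), (4, 64, 112), (8, 128, 224), (16, 256, 448)]
```

Under these assumptions the temporal axis already collapses to 1 after the third stage, so the decoder output has 16 time steps rather than 8. This is why the loss and scoring code resizes decoder features with `F.interpolate` whenever encoder and decoder shapes differ.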

In [None]:
class DownsamplingBlock(nn.Module):
    """3D Downsampling block with spatial-temporal convolutions"""
    
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv1 = nn.Conv3d(in_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm3d(out_channels)
        self.conv2 = nn.Conv3d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm3d(out_channels)
        self.conv3 = nn.Conv3d(out_channels, out_channels, kernel_size=1, stride=2, padding=0)
        self.bn3 = nn.BatchNorm3d(out_channels)
        self.relu = nn.ReLU(inplace=True)
    
    def forward(self, x):
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.relu(self.bn2(self.conv2(x)))
        x = self.relu(self.bn3(self.conv3(x)))  # Downsample by 2x
        return x


class UpsamplingBlock(nn.Module):
    """3D Upsampling block with spatial-temporal convolutions"""
    
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv1 = nn.Conv3d(in_channels, out_channels, kernel_size=5, stride=1, padding=2)
        self.bn1 = nn.BatchNorm3d(out_channels)
        self.conv2 = nn.Conv3d(out_channels, out_channels, kernel_size=5, stride=1, padding=2)
        self.bn2 = nn.BatchNorm3d(out_channels)
        self.upsample = nn.Upsample(scale_factor=2, mode='trilinear', align_corners=True)
        self.conv3 = nn.Conv3d(out_channels, out_channels, kernel_size=3, stride=1, padding=1)
        self.bn3 = nn.BatchNorm3d(out_channels)
        self.relu = nn.ReLU(inplace=True)
    
    def forward(self, x):
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.relu(self.bn2(self.conv2(x)))
        x = self.upsample(x)  # Upsample by 2x
        x = self.relu(self.bn3(self.conv3(x)))
        return x


class MultiLayerAutoencoder(nn.Module):
    """Multi-layer reconstruction autoencoder for video anomaly detection"""
    
    def __init__(self, in_channels=3):
        super().__init__()
        
        # Encoder: progressively downsample and increase channels
        self.enc1 = DownsamplingBlock(in_channels, 64)
        self.enc2 = DownsamplingBlock(64, 128)
        self.enc3 = DownsamplingBlock(128, 256)
        self.enc4 = DownsamplingBlock(256, 512)
        
        # Decoder: progressively upsample and decrease channels
        self.dec4 = UpsamplingBlock(512, 256)
        self.dec3 = UpsamplingBlock(256, 128)
        self.dec2 = UpsamplingBlock(128, 64)
        self.dec1 = UpsamplingBlock(64, in_channels)
        
        # Final output layer
        self.final = nn.Conv3d(in_channels, in_channels, kernel_size=1)
    
    def forward(self, x):
        # Store intermediate features for multi-layer reconstruction
        features = {'enc': [x], 'dec': []}
        
        # Encoder path
        x1 = self.enc1(x)
        features['enc'].append(x1)
        
        x2 = self.enc2(x1)
        features['enc'].append(x2)
        
        x3 = self.enc3(x2)
        features['enc'].append(x3)
        
        x4 = self.enc4(x3)
        features['enc'].append(x4)
        
        # Decoder path
        x = self.dec4(x4)
        features['dec'].append(x)
        
        x = self.dec3(x)
        features['dec'].append(x)
        
        x = self.dec2(x)
        features['dec'].append(x)
        
        x = self.dec1(x)
        features['dec'].append(x)
        
        # Final reconstruction
        x = self.final(x)
        features['dec'].insert(0, x)
        
        return x, features

print("✓ Model architecture defined!")

---
## 6️⃣ Variance Attention Mechanism

### 💡 What is Variance Attention?

Focuses the model on regions with high variance, which often indicate:
- **Movement** (temporal variance)
- **Complex patterns** (channel variance)
- **Potential anomalies** (unusual variations)

### 📐 How it works:
1. Compute variance across channels and time
2. Apply softmax to create attention weights
3. Multiply reconstruction loss by attention weights
4. Model learns to pay more attention to important regions
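
Steps 1 and 2 can be illustrated without tensors. The sketch below is a simplified, pure-Python stand-in (not the notebook's `compute_variance_attention`): each inner list holds the channel values at one spatial position, and the hypothetical helper `variance_attention` returns a softmax over the per-position variances.

```python
import math


def variance_attention(values_per_position):
    """Softmax over per-position variances (illustrative helper).

    values_per_position: one inner list of channel values per spatial position.
    """
    variances = []
    for values in values_per_position:
        mean = sum(values) / len(values)
        # Population variance, matching ((x - mean) ** 2).mean()
        variances.append(sum((v - mean) ** 2 for v in values) / len(values))
    # Numerically stable softmax over positions
    peak = max(variances)
    exps = [math.exp(v - peak) for v in variances]
    total = sum(exps)
    return [e / total for e in exps]


# Three positions: flat, mildly varying, strongly varying
weights = variance_attention([[1.0, 1.0, 1.0], [0.0, 1.0, 2.0], [0.0, 3.0, 6.0]])
print([round(w, 3) for w in weights])
```

Because the weights sum to 1 across positions, the position with the largest variance dominates. One consequence worth keeping in mind: with a softmax over all H × W positions, each weight is on the order of 1/(H·W), so an attention-weighted loss is numerically much smaller than the unweighted MSE.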

In [None]:
def compute_variance_attention(feature_map):
    """
    Compute variance attention along channel and temporal dimensions.
    
    Args:
        feature_map: (B, C, T, H, W) - Batch, Channels, Time, Height, Width
    
    Returns:
        channel_attention: (B, 1, T, H, W) - Attention across channels
        temporal_attention: (B, C, 1, H, W) - Attention across time
    """
    B, C, T, H, W = feature_map.shape
    
    # === Channel Variance Attention ===
    # Compute variance across channel dimension
    channel_mean = feature_map.mean(dim=1, keepdim=True)
    channel_variance = ((feature_map - channel_mean) ** 2).mean(dim=1, keepdim=True)
    
    # Apply softmax for attention weights
    channel_variance_flat = channel_variance.view(B, T, H * W)
    channel_attention_flat = F.softmax(channel_variance_flat, dim=-1)
    channel_attention = channel_attention_flat.view(B, 1, T, H, W)
    
    # === Temporal Variance Attention ===
    # Compute variance across temporal dimension
    temporal_mean = feature_map.mean(dim=2, keepdim=True)
    temporal_variance = ((feature_map - temporal_mean) ** 2).mean(dim=2, keepdim=True)
    
    # Apply softmax for attention weights
    temporal_variance_flat = temporal_variance.view(B, C, H * W)
    temporal_attention_flat = F.softmax(temporal_variance_flat, dim=-1)
    temporal_attention = temporal_attention_flat.view(B, C, 1, H, W)
    
    return channel_attention, temporal_attention

print("✓ Variance attention mechanism defined!")

---
## 7️⃣ Loss Functions

### 📊 Two Types of Losses:

1. **Appearance Loss** (MSE on pixel values)
   - Measures how well we reconstruct the visual content
   - Normal scenes → low loss
   - Anomalies → high loss (can't reconstruct well)

2. **Motion Loss** (MSE on temporal gradients)
   - Measures how well we reconstruct movement patterns
   - Normal motion → low loss
   - Anomalous motion → high loss

### ⚖️ Combined Loss:
`Total Loss = α₁ × Appearance Loss + α₂ × Motion Loss`

where α₁ = 1.0 and α₂ = 25.0 (motion is more important for anomaly detection)
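
As a small worked example of the combined loss, the sketch below applies both terms to the intensity of a single pixel over four frames. The values are made up for illustration, and the helpers `mse` and `temporal_gradient` are hypothetical pure-Python equivalents of the tensor operations used in the notebook.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def temporal_gradient(seq):
    """Absolute difference between consecutive time steps."""
    return [abs(later - earlier) for earlier, later in zip(seq, seq[1:])]


ALPHA_1, ALPHA_2 = 1.0, 25.0  # Loss weights from the configuration

# Intensity of one pixel over four frames (illustrative values)
target = [0.0, 0.5, 1.0, 0.5]
recon = [0.0, 0.5, 0.5, 0.5]

app = mse(recon, target)
mot = mse(temporal_gradient(recon), temporal_gradient(target))
total = ALPHA_1 * app + ALPHA_2 * mot
print(app, mot, total)
```

Here the reconstruction misses a single brightness peak. The appearance term is 0.0625, while the motion term is 1/6 because the missed peak changes two of the three frame-to-frame differences; after weighting, the motion term contributes most of the total.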

In [None]:
def appearance_loss(reconstruction, target, attention_weights=None):
    """
    Compute appearance loss with optional attention weighting.
    
    Args:
        reconstruction: Reconstructed features
        target: Original features
        attention_weights: Optional variance attention weights
    
    Returns:
        MSE loss (weighted if attention provided)
    """
    loss = F.mse_loss(reconstruction, target, reduction='none')
    
    if attention_weights is not None:
        loss = loss * attention_weights  # Weight by attention
    
    return loss.mean()


def motion_loss(reconstruction, target, attention_weights=None):
    """
    Compute motion loss using temporal gradients.
    
    Args:
        reconstruction: Reconstructed features (B, C, T, H, W)
        target: Original features (B, C, T, H, W)
        attention_weights: Optional variance attention weights
    
    Returns:
        MSE loss on temporal gradients
    """
    # Compute temporal gradients (difference between consecutive frames)
    target_gradient = torch.abs(target[:, :, 1:] - target[:, :, :-1])
    recon_gradient = torch.abs(reconstruction[:, :, 1:] - reconstruction[:, :, :-1])
    
    loss = F.mse_loss(recon_gradient, target_gradient, reduction='none')
    
    if attention_weights is not None:
        # Adjust attention for temporal dimension (T-1 frames)
        attention_weights = attention_weights[:, :, 1:]
        loss = loss * attention_weights
    
    return loss.mean()

print("✓ Loss functions defined!")

---
## 8️⃣ Training Module (PyTorch Lightning)

### ⚡ Why PyTorch Lightning?
- Cleaner code organization
- Automatic GPU handling
- Built-in logging and checkpointing
- Easy to use with Kaggle

### 🔄 Training Process:
1. Forward pass through encoder-decoder
2. Compute multi-layer reconstruction losses
3. Apply variance attention
4. Backpropagate and update weights
5. Log metrics for monitoring

In [None]:
class AnomalyDetectionModel(pl.LightningModule):
    """PyTorch Lightning module for video anomaly detection"""
    
    def __init__(self, config):
        super().__init__()
        self.save_hyperparameters()
        self.config = config
        
        # Build model
        self.model = MultiLayerAutoencoder(in_channels=config.CHANNELS)
        
        # For validation metrics
        self.validation_step_outputs = []
    
    def forward(self, x):
        return self.model(x)
    
    def compute_multi_layer_loss(self, features, target, mode='train'):
        """
        Compute multi-layer reconstruction loss.
        
        We reconstruct at the layers listed in config.RECON_LAYERS
        (by default index 0, the input, and index 2, the second encoder block)
        to capture anomalies at different abstraction levels.
        """
        total_app_loss = 0.0
        total_mot_loss = 0.0
        
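        # features['dec'] is ordered [final, dec4, dec3, dec2, dec1]; map each
        # encoder index to the decoder output with the same channel count.
        # Note: index 4 (512 channels) has no 512-channel decoder counterpart.
        dec_to_enc_map = {0: 0, 1: 3, 2: 2, 3: 1, 4: 1}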
        
        for layer_idx in self.config.RECON_LAYERS:
            enc_feat = features['enc'][layer_idx]
            dec_feat = features['dec'][dec_to_enc_map[layer_idx]]
            
            # Resize decoder features if needed
            if enc_feat.shape != dec_feat.shape:
                dec_feat = F.interpolate(dec_feat, size=enc_feat.shape[2:], 
                                        mode='trilinear', align_corners=True)
            
            # Compute variance attention
            channel_att, temporal_att = compute_variance_attention(enc_feat)
            attention = channel_att + temporal_att
            
            # Appearance loss
            app_loss = appearance_loss(dec_feat, enc_feat, attention)
            total_app_loss += app_loss
            
            # Motion loss
            mot_loss = motion_loss(dec_feat, enc_feat, attention)
            total_mot_loss += mot_loss
        
        # Average over layers
        total_app_loss /= len(self.config.RECON_LAYERS)
        total_mot_loss /= len(self.config.RECON_LAYERS)
        
        # Combined loss
        total_loss = self.config.ALPHA_1 * total_app_loss + self.config.ALPHA_2 * total_mot_loss
        
        return total_loss, total_app_loss, total_mot_loss
    
    def training_step(self, batch, batch_idx):
        frames = batch['frames']
        
        # Forward pass
        reconstruction, features = self.model(frames)
        
        # Compute loss
        loss, app_loss, mot_loss = self.compute_multi_layer_loss(features, frames, mode='train')
        
        # Logging
        self.log('train_loss', loss, prog_bar=True)
        self.log('train_app_loss', app_loss)
        self.log('train_mot_loss', mot_loss)
        
        return loss
    
    def validation_step(self, batch, batch_idx):
        frames = batch['frames']
        
        # Forward pass
        reconstruction, features = self.model(frames)
        
        # Compute loss
        loss, app_loss, mot_loss = self.compute_multi_layer_loss(features, frames, mode='val')
        
        # Compute anomaly scores for validation
        dec_to_enc_map = {0: 0, 1: 3, 2: 2, 3: 1, 4: 1}
        app_scores = []
        mot_scores = []
        
        for layer_idx in self.config.RECON_LAYERS:
            enc_feat = features['enc'][layer_idx]
            dec_feat = features['dec'][dec_to_enc_map[layer_idx]]
            
            if enc_feat.shape != dec_feat.shape:
                dec_feat = F.interpolate(dec_feat, size=enc_feat.shape[2:], 
                                        mode='trilinear', align_corners=True)
            
            # Frame-level scores
            app_score = F.mse_loss(dec_feat, enc_feat, reduction='none').mean(dim=[1, 3, 4])
            app_scores.append(app_score)
            
            # Motion scores
            enc_grad = torch.abs(enc_feat[:, :, 1:] - enc_feat[:, :, :-1])
            dec_grad = torch.abs(dec_feat[:, :, 1:] - dec_feat[:, :, :-1])
            mot_score = F.mse_loss(dec_grad, enc_grad, reduction='none').mean(dim=[1, 3, 4])
            mot_scores.append(mot_score)
        
        # Store for epoch end
        self.validation_step_outputs.append({
            'loss': loss,
            'app_loss': app_loss,
            'mot_loss': mot_loss
        })
        
        self.log('val_loss', loss, prog_bar=True)
        
        return loss
    
    def on_validation_epoch_end(self):
        avg_loss = torch.stack([x['loss'] for x in self.validation_step_outputs]).mean()
        self.log('val_loss_epoch', avg_loss, prog_bar=True)
        self.validation_step_outputs.clear()
    
    def configure_optimizers(self):
        # AdamW optimizer with weight decay
        optimizer = torch.optim.AdamW(
            self.parameters(),
            lr=self.config.LEARNING_RATE,
            weight_decay=self.config.WEIGHT_DECAY
        )
        
        # Learning rate scheduler (halves the LR every 80 epochs; with
        # MAX_EPOCHS = 80 this has effectively no effect within one run)
        scheduler = torch.optim.lr_scheduler.StepLR(
            optimizer,
            step_size=80,
            gamma=0.5
        )
        
        return {
            'optimizer': optimizer,
            'lr_scheduler': {
                'scheduler': scheduler,
                'interval': 'epoch'
            }
        }

print("✓ PyTorch Lightning module defined!")

---
## 9️⃣ Testing & CSV Generation

### 📝 Output Format
The function generates a CSV file in Kaggle submission format:
```
Id,Predicted
1_1,0.234567
1_2,0.456789
...
```

### 🔍 Process:
1. Load each test video
2. Process frames in sliding windows (stride=1)
3. Compute appearance & motion scores
4. Average overlapping windows
5. Normalize scores to [0, 1]
6. Save to CSV
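
Steps 4 and 5 can be sketched in pure Python as below. This is an illustration rather than the notebook's implementation: the helpers `average_overlapping` and `min_max_normalize` are hypothetical, and the `eps` guard against a constant score list is an added assumption.

```python
def average_overlapping(window_scores, num_frames, sequence_length):
    """Average per-frame scores over every window that covers the frame.

    window_scores[s][t] is the score of frame s + t from the window starting at s.
    """
    per_frame = [[] for _ in range(num_frames)]
    for start, scores in enumerate(window_scores):
        for t in range(sequence_length):
            per_frame[start + t].append(scores[t])
    return [sum(s) / len(s) for s in per_frame if s]


def min_max_normalize(scores, eps=1e-12):
    """Rescale scores to [0, 1]; eps guards against a constant score list."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo + eps) for s in scores]


# 4 frames, windows of length 3 with stride 1 -> 2 windows
windows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
averaged = average_overlapping(windows, num_frames=4, sequence_length=3)
print(averaged)  # [1.0, 3.0, 4.0, 6.0]

normalized = min_max_normalize(averaged)
ids = [f"1_{frame}" for frame in range(1, len(normalized) + 1)]
print(list(zip(ids, [round(n, 6) for n in normalized])))
```

Frames at the start and end of a video are covered by fewer windows than frames in the middle, so their averaged scores rest on fewer samples.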

In [None]:
def test_and_generate_csv_by_video(model, test_dataset, config):
    """
    Process each video separately to ensure correct frame numbering.
    Output format: Id,Predicted (e.g., 1_1, 1_2, ...)
    """
    model.eval()
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)
    
    # Get unique videos
    video_folders = sorted([d for d in os.listdir(config.TEST_PATH) 
                           if os.path.isdir(os.path.join(config.TEST_PATH, d))])
    
    all_results = []
    
    print("Processing videos...")
    for video_idx, video_folder in enumerate(video_folders, start=1):
        video_path = os.path.join(config.TEST_PATH, video_folder)
        frames_paths = sorted(glob.glob(os.path.join(video_path, '*.jpg')))
        
        print(f"\nVideo {video_idx}/{len(video_folders)}: {video_folder} ({len(frames_paths)} frames)")
        
        video_scores = []
        
        # Process video in sliding windows
        for start_idx in range(0, len(frames_paths) - config.SEQUENCE_LENGTH + 1):
            sequence_paths = frames_paths[start_idx:start_idx + config.SEQUENCE_LENGTH]
            
            # Load frames
            frames_list = []
            for frame_path in sequence_paths:
                img = Image.open(frame_path).convert('RGB')
                img = img.resize((config.IMG_WIDTH, config.IMG_HEIGHT))
                img = np.array(img) / 255.0
                frames_list.append(img)
            
            frames_tensor = np.stack(frames_list, axis=0)
            frames_tensor = torch.from_numpy(frames_tensor).float().permute(3, 0, 1, 2)
            frames_tensor = frames_tensor.unsqueeze(0).to(device)
            
            with torch.no_grad():
                reconstruction, features = model.model(frames_tensor)
                
                dec_to_enc_map = {0: 0, 1: 3, 2: 2, 3: 1, 4: 1}
                app_scores_all = []
                mot_scores_all = []
                
                for layer_idx in config.RECON_LAYERS:
                    enc_feat = features['enc'][layer_idx]
                    dec_feat = features['dec'][dec_to_enc_map[layer_idx]]
                    
                    if enc_feat.shape != dec_feat.shape:
                        dec_feat = F.interpolate(dec_feat, size=enc_feat.shape[2:], 
                                                mode='trilinear', align_corners=True)
                    
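                    # Appearance scores: one value per time step, shape (1, T_layer)
                    app_score = F.mse_loss(dec_feat, enc_feat, reduction='none').mean(dim=[1, 3, 4])
                    # Layers differ in temporal length, so resample every score
                    # to SEQUENCE_LENGTH before stacking (otherwise torch.stack fails)
                    app_score = F.interpolate(app_score.unsqueeze(1), size=config.SEQUENCE_LENGTH,
                                              mode='linear', align_corners=False).squeeze(1)
                    app_scores_all.append(app_score)
                    
                    # Motion scores: computed on T_layer - 1 frame differences
                    enc_grad = torch.abs(enc_feat[:, :, 1:] - enc_feat[:, :, :-1])
                    dec_grad = torch.abs(dec_feat[:, :, 1:] - dec_feat[:, :, :-1])
                    mot_score = F.mse_loss(dec_grad, enc_grad, reduction='none').mean(dim=[1, 3, 4])
                    mot_score = F.interpolate(mot_score.unsqueeze(1), size=config.SEQUENCE_LENGTH,
                                              mode='linear', align_corners=False).squeeze(1)
                    mot_scores_all.append(mot_score)
                
                # Average across layers; both results have SEQUENCE_LENGTH entries
                app_score = torch.stack(app_scores_all).mean(dim=0).squeeze(0)
                mot_score_padded = torch.stack(mot_scores_all).mean(dim=0).squeeze(0)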
                
                # Combine appearance and motion scores
                for t in range(config.SEQUENCE_LENGTH):
                    frame_score = (config.BETA_1 * app_score[t].item() + 
                                  config.BETA_2 * mot_score_padded[t].item())
                    
                    frame_global_idx = start_idx + t
                    while len(video_scores) <= frame_global_idx:
                        video_scores.append([])
                    video_scores[frame_global_idx].append(frame_score)
        
        # Average overlapping window scores
        for frame_num, scores in enumerate(video_scores, start=1):
            if scores:
                all_results.append({
                    'video_id': video_idx,
                    'frame_num': frame_num,
                    'score': np.mean(scores)
                })
    
    # Convert to DataFrame
    df = pd.DataFrame(all_results)
    
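    # Normalize scores to [0, 1]
    min_score = df['score'].min()
    max_score = df['score'].max()
    score_range = max_score - min_score
    if score_range == 0:
        score_range = 1.0  # Avoid division by zero when all scores are equal
    df['score_normalized'] = (df['score'] - min_score) / score_range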
    
    # Create competition format
    df['Id'] = df['video_id'].astype(str) + '_' + df['frame_num'].astype(str)
    df['Predicted'] = df['score_normalized'].round(6)
    
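    # Select only required columns, ordered by video and then frame number
    # (sorting on the 'Id' string would place '1_10' before '1_2')
    output_df = df.sort_values(['video_id', 'frame_num'])[['Id', 'Predicted']]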
    
    # Save to CSV
    output_df.to_csv(config.OUTPUT_CSV, index=False)
    print(f"\n{'='*60}")
    print(f"✓ Results saved to {config.OUTPUT_CSV}")
    print(f"✓ Total predictions: {len(output_df)}")
    print(f"✓ Videos: {df['video_id'].nunique()}")
    print(f"{'='*60}")
    
    return output_df

print("✓ Testing function defined!")

---
## 🚀 Main Training & Evaluation

Now we're ready to run the complete pipeline:

1. **Validate dataset** (count frames)
2. **Load datasets** (train sequences)
3. **Create data loaders** (batching)
4. **Build model** (autoencoder)
5. **Train** (with early stopping)
6. **Test** (generate anomaly scores)
7. **Save CSV** (Kaggle submission format)

### ⏱️ Expected Training Time (P100 GPU):
- Per epoch: ~4-5 minutes
- With early stopping (patience of 6 epochs, per `Config.PATIENCE`): **~2-4 hours**
- Maximum (80 epochs): ~5-7 hours
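
How patience affects the stopping point can be illustrated with a short simulation. The function below is a simplified stand-in, not PyTorch Lightning's `EarlyStopping` implementation (which also supports options such as `min_delta`); the name `early_stop_epoch` and the loss values are made up for the example.

```python
def early_stop_epoch(val_losses, patience):
    """Return the 0-based epoch at which training would stop, or None.

    Simplified illustration: stop once the loss has failed to improve on the
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return None


losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.70, 0.73]
print(early_stop_epoch(losses, patience=3))  # 5
```

With a larger patience the same loss curve runs to completion, which is the trade-off between training time and the chance of recovering from a temporary plateau.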

In [None]:
def main():
    """Main training and evaluation pipeline"""
    
    # Set random seed for reproducibility
    pl.seed_everything(42)
    
    print("="*60)
    print("🎬 STARTING VIDEO ANOMALY DETECTION PIPELINE")
    print("="*60)
    
    # ========== STEP 1: VALIDATE DATASET ==========
    frame_counts = get_dataset_frame_counts(config.TRAIN_PATH, config.TEST_PATH)
    
    # ========== STEP 2: CREATE DATASETS ==========
    print("\nLoading datasets...")
    train_dataset = AvenueDataset(
        root_dir=config.TRAIN_PATH,
        sequence_length=config.SEQUENCE_LENGTH,
        mode='train'
    )

    # Split 20% for validation
    val_size = int(0.2 * len(train_dataset))
    train_size = len(train_dataset) - val_size
    train_subset, val_subset = torch.utils.data.random_split(
        train_dataset, [train_size, val_size]
    )

    print(f"✓ Train sequences: {train_size}")
    print(f"✓ Validation sequences: {val_size}")
    
    # ========== STEP 3: CREATE DATA LOADERS ==========
    train_loader = DataLoader(
        train_subset,
        batch_size=config.BATCH_SIZE,
        shuffle=True,
        num_workers=2,
        pin_memory=True
    )
    
    val_loader = DataLoader(
        val_subset,
        batch_size=config.BATCH_SIZE,
        shuffle=False,
        num_workers=2,
        pin_memory=True
    )
    
    # Diagnostic: warn when the validation split is too small to be reliable
    print("\n🔍 DIAGNOSTIC:")
    print(f"  Validation batches per epoch: {len(val_loader)}")
    if val_size < 50:
        print("  ⚠️  WARNING: Validation set is very small!")
        print("  ⚠️  This can cause val_loss to be unstable/incorrect")

    # ========== STEP 4: BUILD MODEL ==========
    print("\nBuilding model...")
    model = AnomalyDetectionModel(config)
    
    # ========== STEP 5: SETUP CALLBACKS ==========
    # Early stopping: stops if val_loss doesn't improve for config.PATIENCE epochs
    early_stop_callback = EarlyStopping(
        monitor='val_loss',
        patience=config.PATIENCE,
        mode='min',
        verbose=True
    )
    
    # Model checkpoint: saves best model
    checkpoint_callback = ModelCheckpoint(
        dirpath=config.CHECKPOINT_DIR,
        filename='anomaly-{epoch:02d}-{val_loss:.4f}',
        monitor='val_loss',
        mode='min',
        save_top_k=3
    )
    
    # ========== STEP 6: CREATE TRAINER ==========
    trainer = pl.Trainer(
        max_epochs=config.MAX_EPOCHS,
        callbacks=[early_stop_callback, checkpoint_callback],
        accelerator='auto',  # Automatically use GPU if available
        devices=1,
        precision=32,  # Full 32-bit precision (no mixed precision)
        log_every_n_steps=10,
        accumulate_grad_batches=4,
        gradient_clip_val=1.0
    )
    
    # ========== STEP 7: TRAIN MODEL ==========
    print("\n" + "="*60)
    print("🏋️  STARTING TRAINING")
    print("="*60)
    trainer.fit(model, train_loader, val_loader)
    
    # ========== STEP 8: TEST AND GENERATE CSV ==========
    print("\n" + "="*60)
    print("🧪 TESTING ON TEST DATASET")
    print("="*60)
    
    best_model_path = checkpoint_callback.best_model_path
    print(f"Loading best model from: {best_model_path}")
    
    best_model = AnomalyDetectionModel.load_from_checkpoint(best_model_path, config=config)
    
    test_dataset = AvenueDataset(
        root_dir=config.TEST_PATH,
        sequence_length=config.SEQUENCE_LENGTH,
        mode='test'
    )
    
    # Generate predictions
    results_df = test_and_generate_csv_by_video(best_model, test_dataset, config)
    
    # ========== STEP 9: DISPLAY RESULTS ==========
    print("\n" + "="*60)
    print("✅ TRAINING AND TESTING COMPLETED!")
    print("="*60)
    
    print("\n📊 Sample output (first 20 rows):")
    print(results_df.head(20).to_string(index=False))
    
    print("\n📊 Sample output (last 20 rows):")
    print(results_df.tail(20).to_string(index=False))
    
    print("\n📈 Anomaly score statistics:")
    print(results_df['Predicted'].describe())
    
    print(f"\n🎉 Done! Submit '{config.OUTPUT_CSV}' to Kaggle!")
    
    return results_df

print("✓ Main function defined!")

---
## 🎯 Run Training & Evaluation

Execute the cell below to start the complete pipeline. This will:
- Validate your dataset
- Train the model (with progress bars)
- Generate anomaly scores
- Save results to `anomaly_scores.csv`

**Note**: This may take 2-6 hours depending on your GPU and early stopping.

In [None]:
# Run the complete pipeline
if __name__ == '__main__':
    results = main()

---
## 🎊 Next Steps

### 📤 Submit to Kaggle
1. Download `anomaly_scores.csv`
2. Go to the competition page
3. Submit your results!

### 🔧 Hyperparameter Tuning (Optional)

If you want to improve results, try adjusting:

```python
# In Config class:
BATCH_SIZE = 8           # Larger batch = smoother gradients, fewer updates per epoch (needs more GPU memory)
LEARNING_RATE = 0.0002   # Higher LR = faster but potentially less stable learning
ALPHA_2 = 30.0           # More weight on motion
BETA_2 = 3.0             # More weight on motion scores
```

### 📊 Visualize Results (Optional)

Add this cell to visualize anomaly score distribution:

```python
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 4))
plt.plot(results['Predicted'].values)
plt.xlabel('Frame Index')
plt.ylabel('Anomaly Score')
plt.title('Anomaly Scores Across All Test Frames')
plt.grid(True)
plt.show()
```

### 🧠 Model Insights

- **High scores** (close to 1.0) = Anomalous frames
- **Low scores** (close to 0.0) = Normal frames
- The model learns what "normal" looks like during training
- Anything that deviates significantly gets a high anomaly score

---

**Good luck with your submission! 🚀**