# UCF-Crime Anomaly Detection - Colab Setup

Setup notebook for running the MIL Ranking Loss re-implementation on Google Colab.

## 1. Check GPU

In [2]:
!nvidia-smi

# Verify PyTorch can use GPU
import torch
print(f"\nPyTorch CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"CUDA version: {torch.version.cuda}")

/bin/bash: line 1: nvidia-smi: command not found

PyTorch CUDA available: False


## 2. Mount Google Drive

**Prerequisites:**
1. Upload features.zip and annotations.zip to Google Drive
2. Google Drive structure:
```
MyDrive/
└── Colab Notebooks/
    └── data_distribution/
        ├── features.zip
        └── annotations.zip
```

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive
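
Before extracting anything, it can help to confirm that both archives sit where the later cells expect them. The sketch below is a minimal check using only the standard library. The helper name `missing_files` is hypothetical, and the folder is the Drive path used by the extraction cells in this notebook.

```python
from pathlib import Path

def missing_files(folder, names):
    """Return the entries of `names` that do not exist as files inside `folder`."""
    folder = Path(folder)
    return [name for name in names if not (folder / name).is_file()]

# Drive folder used by the extraction cells in this notebook
DRIVE_DATA_PATH = '/content/drive/MyDrive/Colab Notebooks/data_distribution'

missing = missing_files(DRIVE_DATA_PATH, ['features.zip', 'annotations.zip'])
if missing:
    print(f"Missing from Drive: {missing}")
else:
    print("Both archives found")
```

If anything is reported missing, fix the upload location before running the extraction cells, since `cp` would otherwise fail there.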


## 3. Clone Repository

In [4]:
!git clone https://github.com/KwonPodo/MILRankingLoss_Sultani2018_ReImplementation.git
%cd MILRankingLoss_Sultani2018_ReImplementation

Cloning into 'MILRankingLoss_Sultani2018_ReImplementation'...
remote: Enumerating objects: 39, done.
remote: Counting objects: 100% (39/39), done.
remote: Compressing objects: 100% (31/31), done.
remote: Total 39 (delta 7), reused 37 (delta 5), pack-reused 0 (from 0)
Receiving objects: 100% (39/39), 245.14 KiB | 7.66 MiB/s, done.
Resolving deltas: 100% (7/7), done.
/content/MILRankingLoss_Sultani2018_ReImplementation


## 4. Install Packages

**Note:** Install from `requirements-colab.txt`, a minimal package list that avoids conflicts on Colab.

In [5]:
# Install minimal packages for Colab (avoid conflicts)
!pip install -r requirements-colab.txt -q

# Check installed package versions
import torch
import numpy as np
print(f"PyTorch: {torch.__version__}")
print(f"NumPy: {np.__version__}")

PyTorch: 2.9.0+cu126
NumPy: 2.0.2


## 5. Extract Data

Extract features from Google Drive.

In [7]:
import os

# Verify working directory
%cd /content/MILRankingLoss_Sultani2018_ReImplementation
!pwd

# Create data directory
!mkdir -p data

# Google Drive path
DRIVE_DATA_PATH = '/content/drive/MyDrive/Colab Notebooks/data_distribution'

# Extract features
!cp "{DRIVE_DATA_PATH}/features.zip" data/
!unzip -q data/features.zip -d data/
!rm data/features.zip

print("Features extracted")
!ls -lh data/

/content/MILRankingLoss_Sultani2018_ReImplementation
/content/MILRankingLoss_Sultani2018_ReImplementation
Features extracted
total 4.0K
drwxr-xr-x 17 root root 4.0K Oct 31 17:28 features


## 6. Extract Annotations

Extract annotation files from Google Drive.

In [8]:
# Extract annotations from Google Drive
!cp "{DRIVE_DATA_PATH}/annotations.zip" data/
!unzip -q data/annotations.zip -d data/
!rm data/annotations.zip

print("Annotations extracted")
!ls -lh data/annotations/

Annotations extracted
total 100K
-rwxr-xr-x 1 root root 16K Jan  3  2023 Temporal_Anomaly_Annotation_for_Testing_Videos.txt
-rwxr-xr-x 1 root root 13K Oct 31 17:28 test_set.txt
-rwxr-xr-x 1 root root 66K Oct 31 17:28 train_set.txt


## 7. Verify Dataset

In [9]:
# Verify working directory
%cd /content/MILRankingLoss_Sultani2018_ReImplementation

# Check feature categories
!ls data/features/

# Check sample counts
!echo "Train samples:"
!wc -l data/annotations/train_set.txt
!echo "Test samples:"
!wc -l data/annotations/test_set.txt

/content/MILRankingLoss_Sultani2018_ReImplementation
Abuse	Assault    Fighting	  Shooting     Testing_Normal_Videos_Anomaly
Arrest	Burglary   RoadAccidents  Shoplifting  Training_Normal_Videos_Anomaly
Arson	Explosion  Robbery	  Stealing     Vandalism
Train samples:
1610 data/annotations/train_set.txt
Test samples:
290 data/annotations/test_set.txt


## 8. Test Dataset Loading

In [10]:
%cd /content/MILRankingLoss_Sultani2018_ReImplementation
!PYTHONPATH=/content/MILRankingLoss_Sultani2018_ReImplementation:$PYTHONPATH python scripts/test_dataset.py

/content/MILRankingLoss_Sultani2018_ReImplementation
Total samples in dataset: 1610
Positive samples: 810
Negative samples: 800

First batch:

Positive bags: torch.Size([30, 32, 4096])

Negative bags: torch.Size([30, 32, 4096])
Batch 0: pos=30, neg=30
Batch 1: pos=30, neg=30
Batch 2: pos=30, neg=30
Batch 3: pos=30, neg=30
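
The batch shapes above show 30 positive and 30 negative bags per batch. Assuming the sampler draws without replacement and drops the final incomplete batch (an assumption about `BalancedBatchSampler`, not something confirmed in this notebook), the smaller class limits the number of batches per epoch. The function name below is hypothetical.

```python
def batches_per_epoch(num_pos, num_neg, per_class):
    """Full balanced batches available when each batch takes `per_class` bags per class."""
    return min(num_pos, num_neg) // per_class

# Counts reported by scripts/test_dataset.py above
print(batches_per_epoch(num_pos=810, num_neg=800, per_class=30))  # 26
```

This agrees with the 26 batches per epoch reported in the training log in section 10.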


## 9. Test Model

In [11]:
%cd /content/MILRankingLoss_Sultani2018_ReImplementation
!PYTHONPATH=/content/MILRankingLoss_Sultani2018_ReImplementation:$PYTHONPATH python scripts/test_model.py

/content/MILRankingLoss_Sultani2018_ReImplementation
Model architecture:
AnomalyDetector(
  (fc1): Linear(in_features=4096, out_features=512, bias=True)
  (fc2): Linear(in_features=512, out_features=32, bias=True)
  (fc3): Linear(in_features=32, out_features=1, bias=True)
  (dropout): Dropout(p=0.6, inplace=False)
  (relu): ReLU()
  (sigmoid): Sigmoid()
)

Total parameters: 2,114,113

Positive scores shape: torch.Size([30, 32])
Negative scores shape: torch.Size([30, 32])
Score range: [0.4856, 0.5983]

Total loss: 1.0416
  Ranking loss: 1.0000
  Smoothness loss: 0.5422
  Sparsity loss: 519.6133

Training mode loss: 1.0364


## 10. Start Training

### Option 1: Train without WandB

In [None]:
%cd /content/MILRankingLoss_Sultani2018_ReImplementation
!PYTHONPATH=/content/MILRankingLoss_Sultani2018_ReImplementation:$PYTHONPATH python train.py --config configs/default.yaml --no-wandb

/content/MILRankingLoss_Sultani2018_ReImplementation
Loaded config from configs/default.yaml
Using device: cpu
Train dataset: 1610 videos
Positive samples: 810
Negative samples: 800
Total batches per epoch: 26
Model parameters: 2,114,113
Optimizer: adam

Starting training for 100 epochs...
Epoch 1: 100% 26/26 [01:29<00:00,  3.44s/it, loss=1.0356, rank=0.9996]
Epoch 1/100
  Loss: 1.0362
  Ranking: 0.9998
  Smoothness: 0.0377
  Sparsity: 454.4468
  Saved checkpoint: checkpoints/epoch_1.pth
  New best model: checkpoints/best_model.pth
Epoch 2: 100% 26/26 [01:32<00:00,  3.56s/it, loss=1.0352, rank=0.9998]
Epoch 2/100
  Loss: 1.0356
  Ranking: 0.9998
  Smoothness: 0.0530
  Sparsity: 446.9604
  Saved checkpoint: checkpoints/epoch_2.pth
  New best model: checkpoints/best_model.pth
Epoch 3:  77% 20/26 [01:22<00:18,  3.06s/it, loss=1.0351, rank=1.0000]

## 11. Evaluate

Evaluate trained model.

In [None]:
%cd /content/MILRankingLoss_Sultani2018_ReImplementation
!PYTHONPATH=/content/MILRankingLoss_Sultani2018_ReImplementation:$PYTHONPATH python evaluate.py \
    --config configs/default.yaml \
    --checkpoint checkpoints/best_model.pth \
    --temporal-annotation data/annotations/Temporal_Anomaly_Annotation_for_Testing_Videos.txt

## 12. View Results

In [None]:
%cd /content/MILRankingLoss_Sultani2018_ReImplementation

# Display ROC curve
from IPython.display import Image, display
import os

if os.path.exists('results/roc_curve.png'):
    display(Image('results/roc_curve.png'))


if os.path.exists('results/pr_curve.png'):
    display(Image('results/pr_curve.png'))

# Print evaluation results
if os.path.exists('results/evaluation_summary.txt'):
    !cat results/evaluation_summary.txt

## 13. (Optional) Save Results to Google Drive

In [None]:
%cd /content/MILRankingLoss_Sultani2018_ReImplementation

# Backup checkpoints and results to Drive
DRIVE_DATA_PATH = '/content/drive/MyDrive/Colab Notebooks/data_distribution'
!mkdir -p "{DRIVE_DATA_PATH}/results"
!cp -r checkpoints "{DRIVE_DATA_PATH}/"
!cp -r results "{DRIVE_DATA_PATH}/"

print("Results saved to Google Drive")

1. LeakyReLU Activation Function:

Before: The original model used the ReLU activation function.

After: I replaced ReLU with LeakyReLU (`negative_slope=0.01`), which keeps a small, non-zero gradient for negative inputs. With ReLU, a neuron whose pre-activation is always negative outputs 0, receives zero gradient, and stops updating (a "dead" neuron). LeakyReLU avoids this by passing a scaled-down copy of the negative input.

Why: Keeping every unit trainable can make training more stable. The benefit is usually larger in deep networks than in a small network like this one, so it should be checked against the evaluation results.
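
The difference is easiest to see on a single negative input. The plain-Python sketch below is for illustration only and does not use PyTorch. The function names are hypothetical, and the gradient at exactly 0 is set by convention.

```python
def relu(x):
    return max(0.0, x)

def leaky_relu(x, negative_slope=0.01):
    return x if x > 0 else negative_slope * x

def relu_grad(x):
    # Gradient of ReLU with respect to its input
    return 1.0 if x > 0 else 0.0

def leaky_relu_grad(x, negative_slope=0.01):
    # Gradient of LeakyReLU with respect to its input
    return 1.0 if x > 0 else negative_slope

for x in (-2.0, 0.5):
    print(x, relu(x), leaky_relu(x), relu_grad(x), leaky_relu_grad(x))
```

For `x = -2.0`, ReLU returns 0 with zero gradient, while LeakyReLU returns a small negative value with gradient 0.01, so the unit can still receive updates.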

2. Dropout Regularization:

Before: The original model already used Dropout with p=0.6 (see the architecture printed in section 9).

After: I use Dropout with a rate of 0.5 (each activation has a 50% chance of being dropped during training).

Why: Dropout helps prevent overfitting by randomly setting a fraction of activations to zero on each training forward pass; it is disabled at evaluation time. This discourages the model from relying too heavily on any single neuron, which tends to improve generalization to unseen data.
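
The train/eval difference can be sketched in plain Python. This is an illustration of inverted dropout, the variant `nn.Dropout` uses (survivors are scaled by `1 / (1 - p)` during training), not the library implementation. The function name and the `rng` argument are hypothetical, and `p` is assumed to be below 1.

```python
import random

def dropout(values, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each value with probability p, scale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return list(values)  # identity at evaluation time
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]

activations = [1.0, 2.0, 3.0, 4.0]
# Training: some entries are zeroed, the rest are doubled (p = 0.5)
print(dropout(activations, p=0.5, training=True, rng=random.Random(0)))
# Evaluation: values pass through unchanged
print(dropout(activations, p=0.5, training=False))
```

The scaling keeps the expected activation the same in both modes, which is why no rescaling is needed after calling `model.eval()`.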

3. Batch Normalization:

Before: The original model didn't use Batch Normalization.

After: I added Batch Normalization after the first two fully connected layers (FC1 and FC2). During training it normalizes each feature to zero mean and unit variance across the batch, then applies a learnable scale and shift.

Why: Batch Normalization often stabilizes and speeds up training. It was originally motivated as a way to reduce internal covariate shift, and its benefit tends to be larger in deeper networks.
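
The normalization step for one feature can be sketched in plain Python. This is a simplified illustration of the training-mode computation, leaving out the learnable scale and shift and the running statistics used at evaluation time. The function name is hypothetical.

```python
def batch_norm_column(values, eps=1e-5):
    """Normalize one feature across the batch to zero mean and near-unit variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # biased (population) variance
    return [(v - mean) / (var + eps) ** 0.5 for v in values]

# One feature observed for a batch of four samples
normalized = batch_norm_column([2.0, 4.0, 6.0, 8.0])
print([round(v, 3) for v in normalized])
```

Because the statistics come from the batch itself, very small batches give noisy estimates, which is one reason to switch to `model.eval()` when scoring single videos.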

4. Network Architecture Changes:

Before: The original model is a three-layer fully connected network, 4096 -> 512 -> 32 -> 1, applied to each segment independently (see the architecture printed in section 9).

After: The new model places a 2-layer LSTM (hidden size 256) in front of the fully connected layers and widens the middle layer, giving 4096 -> LSTM(256) -> 512 -> 64 -> 1.

Why: The LSTM lets the model use the temporal order of the segments, which the original per-segment network cannot do, and the wider middle layer adds capacity. Whether this improves anomaly detection has to be confirmed by the evaluation in the later cells.

5. Sigmoid Activation:

Before: The original model applied a Sigmoid at the output layer. The new model keeps it.

Why: Sigmoid maps the final output into the range (0, 1), so each score can be read as the likelihood that the input is anomalous.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim

class ImprovedAnomalyDetector(nn.Module):
    """
    Improved anomaly detection model with LSTM layers for temporal dependencies,
    dropout, batch normalization, and LeakyReLU activation.

    Architecture:
    - LSTM: Input (4096) -> Hidden (256) -> Output (LSTM output)
    - FC1: LSTM output -> 512 (LeakyReLU + BatchNorm + Dropout)
    - FC2: 512 -> 64 (LeakyReLU + BatchNorm + Dropout)
    - FC3: 64 -> 1 (Sigmoid)
    """

    def __init__(self, input_dim=4096, lstm_hidden_size=256, dropout=0.5, lstm_num_layers=2):
        super(ImprovedAnomalyDetector, self).__init__()

        # LSTM Layer for temporal sequence learning
        self.lstm = nn.LSTM(input_size=input_dim, hidden_size=lstm_hidden_size,
                            num_layers=lstm_num_layers, batch_first=True)

        # Fully Connected Layers after LSTM
        self.fc1 = nn.Linear(lstm_hidden_size, 512)
        self.fc2 = nn.Linear(512, 64)
        self.fc3 = nn.Linear(64, 1)

        # Batch Normalization (optional)
        self.bn1 = nn.BatchNorm1d(512)
        self.bn2 = nn.BatchNorm1d(64)

        # Dropout and activation functions
        self.dropout = nn.Dropout(dropout)
        self.leaky_relu = nn.LeakyReLU(negative_slope=0.01)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        """
        Args:
            x: (batch_size, num_segments, feature_dim)
               e.g., (30, 32, 4096)

        Returns:
            scores: (batch_size, num_segments)
                    Anomaly score for each segment (0~1)
        """
        batch_size, num_segments, _ = x.shape

        # LSTM for temporal modeling; keep the output at every time step so
        # that each segment gets its own score
        lstm_out, _ = self.lstm(x)  # (batch_size, num_segments, lstm_hidden_size)

        # Flatten batch and time so BatchNorm1d receives a 2D (N, C) input
        x = lstm_out.reshape(batch_size * num_segments, -1)

        # Fully connected layers with BatchNorm, LeakyReLU, and Dropout
        x = self.fc1(x)           # (batch_size * num_segments, 512)
        x = self.bn1(x)
        x = self.leaky_relu(x)
        x = self.dropout(x)

        x = self.fc2(x)           # (batch_size * num_segments, 64)
        x = self.bn2(x)
        x = self.leaky_relu(x)
        x = self.dropout(x)

        x = self.fc3(x)           # (batch_size * num_segments, 1)
        x = self.sigmoid(x)       # Per-segment anomaly score in (0, 1)

        # Restore the (batch_size, num_segments) layout that the loss and the
        # evaluation code expect
        return x.view(batch_size, num_segments)



1. Regularization of Smoothness and Sparsity with Alpha:

Before: The original MILRankingLoss added the smoothness and sparsity terms to the ranking loss with the weights λ1 (smoothness) and λ2 (sparsity). These set the strength of each term individually, but there was no single parameter for scaling both regularizers together.

After: I added a new parameter, alpha, that scales the combined smoothness and sparsity contribution, so their overall strength can be tuned with one value while λ1 and λ2 keep setting their relative weight.

Why: How much smoothness and sparsity should count depends on the data, so a single regularization factor makes it easier to tune the balance between the ranking term and the two constraints.

2. How alpha Enters the Total Loss:

Before: The total loss was ranking + λ1 · smoothness + λ2 · sparsity, with λ1 = λ2 = 0.00008 by default.

After: The code keeps these base terms and adds alpha · (λ1 · smoothness + λ2 · sparsity) on top, so each regularizer has an effective weight of (1 + alpha) · λ. With the default alpha = 1.0 the regularizers count twice as much as in the original loss, and alpha = 0 recovers the original weighting. The new loss can also add an L2 penalty on the model weights (weight λ3) when the model is passed to `forward`.

Why: Because alpha multiplies both terms at once, it changes how strongly the constraints pull on the scores without changing their ratio, which keeps tuning to a single parameter.
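
To make the weighting concrete, here is a toy version of the loss on nested Python lists instead of tensors. It follows the structure of the loss code in this notebook, which adds the alpha-scaled terms on top of the base terms. The function name `mil_loss` and the example scores are made up for illustration, and the L2 term is left out.

```python
def mil_loss(pos_bags, neg_bags, lambda1=8e-5, lambda2=8e-5, alpha=1.0):
    """Toy MIL ranking loss on lists of per-segment scores (one inner list per bag)."""
    # Ranking term: mean over bags of the highest segment score in each bag
    pos_max = sum(max(bag) for bag in pos_bags) / len(pos_bags)
    neg_max = sum(max(bag) for bag in neg_bags) / len(neg_bags)
    ranking = max(0.0, 1.0 - pos_max + neg_max)

    # Smoothness: squared differences between adjacent segments of positive bags
    smoothness = sum((b - a) ** 2 for bag in pos_bags for a, b in zip(bag, bag[1:]))
    # Sparsity: sum of all positive-bag scores
    sparsity = sum(sum(bag) for bag in pos_bags)

    reg = lambda1 * smoothness + lambda2 * sparsity
    total = ranking + reg + alpha * reg  # equals ranking + (1 + alpha) * reg
    return total, ranking, smoothness, sparsity

pos = [[0.1, 0.9, 0.2]]   # one anomalous video, three segments
neg = [[0.2, 0.3, 0.1]]   # one normal video, three segments
for alpha in (0.0, 1.0):
    total, ranking, smoothness, sparsity = mil_loss(pos, neg, alpha=alpha)
    print(f"alpha={alpha}: total={total:.6f}, ranking={ranking:.4f}")
```

With weights this small the ranking term dominates, which matches the training log above, where the total loss stays close to the ranking loss.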

In [None]:
import torch
import torch.nn as nn

class ImprovedMILRankingLoss(nn.Module):
    """
    Multiple Instance Learning Ranking Loss with sparsity, smoothness, and L2 regularization constraints.

    Loss formula:
    loss = hinge_loss + (1 + alpha) * (λ1 * smoothness + λ2 * sparsity) + λ3 * L2_regularization

    where:
    - hinge_loss = max(0, 1 - mean(max(pos_scores)) + mean(max(neg_scores))),
      with the max taken over segments and the mean over bags in the batch
    - smoothness = sum of squared differences between adjacent segments
    - sparsity = sum of all positive bag scores
    - L2_regularization = sum of squared model parameters (only if a model is passed)
    """

    def __init__(self, lambda1=0.00008, lambda2=0.00008, lambda3=0.0001, alpha=1.0):
        """
        Args:
            lambda1: weight for temporal smoothness constraint
            lambda2: weight for sparsity constraint
            lambda3: weight for L2 regularization (model weights)
            alpha: factor to control how much sparsity and smoothness contribute to the final loss
        """
        super(ImprovedMILRankingLoss, self).__init__()
        self.lambda1 = lambda1
        self.lambda2 = lambda2
        self.lambda3 = lambda3  # L2 regularization weight
        self.alpha = alpha  # Regularization factor for sparsity and smoothness terms

    def forward(self, pos_scores, neg_scores, model=None):
        """
        Args:
            pos_scores: (batch_pos, num_segments) - scores for positive bags
            neg_scores: (batch_neg, num_segments) - scores for negative bags
            model: (optional) The model from which to calculate L2 regularization loss

        Returns:
            total_loss: scalar tensor
            loss_dict: dictionary with individual loss components for logging
        """
        # MIL ranking loss: max score of positive bag should be higher than negative
        pos_max = torch.max(pos_scores, dim=1)[0]  # (batch_pos,)
        neg_max = torch.max(neg_scores, dim=1)[0]  # (batch_neg,)

        # Hinge loss (ranking loss)
        ranking_loss = torch.clamp(
            1.0 - pos_max.mean() + neg_max.mean(),
            min=0
        )

        # Temporal smoothness: minimize difference between adjacent segments
        smoothness_loss = 0
        if pos_scores.size(1) > 1:  # if more than 1 segment
            temporal_diff = pos_scores[:, 1:] - pos_scores[:, :-1]  # (batch, num_segments-1)
            smoothness_loss = torch.sum(temporal_diff ** 2)

        # Sparsity: minimize sum of all scores (encourage sparse anomalies)
        sparsity_loss = torch.sum(pos_scores)

        # L2 Regularization: apply L2 penalty on model parameters
        l2_loss = 0
        if model is not None:
            for param in model.parameters():
                l2_loss += torch.sum(param ** 2)

        # Total loss: Sum of all components
        total_loss = ranking_loss + self.lambda1 * smoothness_loss + self.lambda2 * sparsity_loss + self.lambda3 * l2_loss

        # Add the alpha-scaled regularizers on top of the base terms, so each one
        # ends up with an effective weight of (1 + alpha) * lambda
        total_loss += self.alpha * (self.lambda1 * smoothness_loss + self.lambda2 * sparsity_loss)

        return total_loss, {
            'ranking_loss': ranking_loss.item(),
            'smoothness_loss': smoothness_loss.item() if isinstance(smoothness_loss, torch.Tensor) else 0.0,
            'sparsity_loss': sparsity_loss.item(),
            'l2_loss': l2_loss.item() if isinstance(l2_loss, torch.Tensor) else 0.0
        }


1. Learning Rate Scheduling:

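Before: The learning rate stayed constant throughout training, which can limit how finely the weights are adjusted late in training.

After: I added a learning rate scheduler. The training code in the next cell uses `CosineAnnealingLR` with `T_max=30` and `eta_min=0`, which lowers the learning rate along a cosine curve from its initial value to 0 over the first 30 epochs. Because the run lasts 100 epochs, longer than `T_max`, it is worth logging the learning rate to see how the schedule behaves after epoch 30.

Why: A learning rate scheduler helps the model converge more efficiently by lowering the learning rate as training progresses. This enables finer adjustments to the model's weights as it approaches the optimal solution.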
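
The shape of the schedule can be previewed without running any training. The sketch below evaluates the closed-form cosine annealing expression given in the PyTorch documentation for `CosineAnnealingLR`. The base learning rate of 1e-4 and `T_max=30` are taken from the config and scheduler call in this notebook, and the function name is hypothetical.

```python
import math

def cosine_annealing_lr(epoch, base_lr, t_max, eta_min=0.0):
    """Closed-form cosine annealing: base_lr at epoch 0, eta_min at epoch t_max."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + math.cos(math.pi * epoch / t_max))

for epoch in (0, 10, 15, 20, 30):
    lr = cosine_annealing_lr(epoch, base_lr=1e-4, t_max=30)
    print(f"epoch {epoch:2d}: lr = {lr:.2e}")
```

The learning rate falls slowly at first, fastest around the midpoint (half the base rate at epoch 15), and flattens out again as it approaches 0 at epoch 30.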

In [None]:
import torch
import torch.nn as nn  # needed by initialize_weights
import torch.optim as optim
from torch.utils.data import DataLoader
from tqdm import tqdm
from pathlib import Path
from utils.dataset import C3DFeatureDataset, collate_fn
from utils.sampler import BalancedBatchSampler

def build_model(config, device):
    """Build model and move to device"""
    model = ImprovedAnomalyDetector(
        input_dim=config['model']['input_dim'],
        dropout=config['model']['dropout']
    )
    model = model.to(device)
    return model

def initialize_weights(model):
    """Initialize weights using Xavier uniform distribution for Linear layers."""
    for m in model.modules():
        if isinstance(m, nn.Linear):
            nn.init.xavier_uniform_(m.weight)  # Xavier initialization
            if m.bias is not None:
                nn.init.constant_(m.bias, 0)  # Initialize bias to 0

def build_optimizer(model, config):
    """Build optimizer"""
    optimizer_name = config['training']['optimizer'].lower()
    lr = config['training']['learning_rate']
    weight_decay = config['training']['lambda3']

    if optimizer_name == 'adam':
        optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    elif optimizer_name == 'adagrad':
        optimizer = optim.Adagrad(model.parameters(), lr=lr, weight_decay=weight_decay)
    elif optimizer_name == 'adamw':  # Add support for AdamW
        optimizer = optim.AdamW(model.parameters(), lr=lr, weight_decay=weight_decay)
    else:
        raise ValueError(f"Unknown optimizer: {optimizer_name}")

    return optimizer


def save_checkpoint(model, optimizer, epoch, loss, save_path):
    """Save model checkpoint"""
    checkpoint = {
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'loss': loss
    }
    torch.save(checkpoint, save_path)
    print(f"Checkpoint saved to {save_path}")

def train_epoch(model, loader, criterion, optimizer, device, epoch):
    """Train for one epoch and return the average loss components."""
    model.train()

    epoch_loss = 0.0
    epoch_ranking_loss = 0.0
    epoch_smoothness_loss = 0.0
    epoch_sparsity_loss = 0.0

    progress_bar = tqdm(loader, desc=f"Epoch {epoch}")

    for batch_idx, batch in enumerate(progress_bar):
        pos_features = batch['pos_features']
        neg_features = batch['neg_features']

        if pos_features is None or neg_features is None:
            continue

        pos_features = pos_features.to(device)
        neg_features = neg_features.to(device)

        # Forward
        pos_scores = model(pos_features)
        neg_scores = model(neg_features)

        # Loss
        loss, loss_dict = criterion(pos_scores, neg_scores)

        # Backward
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Accumulate losses
        epoch_loss += loss.item()
        epoch_ranking_loss += loss_dict['ranking_loss']
        epoch_smoothness_loss += loss_dict['smoothness_loss']
        epoch_sparsity_loss += loss_dict['sparsity_loss']

        # Update progress bar
        progress_bar.set_postfix({
            'loss': f"{loss.item():.4f}",
            'rank': f"{loss_dict['ranking_loss']:.4f}"
        })

    num_batches = len(loader)
    avg_loss = epoch_loss / num_batches
    avg_ranking = epoch_ranking_loss / num_batches
    avg_smoothness = epoch_smoothness_loss / num_batches
    avg_sparsity = epoch_sparsity_loss / num_batches

    return {
        'loss': avg_loss,
        'ranking_loss': avg_ranking,
        'smoothness_loss': avg_smoothness,
        'sparsity_loss': avg_sparsity
    }


def main(config):
    # Setup device
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Using device: {device}")

    # Create checkpoint directory
    checkpoint_dir = Path('checkpoints/New Model')
    checkpoint_dir.mkdir(parents=True, exist_ok=True)  # also creates 'checkpoints' if missing

    # Build dataset
    train_dataset = C3DFeatureDataset(
        annotation_path=config['data']['train_annotation_path'],
        features_root=config['data']['feature_path']
    )
    print(f"Train dataset: {len(train_dataset)} videos")

    # Build sampler and loader
    sampler = BalancedBatchSampler(
        train_dataset,
        batch_size=config['training']['batch_size']
    )

    train_loader = DataLoader(
        train_dataset,
        batch_sampler=sampler,
        collate_fn=collate_fn,
        num_workers=4
    )
    print(f"Total batches per epoch: {len(train_loader)}")

    # Build model
    model = build_model(config, device)
    initialize_weights(model)  # Apply custom weight initialization
    print(f"Model parameters: {sum(p.numel() for p in model.parameters()):,}")

    # Build optimizer
    optimizer = build_optimizer(model, config)
    print(f"Optimizer: {config['training']['optimizer']}")

    # Build loss
    criterion = ImprovedMILRankingLoss(
        lambda1=config['training']['lambda1'],
        lambda2=config['training']['lambda2']
    )

    # Learning rate scheduler
    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=30, eta_min=0)

    # Training loop
    num_epochs = config['training']['num_epochs']
    best_loss = float('inf')

    print(f"\nStarting training for {num_epochs} epochs...")

    for epoch in range(1, num_epochs + 1):
        metrics = train_epoch(model, train_loader, criterion, optimizer, device, epoch)

        print(f"Epoch {epoch}/{num_epochs}")
        print(f"  Loss: {metrics['loss']:.4f}")
        print(f"  Ranking: {metrics['ranking_loss']:.4f}")
        print(f"  Smoothness: {metrics['smoothness_loss']:.4f}")
        print(f"  Sparsity: {metrics['sparsity_loss']:.4f}")

        # Save checkpoint
        if epoch % 10 == 0 or metrics['loss'] < best_loss:
            save_path = checkpoint_dir / f'epoch_{epoch}.pth'
            save_checkpoint(model, optimizer, epoch, metrics['loss'], save_path)
            print(f"  Saved checkpoint: {save_path}")

            if metrics['loss'] < best_loss:
                best_loss = metrics['loss']
                best_path = checkpoint_dir / 'best_model.pth'
                save_checkpoint(model, optimizer, epoch, metrics['loss'], best_path)
                print(f"  New best model: {best_path}")

        scheduler.step()

    print("\nTraining completed!")


if __name__ == '__main__':
    config = {
        'model': {'input_dim': 4096, 'dropout': 0.5},
        'training': {'optimizer': 'adamw', 'learning_rate': 1e-4, 'lambda3': 0.01, 'num_epochs': 100, 'batch_size': 32, 'lambda1': 0.00008, 'lambda2': 0.00008},
        'data': {'train_annotation_path': "data/annotations/train_set.txt", 'feature_path': "data/features/", "test_annotation_path": "data/annotations/test_set.txt"}
    }
    main(config)


1. Precision-Recall Curve:

Before: The evaluation did not include a Precision-Recall (PR) curve.

After: I added the plot_pr_curve function to calculate and save the PR curve and its AUC for further evaluation.

Why: The PR curve is especially useful for imbalanced problems like anomaly detection, where normal segments greatly outnumber anomalous ones. Precision is computed only over the segments the model flags, so the PR AUC is not inflated by the large number of easy normal segments the way ROC AUC can be.
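
A small hand-made example shows why precision is informative under class imbalance. The labels and scores below are invented for illustration, and `precision_recall` is a hypothetical helper for a single threshold, not the scikit-learn function used in the evaluation code. Returning a precision of 1.0 when nothing is flagged is a convention chosen for this sketch.

```python
def precision_recall(labels, scores, threshold):
    """Precision and recall when segments with score >= threshold are flagged."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if s >= threshold and y == 1)
    fp = sum(1 for y, s in pairs if s >= threshold and y == 0)
    fn = sum(1 for y, s in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# 8 normal segments and 2 anomalous ones
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.2, 0.1, 0.4, 0.8, 0.55]

print(precision_recall(labels, scores, threshold=0.5))  # (0.5, 1.0)
```

Here both anomalies are found, and only 2 of the 8 normal segments are false alarms (a false positive rate of 0.25), yet half of all flagged segments are wrong. The PR curve makes that cost visible.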

In [None]:
import yaml
from pathlib import Path
import torch
import numpy as np
from torch.utils.data import DataLoader
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt
from tqdm import tqdm

from models.anomaly_detector import AnomalyDetector
from utils.dataset import C3DFeatureDataset

def load_temporal_annotations(annotation_file):
    """Load temporal annotations for test videos."""
    annotations = {}

    with open(annotation_file, 'r') as f:
        for line in f:
            parts = line.strip().split()
            if len(parts) < 6:
                continue

            video_name = parts[0].replace('.mp4', '')  # Remove extension
            start1, end1 = int(parts[2]), int(parts[3])
            start2, end2 = int(parts[4]), int(parts[5])

            segments = []
            if start1 != -1 and end1 != -1:
                segments.append((start1, end1))
            if start2 != -1 and end2 != -1:
                segments.append((start2, end2))

            annotations[video_name] = segments

    return annotations


def get_frame_level_labels(video_name, annotations, num_segments=32, fps=30):
    """Generate binary labels for video segments (0 = normal, 1 = anomaly)."""
    labels = np.zeros(num_segments, dtype=np.int32)

    base_name = video_name.split('/')[-1]

    if base_name not in annotations:
        return labels

    anomaly_segments = annotations[base_name]

    if not anomaly_segments:
        return labels

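    # NOTE: the total frame count of the video is not available here, so the
    # last annotated anomaly frame is used as a proxy for the video length.
    # The 32 segments therefore only span frames 0..max_frame.
    max_frame = max(end for _, end in anomaly_segments)

    frames_per_segment = max_frame / num_segments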

    for seg_idx in range(num_segments):
        seg_start = seg_idx * frames_per_segment
        seg_end = (seg_idx + 1) * frames_per_segment

        for anomaly_start, anomaly_end in anomaly_segments:
            if not (seg_end < anomaly_start or seg_start > anomaly_end):
                labels[seg_idx] = 1
                break

    return labels


def evaluate_model(model, dataset, annotations, device):
    """Evaluate model on test set."""
    model.eval()

    all_labels = []
    all_scores = []

    with torch.no_grad():
        for idx in tqdm(range(len(dataset)), desc="Evaluating"):
            sample = dataset[idx]
            features = sample['features'].unsqueeze(0).to(device)  # (1, 32, 4096)
            video_name = sample['video_name']

            scores = model(features).squeeze(0).cpu().numpy()  # (32,)
            labels = get_frame_level_labels(video_name, annotations)

            all_labels.extend(labels)
            all_scores.extend(scores)

    return np.array(all_labels), np.array(all_scores)


def plot_roc_curve(labels, scores, save_path):
    """Plot and save ROC curve"""
    fpr, tpr, thresholds = roc_curve(labels, scores)
    roc_auc = auc(fpr, tpr)

    plt.figure(figsize=(8, 6))
    plt.plot(fpr, tpr, color='darkorange', lw=2,
             label=f'ROC curve (AUC = {roc_auc:.4f})')
    plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver Operating Characteristic (ROC) Curve')
    plt.legend(loc="lower right")
    plt.grid(alpha=0.3)
    plt.tight_layout()
    plt.savefig(save_path, dpi=300)
    plt.close()

    print(f"ROC curve saved to {save_path}")

    return roc_auc, fpr, tpr, thresholds


from sklearn.metrics import precision_recall_curve

def plot_pr_curve(labels, scores, save_path):
    """Plot and save Precision-Recall curve"""
    precision, recall, _ = precision_recall_curve(labels, scores)
    pr_auc = auc(recall, precision)

    plt.figure(figsize=(8, 6))
    plt.plot(recall, precision, color='blue', lw=2, label=f'PR curve (AUC = {pr_auc:.4f})')
    plt.xlabel('Recall')
    plt.ylabel('Precision')
    plt.title('Precision-Recall Curve')
    plt.legend(loc="lower left")
    plt.grid(alpha=0.3)
    plt.tight_layout()
    plt.savefig(save_path, dpi=300)
    plt.close()

    print(f"PR curve saved to {save_path}")

    return pr_auc


def save_results(labels, scores, save_path):
    """Save evaluation results"""
    results = {
        'labels': labels.tolist(),
        'scores': scores.tolist()
    }

    import json
    with open(save_path, 'w') as f:
        json.dump(results, f)

    print(f"Results saved to {save_path}")


def main(config_file, checkpoint_file, temporal_annotation_file):
    config = config_file

    # Setup device
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Using device: {device}")

    # Load model
    model = ImprovedAnomalyDetector(
        input_dim=config['model']['input_dim'],
        dropout=config['model']['dropout']
    )

    checkpoint = torch.load(checkpoint_file, map_location=device)
    model.load_state_dict(checkpoint['model_state_dict'])
    model = model.to(device)

    print(f"Loaded model from {checkpoint_file}")
    print(f"Epoch: {checkpoint['epoch']}, Loss: {checkpoint['loss']:.4f}")

    # Load test dataset
    test_dataset = C3DFeatureDataset(
        annotation_path=config['data']['test_annotation_path'],
        features_root=config['data']['feature_path']
    )

    print(f"Test dataset: {len(test_dataset)} videos")

    # Load temporal annotations
    annotations = load_temporal_annotations(temporal_annotation_file)
    print(f"Loaded temporal annotations for {len(annotations)} videos")

    # Evaluate
    print("\nEvaluating new model...")
    new_labels, new_scores = evaluate_model(model, test_dataset, annotations, device)

    print(f"\nTotal segments evaluated: {len(new_labels)}")
    print(f"Anomaly segments: {new_labels.sum()} ({new_labels.sum()/len(new_labels)*100:.1f}%)")
    print(f"Normal segments: {len(new_labels) - new_labels.sum()} ({(len(new_labels)-new_labels.sum())/len(new_labels)*100:.1f}%)")

    # Calculate ROC-AUC
    print("\nCalculating ROC curve...")
    results_dir = Path('results/New Model')
    results_dir.mkdir(parents=True, exist_ok=True)  # also creates 'results' if missing

    roc_auc, fpr, tpr, thresholds = plot_roc_curve(
        new_labels, new_scores,
        save_path=results_dir / 'roc_curve.png'
    )

    # Plot and save PR curve
    pr_auc = plot_pr_curve(
        new_labels, new_scores,
        save_path=results_dir / 'pr_curve.png'
    )

    print(f"\n{'='*60}")
    print(f"AUC: {roc_auc:.4f}")
    print(f"PR AUC: {pr_auc:.4f}")
    print(f"{'='*60}")

    # Save results
    save_results(new_labels, new_scores, results_dir / 'evaluation_results.json')

    # Find optimal threshold (Youden's J statistic)
    j_scores = tpr - fpr
    optimal_idx = np.argmax(j_scores)
    optimal_threshold = thresholds[optimal_idx]

    print(f"\nOptimal threshold: {optimal_threshold:.4f}")
    print(f"  TPR: {tpr[optimal_idx]:.4f}")
    print(f"  FPR: {fpr[optimal_idx]:.4f}")

    # Save summary
    summary_path = results_dir / 'evaluation_summary.txt'
    with open(summary_path, 'w') as f:
        f.write(f"Evaluation Summary\n")
        f.write(f"{'='*60}\n")
        f.write(f"Model: {checkpoint_file}\n")
        f.write(f"Test videos: {len(test_dataset)}\n")
        f.write(f"Total segments: {len(new_labels)}\n")
        f.write(f"Anomaly segments: {new_labels.sum()} ({new_labels.sum()/len(new_labels)*100:.1f}%)\n")
        f.write(f"\nResults:\n")
        f.write(f"  AUC: {roc_auc:.4f}\n")
        f.write(f"  PR AUC: {pr_auc:.4f}\n")
        f.write(f"  Optimal threshold: {optimal_threshold:.4f}\n")
        f.write(f"  TPR at optimal: {tpr[optimal_idx]:.4f}\n")
        f.write(f"  FPR at optimal: {fpr[optimal_idx]:.4f}\n")

    print(f"\nSummary saved to {summary_path}")


# Now, instead of using argparse, call the main function directly
config = {
    'model': {'input_dim': 4096, 'dropout': 0.5},
    'training': {'optimizer': 'adamw', 'learning_rate': 1e-4, 'lambda3': 0.01, 'num_epochs': 100, 'batch_size': 32, 'lambda1': 0.00008, 'lambda2': 0.00008},
    'data': {'train_annotation_path': "data/annotations/train_set.txt", 'feature_path': "data/features/", "test_annotation_path": "data/annotations/test_set.txt"}
}

checkpoint_file = '/content/MILRankingLoss_Sultani2018_ReImplementation/checkpoints/New Model/best_model.pth'  # Replace with your path
temporal_annotation_file = 'data/annotations/Temporal_Anomaly_Annotation_for_Testing_Videos.txt'  # Replace with your path

main(config, checkpoint_file, temporal_annotation_file)
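
The threshold selection inside `main` uses Youden's J statistic, J = TPR - FPR, and picks the ROC point where J is largest. The plain-Python sketch below shows the same selection on made-up ROC points. The function name and the numbers are hypothetical.

```python
def youden_threshold(fpr, tpr, thresholds):
    """Return (threshold, index) of the ROC point that maximizes J = TPR - FPR."""
    j_scores = [t - f for f, t in zip(fpr, tpr)]
    best = max(range(len(j_scores)), key=j_scores.__getitem__)
    return thresholds[best], best

# Made-up ROC points, ordered from the strictest to the loosest threshold
fpr = [0.0, 0.1, 0.4, 1.0]
tpr = [0.0, 0.7, 0.9, 1.0]
thresholds = [0.9, 0.6, 0.3, 0.1]

print(youden_threshold(fpr, tpr, thresholds))  # (0.6, 1)
```

Youden's J weights missed anomalies and false alarms equally. If false alarms are more costly in deployment, a threshold chosen from the PR curve may be a better fit.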


In [None]:
%cd /content/MILRankingLoss_Sultani2018_ReImplementation

# Display ROC curve
from IPython.display import Image, display
import os

if os.path.exists('results/roc_curve.png'):
    display(Image('results/roc_curve.png'))


if os.path.exists('results/pr_curve.png'):
    display(Image('results/pr_curve.png'))

if os.path.exists('results/New Model/roc_curve.png'):
    display(Image('results/New Model/roc_curve.png'))


if os.path.exists('results/New Model/pr_curve.png'):
    display(Image('results/New Model/pr_curve.png'))

# Print evaluation results for the new model
if os.path.exists('results/New Model/evaluation_summary.txt'):
    !cat "results/New Model/evaluation_summary.txt"

In [None]:
# Backup checkpoints and results to Drive
DRIVE_DATA_PATH = '/content/drive/MyDrive/Colab Notebooks/data_distribution/New Model'
!mkdir -p "{DRIVE_DATA_PATH}/results"
!cp -r checkpoints "{DRIVE_DATA_PATH}/"
!cp -r results "{DRIVE_DATA_PATH}/"

print("Results saved to Google Drive")