# Phase 2: Feature Extraction with TimeSformer
## Weakly Supervised Video Anomaly Detection using TimeSformer and MIL

This notebook implements Phase 2: extracting one 768-dimensional feature vector per video from its frames using a pretrained **TimeSformer** model.

### Pipeline Overview:
1. **GPU Setup & Verification** - Ensure CUDA is available and monitor GPU usage
2. **Load Phase 1 Output** - Load extracted frames metadata
3. **TimeSformer Model** - Load pretrained TimeSformer for feature extraction
4. **Feature Extraction** - Extract [CLS] token features (768-dim) for each video
5. **Save Features** - Store features for Phase 3 (MIL Training)

### Key Concepts:
- **TimeSformer**: Transformer-based video understanding model with divided space-time attention
- **[CLS] Token**: 768-dimensional feature vector representing the entire video
- **Batch Processing**: Process videos in batches for memory efficiency

### Expected Input:
- Extracted frames from Phase 1 (32 frames per video, 224×224 resolution)

### Expected Output:
- Feature vectors: one 768-dimensional vector per video, stacked into an array of shape (N_videos, 768)
- Saved as `.npy` files for efficient loading in Phase 3
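
As a rough illustration of the output format, the sketch below writes and reloads a synthetic feature matrix the way Phase 3 might. The file names `all_features.npy` and `all_labels.npy` match those written later in this notebook; the random data, the video count, and the temporary directory are placeholders.

```python
import os
import tempfile

import numpy as np

FEATURE_DIM = 768   # TimeSformer [CLS] token dimension
n_videos = 4        # placeholder count; the real dataset is much larger

# Synthetic stand-ins for the extracted features and binary labels
rng = np.random.default_rng(0)
features = rng.standard_normal((n_videos, FEATURE_DIM)).astype(np.float32)
labels = np.array([1, 0, 1, 0])  # 1 = anomaly, 0 = normal

with tempfile.TemporaryDirectory() as out_dir:
    np.save(os.path.join(out_dir, "all_features.npy"), features)
    np.save(os.path.join(out_dir, "all_labels.npy"), labels)

    # np.load reads the whole array into memory, so the files can be removed afterwards
    loaded_features = np.load(os.path.join(out_dir, "all_features.npy"))
    loaded_labels = np.load(os.path.join(out_dir, "all_labels.npy"))

print(loaded_features.shape)
print(loaded_labels.shape)
```

Row `i` of the feature matrix and entry `i` of the label vector refer to the same video, so the two arrays should always be loaded and shuffled together.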

## Section 1: Environment Setup & GPU Verification

First, let's verify that PyTorch can access the GPU and set up monitoring.

In [1]:
"""
Phase 2: Feature Extraction with TimeSformer
Environment Setup and GPU Verification
"""

import os
import sys
import torch
import numpy as np
from pathlib import Path
from typing import List, Dict, Optional, Tuple
import json
import logging
from tqdm.notebook import tqdm
import time
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# ==================== GPU Verification ====================
def check_gpu_status():
    """
    Comprehensive GPU status check for PyTorch.
    Returns detailed information about GPU availability and properties.
    """
    print("\n" + "="*70)
    print("GPU STATUS CHECK")
    print("="*70)
    
    gpu_info = {
        'cuda_available': torch.cuda.is_available(),
        'device_count': 0,
        'current_device': None,
        'device_name': None,
        'device_capability': None,
        'total_memory_gb': 0,
        'allocated_memory_gb': 0,
        'cached_memory_gb': 0,
        'free_memory_gb': 0,
        'cudnn_available': torch.backends.cudnn.is_available(),
        'cudnn_version': None,
        'pytorch_version': torch.__version__
    }
    
    print(f"\n📦 PyTorch Version: {torch.__version__}")
    print(f"🔧 CUDA Available: {torch.cuda.is_available()}")
    
    if torch.cuda.is_available():
        gpu_info['device_count'] = torch.cuda.device_count()
        gpu_info['current_device'] = torch.cuda.current_device()
        gpu_info['device_name'] = torch.cuda.get_device_name(0)
        gpu_info['device_capability'] = torch.cuda.get_device_capability(0)
        
        # Memory info
        total_memory = torch.cuda.get_device_properties(0).total_memory
        allocated_memory = torch.cuda.memory_allocated(0)
        cached_memory = torch.cuda.memory_reserved(0)
        
        gpu_info['total_memory_gb'] = round(total_memory / (1024**3), 2)
        gpu_info['allocated_memory_gb'] = round(allocated_memory / (1024**3), 2)
        gpu_info['cached_memory_gb'] = round(cached_memory / (1024**3), 2)
        # Approximation: ignores memory held by the caching allocator and by other processes
        gpu_info['free_memory_gb'] = round((total_memory - allocated_memory) / (1024**3), 2)
        
        if torch.backends.cudnn.is_available():
            gpu_info['cudnn_version'] = torch.backends.cudnn.version()
        
        print(f"\n🖥️  GPU Device: {gpu_info['device_name']}")
        print(f"📊 Device Count: {gpu_info['device_count']}")
        print(f"🔢 CUDA Capability: {gpu_info['device_capability']}")
        print(f"\n💾 Memory Status:")
        print(f"   Total Memory: {gpu_info['total_memory_gb']} GB")
        print(f"   Allocated: {gpu_info['allocated_memory_gb']} GB")
        print(f"   Cached: {gpu_info['cached_memory_gb']} GB")
        print(f"   Free: {gpu_info['free_memory_gb']} GB")
        print(f"\n⚡ cuDNN Available: {gpu_info['cudnn_available']}")
        if gpu_info['cudnn_version']:
            print(f"   cuDNN Version: {gpu_info['cudnn_version']}")
        
        print("\n" + "="*70)
        print("✅ GPU IS READY FOR FEATURE EXTRACTION!")
        print("="*70)
    else:
        print("\n" + "="*70)
        print("❌ NO GPU AVAILABLE - Will use CPU (MUCH SLOWER!)")
        print("="*70)
        print("\n⚠️  Tips to enable GPU:")
        print("   1. Install CUDA Toolkit: https://developer.nvidia.com/cuda-downloads")
        print("   2. Install PyTorch with CUDA: pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118")
        print("   3. Verify NVIDIA drivers are installed")
    
    return gpu_info


def get_gpu_memory_usage():
    """Get current GPU memory usage."""
    if torch.cuda.is_available():
        allocated = torch.cuda.memory_allocated(0) / (1024**3)
        cached = torch.cuda.memory_reserved(0) / (1024**3)
        return f"Allocated: {allocated:.2f}GB, Cached: {cached:.2f}GB"
    return "GPU not available"


def clear_gpu_memory():
    """Clear GPU memory cache."""
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        torch.cuda.synchronize()
        print(f"🧹 GPU memory cleared. Current usage: {get_gpu_memory_usage()}")


# Run GPU check
gpu_info = check_gpu_status()

# Set device
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"\n🎯 Using device: {DEVICE}")


GPU STATUS CHECK

📦 PyTorch Version: 2.7.1+cu118
🔧 CUDA Available: True

🖥️  GPU Device: NVIDIA GeForce RTX 3080 Ti
📊 Device Count: 1
🔢 CUDA Capability: (8, 6)

💾 Memory Status:
   Total Memory: 12.0 GB
   Allocated: 0.0 GB
   Cached: 0.0 GB
   Free: 12.0 GB

⚡ cuDNN Available: True
   cuDNN Version: 90100

✅ GPU IS READY FOR FEATURE EXTRACTION!

🎯 Using device: cuda

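The free-memory figure reported by `check_gpu_status` is derived as total minus allocated, which does not account for memory held by the caching allocator or by other processes. A possible alternative, sketched below, queries the driver through `torch.cuda.mem_get_info`. The helper name `gpu_memory_snapshot` is hypothetical, and the function returns `None` on machines without CUDA.

```python
import torch

def gpu_memory_snapshot(device_index: int = 0):
    """Return free/total/allocated/reserved memory in GiB, or None without CUDA."""
    if not torch.cuda.is_available():
        return None
    # mem_get_info reports device-wide free and total bytes as seen by the driver
    free_bytes, total_bytes = torch.cuda.mem_get_info(device_index)
    gib = 1024 ** 3
    return {
        "free_gb": free_bytes / gib,
        "total_gb": total_bytes / gib,
        "allocated_gb": torch.cuda.memory_allocated(device_index) / gib,
        "reserved_gb": torch.cuda.memory_reserved(device_index) / gib,
    }

snapshot = gpu_memory_snapshot()
print(snapshot if snapshot is not None else "CUDA not available")
```

The driver-level free value can be lower than total minus allocated whenever PyTorch has reserved blocks it is not currently using, or when another process shares the GPU.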

## Section 2: Configuration

Set up paths and parameters for feature extraction.

In [2]:
"""
Configuration for Phase 2: Feature Extraction
"""

# ==================== Dataset Paths ====================
DATASET_ROOT = r"C:\UCF_video_dataset"
EXTRACTED_FRAMES_PATH = os.path.join(DATASET_ROOT, "Extracted_Frames")
FEATURES_PATH = os.path.join(DATASET_ROOT, "TimeSformer_Features")

# ==================== UCF-Crime Dataset Categories ====================
ANOMALY_CATEGORIES = [
    "Abuse", "Arrest", "Arson", "Assault", "Burglary",
    "Explosion", "Fighting", "RoadAccidents", "Robbery",
    "Shooting", "Shoplifting", "Stealing", "Vandalism"
]
NORMAL_CATEGORY = "Normal"
ALL_CATEGORIES = ANOMALY_CATEGORIES + [NORMAL_CATEGORY]

# ==================== TimeSformer Parameters ====================
# Model configuration
TIMESFORMER_MODEL = "facebook/timesformer-base-finetuned-k400"  # Pretrained on Kinetics-400
FEATURE_DIM = 768  # TimeSformer [CLS] token dimension

# Frame parameters (must match Phase 1)
NUM_FRAMES_PER_VIDEO = 32
FRAME_HEIGHT = 224
FRAME_WIDTH = 224

# ==================== Processing Parameters ====================
BATCH_SIZE = 1  # Process one video at a time to avoid OOM
NUM_WORKERS = 0  # DataLoader workers (0 for Windows compatibility)

# ==================== Output Settings ====================
SAVE_INDIVIDUAL_FEATURES = True  # Save features per video
SAVE_COMBINED_FEATURES = True    # Save all features in one file

# ==================== Logging ====================
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# Create output directory
os.makedirs(FEATURES_PATH, exist_ok=True)

# Print configuration
print("\n" + "="*70)
print("PHASE 2 CONFIGURATION")
print("="*70)
print(f"\n📁 Paths:")
print(f"   Extracted Frames: {EXTRACTED_FRAMES_PATH}")
print(f"   Features Output: {FEATURES_PATH}")
print(f"\n🤖 Model:")
print(f"   TimeSformer: {TIMESFORMER_MODEL}")
print(f"   Feature Dimension: {FEATURE_DIM}")
print(f"\n🎬 Video Parameters:")
print(f"   Frames per Video: {NUM_FRAMES_PER_VIDEO}")
print(f"   Frame Resolution: {FRAME_HEIGHT}×{FRAME_WIDTH}")
print(f"\n⚙️  Processing:")
print(f"   Batch Size: {BATCH_SIZE}")
print(f"   Device: {DEVICE}")
print("="*70)


PHASE 2 CONFIGURATION

📁 Paths:
   Extracted Frames: C:\UCF_video_dataset\Extracted_Frames
   Features Output: C:\UCF_video_dataset\TimeSformer_Features

🤖 Model:
   TimeSformer: facebook/timesformer-base-finetuned-k400
   Feature Dimension: 768

🎬 Video Parameters:
   Frames per Video: 32
   Frame Resolution: 224×224

⚙️  Processing:
   Batch Size: 1
   Device: cuda
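
The dataset root in this configuration is a hard-coded Windows path. One possible way to make it portable is to read the root from an environment variable and build the sub-paths with `pathlib`. The variable name `UCF_DATASET_ROOT` and the helper `resolve_paths` are hypothetical and not part of the notebook.

```python
import os
from pathlib import Path

def resolve_paths(default_root: str = r"C:\UCF_video_dataset") -> dict:
    """Build the Phase 2 paths from an optional environment override."""
    # UCF_DATASET_ROOT is a hypothetical variable name chosen for this sketch
    root = Path(os.environ.get("UCF_DATASET_ROOT", default_root))
    return {
        "root": root,
        "frames": root / "Extracted_Frames",
        "features": root / "TimeSformer_Features",
    }

paths = resolve_paths()
for name, path in paths.items():
    print(f"{name}: {path}")
```

With no override set, the helper falls back to the same root used in the configuration cell, so existing runs would behave as before.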


## Section 3: Install Required Packages

Install `transformers` (which provides the TimeSformer implementation) and the related packages if they are not already installed.

In [3]:
# Install required packages
# Run this cell only once

import subprocess
import sys

def install_package(package):
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", package])

# Check and install required packages
packages_to_install = []

try:
    from transformers import TimesformerModel, AutoImageProcessor
    print("✓ transformers (with TimeSformer) already installed")
except ImportError:
    packages_to_install.append("transformers")

try:
    from PIL import Image
    print("✓ Pillow already installed")
except ImportError:
    packages_to_install.append("Pillow")

try:
    import cv2
    print("✓ OpenCV already installed")
except ImportError:
    packages_to_install.append("opencv-python")

if packages_to_install:
    print(f"\n📦 Installing: {packages_to_install}")
    for pkg in packages_to_install:
        print(f"   Installing {pkg}...")
        install_package(pkg)
    print("\n✓ All packages installed successfully!")
    print("⚠️  Please restart the kernel and run again.")
else:
    print("\n✓ All required packages are already installed!")

✓ Pillow already installed
✓ OpenCV already installed

📦 Installing: ['transformers']
   Installing transformers...

✓ All packages installed successfully!
⚠️  Please restart the kernel and run again.
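
The check in this cell imports each package just to see whether it exists. A lighter alternative, sketched here, asks `importlib.util.find_spec` whether a module can be located without importing it. The helper name `missing_packages` is illustrative. Note that this only detects presence: unlike the `from transformers import TimesformerModel` test, it does not confirm that the installed `transformers` version ships TimeSformer.

```python
import importlib.util

# Map import names to the pip package that provides them
REQUIRED = {
    "transformers": "transformers",
    "PIL": "Pillow",
    "cv2": "opencv-python",
}

def missing_packages(required: dict) -> list:
    """Return pip names for modules that cannot be located on this system."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]

print("Missing:", missing_packages(REQUIRED))
```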


## Section 4: Load TimeSformer Model

Load the pretrained TimeSformer model and image processor.

In [4]:
"""
Load TimeSformer Model for Feature Extraction
"""

from transformers import TimesformerModel, AutoImageProcessor
import cv2
from PIL import Image

class TimeSformerFeatureExtractor:
    """
    Feature extractor using pretrained TimeSformer model.
    Extracts 768-dimensional [CLS] token features from video frames.
    """
    
    def __init__(
        self,
        model_name: str = TIMESFORMER_MODEL,
        device: torch.device = DEVICE,
        num_frames: int = NUM_FRAMES_PER_VIDEO
    ):
        """
        Initialize the TimeSformer feature extractor.
        
        Args:
            model_name: HuggingFace model identifier
            device: torch device (cuda/cpu)
            num_frames: Number of frames per video
        """
        self.device = device
        self.num_frames = num_frames
        self.model_name = model_name
        
        print(f"\n{'='*70}")
        print("LOADING TIMESFORMER MODEL")
        print(f"{'='*70}")
        print(f"\n📥 Downloading/Loading: {model_name}")
        print(f"   This may take a few minutes on first run...")
        
        start_time = time.time()
        
        # Load image processor
        self.image_processor = AutoImageProcessor.from_pretrained(model_name)
        print(f"   ✓ Image processor loaded")
        
        # Load model
        self.model = TimesformerModel.from_pretrained(model_name)
        self.model = self.model.to(device)
        self.model.eval()  # Set to evaluation mode
        print(f"   ✓ Model loaded and moved to {device}")
        
        elapsed = time.time() - start_time
        
        # Get model info
        total_params = sum(p.numel() for p in self.model.parameters())
        trainable_params = sum(p.numel() for p in self.model.parameters() if p.requires_grad)
        
        print(f"\n📊 Model Statistics:")
        print(f"   Total Parameters: {total_params:,}")
        print(f"   Trainable Parameters: {trainable_params:,}")
        print(f"   Model Size: ~{total_params * 4 / (1024**3):.2f} GB (FP32)")
        print(f"   Load Time: {elapsed:.2f} seconds")
        
        # Check GPU memory after loading
        if torch.cuda.is_available():
            print(f"\n💾 GPU Memory: {get_gpu_memory_usage()}")
        
        print(f"\n{'='*70}")
        print("✅ TIMESFORMER READY FOR FEATURE EXTRACTION!")
        print(f"{'='*70}")
    
    def load_frames_from_directory(self, video_dir: str) -> Optional[torch.Tensor]:
        """
        Load frames from a directory containing extracted frames.
        
        Args:
            video_dir: Path to directory containing frame images
            
        Returns:
            Tensor of shape (1, num_frames, C, H, W) or None if loading fails
        """
        try:
            # Get sorted list of frame files (lexicographic order, so frame names should be zero-padded)
            frame_files = sorted([
                f for f in os.listdir(video_dir) 
                if f.endswith(('.jpg', '.jpeg', '.png'))
            ])
            
            if len(frame_files) < self.num_frames:
                logger.warning(f"Not enough frames in {video_dir}: {len(frame_files)} < {self.num_frames}")
                return None
            
            # Select frames (use first num_frames if more available)
            selected_files = frame_files[:self.num_frames]
            
            # Load frames
            frames = []
            for frame_file in selected_files:
                frame_path = os.path.join(video_dir, frame_file)
                frame = Image.open(frame_path).convert('RGB')
                frames.append(frame)
            
            # Process frames using the image processor
            inputs = self.image_processor(frames, return_tensors="pt")
            
            return inputs['pixel_values']  # Shape: (1, num_frames, C, H, W)
            
        except Exception as e:
            logger.error(f"Error loading frames from {video_dir}: {e}")
            return None
    
    @torch.no_grad()
    def extract_features(self, pixel_values: torch.Tensor) -> Optional[np.ndarray]:
        """
        Extract features from video frames using TimeSformer.
        
        Args:
            pixel_values: Tensor of shape (batch, num_frames, C, H, W)
            
        Returns:
            Feature array of shape (batch, 768) - [CLS] token features
        """
        try:
            # Move to device
            pixel_values = pixel_values.to(self.device)
            
            # Forward pass
            outputs = self.model(pixel_values)
            
            # Get [CLS] token features (first token of last hidden state)
            # last_hidden_state shape: (batch, 1 + num_frames * patches_per_frame, hidden_size)
            cls_features = outputs.last_hidden_state[:, 0, :]  # Shape: (batch, 768)
            
            return cls_features.cpu().numpy()
            
        except Exception as e:
            logger.error(f"Error extracting features: {e}")
            return None
    
    def process_video(self, video_dir: str) -> Optional[np.ndarray]:
        """
        Process a single video directory and extract features.
        
        Args:
            video_dir: Path to directory containing video frames
            
        Returns:
            Feature array of shape (768,) or None if processing fails
        """
        # Load frames
        pixel_values = self.load_frames_from_directory(video_dir)
        
        if pixel_values is None:
            return None
        
        # Extract features
        features = self.extract_features(pixel_values)
        
        if features is None:
            return None
        
        return features[0]  # Return first (and only) batch item


# Initialize the feature extractor
print("\n🚀 Initializing TimeSformer Feature Extractor...")
feature_extractor = TimeSformerFeatureExtractor(
    model_name=TIMESFORMER_MODEL,
    device=DEVICE,
    num_frames=NUM_FRAMES_PER_VIDEO
)


🚀 Initializing TimeSformer Feature Extractor...

LOADING TIMESFORMER MODEL

📥 Downloading/Loading: facebook/timesformer-base-finetuned-k400
   This may take a few minutes on first run...



Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.


   ✓ Image processor loaded



Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`



   ✓ Model loaded and moved to cuda

📊 Model Statistics:
   Total Parameters: 121,258,752
   Trainable Parameters: 121,258,752
   Model Size: ~0.45 GB (FP32)
   Load Time: 23.19 seconds

💾 GPU Memory: Allocated: 0.45GB, Cached: 0.51GB

✅ TIMESFORMER READY FOR FEATURE EXTRACTION!
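
To make the [CLS] indexing in `extract_features` concrete, the sketch below computes the token sequence length for this configuration and slices a dummy array the same way. It assumes 16×16 patches, which is the patch size of the base TimeSformer checkpoint; the zero-filled array merely stands in for `last_hidden_state`.

```python
import numpy as np

num_frames = 32
image_size = 224
patch_size = 16      # assumed: patch size of the base checkpoint
hidden_size = 768

patches_per_frame = (image_size // patch_size) ** 2   # 14 * 14 = 196
seq_len = 1 + num_frames * patches_per_frame          # [CLS] + all space-time patches

# Dummy stand-in for outputs.last_hidden_state with batch size 1
last_hidden_state = np.zeros((1, seq_len, hidden_size), dtype=np.float32)
cls_features = last_hidden_state[:, 0, :]             # first token is [CLS]

print(patches_per_frame, seq_len, cls_features.shape)
# → 196 6273 (1, 768)
```

Only one of the 6,273 tokens is kept per video, which is why the saved feature is a single 768-dimensional vector regardless of the number of frames.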


## Section 5: GPU Monitoring During Extraction

Utility functions to monitor GPU usage during feature extraction.

In [5]:
"""
GPU Monitoring Utilities
"""

class GPUMonitor:
    """
    Monitor GPU usage during processing.
    """
    
    def __init__(self):
        self.is_available = torch.cuda.is_available()
        self.measurements = []
        
    def measure(self):
        """Take a measurement of current GPU memory usage."""
        if self.is_available:
            allocated = torch.cuda.memory_allocated(0) / (1024**3)
            cached = torch.cuda.memory_reserved(0) / (1024**3)
            self.measurements.append({
                'timestamp': time.time(),
                'allocated_gb': allocated,
                'cached_gb': cached
            })
            return allocated, cached
        return 0, 0
    
    def get_peak_usage(self):
        """Get peak memory usage."""
        if self.is_available:
            return torch.cuda.max_memory_allocated(0) / (1024**3)
        return 0
    
    def reset_peak(self):
        """Reset peak memory statistics."""
        if self.is_available:
            torch.cuda.reset_peak_memory_stats(0)
    
    def print_status(self):
        """Print current GPU status."""
        if self.is_available:
            allocated, cached = self.measure()
            peak = self.get_peak_usage()
            total = torch.cuda.get_device_properties(0).total_memory / (1024**3)
            print(f"\n💾 GPU Memory Status:")
            print(f"   Allocated: {allocated:.2f} GB")
            print(f"   Cached: {cached:.2f} GB")
            print(f"   Peak: {peak:.2f} GB")
            print(f"   Total: {total:.2f} GB")
            print(f"   Utilization: {(allocated/total)*100:.1f}%")
        else:
            print("\n⚠️  GPU not available")


# Test GPU monitoring
gpu_monitor = GPUMonitor()
gpu_monitor.print_status()


💾 GPU Memory Status:
   Allocated: 0.45 GB
   Cached: 0.51 GB
   Peak: 0.45 GB
   Total: 12.00 GB
   Utilization: 3.8%
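
`GPUMonitor.measure()` appends a record on every call, but the notebook never aggregates those records. A possible helper is sketched below; `summarize_measurements` is a hypothetical name, and the sample values are made up for the demonstration.

```python
from statistics import mean

def summarize_measurements(measurements: list) -> dict:
    """Aggregate records shaped like those stored in GPUMonitor.measurements."""
    if not measurements:
        return {"count": 0, "peak_allocated_gb": 0.0,
                "mean_allocated_gb": 0.0, "duration_s": 0.0}
    allocated = [m["allocated_gb"] for m in measurements]
    timestamps = [m["timestamp"] for m in measurements]
    return {
        "count": len(measurements),
        "peak_allocated_gb": max(allocated),
        "mean_allocated_gb": mean(allocated),
        "duration_s": max(timestamps) - min(timestamps),
    }

# Made-up sample records for illustration only
sample = [
    {"timestamp": 100.0, "allocated_gb": 0.45, "cached_gb": 0.51},
    {"timestamp": 102.0, "allocated_gb": 1.10, "cached_gb": 1.25},
    {"timestamp": 104.0, "allocated_gb": 0.46, "cached_gb": 1.25},
]
print(summarize_measurements(sample))
```

The peak computed here covers only the moments at which `measure()` was called, so it can be lower than the value reported by `torch.cuda.max_memory_allocated`.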


## Section 6: Load Phase 1 Metadata

Load the extraction metadata from Phase 1 to get the list of videos to process.

In [6]:
"""
Load Phase 1 Extraction Metadata
"""

def load_phase1_metadata(extracted_frames_path: str = EXTRACTED_FRAMES_PATH) -> Optional[Dict]:
    """
    Load metadata from Phase 1 frame extraction.
    
    Args:
        extracted_frames_path: Path to extracted frames directory
        
    Returns:
        Dictionary containing video metadata, or None if no metadata file is found
    """
    print("\n" + "="*70)
    print("LOADING PHASE 1 METADATA")
    print("="*70)
    
    # Check for metadata files
    metadata_path = os.path.join(extracted_frames_path, 'extraction_metadata.json')
    alt_metadata_path = os.path.join(extracted_frames_path, 'dataset_metadata.json')
    
    metadata = None
    
    if os.path.exists(metadata_path):
        with open(metadata_path, 'r') as f:
            metadata = json.load(f)
        print(f"\n✓ Loaded: {metadata_path}")
    elif os.path.exists(alt_metadata_path):
        with open(alt_metadata_path, 'r') as f:
            metadata = json.load(f)
        print(f"\n✓ Loaded: {alt_metadata_path}")
    else:
        print(f"\n❌ No metadata file found!")
        print(f"   Expected: {metadata_path}")
        print(f"   Or: {alt_metadata_path}")
        return None
    
    # Parse metadata
    videos_info = []
    
    if 'videos' in metadata:
        # GPU extraction format
        for video in metadata['videos']:
            if video.get('status') in ['success', 'skipped']:
                videos_info.append({
                    'video_name': video.get('video_name'),
                    'category': video.get('category'),
                    'is_anomaly': video.get('is_anomaly', video.get('category') != NORMAL_CATEGORY)
                })
    elif 'categories' in metadata:
        # UCFCrimeDatasetProcessor format
        for category, cat_info in metadata['categories'].items():
            for video in cat_info.get('videos', []):
                if video.get('status') == 'success':
                    videos_info.append({
                        'video_name': video.get('video_name'),
                        'category': category,
                        'is_anomaly': category != NORMAL_CATEGORY
                    })
    
    # Count by category
    category_counts = {}
    anomaly_count = 0
    normal_count = 0
    
    for video in videos_info:
        cat = video['category']
        category_counts[cat] = category_counts.get(cat, 0) + 1
        if video['is_anomaly']:
            anomaly_count += 1
        else:
            normal_count += 1
    
    print(f"\n📊 Dataset Summary:")
    print(f"   Total Videos: {len(videos_info)}")
    print(f"   Anomaly Videos: {anomaly_count}")
    print(f"   Normal Videos: {normal_count}")
    
    print(f"\n📁 Per-Category Counts:")
    for cat in sorted(category_counts.keys()):
        label = "ANOMALY" if cat != NORMAL_CATEGORY else "NORMAL"
        print(f"   {cat:20s} [{label:8s}]: {category_counts[cat]:4d} videos")
    
    print("="*70)
    
    return {
        'videos': videos_info,
        'total_videos': len(videos_info),
        'anomaly_count': anomaly_count,
        'normal_count': normal_count,
        'category_counts': category_counts
    }


# Load metadata
phase1_metadata = load_phase1_metadata()


LOADING PHASE 1 METADATA

✓ Loaded: C:\UCF_video_dataset\Extracted_Frames\extraction_metadata.json

📊 Dataset Summary:
   Total Videos: 1900
   Anomaly Videos: 950
   Normal Videos: 950

📁 Per-Category Counts:
   Abuse                [ANOMALY ]:   50 videos
   Arrest               [ANOMALY ]:   50 videos
   Arson                [ANOMALY ]:   50 videos
   Assault              [ANOMALY ]:   50 videos
   Burglary             [ANOMALY ]:  100 videos
   Explosion            [ANOMALY ]:   50 videos
   Fighting             [ANOMALY ]:   50 videos
   Normal               [NORMAL  ]:  950 videos
   RoadAccidents        [ANOMALY ]:  150 videos
   Robbery              [ANOMALY ]:  150 videos
   Shooting             [ANOMALY ]:   50 videos
   Shoplifting          [ANOMALY ]:   50 videos
   Stealing             [ANOMALY ]:  100 videos
   Vandalism            [ANOMALY ]:   50 videos
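
`load_phase1_metadata` accepts two metadata layouts. Based only on the keys the code reads, minimal examples of each might look like the following. The real Phase 1 files probably contain more fields, the names `Normal_example` and `Arson_example` are invented, and `normalize_metadata` is a condensed, hypothetical restatement of the parsing logic rather than a replacement for it.

```python
NORMAL_CATEGORY = "Normal"

# Layout 1: flat list under "videos" (the "GPU extraction format")
flat_metadata = {
    "videos": [
        {"video_name": "Abuse001_x264", "category": "Abuse", "status": "success"},
        {"video_name": "Normal_example", "category": "Normal", "status": "skipped"},
        {"video_name": "Arson_example", "category": "Arson", "status": "failed"},
    ]
}

# Layout 2: videos grouped under "categories"
grouped_metadata = {
    "categories": {
        "Abuse": {"videos": [{"video_name": "Abuse001_x264", "status": "success"}]},
        "Normal": {"videos": [{"video_name": "Normal_example", "status": "success"}]},
    }
}

def normalize_metadata(metadata: dict) -> list:
    """Flatten either layout into (video_name, category, is_anomaly) records."""
    records = []
    if "videos" in metadata:
        for video in metadata["videos"]:
            # The flat layout keeps both freshly extracted and skipped videos
            if video.get("status") in ("success", "skipped"):
                category = video.get("category")
                is_anomaly = video.get("is_anomaly", category != NORMAL_CATEGORY)
                records.append((video.get("video_name"), category, is_anomaly))
    elif "categories" in metadata:
        for category, info in metadata["categories"].items():
            for video in info.get("videos", []):
                # The grouped layout keeps only videos marked as successful
                if video.get("status") == "success":
                    records.append(
                        (video.get("video_name"), category, category != NORMAL_CATEGORY)
                    )
    return records

print(normalize_metadata(flat_metadata))
print(normalize_metadata(grouped_metadata))
```

Both layouts reduce to the same records here; the failed video in the flat example is dropped, mirroring the status filter in the notebook.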


## Section 7: Test Feature Extraction on Single Video

Test the feature extraction pipeline on a single video before processing the entire dataset.

In [7]:
"""
Test Feature Extraction on Single Video
"""

def test_single_video_extraction(feature_extractor, metadata):
    """
    Test feature extraction on a single video.
    
    Args:
        feature_extractor: TimeSformerFeatureExtractor instance
        metadata: Phase 1 metadata dictionary
    """
    print("\n" + "="*70)
    print("TESTING FEATURE EXTRACTION ON SINGLE VIDEO")
    print("="*70)
    
    if metadata is None or len(metadata['videos']) == 0:
        print("\n❌ No videos available for testing!")
        return None
    
    # Get first video
    test_video = metadata['videos'][0]
    video_name = test_video['video_name']
    category = test_video['category']
    
    video_dir = os.path.join(EXTRACTED_FRAMES_PATH, category, video_name)
    
    print(f"\n🎬 Test Video: {video_name}")
    print(f"   Category: {category}")
    print(f"   Path: {video_dir}")
    
    # Check if directory exists
    if not os.path.exists(video_dir):
        print(f"\n❌ Video directory not found: {video_dir}")
        return None
    
    # Count frames (same extensions the loader accepts)
    frame_count = len([f for f in os.listdir(video_dir) if f.endswith(('.jpg', '.jpeg', '.png'))])
    print(f"   Frames: {frame_count}")
    
    # Extract features
    print(f"\n⏳ Extracting features...")
    
    # Monitor GPU before
    gpu_monitor.reset_peak()
    start_time = time.time()
    
    features = feature_extractor.process_video(video_dir)
    
    elapsed = time.time() - start_time
    
    if features is not None:
        print(f"\n✅ Feature Extraction Successful!")
        print(f"   Feature Shape: {features.shape}")
        print(f"   Feature Dtype: {features.dtype}")
        print(f"   Feature Range: [{features.min():.4f}, {features.max():.4f}]")
        print(f"   Feature Mean: {features.mean():.4f}")
        print(f"   Feature Std: {features.std():.4f}")
        print(f"   Extraction Time: {elapsed:.2f} seconds")
        
        # GPU usage
        if torch.cuda.is_available():
            print(f"\n💾 GPU Peak Memory: {gpu_monitor.get_peak_usage():.2f} GB")
        
        print("\n" + "="*70)
        return features
    else:
        print(f"\n❌ Feature extraction failed!")
        return None


# Run test
test_features = test_single_video_extraction(feature_extractor, phase1_metadata)


TESTING FEATURE EXTRACTION ON SINGLE VIDEO

🎬 Test Video: Abuse001_x264
   Category: Abuse
   Path: C:\UCF_video_dataset\Extracted_Frames\Abuse\Abuse001_x264
   Frames: 32

⏳ Extracting features...


Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`




✅ Feature Extraction Successful!
   Feature Shape: (768,)
   Feature Dtype: float32
   Feature Range: [-3.1207, 3.2564]
   Feature Mean: -0.0141
   Feature Std: 0.9689
   Extraction Time: 3.82 seconds

💾 GPU Peak Memory: 1.10 GB

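`load_frames_from_directory` orders frames with a plain `sorted()`, which is lexicographic: `frame_10.jpg` would come before `frame_2.jpg` unless Phase 1 zero-pads its file names. If the padding is not guaranteed, a natural sort such as the one sketched here may be safer. The file names below are invented for the example, and `natural_key` is a hypothetical helper.

```python
import re

def natural_key(filename: str) -> list:
    """Split a name into text and integer chunks so that 2 sorts before 10."""
    chunks = re.split(r"(\d+)", filename)
    # With a capturing group, odd positions always hold the digit runs
    return [int(chunk) if index % 2 else chunk.lower()
            for index, chunk in enumerate(chunks)]

# Invented, non-zero-padded frame names
frame_files = ["frame_10.jpg", "frame_2.jpg", "frame_1.jpg", "frame_21.jpg"]

lexicographic = sorted(frame_files)
natural = sorted(frame_files, key=natural_key)

print(lexicographic)
print(natural)
```

Frame order matters because TimeSformer adds a temporal position embedding to each frame, so a shuffled sequence would be encoded as a different video.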


## Section 8: Full Dataset Feature Extraction

Extract features from all videos in the dataset.

In [8]:
"""
Full Dataset Feature Extraction
"""

def extract_all_features(
    feature_extractor: TimeSformerFeatureExtractor,
    metadata: Dict,
    extracted_frames_path: str = EXTRACTED_FRAMES_PATH,
    output_path: str = FEATURES_PATH,
    save_individual: bool = True,
    resume: bool = True
) -> Optional[Dict]:
    """
    Extract features from all videos in the dataset.
    
    Args:
        feature_extractor: TimeSformerFeatureExtractor instance
        metadata: Phase 1 metadata dictionary
        extracted_frames_path: Path to extracted frames
        output_path: Path to save features
        save_individual: Save individual video features
        resume: Skip already processed videos
        
    Returns:
        Dictionary containing extraction results and all features
    """
    print("\n" + "="*70)
    print("FULL DATASET FEATURE EXTRACTION")
    print("="*70)
    
    if metadata is None:
        print("\n❌ No metadata available!")
        return None
    
    videos = metadata['videos']
    total_videos = len(videos)
    
    print(f"\n📊 Videos to Process: {total_videos}")
    print(f"   Resume Mode: {resume}")
    print(f"   Save Individual: {save_individual}")
    print(f"   Output Path: {output_path}")
    
    # Create output directories
    os.makedirs(output_path, exist_ok=True)
    for category in ALL_CATEGORIES:
        os.makedirs(os.path.join(output_path, category), exist_ok=True)
    
    # Estimate time
    est_time_per_video = 2.0 if torch.cuda.is_available() else 10.0
    est_total_time = total_videos * est_time_per_video
    print(f"\n⏱️  Estimated Time: {est_total_time/60:.1f} minutes ({est_total_time/3600:.2f} hours)")
    
    # Initialize tracking
    results = {
        'successful': 0,
        'failed': 0,
        'skipped': 0,
        'videos': [],
        'features': {},
        'labels': {},
        'processing_times': []
    }
    
    all_features = []
    all_labels = []
    all_video_names = []
    
    # Reset GPU peak stats
    gpu_monitor.reset_peak()
    
    start_time = time.time()
    
    print("\n🚀 Starting extraction...\n")
    
    # Process each video
    pbar = tqdm(videos, desc="Extracting features", unit="video")
    
    for video_info in pbar:
        video_name = video_info['video_name']
        category = video_info['category']
        is_anomaly = video_info['is_anomaly']
        
        video_dir = os.path.join(extracted_frames_path, category, video_name)
        feature_path = os.path.join(output_path, category, f"{video_name}.npy")
        
        # Check if already processed
        if resume and os.path.exists(feature_path):
            try:
                # Load existing features
                features = np.load(feature_path)
                if features.shape == (FEATURE_DIM,):
                    results['skipped'] += 1
                    all_features.append(features)
                    all_labels.append(1 if is_anomaly else 0)
                    all_video_names.append(video_name)
                    pbar.set_postfix({'success': results['successful'], 
                                     'failed': results['failed'], 
                                     'skipped': results['skipped']})
                    continue
            except Exception:
                pass  # Re-process if the cached file is unreadable
        
        # Check if video directory exists
        if not os.path.exists(video_dir):
            results['failed'] += 1
            results['videos'].append({
                'video_name': video_name,
                'category': category,
                'status': 'failed',
                'error': 'Video directory not found'
            })
            continue
        
        # Extract features
        video_start = time.time()
        features = feature_extractor.process_video(video_dir)
        video_elapsed = time.time() - video_start
        
        if features is not None:
            results['successful'] += 1
            results['processing_times'].append(video_elapsed)
            
            # Save individual feature file
            if save_individual:
                np.save(feature_path, features)
            
            # Store for combined output
            all_features.append(features)
            all_labels.append(1 if is_anomaly else 0)
            all_video_names.append(video_name)
            
            results['videos'].append({
                'video_name': video_name,
                'category': category,
                'is_anomaly': is_anomaly,
                'feature_path': feature_path,
                'status': 'success',
                'processing_time': video_elapsed
            })
        else:
            results['failed'] += 1
            results['videos'].append({
                'video_name': video_name,
                'category': category,
                'status': 'failed',
                'error': 'Feature extraction failed'
            })
        
        # Update progress bar
        pbar.set_postfix({'success': results['successful'], 
                         'failed': results['failed'], 
                         'skipped': results['skipped']})
        
        # Periodic GPU memory cleanup (skipped until at least one video has succeeded)
        if results['successful'] > 0 and results['successful'] % 50 == 0 and torch.cuda.is_available():
            torch.cuda.empty_cache()
    
    pbar.close()
    
    total_elapsed = time.time() - start_time
    
    # Save combined features
    if len(all_features) > 0:
        all_features_array = np.stack(all_features, axis=0)
        all_labels_array = np.array(all_labels)
        
        # Save combined files
        np.save(os.path.join(output_path, 'all_features.npy'), all_features_array)
        np.save(os.path.join(output_path, 'all_labels.npy'), all_labels_array)
        
        # Save video names mapping
        with open(os.path.join(output_path, 'video_names.json'), 'w') as f:
            json.dump(all_video_names, f, indent=2)
        
        results['features_shape'] = all_features_array.shape
        results['labels_shape'] = all_labels_array.shape
    
    # Save extraction metadata
    extraction_metadata = {
        'total_videos': total_videos,
        'successful': results['successful'],
        'failed': results['failed'],
        'skipped': results['skipped'],
        'feature_dim': FEATURE_DIM,
        'model': TIMESFORMER_MODEL,
        'processing_time_seconds': total_elapsed,
        'processing_time_minutes': total_elapsed / 60,
        'avg_time_per_video': np.mean(results['processing_times']) if results['processing_times'] else 0,
        'device': str(DEVICE),
        'timestamp': datetime.now().isoformat(),
        'videos': results['videos']
    }
    
    with open(os.path.join(output_path, 'extraction_metadata.json'), 'w') as f:
        json.dump(extraction_metadata, f, indent=2)
    
    # Print summary
    print("\n" + "="*70)
    print("FEATURE EXTRACTION COMPLETE!")
    print("="*70)
    print(f"\n📊 Results:")
    print(f"   Total Videos: {total_videos}")
    print(f"   Successful: {results['successful']}")
    print(f"   Failed: {results['failed']}")
    print(f"   Skipped (resumed): {results['skipped']}")
    print(f"\n⏱️  Time:")
    print(f"   Total: {total_elapsed/60:.2f} minutes")
    if results['processing_times']:
        print(f"   Average per video: {np.mean(results['processing_times']):.2f} seconds")
    print(f"\n💾 Output:")
    print(f"   Features Shape: {results.get('features_shape', 'N/A')}")
    print(f"   Labels Shape: {results.get('labels_shape', 'N/A')}")
    print(f"   Output Path: {output_path}")
    
    # GPU stats
    if torch.cuda.is_available():
        print(f"\n🖥️  GPU Peak Memory: {gpu_monitor.get_peak_usage():.2f} GB")
    
    print("="*70)
    
    return results


print("✓ Feature extraction function defined")
print("\n⚠️  Run the next cell to start full dataset extraction.")

✓ Feature extraction function defined

⚠️  Run the next cell to start full dataset extraction.


In [9]:
# Run full dataset feature extraction
# Runtime depends on the GPU: the estimate printed below was about 1 hour, but the recorded run finished in under 12 minutes

extraction_results = extract_all_features(
    feature_extractor=feature_extractor,
    metadata=phase1_metadata,
    extracted_frames_path=EXTRACTED_FRAMES_PATH,
    output_path=FEATURES_PATH,
    save_individual=True,  # Save individual .npy files per video
    resume=True            # Skip already processed videos
)


FULL DATASET FEATURE EXTRACTION

📊 Videos to Process: 1900
   Resume Mode: True
   Save Individual: True
   Output Path: C:\UCF_video_dataset\TimeSformer_Features

⏱️  Estimated Time: 63.3 minutes (1.06 hours)

🚀 Starting extraction...



Extracting features:   0%|          | 0/1900 [00:00<?, ?video/s]


FEATURE EXTRACTION COMPLETE!

📊 Results:
   Total Videos: 1900
   Successful: 1900
   Failed: 0
   Skipped (resumed): 0

⏱️  Time:
   Total: 11.58 minutes
   Average per video: 0.36 seconds

💾 Output:
   Features Shape: (1900, 768)
   Labels Shape: (1900,)
   Output Path: C:\UCF_video_dataset\TimeSformer_Features

🖥️  GPU Peak Memory: 1.10 GB

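The run above reports no failures, but `extraction_metadata.json` keeps a per-video record that can be inspected after any run. The following is a minimal sketch, assuming only the keys that `extract_all_features` writes in this notebook (`videos`, `video_name`, `category`, `status`, `error`); the helper name `summarize_extraction` is hypothetical.

```python
import json
from collections import Counter


def summarize_extraction(metadata_path):
    """Count per-video statuses and collect failed videos from the metadata JSON."""
    with open(metadata_path, "r") as f:
        metadata = json.load(f)

    videos = metadata.get("videos", [])
    # Entries without a 'status' key are counted as 'unknown' rather than dropped.
    status_counts = Counter(v.get("status", "unknown") for v in videos)
    failures = [
        (v.get("category"), v.get("video_name"), v.get("error"))
        for v in videos
        if v.get("status") == "failed"
    ]
    return status_counts, failures


# Example call (the path is an assumption based on the output directory above):
# counts, failures = summarize_extraction(
#     r"C:\UCF_video_dataset\TimeSformer_Features\extraction_metadata.json")
```

Listing the failures by category makes it easier to decide whether to re-run extraction with `resume=True` or to fix missing frame directories first.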

## Section 9: Verify Extracted Features

Verify that all features were extracted correctly.
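
Beyond summary statistics, it can help to confirm that the three combined outputs line up row for row. The sketch below is one possible check, not part of the original pipeline: `check_alignment` is a hypothetical helper, and the default `feature_dim=768` is taken from the feature size stated in this notebook.

```python
import numpy as np


def check_alignment(features, labels, video_names, feature_dim=768):
    """Return a list of problems; an empty list means the outputs agree."""
    problems = []

    # Features should be a 2-D array of shape (N_videos, feature_dim).
    if features.ndim != 2 or features.shape[1] != feature_dim:
        problems.append(f"unexpected feature shape {features.shape}")

    # Row i of features, labels[i] and video_names[i] must describe the same video.
    if not (len(features) == len(labels) == len(video_names)):
        problems.append(
            f"length mismatch: {len(features)} features, "
            f"{len(labels)} labels, {len(video_names)} names"
        )

    # Labels are written as 1 (anomaly) or 0 (normal).
    if not np.isin(labels, [0, 1]).all():
        problems.append("labels contain values other than 0 and 1")

    # Report which rows contain NaN or Inf, not just how many values do.
    if features.ndim == 2:
        bad_rows = np.flatnonzero(~np.isfinite(features).all(axis=1))
        if bad_rows.size:
            problems.append(f"non-finite values in rows {bad_rows.tolist()}")

    return problems
```

Returning the offending row indices makes it possible to map a bad feature vector back to its entry in `video_names.json`.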

In [10]:
"""
Verify Extracted Features
"""

def verify_features(features_path: str = FEATURES_PATH):
    """
    Verify that all features were extracted correctly.
    
    Args:
        features_path: Path to features directory
    """
    print("\n" + "="*70)
    print("FEATURE VERIFICATION REPORT")
    print("="*70)
    
    # Check combined files
    combined_features_path = os.path.join(features_path, 'all_features.npy')
    combined_labels_path = os.path.join(features_path, 'all_labels.npy')
    video_names_path = os.path.join(features_path, 'video_names.json')
    
    if os.path.exists(combined_features_path):
        features = np.load(combined_features_path)
        labels = np.load(combined_labels_path)
        
        with open(video_names_path, 'r') as f:
            video_names = json.load(f)
        
        print(f"\n✓ Combined Features File Found")
        print(f"\n📊 Feature Statistics:")
        print(f"   Shape: {features.shape}")
        print(f"   Dtype: {features.dtype}")
        print(f"   Min: {features.min():.4f}")
        print(f"   Max: {features.max():.4f}")
        print(f"   Mean: {features.mean():.4f}")
        print(f"   Std: {features.std():.4f}")
        
        print(f"\n📊 Label Statistics:")
        print(f"   Shape: {labels.shape}")
        print(f"   Anomaly (1): {np.sum(labels == 1)}")
        print(f"   Normal (0): {np.sum(labels == 0)}")
        
        print(f"\n📁 Total Videos: {len(video_names)}")
        
        # Check for NaN or Inf
        nan_count = np.sum(np.isnan(features))
        inf_count = np.sum(np.isinf(features))
        
        if nan_count > 0 or inf_count > 0:
            print(f"\n⚠️  Data Issues:")
            print(f"   NaN values: {nan_count}")
            print(f"   Inf values: {inf_count}")
        else:
            print(f"\n✓ No NaN or Inf values detected")
    else:
        print(f"\n❌ Combined features file not found: {combined_features_path}")
    
    # Count individual feature files
    print(f"\n📁 Per-Category Feature Files:")
    total_files = 0
    for category in ALL_CATEGORIES:
        category_path = os.path.join(features_path, category)
        if os.path.exists(category_path):
            files = [f for f in os.listdir(category_path) if f.endswith('.npy')]
            total_files += len(files)
            label = "ANOMALY" if category != NORMAL_CATEGORY else "NORMAL"
            print(f"   {category:20s} [{label:8s}]: {len(files):4d} files")
    
    print(f"\n   Total Feature Files: {total_files}")
    print("="*70)


# Run verification
verify_features()


FEATURE VERIFICATION REPORT

✓ Combined Features File Found

📊 Feature Statistics:
   Shape: (1900, 768)
   Dtype: float32
   Min: -6.4888
   Max: 6.1442
   Mean: -0.0168
   Std: 0.9737

📊 Label Statistics:
   Shape: (1900,)
   Anomaly (1): 950
   Normal (0): 950

📁 Total Videos: 1900

✓ No NaN or Inf values detected

📁 Per-Category Feature Files:
   Abuse                [ANOMALY ]:   50 files
   Arrest               [ANOMALY ]:   50 files
   Arson                [ANOMALY ]:   50 files
   Assault              [ANOMALY ]:   50 files
   Burglary             [ANOMALY ]:  100 files
   Explosion            [ANOMALY ]:   50 files
   Fighting             [ANOMALY ]:   50 files
   RoadAccidents        [ANOMALY ]:  150 files
   Robbery              [ANOMALY ]:  150 files
   Shooting             [ANOMALY ]:   50 files
   Shoplifting          [ANOMALY ]:   50 files
   Stealing             [ANOMALY ]:  100 files
   Vandalism            [ANOMALY ]:   50 files
   Normal             

## Section 10: Summary and Next Steps

Summary of Phase 2 and preparation for Phase 3.
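
As preparation for Phase 3, the combined outputs can be read back in a few lines. This is a minimal sketch, assuming the file names written by this notebook (`all_features.npy`, `all_labels.npy`, `video_names.json`); the function `load_phase2_outputs` and its return layout are hypothetical, not the Phase 3 implementation.

```python
import json
import os

import numpy as np


def load_phase2_outputs(features_dir):
    """Load the combined Phase 2 arrays and split row indices by label."""
    features = np.load(os.path.join(features_dir, "all_features.npy"))
    labels = np.load(os.path.join(features_dir, "all_labels.npy"))
    with open(os.path.join(features_dir, "video_names.json"), "r") as f:
        video_names = json.load(f)

    # Row i of `features` must correspond to labels[i] and video_names[i].
    if not (len(features) == len(labels) == len(video_names)):
        raise ValueError("features, labels and video names have different lengths")

    # Index sets for the two classes: 1 = anomaly, 0 = normal.
    anomaly_idx = np.flatnonzero(labels == 1)
    normal_idx = np.flatnonzero(labels == 0)
    return features, labels, video_names, anomaly_idx, normal_idx
```

Keeping separate index sets for anomalous and normal videos is one convenient layout when a ranking-style loss needs to draw examples from both groups.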

In [11]:
print("\n" + "="*70)
print("PHASE 2 COMPLETE: FEATURE EXTRACTION")
print("="*70)
print("""
✅ What was accomplished:
   1. Loaded pretrained TimeSformer model (facebook/timesformer-base-finetuned-k400)
   2. Extracted 768-dimensional [CLS] token features from each video
   3. Saved features as individual .npy files and combined arrays
   4. Generated metadata for tracking

📁 Output Structure:
   TimeSformer_Features/
   ├── Abuse/
   │   ├── video_001.npy  (768-dim feature vector)
   │   └── ...
   ├── Normal/
   │   └── ...
   ├── all_features.npy   (N_videos × 768)
   ├── all_labels.npy     (N_videos,) - 1=anomaly, 0=normal
   ├── video_names.json   (List of video names)
   └── extraction_metadata.json

🚀 Next Steps (Phase 3):
   1. Load extracted features
   2. Implement MIL (Multiple Instance Learning) network
   3. Train with:
      - Ranking Loss (anomaly > normal scores)
      - Focal Loss (handle class imbalance)
      - Temporal Smoothness Loss (consistent predictions)
   4. Evaluate on test set
""")
print("="*70)

# Final GPU status
if torch.cuda.is_available():
    gpu_monitor.print_status()
    clear_gpu_memory()


PHASE 2 COMPLETE: FEATURE EXTRACTION

✅ What was accomplished:
   1. Loaded pretrained TimeSformer model (facebook/timesformer-base-finetuned-k400)
   2. Extracted 768-dimensional [CLS] token features from each video
   3. Saved features as individual .npy files and combined arrays
   4. Generated metadata for tracking

📁 Output Structure:
   TimeSformer_Features/
   ├── Abuse/
   │   ├── video_001.npy  (768-dim feature vector)
   │   └── ...
   ├── Normal/
   │   └── ...
   ├── all_features.npy   (N_videos × 768)
   ├── all_labels.npy     (N_videos,) - 1=anomaly, 0=normal
   ├── video_names.json   (List of video names)
   └── extraction_metadata.json

🚀 Next Steps (Phase 3):
   1. Load extracted features
   2. Implement MIL (Multiple Instance Learning) network
   3. Train with:
      - Ranking Loss (anomaly > normal scores)
      - Focal Loss (handle class imbalance)
      - Temporal Smoothness Loss (consistent pred