In [None]:
"""
==================================================
ML LEARNING JOURNEY - DAY 26
==================================================
Week: 4 of 24
Day: 26 of 168
Date: November 21, 2025
Topic: Testing & Performance Optimization
Overall Progress: 15.5%

Week 4: Detection & Tracking Foundation
✅ Day 22: Project Planning & Architecture (COMPLETED)
✅ Day 23: Multi-Object Tracking (DeepSORT) (COMPLETED)
✅ Day 24: Tracking Optimization (COMPLETED)
✅ Day 25: Video Processing Pipeline (COMPLETED)
🔄 Day 26: Testing & Performance (TODAY!)
⬜ Day 27: Code Cleanup & Modularization
⬜ Day 28: Week 4 Review

Progress: 71% (5/7 days)

==================================================
🎯 Week 4 Project: Security System - Detection & Tracking
- Comprehensive testing of tracking system
- Performance benchmarking and optimization
- Accuracy measurement and validation
- Identify and fix bottlenecks
- Document system capabilities and limitations

🎯 Today's Learning Objectives:
1. Build comprehensive test framework
2. Test various scenarios (crowded, sparse, occlusions)
3. Measure tracking accuracy metrics (MOTA, ID switches)
4. Benchmark detection and tracking performance
5. Profile code for bottlenecks
6. Optimize slow sections
7. Compare GPU vs CPU performance
8. Document findings and recommendations

📚 Today's Structure:
   Part 1 (2h): Testing Framework & Scenarios
   Part 2 (2h): Performance Benchmarking
   Part 3 (1.5h): Optimization & Profiling
   Part 4 (1h): Results & Summary

🎯 SUCCESS CRITERIA:
   ✅ Test suite created and passing
   ✅ Various scenarios tested
   ✅ Tracking accuracy measured
   ✅ Performance benchmarks documented
   ✅ Bottlenecks identified
   ✅ Optimization applied
   ✅ FPS targets achieved (25-30 FPS)
   ✅ System validated for production use
==================================================
"""

In [2]:
# ==================================================
# INSTALL REQUIRED LIBRARIES
# ==================================================

import subprocess
import sys

print("üì¶ Installing required libraries...")
print("‚è±Ô∏è  This should be quick (most already installed)...\n")

packages = [
    'ultralytics',
    'deep-sort-realtime',
    'opencv-python',
    'numpy',
    'pandas',
    'matplotlib',
    'tqdm',
    'psutil',  # System monitoring
    'memory-profiler'  # Memory profiling
]

for package in packages:
    print(f"Checking {package}...")
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', package, '-q'])

print("\n‚úÖ All libraries ready!")

print("\n" + "=" * 80)

# ==================================================
# IMPORT LIBRARIES
# ==================================================

print("\n" + "=" * 80)
print("üìö IMPORTING LIBRARIES")
print("=" * 80)

# Standard libraries
import os
import time
import json
from pathlib import Path
from collections import defaultdict, deque
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Data science
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Computer vision
import cv2

# Deep learning
from ultralytics import YOLO

# Tracking
from deep_sort_realtime.deepsort_tracker import DeepSort

# Progress & monitoring
from tqdm import tqdm
import psutil

print("\n✅ All libraries imported successfully!")
print("\n📊 Library versions:")
print(f"   • OpenCV: {cv2.__version__}")
print(f"   • NumPy: {np.__version__}")
print(f"   • Pandas: {pd.__version__}")
print(f"   • Matplotlib: {plt.matplotlib.__version__}")
print("   • Ultralytics: Installed ✓")
print("   • DeepSORT: Installed ✓")
print("   • psutil: Installed ✓")

# Set plotting style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("=" * 80)

📦 Installing required libraries...
⏱️  This should be quick (most already installed)...

Checking ultralytics...
Checking deep-sort-realtime...
Checking opencv-python...
Checking numpy...
Checking pandas...
Checking matplotlib...
Checking tqdm...
Checking psutil...
Checking memory-profiler...

✅ All libraries ready!


📚 IMPORTING LIBRARIES

✅ All libraries imported successfully!

📊 Library versions:
   • OpenCV: 4.12.0
   • NumPy: 2.2.6
   • Pandas: 2.3.2
   • Matplotlib: 3.10.6
   • Ultralytics: Installed ✓
   • DeepSORT: Installed ✓
   • psutil: Installed ✓

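Running `pip install` for every package on every run works, but it is slow even when nothing has changed. The sketch below passes to pip only the packages whose module cannot be found with `importlib.util.find_spec`.

It assumes the pip-name to import-name mapping shown is correct for this environment. `missing_packages` and `install_missing` are hypothetical helper names, and the sketch does not check installed versions.

```python
import importlib.util
import subprocess
import sys

# Assumption: pip distribution name -> top-level module it provides
PACKAGES = {
    'ultralytics': 'ultralytics',
    'deep-sort-realtime': 'deep_sort_realtime',
    'opencv-python': 'cv2',
    'numpy': 'numpy',
    'pandas': 'pandas',
    'matplotlib': 'matplotlib',
    'seaborn': 'seaborn',
    'tqdm': 'tqdm',
    'psutil': 'psutil',
    'memory-profiler': 'memory_profiler',
}


def missing_packages(packages):
    """Return the pip names whose import module cannot be located."""
    return [pip_name for pip_name, module in packages.items()
            if importlib.util.find_spec(module) is None]


def install_missing(packages):
    """Install only what is missing; returns the list that was installed."""
    missing = missing_packages(packages)
    if missing:
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-q', *missing])
    return missing
```

Calling `install_missing(PACKAGES)` at the top of the notebook would replace the loop over `packages`.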

In [3]:
print("\n" + "=" * 80)
print("üìö PART 1: TESTING FRAMEWORK & SCENARIOS")
print("=" * 80)


📚 PART 1: TESTING FRAMEWORK & SCENARIOS


In [4]:
# ==================================================
# EXERCISE 1.1: UNDERSTAND TESTING METHODOLOGY
# ==================================================

print("\n" + "=" * 80)
print("EXERCISE 1.1: Understanding Testing Methodology")
print("=" * 80)

"""
📖 THEORY: Testing Object Tracking Systems

Why Test?
- Validate functionality
- Measure accuracy
- Identify edge cases
- Ensure reliability
- Performance baselines
- Compare configurations

==================================================

TYPES OF TESTS:

1. FUNCTIONAL TESTS
   • Does it work as expected?
   • Are objects detected?
   • Are tracks created?
   • Are IDs persistent?

2. ACCURACY TESTS
   • How accurate are detections?
   • How well are tracks maintained?
   • How many ID switches?
   • False positive/negative rates

3. PERFORMANCE TESTS
   • How fast does it run?
   • FPS on different hardware
   • Memory usage
   • CPU/GPU utilization

4. STRESS TESTS
   • Many objects (crowded)
   • Long videos
   • High resolution
   • Edge cases

==================================================

TRACKING ACCURACY METRICS:

1. MOTA (Multiple Object Tracking Accuracy)
   • Overall tracking quality
   • Range: -∞ to 100% (higher is better; negative when errors outnumber ground-truth objects)
   • Formula: MOTA = 1 - Σ_t (FP_t + FN_t + IDS_t) / Σ_t GT_t, summed over all frames t
   • FP = False Positives (predicted boxes with no matching person)
   • FN = False Negatives (missed people)
   • IDS = ID Switches (same person, different ID)
   • GT = Ground Truth (number of annotated people in each frame)

2. MOTP (Multiple Object Tracking Precision)
   • Localization accuracy
   • Average overlap (IoU) between matched predicted and ground-truth boxes
   • Higher = better bounding box accuracy

3. ID Switches (IDS)
   • Number of times the track ID assigned to the same person changes
   • Lower is better
   • Indicates tracking consistency

4. Track Fragmentation
   • Number of times a person's trajectory is interrupted (tracked, lost, then tracked again)
   • Lower is better
   • Good tracking keeps each trajectory unbroken

5. False Positives (FP)
   • Detected objects that aren't real
   • Lower is better
   • Indicates detection precision

6. False Negatives (FN)
   • Real objects that weren't detected
   • Lower is better
   • Indicates detection recall

==================================================

TEST SCENARIOS:

Scenario 1: NORMAL CONDITIONS
- 1-5 people in frame
- Good lighting
- Clear visibility
- Standard movement
- Expected: 95%+ accuracy

Scenario 2: CROWDED
- 10+ people in frame
- Overlapping bounding boxes
- Occlusions common
- Expected: 85-90% accuracy

Scenario 3: SPARSE
- 0-2 people in frame
- Minimal occlusions
- Easy tracking
- Expected: 98%+ accuracy

Scenario 4: OCCLUSIONS
- People behind objects
- Temporary disappearance
- Re-identification needed
- Tests: max_age parameter

Scenario 5: FAST MOVEMENT
- People running
- Quick direction changes
- Motion blur
- Tests: Kalman filter prediction

Scenario 6: ENTRY/EXIT
- People entering frame
- People leaving frame
- Track initialization
- Track deletion

Scenario 7: LIGHTING VARIATIONS
- Bright areas
- Dark areas
- Shadows
- Tests: detection robustness

==================================================

PERFORMANCE METRICS:

1. FPS (Frames Per Second)
   • Processing throughput
   • Target: 25-30 FPS for real-time
   • Measure: total_frames / total_time

2. Processing Time per Frame
   • Milliseconds per frame
   • Breakdown: detection, tracking, visualization
   • Target: ≤33 ms for 30 FPS (1000 ms / 30 frames ≈ 33.3 ms)

3. Memory Usage
   • RAM consumption
   • GPU memory (if using GPU)
   • Monitor for memory leaks (steady growth over a long run)

4. CPU/GPU Utilization
   • Percentage of resources used
   • Identify bottlenecks
   • Optimize resource usage

==================================================

TESTING APPROACH:

1. Unit Tests
   • Test individual components
   • VideoInput, VideoOutput, Tracker
   • Verify basic functionality

2. Integration Tests
   • Test complete pipeline
   • End-to-end processing
   • Verify data flow

3. Scenario Tests
   • Test specific situations
   • Crowded, sparse, occlusions
   • Edge cases

4. Performance Tests
   • Benchmark speed
   • Different resolutions
   • GPU vs CPU

5. Regression Tests
   • Ensure changes don't break things
   • Compare to baselines
   • Automated testing

==================================================

BEST PRACTICES:

1. Use Representative Data
   • Test on real-world scenarios
   • Various conditions
   • Different camera angles

2. Establish Baselines
   • Record initial performance
   • Compare improvements
   • Track over time

3. Automate Where Possible
   • Automated test scripts
   • Continuous testing
   • Quick feedback

4. Document Results
   • Record metrics
   • Note observations
   • Track improvements

5. Test Edge Cases
   • Unusual situations
   • Failure modes
   • Recovery behavior
"""

print("""
📊 TESTING PRIORITY MATRIX:

Scenario         | Priority | Frequency | Expected Accuracy
-----------------|----------|-----------|------------------
Normal (1-5 ppl) | High     | 80%       | 95%+
Crowded (10+ ppl)| Medium   | 15%       | 85-90%
Sparse (0-2 ppl) | Medium   | 5%        | 98%+
Occlusions       | High     | Common    | 80-90%
Fast Movement    | Medium   | Moderate  | 85-90%
Entry/Exit       | High     | Constant  | 95%+
Poor Lighting    | Low      | Rare      | 70-80%

Focus testing on:
✓ Normal conditions (most common)
✓ Occlusions (challenging)
✓ Entry/Exit (critical for counting)

Performance Targets:
- FPS: 25-30 (real-time)
- Memory: <2GB RAM
- CPU: <80% utilization
- Latency: <100ms
""")

print("\n✅ Exercise 1.1 Complete!")
print("=" * 80)


EXERCISE 1.1: Understanding Testing Methodology

📊 TESTING PRIORITY MATRIX:

Scenario         | Priority | Frequency | Expected Accuracy
-----------------|----------|-----------|------------------
Normal (1-5 ppl) | High     | 80%       | 95%+
Crowded (10+ ppl)| Medium   | 15%       | 85-90%
Sparse (0-2 ppl) | Medium   | 5%        | 98%+
Occlusions       | High     | Common    | 80-90%
Fast Movement    | Medium   | Moderate  | 85-90%
Entry/Exit       | High     | Constant  | 95%+
Poor Lighting    | Low      | Rare      | 70-80%

Focus testing on:
✓ Normal conditions (most common)
✓ Occlusions (challenging)
✓ Entry/Exit (critical for counting)

Performance Targets:
- FPS: 25-30 (real-time)
- Memory: <2GB RAM
- CPU: <80% utilization
- Latency: <100ms


✅ Exercise 1.1 Complete!

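The MOTA formula and the FP/FN/IDS definitions in Exercise 1.1 can be made concrete with a small evaluator. This is a simplified sketch, not the official CLEAR-MOT implementation. It makes these assumptions:

- Ground truth and predictions arrive per frame as `{id: (x1, y1, x2, y2)}` dicts.
- Matching is greedy by IoU with a 0.5 threshold. The reference metric uses Hungarian matching and prefers existing correspondences.
- MOTP is reported as the mean IoU of matched pairs.

`iou` and `clear_mot` are hypothetical names. For real evaluation, an established package such as py-motmetrics is the safer choice.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def clear_mot(gt_frames, pred_frames, iou_threshold=0.5):
    """Simplified CLEAR-MOT metrics.

    gt_frames, pred_frames: equal-length lists, one {object_id: box} dict per frame.
    Greedy IoU matching (assumption; the official metric uses Hungarian matching).
    """
    fp = fn = ids = total_gt = matches = 0
    iou_sum = 0.0
    last_match = {}  # ground-truth id -> track id it was last matched to

    for gt, pred in zip(gt_frames, pred_frames):
        total_gt += len(gt)
        # All candidate (IoU, gt_id, track_id) pairs, best overlap first
        pairs = sorted(
            ((iou(g_box, p_box), g_id, p_id)
             for g_id, g_box in gt.items()
             for p_id, p_box in pred.items()),
            reverse=True,
        )
        used_gt, used_pred = set(), set()
        for score, g_id, p_id in pairs:
            if score < iou_threshold:
                break  # remaining pairs overlap even less
            if g_id in used_gt or p_id in used_pred:
                continue
            used_gt.add(g_id)
            used_pred.add(p_id)
            iou_sum += score
            matches += 1
            # ID switch: this person was previously matched to a different track
            if g_id in last_match and last_match[g_id] != p_id:
                ids += 1
            last_match[g_id] = p_id
        fn += len(gt) - len(used_gt)      # people no track matched
        fp += len(pred) - len(used_pred)  # tracks that matched no person

    mota = 1 - (fp + fn + ids) / total_gt if total_gt else float('nan')
    motp = iou_sum / matches if matches else float('nan')
    return {'MOTA': mota, 'MOTP_IoU': motp, 'FP': fp, 'FN': fn,
            'IDS': ids, 'GT': total_gt}


# Toy sequence: person 2 changes track ID (8 -> 9), then is missed in frame 3
gt = [{1: (0, 0, 10, 10), 2: (20, 20, 30, 30)},
      {1: (1, 0, 11, 10), 2: (21, 20, 31, 30)},
      {1: (2, 0, 12, 10), 2: (22, 20, 32, 30)}]
pred = [{7: (0, 0, 10, 10), 8: (20, 20, 30, 30)},
        {7: (1, 0, 11, 10), 9: (21, 20, 31, 30)},
        {7: (2, 0, 12, 10)}]
print(clear_mot(gt, pred))  # FP=0, FN=1, IDS=1, GT=6 -> MOTA = 1 - 2/6 ≈ 0.667
```

The toy run shows why ID switches matter: detection alone is perfect in the first two frames, yet MOTA still drops because of the switch.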

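The "Performance Targets" list and the regression-testing idea from Exercise 1.1 can be turned into automated pass/fail checks. This is a minimal sketch, resting on these assumptions:

- The summary dict uses the keys produced by `get_summary()` in the benchmarking class of Exercise 1.2 (`avg_fps`, `avg_memory_mb`, `avg_total_time`).
- Average per-frame processing time stands in for end-to-end latency. Real latency also includes capture and display.
- Every metric except FPS is treated as lower-is-better.

`TARGETS`, `frame_budget_ms`, `check_targets` and `find_regressions` are hypothetical names, and the 10% tolerance is an arbitrary choice.

```python
# Thresholds from the "Performance Targets" list (2 GB taken as 2048 MB)
TARGETS = {'min_fps': 25.0, 'max_memory_mb': 2048.0, 'max_latency_ms': 100.0}


def frame_budget_ms(fps):
    """Milliseconds available per frame to sustain `fps` (30 FPS -> ~33.3 ms)."""
    return 1000.0 / fps


def check_targets(summary, targets=TARGETS):
    """Return a list of failed targets; an empty list means every target is met."""
    failures = []
    if summary['avg_fps'] < targets['min_fps']:
        failures.append(f"FPS {summary['avg_fps']:.1f} < {targets['min_fps']:.0f}")
    if summary['avg_memory_mb'] > targets['max_memory_mb']:
        failures.append(f"memory {summary['avg_memory_mb']:.0f} MB > {targets['max_memory_mb']:.0f} MB")
    # Assumption: per-frame processing time stands in for end-to-end latency
    if summary['avg_total_time'] > targets['max_latency_ms']:
        failures.append(f"frame time {summary['avg_total_time']:.1f} ms > {targets['max_latency_ms']:.0f} ms")
    return failures


def find_regressions(current, baseline, tolerance=0.10, higher_is_better=('avg_fps',)):
    """Return {metric: % change} for metrics that got worse than the baseline
    by more than `tolerance` (a fraction). Other metrics are lower-is-better."""
    flagged = {}
    for key, base in baseline.items():
        if key not in current or base == 0:
            continue
        change = (current[key] - base) / abs(base)
        worsening = -change if key in higher_is_better else change
        if worsening > tolerance:
            flagged[key] = round(change * 100, 1)
    return flagged


baseline = {'avg_fps': 30.0, 'avg_detection_time': 20.0}
current = {'avg_fps': 24.0, 'avg_detection_time': 21.0}
print(frame_budget_ms(30))                  # about 33.3 ms per frame
print(find_regressions(current, baseline))  # FPS dropped 20% -> flagged; detection +5% is within tolerance
print(check_targets({'avg_fps': 24.0, 'avg_memory_mb': 900.0, 'avg_total_time': 41.7}))
```

Storing a baseline summary (for example as JSON) after each accepted change and running `find_regressions` against it gives the "compare to baselines" step a concrete form.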
In [5]:
# ==================================================
# EXERCISE 1.2: BUILD PERFORMANCE BENCHMARKING CLASS
# ==================================================

print("\n" + "=" * 80)
print("EXERCISE 1.2: Build Performance Benchmarking Class")
print("=" * 80)

"""
📖 THEORY: Performance Benchmarking

What to Measure:
- FPS (frames per second)
- Processing time per component
- Memory usage
- CPU/GPU utilization
- Detection count
- Track count

Why Benchmark:
- Identify bottlenecks
- Track improvements
- Compare configurations
- Set baselines
- Validate optimization
"""

class PerformanceBenchmark:
    """
    Performance benchmarking and profiling system
    """
    
    def __init__(self):
        """Initialize benchmark"""
        self.results = {
            'detection_times': [],
            'tracking_times': [],
            'visualization_times': [],
            'total_times': [],
            'memory_usage': [],
            'cpu_usage': [],
            'detection_counts': [],
            'track_counts': []
        }
        
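        self.process = psutil.Process()
        # The first cpu_percent() call returns a meaningless 0.0 (it measures
        # since the previous call), so prime it once here
        self.process.cpu_percent(None)
        
        print("✅ PerformanceBenchmark initialized")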
    
    def start_frame(self):
        """Start timing a frame"""
        # perf_counter() is monotonic and high-resolution, unlike time.time()
        self.frame_start = time.perf_counter()
        self.component_start = self.frame_start
    
    def mark_detection(self, detection_count):
        """Mark detection phase complete"""
        detection_time = time.perf_counter() - self.component_start
        self.results['detection_times'].append(detection_time)
        self.results['detection_counts'].append(detection_count)
        self.component_start = time.perf_counter()
    
    def mark_tracking(self, track_count):
        """Mark tracking phase complete"""
        tracking_time = time.perf_counter() - self.component_start
        self.results['tracking_times'].append(tracking_time)
        self.results['track_counts'].append(track_count)
        self.component_start = time.perf_counter()
    
    def mark_visualization(self):
        """Mark visualization phase complete"""
        viz_time = time.perf_counter() - self.component_start
        self.results['visualization_times'].append(viz_time)
    
    def end_frame(self):
        """End timing a frame"""
        total_time = time.perf_counter() - self.frame_start
        self.results['total_times'].append(total_time)
        
        # Record system metrics
        mem_info = self.process.memory_info()
        self.results['memory_usage'].append(mem_info.rss / 1024 / 1024)  # resident memory, MB
        # CPU % since the previous call; can exceed 100% when several cores are busy
        self.results['cpu_usage'].append(self.process.cpu_percent())
    
    def get_summary(self):
        """Get benchmark summary statistics"""
        if not self.results['total_times']:
            return {}
        
        summary = {
            'frames': len(self.results['total_times']),
            'avg_fps': 1 / np.mean(self.results['total_times']),
            'avg_detection_time': np.mean(self.results['detection_times']) * 1000,  # ms
            'avg_tracking_time': np.mean(self.results['tracking_times']) * 1000,  # ms
            'avg_viz_time': np.mean(self.results['visualization_times']) * 1000,  # ms
            'avg_total_time': np.mean(self.results['total_times']) * 1000,  # ms
            'avg_memory_mb': np.mean(self.results['memory_usage']),
            'avg_cpu_percent': np.mean(self.results['cpu_usage']),
            'avg_detections': np.mean(self.results['detection_counts']),
            'avg_tracks': np.mean(self.results['track_counts'])
        }
        
        return summary
    
    def print_summary(self):
        """Print benchmark summary"""
        summary = self.get_summary()
        
        if not summary:
            print("⚠️  No benchmark data collected")
            return
        
        print("\n" + "=" * 80)
        print("📊 PERFORMANCE BENCHMARK SUMMARY")
        print("=" * 80)
        print(f"Frames processed: {summary['frames']}")
        print(f"\n⏱️  TIMING:")
        print(f"   Average FPS: {summary['avg_fps']:.1f}")
        print(f"   Total time per frame: {summary['avg_total_time']:.1f}ms")
        print(f"   • Detection: {summary['avg_detection_time']:.1f}ms ({summary['avg_detection_time']/summary['avg_total_time']*100:.1f}%)")
        print(f"   • Tracking: {summary['avg_tracking_time']:.1f}ms ({summary['avg_tracking_time']/summary['avg_total_time']*100:.1f}%)")
        print(f"   • Visualization: {summary['avg_viz_time']:.1f}ms ({summary['avg_viz_time']/summary['avg_total_time']*100:.1f}%)")
        print(f"\n💾 RESOURCES:")
        print(f"   Average memory: {summary['avg_memory_mb']:.1f} MB")
        print(f"   Average CPU: {summary['avg_cpu_percent']:.1f}%")
        print(f"\n🎯 DETECTION/TRACKING:")
        print(f"   Average detections: {summary['avg_detections']:.1f}")
        print(f"   Average tracks: {summary['avg_tracks']:.1f}")
        
        # Performance assessment
        print(f"\n📈 PERFORMANCE ASSESSMENT:")
        if summary['avg_fps'] >= 30:
            print(f"   ✅ EXCELLENT: {summary['avg_fps']:.1f} FPS (30+ target met)")
        elif summary['avg_fps'] >= 25:
            print(f"   ✓ GOOD: {summary['avg_fps']:.1f} FPS (acceptable for real-time)")
        elif summary['avg_fps'] >= 15:
            print(f"   ⚠️  MODERATE: {summary['avg_fps']:.1f} FPS (may feel choppy)")
        else:
            print(f"   ❌ LOW: {summary['avg_fps']:.1f} FPS (too slow for real-time)")
        
        print("=" * 80)
    
    def plot_results(self):
        """Plot benchmark results"""
        if not self.results['total_times']:
            print("⚠️  No data to plot")
            return
        
        fig, axes = plt.subplots(2, 2, figsize=(15, 10))
        fig.suptitle('Performance Benchmark Results', fontsize=16, fontweight='bold')
        
        # 1. FPS over time
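        fps_data = [1/t for t in self.results['total_times']]
        # Average FPS = frames / total time, matching get_summary(); the plain
        # mean of per-frame FPS values would overstate throughput
        avg_fps = 1 / np.mean(self.results['total_times'])
        axes[0, 0].plot(fps_data, linewidth=2)
        axes[0, 0].axhline(y=30, color='g', linestyle='--', label='Target: 30 FPS')
        axes[0, 0].axhline(y=avg_fps, color='r', linestyle='--', label=f'Average: {avg_fps:.1f} FPS')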
        axes[0, 0].set_title('FPS Over Time', fontweight='bold')
        axes[0, 0].set_xlabel('Frame')
        axes[0, 0].set_ylabel('FPS')
        axes[0, 0].legend()
        axes[0, 0].grid(True, alpha=0.3)
        
        # 2. Component timing breakdown
        components = ['Detection', 'Tracking', 'Visualization']
        times = [
            np.mean(self.results['detection_times']) * 1000,
            np.mean(self.results['tracking_times']) * 1000,
            np.mean(self.results['visualization_times']) * 1000
        ]
        colors = ['#3498db', '#e74c3c', '#2ecc71']
        axes[0, 1].bar(components, times, color=colors, edgecolor='black')
        axes[0, 1].set_title('Average Time per Component', fontweight='bold')
        axes[0, 1].set_ylabel('Time (ms)')
        axes[0, 1].grid(axis='y', alpha=0.3)
        
        # 3. Memory usage
        axes[1, 0].plot(self.results['memory_usage'], linewidth=2, color='purple')
        axes[1, 0].set_title('Memory Usage Over Time', fontweight='bold')
        axes[1, 0].set_xlabel('Frame')
        axes[1, 0].set_ylabel('Memory (MB)')
        axes[1, 0].grid(True, alpha=0.3)
        
        # 4. Detection/Track counts
        axes[1, 1].plot(self.results['detection_counts'], label='Detections', linewidth=2)
        axes[1, 1].plot(self.results['track_counts'], label='Tracks', linewidth=2)
        axes[1, 1].set_title('Detection & Track Counts', fontweight='bold')
        axes[1, 1].set_xlabel('Frame')
        axes[1, 1].set_ylabel('Count')
        axes[1, 1].legend()
        axes[1, 1].grid(True, alpha=0.3)
        
        plt.tight_layout()
        plt.show()

print("✅ Class created: PerformanceBenchmark")
print("\n📊 Features:")
print("   • Component-level timing (detection, tracking, viz)")
print("   • System resource monitoring (memory, CPU)")
print("   • Detection/track counting")
print("   • Summary statistics")
print("   • Performance assessment")
print("   • Visualization plots")

print("\n✅ Exercise 1.2 Complete!")
print("=" * 80)


EXERCISE 1.2: Build Performance Benchmarking Class
✅ Class created: PerformanceBenchmark

📊 Features:
   • Component-level timing (detection, tracking, viz)
   • System resource monitoring (memory, CPU)
   • Detection/track counting
   • Summary statistics
   • Performance assessment
   • Visualization plots

✅ Exercise 1.2 Complete!

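`PerformanceBenchmark` summarizes every stage by its mean. Means hide occasional slow frames, and averaging per-frame FPS values overstates throughput compared with frames / total time.

The sketch below is an alternative timer, not part of the notebook's design. It uses `time.perf_counter`, records durations per stage through a context manager, and reports throughput alongside p50/p95/max. `StageTimer` and its injectable `clock` argument are hypothetical, added so the demo is deterministic.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

import numpy as np


class StageTimer:
    """Per-stage wall-clock timing with percentile reporting (sketch)."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock                # injectable so the timer can be tested
        self.samples = defaultdict(list)  # stage name -> durations in seconds

    @contextmanager
    def stage(self, name):
        """Time the body of a `with` block and record it under `name`."""
        start = self.clock()
        try:
            yield
        finally:
            self.samples[name].append(self.clock() - start)

    def report(self, name):
        t = np.asarray(self.samples[name], dtype=float)
        return {
            'frames': int(t.size),
            'throughput_fps': float(t.size / t.sum()),  # frames / total time
            'mean_frame_fps': float(np.mean(1 / t)),    # biased upward; shown for comparison
            'p50_ms': float(np.percentile(t, 50) * 1000),
            'p95_ms': float(np.percentile(t, 95) * 1000),
            'max_ms': float(t.max() * 1000),
        }


# Deterministic demo: a fake clock yields one 20 ms frame and one 50 ms frame
ticks = iter([0.000, 0.020, 0.020, 0.070])
timer = StageTimer(clock=lambda: next(ticks))
for _ in range(2):
    with timer.stage('total'):
        pass  # stands in for detection + tracking + drawing of one frame
print(timer.report('total'))  # throughput ≈ 28.6 FPS vs mean of per-frame FPS = 35.0
```

In a processing loop, each stage would be wrapped the same way, for example `with timer.stage('detection'): results = model(frame)`, where `model` and `frame` stand for the YOLO model and the current frame. A p95 well above the median points to intermittent stalls rather than uniformly slow processing.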