# Multi-Vehicle Tracking using Kalman Filters
## A Step-by-Step Implementation for Real-Time Vehicle Tracking

**Author:** Marta  
**Date:** February 2026  
**Dataset:** Cars Video Object Tracking  

---

## Project Overview

This notebook implements a **multi-vehicle tracking system** using:
- **Background Subtraction (MOG2)** for motion detection
- **Kalman Filters** for smooth motion prediction
- **Data Association** to track multiple cars simultaneously
- **Real-time visualization** comparing predictions with ground truth

The goal is to demonstrate how to track multiple moving objects in a traffic surveillance scenario.

In [57]:
import os
from pathlib import Path
import cv2
import numpy as np
import kagglehub

# Download latest version
path = kagglehub.dataset_download("trainingdatapro/cars-video-object-tracking")
print("Path to dataset files:", path)

DATASET_DIR = Path(path)


Path to dataset files: /Users/Marta/.cache/kagglehub/datasets/trainingdatapro/cars-video-object-tracking/versions/3


## 1. Introduction: The Real-World Problem

**Scenario:** Traffic monitoring and surveillance in urban environments

**Challenge:** In real-world traffic scenes, we need to:
- Detect multiple vehicles simultaneously
- Track each vehicle across frames
- Maintain consistent identities (Track ID)
- Predict vehicle motion even when temporarily occluded

**Why This Matters:**
- **Traffic Management:** Count and monitor vehicle flow
- **Safety:** Detect unusual behaviors or congestion
- **Surveillance:** Track suspicious vehicles across camera networks

**Dataset:** We use a sequence of 301 frames from a fixed traffic camera, with ground truth annotations for validation.

In [58]:
def find_videos(root: Path, exts=(".mp4", ".avi", ".mov", ".mkv")):
    vids = []
    for p in root.rglob("*"):
        if p.is_file() and p.suffix.lower() in exts:
            vids.append(p)
    return sorted(vids)

def find_image_sequences(root: Path, exts=(".png", ".jpg", ".jpeg")):
    """Find image sequences (folders containing images)"""
    image_dirs = {}
    for p in root.rglob("*"):
        if p.is_dir() and any(f.suffix.lower() in exts for f in p.iterdir() if f.is_file()):
            # Found a directory with images
            images = sorted([f for f in p.iterdir() if f.is_file() and f.suffix.lower() in exts])
            if images:
                image_dirs[p.name] = (p, images)
    return image_dirs

videos = find_videos(DATASET_DIR)
print(f"Found {len(videos)} video files")

# If no videos, look for image sequences
if not videos:
    print("\nNo video files found. Looking for image sequences...")
    image_sequences = find_image_sequences(DATASET_DIR)
    print(f"Found {len(image_sequences)} image sequences:")
    for seq_name, (seq_path, images) in image_sequences.items():
        print(f"  - {seq_name}: {len(images)} images")
    
    # Use the first image sequence
    if image_sequences:
        seq_name, (seq_path, images) = list(image_sequences.items())[0]
        print(f"\nUsing image sequence: {seq_name} ({len(images)} frames)")
        IMAGE_SEQUENCE_PATH = seq_path
        IMAGE_SEQUENCE_FRAMES = images
        VIDEO_PATH = None
    else:
        raise FileNotFoundError("No video files or image sequences found in the dataset folder.")
else:
    VIDEO_PATH = str(videos[0])
    IMAGE_SEQUENCE_PATH = None
    print("Using:", VIDEO_PATH)


Found 0 video files

No video files found. Looking for image sequences...
Found 2 image sequences:
  - images: 301 images
  - boxes: 301 images

Using image sequence: images (301 frames)


## 2. Algorithm Choice: Why Kalman Filters?

### **Kalman Filter: Motion Prediction + Correction**

**The Idea:**
1. **Predict** where an object will be based on its velocity
2. **Measure** where it actually is (from detection)
3. **Correct** the prediction based on measurement error
4. Repeat each frame
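
A minimal sketch of this predict-measure-correct cycle, written with plain NumPy so that every matrix operation is visible. The 1-D state `[position, velocity]`, the noise values, and the helper name `kalman_step` are illustrative assumptions; the notebook's own tracker uses `cv2.KalmanFilter` instead.

```python
import numpy as np

# Hypothetical 1-D example: state is [position, velocity]; only position is measured.
dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity transition
H = np.array([[1.0, 0.0]])              # measurement picks out position
Q = np.eye(2) * 1e-2                    # process noise (illustrative value)
R = np.array([[1e-1]])                  # measurement noise (illustrative value)

x = np.zeros((2, 1))                    # initial state estimate
P = np.eye(2)                           # initial state covariance


def kalman_step(x, P, z):
    """One predict + correct cycle; returns the updated state and covariance."""
    # 1. Predict: propagate the state and its uncertainty through the motion model
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # 2. Measure: z is the detected position; the innovation is the prediction error
    innovation = z - H @ x_pred
    # 3. Correct: the Kalman gain weighs the prediction against the measurement
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ innovation
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new


# 4. Repeat each frame: a target moving 2 units per frame, measured without noise
for frame in range(1, 31):
    z = np.array([[2.0 * frame]])
    x, P = kalman_step(x, P, z)

print(f"position ≈ {x[0, 0]:.2f}, velocity ≈ {x[1, 0]:.2f}")
```

Velocity is never measured directly here; the filter infers it from how the position measurements change between frames.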

**Why Kalman Filters for Vehicle Tracking:**

| Aspect | Why Kalman Works |
|--------|------------------|
| **Smooth Motion** | Over a few frames, vehicles move at roughly constant velocity, which fits a linear motion model |
| **Fast** | Runs in real-time (no deep learning overhead) |
| **Robust** | Handles missed detections gracefully |
| **Lightweight** | Works on CPU (embedded systems, traffic cameras) |
| **Mathematical** | Well-understood, proven in industry |

**Comparison with Alternatives:**
- **Template Matching:** Slow, fails with appearance changes
- **Mean Shift:** Good for a single object, hard to extend to multiple objects
- **Deep Learning (YOLO):** Accurate but computationally expensive
- **Optical Flow:** Good for dense motion, not object identity

### **Multi-Object Extension:**
Our approach combines Kalman filtering with **data association**:
- Detect all moving objects → Extract their positions
- Match detections to existing tracks → Data association
- Create/destroy tracks as objects enter/leave
- Update each track's Kalman filter independently
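
The create/destroy bookkeeping in the list above could look roughly like the sketch below. The track fields `hits` and `age`, the function name `update_track_table`, and the default `max_track_age=10` are assumptions made for illustration; in the real loop the matched branch would also run the Kalman correct step.

```python
def update_track_table(tracks, detections, matches, unmatched_detections,
                       unmatched_tracks, next_id, max_track_age=10):
    """Apply one frame of association results to the track table (in place).

    tracks: dict mapping track_id -> dict with 'position', 'hits', 'age'
    detections: list of dicts with a 'centroid' key
    matches: list of (detection_idx, track_id) pairs
    Returns the next unused track ID.
    """
    # Matched tracks: take the detection's position and reset the miss counter
    for det_idx, track_id in matches:
        track = tracks[track_id]
        track['position'] = detections[det_idx]['centroid']
        track['hits'] += 1
        track['age'] = 0

    # Unmatched tracks: keep them alive while age < max_track_age, then delete
    for track_id in unmatched_tracks:
        tracks[track_id]['age'] += 1
        if tracks[track_id]['age'] >= max_track_age:
            del tracks[track_id]

    # Unmatched detections: each one starts a new track
    for det_idx in unmatched_detections:
        tracks[next_id] = {
            'position': detections[det_idx]['centroid'],
            'hits': 1,
            'age': 0,
        }
        next_id += 1

    return next_id


# Toy frame: track 0 is re-detected, track 1 is missed again, one new object appears
tracks = {
    0: {'position': (100, 100), 'hits': 5, 'age': 0},
    1: {'position': (300, 200), 'hits': 2, 'age': 9},
}
detections = [{'centroid': (104, 101)}, {'centroid': (500, 50)}]
next_id = update_track_table(
    tracks, detections,
    matches=[(0, 0)], unmatched_detections=[1], unmatched_tracks=[1],
    next_id=2,
)
print(sorted(tracks))  # IDs of the tracks that remain after this frame
```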

In [59]:
# Let's explore the dataset directory structure
import os

print("Dataset directory contents:")
for root, dirs, files in os.walk(DATASET_DIR):
    level = root.replace(str(DATASET_DIR), '').count(os.sep)
    indent = ' ' * 2 * level
    print(f'{indent}{os.path.basename(root)}/')
    subindent = ' ' * 2 * (level + 1)
    for file in files[:10]:  # Limit to first 10 files per directory
        print(f'{subindent}{file}')
    if len(files) > 10:
        print(f'{subindent}... and {len(files) - 10} more files')


Dataset directory contents:
3/
  annotations.xml
  images/
    frame_000117.PNG
    frame_000103.PNG
    frame_000088.PNG
    frame_000063.PNG
    frame_000077.PNG
    frame_000249.PNG
    frame_000261.PNG
    frame_000275.PNG
    frame_000274.PNG
    frame_000260.PNG
    ... and 291 more files
  boxes/
    frame_000117.PNG
    frame_000103.PNG
    frame_000088.PNG
    frame_000063.PNG
    frame_000077.PNG
    frame_000249.PNG
    frame_000261.PNG
    frame_000275.PNG
    frame_000274.PNG
    frame_000260.PNG
    ... and 291 more files


## 3. Step-by-Step Implementation

### **Pipeline Overview**

```
Input Frame
    ↓
Background Subtraction (MOG2)
    ↓
Detect All Moving Objects
    ↓
Match Detections to Existing Tracks (Data Association)
    ↓
Update Kalman Filters
    ↓
Visualize Results
    ↓
Output Frame with Tracked Objects
```

Let's implement each step...

In [60]:
### Step 1b: Kalman Filter Initialization

def init_kalman(dt=1.0):
    """
    Create a Kalman filter for tracking a single object.
    
    State: [x, y, vx, vy] = [position_x, position_y, velocity_x, velocity_y]
    
    Each frame:
    - Predict: x_new = x_old + vx*dt
    - Measure: detect actual position
    - Correct: update state based on measurement error
    """
    kf = cv2.KalmanFilter(4, 2)

    # State transition matrix A (how state evolves)
    # [x]     [1  0  dt  0] [x]
    # [y]  =  [0  1  0  dt] [y]
    # [vx]    [0  0  1  0 ] [vx]
    # [vy]    [0  0  0  1 ] [vy]
    kf.transitionMatrix = np.array([
        [1, 0, dt, 0 ],
        [0, 1, 0 , dt],
        [0, 0, 1 , 0 ],
        [0, 0, 0 , 1 ],
    ], dtype=np.float32)

    # Measurement matrix H (we can only measure position, not velocity)
    # We observe: [x, y]
    kf.measurementMatrix = np.array([
        [1, 0, 0, 0],
        [0, 1, 0, 0],
    ], dtype=np.float32)

    # Process noise covariance Q (how much we trust the motion model)
    kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2

    # Measurement noise covariance R (how much we trust detections)
    kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

    # Initial covariance
    kf.errorCovPost = np.eye(4, dtype=np.float32)
    kf.statePost = np.zeros((4,1), dtype=np.float32)

    return kf

In [61]:
### Step 2: Data Association (Matching Detections to Tracks)

def distance(p1, p2):
    """Calculate Euclidean distance between two points."""
    return np.sqrt((p1[0] - p2[0])**2 + (p1[1] - p2[1])**2)


def associate_detections_to_tracks(detections, tracks, max_distance=50):
    """
    Match detected objects to existing tracks using distance-based association.
    
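    Algorithm (greedy nearest neighbour):
    1. For each track, find the closest detection that is still unmatched
    2. If distance < max_distance, it's a match
    3. Unmatched detections → new tracks
    4. Unmatched tracks → remove or continue predicting

    Note: tracks are processed in dictionary order, so an earlier track can
    claim a detection that a later track is closer to.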
    
    Returns:
        matches: list of (detection_idx, track_id) pairs
        unmatched_detections: indices of new objects
        unmatched_tracks: IDs of lost objects
    """
    matches = []
    unmatched_detections = list(range(len(detections)))
    unmatched_tracks = list(tracks.keys())
    
    # For each track, find nearest detection
    for track_id in list(tracks.keys()):
        track_pos = tracks[track_id]['position']
        
        best_det_idx = None
        best_distance = max_distance
        
        # Find closest detection to this track
        for det_idx, detection in enumerate(detections):
            if det_idx not in unmatched_detections:
                continue  # Already matched
            
            det_pos = detection['centroid']
            dist = distance(track_pos, det_pos)
            
            if dist < best_distance:
                best_distance = dist
                best_det_idx = det_idx
        
        # If found a match, record it
        if best_det_idx is not None:
            matches.append((best_det_idx, track_id))
            unmatched_detections.remove(best_det_idx)
            unmatched_tracks.remove(track_id)
    
    return matches, unmatched_detections, unmatched_tracks

In [71]:
### Step 1: Multi-Object Detection
# Updated to detect ALL objects, not just the largest

def detect_all_objects_from_mask(mask, min_area=1500):
    """
    Detect all moving objects in the foreground mask.
    
    Returns:
        List of detection dicts, one per contour that passes the area filter:
        - 'centroid': (cx, cy) centroid position in pixels
        - 'bbox': (x, y, w, h) bounding box
        - 'area': contour area in px²
    
    TUNING NOTES:
    - min_area: Increase to filter small noise/shadows (was 800, now 1500)
    - Morphological operations: Remove noise while preserving object shape
    """
    # Clean the mask to remove noise - MORE AGGRESSIVE filtering
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    kernel_large = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    
    # Remove small noise
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel, iterations=2)
    # Fill small holes in objects
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel_large, iterations=1)
    # Dilate to merge nearby components
    mask = cv2.morphologyEx(mask, cv2.MORPH_DILATE, kernel, iterations=1)

    # Find all contours
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    
    detections = []
    
    # Process each detected contour
    for contour in contours:
        area = cv2.contourArea(contour)
        
        # Filter by minimum area to ignore noise
        # Increased from 800 to 1500 to remove small false positives
        if area < min_area:
            continue
        
        # Get bounding box
        x, y, w, h = cv2.boundingRect(contour)
        
        # Calculate centroid
        M = cv2.moments(contour)
        if M["m00"] == 0:
            continue
        
        cx = int(M["m10"] / M["m00"])
        cy = int(M["m01"] / M["m00"])
        
        detections.append({
            'centroid': (cx, cy),
            'bbox': (x, y, w, h),
            'area': area
        })
    
    return detections

In [63]:
import cv2

# Check if variables are defined from previous cells
try:
    VIDEO_PATH
    IMAGE_SEQUENCE_PATH
    IMAGE_SEQUENCE_FRAMES
except NameError as e:
    raise RuntimeError(f"Required variable not defined: {e}. Please run cells 1 and 2 first to initialize the dataset.") from e

# Initialize background subtractor for multi-object tracking
bg = cv2.createBackgroundSubtractorMOG2(history=300, varThreshold=25, detectShadows=True)

fps = 30.0
dt = 1.0 / fps

print("✓ Background subtractor initialized")
print(f"✓ Ready to process {len(IMAGE_SEQUENCE_FRAMES)} frames")
print("\nNext: Run the multi-object tracking loop in the next cell")

✓ Background subtractor initialized
✓ Ready to process 301 frames

Next: Run the multi-object tracking loop in the next cell


In [73]:
    # Step 2: Detect all objects in the current frame
    detections = detect_all_objects_from_mask(fgmask, min_area=1500)

## 4. Debugging: Understanding False Positives

**False positives come from:**
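1. **Shadows** - With `detectShadows=True`, MOG2 labels shadow pixels 127 rather than 255, but contour extraction treats every non-zero pixel as foreground, so shadows still count as motion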
2. **Noise** - Small artifacts from compression or lighting changes
3. **Low min_area threshold** - Allows tiny noise blobs to be tracked
4. **Loose data association** - Detections far from actual objects
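
One way to address the shadow issue is to keep only pixels labelled as confident foreground before looking for contours. The sketch below uses a small synthetic array in place of a real MOG2 mask, assuming the default labelling (0 = background, 127 = shadow, 255 = foreground); the array contents are made up for illustration.

```python
import numpy as np

# Synthetic stand-in for a MOG2 mask: 0 = background, 127 = shadow, 255 = foreground
fgmask = np.zeros((6, 8), dtype=np.uint8)
fgmask[1:4, 1:4] = 255      # a "vehicle" blob
fgmask[3:5, 4:7] = 127      # its shadow, as labelled when detectShadows=True

# Keep only confident foreground; shadow pixels drop out of the binary mask
binary_mask = np.where(fgmask == 255, 255, 0).astype(np.uint8)

print("non-zero pixels before:", int(np.count_nonzero(fgmask)))
print("non-zero pixels after: ", int(np.count_nonzero(binary_mask)))
```

On a real mask, the OpenCV equivalent would be `_, binary_mask = cv2.threshold(fgmask, 200, 255, cv2.THRESH_BINARY)`, applied before the morphological clean-up.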

**Let's visualize the foreground mask to see what's being detected:**

In [69]:
### Diagnostic: Visualize Foreground Mask and Detections

# Check a few frames to see what's being detected
print("Analyzing detection quality on sample frames...\n")

for sample_frame_num in [50, 100, 150]:
    frame = cv2.imread(str(IMAGE_SEQUENCE_FRAMES[sample_frame_num]))
    fgmask = bg.apply(frame)
    # min_area=800 is passed explicitly so that small blobs appear in the report
    detections = detect_all_objects_from_mask(fgmask, min_area=800)
    
    print(f"Frame {sample_frame_num}:")
    print(f"  - Detections found: {len(detections)}")
    print(f"  - Detection sizes (areas):")
    for i, det in enumerate(detections):
        area = det['area']
        print(f"    Detection {i}: {area:.0f} px² - ", end="")
        if area < 1000:
            print("❌ TOO SMALL (likely noise)")
        elif area > 5000:
            print("✓ Good size (likely real car)")
        else:
            print("⚠️ Medium (possibly partial/shadow)")
    print()

print("\nWhy false positives occur:")
print("1. Shadows cast by cars → treated as moving objects")
print("2. Small noise blobs from compression artifacts")
print("3. min_area=800 is too permissive for this dataset")
print("\nSolution: Increase min_area threshold to filter small detections")

Analyzing detection quality on sample frames...

Frame 50:
  - Detections found: 29
  - Detection sizes (areas):
    Detection 0: 1015 px² - ⚠️ Medium (possibly partial/shadow)
    Detection 1: 816 px² - ❌ TOO SMALL (likely noise)
    Detection 2: 55992 px² - ✓ Good size (likely real car)
    Detection 3: 1180 px² - ⚠️ Medium (possibly partial/shadow)
    Detection 4: 1044 px² - ⚠️ Medium (possibly partial/shadow)
    Detection 5: 838 px² - ❌ TOO SMALL (likely noise)
    Detection 6: 17352 px² - ✓ Good size (likely real car)
    Detection 7: 1729 px² - ⚠️ Medium (possibly partial/shadow)
    Detection 8: 46232 px² - ✓ Good size (likely real car)
    Detection 9: 1275 px² - ⚠️ Medium (possibly partial/shadow)
    Detection 10: 2308 px² - ⚠️ Medium (possibly partial/shadow)
    Detection 11: 2221 px² - ⚠️ Medium (possibly partial/shadow)
    Detection 12: 945 px² - ❌ TOO SMALL (likely noise)
    Detection 13: 2110 px² - ⚠️ Medium (possibly partial/shadow)
    Detection 14: 1159 px² - ⚠️ 

## 5. Challenges & Lessons Learned

### Challenges Encountered

**1. Track ID Switching (ID Flickering)**
- **Problem**: When two tracks come close, they might swap IDs
- **Cause**: Association is greedy: each track takes its nearest free detection in turn, with no check that the overall assignment is the best one
- **Current Solution**: Conservative max_distance threshold (50px) and min_hits (3) requirement
- **Better Solution**: Hungarian algorithm for optimal global assignment

**2. Occlusions (Objects Overlapping)**
- **Problem**: When cars overlap, we might lose tracks or merge them
- **Cause**: Detection finds merged blob instead of individual objects
- **Current Solution**: Kalman filter predicts position; tracks survive if `age < max_track_age`
- **Better Solution**: Multi-hypothesis tracking or temporal coherence analysis

**3. Shadows and Lighting Changes**
- **Problem**: Shadows create false detections or missed detections
- **Cause**: Background subtraction sensitive to illumination changes
- **Current Solution**: MOG2 adapts over time (history=300 frames)
- **Better Solution**: Adaptive threshold, shadow detection/removal pre-processing

**4. False Positive Detections**
- **Problem**: Noise, shadows, or reflections create spurious tracks
- **Cause**: Background subtractor generates noisy masks
- **Current Solution**: Morphological clean-up, a min area threshold (raised from 800 to 1500 px²), and the min_hits requirement (3)
- **Better Solution**: Contour shape analysis

**5. Fragmented Detections**
- **Problem**: Large object split into multiple small detections
- **Cause**: Shadows or complex image gradients create gaps in mask
- **Current Solution**: Morphological closing operation in detection function
- **Better Solution**: Component merging based on proximity

### Why Simple Distance-Based Matching?

This implementation intentionally avoids the Hungarian algorithm and other global optimization because:

1. **Explainability**: Distance matching is transparent and easy to understand
2. **Sufficient for Tutorial**: Works well for well-separated objects
3. **Computational Efficiency**: Roughly O(n²) distance comparisons vs O(n³) for the Hungarian algorithm
4. **Teaching Value**: Clear cause-effect relationships in code

In production, you'd use:
- **Hungarian Algorithm** (also called Kuhn-Munkres): Optimal global assignment; `scipy.optimize.linear_sum_assignment` solves the same assignment problem
- **Multi-Hypothesis Tracking**: Track multiple possibilities
- **Deep Learning**: YOLO + DeepSORT for robust association
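
As a sketch of the first option, global assignment can be computed with `scipy.optimize.linear_sum_assignment` on a matrix of track-to-detection distances. The function name `associate_optimal` and the toy coordinates are assumptions for illustration. Unlike `associate_detections_to_tracks`, this version returns positions in the input lists rather than track IDs.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def associate_optimal(track_positions, detection_centroids, max_distance=50):
    """Globally optimal matching; returns a list of (detection_idx, track_idx)."""
    if len(track_positions) == 0 or len(detection_centroids) == 0:
        return []
    tracks = np.asarray(track_positions, dtype=float)
    dets = np.asarray(detection_centroids, dtype=float)
    # cost[i, j] = Euclidean distance between track i and detection j
    cost = np.linalg.norm(tracks[:, None, :] - dets[None, :, :], axis=2)
    track_idx, det_idx = linear_sum_assignment(cost)
    # Apply the same gating threshold as the greedy matcher
    return [(int(d), int(t)) for t, d in zip(track_idx, det_idx)
            if cost[t, d] < max_distance]


# Toy case where greedy matching loses a track:
# track 0 is slightly closer to detection 1, but taking it leaves track 1
# with only detection 0, which sits exactly at the 50 px gate and is rejected.
track_positions = [(0, 0), (30, 0)]
detection_centroids = [(-20, 0), (16, 0)]
print(associate_optimal(track_positions, detection_centroids))
```

The optimal solver pairs track 0 with detection 0 and track 1 with detection 1 because that minimizes the total distance, even though track 0 on its own would prefer detection 1.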

### Parameter Tuning Guide

| Parameter | Effect | Tuning |
|-----------|--------|--------|
| `max_distance` | Max gap between detection & track | Increase if tracks break; decrease if IDs swap |
| `min_hits` | Detections required before displaying | Increase to reduce false tracks; decrease for responsiveness |
| `max_track_age` | Frames to keep lost track | Increase to tolerate occlusions; decrease to clean up quickly |
| `MOG2 varThreshold` | Sensitivity to motion | Increase to ignore small motion; decrease for sensitivity |
| `min_area` | Minimum detection size | Increase to filter small noise; decrease for small objects |
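
One convenient way to experiment with these parameters is to collect them in a single object. The `TrackerConfig` class below is a hypothetical helper, not part of the notebook. Its defaults mirror the values used in this notebook where they are stated; `max_track_age=10` is a placeholder, since no value is given above.

```python
from dataclasses import dataclass


@dataclass
class TrackerConfig:
    max_distance: float = 50.0   # px; gating radius used by the association step
    min_hits: int = 3            # detections required before a track is displayed
    max_track_age: int = 10      # frames a lost track is kept (placeholder value)
    var_threshold: float = 25.0  # MOG2 varThreshold
    min_area: int = 1500         # px²; minimum contour area


default = TrackerConfig()
# Example variation: tolerate longer occlusions and accept smaller objects
occlusion_tolerant = TrackerConfig(max_track_age=30, min_area=1000)
print(default)
print(occlusion_tolerant)
```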

### Next Steps for Enhancement

1. **Implement Hungarian Algorithm**
   ```python
   from scipy.optimize import linear_sum_assignment
   # Compute cost matrix, then assign optimally
   ```

2. **Add Optical Flow**
   ```python
   # Complement Kalman predictions with dense optical flow
   flow = cv2.calcOpticalFlowFarneback(...)
   ```

3. **Track Quality Metrics**
   - Compute IOU (Intersection over Union) with ground truth
   - Track continuity score (frames without breaks)
   - False positive/negative rate

4. **Temporal Smoothing**
   - Low-pass filter centroid positions
   - Reduce jitter in visualization

5. **Deep Learning Alternative**
   - Use YOLO for detection (vs. background subtraction)
   - Use DeepSORT for association (vs. distance matching)
   - Trade: Explainability for accuracy
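
For the temporal smoothing idea in item 4, an exponential moving average is one simple low-pass filter. The function name `smooth_positions`, the `alpha` parameter, and the sample history are assumptions chosen for this sketch.

```python
def smooth_positions(points, alpha=0.5):
    """Exponential moving average over a list of (x, y) points.

    alpha close to 1 follows the raw positions; alpha close to 0 smooths heavily.
    """
    if not points:
        return []
    smoothed = [tuple(float(c) for c in points[0])]
    for x, y in points[1:]:
        prev_x, prev_y = smoothed[-1]
        smoothed.append((alpha * x + (1 - alpha) * prev_x,
                         alpha * y + (1 - alpha) * prev_y))
    return smoothed


# Hypothetical jittery track history (x jumps back at the third point)
history = [(0, 0), (10, 0), (8, 0), (20, 0)]
print(smooth_positions(history, alpha=0.5))
```

Heavier smoothing also adds lag, so this is probably best applied only to the drawn trail rather than to the positions used for association.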

## 6. Summary: What We Built

### The Complete Pipeline

```
Video Input
    ↓
Background Subtraction (MOG2)
    ↓
Detect All Moving Objects
    ↓
Data Association (Match detections to tracks)
    ↓
Update Kalman Filters (Correct step)
    ↓
Predict Next Positions (Predict step)
    ↓
Visualization & Analysis
    ↓
Output
```

### Key Components

| Component | Purpose | Algorithm |
|-----------|---------|-----------|
| **Background Subtraction** | Separate moving objects from static background | MOG2 (Mixture of Gaussians) |
| **Object Detection** | Find all moving objects in current frame | Contour analysis + morphology |
| **Data Association** | Match detections to existing tracks | Euclidean distance minimization |
| **State Estimation** | Smooth and predict object motion | Kalman Filter (constant velocity) |
| **Visualization** | Display results with track histories | OpenCV drawing functions |

### Code Structure for Explanation

When presenting this to others, explain in this order:

1. **"Here's the problem"** - Multi-object tracking challenges
2. **"Here's our approach"** - Algorithm choice and why
3. **"Here's the data flow"** - Show the pipeline diagram
4. **"Here's each step"** - Walk through functions
5. **"Here are the results"** - Show visualizations
6. **"Here are the tradeoffs"** - Discuss challenges

### Explainability Highlights

✓ **All functions have docstrings** - Understand what each does  
✓ **Inline comments explain "why"** - Not just "what"  
✓ **Simple algorithms over complex** - Distance matching vs Hungarian  
✓ **Clear variable names** - `detections`, `tracks`, `matches` are obvious  
✓ **Step-by-step visualization** - Pipeline clearly shown  
✓ **Challenges section** - Honest about limitations  

This makes the notebook excellent for:
- **Explaining to professors** - Clear structure, well-documented
- **Learning tracking concepts** - Step-by-step progression
- **Extending the code** - Easy to modify and improve
- **Video presentation** - Clear narrative from problem → solution

---

### Performance Metrics You Can Add

```python
# Compute accuracy against ground truth
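def compute_iou(box1, box2):
    """Intersection over Union between two bounding boxes.

    Boxes are (x_min, y_min, x_max, y_max). Detections in this notebook use
    (x, y, w, h), so convert them before calling this function.
    """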
    x1_min, y1_min, x1_max, y1_max = box1
    x2_min, y2_min, x2_max, y2_max = box2
    
    inter_area = max(0, min(x1_max, x2_max) - max(x1_min, x2_min)) * \
                 max(0, min(y1_max, y2_max) - max(y1_min, y2_min))
    
    box1_area = (x1_max - x1_min) * (y1_max - y1_min)
    box2_area = (x2_max - x2_min) * (y2_max - y2_min)
    union_area = box1_area + box2_area - inter_area
    
    return inter_area / union_area if union_area > 0 else 0
```
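
Because the tracker stores boxes as `(x, y, w, h)` while the IoU formula expects corner coordinates, a conversion step is needed first. The sketch below shows that conversion on made-up boxes; `xywh_to_xyxy` and `iou_xyxy` are hypothetical helper names, and the ground-truth box is assumed to be in corner format already.

```python
def xywh_to_xyxy(bbox):
    """Convert (x, y, w, h) to corner format (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def iou_xyxy(a, b):
    """IoU for two corner-format boxes (same formula as compute_iou above)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


detected = xywh_to_xyxy((10, 10, 20, 20))   # hypothetical tracker bbox
ground_truth = (20, 20, 40, 40)             # hypothetical annotation, corner format
iou = iou_xyxy(detected, ground_truth)
print(f"IoU = {iou:.3f}")
```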

### Project Checklist for Presentation

- [ ] Notebook with all explanations
- [ ] At least 2-3 visualization examples
- [ ] Performance metrics (frame rate, accuracy)
- [ ] Video showing tracking in action (10 min max)
- [ ] Script explaining each section
- [ ] Discussion of limitations and improvements
- [ ] Comparison with alternatives (if time permits)

---

**Notebook Created**: Multi-Vehicle Kalman Filter Tracking  
**Author**: Marta  
**Date**: February 2026  
**Purpose**: Educational demonstration of multi-object tracking  
**Target Audience**: Computer vision course project  
**Code Quality**: Tutorial-grade, fully documented

In [65]:
### Step 3: Visualization Helper

def draw_tracks_on_frame(frame, tracks, colors=None):
    """
    Draw all tracked objects on the frame.
    
    For each track:
    - Draw bounding box
    - Draw centroid
    - Draw track history (recent positions)
    - Display track ID
    """
    if colors is None:
        colors = {}
    
    for track_id, track in tracks.items():
        # Kalman state positions may be floats; OpenCV drawing calls need ints
        x, y = (int(v) for v in track['position'])
        
        # Assign consistent color for each track ID
        if track_id not in colors:
            colors[track_id] = tuple(np.random.randint(0, 256, 3).tolist())
        
        color = colors[track_id]
        
        # Draw bounding box
        if 'bbox' in track:
            bx, by, bw, bh = track['bbox']
            cv2.rectangle(frame, (bx, by), (bx+bw, by+bh), color, 2)
        
        # Draw centroid
        cv2.circle(frame, (x, y), 5, color, -1)
        
        # Draw track history (last 10 positions)
        if len(track['history']) >= 2:
            pts = np.array(track['history'][-10:], dtype=np.int32)
            cv2.polylines(frame, [pts], False, color, 2)
        
        # Draw track ID
        cv2.putText(frame, f"ID:{track_id}", (x+10, y-10), 
                   cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 2)
    
    return colors