# Drone follow me using Kalman Filters

Multi-Object Tracking (MOT) is a core visual ability that humans possess to perform kinetic tasks and coordinate other tasks. The AI community has recognized the importance of MOT via a series of [competitions](https://motchallenge.net).

In this assignment, the object classes are `bicycle` and `car`. You will demonstrate the ability to track these objects using [Kalman Filters](https://en.wikipedia.org/wiki/Kalman_filter).


## Task 1: Set up your development environment and store the test video locally (10 points)

Your environment must be Docker-based, and you can use any TensorFlow 2 (TF2) or PyTorch 2 (PT2) Docker container compatible with your environment. You can also use Colab.

# Install Libraries

In [None]:
!pip install pytube opencv-python psycopg2 ultralytics roboflow

# Import Dependencies

In [None]:
from pytube import YouTube
import os
import cv2
from google.colab import drive
from google.colab import files
from google.colab import userdata
# google patch for cv2.imshow
from google.colab.patches import cv2_imshow
import numpy as np
from roboflow import Roboflow
from ultralytics import YOLO

# Download Videos

In [None]:
def download_videos(video_id_list, output_path="."):
    """
    Download videos from YouTube based on a list of video IDs using PyTube API

    Args:
        video_id_list (list): A list of YouTube video IDs to download.
        output_path (str, optional): The directory where the downloaded videos will be saved. Defaults to the current directory (".").
    """
    for video_id in video_id_list:
        try:
            yt = YouTube(f"https://www.youtube.com/watch?v={video_id}")
            video = yt.streams.filter(file_extension='mp4', resolution='360p').first()
            video.download(output_path)
            print(f"Video downloaded successfully: {yt.title}")
        except Exception as e:
            print(f"Error downloading video: {e}")

def preprocess_video_from_file(video_path, timestamps=None, sample_rate=1, target_size=(640, 640)):
    """
    Preprocesses frames from a video file.

    Parameters:
    - video_path (str): Path to the video file.
    - timestamps (list, optional): List that receives the timestamp (in milliseconds) of each sampled frame.
      Pass your own list to read the timestamps back after the call.
    - sample_rate (int): Keep one frame out of every `sample_rate` frames.
    - target_size (tuple): Target size for resizing frames, specified as (width, height), as expected by cv2.resize.

    Returns:
    - numpy.ndarray: Array of preprocessed frames with shape (num_frames, height, width, channels),
      or None if the video cannot be opened. Each frame is resized, converted to RGB, and scaled to [0, 1].

    Note:
    - Timestamps are appended to `timestamps`, not returned with the frames.
    """

    # Avoid a shared mutable default: create a fresh list when none is passed
    if timestamps is None:
        timestamps = []

    # Create a VideoCapture object
    cap = cv2.VideoCapture(video_path)

    # Check if the video opened successfully
    if not cap.isOpened():
        print("Error: Could not open video.")
        return None

    # List to store preprocessed frames
    frames = []

    # Iterate through the frames
    while True:
        # Read a frame from the video
        ret, frame = cap.read()

        # Check if the video has ended
        if not ret:
            break

        # Sample frames
        if int(cap.get(cv2.CAP_PROP_POS_FRAMES)) % sample_rate == 0:
            # Resize the frame
            frame = cv2.resize(frame, target_size)

            # Convert BGR to RGB
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

            # Normalize the frame
            frame = frame / 255.0

            # save timestamps to separate array
            timestamps.append(cap.get(cv2.CAP_PROP_POS_MSEC))

            # Append the preprocessed frame to the list
            frames.append(frame)

    # Release the VideoCapture object
    cap.release()

    # Return list of frames converted to a NumPy array
    return np.array(frames)

In [None]:
# Video IDs specified in assignment
video_id_list = ["WeF4wpw7w9k", "2NFwY15tRtA", "5dRramZVu2Q"]

# Get current directory, define subdirectory for videos, and define path
current_directory = os.getcwd()
videos_directory = "videos"
output_path = os.path.join(current_directory, videos_directory)

# Join the current directory with the videos directory and download videos
download_videos(video_id_list, output_path)

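# Save all .mp4 paths to the video_paths list.
# The loop variable is named `filenames` so it does not shadow google.colab's `files` import.
video_paths = []
for root, _dirs, filenames in os.walk(output_path):
  for filename in filenames:
    if filename.lower().endswith(".mp4"):
      video_paths.append(os.path.join(root, filename))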

# keep in sorted order for consistent IDs in PG instance
video_paths.sort()

for video_path in video_paths:
  print(video_path)

Video downloaded successfully: Cyclist and vehicle Tracking - 1
Video downloaded successfully: Cyclist and vehicle tracking - 2
Video downloaded successfully: Drone Tracking Video
/content/videos/Cyclist and vehicle Tracking - 1.mp4
/content/videos/Cyclist and vehicle tracking - 2.mp4
/content/videos/Drone Tracking Video.mp4


In [None]:
frames = preprocess_video_from_file(video_paths[0], sample_rate=5)
print(frames.shape)

(269, 640, 640, 3)


## Task 2: Object Detection (40 points)

Perform object detection on the following videos.

```{eval-rst}
.. youtube:: WeF4wpw7w9k
```

```{eval-rst}
.. youtube:: 2NFwY15tRtA
```

```{eval-rst}
.. youtube:: 5dRramZVu2Q
```

Split the videos into frames and use an object detector of your choice, in a framework of your choice, to detect the cyclists.

In [None]:
rf = Roboflow(api_key=userdata.get('roboflow_key'))
project = rf.workspace("test-yuzee").project("visdronedi_mv")
version = project.version(3)
dataset = version.download("yolov8")

In [None]:
!yolo task=detect mode=train model=yolov8x.pt data=/content/VisDroneDI_MV-3/data.yaml epochs=20

Ultralytics YOLOv8.1.34 🚀 Python-3.10.12 torch-2.2.1+cu121 CUDA:0 (Tesla T4, 15102MiB)
[34m[1mengine/trainer: [0mtask=detect, mode=train, model=yolov8x.pt, data=/content/VisDroneDI_MV-3/data.yaml, epochs=20, time=None, patience=100, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=train2, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True,

## Task 3: Kalman Filter (50 points)

Use the [filterpy](https://filterpy.readthedocs.io/en/latest/kalman/KalmanFilter.html) library to implement Kalman filters that track the cyclist and the vehicle (if present) in the video. Use the detections from the previous task to initialize and run each Kalman filter.
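As a rough illustration of what such a filter computes, the sketch below writes out the standard predict and update equations for a constant-velocity model with NumPy. It is not filterpy code: the class name `ConstantVelocityKF`, the state layout `[x, y, vx, vy]`, and the noise values are assumptions chosen for illustration, and the measurements are made-up box centres, not real detections. filterpy's `KalmanFilter` exposes the same matrices (`F`, `H`, `P`, `Q`, `R`) and the same `predict()`/`update()` steps.

```python
import numpy as np


class ConstantVelocityKF:
    """Hypothetical minimal Kalman filter for a 2-D point, state = [x, y, vx, vy]."""

    def __init__(self, x0, y0, dt=1.0, meas_var=25.0, process_var=1.0):
        # Start at the first detection with unknown (zero) velocity
        self.x = np.array([x0, y0, 0.0, 0.0], dtype=float)
        # Large initial velocity variance: the first detection says nothing about speed
        self.P = np.diag([meas_var, meas_var, 1000.0, 1000.0])
        # State transition: position advances by velocity * dt
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        # Measurement model: the detector observes position only
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.R = np.eye(2) * meas_var      # measurement noise (assumed value)
        self.Q = np.eye(4) * process_var   # process noise (assumed value)

    def predict(self):
        """Propagate the state one frame ahead and return the predicted (x, y)."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Correct the state with a measured (x, y) and return the filtered (x, y)."""
        residual = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ residual
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]


# Made-up box centres; None marks a frame where the detector found nothing
measurements = [(100, 200), (102, 201), None, (106, 203), (108, 204)]

kf = ConstantVelocityKF(*measurements[0])
track = [measurements[0]]
for z in measurements[1:]:
    xy = kf.predict()            # always predict, even without a detection
    if z is not None:
        xy = kf.update(z)        # correct only when a detection is available
    track.append((float(xy[0]), float(xy[1])))

print(track)
```

Calling `predict()` on every frame and `update()` only when a detection exists is what lets the track coast through short detector dropouts. Keeping one such filter per tracked object, for example in a dict keyed by class label, is one simple way to organize them.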

You need to deliver a video that shows the trajectory of each object as a line connecting the pixel positions reported by the tracker. You can use the `ffmpeg` command line tool and OpenCV to superimpose the bounding box of each tracked object on the video and to plot its trajectory.

Suggest methods that you can use to address false positives, and explain how the tracker can help you in this regard.
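As one example of how a tracker can help, the sketch below combines two common ideas: gating (ignore a detection that lies too far from the filter's predicted position) and track confirmation (only trust a track after several consecutive accepted detections). This is a suggestion, not the required solution; the names `TrackState` and `step_track` and the thresholds `gate_px`, `min_hits`, and `max_misses` are hypothetical and would need tuning on the actual videos.

```python
import math
from dataclasses import dataclass


@dataclass
class TrackState:
    hits: int = 0            # consecutive accepted detections
    misses: int = 0          # consecutive frames without an accepted detection
    confirmed: bool = False  # True once the track is trusted enough to draw


def step_track(state, predicted_xy, detection_xy,
               gate_px=50.0, min_hits=3, max_misses=10):
    """Update the track lifecycle for one frame.

    Returns (accepted, keep): whether the detection passed the gate and
    whether the track should be kept alive.
    """
    accepted = False
    if detection_xy is not None:
        # Gating: reject detections far from where the filter expects the object
        dist = math.hypot(detection_xy[0] - predicted_xy[0],
                          detection_xy[1] - predicted_xy[1])
        accepted = dist <= gate_px

    if accepted:
        state.hits += 1
        state.misses = 0
        if state.hits >= min_hits:
            state.confirmed = True
    else:
        state.misses += 1
        if not state.confirmed:
            # An unconfirmed track must earn its hits consecutively
            state.hits = 0

    keep = state.misses <= max_misses
    return accepted, keep


state = TrackState()
# A detection close to the prediction is accepted
print(step_track(state, (100.0, 100.0), (104.0, 103.0)))
# A detection far from the prediction is treated as a likely false positive
print(step_track(state, (102.0, 101.0), (400.0, 300.0)))
```

Only confirmed tracks would be drawn, so an isolated spurious detection never reaches the output video, while a confirmed track survives a few missed frames before it is dropped.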

You will need to have one Kalman filter to track each of the required and present objects (cyclist and vehicle).

## Extra Bonus (20 points)

```{eval-rst}
.. youtube:: 2hQx48U1L-Y
```

The cyclist in the video goes in and out of occlusions. In addition, the object is small, which makes detection fairly problematic without fine-tuning and other optimizations. Fine-tuning involves taking the pretrained model and training it further on images of cyclists from a training dataset such as [VisDrone](https://github.com/VisDrone/VisDrone-Dataset). Reducing the number of classes to a much smaller set, such as person and bicycle, may also help. Some two-stage detectors may need further parameter tuning for small objects. See [this paper](https://www.mdpi.com/1424-8220/23/15/6887) for ideas around small object tracking.


```{note}
The extra points can only be awarded in the category of `assignments` and cannot be used to compensate for any other category such as `exams`.
```