## Kilometers per Hour Feature
Status: ***Work In Progress***

### Purpose
We aim to extend the current object detection and tracking system by estimating the speed of detected objects (dogs) in kilometers per hour (km/h). Speeds expressed in real-world units can be validated against reference data, such as GPS information from greyhound racing, which shows average dog speeds of around 62-70 km/h. Reporting a speed for each tracked dog on every frame gives a concrete quantity against which the tracking results can be verified.

### How?
1. **Object Detection:**

* We use the Detectron2 library for object detection, specifically the Mask R-CNN model pre-trained on the COCO dataset, to detect dogs. Detectron2 maps the 80 COCO categories to contiguous indices 0-79, in which dog is index 16 (index 17 is horse).

2. **Object Tracking:**

* For tracking, we use the DeepSort tracker to follow detected objects across multiple frames. DeepSort assigns unique IDs to track each object over time.

3. **Speed Calculation:**

* The speed of each tracked object is calculated by measuring the change in position of the object across frames, using Euclidean distance to find the pixel displacement.
* We convert the pixel displacement to meters by using a predefined scale (meters per pixel), then compute the object's speed in meters per second and finally convert it to km/h.
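
The conversion chain above can be written as a single formula. The symbols are introduced here for illustration only: $d_{px}$ is the displacement of the object's center between consecutive frames in pixels, $s$ is the scale in meters per pixel, and $f$ is the frame rate in frames per second.

$$
v_{\text{km/h}} = d_{px} \cdot s \cdot f \cdot 3.6
$$

As a worked example, assuming a 30 FPS video (an illustrative value, not measured from the notebook's input) and the notebook's example scale of 0.05 meters/pixel, a displacement of 12 pixels between consecutive frames gives $12 \cdot 0.05 \cdot 30 \cdot 3.6 = 64.8$ km/h, which falls inside the 62-70 km/h range quoted for greyhounds.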

### Limitations
1. Scale Assumption: We assume a fixed pixel-to-meter conversion factor (0.05 meters/pixel in this example). The true factor varies with the camera angle, the distance to the subject, and the actual track size, so the calculated speeds need to be tuned against GPS data or other real-world measurements.

2. Bounding Box Consistency: The size of the bounding box is dynamic and changes with the object's distance from the camera. Speed calculations therefore become less reliable as an object moves farther away, because the same real-world distance then spans fewer pixels than the fixed scale assumes.
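
One possible way to replace the guessed scale is to calibrate it from two points in the frame whose real-world separation is known. The sketch below is not part of the notebook: `estimate_scale` is a hypothetical helper, and the marker coordinates and the 10 m distance are made-up values chosen so that the result matches the 0.05 meters/pixel used in this example.

```python
import math


def estimate_scale(point_a_px, point_b_px, real_distance_m):
    """Return meters per pixel from two image points a known distance apart."""
    pixel_distance = math.dist(point_a_px, point_b_px)
    if pixel_distance == 0:
        raise ValueError("reference points must be distinct")
    return real_distance_m / pixel_distance


# Hypothetical reference: two track markers 10 m apart that appear
# 200 pixels apart in the frame.
scale_meters_per_pixel = estimate_scale((100, 400), (300, 400), 10.0)
print(scale_meters_per_pixel)  # 0.05
```

A scale obtained this way only holds near the reference points and at a similar distance from the camera, so the limitation about camera angle and distance still applies.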

### Step-by-step Description of the Code:
**Installing Dependencies:**

The first block installs the Detectron2 library, which provides the tools for object detection.

In [17]:
!python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

Collecting git+https://github.com/facebookresearch/detectron2.git
  Cloning https://github.com/facebookresearch/detectron2.git to /tmp/pip-req-build-jn11fztz
  Running command git clone --filter=blob:none --quiet https://github.com/facebookresearch/detectron2.git /tmp/pip-req-build-jn11fztz
  Resolved https://github.com/facebookresearch/detectron2.git to commit ebe8b45437f86395352ab13402ba45b75b4d1ddb
  Preparing metadata (setup.py) ... done


**Importing all necessary Libraries**

In [18]:
!pip install deep_sort_realtime
import torch
import detectron2
import pycocotools
import numpy as np
import cv2
import random
from google.colab.patches import cv2_imshow

# import some common detectron2 utilities
from detectron2.utils.logger import setup_logger
from detectron2 import model_zoo
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog
import os
from deep_sort_realtime.deepsort_tracker import DeepSort  # installed by the pip command at the top of this cell
from scipy.spatial.distance import euclidean



In [19]:
TORCH_VERSION = ".".join(torch.__version__.split(".")[:2])
CUDA_VERSION = torch.__version__.split("+")[-1]
print("torch: ", TORCH_VERSION, "; cuda: ", CUDA_VERSION)

setup_logger()

torch:  2.4 ; cuda:  cpu


<Logger detectron2 (DEBUG)>

**Model Setup:**

* We import the necessary libraries, including Detectron2 for detection, DeepSort for tracking, and cv2 (OpenCV) for video processing.

* Next, we configure the Mask R-CNN model from the Detectron2 model zoo. The model is pre-trained on the COCO dataset, and it uses a threshold of 0.5 for detection confidence. The system is set to use CUDA if a GPU is available.

In [20]:
# Define the video path
MARKET_SQUARE_VIDEO_PATH = "/greyhound1.mp4"

# Setup Detectron2 model configuration
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # Set threshold for this model
cfg.MODEL.DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
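
Because Detectron2 reports contiguous class indices rather than the original COCO category IDs, looking the index up by name is less error-prone than hard-coding it. The sketch below is a standalone illustration: `class_index` is a hypothetical helper, and only the first 18 COCO class names are reproduced so that it runs without Detectron2. In the notebook, the full list would be read from `MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes`.

```python
# Assumption: in the notebook, this list would come from
#   MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes
# Only the first 18 COCO names are reproduced here so the sketch runs standalone.
COCO_THING_CLASSES_HEAD = [
    "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train",
    "truck", "boat", "traffic light", "fire hydrant", "stop sign",
    "parking meter", "bench", "bird", "cat", "dog", "horse",
]


def class_index(thing_classes, name):
    """Return the contiguous index the predictor reports for a class name."""
    return thing_classes.index(name)


dog_class_id = class_index(COCO_THING_CLASSES_HEAD, "dog")
print(dog_class_id)  # 16
```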


**Video Setup:**

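* The cell below first creates the Detectron2 predictor and the DeepSort tracker (with max_age=30, a track is dropped after 30 consecutive frames without a matching detection).

* The video is then loaded using OpenCV's VideoCapture, and properties such as width, height, and FPS are retrieved to configure the video writer for the output.

* We also specify the output directory where the processed video will be saved, and set the pixel-to-meter scale used by the speed calculation.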

In [21]:
# Initialize the Detectron2 predictor
predictor = DefaultPredictor(cfg)

# Initialize the DeepSort tracker
tracker = DeepSort(max_age=30)

# Open the video file
cap = cv2.VideoCapture(MARKET_SQUARE_VIDEO_PATH)

# Verify the output directory and permissions
output_dir = "/content"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

if not os.access(output_dir, os.W_OK):
    raise PermissionError(f"Write permission denied for the directory {output_dir}")

# Define the output video path
output_path = os.path.join(output_dir, "dog_tracking_output_kmph.mp4")

assert cap.isOpened(), "Error reading video file"

# Get video properties (keep FPS as a float so rates such as 29.97 are not truncated)
w, h = (int(cap.get(x)) for x in (cv2.CAP_PROP_FRAME_WIDTH, cv2.CAP_PROP_FRAME_HEIGHT))
fps = cap.get(cv2.CAP_PROP_FPS)

# Initialize the VideoWriter with the mp4v FourCC code
fourcc_code = cv2.VideoWriter_fourcc(*"mp4v")
video_writer = cv2.VideoWriter(output_path, fourcc_code, fps, (w, h))

# Example scale: 1 pixel = 0.05 meters (adjust according to your video)
scale_meters_per_pixel = 0.05

[09/18 12:10:44 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl ...




**Speed Calculation Function:**

* This function takes the previous and current positions of an object (in pixels) and calculates the speed in km/h.
* First, we calculate the pixel distance between the two positions using Euclidean distance.
* Next, we convert the distance in pixels to meters using the defined scale (0.05 meters/pixel).
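* We then calculate the speed in meters per second. The displacement happens over one frame interval (1/FPS seconds), so dividing by that interval is the same as multiplying the distance by the frame rate (FPS). Finally, we convert the result to km/h by multiplying by 3.6.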

In [None]:
# Function to calculate speed in km/h
def calculate_speed(previous_position, current_position, fps):
    distance_pixels = euclidean(previous_position, current_position)
    # Convert pixels to meters
    distance_meters = distance_pixels * scale_meters_per_pixel
    # Speed in meters per second
    speed_mps = distance_meters * fps
    # Convert to kilometers per hour (km/h)
    speed_kmph = speed_mps * 3.6
    return speed_kmph

# Track previous positions of dogs to calculate speed
previous_positions = {}

# Process video frames
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        print("Video frame is empty or video processing has been successfully completed.")
        break

    # Perform object detection
    outputs = predictor(frame)

    # Extract bounding boxes, confidences, and class IDs
    instances = outputs["instances"].to("cpu")
    boxes = instances.pred_boxes.tensor.numpy()
    confidences = instances.scores.numpy()
    class_ids = instances.pred_classes.numpy()

    # Keep only dog detections. Detectron2 maps COCO categories to contiguous
    # indices 0-79, where dog is 16 (17 is horse).
    dog_indices = np.where(class_ids == 16)[0]
    boxes = boxes[dog_indices]
    confidences = confidences[dog_indices]
    class_ids = class_ids[dog_indices]

    # Prepare detections for tracking
    detections = []
    for i in range(len(boxes)):
        x1, y1, x2, y2 = boxes[i]
        bbox = [x1, y1, x2 - x1, y2 - y1]  # Convert to [x, y, w, h]
        detection = (bbox, confidences[i], class_ids[i])
        detections.append(detection)

    tracked_objects = tracker.update_tracks(detections, frame=frame)

**Processing Each Frame:**

1. For each frame in the video:
    * We first apply Detectron2 to perform object detection. It returns bounding boxes, class IDs, and confidence scores.
    * We keep only the detections that belong to the dog class (contiguous class index 16 in Detectron2's COCO mapping).
    * The detected bounding boxes are converted from [x1, y1, x2, y2] to [left, top, width, height], the format passed to the DeepSort tracker.
2. For each confirmed tracked object:
    * If the object has been seen before (i.e., it has a previous position), we calculate its speed from the displacement of the bounding box center using the calculate_speed function.
    * We draw the bounding box and label (with the speed in km/h) on the frame.

In [None]:
    labels = []
    for obj in tracked_objects:
        if not obj.is_confirmed():
            continue

        box = obj.to_ltwh()  # Get the bounding box as [left, top, width, height]
        obj_id = obj.track_id
        class_id = obj.det_class

        center = (int(box[0] + box[2] / 2), int(box[1] + box[3] / 2))

        if obj_id in previous_positions:
            speed_kmph = calculate_speed(previous_positions[obj_id], center, fps)
            label = f"ID {obj_id} | Speed: {speed_kmph:.2f} km/h"
        else:
            label = f"ID {obj_id} | Speed: calculating..."

        previous_positions[obj_id] = center

        # Draw the box and label on the frame
        cv2.rectangle(frame, (int(box[0]), int(box[1])), (int(box[0] + box[2]), int(box[1] + box[3])), (0, 255, 0), 2)
        cv2.putText(frame, label, (int(box[0]), int(box[1] - 10)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2)

    # Write the processed frame to the output video
    video_writer.write(frame)
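
Speeds computed from a single pair of frames are sensitive to jitter in the box center: with the 0.05 meters/pixel scale and an assumed 30 FPS video, a one-pixel shift changes the result by 0.05 × 30 × 3.6 = 5.4 km/h. One possible refinement, not used in the notebook, is to average the last few samples per track. `SpeedSmoother` and its window size are hypothetical names and values chosen for illustration.

```python
from collections import defaultdict, deque


class SpeedSmoother:
    """Keep a short history of per-frame speeds for each track ID."""

    def __init__(self, window=5):
        # One bounded queue per track; old samples fall out automatically.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def update(self, track_id, speed_kmph):
        """Record a new speed sample and return the windowed mean."""
        samples = self.history[track_id]
        samples.append(speed_kmph)
        return sum(samples) / len(samples)


smoother = SpeedSmoother(window=3)
for raw in (60.0, 66.0, 72.0, 54.0):
    smoothed = smoother.update("1", raw)
print(smoothed)  # 64.0 (mean of the last three samples: 66, 72, 54)
```

In the frame loop, the value returned by `smoother.update(obj_id, speed_kmph)` could be shown in the label instead of the raw per-frame speed.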



**Saving the Video:**

After processing all frames, the video is saved to the specified output path.

In [None]:
cap.release()
video_writer.release()
# cv2.destroyAllWindows()

print(f"Processed video saved to {output_path}")


**Conclusion**

This notebook demonstrates how to detect and track objects (dogs) in a video, calculate their speed in kilometers per hour, and write the result to a new video. The current implementation is limited by the fixed pixel-to-meter scale and by inconsistent bounding boxes, but it establishes a baseline for further refinement and validation against GPS data from real-world races.
