<a href="https://colab.research.google.com/github/itberrios/CV_tracking/blob/main/kitti_tracker/2_kitti_tracking.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **KITTI Tracking**

In this tutorial we will learn how to track objects in 3D on the KITTI dataset. We will build on our object detector from part 1 and use each object detection to update the tracks.

For more information, a readme for the KITTI data can be found [here](https://github.com/yanii/kitti-pcl/blob/master/KITTI_README.TXT), and a paper that details the data collection and coordinate systems can be found [here](http://www.cvlibs.net/publications/Geiger2013IJRR.pdf).

<br>

## Get the data

In [None]:
!wget https://s3.eu-central-1.amazonaws.com/avg-kitti/raw_data/2011_10_03_drive_0047/2011_10_03_drive_0047_sync.zip

--2022-09-22 11:48:55--  https://s3.eu-central-1.amazonaws.com/avg-kitti/raw_data/2011_10_03_drive_0047/2011_10_03_drive_0047_sync.zip
Resolving s3.eu-central-1.amazonaws.com (s3.eu-central-1.amazonaws.com)... 52.219.171.13
Connecting to s3.eu-central-1.amazonaws.com (s3.eu-central-1.amazonaws.com)|52.219.171.13|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3103291675 (2.9G) [application/zip]
Saving to: ‘2011_10_03_drive_0047_sync.zip’


2022-09-22 11:50:39 (28.6 MB/s) - ‘2011_10_03_drive_0047_sync.zip’ saved [3103291675/3103291675]



In [None]:
!wget https://s3.eu-central-1.amazonaws.com/avg-kitti/raw_data/2011_10_03_calib.zip

--2022-09-22 11:50:39--  https://s3.eu-central-1.amazonaws.com/avg-kitti/raw_data/2011_10_03_calib.zip
Resolving s3.eu-central-1.amazonaws.com (s3.eu-central-1.amazonaws.com)... 52.219.46.23
Connecting to s3.eu-central-1.amazonaws.com (s3.eu-central-1.amazonaws.com)|52.219.46.23|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4075 (4.0K) [application/zip]
Saving to: ‘2011_10_03_calib.zip’


2022-09-22 11:50:40 (192 MB/s) - ‘2011_10_03_calib.zip’ saved [4075/4075]



In [None]:
!jar xf 2011_10_03_drive_0047_sync.zip
!jar xf 2011_10_03_calib.zip

## Base Library Import

In [1]:
import os
from glob import glob
import cv2
import numpy as np
import pandas as pd
import torch
import matplotlib.pyplot as plt

%matplotlib inline
plt.rcParams["figure.figsize"] = (20, 10)

## Import Utility functions

In [2]:
!wget https://github.com/itberrios/CV_tracking/raw/main/kitti_tracker/kitti_utils.py
from kitti_utils import *

In [None]:

!wget https://github.com/itberrios/CV_tracking/raw/main/kitti_tracker/kitti_detection_utils.py
from kitti_detection_utils import *

## Get Data Paths

In [4]:
DATA_PATH = r'2011_10_03/2011_10_03_drive_0047_sync'

# get RGB camera data
left_image_paths = sorted(glob(os.path.join(DATA_PATH, 'image_02/data/*.png')))
right_image_paths = sorted(glob(os.path.join(DATA_PATH, 'image_03/data/*.png')))

# get LiDAR data
bin_paths = sorted(glob(os.path.join(DATA_PATH, 'velodyne_points/data/*.bin')))

# get GPS/IMU data
oxts_paths = sorted(glob(os.path.join(DATA_PATH, 'oxts/data/*.txt')))

print(f"Number of left images: {len(left_image_paths)}")
print(f"Number of right images: {len(right_image_paths)}")
print(f"Number of LiDAR point clouds: {len(bin_paths)}")
print(f"Number of GPS/IMU frames: {len(oxts_paths)}")

Number of left images: 837
Number of right images: 837
Number of LiDAR point clouds: 837
Number of GPS/IMU frames: 837


## Get Camera Transformation Matrices

In [5]:
with open('2011_10_03/calib_cam_to_cam.txt','r') as f:
    calib = f.readlines()

# get projection matrices (rectified left camera --> left camera (u,v,z))
P_rect2_cam2 = np.array([float(x) for x in calib[25].strip().split(' ')[1:]]).reshape((3,4))


# get rectified rotation matrices (left camera --> rectified left camera)
R_ref0_rect2 = np.array([float(x) for x in calib[24].strip().split(' ')[1:]]).reshape((3, 3,))

# add (0,0,0) translation and convert to homogeneous coordinates
R_ref0_rect2 = np.insert(R_ref0_rect2, 3, values=[0,0,0], axis=0)
R_ref0_rect2 = np.insert(R_ref0_rect2, 3, values=[0,0,0,1], axis=1)


# get rigid transformation from Camera 0 (ref) to Camera 2
R_2 = np.array([float(x) for x in calib[21].strip().split(' ')[1:]]).reshape((3,3))
t_2 = np.array([float(x) for x in calib[22].strip().split(' ')[1:]]).reshape((3,1))

# get cam0 to cam2 rigid body transformation in homogeneous coordinates
T_ref0_ref2 = np.insert(np.hstack((R_2, t_2)), 3, values=[0,0,0,1], axis=0)
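
To illustrate why the rotation and translation are stacked into a 4x4 matrix, here is a minimal sketch showing that the homogeneous form reproduces `R @ p + t`. The values of `R_demo`, `t_demo`, and `p_demo` are made up for illustration and are not KITTI calibration values.

```python
import numpy as np

# Hypothetical rotation (90 degrees about z) and translation, for illustration only
R_demo = np.array([[0.0, -1.0, 0.0],
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 1.0]])
t_demo = np.array([[0.5], [0.0], [-0.2]])

# Stack [R | t] and append the row [0, 0, 0, 1], the same pattern used for T_ref0_ref2
T_demo = np.insert(np.hstack((R_demo, t_demo)), 3, values=[0, 0, 0, 1], axis=0)

# A made-up 3D point and its homogeneous version (x, y, z, 1)
p_demo = np.array([1.0, 2.0, 3.0])
p_demo_h = np.append(p_demo, 1.0)

# Rotating then translating directly...
direct = R_demo @ p_demo + t_demo.ravel()
# ...matches a single matrix product in homogeneous coordinates
via_T = (T_demo @ p_demo_h)[:3]

print(direct, via_T)
```

Because each step becomes a single matrix, several transformations can be chained with `@`, which is how the LiDAR to camera transform is composed later on.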

## Get LiDAR and IMU Transformation matrices

In [6]:
T_velo_ref0 = get_rigid_transformation(r'2011_10_03/calib_velo_to_cam.txt')
T_imu_velo = get_rigid_transformation(r'2011_10_03/calib_imu_to_velo.txt')

#### Get LiDAR ⬌ Camera2 Transformation matrices

LiDAR &rarr; Cam Ref 0 &rarr; Cam Ref 2 &rarr; Rectified 2 &rarr; Camera 2

In [7]:
# transform from velo (LiDAR) to left color camera (shape 3x4)
T_velo_cam2 = P_rect2_cam2 @ R_ref0_rect2 @ T_ref0_ref2 @ T_velo_ref0 

# homogeneous transform from left color camera to velo (LiDAR) (shape: 4x4)
T_cam2_velo = np.linalg.inv(np.insert(T_velo_cam2, 3, values=[0,0,0,1], axis=0)) 
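
As a rough sketch of how a 3x4 transform like `T_velo_cam2` and its 4x4 inverse can be used, the example below projects one LiDAR point to image coordinates `(u, v, z)` and maps it back. All matrices here (`K_demo`, `R_demo`, `t_demo`) are hypothetical stand-ins, not the KITTI calibration, and the `_demo` names are used so that nothing in the notebook is overwritten.

```python
import numpy as np

# Hypothetical camera intrinsics and LiDAR-to-camera extrinsics (NOT KITTI values)
K_demo = np.array([[700.0, 0.0, 600.0],
                   [0.0, 700.0, 180.0],
                   [0.0, 0.0, 1.0]])
P_demo = np.hstack((K_demo, np.zeros((3, 1))))          # 3x4 projection matrix
R_demo = np.array([[0.0, -1.0, 0.0],                    # LiDAR x-forward/y-left/z-up
                   [0.0, 0.0, -1.0],                    # to camera x-right/y-down/z-forward
                   [1.0, 0.0, 0.0]])
t_demo = np.array([[0.0], [-0.08], [-0.27]])
T_rigid_demo = np.insert(np.hstack((R_demo, t_demo)), 3, values=[0, 0, 0, 1], axis=0)

# Forward transform (3x4) and its homogeneous inverse (4x4)
T_velo_cam_demo = P_demo @ T_rigid_demo
T_cam_velo_demo = np.linalg.inv(
    np.insert(T_velo_cam_demo, 3, values=[0, 0, 0, 1], axis=0))

# Project a made-up LiDAR point: the product gives (u*z, v*z, z)
xyz_velo = np.array([10.0, 2.0, -1.0])
uvz_scaled = T_velo_cam_demo @ np.append(xyz_velo, 1.0)
z = uvz_scaled[2]
u, v = uvz_scaled[0] / z, uvz_scaled[1] / z

# Going back requires re-scaling (u, v) by the depth z before applying the inverse
xyz_back = (T_cam_velo_demo @ np.array([u * z, v * z, z, 1.0]))[:3]

print((u, v, z), xyz_back)
```

Note that the inverse only recovers the 3D point when the depth `z` is known, which is why LiDAR depth is needed alongside the image detections.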

#### Get IMU ⬌ Camera2 Transformation matrices

IMU &rarr; LiDAR &rarr; Cam Ref 0 &rarr; Cam Ref 2 &rarr; Rectified 2 &rarr; Camera 2

In [8]:
# transform from IMU to left color camera (shape 3x4)
T_imu_cam2 = T_velo_cam2 @ T_imu_velo

# homogeneous transform from left color camera to IMU (shape: 4x4)
T_cam2_imu = np.linalg.inv(np.insert(T_imu_cam2, 3, values=[0,0,0,1], axis=0)) 

## **Get Object Detection pipeline**

In [9]:
!git clone https://github.com/ultralytics/yolov5

fatal: destination path 'yolov5' already exists and is not an empty directory.


In [10]:
!pip install -r yolov5/requirements.txt  # install the YOLOv5 dependencies

In [10]:
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # or yolov5m, yolov5l, yolov5x, custom

Using cache found in /root/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 🚀 2022-9-22 Python-3.7.14 torch-1.12.1+cu113 CUDA:0 (Tesla T4, 15110MiB)

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
Adding AutoShape... 


In [11]:
# set confidence and IOU thresholds
model.conf = 0.25  # confidence threshold (0-1), default: 0.25
model.iou = 0.25  # NMS IoU threshold (0-1), default: 0.45

## **Set up tracking pipeline**

The tracking will be a 3D real-world extension of the [SORT algorithm](https://arxiv.org/pdf/1602.00763.pdf). Instead of tracking bounding box location and aspect, we will simply track the (x, y) location of each detected object, so we will neglect the z-axis. In our Kalman Filter we will use a constant velocity model with a random acceleration assumption.
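
A constant velocity Kalman Filter over (x, y) could look roughly like the sketch below. This is only an illustration under stated assumptions: the state layout `[x, y, vx, vy]`, the frame period `dt`, and the noise levels `sigma_a` and `sigma_z` are hypothetical choices, not values taken from this notebook.

```python
import numpy as np

dt = 0.1        # assumed time between frames in seconds (hypothetical)
sigma_a = 1.0   # assumed random acceleration std in m/s^2 (hypothetical)
sigma_z = 0.5   # assumed measurement noise std in m (hypothetical)

# State is [x, y, vx, vy]; position advances by velocity * dt
F = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
# Only the (x, y) position is measured
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
# Random acceleration enters position as dt^2/2 and velocity as dt
G = np.array([[0.5 * dt**2, 0],
              [0, 0.5 * dt**2],
              [dt, 0],
              [0, dt]])
Q = sigma_a**2 * G @ G.T          # process noise covariance
R = sigma_z**2 * np.eye(2)        # measurement noise covariance

def predict(x, P):
    """Propagate the state and covariance one frame ahead."""
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    """Correct the predicted state with a measured (x, y) center."""
    y = z - H @ x                          # innovation
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    return x + K @ y, (np.eye(4) - K @ H) @ P

# Made-up track state and detection, for illustration only
x0 = np.array([12.0, -2.7, 0.0, 0.0])
P0 = np.diag([1.0, 1.0, 10.0, 10.0])
x_pred, P_pred = predict(x0, P0)
x_upd, P_upd = update(x_pred, P_pred, np.array([12.1, -2.75]))
print(x_upd)
```

The updated position falls between the prediction and the measurement, weighted by how much the filter trusts each.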

The tracking pipeline will use the object detection methods from part 1 as a backbone. The L2 distance between object (x,y,z) centers will be used as a cost. The Hungarian Algorithm (`linear_sum_assignment` in SciPy) will be used to match old tracks with new detections and to determine which tracks were not updated.
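
To make the matching step concrete, here is a small sketch of `linear_sum_assignment` on a made-up cost matrix with three tracks (rows) and two detections (columns). The cost values are invented for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical L2 distances: rows are old tracks, columns are new detections
toy_cost = np.array([[0.1, 5.0],
                     [4.0, 0.2],
                     [9.0, 8.0]])

# Returns the row/column index pairs that minimize the total cost
toy_rows, toy_cols = linear_sum_assignment(toy_cost)

# With more tracks than detections, some track is always left unassigned
unassigned_tracks = sorted(set(range(toy_cost.shape[0])) - set(toy_rows.tolist()))

print(list(zip(toy_rows.tolist(), toy_cols.tolist())), unassigned_tracks)
```

The assignment always pairs as many rows and columns as it can, even when the distance is large, which is why a distance threshold is still needed afterwards to reject poor matches.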

In [44]:
from scipy.optimize import linear_sum_assignment

# tracker params
MIN_HIT_STREAK = 1
MAX_UNMATCHED_AGE = 2

# helper functions
def total_cost(center1, center2):
    ''' Return L2 distance between object centers '''
    return np.linalg.norm(center1 - center2)


def associate(old_centers, new_centers, dist_thresh=1):
    """
    Match old track centers to new detection centers.
    Inputs:
        old_centers - former center locations (at time 0)
        new_centers - new center locations (at time 1)
        dist_thresh - maximum L2 distance for a pair to count as a match
    Outputs:
        matches - index pairs (old index, new index) of matched centers
        unmatched_detections - new centers without a matching track
        unmatched_tracks - old centers without a matching detection
    """
    if (len(new_centers) == 0) and (len(old_centers) == 0):
        return [], [], []
    elif(len(old_centers)==0):
        return [], new_centers, []
    elif(len(new_centers)==0):
        return [], [], old_centers

    # distances will store L2 distances between object centers
    distances = np.zeros((len(old_centers),len(new_centers)),dtype=np.float32)

    # Go through centers and store the L2 distances between all of them
    for i,old_cntr in enumerate(old_centers):
        for j,new_cntr in enumerate(new_centers):
            distances[i][j] = total_cost(old_cntr, new_cntr)

    # TEMP
    print(distances)

    # Hungarian Algorithm (with L2 distance metric as the cost)
    row_ind, col_ind = linear_sum_assignment(distances)
    hungarian_matrix = np.array(list(zip(row_ind, col_ind)))

    # Create new unmatched lists for old and new boxes
    matches, unmatched_detections, unmatched_tracks = [], [], []

    # Go through the Hungarian assignments: if an assigned pair has dist > dist_thresh,
    # add both centers to the unmatched lists; otherwise keep the match
    for h in hungarian_matrix:
        if(distances[h[0],h[1]] > dist_thresh):
            unmatched_tracks.append(old_centers[h[0]])
            unmatched_detections.append(new_centers[h[1]])
        else:
            matches.append(h.reshape(1,2))

    if(len(matches)==0):
        matches = np.empty((0,2), dtype=int)
    else:
        matches = np.concatenate(matches, axis=0)

    # Go through old centers, if one has no assigned detection, add it to unmatched_tracks
    for t, trk in enumerate(old_centers):
        if(t not in hungarian_matrix[:,0]):
            unmatched_tracks.append(trk)

    # Go through new centers, if one has no assigned track, add it to unmatched_detections
    for d, det in enumerate(new_centers):
        if(d not in hungarian_matrix[:,1]):
            unmatched_detections.append(det)

    return matches, unmatched_detections, unmatched_tracks
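
The nested loops in `associate` fill the cost matrix one entry at a time. As a possible alternative that is not used in this notebook, SciPy's `cdist` computes all pairwise L2 distances in one call. The sketch below uses the first two rows of `imu_xyz_1` and `imu_xyz_2` as printed in the test further down, so the results only agree with the notebook up to the rounding of those printed values.

```python
import numpy as np
from scipy.spatial.distance import cdist

# First two centers of each frame, copied from the rounded printouts in the test
old_demo = np.array([[11.911, -2.722, -0.16906],
                     [15.574,  7.346, -0.16672]])
new_demo = np.array([[15.315,  7.299, -0.1505],
                     [11.965, -2.751, -0.10473]])

# Pairwise Euclidean distances, shape (len(old_demo), len(new_demo))
distances_demo = cdist(old_demo, new_demo)

# Same quantity computed with the loop-based helper, for comparison
loop_demo = np.array([[np.linalg.norm(o - n) for n in new_demo] for o in old_demo])

print(distances_demo)
```

Both approaches give the same matrix; the vectorized form mainly helps when there are many tracks and detections per frame.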
  

## Test track pipeline

In [35]:
index1 = 20
index2 = 21

left_image_1 = cv2.cvtColor(cv2.imread(left_image_paths[index1]), cv2.COLOR_BGR2RGB)
bin_path_1 = bin_paths[index1]

left_image_2 = cv2.cvtColor(cv2.imread(left_image_paths[index2]), cv2.COLOR_BGR2RGB)
bin_path_2 = bin_paths[index2]


In [14]:
imu_xyz_1 = get_imu_xyz(left_image_1, bin_path_1, model, T_velo_cam2, T_cam2_imu)

In [15]:
imu_xyz_2 = get_imu_xyz(left_image_2, bin_path_2, model, T_velo_cam2, T_cam2_imu)

In [16]:
imu_xyz_1

array([[     11.911,      -2.722,    -0.16906],
       [     15.574,       7.346,    -0.16672],
       [     11.089,      3.1335,    -0.13419],
       [     23.339,     -2.8507,   -0.010265],
       [     22.384,      3.5784,   -0.070647],
       [     46.072,      0.4007,     0.27154],
       [     29.946,      3.2677,   -0.044347],
       [     56.769,     -2.0642,     0.13642],
       [     13.094,      2.8895,     0.36942],
       [     52.366,      4.0757,     0.11331],
       [     45.877,     -3.0317,   0.0097941],
       [     23.318,      2.8036,   -0.096056]])

In [17]:
imu_xyz_2

array([[     15.315,       7.299,     -0.1505],
       [     11.965,      -2.751,    -0.10473],
       [     11.027,      3.1306,    -0.12802],
       [      23.49,     -2.8317,   -0.016854],
       [     22.356,      3.6084,   -0.070034],
       [     29.881,      3.2596,   -0.042096],
       [     57.115,     -1.8578,    -0.26439],
       [     46.409,     0.33389,     0.26685],
       [     52.351,      3.9927,     0.11557],
       [     31.399,      7.6322,    -0.18401],
       [     46.348,     -3.0605, -0.00073378]])

In [45]:
matches, unmatched_detections, unmatched_tracks = associate(imu_xyz_1, imu_xyz_2, dist_thresh=1)

[[     10.583    0.088346      5.9191       11.58      12.214       18.94      45.212      34.635      40.994      22.067      34.439]
 [    0.26366      10.723      6.2004      12.895      7.7444       14.88      42.548      31.625      36.931      15.827      32.487]
 [     5.9338      5.9494    0.062362      13.761      11.277      18.793      46.296      35.433      41.272      20.802      35.799]
 [     12.939      11.375      13.688     0.15231      6.5337      8.9521      33.791      23.291      29.809      13.224       23.01]
 [     7.9888      12.191      11.366       6.505    0.041049      7.5041      35.154      24.246      29.971      9.8849      24.867]
 [     31.524      34.255      35.153      22.814      23.935      16.444      11.284     0.34363      7.2355      16.365      3.4829]
 [     15.177      18.962       18.92      8.8818       7.598     0.06552      27.649      16.725      22.417       4.602       17.58]
 [     42.499       44.81      46.036      33.288      

In [46]:
matches

array([[ 0,  1],
       [ 1,  0],
       [ 2,  2],
       [ 3,  3],
       [ 4,  4],
       [ 5,  7],
       [ 6,  5],
       [ 7,  6],
       [ 9,  8],
       [10, 10]])

In [47]:
unmatched_detections

[array([     31.399,      7.6322,    -0.18401])]

In [48]:
unmatched_tracks

[array([     23.318,      2.8036,   -0.096056]),
 array([     13.094,      2.8895,     0.36942])]