## Object Tracking with `arcgis.learn`

Object tracking is the process of:

* Taking an initial set of object detections (such as an input set of bounding box coordinates)
* Creating a unique ID for each of the initial detections
* Tracking each object as it moves across the frames of a video, while maintaining its unique ID

Multiple-object tracking can be performed using the `predict_video` function of the `arcgis.learn` module.

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Prerequisites" data-toc-modified-id="Prerequisites-1">Prerequisites</a></span></li><li><span><a href="#How-Object-Tracking-Works" data-toc-modified-id="How-Object-Tracking-Works">How Object Tracking Works</a></span><ul class="toc-item"><li><span><a href="#Kalman-Filter" data-toc-modified-id="Kalman-Filter">Kalman Filter</a></span></li><li><span><a href="#Hungarian-Assignment-Algorithm" data-toc-modified-id="Hungarian-Assignment-Algorithm">Hungarian Assignment Algorithm</a></span></li></ul></li><li><span><a href="#Track-Objects-Using-arcgis.learn" data-toc-modified-id="Track-Objects-Using-arcgis.learn">Track Objects Using arcgis.learn</a></span></li><li><span><a href="#Vehicle-Tracking-Example" data-toc-modified-id="Vehicle-Tracking-Example">Vehicle Tracking Example</a></span></li></ul></div>

## Prerequisites

- Please refer to the prerequisites section in our [guide](https://developers.arcgis.com/python/guide/geospatial-deep-learning/) for more information. This sample demonstrates how to do object tracking using `arcgis.learn`.
- Please refer to the object detection [guide](https://developers.arcgis.com/python/guide/object-detection/) to understand how object detection works.

## How Object Tracking Works

Object tracking in `arcgis.learn` is based on the SORT (Simple Online and Realtime Tracking) algorithm [3], which combines __Kalman filtering__ with the __Hungarian assignment algorithm__.

The __Kalman filter__ estimates the position of each tracker, while the __Hungarian algorithm__ assigns trackers to new detections.
The following sections briefly describe the __Kalman filter__ and the __Hungarian algorithm__.

## Kalman Filter

Kalman filtering uses a series of measurements observed over time and produces estimates of unknown variables by estimating a joint probability distribution over the variables for each timeframe. The filter is named after Rudolf E. Kálmán, one of the primary developers of its theory.

Our state contains 8 variables, `(u, v, a, h, u', v', a', h')`, where `(u, v)` is the centre of the bounding box, `a` is its aspect ratio, and `h` is its height. The remaining four variables are the respective velocities of the first four.
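As a minimal sketch of how a detection could be turned into this state, the snippet below converts corner coordinates into the 8-variable list. The function name `bbox_to_state`, the `(x_min, y_min, x_max, y_max)` input layout, and the choice of width divided by height for the aspect ratio are assumptions made for illustration; they are not taken from `arcgis.learn`.

```python
def bbox_to_state(x_min, y_min, x_max, y_max):
    """Convert box corners to the state (u, v, a, h, u', v', a', h')."""
    width = float(x_max - x_min)
    height = float(y_max - y_min)
    u = x_min + width / 2.0    # centre x
    v = y_min + height / 2.0   # centre y
    a = width / height         # aspect ratio (width / height is an assumption)
    # A new detection has no motion history, so all velocities start at zero.
    return [u, v, a, height, 0.0, 0.0, 0.0, 0.0]


state = bbox_to_state(100, 50, 200, 250)
print(state)
```

The velocities only become meaningful once the filter has seen the same object in several frames.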

A Kalman filter is used on every bounding box, so it comes after a box has been matched with a tracker. When the association is made, the predict and update functions are called.

#### Predict

The prediction step is a matrix multiplication that estimates the position of the bounding box at time `t` from its position at time `t-1`.
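The sketch below illustrates that multiplication for the 8-variable state, assuming a constant-velocity motion model in which each of the first four variables advances by its velocity times the time step. The function name `predict` and the `dt` parameter are hypothetical. A full Kalman filter also propagates a covariance matrix in this step, which is left out here to keep the example short.

```python
def predict(state, dt=1.0):
    """Constant-velocity prediction: new_state = F * state."""
    n = len(state) // 2          # 4 box variables followed by 4 velocities
    size = 2 * n
    # Transition matrix F: identity, plus dt linking each variable to its velocity.
    F = [[0.0] * size for _ in range(size)]
    for i in range(size):
        F[i][i] = 1.0
    for i in range(n):
        F[i][n + i] = dt
    # Matrix-vector product written out in plain Python.
    return [sum(F[r][c] * state[c] for c in range(size)) for r in range(size)]


previous = [150.0, 150.0, 0.5, 200.0, 4.0, -2.0, 0.0, 1.0]
predicted = predict(previous)
print(predicted)
```

In this example the centre moves by `(4, -2)` pixels and the height grows by 1, while the velocities themselves are carried over unchanged.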

#### Update

The update phase is a correction step. It incorporates the new measurement from the object detection model and helps improve the filter's estimate.
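To make the correction concrete, here is a simplified scalar version of the update applied to a single state variable, such as the centre coordinate `u`. The real filter works on the whole state with matrices; the function name `update` and the variance values are illustrative assumptions.

```python
def update(predicted, predicted_var, measured, measurement_var):
    """Blend a predicted value with a new measurement (scalar Kalman update)."""
    # The gain lies between 0 and 1: close to 1 trusts the measurement more.
    gain = predicted_var / (predicted_var + measurement_var)
    corrected = predicted + gain * (measured - predicted)
    corrected_var = (1.0 - gain) * predicted_var  # uncertainty shrinks after the update
    return corrected, corrected_var


u, u_var = update(predicted=154.0, predicted_var=4.0,
                  measured=158.0, measurement_var=4.0)
print(u, u_var)
```

With equal uncertainty in the prediction and the measurement, the corrected value lands halfway between the two.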


## Hungarian Assignment Algorithm

The Hungarian algorithm, also known as the Kuhn-Munkres algorithm, can associate an object in one frame with the same object in the next, based on a score such as Intersection over Union (IoU).
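For reference, IoU is the area of overlap between two boxes divided by the area of their union. The helper below is a small sketch that assumes boxes are given as `(x_min, y_min, x_max, y_max)` tuples; the function name `iou` is chosen for this example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

A score of 1 means the boxes coincide and 0 means they do not overlap at all.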

We iterate through the list of trackers and detections and assign a tracker to each detection on the basis of IoU scores.
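The assignment can be sketched as follows. To stay dependency-free, this example does not implement the Hungarian algorithm itself: it searches all pairings by brute force, which yields the same optimal assignment but is only practical for a handful of boxes. The function name `assign` and the `iou_threshold` parameter are hypothetical, and discarding pairs below the threshold is an assumption about how a minimum IoU would be applied.

```python
from itertools import permutations


def assign(iou_matrix, iou_threshold=0.3):
    """Return (tracker, detection) index pairs that maximise the total IoU.

    iou_matrix[t][d] holds the IoU between tracker t and detection d.
    """
    n_trackers = len(iou_matrix)
    n_detections = len(iou_matrix[0]) if n_trackers else 0
    if n_trackers == 0 or n_detections == 0:
        return []
    # Enumerate every one-to-one pairing of the smaller set into the larger one.
    if n_trackers <= n_detections:
        candidates = (list(zip(range(n_trackers), p))
                      for p in permutations(range(n_detections), n_trackers))
    else:
        candidates = (list(zip(p, range(n_detections)))
                      for p in permutations(range(n_trackers), n_detections))
    best = max(candidates,
               key=lambda pairs: sum(iou_matrix[t][d] for t, d in pairs))
    # Drop pairs that overlap too little to be the same object.
    return [(t, d) for t, d in best if iou_matrix[t][d] >= iou_threshold]


matches = assign([[0.6, 0.5],
                  [0.55, 0.1]])
print(matches)
```

Note that a greedy match would pair tracker 0 with detection 0 (the single highest score) and leave tracker 1 with a poor match, whereas the optimal assignment trades a slightly lower score for tracker 0 against a much better one for tracker 1.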



__The general process is to detect objects using an object detection algorithm, match the resulting bounding boxes with the existing ones using the Hungarian algorithm, and then estimate the actual positions or predict the future positions of the bounding boxes using Kalman filters.__

## Track Objects Using arcgis.learn

Multiple-object tracking can be performed using the `predict_video` function of the `arcgis.learn` module. To enable tracking, set the `track` parameter of `predict_video` to `True` (`track=True`).

The following tracking options are available in the `predict_video` function:

* `vanish_frames`: the number of frames an object must remain absent from the video before it is considered vanished.

* `detect_frames`: the number of frames an object must remain present in the video before tracking starts.

* `assignment_iou_thrd`: the Intersection over Union (IoU) threshold used to assign a tracker to a detection. This matters because multiple trackers might be detecting and tracking objects at the same time.
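The effect of the two frame counts can be illustrated with a toy track lifecycle. This is a sketch of the general idea only, not the `arcgis.learn` implementation: the function `track_status`, the status labels, the small threshold values, and the choice to count total rather than consecutive detections are all assumptions made for readability.

```python
def track_status(history, detect_frames=3, vanish_frames=2):
    """Replay per-frame match flags and return the track status after each frame.

    history is a list of booleans: True if the object was detected in that frame.
    """
    hits = 0             # frames in which the object was detected so far
    misses = 0           # consecutive frames without a detection
    confirmed = False
    statuses = []
    for matched in history:
        if matched:
            hits += 1
            misses = 0
        else:
            misses += 1
        if misses >= vanish_frames:
            statuses.append("vanished")   # absent for too long, drop the track
            break
        if hits >= detect_frames:
            confirmed = True              # seen often enough to start tracking
        statuses.append("tracked" if confirmed else "tentative")
    return statuses


statuses = track_status([True, True, True, False, True, False, False])
print(statuses)
```

A larger `detect_frames` suppresses short-lived false detections, while a larger `vanish_frames` lets a track survive brief occlusions at the cost of keeping stale tracks around longer.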


## Vehicle Tracking Example

The following video was created using the `predict_video()` function of a `RetinaNet` model [1] from `arcgis.learn`.

*The data is collected from a lamp post in Berlin.*



In [None]:
# Necessary Imports
from arcgis.learn import RetinaNet, prepare_data

In [None]:
# Prepare the training data from PASCAL VOC style annotations
data_path = "data/vehicle_detection"
data = prepare_data(data_path,
                    batch_size=4,
                    dataset_type="PASCAL_VOC_rectangles",
                    chip_size=480)

In [None]:
# Create the RetinaNet model and train it for 100 epochs
retinanet = RetinaNet(data)
retinanet.fit(100, lr=0.00004365, tensorboard=True)

In [None]:
# Run predict_video with tracking enabled
retinanet.predict_video(input_video_path=r'data/test.mp4',
                        metadata_file=r'data/vid1.csv',
                        track=True,
                        visualize=True,
                        threshold=0.5)

<video width="100%" height="450" loop="loop" controls src="data/test_predictions.mp4" />

## References

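[1] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár: “Focal Loss for Dense Object Detection”, 2017; [arXiv:1708.02002](http://arxiv.org/abs/1708.02002).

[2] https://towardsdatascience.com/computer-vision-for-tracking-8220759eee85

[3] Alex Bewley, Zongyuan Ge, Lionel Ott, Fabio Ramos, Ben Upcroft: “Simple Online and Realtime Tracking”, 2016; [arXiv:1602.00763](https://arxiv.org/abs/1602.00763).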