%%latex
\tableofcontents

%%javascript
MathJax.Hub.Config({
    TeX: { equationNumbers: { autoNumber: "AMS" } }
});

# Introduction

This project addresses object tracking in a static-background environment. The goal is to find and track objects in a video sequence. The methods are based on the work of Nummiaro et al. \cite{nummiaro2002}, in which colour distributions make the model more robust to brief and partial occlusion, rotation, and scale changes. The approach is especially efficient and useful for single-target tracking.
The videos used in this project are from the MOT17 challenge dataset \cite{mot17}, available [here](https://motchallenge.net/data/MOT17/).


In [2]:
# general
from src.utils import *
import os
import time
# computation and vision
import cv2
import numpy as np
import pandas as pd
# plotting
import matplotlib.pyplot as plt



In [14]:
names = ['MOT17-09-DPM-raw']
videos = {}
for name in names:
    videos[name] = load_video('./MOT/'+name+'.mp4', show=False, f=1)


# Particle-Filter Tracking

This method is based on *An Adaptive Colour-Based Particle Filter* \cite{nummiaro2002} and follows a top-down approach: it generates object hypotheses and attempts to verify them against the data. Using colour distributions, it only attempts to match objects that have a similar histogram, and it builds on top of particle filtering.

## Definition
Particle filtering tracks the state of an object described by a vector $X_t$, while an observation vector $Z_t$ keeps track of all observations up to time $t$. These filters are often used when the posterior density $p(X_t|Z_t)$ and the observation density $p(z_t|X_t)$ are non-Gaussian. The key idea is to approximate the probability distribution by a weighted sample set $S$, where each sample denotes a hypothetical state of the object with a discrete sampling probability $\pi$.

The sample set evolves by propagating each sample according to a motion model. Each sample is then weighted according to the observation, and $N$ samples are drawn, each chosen with probability $\pi^{(n)} = p(z_t|X_t = s_t^{(n)})$. The mean state of the object is estimated at each time $t$ by:
$$ E[S] = \sum_{n=1}^{N}{\pi^{(n)}s^{(n)}}$$
By modelling uncertainty, the model keeps the options open and can consider multiple hypotheses and choose the closest. By keeping the less likely states in memory briefly, the model can deal with short-term occlusion.
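
The weighted mean and the resampling step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's `ParticleFilter` code: the function names `mean_state` and `resample` and the toy states are assumptions made for this example.

```python
import numpy as np

def mean_state(samples, weights):
    """Weighted mean E[S] = sum_n pi^(n) s^(n); weights are normalised first."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return weights @ np.asarray(samples, dtype=float)

def resample(samples, weights, rng):
    """Draw N samples with replacement, each chosen with probability pi^(n)."""
    samples = np.asarray(samples, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    idx = rng.choice(len(samples), size=len(samples), p=weights)
    return samples[idx]

# Toy example: three hypothetical 2-D states (x, y) with unnormalised weights.
samples = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 20.0]])
weights = np.array([1.0, 1.0, 2.0])

estimate = mean_state(samples, weights)   # array([ 7.5, 10. ])
resampled = resample(samples, weights, np.random.default_rng(0))
```

Samples with low weight can still survive resampling, which is what lets the filter keep less likely hypotheses alive for a few frames.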

### Colour Distribution
Colour distributions are used to provide robustness to non-rigidity, rotation, scaling, and partial occlusion. The distributions are discretized into $m$ bins, and the histograms are produced in HSV colour space, with fewer bins in the V channel, to reduce sensitivity to illumination changes. The distribution is determined inside an upright elliptic region with half axes $H_x$ and $H_y$. A weighting function assigns smaller weights to pixels farther from the centre of the region:
$$k(r) = \left\{ \begin{array}{ll} 1-r^2 & r < 1 \\ 0 & \text{otherwise} \end{array}\right.$$
where $r$ is the distance from the centre of the region, normalised by the region size.
The colour distribution $p_y = \{p_y^{(u)}\}_{u=1 \ldots m}$ at location $y$ is defined as:
$$p_y^{(u)} = f \sum_{i=1}^{I}{k(\frac{||y-x_i||}{a})\delta[h(x_i) - u]}$$
where $f$ is the normalization factor, $I$ is the number of pixels in the region, $\delta$ is the Kronecker delta function, $h(x_i)$ is the histogram bin of the pixel at $x_i$, and $a=\sqrt{H_x^2+H_y^2}$.
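
As a rough sketch of the kernel-weighted histogram defined above, the following NumPy code computes $p_y$ for an elliptic region. Several details are assumptions made for illustration only: the helper names, the $8 \times 8 \times 4$ bin layout, and the 8-bit OpenCV HSV ranges (H in $[0, 180)$, S and V in $[0, 256)$). The project's own implementation may differ.

```python
import numpy as np

def hsv_to_bin(hsv, bins=(8, 8, 4)):
    """Map each HSV pixel to a single bin index h(x_i); fewer bins for V."""
    h = hsv[..., 0].astype(int) * bins[0] // 180
    s = hsv[..., 1].astype(int) * bins[1] // 256
    v = hsv[..., 2].astype(int) * bins[2] // 256
    return (h * bins[1] + s) * bins[2] + v

def kernel(r):
    """k(r) = 1 - r^2 for r < 1, and 0 otherwise."""
    r = np.asarray(r, dtype=float)
    return np.where(r < 1.0, 1.0 - r**2, 0.0)

def colour_distribution(bin_idx, centre, Hx, Hy, m):
    """Kernel-weighted histogram over an upright ellipse centred at (x, y)."""
    rows, cols = np.indices(bin_idx.shape)
    dx = cols - centre[0]
    dy = rows - centre[1]
    inside = (dx / Hx) ** 2 + (dy / Hy) ** 2 <= 1.0   # elliptic region mask
    a = np.hypot(Hx, Hy)                               # a = sqrt(Hx^2 + Hy^2)
    w = kernel(np.hypot(dx, dy) / a) * inside          # per-pixel weights
    hist = np.bincount(bin_idx.ravel(), weights=w.ravel(), minlength=m)
    total = hist.sum()
    return hist / total if total > 0 else hist         # f normalises to sum 1

# Toy 9x9 HSV patch: greenish on the left and centre, reddish on the right.
hsv = np.zeros((9, 9, 3), dtype=np.uint8)
hsv[..., 0], hsv[..., 1], hsv[..., 2] = 60, 200, 200
hsv[:, 5:, 0] = 0
p = colour_distribution(hsv_to_bin(hsv), centre=(4, 4), Hx=3, Hy=4, m=8 * 8 * 4)
```

Because the centre column belongs to the greenish half and the kernel favours central pixels, the greenish bin receives the larger share of the histogram mass.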
Given two colour histograms $p$ and $q$, the similarity measure is defined using the Bhattacharyya coefficient:
$$\rho[p,q] = \sum_{u=1}^{m}{\sqrt{p^{(u)}q^{(u)}}}$$
and hence the Bhattacharyya distance, which is used to update the a priori distribution given by the particle filter, is:
$$d[p,q] = \sqrt{1-\rho[p,q]}$$
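
The two formulas above translate directly into code. The sketch below also shows one way to turn the distance into a sample weight, using a Gaussian in $d$. That weighting and the value of `sigma` are assumptions for illustration; the text above does not fix them.

```python
import numpy as np

def bhattacharyya_coefficient(p, q):
    """rho[p, q] = sum_u sqrt(p^(u) q^(u)) for two normalised histograms."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(np.sqrt(p * q)))

def bhattacharyya_distance(p, q):
    """d[p, q] = sqrt(1 - rho[p, q]); clamped to guard against rounding."""
    rho = bhattacharyya_coefficient(p, q)
    return float(np.sqrt(max(0.0, 1.0 - rho)))

def sample_weight(d, sigma=0.2):
    """Assumed Gaussian weighting (unnormalised): small distance, large weight."""
    return float(np.exp(-d**2 / (2.0 * sigma**2)))

p = np.array([0.5, 0.5, 0.0])
q = np.array([0.5, 0.0, 0.5])

d_same = bhattacharyya_distance(p, p)   # 0.0: identical histograms
d_diff = bhattacharyya_distance(p, q)   # sqrt(0.5): half of the mass overlaps
```

Identical histograms give $\rho = 1$ and $d = 0$, while histograms with no common bins give $\rho = 0$ and $d = 1$.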


Each sample in the distribution represents an ellipse given by $state = \{x, y, \dot{x}, \dot{y}, H_x, H_y, \dot{a}\}$, where $(x,y)$ are the coordinates of the centre, $(H_x, H_y)$ are the half axes, $(\dot{x}, \dot{y})$ is the motion, and $\dot{a}$ is the scale change.
The sample set is then propagated using:
$$s_t = A s_{t-1} + w_{t-1}$$
where $A$ defines the deterministic component of the model and $w_{t-1}$ is a multivariate Gaussian random variable.
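
A minimal sketch of this propagation step is given below. The text does not specify $A$ or the noise covariance, so both are assumptions here: $A$ is a constant-velocity model for the position only, and the noise is Gaussian with independent components (a diagonal covariance). The initial position and half axes reuse the values passed to `ParticleFilter` later in this notebook; the velocity is hypothetical.

```python
import numpy as np

# Assumed state layout: [x, y, x_dot, y_dot, Hx, Hy, a_dot].
A = np.eye(7)
A[0, 2] = 1.0   # x <- x + x_dot
A[1, 3] = 1.0   # y <- y + y_dot

def propagate(samples, A, noise_std, rng):
    """Apply s_t = A s_{t-1} + w_{t-1} to every row of `samples`."""
    samples = np.asarray(samples, dtype=float)
    noise = rng.normal(0.0, noise_std, size=samples.shape)  # w_{t-1}
    return samples @ A.T + noise

rng = np.random.default_rng(0)
initial = np.tile([77.0, 93.0, 2.0, -1.0, 17.0, 35.0, 0.0], (20, 1))

# With zero noise only the deterministic component acts on the samples.
moved = propagate(initial, A, np.zeros(7), rng)

# With noise, the 20 samples spread out around the predicted state.
noise_std = np.array([2.0, 2.0, 0.5, 0.5, 1.0, 1.0, 0.1])
spread = propagate(initial, A, noise_std, rng)
```

The noise scale controls how widely the hypotheses spread between frames: too little and the filter cannot follow abrupt motion, too much and the samples are wasted on unlikely states.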

## Algorithm and Implementation
The Particle Filter model is first initialised with a set of samples placed inside the initial location of the target and the colour distribution is calculated for the first frame.
Each subsequent iteration has four steps:
1. **Select**: selects $N$ samples from set $S_{t-1}$ with probability $\pi_{t-1}^{(n)}$.
2. **Propagate**: propagates each sample from set $S'_{t-1}$ by the equation given above.
3. **Observe**: observes the colour distributions using the method described.
4. **Estimate**: estimates the mean state of $S_t$.

The generated output frames are saved as JPEG images and stitched together with ffmpeg to create the output video.
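
The stitching step could look roughly like the sketch below, which only builds the ffmpeg command line. The frame naming pattern, the output path, the frame rate, and the helper name `ffmpeg_stitch_command` are all assumptions for illustration; the project's `save_output` helper may work differently.

```python
import subprocess

def ffmpeg_stitch_command(frame_pattern, output_path, fps=30):
    """Build an ffmpeg command that encodes numbered JPEGs into an MP4."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-framerate", str(fps),    # frame rate of the input image sequence
        "-i", frame_pattern,       # printf-style pattern, e.g. frame_%04d.jpg
        "-c:v", "libx264",         # encode the video as H.264
        "-pix_fmt", "yuv420p",     # pixel format that most players support
        output_path,
    ]

# Hypothetical paths; adjust them to the actual output directory.
cmd = ffmpeg_stitch_command("./output/frame_%04d.jpg", "./output/result.mp4")

# To run it (requires ffmpeg on the PATH):
# subprocess.run(cmd, check=True)
```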

### Performance Optimisations
The main parameter, aside from the initial sample set, is the number of particles. With too few particles the model may 'wander' away from the target, while with too many particles the model may be too slow to track the target.
To speed up the algorithm, a video from the MOT17 dataset \cite{mot17}, downscaled to a third of its size and cut to three seconds, is used. The outputs are generated for the woman in green (right side of the image) with `particle_count` $\in \{10, 50, 80\}$ and for the man in black (starting on the left side) with `particle_count` $\in \{20, 30\}$, the latter to display the filter's performance under occlusion.

In [4]:
# reload utils
from importlib import reload
import src.utils
import src.ParticleFilter 
reload(src.utils)
reload(src.ParticleFilter)
from src.utils import *
from src.ParticleFilter import ParticleFilter

In [6]:
name = 'MOT17-09-DPM-raw'
sframe = 50
eframe = 200
p_count = 20
target = 'man_black'
# get_points(videos[name][sframe])

In [7]:
frames = np.copy(videos[name][sframe:eframe])
PF = ParticleFilter(frames, particle_count=p_count, init_x=77, init_y=93, init_Hx=17, init_Hy=35, out_path='./output/')

In [8]:
# pf_df = pd.DataFrame(columns=['Test', 'Time', 'Particle Count'])
# or import from file
pf_df = pd.read_csv('./output/pf_df.csv')

In [None]:
start = time.time()
while PF.f_idx < len(PF.frames) - 1:
    PF.select()
    PF.propagate()
    PF.observe()
    PF.estimate()
pf_time = time.time() - start

In [10]:
pf_df.loc[len(pf_df)] = [(name+'-'+target), pf_time, len(PF.particles)]
pf_df.to_csv('./output/pf_df.csv', index=False)

In [12]:
save_output(output_name=f'{name}_{PF.particle_count}_{target}.mp4')

0

## Results
For the `woman_green` target, the best of the tested particle counts is 50: with 10 particles the accuracy drops, while 80 particles is both slower and more prone to errors in scale (in the final frames only a limb is tracked).

In [11]:
pf_df

| | Test | Time (s) | Particle Count |
|---|---|---|---|
| 0 | MOT17-09-DPM-raw-woman_green | 2087.3 | 50 |
| 1 | MOT17-09-DPM-raw-woman_green | 496.561201 | 10 |
| 2 | MOT17-09-DPM-raw-woman_green | 3197.979189 | 80 |
| 3 | MOT17-09-DPM-raw-man_black | 3441.367042 | 30 |
| 4 | MOT17-09-DPM-raw-man_black | 3487.032923 | 20 |


# Kalman Filter Tracking

In [13]:
from importlib import reload
import src.KalmanFilter
reload(src.KalmanFilter)
from src.KalmanFilter import KalmanFilter