<img align="center" src="img/course.png" width="800">

# 16720 (B)  Object Tracking in Videos - Assignment 6 - Q2
    Instructor: Kris                          TAs: Arka, Rohan, Rawal, Sheng-Yu, Jinkun


In [None]:
# Libraries

import numpy as np
from scipy.interpolate import RectBivariateSpline
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.patches as patches
%matplotlib inline

## Q2: Matthew-Bakers Inverse Compositional Alignment with Affine Matrix

### Q2.1: Implementation (10 PT write-up, 20 PT implementation)
Now we will implement the Matthew-Bakers tracker to reduce the computational cost of the Lucas-Kanade tracker: the inverse compositional formulation computes the Jacobian and Hessian only once per template rather than at every iteration. Write a function with the following signature:

```
            M = InverseCompositionAffine(It, It1, rect)
```
that computes the optimal local motion represented by a $2 \times 3$ affine transformation matrix $M$ from frame $I_t$ to frame $I_{t+1}$ that minimizes

$$
\begin{gathered}
\mathcal{L}=\sum_{\mathbf{x}}[\mathbf{T}(\mathbf{x})-\mathbf{I}(\mathbf{W}(\mathbf{x} ; \mathbf{p}))]^{2}. 
\end{gathered}
$$
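
As a brief reminder of the formulation in the linked Baker and Matthews papers (the symbols $\Delta \mathbf{p}$ and $\mathbf{H}$ follow those papers rather than this handout), the inverse compositional algorithm solves for an increment on the template side at each iteration, with the warp Jacobian evaluated at $\mathbf{p}=\mathbf{0}$, and then composes the inverse of that increment with the current warp:

$$
\begin{gathered}
\min_{\Delta \mathbf{p}} \sum_{\mathbf{x}}[\mathbf{T}(\mathbf{W}(\mathbf{x} ; \Delta \mathbf{p}))-\mathbf{I}(\mathbf{W}(\mathbf{x} ; \mathbf{p}))]^{2}, \\
\Delta \mathbf{p}=\mathbf{H}^{-1} \sum_{\mathbf{x}}\left[\nabla \mathbf{T} \frac{\partial \mathbf{W}}{\partial \mathbf{p}}\right]^{\top}[\mathbf{I}(\mathbf{W}(\mathbf{x} ; \mathbf{p}))-\mathbf{T}(\mathbf{x})], \qquad \mathbf{H}=\sum_{\mathbf{x}}\left[\nabla \mathbf{T} \frac{\partial \mathbf{W}}{\partial \mathbf{p}}\right]^{\top}\left[\nabla \mathbf{T} \frac{\partial \mathbf{W}}{\partial \mathbf{p}}\right], \\
\mathbf{W}(\mathbf{x} ; \mathbf{p}) \leftarrow \mathbf{W}(\mathbf{x} ; \mathbf{p}) \circ \mathbf{W}(\mathbf{x} ; \Delta \mathbf{p})^{-1}.
\end{gathered}
$$

Because $\nabla \mathbf{T}$ and the Jacobian do not depend on $\mathbf{p}$, the steepest-descent images and $\mathbf{H}$ can be computed before the iteration starts.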

The inputs are structured identically to Q1.2, but you should replace the forward alignment algorithm with the inverse compositional alignment algorithm. You may also find these materials useful: [link](https://www.ri.cmu.edu/pub_files/pub3/baker_simon_2002_3/baker_simon_2002_3.pdf) and [link](https://www.ri.cmu.edu/pub_files/pub3/baker_simon_2003_3/baker_simon_2003_3.pdf).
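
To make the "compute once, reuse every iteration" idea concrete, here is a minimal NumPy sketch of the two building blocks of the inverse compositional update. The helper names `steepest_descent_and_hessian` and `compose_inverse` are hypothetical and not part of the assignment API, and the parameter ordering is an assumption: $\mathbf{W}(\mathbf{x};\mathbf{p})$ is taken to be the matrix `[[1+p1, p2, p3], [p4, 1+p5, p6]]`.

```python
import numpy as np

def steepest_descent_and_hessian(T_x, T_y, x, y):
    """Precompute the N x 6 steepest-descent matrix and the 6 x 6 Hessian.

    T_x, T_y: template gradients at the N sample points.
    x, y: coordinates of those points.
    Assumed parameter order: W(x; p) = [[1+p1, p2, p3], [p4, 1+p5, p6]].
    """
    # Each row is [T_x, T_y] times the 2 x 6 affine Jacobian at (x, y).
    A = np.stack([T_x * x, T_x * y, T_x, T_y * x, T_y * y, T_y], axis=1)
    H = A.T @ A  # Gauss-Newton Hessian; depends only on the template
    return A, H

def compose_inverse(M, dp):
    """Compose the 3 x 3 homogeneous warp M with the inverse of the increment dp."""
    dM = np.eye(3)
    dM[:2, :] += np.asarray(dp, dtype=float).reshape(2, 3)
    return M @ np.linalg.inv(dM)

# Usage on made-up data: if the error image is exactly A @ dp_true,
# solving the normal equations recovers dp_true.
rng = np.random.default_rng(0)
x, y = rng.uniform(0, 50, 200), rng.uniform(0, 50, 200)
T_x, T_y = rng.normal(size=200), rng.normal(size=200)
A, H = steepest_descent_and_hessian(T_x, T_y, x, y)
dp_true = np.array([0.01, -0.02, 1.5, 0.03, 0.02, -0.7])
dp = np.linalg.solve(H, A.T @ (A @ dp_true))
M = compose_inverse(np.eye(3), dp)
```

Inside a tracker, only the error image changes between iterations, so `A` and `H` would be built once before the loop and the solve repeated with a new right-hand side.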

<span style='color:red'>**Output:**</span> In your write-up: Please include the results of the algorithm on all five videos we have provided along with your code. Compare the results of the Matthew-Bakers Tracker with the previous algorithms you have implemented. How do your algorithms perform on each video? What are the differences of the three algorithms in terms of performance and why do they have those differences? At what point does the algorithm break down and why does this happen?

In [None]:
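import warnings

import torch

# Select the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('*' * 50)
if torch.cuda.is_available():
    print('CUDA is found! Training on %s.......' % torch.cuda.get_device_name(0))
else:
    warnings.warn('CUDA not found! Training may be slow......')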

In [None]:
def InverseCompositionAffine(It, It1, rect, thresh=.025, maxIters=100):
    '''
    Q2.1: Matthew-Bakers Inverse Compositional Alignment with Affine Matrix
    
      Inputs: 
        It: template image
        It1: Current image
        rect: Current position of the object
        (top left, bottom right coordinates, x1, y1, x2, y2)
        thresh: Stop condition when dp is too small
        maxIters: Maximum number of iterations to run
        
      Outputs:
        M: Affine matrix (2x3)
    '''
    # Homogeneous 3x3 warp, initialized to the identity
    M = np.eye(3)

    x1, y1, x2, y2 = rect

    if x2 < x1 or y2 < y1: 
        return M[: 2]
    
    s1 = np.arange(It.shape[0])
    s2 = np.arange(It.shape[1])
    s3 = np.arange(It1.shape[0])
    s4 = np.arange(It1.shape[1])
        
    interit = RectBivariateSpline(s1, s2, It) 
    interit1 = RectBivariateSpline(s3, s4, It1)
    
    random_x = np.arange(x1, x2 + 0.1)
    random_y = np.arange(y1, y2 + 0.1)
    x, y = np.meshgrid(random_x, random_y)
    x = x.flatten()
    y = y.flatten()     
    T_ev = interit.ev(y, x)
    
    coords_ = np.hstack((x.reshape(-1, 1), y.reshape(-1, 1)))
        
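    # Template gradients. The spline is indexed (row, col) = (y, x), so dy=1
    # differentiates along image x and dx=1 along image y.
    T_x = interit.ev(y, x, dx=0, dy=1)
    T_y = interit.ev(y, x, dx=1, dy=0)

    # Jacobian of the affine warp at every template point, shape (N, 2, 6)
    A = np.zeros((x.shape[0], 2, 6))
    
    A[:,0,0] = A[:,1,3] =  x
    A[:,0,1] = A[:,1,4] =  y
    A[:,0,2] = A[:,1,5] =  1
    
    grad = np.hstack((T_x.reshape(-1, 1), T_y.reshape(-1, 1))).reshape(-1, 1, 2)
    
    # Steepest-descent images, shape (N, 6), computed once before the loop
    A = np.matmul(grad, A).reshape(-1, 6)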
    
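    for i in range(maxIters):

        # Warp the template coordinates into the current image with M
        coords = M @ (np.hstack((coords_, np.ones(coords_.shape[0]).reshape(-1, 1))).T)
        
        x = coords[0].flatten()
        y = coords[1].flatten()
                
        I_ev = interit1.ev(y, x)

        # Error image: warped current image minus template
        b = I_ev - T_ev
        
        # Least-squares solve for the 6 incremental warp parameters
        dp = np.linalg.lstsq(A, b, rcond=None)[0] 

        dp = dp.reshape(2, 3)
        
        # Compose M with the inverse of the incremental warp
        dM = np.eye(3)
        dM = dM + np.vstack((dp, np.array([0, 0, 0])))
        M = M @ np.linalg.pinv(dM)
    
        # Stop once the update is small enough
        if np.linalg.norm(dp) <= thresh:
            break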
    
    return M[: -1]

In [None]:
# Test your algorithm and visualize results!

# Load data
data_name = 'car2' # could choose from (car1, car2, landing, race, ballet)
data = np.load('./data/%s.npy' % data_name)

# obtain the initial rect with format (x1, y1, x2, y2)
if data_name == 'car1':
    initial = np.array([170, 130, 290, 250])
elif data_name == 'car2':
    initial = np.array([59, 116, 145, 151])
elif data_name == 'landing':
    initial = np.array([440, 80, 560, 140])
elif data_name == 'race':
    initial = np.array([170, 270, 300, 370])
elif data_name == 'ballet':
    initial = np.array([700, 210, 775, 300])
else:
    assert False, 'the data name must be one of (car1, car2, landing, race, ballet)'

numFrames = data.shape[2]
w = initial[2] - initial[0]
h = initial[3] - initial[1]

# loop over frames
rects = []
rects.append(initial)

for i in range(numFrames-1):

    It = data[:,:,i]
    It1 = data[:,:,i+1]
    rect = rects[i]

    # run algorithm and collect rects
    M = InverseCompositionAffine(It, It1, rect)
    corners = np.array([[rect[0], rect[1], 1], 
                        [rect[2], rect[3], 1]]).transpose()
    newRect = np.matmul(M, corners).transpose().reshape((4, ))
    rects.append(newRect)

    # Visualize
    fig = plt.figure(1)
    ax = fig.add_subplot(111)
    # Draw the updated rectangle, which corresponds to the displayed frame It1
    ax.add_patch(patches.Rectangle((newRect[0], newRect[1]), newRect[2]-newRect[0]+1, newRect[3]-newRect[1]+1, linewidth=2, edgecolor='red', fill=False))
    plt.imshow(It1, cmap='gray')
    plt.show()
    ax.clear()

In [None]:
# For some transparency: we evaluate on multiple frames in a given video starting from the first frame.
# We then compare against the reference implementation and calculate the sum of all differences.
# You should not need to tune anything for the autograding. We pass in the same hyperparameters for you.


### Q2.2: Comparing Your Algorithms (write-up only, 10 PT)
Compare the results of the Matthew-Bakers Tracker with the previous algorithms you have implemented. How do your algorithms perform on each video? How do the three algorithms differ in performance, and why? At what point does the algorithm break down, and why does this happen?
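
For the runtime part of the comparison, one option is a small timing harness like the sketch below. The helper `time_tracker` is hypothetical and not part of the assignment; it assumes each tracker takes `(It, It1, rect)` and returns a $2 \times 3$ affine matrix, as `InverseCompositionAffine` does. Trackers from earlier questions that return something else, such as a translation vector, would need a thin wrapper first.

```python
import time
import numpy as np

def time_tracker(tracker, frames, rect):
    """Return (mean seconds per frame pair, final rect) for a tracker.

    tracker(It, It1, rect) is assumed to return a 2 x 3 affine matrix M.
    frames is an H x W x N array; rect is (x1, y1, x2, y2).
    """
    rect = np.asarray(rect, dtype=float)
    elapsed = []
    for i in range(frames.shape[2] - 1):
        start = time.perf_counter()
        M = tracker(frames[:, :, i], frames[:, :, i + 1], rect)
        elapsed.append(time.perf_counter() - start)
        # Move the two rectangle corners with the estimated warp.
        corners = np.array([[rect[0], rect[1], 1.0],
                            [rect[2], rect[3], 1.0]]).T
        rect = (M @ corners).T.reshape(4)
    return float(np.mean(elapsed)), rect

# Usage with a stand-in tracker that always reports "no motion".
def identity_tracker(It, It1, rect):
    return np.eye(3)[:2]

frames = np.zeros((20, 30, 5))
mean_seconds, final_rect = time_tracker(identity_tracker, frames, [2, 3, 10, 12])
```

Timing only the tracker call keeps plotting and data loading out of the measurement, so the numbers reflect the alignment cost itself.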