# Computer Vision III: Detection, Segmentation and Tracking (CV3DST) GNN 

We will implement a Message Passing Network from scratch and use it to build a model that learns to combine position information and ReID features to directly predict associations between past tracks and detections. We will then use this model to create a robust tracker.
- Implement a Message Passing Network from scratch to operate on bipartite graphs
- Implement the pairwise feature computation to obtain features for our Message Passing Network
- Train the Message Passing Network and improve your tracker's IDF1 score

#### Install and import Python libraries

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline


In [2]:
import os
import sys
import matplotlib.pyplot as plt
import numpy as np
import time
import copy

from tqdm.autonotebook import tqdm

import torch
from torch import nn
from torch.utils.data import DataLoader
from torch.nn import functional as F

from scipy.optimize import linear_sum_assignment as linear_assignment
import os.path as osp

import motmetrics as mm

mm.lap.default_solver = "lap"




## Import local functions

In [3]:

root_dir = ".."
sys.path.append(os.path.join(root_dir, "src"))


In [4]:

from mot.data.data_track import MOT16Sequences
from mot.tracker.advanced import LongTermReIDHungarianTracker
from mot.utils import cosine_distance, ltrb_to_xcycwh
from mot.eval import get_mot_accum, evaluate_mot_accums, run_tracker
from mot.visualize import plot_sequence

## Speed-Ups
In order to speed up training and inference, in this exercise we will work with pre-computed detections and ReID embeddings. We ran the object detector provided in Exercise 0 on all frames. We also computed ReID embeddings for all boxes in every frame of the dataset so that they don't need to be recomputed every time you run your tracker. This yields speed improvements of over 10x. You will not have to work directly with the resulting files, as we have internally adapted the boilerplate code to work with them.

In [5]:
# gnn_root_dir
# root_dir = '..'


In [6]:
train_db = torch.load(
    osp.join(root_dir, "data/preprocessed_data/preprocessed_data_train_2.pth")
)


In [7]:
val_sequences = MOT16Sequences(
    "MOT16-reid", root_dir=osp.join(root_dir, "data/MOT16"), vis_threshold=0.0
)


## Building a tracker based on Neural Message Passing

Our ``LongTermReIDHungarianTracker`` is still limited when compared to current modern trackers. 

Firstly, it relies solely on appearance to predict similarity scores between objects. This can be problematic whenever appearance alone is not discriminative, in which case it would be best to also take into account object position and size attributes. Secondly, our tracker can only account for pairwise similarities among objects. Ideally, we would like it to also consider higher-order information.

To address these limitations, we will now build a tracker that combines both appearance and position information with a Message Passing Neural Network, inspired by the approach presented in [Learning a Neural Solver for Multiple Object Tracking, CVPR 2020](https://arxiv.org/abs/1912.07515).

The overall idea will be to build, for every tracking step, a bipartite graph containing two sets of nodes: past tracks, and detections in the current frame. We will initialize node features with ReID embeddings, and edge features with relative position features and ReID distance. We will use an MPN to refine these edge embeddings. The learning task will be to classify the edge embeddings in this graph, which is equivalent to predicting the entries of our data association similarity matrix.


### Building an MPN for Bipartite Graphs

We will first build a Neural Message Passing layer based on the Graph Networks framework introduced in [Relational inductive biases, deep learning, and graph networks, arXiv 2018](https://arxiv.org/abs/1806.01261), as explained in the *A More General Framework* part of the slides.

We will be using a bipartite graph, i.e., we will have two sets of nodes $A$ (past tracks), and $B$ (detections), and our set of edges will be $A\times B$. That is, we will connect every pair of past tracks and detections.

We will have initial node feature matrices (i.e., ReID embeddings) $X_A$ and $X_B$, and an initial edge feature tensor $E$.

$X_A$ and $X_B$ have shape $|A|\times \text{node\_dim}$ and $|B|\times \text{node\_dim}$, respectively.

$E$ has shape $|A| \times |B| \times \text{edge\_dim}$. Its $(i, j)$ entry contains the features of the edge connecting node $i$ in $A$ and node $j$ in $B$.

With the given layer, we will produce new node feature matrices $X_A'$ and $X_B'$ and new edge features $E'$ with the same dimensions.
Please refer to the formulas in the slides and figure out how to apply them in this setting.

You are asked to implement both the node and edge update steps in the class below.

**NOTE 1**: Working with a bipartite graph allows us to vectorize all operations in the formulas in a straightforward manner (keep in mind that we store edge features in a matrix). Given a node in $A$, it is connected to all nodes in $B$.

**NOTE 2**: You do not need to care about batching several graphs. This implementation will only work with a single graph at a time.
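
To make the shapes above concrete, here is a minimal sketch of one message passing step on a fully connected bipartite graph. It is not the reference solution: the class name `BipartiteMessagePassingLayer`, the hidden size, the use of sum aggregation, and the choice to share one node MLP between $A$ and $B$ are all assumptions made for illustration.

```python
import torch
from torch import nn


class BipartiteMessagePassingLayer(nn.Module):
    """One message passing step on a fully connected bipartite graph (illustrative)."""

    def __init__(self, node_dim: int, edge_dim: int, hidden_dim: int = 64):
        super().__init__()
        # Edge update MLP: consumes [x_a, x_b, e_ab] for every pair (a, b).
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * node_dim + edge_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, edge_dim),
        )
        # Node update MLP: consumes [x, aggregated incident edge features].
        # Sharing it between A and B is an assumption of this sketch.
        self.node_mlp = nn.Sequential(
            nn.Linear(node_dim + edge_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, node_dim),
        )

    def forward(self, x_a, x_b, e):
        n_a, n_b = e.shape[:2]
        # Broadcast node features to shape |A| x |B| x node_dim.
        x_a_exp = x_a.unsqueeze(1).expand(n_a, n_b, -1)
        x_b_exp = x_b.unsqueeze(0).expand(n_a, n_b, -1)
        # Edge update, vectorized over all |A| * |B| edges at once.
        e_new = self.edge_mlp(torch.cat([x_a_exp, x_b_exp, e], dim=-1))
        # Sum-aggregate updated edges: over B for nodes in A, over A for nodes in B.
        x_a_new = self.node_mlp(torch.cat([x_a, e_new.sum(dim=1)], dim=-1))
        x_b_new = self.node_mlp(torch.cat([x_b, e_new.sum(dim=0)], dim=-1))
        return x_a_new, x_b_new, e_new


layer = BipartiteMessagePassingLayer(node_dim=32, edge_dim=64)
x_a, x_b, e = torch.rand(3, 32), torch.rand(5, 32), torch.rand(3, 5, 64)
x_a_new, x_b_new, e_new = layer(x_a, x_b, e)
print(x_a_new.shape, x_b_new.shape, e_new.shape)
```

Because every node in $A$ is connected to every node in $B$, the aggregation reduces to a sum along one axis of the edge tensor, which is what makes the vectorization mentioned in NOTE 1 straightforward.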

## Building the entire network to predict similarities
We now build the network that generates initial node and edge features, performs neural message passing, and classifies edges in order to produce the final costs that we will use for data association.

We implement the method that computes the initial edge features. You can follow [1] ([Learning a Neural Solver for Multiple Object Tracking](https://arxiv.org/abs/1912.07515)) and, given two bounding boxes $(x_i, y_i, w_i, h_i)$ and $(x_j, y_j, w_j, h_j)$ with timestamps $t_i$ and $t_j$, compute an initial 5-dimensional edge feature vector as:
$$ E_{(i, j)} = \left( \frac{2(x_j - x_i)}{h_i + h_j}, \frac{2(y_j - y_i)}{h_i + h_j}, \log{\frac{h_i}{h_j}}, \log{\frac{w_i}{w_j}}, t_j - t_i \right) $$
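
As an illustration of how this formula vectorizes over all track/detection pairs, here is a small sketch. The function name `compute_edge_feats` and its argument layout are hypothetical, and the boxes are assumed to already be in `(x_center, y_center, w, h)` format; the method inside `SimilarityNet` may be organized differently.

```python
import torch


def compute_edge_feats(track_boxes, det_boxes, track_t, det_t):
    """Pairwise geometric edge features (hypothetical helper).

    track_boxes: |A| x 4 and det_boxes: |B| x 4, both as (x_center, y_center, w, h).
    track_t: |A| and det_t: |B| timestamps as float tensors.
    Returns a tensor of shape |A| x |B| x 5.
    """
    # Column views shaped |A| x 1 (tracks) and 1 x |B| (detections) for broadcasting.
    x_i, y_i, w_i, h_i = [c.unsqueeze(1) for c in track_boxes.unbind(dim=1)]
    x_j, y_j, w_j, h_j = [c.unsqueeze(0) for c in det_boxes.unbind(dim=1)]
    height_sum = h_i + h_j
    return torch.stack(
        [
            2 * (x_j - x_i) / height_sum,
            2 * (y_j - y_i) / height_sum,
            torch.log(h_i / h_j),
            torch.log(w_i / w_j),
            det_t.unsqueeze(0) - track_t.unsqueeze(1),
        ],
        dim=-1,
    )


track_boxes = torch.tensor([[0.0, 0.0, 2.0, 4.0]])
det_boxes = torch.tensor([[3.0, 4.0, 2.0, 4.0], [0.0, 0.0, 1.0, 2.0]])
edge_feats = compute_edge_feats(
    track_boxes, det_boxes, torch.tensor([0.0]), torch.tensor([1.0, 1.0])
)
print(edge_feats.shape)
```

Normalizing the offsets by the box heights and using log-ratios for the sizes keeps the features roughly scale invariant, so the same displacement means the same thing for near and far pedestrians.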




In [8]:
from mot.models.gnn import SimilarityNet

## Putting everything together

Finally, we incorporate our ``SimilarityNet`` into our tracker. We can keep everything as in ``LongTermReIDHungarianTracker`` except for the distance computation, which is now directly obtained via a forward pass through ``SimilarityNet``.
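
For intuition, the sketch below shows how predicted edge similarities could feed the Hungarian matching step. The helper `associate`, the conversion `distance = 1 - similarity`, and the `max_distance` threshold are assumptions made for this example; the actual tracker code in `mot.tracker` may handle these details differently.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def associate(similarity, max_distance=0.5):
    """Match tracks (rows) to detections (columns) from a similarity matrix."""
    # Turn similarities in [0, 1] into costs for the assignment problem.
    distance = 1.0 - similarity
    row_idx, col_idx = linear_sum_assignment(distance)
    # Discard assignments whose cost is too high to be a plausible match.
    return [
        (int(r), int(c))
        for r, c in zip(row_idx, col_idx)
        if distance[r, c] <= max_distance
    ]


# Two past tracks, three detections: the third detection stays unmatched.
similarity = np.array([[0.9, 0.1, 0.2],
                       [0.2, 0.8, 0.3]])
matches = associate(similarity)
print(matches)  # [(0, 0), (1, 1)]
```

Detections left unmatched after this step would typically start new tracks, and unmatched tracks would be kept alive for up to `patience` frames.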

In [9]:
from mot.tracker.advanced import MPNTracker

## Training and evaluating our model

We provide all boilerplate code for training our neural message passing based tracker, as well as for evaluating it.

Under the hood, we sample frames randomly from our training sequences, and then sample boxes from past frames as `past_tracks` to generate our training data. Check out `LongTrackTrainingDataset` for details.

We train the model with a weighted cross-entropy loss to account for the class imbalance. Check out `train_one_epoch` if you're interested.

No need to write any code from your side here!
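
For intuition only, here is a sketch of what a class-balanced edge classification loss can look like. Most track/detection pairs are non-matches, so positive edges are up-weighted. The function `weighted_edge_loss` and the choice of weight (ratio of negatives to positives) are assumptions for this example; `train_one_epoch` may weight the classes differently.

```python
import torch
from torch.nn import functional as F


def weighted_edge_loss(edge_logits, edge_labels):
    """Binary cross-entropy over edges, with positives up-weighted (illustrative)."""
    num_pos = edge_labels.sum()
    num_neg = edge_labels.numel() - num_pos
    # Weight positives by the negative/positive ratio; guard against zero positives.
    pos_weight = num_neg / num_pos.clamp(min=1.0)
    return F.binary_cross_entropy_with_logits(
        edge_logits, edge_labels, pos_weight=pos_weight
    )


# 3 past tracks x 4 detections, with one true match per track.
edge_labels = torch.zeros(3, 4)
edge_labels[0, 0] = edge_labels[1, 2] = edge_labels[2, 3] = 1.0
edge_logits = torch.zeros(3, 4)  # an untrained model predicting 0.5 everywhere
loss = weighted_edge_loss(edge_logits, edge_labels)
print(loss.item())
```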


In [10]:
from mot.data.data_gnn import LongTrackTrainingDataset
from torch.utils.data import DataLoader
from mot.trainer import train_one_epoch

MAX_PATIENCE = 20
MAX_EPOCHS = 15
EVAL_FREQ = 1
device = torch.device("cpu")


# Define our model, and init
similarity_net = SimilarityNet(
    reid_network=None,  # Not needed since we work with precomputed features
    node_dim=32,
    edge_dim=64,
    reid_dim=512,
    edges_in_dim=6,
    num_steps=10,
).to(device)

# We only keep two sequences for validation. You can
dataset = LongTrackTrainingDataset(
    dataset="MOT16-train_wo_val2",
    db=train_db,
    root_dir=osp.join(root_dir, "data/MOT16"),
    max_past_frames=MAX_PATIENCE,
    vis_threshold=0.25,
)

data_loader = DataLoader(
    dataset,
    batch_size=8,
    collate_fn=lambda x: x,
    shuffle=True,
    num_workers=2,
    drop_last=True,
)
optimizer = torch.optim.Adam(similarity_net.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5)


We only leave 2 sequences for validation in order to maximize the amount of training data. For your convenience, here are the LongTermReIDTracker results on them. Your validation IDF1 scores should show an improvement over them.
```

          IDF1   IDP   IDR  Rcll  Prcn  GT MT PT ML  FP    FN IDs   FM  MOTA  MOTP
MOT16-02 45.9% 65.1% 35.4% 52.2% 96.1%  62 12 37 13 390  8873 130  210 49.4% 0.090
MOT16-11 68.3% 75.3% 62.5% 80.2% 96.6%  75 44 24  7 266  1871 136   90 75.9% 0.083
OVERALL  54.3% 69.6% 44.5% 61.7% 96.3% 137 56 61 20 656 10744 266  300 58.4% 0.087
```
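
As a quick sanity check on how to read this table, IDF1 is the harmonic mean of ID precision (IDP) and ID recall (IDR). The short sketch below recomputes the IDF1 column from the IDP and IDR values listed above; the helper name `idf1_from_idp_idr` is made up for this example.

```python
def idf1_from_idp_idr(idp, idr):
    """Harmonic mean of ID precision and ID recall (both given in percent)."""
    return 2 * idp * idr / (idp + idr)


# (IDP, IDR) pairs copied from the baseline table above.
baseline = {
    "MOT16-02": (65.1, 35.4),
    "MOT16-11": (75.3, 62.5),
    "OVERALL": (69.6, 44.5),
}
for name, (idp, idr) in baseline.items():
    print(f"{name}: IDF1 = {idf1_from_idp_idr(idp, idr):.1f}%")
# MOT16-02: IDF1 = 45.9%
# MOT16-11: IDF1 = 68.3%
# OVERALL: IDF1 = 54.3%
```

Since the detections are fixed, recall and precision of the boxes stay the same between trackers; improvements in IDF1 come from assigning more consistent identities.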



Let's start training!

Note that we have observed quite a lot of noise in validation scores across epochs and runs. This can be explained by the small size of our training and validation sets, which is why we perform early stopping to keep the best-performing model on validation. In addition, changing the experiment seed and/or relaunching the training might help if you suspect that noise is influencing your scores.
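
The notebook does not show where the seed is set, so the following is only one common way to do it. The helper `set_seed` is hypothetical; it seeds Python, NumPy and PyTorch so that a run can be repeated or deliberately varied.

```python
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    """Seed the Python, NumPy and PyTorch random number generators."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)


# Re-seeding with the same value reproduces the same random draws.
set_seed(0)
first = torch.rand(3)
set_seed(0)
second = torch.rand(3)
print(torch.equal(first, second))  # True
```

Calling `set_seed` with a different value before creating the model and the `DataLoader` would give a different initialization and sampling order, which is a cheap way to check how much of a score difference is just noise.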

In [12]:
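# NOTE: `MPNTrackerOfflineDet` is not imported above; it must be available
# in the session for this cell to run.
tracker = MPNTrackerOfflineDet(
    similarity_net=similarity_net.eval(),
    device=device,
    obj_detect=None,
    patience=MAX_PATIENCE,
)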

In [None]:
best_idf1 = 0.0
for epoch in range(1, MAX_EPOCHS + 1):
    print(f"-------- EPOCH {epoch:2d} --------")
    train_one_epoch(
        model=similarity_net, data_loader=data_loader, optimizer=optimizer, print_freq=50
    )
    scheduler.step()

    if epoch % EVAL_FREQ == 0:
        tracker = MPNTracker(
            similarity_net=similarity_net.eval(), obj_detect=None, patience=MAX_PATIENCE
        )
        val_sequences = MOT16Sequences(
            "MOT16-val2", osp.join(root_dir, "data/MOT16"), vis_threshold=0.0
        )
        res = run_tracker(val_sequences, db=train_db, tracker=tracker, output_dir=None)
        idf1 = res.loc["OVERALL"]["idf1"]
        if idf1 > best_idf1:
            best_idf1 = idf1
            torch.save(
                similarity_net.state_dict(), osp.join(root_dir, "output", "best_ckpt.pth")
            )


            IDF1   IDP   IDR  Rcll  Prcn  GT MT PT ML  FP    FN IDs   FM  MOTA  MOTP IDt IDa IDm
    MOT16-02 48.5% 68.8% 37.4% 52.2% 96.1%  62 11 38 13 390  8873  98  222 49.6% 0.095  64  44  14
    MOT16-11 70.3% 77.5% 64.3% 80.2% 96.6%  75 44 24  7 266  1871  36   90 77.0% 0.083  33  13  15
    OVERALL  56.7% 72.6% 46.5% 61.7% 96.3% 137 55 62 20 656 10744 134  312 58.8% 0.090  97  57  29

# Infer tracker 

In [13]:
output_dir = None
db = train_db

time_total = 0
mot_accums = []
results_seq = {}
for seq in val_sequences:
    # break
    tracker.reset()
    now = time.time()

    print(f"Tracking: {seq}")

    # data_loader = DataLoader(seq, batch_size=1, shuffle=False)
    with torch.no_grad():
        # for i, frame in enumerate(tqdm(data_loader)):
        for frame in db[str(seq)]:
            tracker.step(frame)

    results = tracker.get_results()
    results_seq[str(seq)] = results

    if seq.no_gt:
        print("No GT evaluation data available.")
    else:
        mot_accums.append(get_mot_accum(results, seq))

    time_total += time.time() - now

    print(f"Tracks found: {len(results)}")
    print(f"Runtime for {seq}: {time.time() - now:.1f} s.")

    if output_dir is not None:
        os.makedirs(output_dir, exist_ok=True)
        seq.write_results(results, os.path.join(output_dir))

print(f"Runtime for all sequences: {time_total:.1f} s.")
if mot_accums:
    evaluate_mot_accums(
        mot_accums,
        [str(s) for s in val_sequences if not s.no_gt],
        generate_overall=True,
    )


Tracking: MOT16-02
Tracks found: 24
Runtime for MOT16-02: 3.4 s.
Tracking: MOT16-05
Tracks found: 12
Runtime for MOT16-05: 3.2 s.
Tracking: MOT16-09
Tracks found: 10
Runtime for MOT16-09: 2.0 s.
Tracking: MOT16-11
Tracks found: 19
Runtime for MOT16-11: 3.9 s.
Runtime for all sequences: 12.5 s.
         IDF1   IDP  IDR  Rcll  Prcn  GT  MT  PT ML   FP    FN   IDs   FM  MOTA  MOTP   IDt IDa IDm
MOT16-02 3.7%  5.3% 2.9% 52.2% 96.1%  62  12  37 13  390  8873  9494  340 -0.9% 0.088  9482   9  39
MOT16-05 4.5%  5.3% 3.9% 68.8% 94.0% 133  54  67 12  305  2156  4506  169 -0.7% 0.141  4536   6 117
MOT16-09 9.1% 11.3% 7.6% 66.3% 97.7%  26  13  12  1   83  1794  3240  146  3.9% 0.083  3226   6  22
MOT16-11 5.3%  5.8% 4.8% 80.2% 96.6%  75  44  24  7  266  1871  7388   99 -0.9% 0.083  7358   4  57
OVERALL  5.0%  6.3% 4.1% 63.5% 96.1% 296 123 140 33 1044 14694 24628  754 -0.3% 0.096 24602  25 235


## Visualize

In [None]:
plot_sequence(
    results_seq["MOT16-02"],
    [s for s in val_sequences if str(s) == "MOT16-02"][0],
    first_n_frames=2,
)


# Test data


In [None]:
best_ckpt = torch.load(osp.join(root_dir, "output", "best_ckpt.pth"))
similarity_net.load_state_dict(best_ckpt)

tracker = MPNTrackerOfflineDet(
    similarity_net=similarity_net.eval(), device=device, obj_detect=None, patience=MAX_PATIENCE
)
test_db = torch.load(
    osp.join(root_dir, "data/preprocessed_data/preprocessed_data_test_2.pth")
)
val_sequences = MOT16Sequences(
    "MOT16-test", osp.join(root_dir, "data/MOT16"), vis_threshold=0.0
)
run_tracker(val_sequences, db=test_db, tracker=tracker, output_dir=None)
