# Computer Vision III: Detection, Segmentation and Tracking (CV3DST) GNN 

We will implement a Message Passing Network from scratch and use it to build a model that learns to combine position information and ReID features to directly predict associations between past tracks and detections. We will then use this model to create a robust tracker.
- Implement a Message Passing Network from scratch to operate on bipartite graphs
- Implement the pairwise feature computation to obtain features for our Message Passing Network
- Train the Message Passing Network and improve your tracker's IDF1 score

#### Install and import Python libraries

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline


In [108]:
import os
import sys

import matplotlib.pyplot as plt
import numpy as np
import time
import copy

from tqdm.autonotebook import tqdm

import torch
from torch import nn
from torch.utils.data import DataLoader
from torch.nn import functional as F

from tracker.data_track import MOT16Sequences
from tracker.tracker import Tracker, ReIDTracker
from tracker.predef_tracker import LongTermReIDHungarianPredefTracker
from tracker.utils import run_tracker, cosine_distance
from scipy.optimize import linear_sum_assignment as linear_assignment
import os.path as osp

import motmetrics as mm
mm.lap.default_solver = 'lap'

In [None]:

root_dir = ".."
sys.path.append(os.path.join(root_dir, 'src'))


## Speed-Ups
In order to speed up training and inference runtimes, in this exercise we will be working with pre-computed detections and ReID embeddings. We ran the object detector provided in Exercise 0 on all frames. We also computed ReID embeddings for all boxes in every frame of the dataset so that they don't need to be recomputed every time you run your tracker. This yields over 10x speed improvements. You will not have to work directly with the resulting files, as we have internally adapted the boilerplate code to work with them.

In [4]:
#gnn_root_dir
#root_dir = '..'

In [5]:
train_db = torch.load(osp.join(root_dir, 'data/preprocessed_data/preprocessed_data_train_2.pth'))

In [6]:
_UNMATCHED_COST = 255


In [7]:
val_sequences = MOT16Sequences('MOT16-reid', root_dir = osp.join(root_dir, 'data/MOT16'), vis_threshold=0.)

## Building a tracker based on Neural Message Passing

Our ``LongTermReIDHungarianTracker`` is still limited when compared to current modern trackers. 

Firstly, it relies solely on appearance to predict similarity scores between objects. This can be problematic whenever appearance alone is not discriminative, in which case it would be best to also take object position and size into account. Secondly, our tracker can only account for pairwise similarities among objects. Ideally, we would like it to also consider higher-order information.

To address these limitations, we will now build a tracker that combines both appearance and position information with a Message Passing Neural Network, inspired by the approach presented in [Learning a Neural Solver for Multiple Object Tracking, CVPR 2020](https://arxiv.org/abs/1912.07515).

The overall idea will be to build, for every tracking step, a bipartite graph containing two sets of nodes: past tracks, and detections in the current frame. We will initialize node features with ReID embeddings, and edge features with relative position features and ReID distance. We will use an MPN to refine these edge embeddings. The learning task will be to classify the edge embeddings in this graph, which is equivalent to predicting the entries of our data association similarity matrix.


### Building an MPN for Bipartite Graphs

We will first build a Neural Message Passing layer based on the Graph Networks framework introduced in [Relational inductive biases, deep learning, and graph networks, arXiv 2018](https://arxiv.org/abs/1806.01261), as explained in *A More General Framework*.

We will be using a bipartite graph, i.e., we will have two sets of nodes $A$ (past tracks), and $B$ (detections), and our set of edges will be $A\times B$. That is, we will connect every pair of past tracks and detections.

We will have initial node feature matrices (i.e. ReID embeddings) $X_A$ and $X_B$, and an initial edge feature tensor $E$.

$X_A$ and $X_B$ have shape $|A|\times \text{node\_dim}$ and $|B|\times \text{node\_dim}$, respectively.

$E$ has shape $|A| \times |B| \times \text{edge\_dim}$. Its $(i, j)$ entry contains the edge features of node $i$ in $A$ and node $j$ in $B$.

With the given layer, we will produce new node feature matrices $X_A'$ and $X_B'$ and edge features $E'$ with the same dimensions. 
Please refer to the formulas in the slides and figure out how to apply them in this setting.

You are asked to implement both the node and edge update steps in the class below.

**NOTE 1**: Working with a bipartite graph allows us to vectorize all operations in the formulas in a straightforward manner (keep in mind that we store edge features in a matrix). Given a node in $A$, it is connected to all nodes in $B$.

**NOTE 2**: You do not need to care about batching several graphs. This implementation will only work with a single graph at a time.
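
To make the sum aggregation over a bipartite graph concrete, here is a minimal pure-Python sketch on a toy graph with $|A| = 2$, $|B| = 3$ and `edge_dim = 2`. The helper names `aggregate_for_a` and `aggregate_for_b` are hypothetical and only illustrate what a vectorized sum over one axis of the edge tensor computes; they are not part of the exercise code.

```python
# Toy edge embeddings with shape (|A|, |B|, edge_dim) = (2, 3, 2), stored as nested lists.
edge_embeds = [
    [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]],  # edges incident to track a_0
    [[4.0, 1.0], [0.0, 0.0], [1.0, 1.0]],  # edges incident to track a_1
]


def aggregate_for_a(edges):
    """Sum over B: one aggregated edge vector per node in A (hypothetical helper)."""
    return [[sum(channel) for channel in zip(*row)] for row in edges]


def aggregate_for_b(edges):
    """Sum over A: one aggregated edge vector per node in B (hypothetical helper)."""
    return [[sum(channel) for channel in zip(*column)] for column in zip(*edges)]


agg_a = aggregate_for_a(edge_embeds)  # 2 vectors of length edge_dim
agg_b = aggregate_for_b(edge_embeds)  # 3 vectors of length edge_dim
```

With the edge features stored as a tensor of shape $|A| \times |B| \times \text{edge\_dim}$, these two results correspond to a sum over the second and the first dimension, respectively.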

In [189]:

class BipartiteNeuralMessagePassingLayer(nn.Module):    
    def __init__(self, node_dim, edge_dim, dropout=0.):
        super().__init__()

        edge_in_dim  = 2*node_dim + 2*edge_dim # 2*edge_dim since we always concatenate initial edge features
        self.edge_mlp = nn.Sequential(*[nn.Linear(edge_in_dim, edge_dim), nn.ReLU(), nn.Dropout(dropout), 
                                    nn.Linear(edge_dim, edge_dim), nn.ReLU(), nn.Dropout(dropout)])

        node_in_dim  = node_dim + edge_dim
        self.node_mlp = nn.Sequential(*[nn.Linear(node_in_dim, node_dim), nn.ReLU(), nn.Dropout(dropout),  
                                        nn.Linear(node_dim, node_dim), nn.ReLU(), nn.Dropout(dropout)])

    def edge_update(self, edge_embeds, nodes_a_embeds, nodes_b_embeds):
        """
        Node-to-edge updates, as described in slide 71, lecture 5.
        Args:
            edge_embeds: torch.Tensor with shape (|A|, |B|, 2 x edge_dim) 
            nodes_a_embeds: torch.Tensor with shape (|A|, node_dim)
            nodes_b_embeds: torch.Tensor with shape (|B|, node_dim)
            
        returns:
            updated_edge_feats = torch.Tensor with shape (|A|, |B|, edge_dim) 
        """
        n_nodes_a, n_nodes_b, _  = edge_embeds.shape
        nodes_a_in = nodes_a_embeds.unsqueeze(1).expand((n_nodes_a, n_nodes_b, -1))
        nodes_b_in = nodes_b_embeds.unsqueeze(0).expand((n_nodes_a, n_nodes_b, -1))

        # edge_in has shape (|A|, |B|, 2*node_dim + 2*edge_dim) 
        edge_in = torch.cat((nodes_a_in, edge_embeds, nodes_b_in), dim=-1) 
        return self.edge_mlp(edge_in)

    def node_update(self, edge_embeds, nodes_a_embeds, nodes_b_embeds):
        """
        Edge-to-node updates, as described in slide 75, lecture 5.

        Args:
            edge_embeds: torch.Tensor with shape (|A|, |B|, edge_dim) 
            nodes_a_embeds: torch.Tensor with shape (|A|, node_dim)
            nodes_b_embeds: torch.Tensor with shape (|B|, node_dim)
            
        returns:
            tuple(
                updated_nodes_a_embeds: torch.Tensor with shape (|A|, node_dim),
                updated_nodes_b_embeds: torch.Tensor with shape (|B|, node_dim)
                )
        """

        # Use 'sum' as the aggregation function: each node aggregates the
        # embeddings of all edges it is incident to.
        nodes_a_neigh_embeds = torch.sum(edge_embeds, dim=1)  # sum over B -> shape (|A|, edge_dim)
        nodes_b_neigh_embeds = torch.sum(edge_embeds, dim=0)  # sum over A -> shape (|B|, edge_dim)
        nodes_a_in = torch.cat((nodes_a_embeds, nodes_a_neigh_embeds), dim=-1)  # Has shape (|A|, node_dim + edge_dim)
        nodes_b_in = torch.cat((nodes_b_embeds, nodes_b_neigh_embeds), dim=-1)  # Has shape (|B|, node_dim + edge_dim)


        nodes_a = self.node_mlp(nodes_a_in)
        nodes_b = self.node_mlp(nodes_b_in)

        return nodes_a, nodes_b

    def forward(self, edge_embeds, nodes_a_embeds, nodes_b_embeds):
        edge_embeds_latent = self.edge_update(edge_embeds, nodes_a_embeds, nodes_b_embeds)
        nodes_a_latent, nodes_b_latent = self.node_update(edge_embeds_latent, nodes_a_embeds, nodes_b_embeds)

        return edge_embeds_latent, nodes_a_latent, nodes_b_latent

## Building the entire network to predict similarities
We now build the network that generates initial node and edge features, performs neural message passing, and classifies edges in order to produce the final costs that we will use for data association.

We now implement the method that computes the initial edge features. You can follow [1] (the Neural Solver paper linked above) and, given two bounding boxes $(x_i, y_i, w_i, h_i)$ and $(x_j, y_j, w_j, h_j)$ and timestamps $t_i$ and $t_j$, compute an initial 5-dimensional edge feature vector as:
$$ E_{(i, j)} = \left (\frac{2(x_j - x_i)}{h_i + h_j}, \frac{2(y_j - y_i)}{h_i + h_j}, \log{\frac{h_i}{h_j}}, \log{\frac{w_i}{w_j}}, t_j - t_i \right )$$
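
As a worked example of this formula, the following sketch evaluates it for a single pair of boxes using only the standard library. The function name `edge_features` and the box values are made up for illustration, and boxes are assumed to be given as center coordinates plus width and height.

```python
import math


def edge_features(box_i, box_j, t_i, t_j):
    """Evaluate the 5-dimensional edge feature for one (i, j) pair.

    Boxes are (x_center, y_center, width, height) tuples (illustrative layout).
    """
    x_i, y_i, w_i, h_i = box_i
    x_j, y_j, w_j, h_j = box_j
    height_sum = h_i + h_j
    return (
        2 * (x_j - x_i) / height_sum,  # horizontal offset, normalized by mean height
        2 * (y_j - y_i) / height_sum,  # vertical offset, normalized by mean height
        math.log(h_i / h_j),           # relative height
        math.log(w_i / w_j),           # relative width
        t_j - t_i,                     # time difference in frames
    )


# Track box seen at frame 3, detection at frame 5 (made-up numbers).
features = edge_features((100.0, 200.0, 40.0, 80.0), (110.0, 190.0, 50.0, 100.0), 3, 5)
# roughly (0.111, -0.111, -0.223, -0.223, 2)
```

Normalizing the offsets by the box heights makes the features roughly invariant to how far the objects are from the camera, and the log ratios treat growing and shrinking boxes symmetrically.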




In [190]:

def ltrb_to_xcycwh(ltrb_boxes):
    xcycwh = copy.deepcopy(ltrb_boxes)
    xcycwh[:, 0] = (ltrb_boxes[:, 2] + ltrb_boxes[:, 0])/2 # x_center = (right + left) / 2
    xcycwh[:, 1] = (ltrb_boxes[:, 3] + ltrb_boxes[:, 1])/2 # y_center = (bottom + top) / 2
    xcycwh[:, 2] = ltrb_boxes[:, 2] - ltrb_boxes[:, 0] # width
    xcycwh[:, 3] = ltrb_boxes[:, 3] - ltrb_boxes[:, 1] # height
    return xcycwh

In [407]:
class AssignmentSimilarityNet(nn.Module):
    def __init__(self, reid_network, node_dim, edge_dim, reid_dim, edges_in_dim, num_steps, dropout=0.):
        super().__init__()
        self.reid_network = reid_network
        self.graph_net = BipartiteNeuralMessagePassingLayer(node_dim=node_dim, edge_dim=edge_dim, dropout=dropout)
        self.num_steps = num_steps
        self.cnn_linear = nn.Linear(reid_dim, node_dim)
        self.edge_in_mlp = nn.Sequential(*[
            nn.Linear(edges_in_dim, edge_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(edge_dim, edge_dim),
            nn.ReLU(),
            nn.Dropout(dropout)
        ])
        self.classifier = nn.Sequential(*[nn.Linear(edge_dim, edge_dim), nn.ReLU(), nn.Linear(edge_dim, 1)])
        
    
    def compute_edge_feats(self, track_coords, current_coords, track_t, curr_t):    
        """
        Computes initial edge feature tensor

        Args:
            track_coords: track's frame box coordinates, given by top-left and bottom-right coordinates
                          torch.Tensor with shape (num_tracks, 4)
            current_coords: current frame box coordinates, given by top-left and bottom-right coordinates
                            has shape (num_boxes, 4)
                          
            track_t: track's timestamps, torch.Tensor with shape (num_tracks, )
            curr_t: current frame's timestamps, torch.Tensor with shape (num_boxes,)

        Returns:
            tensor with shape (num_tracks, num_boxes, 5) containing pairwise
            position and time difference features 
        """

        num_boxes = current_coords.shape[0]
        num_tracks = track_coords.shape[0]
        
        track_coords = ltrb_to_xcycwh(track_coords)
        current_coords = ltrb_to_xcycwh(current_coords)
       
        track_coords = track_coords.unsqueeze_(1).expand(num_tracks,num_boxes, 4)
        current_coords = current_coords.unsqueeze_(0).expand(num_tracks,num_boxes, 4)

        # Center offsets (track minus detection), normalized by the mean box width of each pair.
        # Note: the formula in the text normalizes by the mean height and uses the opposite
        # sign for the offset and log-ratio terms.
        dist_y = track_coords[...,1] - current_coords[...,1]
        dist_x = track_coords[...,0] - current_coords[...,0]
        denom = (track_coords[...,2] + current_coords[...,2])/2
        dist_x = dist_x / denom
        dist_y = dist_y / denom

        # Log ratios of widths and heights (detection over track)
        dist_w = torch.log(current_coords[...,2] / track_coords[...,2])
        dist_h = torch.log(current_coords[...,3]  / track_coords[...,3])

        # Time difference, broadcast to shape (num_tracks, num_boxes)
        curr_t = curr_t.unsqueeze(0)
        track_t = track_t.unsqueeze(1)
        dist_t = (curr_t - track_t).type(dist_h.dtype)

        edge_feats = torch.stack([dist_x, dist_y, dist_w, dist_h, dist_t], dim=-1)

        return edge_feats # has shape (num_tracks, num_boxes, 5)


    def forward(self, track_app, current_app, track_coords, current_coords, track_t, curr_t):
        """
        Args:
            track_app: track's reid embeddings, torch.Tensor with shape (num_tracks, 512)
            current_app: current frame detections' reid embeddings, torch.Tensor with shape (num_boxes, 512)
            track_coords: track's frame box coordinates, given by top-left and bottom-right coordinates
                          torch.Tensor with shape (num_tracks, 4)
            current_coords: current frame box coordinates, given by top-left and bottom-right coordinates
                            has shape (num_boxes, 4)
                          
            track_t: track's timestamps, torch.Tensor with shape (num_tracks, )
            curr_t: current frame's timestamps, torch.Tensor with shape (num_boxes,)
            
        Returns:
            classified edges: torch.Tensor with shape (num_steps, num_tracks, num_boxes),
                             containing at entry (step, i, j) the unnormalized probability that track i and 
                             detection j are a match, according to the classifier at the given neural message passing step
        """
        
        # Get initial edge embeddings from the position/time features and the ReID distance
        dist_reid = cosine_distance(track_app, current_app)
        pos_edge_feats = self.compute_edge_feats(track_coords, current_coords, track_t, curr_t)
        edge_feats = torch.cat((pos_edge_feats, dist_reid.unsqueeze(-1)), dim=-1)
        edge_embeds = self.edge_in_mlp(edge_feats)
        initial_edge_embeds = edge_embeds.clone()

        # Get initial node embeddings, reduce dimensionality from 512 to node_dim
        track_embeds = F.relu(self.cnn_linear(track_app))
        curr_embeds = F.relu(self.cnn_linear(current_app))

        classified_edges = []
        for _ in range(self.num_steps):
            edge_embeds = torch.cat((edge_embeds, initial_edge_embeds), dim=-1)            
            edge_embeds, track_embeds, curr_embeds = self.graph_net(edge_embeds=edge_embeds, 
                                                                    nodes_a_embeds=track_embeds, 
                                                                    nodes_b_embeds=curr_embeds)

            classified_edges.append(self.classifier(edge_embeds))

        return torch.stack(classified_edges).squeeze(-1)

## Putting everything together

Finally, we incorporate our ``AssignmentSimilarityNet`` into our tracker. We can keep everything as in ``LongTermReIDHungarianTracker`` except for the distance computation, which is now directly obtained via a forward pass through ``AssignmentSimilarityNet``.

In [408]:
from tracker.predef_tracker import LongTermReIDHungarianPredefTracker

In [409]:
_UNMATCHED_COST=255
class MPNTracker(LongTermReIDHungarianPredefTracker):
    def __init__(self, assign_net, *args, **kwargs):
        self.assign_net = assign_net
        super().__init__(*args, **kwargs)
        
    def data_association(self, boxes, scores, pred_features):  
        if self.tracks:  
            track_boxes = torch.stack([t.box for t in self.tracks], axis=0).cuda()
            track_features = torch.stack([t.get_feature() for t in self.tracks], axis=0).cuda()
            
            # Hacky way to recover the timestamps of boxes and tracks
            curr_t = self.im_index * torch.ones((pred_features.shape[0],)).cuda()
            track_t = torch.as_tensor([self.im_index - t.inactive - 1 for t in self.tracks]).cuda()
            
            # Do a forward pass through self.assign_net to obtain our costs.
            edges_raw_logits = self.assign_net(
                track_features.cuda(),
                pred_features.cuda(),
                track_boxes.cuda(),
                boxes.cuda(),
                track_t,
                curr_t
            )
            # Note: self.assign_net returns unnormalized logits,
            # so we apply the sigmoid function to them.
            pred_sim = torch.sigmoid(edges_raw_logits).detach().cpu().numpy()
            pred_sim = pred_sim[-1]  # Use predictions at last message passing step
            distance = (1- pred_sim) 
            
            # Do not allow matches when sim < 0.5, to avoid low-confidence associations
            distance = np.where(pred_sim < 0.5, _UNMATCHED_COST, distance) 

            # Perform Hungarian matching.
            row_idx, col_idx = linear_assignment(distance)            
            self.update_tracks(row_idx, col_idx,distance, boxes, scores, pred_features)

            
        else:
            # No tracks exist.
            self.add(boxes, scores, pred_features)
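
The cost construction used in `data_association` can be illustrated on a tiny similarity matrix. This is a sketch under stated assumptions: `build_cost` and `brute_force_assignment` are hypothetical helpers, and the exhaustive search merely stands in for `linear_sum_assignment` so that the example needs no third-party packages (it assumes no more tracks than detections and only makes sense for very small matrices). Discarding pairs that still carry the sentinel cost is an assumption about what happens after matching, since `update_tracks` is not shown here.

```python
from itertools import permutations

UNMATCHED_COST = 255  # same sentinel value as _UNMATCHED_COST in the notebook


def build_cost(pred_sim, threshold=0.5):
    """Turn similarities into costs, blocking pairs below the threshold."""
    return [[UNMATCHED_COST if s < threshold else 1.0 - s for s in row]
            for row in pred_sim]


def brute_force_assignment(cost):
    """Exhaustive minimum-cost assignment (illustrative stand-in, rows <= cols)."""
    n_rows, n_cols = len(cost), len(cost[0])
    best_cols = min(
        permutations(range(n_cols), n_rows),
        key=lambda cols: sum(cost[r][c] for r, c in enumerate(cols)),
    )
    return list(range(n_rows)), list(best_cols)


# Two tracks (rows) and three detections (columns), made-up similarities.
pred_sim = [[0.9, 0.2, 0.1],
            [0.3, 0.1, 0.8]]
cost = build_cost(pred_sim)
rows, cols = brute_force_assignment(cost)

# Assumption: pairs that were only matched at the sentinel cost are discarded.
matches = [(r, c) for r, c in zip(rows, cols) if cost[r][c] < UNMATCHED_COST]
```

Here track 0 is matched to detection 0 and track 1 to detection 2, while detection 1 stays unmatched because none of its similarities reach 0.5.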

## Training and evaluating our model

We provide all boilerplate code for training and evaluating our neural message passing based tracker.

Under the hood, we sample frames randomly from our training sequences, and then sample boxes from past frames as past tracks to generate our training data. Check out `LongTrackTrainingDataset` for details.

We train the model with a weighted cross-entropy loss to account for the class imbalance. Check out `train_one_epoch` if you're interested.
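
For intuition, a weighted binary cross-entropy on edge logits can be sketched as follows. This is not the code inside `train_one_epoch`, which is not shown here; the function names and the choice of weighting positive edges by the negative-to-positive ratio are assumptions made for illustration.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_bce_with_logits(logits, labels, pos_weight):
    """Mean binary cross-entropy where positive edges are scaled by pos_weight."""
    losses = []
    for z, y in zip(logits, labels):
        # -log(sigmoid(z)) = softplus(-z) and -log(1 - sigmoid(z)) = softplus(z)
        losses.append(pos_weight * y * softplus(-z) + (1 - y) * softplus(z))
    return sum(losses) / len(losses)


# One matching edge among four candidates, as is typical for a single track (toy data).
labels = [1, 0, 0, 0]
logits = [0.0, 0.0, 0.0, 0.0]
pos_weight = labels.count(0) / labels.count(1)  # assumed weighting: #negatives / #positives
loss = weighted_bce_with_logits(logits, labels, pos_weight)
```

With uniform logits, the single positive edge then contributes as much to the loss as the three negative edges combined, which is the effect the weighting is meant to achieve.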

No need to write any code from your side here!


In [410]:
from gnn.dataset import LongTrackTrainingDataset
from torch.utils.data import DataLoader
from gnn.trainer import train_one_epoch

MAX_PATIENCE = 20
MAX_EPOCHS = 15
EVAL_FREQ = 1


# Define our model, and init 
assign_net = AssignmentSimilarityNet(reid_network=None, # Not needed since we work with precomputed features
                                     node_dim=32, 
                                     edge_dim=64, 
                                     reid_dim=512, 
                                     edges_in_dim=6, 
                                     num_steps=10).cuda()

# We only keep two sequences for validation. You can
dataset = LongTrackTrainingDataset(dataset='MOT16-train_wo_val2', 
                                   db=train_db, 
                                   root_dir= osp.join(root_dir, 'data/MOT16'),
                                   max_past_frames = MAX_PATIENCE,
                                   vis_threshold=0.25)

data_loader = DataLoader(dataset, batch_size=8, collate_fn = lambda x: x, 
                         shuffle=True, num_workers=2, drop_last=True)
device = torch.device('cuda')
optimizer = torch.optim.Adam(assign_net.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5)

We only leave 2 sequences for validation in order to maximize the amount of training data. For your convenience, here are the LongTermReIDTracker results on them. Your validation IDF1 scores should show an improvement over them.
```

          IDF1   IDP   IDR  Rcll  Prcn  GT MT PT ML  FP    FN IDs   FM  MOTA  MOTP
MOT16-02 45.9% 65.1% 35.4% 52.2% 96.1%  62 12 37 13 390  8873 130  210 49.4% 0.090
MOT16-11 68.3% 75.3% 62.5% 80.2% 96.6%  75 44 24  7 266  1871 136   90 75.9% 0.083
OVERALL  54.3% 69.6% 44.5% 61.7% 96.3% 137 56 61 20 656 10744 266  300 58.4% 0.087
```



Let's start training!

Note that we have observed quite a lot of noise in validation scores across epochs and runs. This is likely due to the small size of our training and validation sets, which is why we perform early stopping to keep the best-performing model on validation. In addition, changing the experiment seed and/or relaunching the training might help if you suspect that noise is influencing your scores.

In [None]:
best_idf1 = 0.
for epoch in range(1, MAX_EPOCHS + 1):
    print(f"-------- EPOCH {epoch:2d} --------")
    train_one_epoch(model = assign_net, data_loader=data_loader, optimizer=optimizer, print_freq=50)
    scheduler.step()

    if epoch % EVAL_FREQ == 0:
        tracker =  MPNTracker(assign_net=assign_net.eval(), obj_detect=None, patience=MAX_PATIENCE)
        val_sequences = MOT16Sequences('MOT16-val2', osp.join(root_dir, 'data/MOT16'), vis_threshold=0.)
        res = run_tracker(val_sequences, db=train_db, tracker=tracker, output_dir=None)
        idf1 = res.loc['OVERALL']['idf1']
        if idf1 > best_idf1:
            best_idf1 = idf1
            torch.save(assign_net.state_dict(), osp.join(root_dir, 'output', 'best_ckpt.pth'))
        

-------- EPOCH  1 --------


20it [00:05,  3.33it/s]

Iter 20. Loss: 0.724. Accuracy: 0.910. Recall: 0.840. Precision: 0.484


40it [00:11,  3.40it/s]

Iter 40. Loss: 0.306. Accuracy: 0.970. Recall: 0.969. Precision: 0.695


60it [00:17,  3.60it/s]

Iter 60. Loss: 0.150. Accuracy: 0.980. Recall: 0.986. Precision: 0.792


80it [00:23,  3.51it/s]

Iter 80. Loss: 0.097. Accuracy: 0.990. Recall: 0.994. Precision: 0.884


100it [00:28,  3.44it/s]

Iter 100. Loss: 0.157. Accuracy: 0.981. Recall: 0.976. Precision: 0.799


120it [00:34,  3.35it/s]

Iter 120. Loss: 0.103. Accuracy: 0.987. Recall: 0.991. Precision: 0.838


140it [00:40,  3.31it/s]

Iter 140. Loss: 0.089. Accuracy: 0.990. Recall: 0.989. Precision: 0.893


160it [00:46,  3.26it/s]

Iter 160. Loss: 0.104. Accuracy: 0.989. Recall: 0.985. Precision: 0.881


180it [00:52,  3.66it/s]

Iter 180. Loss: 0.075. Accuracy: 0.991. Recall: 0.995. Precision: 0.875


200it [00:58,  3.31it/s]

Iter 200. Loss: 0.079. Accuracy: 0.988. Recall: 0.991. Precision: 0.878


220it [01:04,  3.75it/s]

Iter 220. Loss: 0.065. Accuracy: 0.993. Recall: 0.993. Precision: 0.906


240it [01:10,  3.40it/s]

Iter 240. Loss: 0.060. Accuracy: 0.993. Recall: 0.994. Precision: 0.917


260it [01:15,  4.60it/s]

Iter 260. Loss: 0.048. Accuracy: 0.996. Recall: 0.993. Precision: 0.956


280it [01:21,  3.46it/s]

Iter 280. Loss: 0.037. Accuracy: 0.997. Recall: 0.995. Precision: 0.970


300it [01:27,  3.76it/s]

Iter 300. Loss: 0.034. Accuracy: 0.998. Recall: 0.996. Precision: 0.982


320it [01:32,  3.65it/s]

Iter 320. Loss: 0.028. Accuracy: 0.998. Recall: 0.995. Precision: 0.983


340it [01:38,  3.35it/s]

Iter 340. Loss: 0.017. Accuracy: 0.999. Recall: 0.999. Precision: 0.990


360it [01:44,  3.53it/s]

Iter 360. Loss: 0.019. Accuracy: 0.999. Recall: 0.997. Precision: 0.987


380it [01:50,  3.37it/s]

Iter 380. Loss: 0.023. Accuracy: 0.999. Recall: 0.998. Precision: 0.991


401it [01:56,  4.06it/s]

Iter 400. Loss: 0.017. Accuracy: 0.999. Recall: 0.999. Precision: 0.985


420it [02:02,  3.29it/s]

Iter 420. Loss: 0.035. Accuracy: 0.999. Recall: 0.995. Precision: 0.986


440it [02:08,  3.34it/s]

Iter 440. Loss: 0.020. Accuracy: 0.998. Recall: 0.997. Precision: 0.979


460it [02:14,  3.30it/s]

Iter 460. Loss: 0.048. Accuracy: 0.998. Recall: 0.992. Precision: 0.982


476it [02:19,  3.42it/s]


Tracking: MOT16-02
Tracks found: 84
Runtime for MOT16-02: 4.6 s.
Tracking: MOT16-11
Tracks found: 75
Runtime for MOT16-11: 5.2 s.
Runtime for all sequences: 9.8 s.
          IDF1   IDP   IDR  Rcll  Prcn  GT MT PT ML  FP    FN IDs   FM  MOTA  MOTP IDt IDa IDm
MOT16-02 43.1% 61.2% 33.2% 52.2% 96.1%  62 11 38 13 390  8873 107  220 49.6% 0.095  74  39  14
MOT16-11 63.2% 69.7% 57.8% 80.2% 96.6%  75 44 24  7 266  1871  39   90 76.9% 0.083  41  10  18
OVERALL  50.6% 64.9% 41.5% 61.7% 96.3% 137 55 62 20 656 10744 146  310 58.8% 0.090 115  49  32
-------- EPOCH  2 --------


20it [00:06,  3.24it/s]

Iter 20. Loss: 0.041. Accuracy: 0.999. Recall: 0.992. Precision: 0.990


40it [00:11,  3.72it/s]

Iter 40. Loss: 0.042. Accuracy: 0.997. Recall: 0.993. Precision: 0.979


60it [00:17,  3.30it/s]

Iter 60. Loss: 0.015. Accuracy: 0.999. Recall: 0.999. Precision: 0.989


80it [00:23,  3.16it/s]

Iter 80. Loss: 0.014. Accuracy: 0.999. Recall: 0.999. Precision: 0.992


100it [00:29,  3.33it/s]

Iter 100. Loss: 0.018. Accuracy: 0.998. Recall: 0.998. Precision: 0.986


121it [00:35,  6.08it/s]

Iter 120. Loss: 0.009. Accuracy: 1.000. Recall: 0.998. Precision: 0.995


140it [00:40,  3.33it/s]

Iter 140. Loss: 0.025. Accuracy: 0.998. Recall: 0.998. Precision: 0.978


160it [00:46,  3.24it/s]

Iter 160. Loss: 0.020. Accuracy: 0.998. Recall: 0.996. Precision: 0.979


180it [00:52,  3.31it/s]

Iter 180. Loss: 0.010. Accuracy: 0.999. Recall: 0.998. Precision: 0.991


200it [00:58,  3.40it/s]

Iter 200. Loss: 0.013. Accuracy: 0.999. Recall: 0.999. Precision: 0.987


220it [01:04,  3.27it/s]

Iter 220. Loss: 0.024. Accuracy: 0.999. Recall: 0.997. Precision: 0.989


240it [01:10,  3.28it/s]

Iter 240. Loss: 0.015. Accuracy: 0.999. Recall: 0.996. Precision: 0.995


260it [01:16,  3.34it/s]

Iter 260. Loss: 0.017. Accuracy: 0.999. Recall: 0.998. Precision: 0.990


280it [01:22,  3.35it/s]

Iter 280. Loss: 0.016. Accuracy: 0.999. Recall: 0.998. Precision: 0.992


300it [01:28,  3.33it/s]

Iter 300. Loss: 0.016. Accuracy: 1.000. Recall: 0.999. Precision: 0.996


320it [01:34,  3.41it/s]

Iter 320. Loss: 0.014. Accuracy: 0.999. Recall: 0.999. Precision: 0.989


340it [01:40,  3.31it/s]

Iter 340. Loss: 0.017. Accuracy: 0.999. Recall: 0.998. Precision: 0.988


360it [01:46,  3.24it/s]

Iter 360. Loss: 0.016. Accuracy: 0.999. Recall: 0.998. Precision: 0.985


380it [01:52,  3.20it/s]

Iter 380. Loss: 0.013. Accuracy: 0.999. Recall: 0.998. Precision: 0.992


400it [01:58,  3.43it/s]

Iter 400. Loss: 0.014. Accuracy: 0.999. Recall: 0.999. Precision: 0.992


420it [02:04,  3.40it/s]

Iter 420. Loss: 0.010. Accuracy: 0.999. Recall: 0.998. Precision: 0.992


440it [02:10,  3.46it/s]

Iter 440. Loss: 0.011. Accuracy: 1.000. Recall: 0.999. Precision: 0.995


460it [02:16,  3.33it/s]

Iter 460. Loss: 0.030. Accuracy: 0.995. Recall: 0.997. Precision: 0.963


476it [02:20,  3.38it/s]


Tracking: MOT16-02
Tracks found: 128
Runtime for MOT16-02: 3.9 s.
Tracking: MOT16-11
Tracks found: 94
Runtime for MOT16-11: 5.4 s.
Runtime for all sequences: 9.3 s.
          IDF1   IDP   IDR  Rcll  Prcn  GT MT PT ML  FP    FN IDs   FM  MOTA  MOTP IDt IDa IDm
MOT16-02 47.0% 66.8% 36.3% 52.2% 96.1%  62 11 38 13 390  8873 110  216 49.6% 0.095  48  68   9
MOT16-11 71.8% 79.1% 65.7% 80.2% 96.6%  75 44 24  7 266  1871  38   90 76.9% 0.083  24  18   9
OVERALL  56.3% 72.2% 46.2% 61.7% 96.3% 137 55 62 20 656 10744 148  306 58.8% 0.089  72  86  18
-------- EPOCH  3 --------


20it [00:05,  3.62it/s]

Iter 20. Loss: 0.020. Accuracy: 0.999. Recall: 0.995. Precision: 0.988


40it [00:11,  3.40it/s]

Iter 40. Loss: 0.018. Accuracy: 0.997. Recall: 0.999. Precision: 0.972


60it [00:17,  3.42it/s]

Iter 60. Loss: 0.013. Accuracy: 0.999. Recall: 0.997. Precision: 0.987


80it [00:23,  3.30it/s]

Iter 80. Loss: 0.015. Accuracy: 0.999. Recall: 0.998. Precision: 0.984


101it [00:28,  6.09it/s]

Iter 100. Loss: 0.009. Accuracy: 0.999. Recall: 0.999. Precision: 0.989


120it [00:33,  3.34it/s]

Iter 120. Loss: 0.007. Accuracy: 0.999. Recall: 0.999. Precision: 0.994


140it [00:39,  3.32it/s]

Iter 140. Loss: 0.022. Accuracy: 0.998. Recall: 0.995. Precision: 0.981


160it [00:45,  3.77it/s]

Iter 160. Loss: 0.011. Accuracy: 0.999. Recall: 0.999. Precision: 0.987


180it [00:51,  3.49it/s]

Iter 180. Loss: 0.012. Accuracy: 0.999. Recall: 0.997. Precision: 0.989


200it [00:57,  3.35it/s]

Iter 200. Loss: 0.010. Accuracy: 0.999. Recall: 1.000. Precision: 0.989


220it [01:03,  3.30it/s]

Iter 220. Loss: 0.015. Accuracy: 0.998. Recall: 0.998. Precision: 0.985


240it [01:08,  3.70it/s]

Iter 240. Loss: 0.011. Accuracy: 0.999. Recall: 0.999. Precision: 0.991


260it [01:14,  3.56it/s]

Iter 260. Loss: 0.010. Accuracy: 0.999. Recall: 0.999. Precision: 0.989


280it [01:20,  3.61it/s]

Iter 280. Loss: 0.011. Accuracy: 0.999. Recall: 1.000. Precision: 0.987


301it [01:25,  6.86it/s]

Iter 300. Loss: 0.012. Accuracy: 0.999. Recall: 1.000. Precision: 0.986


320it [01:30,  3.37it/s]

Iter 320. Loss: 0.008. Accuracy: 0.999. Recall: 1.000. Precision: 0.987


340it [01:36,  3.44it/s]

Iter 340. Loss: 0.014. Accuracy: 0.999. Recall: 0.999. Precision: 0.986


360it [01:42,  3.30it/s]

Iter 360. Loss: 0.017. Accuracy: 0.998. Recall: 0.998. Precision: 0.978


380it [01:47,  4.89it/s]

Iter 380. Loss: 0.021. Accuracy: 0.998. Recall: 0.997. Precision: 0.984


400it [01:52,  3.55it/s]

Iter 400. Loss: 0.021. Accuracy: 0.998. Recall: 0.999. Precision: 0.981


420it [01:58,  3.26it/s]

Iter 420. Loss: 0.015. Accuracy: 0.999. Recall: 0.997. Precision: 0.986


440it [02:04,  3.24it/s]

Iter 440. Loss: 0.015. Accuracy: 0.999. Recall: 0.997. Precision: 0.989


460it [02:10,  3.85it/s]

Iter 460. Loss: 0.011. Accuracy: 0.999. Recall: 0.998. Precision: 0.986


476it [02:14,  3.54it/s]


Tracking: MOT16-02
Tracks found: 102
Runtime for MOT16-02: 3.9 s.
Tracking: MOT16-11
Tracks found: 82
Runtime for MOT16-11: 5.4 s.
Runtime for all sequences: 9.3 s.
          IDF1   IDP   IDR  Rcll  Prcn  GT MT PT ML  FP    FN IDs   FM  MOTA  MOTP IDt IDa IDm
MOT16-02 46.8% 66.5% 36.1% 52.2% 96.1%  62 11 38 13 390  8873  97  219 49.6% 0.096  56  47  12
MOT16-11 70.2% 77.3% 64.2% 80.2% 96.6%  75 44 24  7 266  1871  36   90 77.0% 0.083  30  15  15
OVERALL  55.6% 71.2% 45.6% 61.7% 96.3% 137 55 62 20 656 10744 133  309 58.8% 0.090  86  62  27
-------- EPOCH  4 --------


20it [00:05,  3.37it/s]

Iter 20. Loss: 0.009. Accuracy: 0.999. Recall: 1.000. Precision: 0.990


40it [00:11,  3.36it/s]

Iter 40. Loss: 0.007. Accuracy: 0.999. Recall: 1.000. Precision: 0.991


60it [00:17,  3.44it/s]

Iter 60. Loss: 0.013. Accuracy: 0.999. Recall: 0.998. Precision: 0.990


80it [00:23,  3.41it/s]

Iter 80. Loss: 0.009. Accuracy: 0.999. Recall: 1.000. Precision: 0.987


94it [00:27,  3.34it/s]

Tracks found: 76
Runtime for MOT16-11: 5.4 s.
Runtime for all sequences: 9.2 s.
          IDF1   IDP   IDR  Rcll  Prcn  GT MT PT ML  FP    FN IDs   FM  MOTA  MOTP IDt IDa IDm
MOT16-02 44.9% 63.7% 34.6% 52.2% 96.1%  62 11 38 13 390  8873  97  222 49.6% 0.096  69  38  14
MOT16-11 64.7% 71.3% 59.2% 80.2% 96.6%  75 44 24  7 266  1871  47   90 76.9% 0.083  44  15  19
OVERALL  52.3% 67.0% 42.9% 61.7% 96.3% 137 55 62 20 656 10744 144  312 58.8% 0.090 113  53  33
-------- EPOCH  5 --------


20it [00:05,  3.37it/s]

Iter 20. Loss: 0.012. Accuracy: 0.998. Recall: 0.999. Precision: 0.984


40it [00:11,  3.45it/s]

Iter 40. Loss: 0.007. Accuracy: 0.999. Recall: 1.000. Precision: 0.993


60it [00:17,  3.45it/s]

Iter 60. Loss: 0.012. Accuracy: 0.999. Recall: 0.997. Precision: 0.986


80it [00:23,  3.39it/s]

Iter 80. Loss: 0.013. Accuracy: 0.999. Recall: 0.998. Precision: 0.987


100it [00:28,  3.63it/s]

Iter 100. Loss: 0.008. Accuracy: 0.999. Recall: 1.000. Precision: 0.990


120it [00:34,  3.39it/s]

Iter 120. Loss: 0.013. Accuracy: 0.999. Recall: 0.998. Precision: 0.987


140it [00:40,  3.34it/s]

Iter 140. Loss: 0.012. Accuracy: 0.999. Recall: 0.998. Precision: 0.989


160it [00:46,  3.35it/s]

Iter 160. Loss: 0.015. Accuracy: 0.999. Recall: 0.998. Precision: 0.986


180it [00:51,  3.61it/s]

Iter 180. Loss: 0.015. Accuracy: 0.999. Recall: 0.999. Precision: 0.989


200it [00:57,  3.21it/s]

Iter 200. Loss: 0.013. Accuracy: 0.999. Recall: 0.998. Precision: 0.983


220it [01:03,  3.63it/s]

Iter 220. Loss: 0.012. Accuracy: 0.999. Recall: 1.000. Precision: 0.986


240it [01:08,  3.83it/s]

Iter 240. Loss: 0.008. Accuracy: 0.999. Recall: 0.999. Precision: 0.989


260it [01:14,  3.26it/s]

Iter 260. Loss: 0.011. Accuracy: 0.999. Recall: 0.998. Precision: 0.989


280it [01:20,  3.51it/s]

Iter 280. Loss: 0.009. Accuracy: 0.999. Recall: 1.000. Precision: 0.990


300it [01:25,  3.43it/s]

Iter 300. Loss: 0.004. Accuracy: 1.000. Recall: 1.000. Precision: 0.993


321it [01:31,  6.01it/s]

Iter 320. Loss: 0.013. Accuracy: 0.998. Recall: 0.999. Precision: 0.985


340it [01:36,  3.56it/s]

Iter 340. Loss: 0.007. Accuracy: 1.000. Recall: 0.999. Precision: 0.996


360it [01:41,  3.48it/s]

Iter 360. Loss: 0.011. Accuracy: 0.999. Recall: 0.998. Precision: 0.990


380it [01:47,  3.34it/s]

Iter 380. Loss: 0.014. Accuracy: 0.999. Recall: 0.999. Precision: 0.986


400it [01:53,  3.47it/s]

Iter 400. Loss: 0.019. Accuracy: 0.998. Recall: 0.998. Precision: 0.977


420it [01:58,  3.61it/s]

Iter 420. Loss: 0.009. Accuracy: 0.999. Recall: 0.999. Precision: 0.993


440it [02:04,  3.76it/s]

Iter 440. Loss: 0.007. Accuracy: 0.999. Recall: 1.000. Precision: 0.993


460it [02:10,  3.52it/s]

Iter 460. Loss: 0.006. Accuracy: 0.999. Recall: 1.000. Precision: 0.992


476it [02:14,  3.53it/s]


Tracking: MOT16-02
Tracks found: 89
Runtime for MOT16-02: 4.0 s.
Tracking: MOT16-11
Tracks found: 79
Runtime for MOT16-11: 5.5 s.
Runtime for all sequences: 9.5 s.
          IDF1   IDP   IDR  Rcll  Prcn  GT MT PT ML  FP    FN IDs   FM  MOTA  MOTP IDt IDa IDm
MOT16-02 48.5% 68.8% 37.4% 52.2% 96.1%  62 11 38 13 390  8873  98  222 49.6% 0.095  64  44  14
MOT16-11 70.3% 77.5% 64.3% 80.2% 96.6%  75 44 24  7 266  1871  36   90 77.0% 0.083  33  13  15
OVERALL  56.7% 72.6% 46.5% 61.7% 96.3% 137 55 62 20 656 10744 134  312 58.8% 0.090  97  57  29
-------- EPOCH  6 --------


20it [00:05,  3.51it/s]

Iter 20. Loss: 0.018. Accuracy: 0.997. Recall: 0.999. Precision: 0.953


40it [00:11,  3.72it/s]

Iter 40. Loss: 0.011. Accuracy: 0.998. Recall: 0.999. Precision: 0.982


60it [00:17,  3.36it/s]

Iter 60. Loss: 0.011. Accuracy: 0.998. Recall: 1.000. Precision: 0.978


80it [00:23,  3.35it/s]

Iter 80. Loss: 0.010. Accuracy: 0.998. Recall: 0.999. Precision: 0.982


100it [00:29,  3.58it/s]

Iter 100. Loss: 0.009. Accuracy: 0.999. Recall: 0.999. Precision: 0.990


120it [00:35,  3.47it/s]

Iter 120. Loss: 0.010. Accuracy: 0.999. Recall: 0.998. Precision: 0.988


140it [00:40,  3.39it/s]

Iter 140. Loss: 0.010. Accuracy: 0.999. Recall: 0.998. Precision: 0.983


160it [00:46,  3.34it/s]

Iter 160. Loss: 0.006. Accuracy: 1.000. Recall: 1.000. Precision: 0.995


180it [00:52,  3.30it/s]

Iter 180. Loss: 0.007. Accuracy: 0.999. Recall: 1.000. Precision: 0.991


200it [00:58,  3.35it/s]

Iter 200. Loss: 0.008. Accuracy: 0.999. Recall: 1.000. Precision: 0.990


220it [01:04,  3.41it/s]

Iter 220. Loss: 0.009. Accuracy: 0.999. Recall: 0.999. Precision: 0.988


240it [01:10,  3.38it/s]

Iter 240. Loss: 0.010. Accuracy: 0.999. Recall: 1.000. Precision: 0.990


260it [01:16,  3.53it/s]

Iter 260. Loss: 0.011. Accuracy: 0.999. Recall: 1.000. Precision: 0.992


280it [01:21,  3.70it/s]

Iter 280. Loss: 0.008. Accuracy: 0.999. Recall: 1.000. Precision: 0.988


300it [01:27,  3.62it/s]

Iter 300. Loss: 0.011. Accuracy: 0.998. Recall: 0.998. Precision: 0.983


320it [01:33,  3.41it/s]

Iter 320. Loss: 0.005. Accuracy: 1.000. Recall: 1.000. Precision: 0.994


340it [01:39,  3.27it/s]

Iter 340. Loss: 0.010. Accuracy: 0.999. Recall: 0.998. Precision: 0.986


360it [01:45,  3.39it/s]

Iter 360. Loss: 0.012. Accuracy: 0.998. Recall: 1.000. Precision: 0.984


380it [01:51,  3.40it/s]

Iter 380. Loss: 0.006. Accuracy: 1.000. Recall: 1.000. Precision: 0.993


400it [01:56,  3.35it/s]

Iter 400. Loss: 0.008. Accuracy: 0.999. Recall: 0.999. Precision: 0.989


420it [02:02,  3.49it/s]

Iter 420. Loss: 0.009. Accuracy: 0.999. Recall: 0.998. Precision: 0.993


440it [02:08,  3.52it/s]

Iter 440. Loss: 0.008. Accuracy: 0.998. Recall: 1.000. Precision: 0.985


455it [02:12,  3.28it/s]

            IDF1   IDP   IDR  Rcll  Prcn  GT MT PT ML  FP    FN IDs   FM  MOTA  MOTP IDt IDa IDm
    MOT16-02 48.5% 68.8% 37.4% 52.2% 96.1%  62 11 38 13 390  8873  98  222 49.6% 0.095  64  44  14
    MOT16-11 70.3% 77.5% 64.3% 80.2% 96.6%  75 44 24  7 266  1871  36   90 77.0% 0.083  33  13  15
    OVERALL  56.7% 72.6% 46.5% 61.7% 96.3% 137 55 62 20 656 10744 134  312 58.8% 0.090  97  57  29
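
The IDF1 column in the tables above is the harmonic mean of IDP (identification precision) and IDR (identification recall). The following minimal sketch recomputes it from the logged OVERALL rows. The helper name `idf1_from_idp_idr` is hypothetical and not part of the exercise code, and because the logged percentages are rounded, recomputed values can differ in the last digit in general.

```python
def idf1_from_idp_idr(idp: float, idr: float) -> float:
    """Harmonic mean of identification precision (IDP) and recall (IDR)."""
    if idp + idr == 0:
        return 0.0
    return 2 * idp * idr / (idp + idr)


# (IDP, IDR) of the OVERALL rows logged before and after epoch 5, as fractions.
overall_rows = {
    "before epoch 5": (0.670, 0.429),
    "after epoch 5": (0.726, 0.465),
}

for name, (idp, idr) in overall_rows.items():
    print(f"{name}: IDF1 = {idf1_from_idp_idr(idp, idr):.1%}")
```

Since the detections are fixed, Rcll and Prcn stay the same between the two evaluations; the IDF1 gain comes from better associations, which raise both IDP and IDR.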

# Exercise submission

After executing the following cell, the `Colab Notebooks/cv3dst_exercise/output` directory in your Google Drive should contain multiple `MOT16-XY.txt` files.

For the final submission, you have to process the test sequences and upload the zipped prediction files to our server. See Moodle for a guide on how to upload the results.

Note that this time, you only have to evaluate three sequences!

In [None]:
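# Load the weights of the best checkpoint saved during training.
best_ckpt = torch.load(osp.join(root_dir, 'output', 'best_ckpt.pth'))
assign_net.load_state_dict(best_ckpt)

# Run the tracker on the test sequences using the pre-computed detections and ReID embeddings.
tracker = MPNTracker(assign_net=assign_net.eval(), obj_detect=None, patience=MAX_PATIENCE)
test_db = torch.load(osp.join(root_dir, 'data/preprocessed_data/preprocessed_data_test_2.pth'))
test_sequences = MOT16Sequences('MOT16-test', osp.join(root_dir, 'data/MOT16'), vis_threshold=0.)
# Pass an output directory so that the `MOT16-XY.txt` prediction files are written to disk.
run_tracker(test_sequences, db=test_db, tracker=tracker, output_dir=osp.join(root_dir, 'output'))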
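
Once the prediction files exist, they need to be zipped before uploading. The following is a minimal sketch using only the Python standard library. The helper `zip_predictions`, the archive name `submission.zip`, and the flat archive layout are assumptions, not requirements stated by the exercise; check the Moodle guide for the layout the server expects.

```python
import zipfile
from pathlib import Path


def zip_predictions(output_dir, zip_name="submission.zip", pattern="MOT16-*.txt"):
    """Collect all prediction files matching `pattern` into a single zip archive.

    The archive name and the flat layout (files at the archive root) are
    assumptions made for illustration.
    """
    output_dir = Path(output_dir)
    files = sorted(output_dir.glob(pattern))
    if not files:
        raise FileNotFoundError(f"No files matching {pattern!r} in {output_dir}")

    zip_path = output_dir / zip_name
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            # Store only the file name so the archive has no nested folders.
            zf.write(f, arcname=f.name)
    return zip_path


# Example call, assuming `root_dir` and `osp` are defined as in the notebook:
# zip_predictions(osp.join(root_dir, "output"))
```

Raising an error when no file matches makes it obvious if the tracker did not write any predictions.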