# MultipleMotionModels

This project implements a basic framework to compare different motion models on [DanceTrack](https://github.com/DanceTrack/DanceTrack).

It is based on [SORT](https://github.com/abewley/sort) for association and replaces the Kalman filter used there with various deep-learning motion models. The following motion models have been implemented so far:

- Transformer-based
- Mamba-based (work in progress)

[TrackEval](https://github.com/JonathonLuiten/TrackEval) is used for evaluation.

**Table of Contents**
```
0 Downloading and Preprocessing Data (run once in the beginning)
1 Building Dataset
2 Building Model
3 Training Model
4 Tracking
5 Evaluation
```

## 0 Downloading and Preprocessing Data

When the notebook is run for the first time, the dataset must be downloaded and preprocessed. Follow-up runs can skip this section.

### 0.1 Download Data

The DanceTrack dataset is downloaded from Hugging Face using the `huggingface_hub` downloader.

In [None]:
import os
from huggingface_hub import hf_hub_download


# shared download settings
repo_id = 'noahcao/dancetrack'
save_dir = './DanceTrack'

# download both parts of the training set
if os.path.exists('./DanceTrack/train'):
    print('Data Already Downloaded!')
else:
    for filename in ['train1.zip', 'train2.zip']:
        hf_hub_download(repo_id, filename, repo_type='dataset', local_dir=save_dir)

# download the validation set
if os.path.exists('./DanceTrack/val'):
    print('Data Already Downloaded!')
else:
    hf_hub_download(repo_id, 'val.zip', repo_type='dataset', local_dir=save_dir)

### 0.2 Unzip Data 

The downloaded archives are extracted and the zip files are deleted afterwards.

In [None]:
import os
import zipfile

zip_paths = ['train1.zip', 'train2.zip', 'val.zip']
tgt_paths = ['train', 'train', 'val']

for zip_name, tgt_name in zip(zip_paths, tgt_paths):
    zip_path = './DanceTrack/' + zip_name
    # skip archives that were already extracted and removed in a previous run
    if not os.path.exists(zip_path):
        continue
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall('./DanceTrack/' + tgt_name)
    os.remove(zip_path)
print('Extracted Data And Removed Zip Files.')

### 0.3 Restructure Directory

The extracted data is in subfolders. The files are moved so that the directory looks as follows:

```
DanceTrack
|-- train
|   |-- dancetrack0001
|   |   |-- img1
|   |   |   |-- 00000001.jpg
|   |   |   |-- ...
|   |   |-- gt
|   |   |   |-- gt.txt            
|   |   |-- seqinfo.ini
|   |-- ...
|-- val
|   |-- dancetrack0004
|   |-- ...
```

In [None]:
from pathlib import Path
import shutil

paths = ['train/train1','train/train2', 'val/val']
for path in paths:
    src_path = Path('./DanceTrack').joinpath(path)
    tgt_path = src_path.parent

    for src_file in src_path.glob('*dancetrack*'):
        shutil.move(src_file, tgt_path)
    shutil.rmtree('./DanceTrack/' + path)
print('Restructured Directory Successfully.')

### 0.4 Download Seqmap Files

The seqmap files are necessary for the evaluation using the TrackEval repo. They can be downloaded from the DanceTrack repo.

In [None]:
from urllib.request import urlretrieve
url = 'https://raw.githubusercontent.com/DanceTrack/DanceTrack/refs/heads/main/dancetrack/'
seqmap_files = ['train_seqmap.txt', 'val_seqmap.txt']

for file in seqmap_files:
    urlretrieve(url + file, './DanceTrack/' + file)


### 0.5 Preprocess Train Data

The ground-truth format is changed from
```
'[bb_left, bb_top, bb_width, bb_height]'
```
to
```
'[bb_center_x, bb_center_y, bb_width, bb_height]'
```
Moreover, the ground-truth data is split into separate sequences, one per tracked object.

In [None]:
from utils.preprocess_data import preprocess_train

seq_root = './DanceTrack'
label_root = './DanceTrack/train_seq'
preprocess_train(seq_root, label_root)
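
The conversion and the per-object split described above can be sketched as follows. This is only an illustration, not the code in `utils.preprocess_data`: the helper names `to_center_format` and `split_by_track` are hypothetical, and the sketch assumes MOTChallenge-style rows (`frame, id, bb_left, bb_top, bb_width, bb_height, ...`) in `gt.txt`.

```python
from collections import defaultdict


def to_center_format(left, top, width, height):
    """Convert [bb_left, bb_top, bb_width, bb_height] to center format."""
    return [left + width / 2.0, top + height / 2.0, width, height]


def split_by_track(gt_lines):
    """Group ground-truth rows by track id and order each group by frame.

    Assumption: rows look like 'frame,id,bb_left,bb_top,bb_width,bb_height,...'.
    Columns after the box are ignored in this sketch.
    """
    tracks = defaultdict(list)
    for line in gt_lines:
        fields = line.strip().split(',')
        frame, track_id = int(fields[0]), int(fields[1])
        left, top, width, height = map(float, fields[2:6])
        tracks[track_id].append((frame, to_center_format(left, top, width, height)))
    # sort every track by frame number and drop the frame index
    return {tid: [box for _, box in sorted(rows, key=lambda r: r[0])]
            for tid, rows in tracks.items()}


# toy input with two objects; the values are made up for illustration
example = [
    "1,1,100,50,20,40,1,1,1",
    "1,2,300,80,30,60,1,1,1",
    "2,1,104,52,20,40,1,1,1",
]
sequences = split_by_track(example)
print(sequences[1])
```

Each entry of `sequences` then holds the trajectory of one object in center format, which is the kind of per-object sequence the training data is built from.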

### 0.6 Download Validation YOLO-X Detections

Since this project exclusively evaluates different motion models, no detector is run. Instead, pre-computed [YOLO-X](https://github.com/Megvii-BaseDetection/YOLOX) detections are used. These can be downloaded from the [DiffMOT](https://github.com/Kroery/DiffMOT) repo.

In [None]:
if os.path.exists('./DanceTrack/val_dets'):
    print('Data Already Downloaded!')
else:
    # download the detections from DiffMOT
    urlretrieve('https://github.com/Kroery/DiffMOT/releases/download/v1.1/Detections.zip', './DanceTrack/Detections.zip')

    # extract the data
    with zipfile.ZipFile('./DanceTrack/Detections.zip', 'r') as zip_ref:
        zip_ref.extractall('./DanceTrack/Detections')
    os.remove('./DanceTrack/Detections.zip')

    # move dancetrack detections to new subdirectory
    src_path = Path('./DanceTrack/Detections/DanceTrack/detections_yolox_x/val')
    os.mkdir('./DanceTrack/val_dets')
    tgt_path = './DanceTrack/val_dets'
    for src_file in src_path.glob('*dancetrack*'):
        shutil.move(src_file, tgt_path)

### 0.7 Restructure Detections

The YOLO-X detections are stored in separate files for each frame. SORT expects all detections of a sequence to be in the same text file.

In [None]:
import glob
import os

import numpy as np

phase = 'val_dets'
seq_path = './DanceTrack'
pattern = os.path.join(seq_path, phase, '*')
seq_file_names = sorted(glob.glob(pattern))

# iterate over all sequences
for seq_file in seq_file_names:
    # collect the per-frame detection files, skipping a det.txt left by an earlier run
    det_file_names = sorted(f for f in glob.glob(seq_file + '/*')
                            if os.path.basename(f) != 'det.txt')

    # create new file and write all detections in it
    with open(seq_file + '/det.txt', 'w') as outfile:
        for i, det_file in enumerate(det_file_names):
            with open(det_file) as infile:
                # ndmin=2 keeps frames with a single detection two-dimensional
                file_content = np.loadtxt(infile, delimiter=',', ndmin=2)
                file_content[:, 0] = i + 1  # frames are starting from 1
                np.savetxt(outfile, file_content, fmt='%s', delimiter=',')

## 1 Building Dataset

A custom dataset is required that returns sequences of n consecutive bounding boxes (`seq_len` below).

In [16]:
from utils.dataset import MotionDataset

seq_len = 10
data_dir = './DanceTrack/train_seq'
dataset_train = MotionDataset(seq_len, data_dir)
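
To make the idea of fixed-length sequences concrete, here is a minimal sliding-window sketch in plain Python. It is not the implementation of `MotionDataset`: the function name `sliding_windows` is hypothetical, and returning the following box as a separate target is an assumption about how the samples are consumed.

```python
def sliding_windows(track, seq_len):
    """Yield (history, target) pairs from one object trajectory.

    history: seq_len consecutive boxes
    target:  the box that directly follows the history
    """
    for start in range(len(track) - seq_len):
        history = track[start:start + seq_len]
        target = track[start + seq_len]
        yield history, target


# toy track: a box moving 2 px to the right per frame (made-up values)
track = [[10.0 + 2 * t, 50.0, 20.0, 40.0] for t in range(6)]
pairs = list(sliding_windows(track, seq_len=3))
print(len(pairs))
```

A track with 6 boxes and `seq_len=3` yields 3 samples; tracks shorter than `seq_len + 1` yield none.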

## 2 Building Model

In the future it will be possible to choose between different models. At the moment, only the transformer-encoder-based model can be used. The model first passes the bounding boxes through an MLP layer to increase their dimension and then through a transformer encoder with 6 layers. The output is average-pooled over the time dimension, and a final MLP predicts the offset relative to the previous bounding box. A sine-cosine positional encoding is used to encode the different time steps.
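
The sine-cosine positional encoding can be illustrated with the formulation from "Attention Is All You Need". The sketch below uses only the standard library. The function name `sinusoidal_encoding` and the value `d_model=8` are assumptions for illustration, and the exact variant used inside `MotionTransformer` may differ.

```python
import math


def sinusoidal_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sine-cosine position encodings.

    Even columns hold sin(pos / 10000^(i / d_model)), odd columns the
    matching cosine, where i is the even column index.
    """
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row.append(math.sin(angle))
            if i + 1 < d_model:
                row.append(math.cos(angle))
        table.append(row)
    return table


# one encoding vector per time step of a 10-step input sequence
pe = sinusoidal_encoding(seq_len=10, d_model=8)
print(pe[0])
```

Each row would typically be added to the embedded bounding box of the corresponding time step before the sequence enters the encoder.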

### 2.1 Building the Model 

In [15]:
model_type = 'transformer'

# TODO: fix ssm for cpu-only use
if model_type == 'ssm':
    from models.SSM.motion_ssm import MotionSSM
    config = ''
    model = MotionSSM(config)

# transformer-encoder model
elif model_type == 'transformer':
    from models.Transformer.transformer_encoder import MotionTransformer
    model = MotionTransformer(seq_len = 10)

### 2.2 Building Auxiliary Components

Other components apart from the model that are required for training are initialised here.

In [18]:
import torch
from torch.utils.data import DataLoader


lr = 1e-4

# currently all parameters use the same learning rate
param_dicts = [
        {"params": [p for _, p in model.named_parameters()
                    if p.requires_grad],
         "lr": lr, }]

# the optimizer is adapted from the MotionTrack paper
optimizer = torch.optim.AdamW(param_dicts, 
                              lr=lr,
                              betas=(0.9, 0.98), 
                              eps=1e-08, 
                              weight_decay=0.01)

# TODO: implement lr_scheduler from 'Attention is all you need'
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, [10])

data_loader_train = DataLoader(
        dataset_train,
        batch_size = 512,
        shuffle=True,
        drop_last=True)

criterion = torch.nn.SmoothL1Loss(reduction='mean', beta=1.0)
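
For the scheduler mentioned in the TODO above, the schedule from "Attention Is All You Need" increases the learning rate linearly during warm-up and then decays it with the inverse square root of the step number. A minimal sketch of that formula follows. The function name `noam_lr_factor` is hypothetical, and `d_model=512` and `warmup_steps=4000` are the paper's base settings, not values tuned for this project.

```python
def noam_lr_factor(step, d_model=512, warmup_steps=4000):
    """Learning-rate factor d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)."""
    step = max(step, 1)  # the formula is undefined at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# the factor rises during warm-up, peaks at warmup_steps and decays afterwards
for step in (1, 1000, 4000, 16000):
    print(step, noam_lr_factor(step))
```

One way to plug this in would be `torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lr_factor)`. `LambdaLR` multiplies the optimizer's base learning rate by the returned factor, so the base learning rate would have to be chosen accordingly, and the scheduler would need to be stepped once per optimizer step rather than once per epoch.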

## 3 Training Model

The motion model is trained on the trajectory prediction task. It gets the n previous bounding boxes as input and predicts the next one. Currently the model is trained on ground-truth bounding boxes. It may be more effective to use some form of augmentation or to train on actual detections.

In [20]:
from engine import train_one_epoch
import datetime
import os
from torch.utils.tensorboard import SummaryWriter

# start tensorboard
%load_ext tensorboard
%tensorboard --logdir=runs


writer = SummaryWriter()

# make sure the checkpoint directory exists before saving
os.makedirs('./checkpoints', exist_ok=True)

# train model for 21 epochs (0 to 20) and save the model every 5 epochs
for epoch in range(21):

    train_one_epoch(model,
                    criterion,
                    data_loader_train,
                    optimizer,
                    device='cpu',
                    writer=writer,
                    epoch=epoch)

    print('-----------------------')
    print(f'Finished epoch {epoch}')

    # save model and optimizer
    if epoch % 5 == 0:
        torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            }, f'./checkpoints/transformer_ep_{epoch:02d}')

## 4 Tracking

Training and tracking differ for most MOT algorithms: the model is trained on trajectory prediction but is used inside a tracker at inference time. This project employs SORT and replaces its Kalman filter with the previously trained motion model.
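
In SORT, the boxes predicted by the motion model are associated with the detections of the current frame based on their IoU. The sketch below illustrates this step for boxes in center format. Note that SORT solves the assignment with the Hungarian algorithm, whereas this sketch uses a simpler greedy matching to stay dependency-free. The function names and the threshold value are illustrative and not taken from `engine.track`.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as [center_x, center_y, width, height]."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_match(predictions, detections, iou_threshold=0.3):
    """Pair predicted boxes with detections in order of descending IoU."""
    candidates = sorted(
        ((iou(p, d), pi, di)
         for pi, p in enumerate(predictions)
         for di, d in enumerate(detections)),
        reverse=True)
    used_p, used_d, matches = set(), set(), []
    for score, pi, di in candidates:
        if score < iou_threshold:
            break  # all remaining pairs overlap too little
        if pi in used_p or di in used_d:
            continue  # each prediction and detection is matched at most once
        used_p.add(pi)
        used_d.add(di)
        matches.append((pi, di))
    return sorted(matches)


# made-up boxes: two predicted tracks, three detections
predictions = [[50, 50, 20, 40], [200, 100, 30, 60]]
detections = [[198, 101, 30, 60], [52, 50, 20, 40], [400, 400, 10, 10]]
matches = greedy_match(predictions, detections)
print(matches)
```

Detections that remain unmatched would start new tracks, and tracks without a match are kept or dropped according to the tracker's age rules.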

In [None]:
from models.Transformer.transformer_encoder import MotionTransformer
from engine import track
import torch

# load existing model from checkpoint
# not necessary if model was trained in the same notebook without restarting
model = MotionTransformer()
checkpoint = torch.load('./checkpoints/transformer_ep_05', weights_only=True)
model.load_state_dict(checkpoint['model_state_dict'])

# set model to eval (necessary due to dropout and norm layers)
model.eval()


phase = 'val_dets'
seq_path = './DanceTrack'
split = 'val'
output_dir = 'output'  # must match tracker_dir in the evaluation step

# MOT on all sequences in DanceTrack val
# predictions are saved in output_dir
track(model, phase, seq_path, split, output_dir)


## 5 Evaluation

The model is evaluated using the HOTA, CLEAR and Identity metrics. The TrackEval package is used for evaluation.
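
To make two of the reported quantities concrete, the sketch below computes MOTA (from the CLEAR family) and IDF1 (from the Identity family) from their standard definitions. HOTA is omitted because it is computed per localization threshold as the geometric mean of detection and association accuracy and then averaged. TrackEval computes all of these itself. The function names and the counts below are made up for illustration and are not results of this project.

```python
def mota(num_fn, num_fp, num_idsw, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with GT the number of ground-truth boxes."""
    return 1.0 - (num_fn + num_fp + num_idsw) / num_gt


def idf1(idtp, idfp, idfn):
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


# illustrative counts only
print(f"MOTA: {mota(num_fn=100, num_fp=50, num_idsw=10, num_gt=1000):.3f}")
print(f"IDF1: {idf1(idtp=800, idfp=100, idfn=200):.3f}")
```

MOTA is dominated by detection errors, while IDF1 and the association part of HOTA respond more strongly to identity switches, which is where a motion model is expected to make a difference.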

In [8]:
import os
import matplotlib

dataset_dir = './DanceTrack'
dataset_split = 'val'
gt_dir = './DanceTrack/val'
tracker_dir = './output'

os.system(f"python3 TrackEval/scripts/run_mot_challenge.py --SPLIT_TO_EVAL {dataset_split}  "
                    f"--METRICS HOTA CLEAR Identity  --GT_FOLDER {gt_dir} "
                    f"--SEQMAP_FILE {os.path.join(dataset_dir, f'{dataset_split}_seqmap.txt')} "
                    f"--SKIP_SPLIT_FOL True --TRACKERS_TO_EVAL '' --TRACKER_SUB_FOLDER ''  --USE_PARALLEL True "
                    f"--NUM_PARALLEL_CORES 8 --PLOT_CURVES True "
                    f"--TRACKERS_FOLDER '{tracker_dir}'")


