Track to Detect and Segment: An Online Multi-Object Tracker (CVPR 2021)

Jialian Wu, Jiale Cao, Liangchen Song, Yu Wang, Ming Yang, Junsong Yuan
In CVPR, 2021.


  • As reported in the OVIS paper, TraDeS achieves competitive performance on Occluded Video Instance Segmentation (12.0 AP on OVIS test set).
  • As reported in the MvMHAT paper, TraDeS also performs well on Multi-view Persons Tracking.
  • TraDeS has been applied to 6 datasets across 4 tasks, through our own or third-party implementations.


Please refer to for installation instructions.

Run Demo

Before running the demo, first download our trained models: CrowdHuman model (2D tracking), MOT model (2D tracking) or nuScenes model (3D tracking). Then put the models in TraDeS_ROOT/models/ and cd to TraDeS_ROOT/src/. The demo result will be saved as a video in TraDeS_ROOT/results/.
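As a quick sanity check before launching a demo, the following minimal sketch reports which checkpoints are missing from TraDeS_ROOT/models/. The helper name missing_models is hypothetical (it is not part of this repo), and the file names are simply those that appear in the demo commands of this README.

```python
from pathlib import Path

# Checkpoint file names as they appear in the demo commands of this README.
EXPECTED_MODELS = ("crowdhuman.pth", "mot_half.pth", "nuscenes.pth")


def missing_models(trades_root, names=EXPECTED_MODELS):
    """Return the checkpoint names not found under <trades_root>/models/.

    Hypothetical helper for illustration; not part of the TraDeS code base.
    """
    models_dir = Path(trades_root) / "models"
    return [name for name in names if not (models_dir / name).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from TraDeS_ROOT/src/, so the root is "..".
    missing = missing_models("..")
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All demo checkpoints found.")
```

Only the checkpoint for the demo you intend to run is actually required, so a non-empty result is not necessarily an error.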

2D Tracking Demo

Demo for a video clip from the MOT dataset: Run the demo (using the MOT model):

python tracking --dataset mot --load_model ../models/mot_half.pth --demo ../videos/mot_mini.mp4 --pre_hm --ltrb_amodal --pre_thresh 0.5 --track_thresh 0.4 --inference --clip_len 3 --trades --save_video --resize_video --input_h 544 --input_w 960

Demo for a video clip which we randomly selected from YouTube: Run the demo (using the CrowdHuman model):

python tracking --load_model ../models/crowdhuman.pth --num_class 1 --demo ../videos/street_2d.mp4 --pre_hm --ltrb_amodal --pre_thresh 0.5 --track_thresh 0.5 --inference --clip_len 2 --trades --save_video --resize_video --input_h 480 --input_w 864

Demo for your own video or image folder: Please specify the file path after --demo and run (using the CrowdHuman model):

python tracking --load_model ../models/crowdhuman.pth --num_class 1 --demo $path to your video or image folder$ --pre_hm --ltrb_amodal --pre_thresh 0.5 --track_thresh 0.5 --inference --clip_len 2 --trades --save_video --resize_video --input_h $your_input_h$ --input_w $your_input_w$

(Some notes: (i) For 2D tracking, the provided models only track persons, since they are trained only on CrowdHuman or MOT. You may train a model on COCO or your own dataset for multi-category 2D object tracking. (ii) --clip_len is set to 3 for MOT; otherwise, it should be 2. Refer to our paper for details. (iii) The CrowdHuman model generalizes better to real-world scenes than the MOT model. Note that both datasets are released under non-commercial licenses. (iv) --input_h and --input_w must be divisible by 32.)
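When running the demo on your own video, the divisibility requirement on the input size can be met by snapping the desired size to the nearest multiple of 32. The sketch below is one way to do this; the function names are hypothetical and are not part of this repo.

```python
def round_to_multiple(value, base=32):
    """Round a positive integer size to the nearest multiple of `base`.

    Ties round up, and the result is never smaller than `base`.
    """
    return max(base, (int(value) + base // 2) // base * base)


def demo_input_size(height, width, base=32):
    """Return (input_h, input_w), each snapped to a multiple of `base`.

    Hypothetical helper for choosing --input_h / --input_w values.
    """
    return round_to_multiple(height, base), round_to_multiple(width, base)


if __name__ == "__main__":
    # Example: a 1080p source downscaled by two is 540x960.
    input_h, input_w = demo_input_size(540, 960)
    print(f"--input_h {input_h} --input_w {input_w}")
```

For a 540x960 target this yields 544x960, which matches the size used in the MOT demo command above. Snapping changes the aspect ratio slightly, which is usually acceptable for a demo.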

3D Tracking Demo

Demo for a video clip from the nuScenes dataset: Run the demo (using the nuScenes model):

python tracking,ddd --dataset nuscenes --load_model ../models/nuscenes.pth --demo ../videos/nuscenes_mini.mp4 --pre_hm --track_thresh 0.1 --inference --clip_len 2 --trades --save_video --resize_video --input_h 448 --input_w 800 --test_focal_length 633

(For the monocular 3D tracking demo, you need to specify --test_focal_length so that image coordinates can be converted back to 3D. The value 633 is half of a typical nuScenes focal length (~1266) at the original 1600x900 resolution. The mini demo video has a resolution of 800x448, so the focal length is halved accordingly. You do not need to set --test_focal_length when testing on the original nuScenes data.)
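The focal length arithmetic above generalizes to other resolutions: a focal length expressed in pixels scales linearly with image width. The sketch below assumes the video was resized uniformly from the original capture and uses the width ratio as the scale factor; the function name is hypothetical and not part of this repo.

```python
def scaled_focal_length(focal_length, original_width, input_width):
    """Scale a pixel focal length from the original width to the demo width.

    Assumes a uniform resize, so the horizontal scale factor is used.
    """
    return focal_length * input_width / original_width


if __name__ == "__main__":
    # nuScenes example from the note above: ~1266 px at 1600 px width,
    # demo video at 800 px width.
    focal = scaled_focal_length(1266, 1600, 800)
    print(f"--test_focal_length {focal:g}")
```

With the nuScenes values this gives 633, the value passed in the command above. For footage from another camera, the original focal length would need to come from that camera's calibration.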

You can also refer to CenterTrack for usage of the webcam demo (the code is available in this repo, but we have not tested it yet).

Benchmark Evaluation and Training

Please refer to for dataset preparation.

2D Object Tracking

MOT17 Val MOTA↑ IDF1↑ IDS↓
Our Baseline 64.8 59.5 1055
CenterTrack 66.1 64.2 528
TraDeS (ours) 68.2 71.7 285

Test on MOT17 validation set: Place the MOT model in $TraDeS_ROOT/models/ and run:

sh experiments/

Train on MOT17 halftrain set: Place the pretrained model in $TraDeS_ROOT/models/ and run:

sh experiments/

3D Object Tracking

nuScenes Val AMOTA↑ AMOTP↓ IDSA↓
Our Baseline 4.3 1.65 1792
CenterTrack 6.8 1.54 813
TraDeS (ours) 11.8 1.48 699

Test on nuScenes validation set: Place the nuScenes model in $TraDeS_ROOT/models/. The MOT and nuScenes dataset APIs require conflicting versions, so you need to switch between them. The default installed versions are for the MOT dataset. For experiments on the nuScenes dataset, please run:


sh experiments/

To switch back to the API versions for MOT experiments, you can run:


Train on nuScenes train set: Place the pretrained model in $TraDeS_ROOT/models/ and run:

sh experiments/

Train on Static Images

Following CenterTrack, we use CrowdHuman to pretrain the 2D object tracking model. Only the training set is used.

sh experiments/

The trained model is available at CrowdHuman model.


If you find this work useful in your research, please consider citing our paper as follows:

title={Track to Detect and Segment: An Online Multi-Object Tracker},
author={Wu, Jialian and Cao, Jiale and Song, Liangchen and Wang, Yu and Yang, Ming and Yuan, Junsong},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},


Many thanks to the CenterTrack authors for their great framework!


