Object-centric Representation Benchmark

This repository contains the code, data and benchmark leaderboard from the paper *Benchmarking Unsupervised Object Representations for Video Sequences* by M.A. Weis, K. Chitta, Y. Sharma, W. Brendel, M. Bethge, A. Geiger and A.S. Ecker (2021).

Code for training OP3, TBA and SCALOR was adapted from the original OP3, TBA and SCALOR codebases.

Installation

```
python3 setup.py install
```

Datasets

Download data from OSF to ocrb/data/datasets.

Available datasets:

  • Video Multi-dSprites (VMDS)
  • Sprites-MOT (SpMOT)
  • Video Object Room (VOR)
  • Textured Video Multi-dSprites (texVMDS)


Extract Data

Extract data from hdf5 files:

```
python3 ocrb/data/extract_data.py --path='ocrb/data/datasets/' --dataset='vmds'
```
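Before extracting, it can help to check what a downloaded HDF5 file contains. The sketch below is illustrative only: the actual group and dataset names are defined by the extraction script, and the filename in the usage comment is a hypothetical example.

```python
# Minimal sketch: recursively print the layout of an HDF5 dataset file.
import h5py


def print_structure(path):
    """Print every dataset in an HDF5 file with its shape and dtype."""
    def visit(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")
    with h5py.File(path, "r") as f:
        f.visititems(visit)


# print_structure("ocrb/data/datasets/vmds_train.hdf5")  # hypothetical filename
```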

Training

Training ViMON

To run ViMON training:

```
python3 ocrb/vimon/main.py --config='ocrb/vimon/config.json'
```

where hyperparameters are specified in the config file.
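A minimal sketch of how such a JSON config of hyperparameters is typically consumed. The keys shown ("batch_size", "lr") are hypothetical placeholders, not the actual ViMON config schema.

```python
# Load a dict of hyperparameters from a JSON config file.
import json


def load_config(path):
    """Return the hyperparameter dict stored in a JSON config file."""
    with open(path) as f:
        return json.load(f)


# Hypothetical usage:
# cfg = load_config("ocrb/vimon/config.json")
# lr = cfg["lr"]
```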

Training OP3

To run OP3 training:

```
python3 ocrb/op3/main.py --va vmds
```

where the --va flag selects the dataset: vmds, vor or spmot. Hyperparameters for each dataset can be found in the corresponding file. For details, see the original OP3 repository.

Training TBA

For TBA training, the input datasets need to be pre-processed into batches, for which we provide a function:

```
python3 ocrb/tba/data/create_batches.py --batch_size=64 --dataset='vmds' --mode='train'
```
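The actual batching is done by create_batches.py; the sketch below only illustrates the underlying idea of grouping video indices into fixed-size batches, with a possibly smaller final batch.

```python
# Illustrative sketch: group consecutive video indices into batches.
def make_batches(num_videos, batch_size):
    """Return lists of video indices, each of length batch_size (last may be shorter)."""
    indices = list(range(num_videos))
    return [indices[i:i + batch_size] for i in range(0, num_videos, batch_size)]


# e.g. make_batches(10, 4) -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```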

To run training:

```
python3 ocrb/tba/run.py --task vmds
```

The --task flag can be set to vmds, spmot or vor. For details regarding other training flags, see the original TBA repository.

Evaluation

Generating ViMON annotation file

To generate the annotation file with mask and object ID predictions per frame for each video in the test set, run:

```
python3 ocrb/vimon/generate_pred_json.py --config='ocrb/vimon/config.json' --ckpt_file='ocrb/vimon/ckpts/pretrained/ckpt_vimon_vmds.pt' --out_path='ocrb/vimon/ckpts/pretrained/vmds_pred_list.json'
```

where hyperparameters, including the dataset, are specified in the ocrb/vimon/config.json file and --ckpt_file gives the path to the trained model weights.
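A hedged sketch of consuming such an annotation file. The exact JSON schema is defined by generate_pred_json.py; here we only assume the top level is a list with one entry per video and one sub-entry per frame.

```python
# Illustrative sketch: count the annotated frames per video in a prediction file.
import json


def frames_per_video(pred_path):
    """Return the number of annotated frames for each video in the file."""
    with open(pred_path) as f:
        videos = json.load(f)  # assumed: one list entry per test video
    return [len(frames) for frames in videos]  # assumed: one sub-entry per frame
```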

Generating OP3 annotation file

To generate the annotation file with mask and object ID predictions per frame for each video in the test set, run:

```
python3 ocrb/op3/generate_pred_json.py --va vmds --ckpt_file='ocrb/op3/ckpts/vmds_params.pkl' --out_path='ocrb/op3/ckpts/vmds_pred_list.json'
```

where hyperparameters can be found in the corresponding file and --ckpt_file gives the path to the trained model weights. For details, see the original OP3 repository.

Generating TBA annotation file

To generate the annotation file for TBA, run:

```
python3 ocrb/tba/run.py --task vmds --metric 1 --v 2 --init_model sp_latest.pt
```

The annotation file is generated in the folder ocrb/tba/pic. For details regarding other evaluation flags see the original TBA repository.

Evaluating MOT metrics

To compute MOT metrics, run:

```
python3 ocrb/eval/eval_mot.py --gt_file='ocrb/data/gt_jsons/vmds_test.json' --pred_file='ocrb/vimon/ckpts/pretrained/vmds_pred_list.json' --results_path='ocrb/vimon/ckpts/pretrained/vmds_results.json' --exclude_bg
```

where --gt_file specifies the path to the ground truth annotation file, --pred_file specifies the path to the annotation file containing the model predictions and --results_path gives the path where to save the result dictionary. Set --exclude_bg to exclude background segmentation masks from the evaluation.
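The MOTA values reported in the leaderboard follow the standard CLEAR MOT definition: MOTA = 1 − (misses + false positives + ID switches) / ground-truth objects. Since the tables report Miss, FPs and ID switches as percentages of ground-truth objects, MOTA in percent reduces to a simple subtraction, which you can verify against any leaderboard row:

```python
# CLEAR MOT accuracy from per-error-type percentages.
def mota_percent(miss, fps, id_switches):
    """MOTA in percent, given Miss, FPs and ID switches as percentages."""
    return 100.0 - (miss + fps + id_switches)


# e.g. SCALOR on SpMOT: Miss=2.4, FPs=1.0, ID S.=1.7 -> MOTA = 94.9
```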

Leaderboard

Analysis of SOTA object-centric representation learning models for MOT. Results are shown as mean ± standard deviation of three runs with different random training seeds. Models are ranked according to MOTA for each dataset. If you want to add your own method and results on any of the datasets, please open a pull request adding the results to the tables below.

SpMOT

| Rank | Model | Reference | MOTA ↑ | MOTP ↑ | MD ↑ | MT ↑ | Match ↑ | Miss ↓ | ID S. ↓ | FPs ↓ | MSE ↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SCALOR | Jiang et al. 2020 | 94.9 ± 0.5 | 80.2 ± 0.1 | 96.4 ± 0.1 | 93.2 ± 0.7 | 95.9 ± 0.4 | 2.4 ± 0.0 | 1.7 ± 0.4 | 1.0 ± 0.1 | 3.4 ± 0.1 |
| 2 | ViMON | Weis et al. 2020 | 92.9 ± 0.2 | 91.8 ± 0.2 | 87.7 ± 0.8 | 87.2 ± 0.8 | 95.0 ± 0.2 | 4.8 ± 0.2 | 0.2 ± 0.0 | 2.1 ± 0.1 | 11.1 ± 0.6 |
| 3 | OP3 | Veerapaneni et al. 2019 | 89.1 ± 5.1 | 78.4 ± 2.4 | 92.4 ± 4.0 | 91.8 ± 3.8 | 95.9 ± 2.2 | 3.7 ± 2.2 | 0.4 ± 0.0 | 6.8 ± 2.9 | 13.3 ± 11.9 |
| 4 | TBA | He et al. 2019 | 79.7 ± 15.0 | 71.2 ± 0.3 | 83.4 ± 9.7 | 80.0 ± 13.6 | 87.8 ± 9.0 | 9.6 ± 6.0 | 2.6 ± 3.0 | 8.1 ± 6.0 | 11.9 ± 1.9 |
| 5 | MONet | Burgess et al. 2019 | 70.2 ± 0.8 | 89.6 ± 1.0 | 92.4 ± 0.6 | 50.4 ± 2.4 | 75.3 ± 1.3 | 4.4 ± 0.4 | 20.3 ± 1.6 | 5.1 ± 0.5 | 13.0 ± 2.0 |

VMDS

| Rank | Model | Reference | MOTA ↑ | MOTP ↑ | MD ↑ | MT ↑ | Match ↑ | Miss ↓ | ID S. ↓ | FPs ↓ | MSE ↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | OP3 | Veerapaneni et al. 2019 | 91.7 ± 1.7 | 93.6 ± 0.4 | 96.8 ± 0.5 | 96.3 ± 0.4 | 97.8 ± 0.1 | 2.0 ± 0.1 | 0.2 ± 0.0 | 6.1 ± 1.5 | 4.3 ± 0.2 |
| 2 | ViMON | Weis et al. 2020 | 86.8 ± 0.3 | 86.8 ± 0.0 | 86.2 ± 0.3 | 85.0 ± 0.3 | 92.3 ± 0.2 | 7.0 ± 0.2 | 0.7 ± 0.0 | 5.5 ± 0.1 | 10.7 ± 0.1 |
| 3 | SCALOR | Jiang et al. 2020 | 74.1 ± 1.2 | 87.6 ± 0.4 | 67.9 ± 1.1 | 66.7 ± 1.1 | 78.4 ± 1.0 | 20.7 ± 1.0 | 0.8 ± 0.0 | 4.4 ± 0.4 | 14.0 ± 0.1 |
| 4 | TBA | He et al. 2019 | 54.5 ± 12.1 | 75.0 ± 0.9 | 62.9 ± 5.9 | 58.3 ± 6.1 | 75.9 ± 4.3 | 21.0 ± 4.2 | 3.2 ± 0.3 | 21.4 ± 7.8 | 28.1 ± 2.0 |
| 5 | MONet | Burgess et al. 2019 | 49.4 ± 3.6 | 78.6 ± 1.8 | 74.2 ± 1.7 | 35.7 ± 0.8 | 66.7 ± 0.7 | 13.6 ± 1.0 | 19.7 ± 0.6 | 17.2 ± 3.1 | 22.2 ± 2.2 |

VOR

| Rank | Model | Reference | MOTA ↑ | MOTP ↑ | MD ↑ | MT ↑ | Match ↑ | Miss ↓ | ID S. ↓ | FPs ↓ | MSE ↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | ViMON | Weis et al. 2020 | 89.0 ± 0.0 | 89.5 ± 0.5 | 90.4 ± 0.5 | 90.0 ± 0.4 | 93.2 ± 0.4 | 6.5 ± 0.4 | 0.3 ± 0.0 | 4.2 ± 0.4 | 6.4 ± 0.6 |
| 2 | SCALOR | Jiang et al. 2020 | 74.6 ± 0.4 | 86.0 ± 0.2 | 76.0 ± 0.4 | 75.9 ± 0.4 | 77.9 ± 0.4 | 22.1 ± 0.4 | 0.0 ± 0.0 | 3.3 ± 0.2 | 6.4 ± 0.1 |
| 3 | OP3 | Veerapaneni et al. 2019 | 65.4 ± 0.6 | 89.0 ± 0.6 | 88.0 ± 0.6 | 85.4 ± 0.5 | 90.7 ± 0.3 | 8.2 ± 0.4 | 1.1 ± 0.2 | 25.3 ± 0.6 | 3.0 ± 0.1 |
| 4 | MONet | Burgess et al. 2019 | 37.0 ± 6.8 | 81.7 ± 0.5 | 76.9 ± 2.2 | 37.3 ± 7.8 | 64.4 ± 5.0 | 15.8 ± 1.6 | 19.8 ± 3.5 | 27.4 ± 2.3 | 12.2 ± 1.4 |

texVMDS

| Rank | Model | Reference | MOTA ↑ | MOTP ↑ | MD ↑ | MT ↑ | Match ↑ | Miss ↓ | ID S. ↓ | FPs ↓ | MSE ↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | MONet | Burgess et al. 2019 | -73.3 ± 5.5 | 67.7 ± 1.1 | 16.0 ± 3.4 | 12.3 ± 3.1 | 24.7 ± 4.7 | 73.1 ± 5.1 | 2.2 ± 0.8 | 98.0 ± 1.7 | 200.5 ± 5.7 |
| 2 | ViMON | Weis et al. 2020 | -85.5 ± 2.8 | 69.0 ± 0.6 | 24.2 ± 1.3 | 23.8 ± 1.4 | 34.7 ± 1.7 | 65.0 ± 1.7 | 0.3 ± 0.0 | 120.2 ± 2.5 | 171.4 ± 3.3 |
| 3 | SCALOR | Jiang et al. 2020 | -99.2 ± 11.7 | 74.0 ± 0.5 | 6.5 ± 0.6 | 6.3 ± 0.6 | 12.3 ± 0.4 | 87.5 ± 0.4 | 0.2 ± 0.0 | 111.5 ± 11.4 | 133.7 ± 11.1 |
| 4 | OP3 | Veerapaneni et al. 2019 | -110.4 ± 4.3 | 70.6 ± 0.6 | 16.5 ± 5.1 | 16.2 ± 5.0 | 22.9 ± 6.6 | 76.9 ± 6.7 | 0.2 ± 0.1 | 133.4 ± 2.9 | 132.8 ± 16.2 |

Citation

If you use this repository in your research, please cite:

@article{Weis2021,
  author  = {Marissa A. Weis and Kashyap Chitta and Yash Sharma and Wieland Brendel and Matthias Bethge and Andreas Geiger and Alexander S. Ecker},
  title   = {Benchmarking Unsupervised Object Representations for Video Sequences},
  journal = {Journal of Machine Learning Research},
  year    = {2021},
  volume  = {22},
  number  = {183},
  pages   = {1-61},
  url     = {http://jmlr.org/papers/v22/21-0199.html}
}

About

Code, data and benchmark from the paper "Unmasking the Inductive Biases of Unsupervised Object Representations for Video Sequences".
