GitHub - KainingYing/CTVIS: Official Implementation of ICCV 2023 Paper - CTVIS: Consistent Training for Online Video Instance Segmentation

CTVIS: Consistent Training for Online Video Instance Segmentation

Kaining Ying¹,²*, Qing Zhong⁴*, Weian Mao⁴, Zhenhua Wang³#, Hao Chen¹#

Lin Yuanbo Wu⁵, Yifan Liu⁴, Chengxiang Fan¹, Yunzhi Zhuge⁴, Chunhua Shen¹

¹Zhejiang University, ²Zhejiang University of Technology

³Northwest A&F University, ⁴The University of Adelaide, ⁵Swansea University

📰 News

[2023/06/18] CTVIS wins 2nd Place in Pixel-level Video Understanding Challenge (VPS Track) at CVPR2023.
[2023/07/14] Our work CTVIS is accepted by ICCV 2023! Congrats! ✌️
[2023/07/24] ~~We will release the code ASAP. Stay tuned!~~
[2023/07/31] We release the code and weights on YTVIS19_R50.
[2023/08/24] CTVIS wins the 2nd Place in The 5th Large-scale Video Object Segmentation Challenge - Track 2: Video Instance Segmentation at ICCV 2023.
[2023/10/15] We upload all the checkpoints.

🔨 Install

The following commands build the conda environment.

conda create -n ctvis python=3.10 -y 
conda activate ctvis
pip install torch==2.0.0 torchvision  

# install D2
git clone https://gitee.com/yingkaining/detectron2.git
python -m pip install -e detectron2

# install mmcv
pip install openmim
mim install "mmcv==1.7.1"

pip install -r requirements.txt

cd mask2former/modeling/pixel_decoder/ops
sh make.sh
cd ../../../../
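
After these steps, a quick sanity check can confirm that the main packages are visible to the interpreter. The sketch below is not part of the repository. It uses only the Python standard library, and the distribution names it looks up (torch, torchvision, detectron2, mmcv) are taken from the install commands above, on the assumption that these are the names the packages are registered under in your environment.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Names taken from the install commands; adjust if your setup differs.
    report = installed_versions(["torch", "torchvision", "detectron2", "mmcv"])
    for name, version in report.items():
        print(f"{name}: {version or 'NOT FOUND'}")
```

A package reported as NOT FOUND usually means the corresponding install step did not complete in the active conda environment.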

🏀 Dataset Preparation

We recommend organizing the datasets in the following layout. Refer to this for more details.

$DETECTRON2_DATASETS
+-- coco
|   |
|   +-- annotations
|   |   |
|   |   +-- instances_{train,val}2017.json
|   |   +-- coco2ytvis2019_train.json
|   |   +-- coco2ytvis2021_train.json
|   |   +-- coco2ovis_train.json
|   |
|   +-- {train,val}2017
|       |
|       +-- *.jpg
|
+-- ytvis_2019
|   ...
|
+-- ytvis_2021
|   ...
|
+-- ovis
    ...
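
The layout above can be checked before training with a few lines of Python. This is a minimal sketch, not a script shipped with the repository: `missing_dataset_paths` is a hypothetical helper, the relative paths are copied from the tree above, and falling back to `datasets` when `DETECTRON2_DATASETS` is unset is an assumption of this sketch.

```python
import os
from pathlib import Path

# Relative paths copied from the layout above.
EXPECTED = [
    "coco/annotations/instances_train2017.json",
    "coco/annotations/instances_val2017.json",
    "coco/annotations/coco2ytvis2019_train.json",
    "coco/annotations/coco2ytvis2021_train.json",
    "coco/annotations/coco2ovis_train.json",
    "coco/train2017",
    "coco/val2017",
    "ytvis_2019",
    "ytvis_2021",
    "ovis",
]


def missing_dataset_paths(root):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


if __name__ == "__main__":
    # The "datasets" fallback is an assumption of this sketch.
    root = os.environ.get("DETECTRON2_DATASETS", "datasets")
    missing = missing_dataset_paths(root)
    print("layout OK" if not missing else "missing:\n  " + "\n  ".join(missing))
```

The check only looks at the top-level folders for ytvis_2019, ytvis_2021 and ovis, because their contents are elided in the tree above.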

Note that the annotation files coco2ytvis2019_train.json, coco2ytvis2021_train.json and coco2ovis_train.json are generated by the following command:

python tools/convert_coco2ytvis.py

If you want to visualize the dataset, you can use the following script (YTVIS19):

python browse_datasets.py ytvis_2019_train --save-dir /path/to/save/dir

⚾️ Training and Evaluation

Training

We initialize CTVIS from Mask2Former weights pretrained on MS-COCO. Download them first and place them in checkpoints/.

Mask2Former-R50-COCO: Official Download Link

Mask2Former-SwinL-COCO: Official Download Link

Next, train CTVIS, for example on YTVIS19 with the R50 backbone:

python train_ctvis.py --config-file configs/ytvis_2019/CTVIS_R50.yaml --num-gpus 8 OUTPUT_DIR work_dirs/CTVIS_YTVIS19_R50

Evaluation

During training, the model is typically evaluated on the validation set periodically. You can also evaluate a model separately, like this:

python train_ctvis.py --config-file configs/ytvis_2019/CTVIS_R50.yaml --eval-only --num-gpus 8 OUTPUT_DIR work_dirs/CTVIS_YTVIS19_R50 MODEL.WEIGHTS /path/to/model/weight/file
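
When launching several runs (for example with a different GPU count, or to evaluate multiple checkpoints), it can help to assemble the two commands above programmatically. The sketch below is a hypothetical helper, not part of the repository. It only reproduces the flags and config overrides shown above, and treats passing a weights path as the switch to evaluation mode, which is a design choice of this sketch.

```python
import shlex


def ctvis_command(config, output_dir, num_gpus=8, weights=None):
    """Build the train_ctvis.py argument list.

    Passing `weights` adds --eval-only and the MODEL.WEIGHTS override,
    mirroring the evaluation command above.
    """
    cmd = ["python", "train_ctvis.py", "--config-file", config]
    if weights is not None:
        cmd.append("--eval-only")
    cmd += ["--num-gpus", str(num_gpus), "OUTPUT_DIR", output_dir]
    if weights is not None:
        cmd += ["MODEL.WEIGHTS", weights]
    return cmd


if __name__ == "__main__":
    train = ctvis_command(
        "configs/ytvis_2019/CTVIS_R50.yaml", "work_dirs/CTVIS_YTVIS19_R50"
    )
    # Print a shell-safe string; pass the list to subprocess.run to execute.
    print(shlex.join(train))
```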

You can download the model weights from the Model Zoo. To obtain the final AP, the result files must be submitted to CodaLab. We recommend using the following script to upload a submission to CodaLab. We appreciate this project for providing such a useful feature.

python tools/codalab_upload.py --result-dir /path/to/your/submission/dir --id ytvis19 --account your_codalab_account_email --password your_codalab_account_password

Demo and Visualization

We support inference on specified videos (demo/demo.py) as well as visualization of all videos in a given dataset (demo/visualize_all_videos.py).

# demo
python demo/demo.py --config-file configs/ytvis_2019/CTVIS_R50.yaml --video-input /path/to/input/video --output /path/to/save/output --save-frames --opts MODEL.WEIGHTS /path/to/your/checkpoint

💽 Model Zoo

YouTube-VIS 2019

Model	Backbone	AP	AP50	AP75	AR1	AR10	Link
CTVIS	ResNet-50	55.2	79.5	60.2	51.3	63.7	1Drive
CTVIS	Swin-L (200 queries)	65.6	87.7	72.2	56.5	70.4	1Drive

YouTube-VIS 2021

Model	Backbone	AP	AP50	AP75	AR1	AR10	Link
CTVIS	ResNet-50	50.1	73.7	54.7	41.8	59.5	1Drive
CTVIS	Swin-L (200 queries)	61.2	84.0	68.8	48.0	65.8	1Drive

YouTube-VIS 2022

Note: YouTube-VIS 2022 shares the same training set as YouTube-VIS 2021.

Model	Backbone	AP	APS	APL	Link
CTVIS	ResNet-50	44.9	50.3	39.4	1Drive
CTVIS	Swin-L (200 queries)	53.8	61.2	46.4	1Drive
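
The numbers in this table are consistent with AP being the mean of APS and APL, presumably the scores on the short-video and long-video subsets of the benchmark. That reading is an observation from the table, not a statement made in this README. The short check below copies the reported values and compares them, allowing for one-decimal rounding.

```python
# (AP, APS, APL) rows copied from the YouTube-VIS 2022 table above.
ROWS = {
    "ResNet-50": (44.9, 50.3, 39.4),
    "Swin-L (200 queries)": (53.8, 61.2, 46.4),
}


def mean_of_subsets(ap_s, ap_l):
    """Average of the two subset scores."""
    return (ap_s + ap_l) / 2


if __name__ == "__main__":
    for backbone, (ap, ap_s, ap_l) in ROWS.items():
        mean = mean_of_subsets(ap_s, ap_l)
        print(f"{backbone}: reported AP={ap}, mean(APS, APL)={mean:.2f}")
```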

OVIS

Model	Backbone	AP	AP50	AP75	AR1	AR10	Link
CTVIS	ResNet-50	35.5	60.8	34.9	16.1	41.9	1Drive
CTVIS	Swin-L (200 queries)	46.9	71.5	47.5	19.1	52.1	1Drive

🫡 Acknowledgements

We sincerely thank HIGH-FLYER for providing valuable computational resources. We are also grateful to the open-source projects that inspired this work.

🪪 License

The content of this project itself is licensed under LICENSE.

📇 Cite our Paper

If you find this project useful in your research, please cite our paper.

@misc{ying2023ctvis,
      title={{CTVIS}: {C}onsistent {T}raining for {O}nline {V}ideo {I}nstance {S}egmentation}, 
      author={Kaining Ying and Qing Zhong and Weian Mao and Zhenhua Wang and Hao Chen and Lin Yuanbo Wu and Yifan Liu and Chengxiang Fan and Yunzhi Zhuge and Chunhua Shen},
      year={2023},
      eprint={2307.12616},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

In Memory of Professor Yifan Liu

We are deeply grieved by the irreparable loss of Professor Liu. Professor Liu provided invaluable advice on our research, especially regarding the design of positive and negative samples as well as subsequent experimental arrangements and paper writing. She gave us tremendous guidance. Academically, Professor Liu was proficient and published influential works such as Structured Knowledge Distillation and Auto-painter that impacted academia profoundly. Professor Liu taught us to publish valuable and impactful research results. Even if it is just a small module, as long as it is widely applied and developed by others, it is sufficient to leave a brilliant legacy. Beyond academia, Professor Liu's optimism, sagacity, and transcendence left us with deep impressions. We will bear Professor Liu's teachings in mind, stay true to our original aspirations, and continue to move forward in academic research to publish original and influential results. This is the best way to cherish Professor Liu's memory. Professor Liu's passing has left an irreplaceable void in each of our hearts. We will always cherish her wisdom and kindness. Professor Liu, thank you for your meticulous nurturing of us. We will surely uphold your academic spirit, live up to your trust, and keep moving forward. May you rest in peace.

我们深感悲痛，永远失去了刘老师。刘老师对我们的研究工作提出了宝贵的建议，尤其是在构造正负样本以及后续的实验设计和论文写作上，给予了我们强有力的指导。在学术上，刘老师造诣颇深，发表了对学术界影响深远的工作，如Structured knowledge distillation和Auto-painter等。刘老师教导我们，要发表有价值和影响力的研究成果，哪怕只是一个小模块，如果为后人所广泛应用和发展，也足以留下灿烂的一笔。在学术之外，刘老师的乐观豁达和聪慧超脱也给我们留下了深刻的印象。我们会牢记刘老师的教导，不忘初心，继续努力在学术道路上前行，发表具有原创性和影响力的研究成果。这是对刘老师最好的怀念。刘老师的离去让我们每个人心中留下了难以填补的空白。我们将永远怀念他的睿智和善良。刘老师，感谢您对我们的悉心培养，我们一定会秉持您的学术精神，不负重托，继续努力前行。愿您安息。
