Spectrum-guided Multi-granularity Referring Video Object Segmentation

New: see the HTR for a stronger model and a new metric to measure temporal consistency.

The official implementation of the ICCV 2023 paper:

Spectrum-guided Multi-granularity Referring Video Object Segmentation

Bo Miao, Mohammed Bennamoun, Yongsheng Gao, Ajmal Mian

ICCV 2023

Introduction

We propose a Spectrum-guided Multi-granularity (SgMg) approach that follows a segment-and-optimize pipeline to tackle the feature drift issue found in previous decode-and-segment approaches. Extensive experiments show that SgMg achieves state-of-the-art overall performance on multiple benchmark datasets, outperforming the closest competitor by 2.8 percentage points on Ref-YouTube-VOS while also offering faster inference.

Setup

The main setup of our code follows ReferFormer.

Please refer to install.md for installation.

Please refer to data.md for data preparation.

Training and Evaluation

All models are trained on 2 RTX 3090 GPUs. If you encounter an out-of-memory (OOM) error, add the --use_checkpoint option.

The training and evaluation scripts are in the scripts folder. To train or evaluate SgMg, run the following commands:

sh dist_train_ytvos_videoswinb.sh

sh dist_test_ytvos_videoswinb.sh

Note: You can modify the --backbone and --backbone_pretrained arguments to specify a different backbone.

Model Zoo

We provide pretrained models for different visual backbones as well as SgMg checkpoints (see below).

Put the downloaded models in the checkpoints folder before starting training or inference.

Results (Ref-YouTube-VOS & Ref-DAVIS)

To evaluate the results, please upload the zip file to the competition server.
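A minimal sketch of packaging predicted masks into a zip file for upload. It assumes the inference output contains one PNG mask per frame laid out as Annotations/<video_id>/<expression_id>/<frame>.png; that layout and the helper name `make_submission_zip` are assumptions for illustration, not part of this repository, so check the competition server's instructions for the exact structure it expects.

```python
import os
import zipfile
from typing import List


def make_submission_zip(annotations_dir: str, zip_path: str) -> List[str]:
    """Zip predicted PNG masks under a top-level 'Annotations/' folder.

    Hypothetical helper. Assumes the layout
    annotations_dir/<video_id>/<expression_id>/<frame>.png.
    Returns the archive member names in sorted order.
    """
    members = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(annotations_dir):
            for name in sorted(files):
                if not name.endswith(".png"):
                    continue  # only mask images belong in the submission
                full_path = os.path.join(root, name)
                rel_path = os.path.relpath(full_path, annotations_dir)
                # Zip member names always use forward slashes, regardless of OS.
                arcname = "Annotations/" + rel_path.replace(os.sep, "/")
                zf.write(full_path, arcname)
                members.append(arcname)
    return sorted(members)
```

For example, `make_submission_zip("output/Annotations", "submission.zip")` would produce a submission.zip whose root holds the Annotations folder; the paths here are placeholders for your own inference output directory.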

Backbone	Ref-YouTube-VOS J&F	Ref-DAVIS J&F	Model	Submission
Video-Swin-T	62.0	61.9	model	link
Video-Swin-B	65.7	63.3	model	link
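J&F is the average of region similarity J (the IoU between predicted and ground-truth masks) and contour accuracy F (an F-measure over mask boundaries). The pure-Python sketch below illustrates both on a single pair of small binary masks. It is a simplified illustration, not the official Ref-YouTube-VOS/DAVIS evaluation code, which derives the boundary tolerance from the image diagonal, uses morphological dilation, and averages J and F over all frames and objects; the fixed pixel tolerance `tol` and all function names are assumptions.

```python
from typing import List, Set, Tuple

Mask = List[List[int]]  # binary mask: 1 = object, 0 = background


def region_similarity(pred: Mask, gt: Mask) -> float:
    """J: intersection-over-union of two binary masks (1.0 if both are empty)."""
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return 1.0 if union == 0 else inter / union


def boundary(mask: Mask) -> Set[Tuple[int, int]]:
    """Foreground pixels that have at least one 4-neighbour outside the object."""
    h, w = len(mask), len(mask[0])
    pts = set()
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    pts.add((y, x))
                    break
    return pts


def _matched_fraction(src: Set[Tuple[int, int]], dst: Set[Tuple[int, int]], tol: int) -> float:
    """Share of points in src lying within `tol` pixels (Chebyshev distance) of a point in dst."""
    hits = sum(
        1
        for (y, x) in src
        if any(abs(y - v) <= tol and abs(x - u) <= tol for (v, u) in dst)
    )
    return hits / len(src)


def contour_accuracy(pred: Mask, gt: Mask, tol: int = 1) -> float:
    """F: harmonic mean of boundary precision and recall under a pixel tolerance."""
    bp, bg = boundary(pred), boundary(gt)
    if not bp and not bg:
        return 1.0  # both masks empty: treat as a perfect match
    if not bp or not bg:
        return 0.0  # one mask empty, the other not
    precision = _matched_fraction(bp, bg, tol)
    recall = _matched_fraction(bg, bp, tol)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def j_and_f(pred: Mask, gt: Mask, tol: int = 1) -> float:
    """Per-frame J&F: the mean of region similarity and contour accuracy."""
    return (region_similarity(pred, gt) + contour_accuracy(pred, gt, tol)) / 2
```

Because J rewards overlapping area while F rewards accurate outlines, a prediction can score well on one and poorly on the other, which is why the benchmarks report their average.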

Results (A2D-Sentences & JHMDB-Sentences)

Backbone	A2D mAP	A2D Mean IoU	A2D Overall IoU	JHMDB mAP	JHMDB Mean IoU	JHMDB Overall IoU	Model
Video-Swin-T	56.1	78.0	70.4	44.4	72.8	71.7	model
Video-Swin-B	58.5	79.9	72.0	45.0	73.7	72.5	model
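The two IoU columns aggregate differently: Mean IoU averages the per-sample IoU, whereas Overall IoU divides the total intersection by the total union over the whole test set, so larger objects carry more weight. A minimal sketch of these standard definitions, assuming per-sample intersection and union pixel counts are already available; the function name is hypothetical and not part of this repository.

```python
from typing import List, Tuple


def mean_and_overall_iou(pairs: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (Mean IoU, Overall IoU) from per-sample (intersection, union) pixel counts.

    Assumes every sample has a non-empty union (union > 0).
    """
    if not pairs:
        raise ValueError("need at least one sample")
    # Mean IoU: compute IoU per sample, then average, so every sample counts equally.
    mean_iou = sum(inter / union for inter, union in pairs) / len(pairs)
    # Overall IoU: pool pixels over the whole set first, so large objects dominate.
    total_inter = sum(inter for inter, _ in pairs)
    total_union = sum(union for _, union in pairs)
    overall_iou = total_inter / total_union
    return mean_iou, overall_iou
```

For example, `mean_and_overall_iou([(90, 100), (1, 10)])` gives a Mean IoU of 0.5 but an Overall IoU of about 0.83 (91/110), because the large, well-segmented object dominates the pooled counts.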

Results (RefCOCO/+/g)

Overall IoU is used as the metric, and the model is the one obtained from the pre-training stage described in the paper.

Backbone	RefCOCO	RefCOCO+	RefCOCOg	Model
Video-Swin-B	76.3	66.4	70.0	model

Acknowledgements

Citation

@InProceedings{Miao_2023_ICCV,
    author    = {Miao, Bo and Bennamoun, Mohammed and Gao, Yongsheng and Mian, Ajmal},
    title     = {Spectrum-guided Multi-granularity Referring Video Object Segmentation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {920-930}
}

Contact

If you have any questions about this project, please feel free to contact bomiaobbb@gmail.com.
