Limingxing00/RDE-VOS-CVPR2022

Introduction

Recurrent Dynamic Embedding for Video Object Segmentation [CVPR 2022]

Install

If you just want to run our method, refer to Requirements.
If you want to evaluate our method on the DAVIS 2017 validation set, also refer to Requirements.
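
As a minimal sketch of the setup (assuming a working PyTorch environment and that the dependencies are listed in a requirements.txt at the repo root; defer to whatever Requirements actually specifies):

git clone https://github.com/Limingxing00/RDE-VOS-CVPR2022.git
cd RDE-VOS-CVPR2022
pip install -r requirements.txt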

Model zoo

You can download the pretrained models from Google Drive.

The predictions of our method can be downloaded from Google Drive.

Dataset

Following STCN, we train the network in three stages. First, we train on static image datasets, which can be downloaded with download_datasets.py. Then we fine-tune the network with SAM on the BL30K dataset, which can be downloaded with download_bl30k.py. Note that BL30K, introduced by MiVOS, is a large dataset (about 700 GB in total). Finally, we fine-tune the network with SAM on a mixed dataset (DAVIS 2017 and YouTube-VOS 2019).

I know it doesn't look straightforward, but you can just download DAVIS 2017 and get started right away.
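
As a sketch of the first two download steps (assuming the helper scripts named above run without extra arguments; invoke them from the repo root):

python download_datasets.py
python download_bl30k.py

The datasets are expected to be organized as follows: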

├── BL30K
├── DAVIS
│   ├── 2016
│   │   ├── Annotations
│   │   └── ...
│   └── 2017
│       ├── test-dev
│       │   ├── Annotations
│       │   └── ...
│       └── trainval
│           ├── Annotations
│           └── ...
├── static
│   ├── BIG_small
│   └── ...
├── YouTube
│   ├── all_frames
│   │   └── valid_all_frames
│   ├── train
│   ├── train_480p
│   └── valid
└── YouTube2018
    ├── all_frames
    │   └── valid_all_frames
    └── valid

Quick start

Take inference on the DAVIS 2017 validation set as an example. The inference command is as follows:

python eval_davis.py --output ... --davis_path ... --model ... --mode two-frames-compress --mem_every ... --top ... --amp
  • output: path where the predictions are saved.
  • davis_path: path to DAVIS 2017.
  • model: path to the pre-trained model.
  • mode: inference mode; options include two-frames-compress, gt-compress and last-compress.
  • mem_every: the interval (in frames) at which SAM is used.
  • top: top-k filtering.
  • amp: run inference with automatic mixed precision.

For example, you can use the following command:

python eval_davis.py --output prediction/s012 --davis_path ... --model pretrain/model_s012_final.pth --mode two-frames-compress --mem_every 3 --amp

Quick evaluation

Take the evaluation on the DAVIS 2017 validation set as an example.

We modified this repo to evaluate our method.

python evaluation/2017/evaluation_ours.py --results_path ... --davis_path ...
  • results_path: path to the saved predictions.
  • davis_path: path to DAVIS 2017.
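
For example, to score the predictions written by the quick-start command above (the DAVIS path is a placeholder for your local copy):

python evaluation/2017/evaluation_ours.py --results_path prediction/s012 --davis_path path/to/DAVIS/2017/trainval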

Results

J&F is the mean of region similarity J and contour accuracy F; on YouTube-VOS, the overall score G is the mean of the four seen/unseen J and F scores.

Without BL30K

| Dataset | Split | J&F | J | F | FPS |
|---|---|---|---|---|---|
| DAVIS 2016 | validation | 91.1 | 89.7 | 92.5 | 35.0 |
| DAVIS 2017 | validation | 84.2 | 80.8 | 87.5 | 27.0 |
| DAVIS 2017 | test-dev | 77.4 | 73.6 | 81.2 | - |

| Dataset | Split | G | J_seen | F_seen | J_unseen | F_unseen |
|---|---|---|---|---|---|---|
| YouTube 2019 | validation | 81.9 | 81.1 | 85.5 | 76.2 | 84.8 |

With BL30K

| Dataset | Split | J&F | J | F | FPS |
|---|---|---|---|---|---|
| DAVIS 2016 | validation | 91.6 | 90.0 | 93.2 | 35.0 |
| DAVIS 2017 | validation | 86.1 | 82.1 | 90.0 | 27.0 |
| DAVIS 2017 | test-dev | 78.9 | 74.9 | 82.9 | - |

| Dataset | Split | G | J_seen | F_seen | J_unseen | F_unseen |
|---|---|---|---|---|---|---|
| YouTube 2019 | validation | 83.3 | 81.9 | 86.3 | 78.0 | 86.9 |

Inference

With a single GPU, you can run inference on these datasets as follows:

  • DAVIS 2017 validation set
python eval_davis.py --output prediction/DAVIS-2017-val --davis_path ... --model pretrain/model_s012_final.pth --mode two-frames-compress --mem_every 3 --amp
  • DAVIS 2017 test-dev set
python eval_davis.py --output prediction/DAVIS-2017-test --davis_path ... --model pretrain/model_s012_final.pth --mode two-frames-compress --mem_every 3 --top 40 --split testdev --amp
  • DAVIS 2016 validation set
python eval_davis_2016.py --output prediction/DAVIS-2016-val --davis_path ... --model pretrain/model_s012_final.pth --mode two-frames-compress --mem_every 3 --top 40 --amp
  • YouTube 2019 validation set
python eval_youtube.py --output prediction/YV-19-val --yv_path ... --model pretrain/model_s012_final_yv.pth --mode two-frames-compress --mem_every 4 --top 20 --amp

Training

First, configure the dataset paths in util/hyper_para.py, which include --static_root, --bl_root, --yv_root and --davis_root.
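
Since these look like argparse flags, they can presumably also be overridden per run instead of edited in the file; a sketch with placeholder paths (the remaining stage flags are omitted):

OMP_NUM_THREADS=4 python -m torch.distributed.launch --master_port 9843 --nproc_per_node=4 \
train.py --id s0 --stage 0 \
--static_root path/to/static --bl_root path/to/BL30K \
--yv_root path/to/YouTube --davis_root path/to/DAVIS \
...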

Stage 0

cd rootdir &&\
OMP_NUM_THREADS=4 python -m  torch.distributed.launch --master_port 9843 \
--nproc_per_node=4 \
train.py --id  s0 \
--stage 0 \
--perturb_max 1 \
--perturb_min 0.85 \
--save_interval  10000 \
--klloss_weight 10 \
--start_warm 5000 \
--end_warm 17500 \
--batch_size 16 \
--lr 2e-05 \
--steps 37500 \
--iterations 75000 \
--repeat 0  

Stage 0 -> 3 (w/o BL30K)

cd rootdir &&\
OMP_NUM_THREADS=4 python -m  torch.distributed.launch --master_port 9844 \
--nproc_per_node=2 \
train.py --id  s03 \
--stage 3 \
--load_network pretrain/s0/model_75000.pth \
--perturb_max 1 \
--perturb_min 0.85 \
--save_interval  10000 \
--klloss_weight 10 \
--batch_size 4 \
--lr 2e-05 \
--steps 125000 \
--iterations 150000 \
--repeat 0  

Stage 1

cd rootdir &&\
OMP_NUM_THREADS=4 python -m  torch.distributed.launch --master_port 9843 \
--nproc_per_node=2 \
train.py --id  s1 \
--stage 1 \
--load_network pretrain/s0/model_75000.pth \
--perturb_max 1 \
--perturb_min 0.85 \
--save_interval  10000 \
--klloss_weight 10 \
--start_warm 20000 \
--end_warm 70000 \
--batch_size 4 \
--lr 1e-05 \
--steps 400000 \
--iterations 500000 \
--repeat 0  

Stage 2

cd rootdir &&\
OMP_NUM_THREADS=4 python -m  torch.distributed.launch --master_port 9843 \
--nproc_per_node=2 \
train.py --id  s2 \
--stage 2 \
--load_network pretrain/s1/model_500000.pth \
--perturb_max 1 \
--perturb_min 0.85 \
--save_interval  10000 \
--klloss_weight 5 \
--decoder_f2_weight 5 \
--decoder_f4_weight 5 \
--start_warm 5000 \
--end_warm 17500 \
--batch_size 8 \
--lr 2e-05 \
--steps 62500 \
--iterations 75000 \
--repeat 0  

Note: since I was temporarily laid off during my internship at Alibaba, there is some uncertainty about the exact environment and code version used for the paper. Reproducing the previous parameters on this version, I obtained 85.7 J&F on DAVIS 2017 val (86.1 in the original paper) and 79.2 on DAVIS 2017 test-dev (78.9 in the original paper).

The original parameters:

klloss_weight = 10 (paper) -> 5 (now)
decoder_f2_weight = 10 (paper) -> 5 (now)
decoder_f4_weight = 10 (paper) -> 5 (now)
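
So, to train stage 2 with the paper's original weights, the corresponding flags would presumably be:

--klloss_weight 10 --decoder_f2_weight 10 --decoder_f4_weight 10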

Acknowledgement

This project is built upon numerous previous projects. We'd like to thank the contributors of STCN and MiVOS.

To do

  • quick start and quick evaluation.
  • inference codes.
  • training details.
  • pre-trained models.
