GitHub - Hydragon516/DPA: [CVPR 2024] Dual Prototype Attention for Unsupervised Video Object Segmentation

Dual Prototype Attention for Unsupervised Video Object Segmentation

Suhwan Cho*¹, Minhyeok Lee*¹, Seunghoon Lee¹, Dogyoon Lee¹, Heeseung Choi¹,², Ig-Jae Kim¹,², Sangyoun Lee¹,²

¹ Yonsei University ² Korea Institute of Science and Technology (KIST)

CVPR 2024

Abstract

Unsupervised video object segmentation (VOS) aims to detect and segment the most salient object in videos. The primary techniques used in unsupervised VOS are 1) the collaboration of appearance and motion information; and 2) temporal fusion between different frames. This paper proposes two novel prototype-based attention mechanisms, inter-modality attention (IMA) and inter-frame attention (IFA), to incorporate these techniques via dense propagation across different modalities and frames. IMA densely integrates context information from different modalities based on a mutual refinement. IFA injects global context of a video to the query frame, enabling a full utilization of useful properties from multiple frames. Experimental results on public benchmark datasets demonstrate that our proposed approach outperforms all existing methods by a substantial margin. The proposed two components are also thoroughly validated via ablative study.

Datasets

Prepare all datasets.

We use RAFT to generate optical flow maps.
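The FLOW folders in the layout hold .jpg files, so each two-channel flow field has to be encoded as a colour image before it is saved. The sketch below shows one common way to do that, a plain HSV encoding in which direction sets the hue and magnitude sets the saturation. It is an illustration only: the function names are hypothetical, and the colour convention actually used for these datasets may differ (RAFT's own visualisation utility uses a colour wheel rather than this simplified mapping).

```python
import colorsys
import math


def flow_to_rgb(u, v, max_magnitude):
    """Map one flow vector (u, v) to an 8-bit RGB triple.

    Direction is encoded as hue and magnitude (clipped at max_magnitude)
    as saturation, so zero motion maps to white.
    """
    magnitude = math.hypot(u, v)
    angle = math.atan2(v, u)  # radians in [-pi, pi]
    hue = ((angle + math.pi) / (2 * math.pi)) % 1.0
    saturation = min(magnitude / max_magnitude, 1.0) if max_magnitude > 0 else 0.0
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, 1.0)
    return (round(r * 255), round(g * 255), round(b * 255))


def flow_field_to_image(flow):
    """Convert a flow field (rows of (u, v) pairs) to rows of RGB triples.

    The field is normalised by its own largest magnitude (an assumption;
    a fixed normalisation constant is another reasonable choice).
    """
    max_mag = max(math.hypot(u, v) for row in flow for (u, v) in row)
    return [[flow_to_rgb(u, v, max_mag) for (u, v) in row] for row in flow]


if __name__ == "__main__":
    field = [[(0.0, 0.0), (1.0, 0.0)],
             [(0.0, 2.0), (-2.0, 0.0)]]
    for row in flow_field_to_image(field):
        print(row)
```

Normalising per frame keeps the colours saturated for small motions, but it means the same colour can stand for different speeds in different frames.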

You can also get pre-processed datasets from TMO.

The complete dataset directory structure is as follows:

dataset dir/
├── DUTS_train/
│   ├── RGB/
│   │   ├── sun_ekmqudbbrseiyiht.jpg
│   │   ├── sun_ejwwsnjzahzakyjq.jpg
│   │   └── ...
│   └── GT/
│       ├── sun_ekmqudbbrseiyiht.png
│       ├── sun_ejwwsnjzahzakyjq.png
│       └── ...
├── DAVIS_train/
│   ├── RGB/
│   │   ├── bear_00000.jpg
│   │   ├── bear_00001.jpg
│   │   └── ...
│   ├── GT/
│   │   ├── bear_00000.png
│   │   ├── bear_00001.png
│   │   └── ...
│   └── FLOW/
│       ├── bear_00000.jpg
│       ├── bear_00001.jpg
│       └── ...
└── DAVIS_test/
    ├── blackswan/
    │   ├── RGB/
    │   │   ├── blackswan_00000.jpg
    │   │   ├── blackswan_00001.jpg
    │   │   └── ...
    │   ├── GT/
    │   │   ├── blackswan_00000.png
    │   │   ├── blackswan_00001.png
    │   │   └── ...
    │   └── FLOW/
    │       ├── blackswan_00000.jpg
    │       ├── blackswan_00001.jpg
    │       └── ...
    ├── bmx-trees/
    └── ...
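
Before training, it can help to confirm that the folders match the layout above. The following is a minimal sketch, not part of the repository: the helper names `check_flat_split` and `check_dataset` are hypothetical, and it assumes every frame has one file with the same stem in each modality folder (RGB, GT and, for DAVIS, FLOW).

```python
from pathlib import Path


def check_flat_split(split_dir, modalities):
    """Return {modality: [stems missing from that modality folder]}."""
    split_dir = Path(split_dir)
    stems = {}
    for m in modalities:
        folder = split_dir / m
        # A missing folder is treated as an empty one.
        stems[m] = {p.stem for p in folder.iterdir() if p.is_file()} if folder.is_dir() else set()
    all_stems = set().union(*stems.values())
    return {m: sorted(all_stems - s) for m, s in stems.items() if all_stems - s}


def check_dataset(root):
    """Check DUTS_train, DAVIS_train and each DAVIS_test sequence under root."""
    root = Path(root)
    problems = {}
    flat_splits = (("DUTS_train", ("RGB", "GT")),
                   ("DAVIS_train", ("RGB", "GT", "FLOW")))
    for name, modalities in flat_splits:
        missing = check_flat_split(root / name, modalities)
        if missing:
            problems[name] = missing
    test_dir = root / "DAVIS_test"
    if test_dir.is_dir():
        # DAVIS_test holds one sub-folder per sequence (blackswan, bmx-trees, ...).
        for seq in sorted(p for p in test_dir.iterdir() if p.is_dir()):
            missing = check_flat_split(seq, ("RGB", "GT", "FLOW"))
            if missing:
                problems[f"DAVIS_test/{seq.name}"] = missing
    return problems


if __name__ == "__main__":
    import sys
    report = check_dataset(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(report if report else "All modality folders are consistent.")
```

An empty result means every frame stem appears in all modality folders of its split. Otherwise the returned dictionary lists the stems missing from each folder.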

Training Model

We use a two-stage learning strategy: pretraining and finetuning.

Pretraining

Edit config.py: set the data root path option and the GPU index.
Then start training:

python pretrain.py

Finetuning

Edit config.py: set the path to the best model saved during pretraining.
Then start training:

python train_for_DAVIS.py

Evaluation

See this link.

Results

Our precomputed prediction masks can be downloaded here.
