
GLAD: Global-Local View Alignment and Background Debiasing for Unsupervised Video Domain Adaptation with Large Domain Gap [WACV 2024]

[Figure: overview of the GLAD method]

Abstract

In this work, we tackle the challenging problem of unsupervised video domain adaptation (UVDA) for action recognition. We specifically focus on scenarios with a substantial domain gap, in contrast to existing works that primarily deal with small domain gaps between labeled source domains and unlabeled target domains.

The contributions of this work are two-fold.

1. Introduces the Kinetics→BABEL benchmark.

To establish a more realistic setting, we introduce a novel UVDA scenario, denoted Kinetics→BABEL, with a considerably larger domain gap in terms of both temporal dynamics and background shifts.

2. Introduces a method to tackle the challenging Kinetics→BABEL benchmark.

  • To tackle the temporal shift, i.e., the difference in action duration between the source and target domains, we propose a global-local view alignment approach (a minimal sketch follows this list).
  • To mitigate the background shift, we propose to learn temporal order-sensitive representations via temporal order learning and background-invariant representations via background augmentation.

We empirically validate that the proposed method shows significant improvement over existing methods on the Kinetics→BABEL dataset with a large domain gap.
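Intuitively, a global view samples frames spanning the entire video, while a local view samples a short contiguous window; aligning the representations of both makes the model robust to action-duration differences. Below is a minimal sketch of such view sampling; the even global spacing, the stride, and the clip length are illustrative assumptions, not the repo's exact config values.

import numpy as np

def sample_global_view(num_frames, clip_len=8):
    # Global view: clip_len indices evenly spanning the whole video.
    return np.linspace(0, num_frames - 1, clip_len).astype(int)

def sample_local_view(num_frames, clip_len=8, stride=2):
    # Local view: clip_len strided indices from a random short window.
    span = clip_len * stride
    start = np.random.randint(0, max(num_frames - span, 1))
    idx = start + np.arange(clip_len) * stride
    return np.minimum(idx, num_frames - 1)  # clamp for very short videos

# Example: a long Kinetics video vs. a short BABEL segment.
print(sample_global_view(300))  # indices covering the full action
print(sample_local_view(300))   # indices from one short window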

Installation

We provide our working conda environment as an exported yaml file.

conda env create --file requirements/environment.yml
pip install -e .

Data Preparation

1. Download AMASS BMLrub Rendered Videos

The AMASS dataset is a comprehensive motion capture dataset whose skeleton sequences serve as the input for the original BABEL dataset. Unlike the original, our proposed Kinetics→BABEL benchmark uses rendered videos rather than skeletons as input. To access these, please create an account on AMASS and download the BMLrub rendered videos.

2. Link datasets

Make symlinks to the actual dataset paths.

mkdir data
ln -s /KINETICS/PATH/ ./data/k400
ln -s /BABEL/PATH/ ./data/babel

We highly recommend extracting raw frames beforehand to optimize I/O (a minimal extraction sketch follows the directory listings below). Below are example structures for each dataset.

Kinetics Structure
./data/k400/rawframes_resized
├── train
│   ├── applauding
│   │   ├── 0nd-Gc3HkmU_000019_000029
│   │   │   ├── img_00000.jpg
│   │   │   ├── img_00001.jpg
│   │   │   ├── img_00002.jpg
│   │   │   └── ...
│   │   ├── 0Tq8uFakTbk_000000_000010
│   │   ├── 0XrsfW9ejfk_000000_000010
│   │   ├── 0YQrMye3BBY_000000_000010
│   │   ├── 1WMulo84kBY_000020_000030
│   │   └── ...
│   ├── balloon_blowing
│   ├── ...
│   ├── unboxing
│   └── waxing_legs
└── val
    ├── applauding
    ├── balloon_blowing
    ├── ...
    ├── unboxing
    └── waxing_legs
BABEL Structure
./data/babel
├── train
│   ├── 000000
│   │   ├── img_00001.jpg
│   │   ├── img_00002.jpg
│   │   └── ...
│   ├── 000002
│   └── ...
└── val
    ├── ...
    ├── 013286
    └── 013288
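If you extract raw frames yourself, the sketch below shows the idea with OpenCV, matching the img_XXXXX.jpg naming above. The function name and paths are assumptions for illustration; the repo's own tooling may differ.

import cv2
from pathlib import Path

def extract_rawframes(video_path, out_dir, start_index=0):
    # Dump every frame of a video as img_XXXXX.jpg in out_dir.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = start_index
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(str(out / f"img_{idx:05d}.jpg"), frame)
        idx += 1
    cap.release()
    return idx - start_index  # number of frames written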

3. Extract Backgrounds for the Background Augmentation

python utils/extract_median_by_rawframes.py \
    --ann-file 'data/filelists/k400/filelist_k400_train_closed.txt' \
    --outdir 'data/median/k400' \
    --start-index 0 \
    --data-prefix 'data/k400/rawframes_resized'
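The script's name suggests it estimates each video's static background as the per-pixel temporal median over its raw frames, which removes moving actors while keeping the scene. A minimal numpy sketch of that idea follows; the helper is illustrative and is not the repo's actual implementation.

import numpy as np
from pathlib import Path
from PIL import Image

def median_background(frame_dir, out_path):
    # Per-pixel temporal median over all frames -> static background image.
    frames = sorted(Path(frame_dir).glob("img_*.jpg"))
    stack = np.stack([np.asarray(Image.open(f)) for f in frames])  # (T, H, W, 3)
    background = np.median(stack, axis=0).astype(np.uint8)
    Image.fromarray(background).save(out_path)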

Train and Test

Train

The training process has two stages.

  1. Pretrain TOL (Temporal Order Learning)
    source tools/dist_train.sh configs/tol.py 8 \
    --seed 0
    The training results will be saved under work_dirs/tol/ and used in the next stage (see the TOL sketch after this list).
  2. GLAD
    source tools/dist_train.sh configs/glad.py 8 \
    --seed 3 \
    --validate --test-last --test-best
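For intuition, temporal order learning makes the backbone sensitive to frame order, typically by predicting whether (or how) a clip's temporal order was permuted. The sketch below shows one such binary pretext objective; the head design and batch-level shuffling are illustrative assumptions, not necessarily the exact formulation in configs/tol.py.

import torch
import torch.nn as nn

class TemporalOrderHead(nn.Module):
    # Binary head: was the input clip's frame order shuffled?
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 2)  # 0 = original order, 1 = shuffled

    def forward(self, feats):
        return self.fc(feats)

def make_order_targets(clips):
    # Shuffle the temporal axis of a random half of the batch; clips: (B, T, C, H, W).
    b, t = clips.shape[:2]
    labels = torch.randint(0, 2, (b,))
    for i in torch.nonzero(labels).flatten():
        clips[i] = clips[i][torch.randperm(t)]
    return clips, labels

# Hypothetical step with any backbone producing (B, feat_dim) clip features:
# clips, labels = make_order_targets(clips)
# loss = nn.CrossEntropyLoss()(TemporalOrderHead()(backbone(clips)), labels)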

Test

source tools/dist_test.sh configs/glad.py $(find 'work_dirs/glad' -name '*best*.pth' | head -1) 8 \
--eval 'mean_class_accuracy' 'confusion_matrix'
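For reference, mean class accuracy is the average of per-class recalls, so it weights every class equally regardless of how many samples it has. A minimal sketch of the metric (illustrative, not the evaluation code invoked above):

import numpy as np

def mean_class_accuracy(confusion):
    # Rows = ground truth, columns = predictions; average the per-class recalls.
    per_class = confusion.diagonal() / confusion.sum(axis=1).clip(min=1)
    return float(per_class.mean())

print(mean_class_accuracy(np.array([[8, 2],
                                    [1, 9]])))  # (0.8 + 0.9) / 2 = 0.85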

Special Thanks

This project has been made possible through the generous funding and support of NCSOFT Corporation. We extend our sincere gratitude for their contribution and belief in our work.

License

This project is released under the BSD-3-Clause license.

Citation

@inproceedings{leebae2024glad,
  title={{GLAD}: Global-Local View Alignment and Background Debiasing for Video Domain Adaptation},
  author={Lee, Hyogun and Bae, Kyungho and Ha, Seong Jong and Ko, Yumin and Park, Gyeong-Moon and Choi, Jinwoo},
  booktitle={Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year={2024}
}
