
GLAD: Global-Local View Alignment and Background Debiasing for Unsupervised Video Domain Adaptation with Large Domain Gap [WACV 2024]

[Figure: overview of the GLAD method]

Abstract

In this work, we tackle the challenging problem of unsupervised video domain adaptation (UVDA) for action recognition. We specifically focus on scenarios with a substantial domain gap, in contrast to existing works that primarily deal with small domain gaps between labeled source domains and unlabeled target domains.

The contributions of this work are two-fold.

1. Introduces the Kinetics→BABEL benchmark.

To establish a more realistic setting, we introduce a novel UVDA scenario, denoted Kinetics→BABEL, with a considerably larger domain gap in terms of both temporal dynamics and background shifts.

2. Introduces a method to tackle the challenging Kinetics→BABEL benchmark.

  • To tackle the temporal shift, i.e., the difference in action duration between the source and target domains, we propose a global-local view alignment approach (a minimal sketch follows this list).
  • To mitigate the background shift, we propose to learn temporal order-sensitive representations via temporal order learning and background-invariant representations via background augmentation.

We empirically validate that the proposed method shows significant improvement over existing methods on the Kinetics→BABEL dataset with a large domain gap.
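Intuitively, a global view samples frames spanning the entire video, while a local view samples a short contiguous window; aligning the representations of both makes the model robust to action-duration differences. Below is a minimal sketch of such view sampling; the even global spacing, the stride, and the clip length are illustrative assumptions, not the repo's exact config values.

import numpy as np

def sample_global_view(num_frames, clip_len=8):
    # Global view: clip_len indices evenly spanning the whole video.
    return np.linspace(0, num_frames - 1, clip_len).astype(int)

def sample_local_view(num_frames, clip_len=8, stride=2):
    # Local view: clip_len strided indices from a random short window.
    span = clip_len * stride
    start = np.random.randint(0, max(num_frames - span, 1))
    idx = start + np.arange(clip_len) * stride
    return np.minimum(idx, num_frames - 1)  # clamp for very short videos

# Example: a long Kinetics video vs. a short BABEL segment.
print(sample_global_view(300))  # indices covering the full action
print(sample_local_view(300))   # indices from one short window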

Installation

We provide our working conda environment as an exported yaml file.

conda env create --file requirements/environment.yml
pip install -e .

Data Preparation

1. Download AMASS BMLrub Rendered Videos

The AMASS dataset is a comprehensive motion capture dataset whose skeleton sequences serve as the input for the original BABEL dataset. Unlike the original, our proposed Kinetics→BABEL benchmark uses rendered videos rather than skeletons as input. To access these, please create an account on AMASS and download the BMLrub rendered videos.

2. Link datasets

Make symlinks to the actual dataset paths.

mkdir data
ln -s /KINETICS/PATH/ ./data/k400
ln -s /BABEL/PATH/ ./data/babel

We highly recommend extracting raw frames beforehand to optimize I/O (a minimal extraction sketch follows the directory listings below). Below are example structures for each dataset.

Kinetics Structure
./data/k400/rawframes_resized
├── train
│   ├── applauding
│   │   ├── 0nd-Gc3HkmU_000019_000029
│   │   │   ├── img_00000.jpg
│   │   │   ├── img_00001.jpg
│   │   │   ├── img_00002.jpg
│   │   │   └── ...
│   │   ├── 0Tq8uFakTbk_000000_000010
│   │   ├── 0XrsfW9ejfk_000000_000010
│   │   ├── 0YQrMye3BBY_000000_000010
│   │   ├── 1WMulo84kBY_000020_000030
│   │   └── ...
│   ├── balloon_blowing
│   ├── ...
│   ├── unboxing
│   └── waxing_legs
└── val
    ├── applauding
    ├── balloon_blowing
    ├── ...
    ├── unboxing
    └── waxing_legs
BABEL Structure
./data/babel
├── train
│   ├── 000000
│   │   ├── img_00001.jpg
│   │   ├── img_00002.jpg
│   │   └── ...
│   ├── 000002
│   └── ...
└── val
    ├── ...
    ├── 013286
    └── 013288
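If you extract raw frames yourself, the sketch below shows the idea with OpenCV, matching the img_XXXXX.jpg naming above. The function name and paths are assumptions for illustration; the repo's own tooling may differ.

import cv2
from pathlib import Path

def extract_rawframes(video_path, out_dir, start_index=0):
    # Dump every frame of a video as img_XXXXX.jpg in out_dir.
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = start_index
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(str(out / f"img_{idx:05d}.jpg"), frame)
        idx += 1
    cap.release()
    return idx - start_index  # number of frames written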

3. Extract Backgrounds for the Background Augmentation

python utils/extract_median_by_rawframes.py \
    --ann-file 'data/filelists/k400/filelist_k400_train_closed.txt' \
    --outdir 'data/median/k400' \
    --start-index 0 \
    --data-prefix 'data/k400/rawframes_resized'
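The script's name suggests it estimates each video's static background as the per-pixel temporal median over its raw frames, which removes moving actors while keeping the scene. A minimal numpy sketch of that idea follows; the helper is illustrative and is not the repo's actual implementation.

import numpy as np
from pathlib import Path
from PIL import Image

def median_background(frame_dir, out_path):
    # Per-pixel temporal median over all frames -> static background image.
    frames = sorted(Path(frame_dir).glob("img_*.jpg"))
    stack = np.stack([np.asarray(Image.open(f)) for f in frames])  # (T, H, W, 3)
    background = np.median(stack, axis=0).astype(np.uint8)
    Image.fromarray(background).save(out_path)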

Train and Test

Train

The training process has two stages.

  1. Pretrain TOL (Temporal Order Learning)
    source tools/dist_train.sh configs/tol.py 8 \
    --seed 0
    The training results will be saved under work_dirs/tol/ and used in the next stage (see the TOL sketch after this list).
  2. GLAD
    source tools/dist_train.sh configs/glad.py 8 \
    --seed 3 \
    --validate --test-last --test-best
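For intuition, temporal order learning makes the backbone sensitive to frame order, typically by predicting whether (or how) a clip's temporal order was permuted. The sketch below shows one such binary pretext objective; the head design and batch-level shuffling are illustrative assumptions, not necessarily the exact formulation in configs/tol.py.

import torch
import torch.nn as nn

class TemporalOrderHead(nn.Module):
    # Binary head: was the input clip's frame order shuffled?
    def __init__(self, feat_dim=2048):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 2)  # 0 = original order, 1 = shuffled

    def forward(self, feats):
        return self.fc(feats)

def make_order_targets(clips):
    # Shuffle the temporal axis of a random half of the batch; clips: (B, T, C, H, W).
    b, t = clips.shape[:2]
    labels = torch.randint(0, 2, (b,))
    for i in torch.nonzero(labels).flatten():
        clips[i] = clips[i][torch.randperm(t)]
    return clips, labels

# Hypothetical step with any backbone producing (B, feat_dim) clip features:
# clips, labels = make_order_targets(clips)
# loss = nn.CrossEntropyLoss()(TemporalOrderHead()(backbone(clips)), labels)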

Test

source tools/dist_test.sh configs/glad.py $(find 'work_dirs/glad' -name '*best*.pth' | head -1) 8 \
--eval 'mean_class_accuracy' 'confusion_matrix'
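For reference, mean class accuracy is the average of per-class recalls, so it weights every class equally regardless of how many samples it has. A minimal sketch of the metric (illustrative, not the evaluation code invoked above):

import numpy as np

def mean_class_accuracy(confusion):
    # Rows = ground truth, columns = predictions; average the per-class recalls.
    per_class = confusion.diagonal() / confusion.sum(axis=1).clip(min=1)
    return float(per_class.mean())

print(mean_class_accuracy(np.array([[8, 2],
                                    [1, 9]])))  # (0.8 + 0.9) / 2 = 0.85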

Special Thanks

This project has been made possible through the generous funding and support of NCSOFT Corporation. We extend our sincere gratitude for their contribution and belief in our work.

License

This project is released under the BSD-3-Clause license.

Citation

@inproceedings{leebae2024glad,
  title={{GLAD}: Global-Local View Alignment and Background Debiasing for Video Domain Adaptation},
  author={Lee, Hyogun and Bae, Kyungho and Ha, Seong Jong and Ko, Yumin and Park, Gyeong-Moon and Choi, Jinwoo},
  booktitle={Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year={2024}
}
