
Dual-Evidential Learning for Weakly-supervised Temporal Action Localization

Paper

Mengyuan Chen, Junyu Gao, Shicai Yang, Changsheng Xu

European Conference on Computer Vision (ECCV), 2022.

Update: 2024/04/19

We have further optimized the code, and the provided pre-trained model can now achieve the following performance on THUMOS14:

| Method | mAP@0.1 | mAP@0.2 | mAP@0.3 | mAP@0.4 | mAP@0.5 | mAP@0.6 | mAP@0.7 | AVG (0.1:0.5) | AVG (0.1:0.7) |
| :-- | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| DELU (Paper) | 71.5 | 66.2 | 56.5 | 47.7 | 40.5 | 27.2 | 15.3 | 56.5 | 46.4 |
| DELU (Latest) | 72.1 | 66.5 | 57.0 | 48.1 | 40.8 | 27.8 | 15.6 | 56.9 | 46.8 |
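As a quick sanity check (not part of the repository), the two averaged columns are simply the arithmetic means of the per-threshold mAPs. For example, for the latest checkpoint:

```python
# Verify the averaged-mAP columns of the table above (values copied from the "Latest" row).
latest = [72.1, 66.5, 57.0, 48.1, 40.8, 27.8, 15.6]  # mAP@0.1 ... mAP@0.7

avg_01_05 = sum(latest[:5]) / 5   # -> 56.9
avg_01_07 = sum(latest) / 7       # -> 46.84, reported as 46.8

print(round(avg_01_05, 1), round(avg_01_07, 1))
```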

Table of Contents

  1. Introduction
  2. Prerequisites
  3. Testing
  4. Training
  5. Citation

Introduction

Weakly-supervised temporal action localization (WS-TAL) aims to localize action instances and recognize their categories with only video-level labels. Despite great progress, existing methods suffer from severe action-background ambiguity, which mainly comes from background noise introduced by aggregation operations and from large intra-action variations caused by the task gap between classification and localization. To address this issue, we propose a generalized evidential deep learning (EDL) framework for WS-TAL, called Dual-Evidential Learning for Uncertainty modeling (DELU), which extends the traditional EDL paradigm to the weakly-supervised multi-label classification setting. Specifically, to adaptively exclude undesirable background snippets, we use video-level uncertainty to measure how strongly background noise interferes with the video-level prediction. Snippet-level uncertainty is then induced for progressive learning, which gradually focuses on entire action instances in an "easy-to-hard" manner. Extensive experiments show that DELU achieves state-of-the-art performance on the THUMOS14 and ActivityNet1.2 benchmarks.
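For readers unfamiliar with EDL, the sketch below illustrates the basic uncertainty computation that DELU builds on: a head predicts non-negative evidence per class, which parameterizes a Dirichlet distribution whose total strength yields an uncertainty mass. The tensor shapes, the softplus evidence activation, and the top-k temporal aggregation here are illustrative assumptions, not the exact implementation in this repo.

```python
import torch
import torch.nn.functional as F

def evidential_uncertainty(logits):
    """Map raw class logits to Dirichlet parameters and the EDL uncertainty mass.

    logits: (..., K) tensor of per-class scores.
    Returns (belief, uncertainty), where uncertainty = K / sum(alpha).
    """
    evidence = F.softplus(logits)                 # non-negative evidence e_k
    alpha = evidence + 1.0                        # Dirichlet parameters alpha_k = e_k + 1
    strength = alpha.sum(dim=-1, keepdim=True)    # total Dirichlet strength S
    belief = evidence / strength                  # per-class belief mass b_k
    uncertainty = logits.shape[-1] / strength     # u = K / S
    return belief, uncertainty

# Toy example: T snippets, K action classes.
T, K = 100, 20
snippet_logits = torch.randn(T, K)

# Snippet-level uncertainty: one value per snippet.
_, u_snippet = evidential_uncertainty(snippet_logits)            # shape (T, 1)

# Video-level uncertainty after a simple top-k mean over time
# (the actual aggregation used by DELU may differ).
k = max(1, T // 8)
video_logits = snippet_logits.topk(k, dim=0).values.mean(dim=0)  # shape (K,)
_, u_video = evidential_uncertainty(video_logits)                 # shape (1,)
```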


Prerequisites

Requirements and Dependencies:

The environment and package versions we used are listed below.

  • Linux: Ubuntu 20.04 LTS
  • GPU: GeForce RTX 3090
  • CUDA: 11.1
  • Python: 3.7.11
  • PyTorch: 1.11.0
  • Numpy: 1.21.2
  • Pandas: 1.3.5
  • Scipy: 1.7.3
  • Wandb: 0.12.11
  • Tqdm: 4.64.0

THUMOS-14 Dataset:

We use the 2048-d features provided by the MM 2021 paper Cross-modal Consensus Network for Weakly Supervised Temporal Action Localization. You can access the dataset from Google Drive or Baidu Disk. The annotations are included within this package.
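The snippet below is a hedged sanity check, not part of this repo: it assumes the downloaded package stores per-video features as .npy files (the actual layout may differ) and simply verifies that each feature matrix has 2048 channels.

```python
import numpy as np
from pathlib import Path

# Hypothetical location and layout; adjust to the structure of the downloaded package.
feature_dir = Path("path/to/CO2-THUMOS-14/features")

for f in sorted(feature_dir.glob("*.npy"))[:5]:
    feat = np.load(f)               # expected shape: (num_snippets, 2048)
    assert feat.shape[-1] == 2048, f"{f.name}: unexpected feature dim {feat.shape}"
    print(f.name, feat.shape)
```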

ActivityNet-v1.2 Dataset:

We also use the features provided in MM2021-CO2-Net. The features can be obtained from here. The annotations are included within this package.

Testing

Download the pretrained models from Google Drive, and put them into "./download_ckpt/".

Test on THUMOS-14

Change "path/to/CO2-THUMOS-14" in the script into your own path to the dataset, and run:

cd scripts/
./test_thumos.sh
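If you prefer not to edit the shell script by hand, a small Python helper can substitute the placeholder. The script name and placeholder string come from this repo; the replacement path below is a made-up example.

```python
from pathlib import Path

script = Path("scripts/test_thumos.sh")
text = script.read_text()

# Swap the placeholder for your local dataset path (example path only).
text = text.replace("path/to/CO2-THUMOS-14", "/data/thumos14_co2_features")
script.write_text(text)
```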

Test on ActivityNet-v1.2

Change "path/to/CO2-ActivityNet-12" in the script into your own path to the dataset, and run:

cd scripts/
./test_activitynet.sh

Training

Change the dataset paths as stated above, and run:

cd scripts/
./train_thumos.sh

or

cd scripts/
./train_activitynet.sh

Citation

If you find the code useful in your research, please cite:

@inproceedings{mengyuan2022ECCV_DELU,
  author = {Chen, Mengyuan and Gao, Junyu and Yang, Shicai and Xu, Changsheng},
  title = {Dual-Evidential Learning for Weakly-supervised Temporal Action Localization},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2022}
}

License

This project is released under the MIT License.

Acknowledgement

This repo contains modified code from:

We sincerely thank the owners of all these great repos!