Cross-Modal Relation-Aware Networks for Audio-Visual Event Localization

This repo holds the code for the work presented at ACM Multimedia 2020 [Paper]

Usage Guide

Prerequisites

We provide the implementation in PyTorch for ease of use.

Install the requirements by running the following command:

pip install -r requirements.txt

Code and Data Preparation

We thank @YapengTian for sharing the features and code.

Download Features

Two kinds of features (i.e., visual features and audio features) are required for the experiments.

Visual Features: You can download the VGG visual features from here.
Audio Features: You can download the VGG-like audio features from here.
Additional Features: You can download the features of background videos here, which are required for the experiments in the weakly-supervised setting.

After downloading the features, please place them in the data folder. The structure of the data folder is as follows:

data
├── audio_feature.h5
├── audio_feature_noisy.h5
├── labels.h5
├── labels_noisy.h5
├── mil_labels.h5
├── test_order.h5
├── train_order.h5
├── val_order.h5
├── visual_feature.h5
└── visual_feature_noisy.h5
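
Before launching training, it can help to confirm that every file from the listing above is present. The following is a minimal sketch, not part of the repo: the helper name `missing_feature_files` is hypothetical, and it assumes the features sit directly inside a folder named `data` relative to the working directory.

```python
from pathlib import Path

# File names copied from the data folder listing above.
REQUIRED_FILES = [
    "audio_feature.h5",
    "audio_feature_noisy.h5",
    "labels.h5",
    "labels_noisy.h5",
    "mil_labels.h5",
    "test_order.h5",
    "train_order.h5",
    "val_order.h5",
    "visual_feature.h5",
    "visual_feature_noisy.h5",
]


def missing_feature_files(data_dir="data"):
    """Return the required file names that are absent from data_dir.

    Hypothetical helper for illustration; it only checks that the files
    exist, not that their contents are valid HDF5.
    """
    root = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_feature_files("data")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All feature files are in place.")
```

Running the script from the repo root prints the names of any files that still need to be downloaded.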

Download Datasets (Optional)

You can download the AVE dataset from the repo here.

Training and testing CMRAN in a fully-supervised setting

You can run the following commands to train and test the model. During training, we evaluate the model on the test set every epoch; this frequency is set by the "eval_freq" argument in the configs/default_config.yaml file.

bash supv_train.sh
# The argument "--snapshot_pref" denotes the path for saving checkpoints and code.
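
To illustrate what an evaluation frequency means in practice, here is a small sketch of a schedule. It is an assumption, not the logic in supv_main.py: it supposes that "eval_freq" is counted in epochs and that evaluation runs after every epoch whose 1-based index is a multiple of it. The function name `epochs_to_evaluate` is hypothetical.

```python
def epochs_to_evaluate(n_epochs, eval_freq=1):
    """Return the 1-based epochs after which an evaluation pass would run.

    Assumption for illustration: evaluation happens whenever the epoch
    index is divisible by eval_freq. The repo's own schedule may differ.
    """
    if eval_freq < 1:
        raise ValueError("eval_freq must be a positive integer")
    return [epoch for epoch in range(1, n_epochs + 1) if epoch % eval_freq == 0]


if __name__ == "__main__":
    # With eval_freq=1 every epoch is evaluated, matching "every epoch" above.
    print(epochs_to_evaluate(5, eval_freq=1))
    # A larger value evaluates less often and shortens training time.
    print(epochs_to_evaluate(10, eval_freq=5))
```

Under this assumption, raising the value trades less frequent test-set feedback for faster training runs.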

Evaluating

bash supv_test.sh

After training, there will be a checkpoint file whose name contains the accuracy on the test set and the epoch number.

Training and testing CMRAN in a weakly-supervised setting

Similar to training the model in a fully-supervised setting, you can run training and testing using the following commands:

Training

bash weakly_train.sh

Evaluating

bash weakly_test.sh

Citation

Please cite the following paper if you find this repo useful for your research:

@inproceedings{CMRAN2020Xu,
  author    = {Haoming Xu and
               Runhao Zeng and
               Qingyao Wu and
               Mingkui Tan and
               Chuang Gan},
  title     = {Cross-Modal Relation-Aware Networks for Audio-Visual Event Localization},
  booktitle = {{ACM} International Conference on Multimedia},
  year      = {2020},
}
