Given a query example in one modality (audio or video), the task is to retrieve relevant examples in the other modality (video or audio). Both the audio and video modalities are available for every data point. The class name can be considered a third, text modality, which is available only for the training data. A retrieved example is correct if it is semantically similar to the query, i.e. it shares the query's class label. At test time, only the paired audio and video modality features will be available.
The AudiosetZSL dataset will be used for this task. The dataset was proposed for zero-shot classification and retrieval of videos and is curated from the larger Audioset dataset. For this challenge, only the seen classes from the dataset will be considered. It contains a total of 79795 training examples and 26587 validation examples. Out of the total 26593 testing examples, a subset will be used for the final evaluation. We have provided features for both audio and video, extracted using pre-trained networks. For a fair comparison of approaches, everyone must use the provided features. More details about the dataset and task can be found in the papers below.
Class-average mAP will be used as the evaluation metric. Each query produces an average precision (AP) score. Averaging the AP over all queries from a particular class gives the mAP for that class, and averaging the mAP over all classes gives the class-average mAP. The class-average mAP is computed for both audio-to-video and video-to-audio retrieval, and the final score is the average of the two.
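The metric described above can be sketched as follows. This is a minimal illustration of the computation, not the challenge's official scoring code; the function names and the list-of-ranked-labels input format are assumptions for the example.

```python
import numpy as np

def average_precision(ranked_labels, query_label):
    """AP for one query: ranked_labels holds the class label of each
    retrieved item, sorted by decreasing similarity to the query."""
    hits = (np.asarray(ranked_labels) == query_label)
    if hits.sum() == 0:
        return 0.0
    # Precision at each rank, averaged over the ranks of the relevant items.
    precision_at_k = np.cumsum(hits) / (np.arange(len(hits)) + 1)
    return float((precision_at_k * hits).sum() / hits.sum())

def class_average_map(all_ranked_labels, query_labels):
    """Average AP per class, then average over classes."""
    query_labels = np.asarray(query_labels)
    aps = np.array([average_precision(r, q)
                    for r, q in zip(all_ranked_labels, query_labels)])
    per_class = [aps[query_labels == c].mean() for c in np.unique(query_labels)]
    return float(np.mean(per_class))
```

The same function would be applied once to the audio-to-video rankings and once to the video-to-audio rankings, and the two scores averaged.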
- Download the dataset from this link into the `data` folder.
- Arrange it as per the directory structure given in the `readme.md` file inside the `data` folder.
Several baseline codes using unsupervised approaches are provided to start with.
- Run `python main_baseline.py` to obtain the baseline retrieval results directly from the raw features.
- Run `python main_baseline.py -mode cca` to obtain the results using CCA.
A supervised learning baseline using triplet loss is also provided.
- Run `python main_triplet.py` to learn a neural network model for aligning all modalities using a triplet loss.
- Run `python evaluate_triplet.py` to evaluate using the model learnt in the previous step.
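The triplet loss used for alignment penalizes an anchor for being farther from a positive (same class, other modality) than from a negative (different class). The actual training happens in `main_triplet.py` with PyTorch; the NumPy function below is only a minimal sketch of the loss itself.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on the gap between anchor-positive and anchor-negative
    Euclidean distances: max(0, d(a,p) - d(a,n) + margin),
    averaged over the batch."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return float(np.maximum(0.0, d_pos - d_neg + margin).mean())
```

The loss is zero once the negative is at least `margin` farther from the anchor than the positive, so minimizing it pulls matched audio/video pairs together and pushes mismatched ones apart.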
The code is tested with:

```
python3.8
torch==1.9.0
numpy==1.21.2
scipy==1.7.1
h5py==3.4.0
pandas==1.3.2
```
Submit a txt file in which each row lists the indices of the retrieved samples, sorted in decreasing order of similarity.
A single txt file should be submitted containing both the audio-to-video and video-to-audio retrieval results: the first half of the file should contain the indices for audio-to-video retrieval, and the second half the video-to-audio results. For example, if there are N examples in the test set, the txt file should have 2N rows, where the first N rows are the retrieval indices for audio-to-video retrieval and the next N rows contain the video-to-audio retrieval results.
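The layout described above can be sketched as follows. Note that the provided code can already generate this file for you; this sketch only illustrates the expected 2N-row structure, and the function name and space-separated row format are assumptions for the example.

```python
def write_submission(a2v_ranks, v2a_ranks, path="submission.txt"):
    """Write one row per query: gallery indices separated by spaces,
    most similar first. All audio->video rows come first, then all
    video->audio rows, giving 2N rows for N test examples."""
    with open(path, "w") as f:
        for ranks in (a2v_ranks, v2a_ranks):
            for row in ranks:
                f.write(" ".join(str(i) for i in row) + "\n")
```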
Please note that the txt file required for submission can be generated by specifying `out_txt=True` in the `calculate_both_map` function.