
Sreyan88/RECAP

RECAP: Retrieval-Augmented Audio Captioning

This is the official repository for the paper RECAP: Retrieval-Augmented Audio Captioning accepted at ICASSP 2024 for oral presentation.

[Paper] [CLAP Checkpoints] [Weakly labeled captions for AudioSet, AudioCaps, and Clotho]

We present RECAP (REtrieval-Augmented Audio CAPtioning), a novel and effective audio captioning system that generates captions conditioned on an input audio and on captions, retrieved from a datastore, that are similar to that audio. Our method can also transfer to any domain without additional fine-tuning. To generate a caption for an audio sample, we use the audio-text model CLAP to retrieve similar captions from a replaceable datastore and construct a prompt from them. We then feed this prompt to a GPT-2 decoder and introduce cross-attention layers between the CLAP encoder and GPT-2 so that caption generation is conditioned on the audio. Experiments on two benchmark datasets, Clotho and AudioCaps, show that RECAP achieves competitive performance in in-domain settings and significant improvements in out-of-domain settings. Because it can exploit a large, text-only caption datastore in a training-free fashion, RECAP can also caption novel audio events never seen during training, as well as compositional audio containing multiple events. To promote research in this space, we also release 150,000+ new weakly labeled captions for AudioSet, AudioCaps, and Clotho.
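The retrieval and prompt-construction step described above can be illustrated with a minimal sketch. It is not the code in this repository: the embeddings are hand-written toy vectors standing in for CLAP audio and text embeddings, the function names (`retrieve_captions`, `build_prompt`) are hypothetical, and the prompt template is an assumption chosen for illustration only.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length along the last axis."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def retrieve_captions(audio_emb, caption_embs, captions, k=4):
    """Return the k datastore captions most similar to the audio embedding.

    Similarity is cosine similarity, assuming audio and text embeddings
    share one space (as in a CLAP-style model).
    """
    sims = l2_normalize(caption_embs) @ l2_normalize(audio_emb)
    top = np.argsort(-sims)[:k]  # indices sorted by descending similarity
    return [captions[i] for i in top]


def build_prompt(retrieved):
    """Join retrieved captions into a text prompt (hypothetical template)."""
    context = " ".join(f'"{c}"' for c in retrieved)
    return f"Similar audios sound like: {context} This audio sounds like:"


# Toy datastore: 3-dimensional vectors stand in for real CLAP embeddings.
captions = [
    "a dog barks",
    "rain falls on a roof",
    "a car engine idles",
    "a dog barks while a door closes",
]
caption_embs = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.9, 0.1, 0.0],
])
audio_emb = np.array([1.0, 0.2, 0.0])

prompt = build_prompt(retrieve_captions(audio_emb, caption_embs, captions, k=2))
print(prompt)
```

Because the datastore here is only a list of captions and their embeddings, swapping in captions from a new domain changes what is retrieved without touching any model weights, which is the property the training-free transfer claim relies on.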

Setup

  1. Install the dependencies with pip install -r requirements.txt. If you use conda, you can create an environment and install them as follows:

    cd RECAP && \
    conda create -n recap python=3.10 && \
    conda activate recap && \
    pip install -r requirements.txt

  2. Update the paths in recap.sh, then run:

    bash recap.sh

Using CLAP Checkpoints

Once you have downloaded our CLAP checkpoints, you can load them in CLAP and use them for evaluation.

Citation

@INPROCEEDINGS{10448030,
  author={Ghosh, Sreyan and Kumar, Sonal and Reddy Evuru, Chandra Kiran and Duraiswami, Ramani and Manocha, Dinesh},
  booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
  title={Recap: Retrieval-Augmented Audio Captioning}, 
  year={2024},
  volume={},
  number={},
  pages={1161-1165},
  keywords={Training;Signal processing;Benchmark testing;Acoustics;Decoding;Feeds;Speech processing;Automated audio captioning;multimodal learning;retrieval-augmented generation},
  doi={10.1109/ICASSP48485.2024.10448030}}
