
Enhanced sound event localization and detection in real 360-degree audio-visual soundscapes (DCASE task3 format)


aromanusc/SoundQ


SoundQ: Enhanced sound event localization and detection in real 360-degree audio-visual soundscapes.

Features

  • An audio-visual synthetic data generator with spatial audio and 360-degree video.

  • A suite of scripts for data augmentation of 360-degree audio and video:

    • Audio channel swapping (ACS), following Wang et al.

    • Video pixel swapping (VPS), following Wang et al.

  • An enhanced audio-visual SELDNet model whose performance is comparable to that of the audio-only SELDNet23 baseline.

    • The model integrates the Detic object detector, but any other object detection model can be integrated into the training pipeline instead.
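ACS and VPS are meant to be applied together, so that audio, labels and video stay spatially aligned. The sketch below is our own minimal plain-Python illustration, not SoundQ's implementation. It assumes FOA channels in ACN order [W, Y, Z, X] (as in the DCASE FOA format), equirectangular frames whose column c is centred at azimuth 180 − (c + 0.5)·360/W degrees and row r at elevation 90 − (r + 0.5)·180/H degrees, and the 16 transformations commonly used for FOA channel swapping: φ' = sφ + o with s ∈ {1, −1} and o ∈ {0°, 90°, 180°, −90°}, combined with θ' = ±θ. The names RULES, acs_doa, acs_foa, vps_equirect and foa_gains are hypothetical.

```python
import math
from itertools import product

# Hypothetical rule set: azimuth phi' = s * phi + o (degrees), elevation theta' = k * theta.
# 2 signs x 4 offsets x 2 elevation signs = 16 rules.
RULES = list(product((1, -1), (0, 90, 180, -90), (1, -1)))
# Exact (cos o, sin o) for the four offsets, avoiding floating-point trig error.
_COS_SIN = {0: (1, 0), 90: (0, 1), 180: (-1, 0), -90: (0, -1)}


def acs_doa(xyz, rule):
    """Apply a rule to a Cartesian DOA label (x, y, z)."""
    s, o, k = rule
    c, n = _COS_SIN[o]
    x, y, z = xyz
    # cos(s*phi + o) and sin(s*phi + o) expanded with the angle-sum identities.
    return (c * x - s * n * y, n * x + s * c * y, k * z)


def acs_foa(foa, rule):
    """Audio channel swapping on FOA channels given in ACN order [W, Y, Z, X].

    X, Y, Z carry the same direction-dependent gains as the DOA components,
    so each rule only permutes and sign-flips channels; W is unchanged.
    """
    w, y, z, x = foa
    swapped = [acs_doa(sample, rule) for sample in zip(x, y, z)]
    return [list(w),
            [sy for _, sy, _ in swapped],
            [sz for _, _, sz in swapped],
            [sx for sx, _, _ in swapped]]


def vps_equirect(frame, rule):
    """Video pixel swapping on an equirectangular frame given as H rows of W pixels.

    Assumes column c is centred at azimuth 180 - (c + 0.5) * 360 / W degrees
    and row r at elevation 90 - (r + 0.5) * 180 / H degrees.
    """
    s, o, k = rule
    h, w = len(frame), len(frame[0])
    if w % 4:
        raise ValueError("frame width must be divisible by 4")
    shift = o * w // 360  # column offset equivalent to o degrees (exact when w % 4 == 0)
    out = [[None] * w for _ in range(h)]
    for r, row in enumerate(frame):
        r2 = r if k == 1 else h - 1 - r  # theta -> -theta flips the rows
        for c, pixel in enumerate(row):
            # s = 1: circular shift; s = -1: horizontal mirror followed by the shift.
            c2 = (c - shift) % w if s == 1 else (w - 1 - c - shift) % w
            out[r2][c2] = pixel
    return out


def foa_gains(azi_deg, ele_deg):
    """Ideal plane-wave FOA gains [W, Y, Z, X] (SN3D) for one direction, for illustration."""
    a, e = math.radians(azi_deg), math.radians(ele_deg)
    return [1.0, math.sin(a) * math.cos(e), math.sin(e), math.cos(a) * math.cos(e)]


# Usage: the same rule is applied to the audio, the DOA labels and the video.
rule = (1, 90, -1)  # phi -> phi + 90, theta -> -theta
audio = acs_foa([[g] for g in foa_gains(30, 10)], rule)  # now encodes azimuth 120, elevation -10
label = acs_doa((0.0, 1.0, 0.0), rule)
video = vps_equirect([[0] * 8 for _ in range(4)], rule)
```

Working on the Cartesian components means one rule definition drives all three transforms, which is what keeps the modalities consistent with each other.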
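If the pipeline only needs labelled bounding boxes from the detector (an assumption on our part), replacing Detic with another model amounts to satisfying a small interface. The sketch below is hypothetical throughout: Detection, Detector, box_to_direction, visual_directions and FixedDetector are illustrative names, not SoundQ's API. It assumes equirectangular frames with azimuth 180° at the left edge decreasing to −180° at the right, and elevation 90° at the top decreasing to −90° at the bottom, so that box centres can be compared with audio DOAs.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence, Tuple


@dataclass
class Detection:
    """One detected object in pixel coordinates (hypothetical structure)."""
    label: str
    score: float
    box: Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


class Detector(Protocol):
    """Hypothetical interface that Detic or any other detector wrapper could satisfy."""

    def detect(self, frame: Sequence[Sequence[float]]) -> List[Detection]: ...


def box_to_direction(box, width, height):
    """Map a box centre on an equirectangular frame to (azimuth, elevation) in degrees."""
    x0, y0, x1, y1 = box
    u = (x0 + x1) / 2 / width   # 0 at the left edge, 1 at the right edge
    v = (y0 + y1) / 2 / height  # 0 at the top edge, 1 at the bottom edge
    return 180.0 - 360.0 * u, 90.0 - 180.0 * v


def visual_directions(detector: Detector, frame, min_score=0.5):
    """Run any conforming detector and keep confident detections as (label, azimuth, elevation)."""
    height, width = len(frame), len(frame[0])
    return [
        (d.label, *box_to_direction(d.box, width, height))
        for d in detector.detect(frame)
        if d.score >= min_score
    ]


class FixedDetector:
    """Stand-in detector returning canned boxes; a real model wrapper would go here."""

    def __init__(self, detections: List[Detection]):
        self.detections = detections

    def detect(self, frame) -> List[Detection]:
        return list(self.detections)
```

Because `Detector` is a structural `Protocol`, a wrapper class only needs a matching `detect` method; it does not have to inherit from anything.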

Installation

See installation instructions.

Results on development dataset

We benchmark our model using the DCASE 2023 Challenge Task 3 SELD evaluation metrics.

The table below includes only the best-performing systems (as documented in the DCASE results). Scores are reported on the test split of the development dataset.

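| Model                   | Dataset            | ER20° | F20° (%) | LE_CD (°) | LR_CD (%) |
|-------------------------|--------------------|-------|----------|-----------|-----------|
| AO SELDNet23 (baseline) | Ambisonic*         | 0.57  | 29.9     | 21.6      | 47.7      |
| AV SELDNet23 (baseline) | Ambisonic + Video  | 1.07  | 14.3     | 48.0      | 35.5      |
| AV SELDNet23 (ours)     | Ambisonic* + Video | 0.65  | 24.9     | 18.7      | 37.5      |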

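Legend: AO = audio-only, AV = audio-visual, FOA = first-order Ambisonics format, * = FOA + Multi-ACCDOA. ER20° and F20° are the location-dependent error rate and F-score, where a detection counts as correct only if its DOA error is within 20°; LE_CD and LR_CD are the class-dependent localization error and localization recall. Lower is better for ER20° and LE_CD; higher is better for F20° and LR_CD.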
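To make the metric columns concrete, the sketch below shows two ingredients under our own simplifications: the great-circle angular error that decides whether a detection falls within 20° (and whose class-wise average over matched pairs gives LE_CD), and the aggregate SELD score that, as we understand it, the DCASE baseline code uses for model selection: the mean of ER, 1 − F, LE/180 and 1 − LR. This is not the official evaluation code, which additionally matches predictions to references per class (Hungarian matching) over 1-second segments. The function names are hypothetical, and plugging in the table's numbers only illustrates the formula; no SELD scores are reported above.

```python
import math


def angular_distance_deg(a, b):
    """Great-circle angle in degrees between two DOAs given as (azimuth, elevation) in degrees."""
    az1, el1 = map(math.radians, a)
    az2, el2 = map(math.radians, b)
    cos_d = (math.sin(el1) * math.sin(el2)
             + math.cos(el1) * math.cos(el2) * math.cos(az1 - az2))
    # Clamp to guard acos against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))


def within_threshold(pred, ref, threshold_deg=20.0):
    """A location-dependent detection counts only if its DOA error is within the threshold."""
    return angular_distance_deg(pred, ref) <= threshold_deg


def seld_score(er, f, le_deg, lr):
    """Aggregate SELD score (lower is better); f and lr are fractions in [0, 1], not percentages."""
    return (er + (1 - f) + le_deg / 180.0 + (1 - lr)) / 4.0


# (ER20, F20, LE_CD in degrees, LR_CD) taken from the table, percentages converted to fractions.
TABLE = {
    "AO SELDNet23 (baseline)": (0.57, 0.299, 21.6, 0.477),
    "AV SELDNet23 (baseline)": (1.07, 0.143, 48.0, 0.355),
    "AV SELDNet23 (ours)": (0.65, 0.249, 18.7, 0.375),
}
for name, metrics in TABLE.items():
    print(f"{name}: SELD score {seld_score(*metrics):.3f}")
```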

Citation

If you find our work useful, please cite our paper:

@article{roman2024enhanced,
  title={Enhanced Sound Event Localization and Detection in Real 360-degree audio-visual soundscapes},
  author={Roman, Adrian S and Balamurugan, Baladithya and Pothuganti, Rithik},
  journal={arXiv preprint arXiv:2401.17129},
  year={2024}
}
