A python implementation of “SRP-DNN: Learning Direct-Path Phase Difference for Multiple Moving Sound Source Localization” [ICASSP 2022]

BingYang-20/SRP-DNN

SRP-DNN

A Python implementation of “SRP-DNN: Learning direct-path phase difference for multiple moving sound source localization”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.

  • Contributions
    • Learning competing and time-varying direct-path inter-channel phase differences (or IPD sequence) for multiple moving sources
      • avoids the assignment ambiguity and the problem of uncertain output-dimension encountered when simultaneously predicting multiple targets
      • yields a constructed spatial spectrum with reliable peaks around the actual source directions
    • Iterative source detection and localization
      • separates the merged peaks of spatial spectrum caused by the interaction between sources
      • achieves superior performance for the azimuth and elevation estimation of multiple moving sound sources
  • Suitable scenarios
    • favorable or adverse noise and reverberation conditions
    • single or multiple sound sources
    • static or moving sound sources
    • known or unknown number of sound sources
    • different topologies of microphone arrays
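
As a rough illustration of the two contributions above, the sketch below builds a spatial spectrum by correlating a direct-path IPD vector with far-field templates for candidate directions, then detects sources one at a time by subtracting each detected source's contribution. It is not the repository's code: the function names, the far-field model and its sign convention, the array geometry, the frequency grid, and the detection threshold are all assumptions made for this example, and the network prediction is replaced by a synthetic two-source mixture.

```python
import itertools
import numpy as np

SOUND_SPEED = 343.0  # m/s, assumed value


def dp_ipd_templates(mic_pos, freqs, directions):
    """Far-field direct-path IPD templates (hypothetical helper).

    mic_pos: (M, 3) positions in metres, freqs: (F,) in Hz,
    directions: (D, 3) unit vectors. Returns complex array (D, P, F)
    holding exp(j * IPD) for every microphone pair P.
    """
    pairs = np.array(list(itertools.combinations(range(len(mic_pos)), 2)))
    baselines = mic_pos[pairs[:, 0]] - mic_pos[pairs[:, 1]]   # (P, 3)
    tdoa = directions @ baselines.T / SOUND_SPEED             # (D, P)
    return np.exp(-2j * np.pi * tdoa[:, :, None] * freqs)     # (D, P, F)


def spatial_spectrum(ipd, templates):
    """Normalised correlation of an IPD vector (P, F) with each template."""
    return np.real(np.sum(np.conj(templates) * ipd, axis=(1, 2))) / ipd.size


def iterative_localize(ipd, templates, max_sources=3, threshold=0.3):
    """Pick the strongest peak, subtract its contribution, and repeat."""
    residual = np.array(ipd, dtype=complex)
    found = []
    for _ in range(max_sources):
        spectrum = spatial_spectrum(residual, templates)
        peak = int(np.argmax(spectrum))
        if spectrum[peak] < threshold:
            break  # no remaining peak is strong enough to count as a source
        found.append(peak)
        residual = residual - spectrum[peak] * templates[peak]
    return found


# Assumed setup: 4-microphone circular array (radius 0.1 m), azimuth-only grid.
mic_pos = 0.1 * np.array([[1, 0, 0], [0, 1, 0], [-1, 0, 0], [0, -1, 0]], dtype=float)
freqs = np.linspace(200.0, 8000.0, 64)
azimuths = np.deg2rad(np.arange(0, 360, 5))
directions = np.stack(
    [np.cos(azimuths), np.sin(azimuths), np.zeros_like(azimuths)], axis=1
)
templates = dp_ipd_templates(mic_pos, freqs, directions)

# Synthetic stand-in for a predicted IPD sequence frame: two weighted sources.
mixed_ipd = templates[10] + 0.6 * templates[40]
found = iterative_localize(mixed_ipd, templates)
print("detected azimuths (deg):", np.rad2deg(azimuths[found]))
```

Because the mixture is a weighted sum of per-source IPD vectors, a single vector of fixed size can describe any number of sources, and the number of sources is decided only at the peak-detection stage.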

Datasets

Quick start

  • Preparation

    • copy the train-clean-100, dev-clean and test-clean folders of LibriSpeech database to SRP-DNN/data/SouSig/LibriSpeech
    • install: numpy, scipy, soundfile, tqdm, matplotlib, gpuRIR, webrtcvad, etc.
  • Training

    python RunSRPDNN.py --train --gen-on-the-fly --gpu-id [*] (--use-amp)
    
  • Evaluation

    • use GPU
    python RunSRPDNN.py --test --gpu-id [*] --time 00000001 --eval-mode locata pred eval (--use-amp)
    
    • use CPU
    python RunSRPDNN.py --test --no-cuda --time 00000001 --eval-mode locata pred eval (--use-amp)
    
  • Pretrained models

    • exp/00000002/best_model.tar
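
Before starting a training run, the data layout from the preparation step can be checked with a few lines of Python. This is a minimal sketch, not part of the repository: the helper name `missing_librispeech_subsets` is hypothetical, and the path assumes the script is run from the directory that contains `SRP-DNN`.

```python
from pathlib import Path

# Subsets named in the preparation step.
REQUIRED_SUBSETS = ("train-clean-100", "dev-clean", "test-clean")


def missing_librispeech_subsets(root):
    """Return the required LibriSpeech subsets that are not folders under root."""
    root = Path(root)
    return [name for name in REQUIRED_SUBSETS if not (root / name).is_dir()]


missing = missing_librispeech_subsets("SRP-DNN/data/SouSig/LibriSpeech")
if missing:
    print("missing LibriSpeech folders:", ", ".join(missing))
else:
    print("LibriSpeech layout looks complete")
```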

Citation

If you find our work useful in your research, please consider citing:

@InProceedings{yang2022srpdnn,
    author = "Bing Yang and Hong Liu and Xiaofei Li",
    title = "SRP-DNN: Learning direct-path phase difference for multiple moving sound source localization",
    booktitle = "Proceedings of {IEEE} International Conference on Acoustics, Speech and Signal Processing (ICASSP)",
    year = "2022",
    pages = "721-725"}

Reference code

Licence

MIT
