A Python implementation of “Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization” [TASLP 2021]

BingYang-20/DP-RTF-Learning

DP-RTF-Learning

A Python implementation of “Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization”, IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2021.

  • Contributions
    • A DP-RTF learning framework is designed that embeds the sensor signals into a low-dimensional localization feature space, disentangling the localization cues from other factors such as source signals, noise, and reverberation.
      • A novel DP-RTF learning network
      • Leveraging monaural speech enhancement to improve the robustness of DP-RTF estimation
      • Generalization to unseen binaural configurations
    • The DP-RTF learning based localization method makes full use of the spatial and spectral cues, and is demonstrated to perform better than several other methods on both simulated and real-world data in noisy and reverberant environments.
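
For readers unfamiliar with the term, the DP-RTF is commonly defined as the ratio between the direct-path acoustic transfer functions of the two channels. The notation below is an assumption chosen for illustration and is not taken from this repository: $h_{1}^{\mathrm{dp}}(f)$ and $h_{2}^{\mathrm{dp}}(f)$ denote the direct-path transfer functions from the source to the two ears at frequency $f$, and the channel order is a convention.

```latex
\[
  r(f) = \frac{h_{2}^{\mathrm{dp}}(f)}{h_{1}^{\mathrm{dp}}(f)},
  \qquad
  \mathrm{ILD}(f) = 20\log_{10}\lvert r(f)\rvert,
  \qquad
  \mathrm{IPD}(f) = \angle\, r(f)
\]
```

Under this definition the magnitude of $r(f)$ carries the interaural level difference and its phase carries the interaural phase difference, which is why the feature depends on the source direction rather than on the reverberant part of the room response.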

Datasets

Quick start

  • Preparation

    • Add a soft link to the "common" directory in the "DPRTF" directory
      ln -s [original path] [target path]
      
    • Generate the lists of source signals and BRIRs, the direct-path relative transfer functions (DP-RTFs), the room acoustic settings, and the sensor signals for the training, validation, and test stages.
      python -m common.getData --stage [*] --data [*] 
      
  • Training

    python run.py --gpu-id [*]
    
  • Test

    python run.py --gpu-id [*] --test
    
  • Pretrained models

    • exp/00000000/model_12.pth: trained with fixed data
    • exp/00000001/model_52.pth: trained with random data (generated on-the-fly)
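
The quick-start steps above can also be scripted. The following is a minimal sketch, not part of the repository: the helper names `make_soft_link`, `get_data_command`, and `run_command` are hypothetical, and the only command-line flags used are the ones shown above (`--stage`, `--data`, `--gpu-id`, `--test`). The values for those flags are left to the caller because the README does not list them.

```python
import os
import sys
from pathlib import Path


def make_soft_link(original, target):
    """Python equivalent of `ln -s [original path] [target path]`.

    Hypothetical helper: both paths are supplied by the caller.
    """
    original = Path(original).resolve()
    target = Path(target)
    if target.is_symlink() or target.exists():
        return target  # leave any existing file or link untouched
    os.symlink(original, target, target_is_directory=original.is_dir())
    return target


def get_data_command(stage, data):
    """Build the data-generation command; `stage` and `data` are passed through unchanged."""
    return [sys.executable, "-m", "common.getData",
            "--stage", str(stage), "--data", str(data)]


def run_command(gpu_id, test=False):
    """Build the training command, or the test command when `test` is True."""
    cmd = [sys.executable, "run.py", "--gpu-id", str(gpu_id)]
    if test:
        cmd.append("--test")
    return cmd


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print the commands for the GPU id given on the command line.
    # To execute one, pass the list to subprocess.run(cmd, check=True).
    gpu_id = sys.argv[1]
    print(" ".join(run_command(gpu_id)))
    print(" ".join(run_command(gpu_id, test=True)))
```

Building each command as a list rather than a single string avoids shell quoting problems when a path contains spaces. On Windows, creating symbolic links may require elevated privileges.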

Citation

If you find our work useful in your research, please consider citing:

@article{yang2021dprtf,
    author = "Bing Yang and Hong Liu and Xiaofei Li",
    title = "Learning deep direct-path relative transfer function for binaural sound source localization",
    journal = "{IEEE/ACM} Transactions on Audio, Speech, and Language Processing (TASLP)",
    volume = "29",
    pages = "3491-3503",
    year = "2021"}
@inproceedings{yang2021dprtf1,
    author = "Bing Yang and Xiaofei Li and Hong Liu",
    title = "Supervised direct-path relative transfer function learning for binaural sound source localization",
    booktitle = "Proceedings of {IEEE} International Conference on Acoustics, Speech and Signal Processing (ICASSP)",
    pages = "825-829",
    year = "2021"}

Licence

MIT
