Dual-Path Style Learning for End-to-End Noise-Robust Automatic Speech Recognition (DPSL-ASR).

Alex-Songs/DPSL-ASR

DPSL-ASR (Dual-Path Style Learning for End-to-End Noise-Robust Automatic Speech Recognition)

Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition

Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition

Introduction

DPSL-ASR is a method for end-to-end noise-robust speech recognition. It extends our prior work, IFF-Net (Interactive Feature Fusion Network), with dual-path inputs and style learning, and achieves better ASR performance on the RATS Channel-A dataset and the CHiME-4 1-Channel Track dataset.

Left figure: (a) joint SE-ASR approach, (b) IFF-Net baseline, (c) the proposed DPSL-ASR approach.

Right figure: the back-end ASR module with style learning and consistency loss in our DPSL-ASR. Dashed lines denote shared parameters.

If you find DPSL-ASR useful in your research, please use the following BibTeX entry for citation:

@article{hu2022dualpath,
  title={Dual-Path Style Learning for End-to-End Noise-Robust Speech Recognition}, 
  author={Hu, Yuchen and Hou, Nana and Chen, Chen and Chng, Eng Siong},
  journal={arXiv preprint arXiv:2203.14838},
  year={2022}
}

@article{hu2021interactive,
  title={Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition},
  author={Hu, Yuchen and Hou, Nana and Chen, Chen and Chng, Eng Siong},
  journal={arXiv preprint arXiv:2110.05267},
  year={2021}
}

Usage

Our implementation is based on ESPnet. You can install it directly from the ESPnet (v0.9.6) folder provided in this repo, or install ESPnet from the official website and then add the files from our repo. Run pip install -e . inside the ESPnet folder to install it.
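The install step can be sketched as a small shell helper. This is only an illustration: ESPNET_DIR, DRY_RUN, run, and install_espnet are hypothetical names that do not come from the repo, and the default path ./espnet is an assumption about where the ESPnet folder lives on your machine. By default the helper only prints the commands, so you can inspect them before running anything.

```shell
# Sketch only: ESPNET_DIR is a hypothetical location for the ESPnet folder
# (the one provided in this repo, or an official checkout with our files added).
ESPNET_DIR="${ESPNET_DIR:-./espnet}"
# DRY_RUN=1 (the default in this sketch) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it in the current shell.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Enter the ESPnet folder and install it in editable mode.
install_espnet() {
  run cd "$ESPNET_DIR" || return 1
  run pip install -e .
}
```

Calling install_espnet with the defaults prints the two commands it would run. Setting DRY_RUN=0 executes them for real, which also changes the working directory of the calling shell.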

In our folder, the running scripts are at egs2/rats_chA/asr_with_enhancement/{run_rats_chA_dpsl_asr, rats_chA_dpsl_asr}.sh, and the network code is at espnet2/{asr/, enh/, layers/}.

Tips:

  1. To go over the entire project, please start from the script egs2/rats_chA/asr_with_enhancement/run_rats_chA_dpsl_asr.sh
  2. To read the network code only, please start from the script espnet2/asr/dpsl_asr.py
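To make the brace shorthand in the paths above concrete, the following sketch lists every path named in this section and reports whether each one exists. The paths are taken from this README; ESPNET_ROOT, list_dpsl_paths, and check_dpsl_paths are hypothetical names introduced for illustration, and ESPNET_ROOT is assumed to point at the ESPnet folder of this repo.

```shell
# Sketch only: ESPNET_ROOT is an assumed variable pointing at the ESPnet folder;
# it defaults to the current directory.
ESPNET_ROOT="${ESPNET_ROOT:-.}"

# Print every path that the brace shorthand in this README expands to,
# plus the network entry point named in the tips.
list_dpsl_paths() {
  for name in run_rats_chA_dpsl_asr rats_chA_dpsl_asr; do
    echo "egs2/rats_chA/asr_with_enhancement/${name}.sh"
  done
  for sub in asr enh layers; do
    echo "espnet2/${sub}/"
  done
  echo "espnet2/asr/dpsl_asr.py"
}

# Report whether each listed path exists under ESPNET_ROOT.
check_dpsl_paths() {
  list_dpsl_paths | while read -r path; do
    if [ -e "${ESPNET_ROOT}/${path}" ]; then
      echo "found    ${path}"
    else
      echo "missing  ${path}"
    fi
  done
}
```

Running check_dpsl_paths from the ESPnet folder is a quick way to confirm that the files from this repo were copied into place, for example after adding them to an official ESPnet checkout.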
