UMA-ASR

This repository is the official implementation of "Unimodal Aggregation for CTC-based Speech Recognition".

This work has been accepted by ICASSP 2024.


Paper 🤩 | Issues 😅 | Lab 🙉 | Contact 😘

Introduction

This project works on non-autoregressive automatic speech recognition. A unimodal aggregation (UMA) is proposed to segment and integrate the feature frames that belong to the same text token, and thus to learn better feature representations for text tokens. The frame-wise features and aggregation weights are both derived from an encoder; the feature frames are then integrated under the unimodal weights and further processed by a decoder. Connectionist temporal classification (CTC) loss is applied for training. Moreover, integrating self-conditioned CTC into the proposed framework further improves performance noticeably.

Figure: The proposed UMA model.
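To make the aggregation step concrete, here is a minimal NumPy sketch (an illustration, not the authors' code): frames are segmented at local minima (valleys) of the scalar aggregation weights, and each token-level vector is the weight-normalized sum of the frames in its segment. The valley rule, the function name, and the epsilon are assumptions made for this sketch.

```python
import numpy as np

def unimodal_aggregate(feats, weights):
    """Illustrative unimodal aggregation.

    feats:   (T, D) frame-wise features from the encoder
    weights: (T,)   frame-wise aggregation weights in [0, 1]
    Returns: (N, D) one aggregated vector per segment.
    """
    T = feats.shape[0]
    # A new segment opens at every valley of the weight curve, i.e. a
    # frame whose weight is a local minimum (assumed interpretation).
    starts = [0]
    for t in range(1, T - 1):
        if weights[t - 1] >= weights[t] <= weights[t + 1]:
            starts.append(t)
    starts.append(T)
    # Each token vector is the weighted average of its segment's frames.
    segs = []
    for s, e in zip(starts[:-1], starts[1:]):
        w = weights[s:e]
        segs.append((w[:, None] * feats[s:e]).sum(axis=0) / (w.sum() + 1e-8))
    return np.stack(segs)
```

In the full model the weights come from the encoder and the aggregated sequence is passed on to a decoder trained with CTC; this sketch only shows how unimodal weights shorten the frame sequence.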

Get started

  1. The proposed method is implemented with ESPnet2, so please make sure ESPnet is installed successfully first.
  2. Roll back ESPnet to the specified version (run inside your espnet directory):
    git checkout v.202304
    
  3. Clone the UMA-ASR codes by:
    git clone https://github.com/Audio-WestlakeU/UMA-ASR
    
  4. Copy the recipe configurations in the egs2 folder to the corresponding directory in "espnet/egs2/". At present, experiments have only been conducted on the AISHELL-1, AISHELL-2, and HKUST datasets. If you want to experiment on other Chinese datasets, you can refer to these configurations.
  5. Copy the files in the espnet2 folder to the corresponding folders in "espnet/espnet2", and check that the path in each file's header comment matches your own path.
  6. To run experiments, follow ESPnet's standard steps. You can apply the UMA method by simply replacing run.sh with our run_unimodal.sh on the command line. For example:
    ./run_unimodal.sh --stage 10 --stop_stage 13
    
    Be careful to make the bash files executable first:
    chmod +x asr_unimodal.sh
    chmod +x run_unimodal.sh
    
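Assuming ESPnet is already installed, the steps above can be sketched end-to-end as follows; the overlay copy commands and the recipe directory are illustrative and should be adapted to your own layout:

```shell
# Pin ESPnet to the version the recipes were tested against
cd espnet
git checkout v.202304

# Fetch the UMA-ASR code alongside ESPnet (location is illustrative)
cd ..
git clone https://github.com/Audio-WestlakeU/UMA-ASR

# Overlay the recipe configs and model code onto the ESPnet tree
cp -r UMA-ASR/egs2/. espnet/egs2/
cp -r UMA-ASR/espnet2/. espnet/espnet2/

# Make the UMA entry scripts executable, then run stages 10-13
cd espnet/egs2/aishell/asr1   # example recipe directory
chmod +x run_unimodal.sh asr_unimodal.sh
./run_unimodal.sh --stage 10 --stop_stage 13
```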

Citation

You can cite this paper as:

@article{fang2023unimodal,
    title={Unimodal Aggregation for CTC-based Speech Recognition},
    author={Ying Fang and Xiaofei Li},
    journal={arXiv preprint arXiv:2309.08150},
    year={2023}
}
