Auto-regressive and Non-autoregressive Transformer for Speech Recognition

This is the implementation of our work "CTC alignment-based Non-autoregressive Speech Transformer". Some codes are borrowed from Espnet and transformer implementation in Harvard NLP group.

News:

Using pretrained Hubert Encoder for CASS-NAT.

Requirements

Python 3.7
Pytorch 1.11
Kaldi

We didn't test it for a higher version of Python or Pytorch. Other required python packages are in requirments.txt. You can install it using:

pip install -r requirements.txt

Example, run librispeech (scripts under libri_100 are tested).

Go to egs/librispeech. Modify path.sh and specify the kaldi path (for feature extraction and etc.).
Check the conf/transformer.yaml and make revisions on hyparameters if you like.
./run.sh. I suggest to run the script step by step.
./run_cassnat.sh. Run the non-autoregressive model. You can directly run this step if you want to skip the Auto-regressive transformer.

All the python codes are under src/. Some codes may not well organized since this is still in the period of experiments

Results (need updates for conformer encoder and hubert encoder).

Librispeech (WER)

Methods	LM	dev-clean	test-clean	dev-other	test-other	RTF(s)
AT	no	3.4	3.6	8.5	8.5	0.562
-	yes	2.5	2.7	5.7	5.8	-
ConAT	no	2.7	3.0	7.2	7.0	0.499
CASSNAT	no	3.7	3.8	9.2	9.1	0.011
-	yes	3.3	3.3	8.0	8.1	-
ImpCASS	no	2.8	3.1	7.3	7.2	0.014

Aishell1 (CER)

Methods	LM	dev	test
AT	no	5.4	5.9
CASSNAT	no	5.3	5.8
ImpCASS	no	4.9	5.4

Citations

If you find this repository useful, please consider citing our work:

@inproceedings{cassnat,
  title={Cass-nat: Ctc alignment-based single step non-autoregressive transformer for speech recognition},
  author={Fan, Ruchao and Chu, Wei and Chang, Peng and Xiao, Jing},
  booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={5889--5893},
  year={2021},
  organization={IEEE}
}

@inproceedings{improvedcassnat,
  author={Ruchao Fan and Wei Chu and Peng Chang and Jing Xiao and Abeer Alwan},
  title={{An Improved Single Step Non-Autoregressive Transformer for Automatic Speech Recognition}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={3715--3719},
  doi={10.21437/Interspeech.2021-1955}
}

@article{studycassnat,
  author    = {Ruchao Fan and Wei Chu and Peng Chang and Abeer Alwan},
  title     = {A {CTC} Alignment-based Non-autoregressive Transformer for End-to-end Automatic Speech Recognition},
  journal   = {IEEE Transactions on Audio, Speech and Language Processing},
  doi       = {10.1109/TASLP.2023.3263789},
  year      = {2023}
}

Name		Name	Last commit message	Last commit date
Latest commit History 74 Commits
egs		egs
src		src
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

egs

egs

src

src

utils

utils

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

requirements.txt

requirements.txt

Repository files navigation

Auto-regressive and Non-autoregressive Transformer for Speech Recognition

News:

Requirements

Example, run librispeech (scripts under libri_100 are tested).

Results (need updates for conformer encoder and hubert encoder).

Citations

About

Releases

Packages

Languages

License

Diamondfan/cassnat_asr

Folders and files

Latest commit

History

Repository files navigation

Auto-regressive and Non-autoregressive Transformer for Speech Recognition

News:

Requirements

Example, run librispeech (scripts under libri_100 are tested).

Results (need updates for conformer encoder and hubert encoder).

Citations

About

Resources

License

Stars

Watchers

Forks

Languages