ASR tutorial for the LibriSpeech dataset using ClovaCall

Digital Speech Processing : Automatic Speech Recognition with the LibriSpeech dataset

How to train

# Set the environment
0. A virtual environment (e.g., conda) is recommended
1. 'pip install -r requirements.txt', or install manually: numpy, pandas, python-Levenshtein, librosa, numba, matplotlib
2. Install PyTorch according to your CUDA version: https://pytorch.org/get-started/locally/ (see the check after this list)
3. 'unzip Libri_data.zip -d ./DSP_project/' # unzip the dataset
4. Set the PATHs in train.py and test.py (and set the data path in the ./Libri_Data/*.csv files)

# Train and test
5. 'python DSP_ASR_LibriSpeech/train.py'
6. 'python DSP_ASR_LibriSpeech/test.py'
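
After step 2, a quick way to confirm that PyTorch was installed with working CUDA support (a generic check, not part of this repository's scripts) is:

```python
import torch

# Print the installed PyTorch/CUDA versions and whether a GPU is visible.
print(torch.__version__, torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```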

Recommended Environment

  • Ubuntu 20.04.3 LTS
  • CUDA Version: 11.5
  • Python 3.8

Dataset

: A subset of the LibriSpeech train-clean-360 data (label length under 150) - download it online
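
If you want to fetch the raw corpus yourself instead of using the provided Libri_data.zip, one option (not part of this repository's scripts; the root path below is illustrative, and the download is large) is torchaudio's built-in loader:

```python
import torchaudio

# Download and extract LibriSpeech train-clean-360 under ./data (illustrative path).
dataset = torchaudio.datasets.LIBRISPEECH("./data", url="train-clean-360", download=True)

# Each item is (waveform, sample_rate, transcript, speaker_id, chapter_id, utterance_id).
waveform, sample_rate, transcript, *_ = dataset[0]
print(sample_rate, transcript)
```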

Base Model

: Location-based attention
  • CER 36.318%, WER 73.181% at 160 epochs
  • CER 31.254%, WER 66.985% at 200 epochs
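
For orientation, the sketch below shows what a location-based (location-aware, Chorowski et al.) attention layer looks like in PyTorch; class, argument, and variable names are illustrative and are not taken from this repository's code. The convolution over the previous step's attention weights is what makes the attention "location-based".

```python
import torch
import torch.nn as nn

class LocationAwareAttention(nn.Module):
    """Minimal sketch of location-based attention (illustrative, not the repo's code)."""
    def __init__(self, dec_dim, enc_dim, attn_dim, conv_channels=10, kernel_size=31):
        super().__init__()
        self.query_proj = nn.Linear(dec_dim, attn_dim, bias=False)
        self.key_proj = nn.Linear(enc_dim, attn_dim, bias=False)
        self.loc_conv = nn.Conv1d(1, conv_channels, kernel_size,
                                  padding=kernel_size // 2, bias=False)
        self.loc_proj = nn.Linear(conv_channels, attn_dim, bias=False)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, query, keys, prev_align):
        # query:      (B, dec_dim)    current decoder state
        # keys:       (B, T, enc_dim) encoder outputs
        # prev_align: (B, T)          attention weights from the previous step
        #                             (initialize with zeros or uniform at step 0)
        loc = self.loc_conv(prev_align.unsqueeze(1))            # (B, C, T)
        loc = self.loc_proj(loc.transpose(1, 2))                # (B, T, attn_dim)
        energy = self.score(torch.tanh(
            self.query_proj(query).unsqueeze(1) + self.key_proj(keys) + loc
        )).squeeze(-1)                                          # (B, T)
        align = torch.softmax(energy, dim=-1)                   # new attention weights
        context = torch.bmm(align.unsqueeze(1), keys).squeeze(1)  # (B, enc_dim)
        return context, align
```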

Performance metric

: Word Error Rate (WER) = (S + I + D) / N
  • S : number of substitutions
  • I : number of insertions
  • D : number of deletions
  • N : number of words in the target label
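
As a sanity check for your results, WER can be computed with a standard word-level edit-distance DP. The routine below is an illustrative sketch, not the implementation used by test.py:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate = (S + I + D) / N via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words ≈ 0.167
```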

Reference

For Beginners

Keywords worth studying

  • Listen, Attend and Spell
  • SpecAugment (see the sketch after this list)
  • Learning rate scheduler
  • Teacher forcing
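
Of these, SpecAugment is easy to try: torchaudio ships frequency/time masking transforms that can be applied to a log-mel spectrogram during training. The snippet below is a minimal sketch with illustrative parameters, independent of this repository's data pipeline:

```python
import torch
import torchaudio

# Fake 1-second, 16 kHz waveform just to produce a spectrogram to mask.
waveform = torch.randn(1, 16000)
mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=80)(waveform)
log_mel = torch.log(mel + 1e-6)

# SpecAugment-style masking: zero out random frequency and time bands.
freq_mask = torchaudio.transforms.FrequencyMasking(freq_mask_param=15)
time_mask = torchaudio.transforms.TimeMasking(time_mask_param=35)
augmented = time_mask(freq_mask(log_mel))   # apply during training only
print(augmented.shape)                      # (1, 80, time_frames)
```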

Good reference

For students taking the DSP class

Submit your work as a zip file, including:

  • All code files you used for training
  • Your one best trained model (.pth) and its test result
  • A report as a PDF (details will be announced)
