Pop Up Archive Kaldi release
- [Kaldi](https://github.com/kaldi-asr/kaldi)
- [SoX](http://sox.sourceforge.net/) for audio files
- [ffmpeg](https://www.ffmpeg.org/) for converting video files
- CMUSeg (use
- IRSTLM (use
- sclite (see
- Latest version of [exp dir](https://sourceforge.net/projects/popuparchive-kaldi/files/), should be (re-)named
##Notes about Kaldi

It is recommended that you review the [Kaldi documentation](http://kaldi-asr.org/doc/) before you begin, especially if you intend to modify the compiled model included on SourceForge.
Each model you experiment with should have its own directory. Start by putting the exp dir from Sourceforge in the sample_experiment dir.
Preliminaries for your experiment dir (e.g. sample_experiment)
- Make sure
`set-kaldi-path.sh` match your Kaldi location.
- Create the following symlinks in your experiment dir:
ln -s [KALDI-PATH]/egs/wsj/s5/steps [EXPT-DIR]
ln -s [KALDI-PATH]/egs/wsj/s5/utils [EXPT-DIR]
- Create a directory to store results from sclite, e.g.
- Convert audio to mono, 16-bit signed-integer PCM with a 16 kHz sample rate
- Example with sox
sox input.mp3 -c 1 -r 16000 -L output.wav
- Feel free to increase
`exp/run.sh` depending on how much memory you have. In decoding, expect each job to use at least 5 GB.
- If you have ground truth transcripts and wish to evaluate the accuracy of your output, create a text file in the following format:
- The transcript for each file should be on a single line, all lower case without punctuation.
- The line of transcript text should end with `([FILENAME]-1)`. Do not use dashes or underscores in the file name.
- Include a newline at the end of the file.
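
The reference-file rules above can be sketched in Python. This is a minimal illustration, not part of the release: `reference_line` and `write_reference_file` are hypothetical helper names, and it assumes that `[FILENAME]` means the audio file name without its extension and that "punctuation" means ASCII punctuation (apostrophes included).

```python
import string
from pathlib import Path


def reference_line(audio_path, transcript):
    """Format one ground-truth line: lower-case text, no punctuation, then (FILENAME-1)."""
    # Assumption: FILENAME is the audio file name without its extension.
    stem = Path(audio_path).stem
    if "-" in stem or "_" in stem:
        raise ValueError(f"file name must not contain dashes or underscores: {stem}")
    # Lower-case and drop ASCII punctuation (this also removes apostrophes).
    cleaned = transcript.lower().translate(str.maketrans("", "", string.punctuation))
    # Collapse all whitespace so the transcript sits on a single line.
    text = " ".join(cleaned.split())
    return f"{text} ({stem}-1)"


def write_reference_file(entries, out_path):
    """Write (audio_path, transcript) pairs, one per line, ending with a newline."""
    lines = [reference_line(path, text) for path, text in entries]
    Path(out_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
```

For example, `reference_line("audio/show01.wav", "Hello, World!")` yields `hello world (show01-1)`. If your lexicon contains words with apostrophes, you may prefer to keep them rather than strip them.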
Run Kaldi speech recognition on a directory of WAV files:
python run_kaldi.py [EXPERIMENT-DIR] [WAV-DIR]
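
Before decoding, it may help to confirm that every file in `[WAV-DIR]` matches the mono, 16-bit, 16 kHz format described in the preliminaries. The sketch below uses only Python's standard `wave` module; `wav_problems` and `check_wav_dir` are hypothetical helper names, not scripts shipped with this release.

```python
import wave
from pathlib import Path


def wav_problems(path):
    """Return a list of ways the file deviates from mono / 16-bit / 16 kHz PCM."""
    problems = []
    # wave.open raises wave.Error for files that are not uncompressed PCM WAV.
    with wave.open(str(path), "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels (expected 1)")
        if wav.getsampwidth() != 2:
            problems.append(f"{8 * wav.getsampwidth()}-bit samples (expected 16)")
        if wav.getframerate() != 16000:
            problems.append(f"{wav.getframerate()} Hz (expected 16000)")
    return problems


def check_wav_dir(wav_dir):
    """Map each non-conforming *.wav file name in wav_dir to its list of problems."""
    report = {}
    for path in sorted(Path(wav_dir).glob("*.wav")):
        problems = wav_problems(path)
        if problems:
            report[path.name] = problems
    return report
```

An empty dictionary from `check_wav_dir` means every WAV file in the directory passed the three checks; anything listed can be reconverted with SoX.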
Run evaluation (`[RESULTS-DIR]` will be created and sclite files will be written there):
python run_sclite.py [KALDI-OUTPUT-DIR] [RESULTS-DIR] [REF-FILE-PATH]
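
If you run both steps often, they can be chained from a small driver. The sketch below simply shells out to the two commands shown above; `build_commands` and `run_pipeline` are hypothetical names, and because this README does not state where `run_kaldi.py` writes its output, the Kaldi output directory is passed in explicitly rather than guessed.

```python
import subprocess


def build_commands(experiment_dir, wav_dir, kaldi_output_dir, results_dir, ref_file):
    """Return the decode and evaluation commands as argument lists."""
    return [
        ["python", "run_kaldi.py", experiment_dir, wav_dir],
        ["python", "run_sclite.py", kaldi_output_dir, results_dir, ref_file],
    ]


def run_pipeline(experiment_dir, wav_dir, kaldi_output_dir, results_dir, ref_file):
    """Run decoding, then evaluation; stop at the first command that fails."""
    commands = build_commands(
        experiment_dir, wav_dir, kaldi_output_dir, results_dir, ref_file
    )
    for cmd in commands:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True)
```

Passing argument lists instead of a single shell string avoids quoting problems when directory names contain spaces.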
##Building on the model

You can use the current model as is, or add your own lexicon or language model.
- You can add new words to the lexicon by editing
`sh prep_lang_local.sh exp/dict exp/tmp_lang exp/lang`. Make sure your pronunciations only use phones that are already in the lexicon.
- Run `sh add_grammar.sh [LM-FILEPATH]` to create a new bigram language model based on an LM text file and to update the overall model. Use `sh create_big_lm.sh [LM-FILEPATH] lang_newlm lang_lmrescore` to create a new 5-gram language model for rescoring.
- Additional raw LM text can be obtained from [Open SLR](http://www.openslr.org/27/). It also provides pre-trained trigram and 4-gram LMs; however, we recommend using a bigram baseline and 5-gram rescoring.
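
For the lexicon step above, one way to check that new pronunciations only use existing phones is sketched below. It assumes the usual Kaldi lexicon layout of one entry per line, a word followed by its space-separated phones; `parse_lexicon` and `unknown_phones` are hypothetical helper names, not part of this release.

```python
def parse_lexicon(lines):
    """Parse 'WORD PHONE1 PHONE2 ...' lines into (word, [phones]) pairs."""
    entries = []
    for line in lines:
        parts = line.split()
        # Skip blank lines and entries that have no pronunciation.
        if len(parts) >= 2:
            entries.append((parts[0], parts[1:]))
    return entries


def unknown_phones(existing_lines, new_lines):
    """Map each new word to the phones it uses that the existing lexicon lacks."""
    known = {phone for _, phones in parse_lexicon(existing_lines) for phone in phones}
    bad = {}
    for word, phones in parse_lexicon(new_lines):
        missing = [phone for phone in phones if phone not in known]
        if missing:
            bad[word] = missing
    return bad
```

An empty result means every new entry is built from phones the model already knows. Anything reported should be re-transcribed with existing phones before you rebuild the lang directory.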
This repo is based on [work](https://github.com/APMG/audio-search) done by APM with Cantab Research.