Scripts for training command+english language models for Silvius

This project trains new dual command/English language models, intended to be
used for voice coding with Silvius. It relies on Kaldi. Usage:

1. Clone Kaldi (we tested git version 69cd717, but any version should work).
   Run make in tools/, then run configure and make in src/.

2. Clone this repository as a subdirectory inside kaldi/egs/tedlium.

3. Put your command words into corpus.txt, separated by whitespace, as in the
   output of `python grammar/ 1 | sort -u` from the silvius frontend. You
   may also wish to add the following words as commands:
   <unk> [BREATH] [COUGH] [NOISE] [SMACK] [UH] [UM]

4. Execute the training script. It should take about 15 minutes once the
   pretrained models and data (1.5GB) are downloaded.

5. If you have any non-English words, the script will point them out
   and exit. Write their pronunciations in phones in corpus.dict. To generate
   word pronunciations automatically, try ./ foobar.

6. The output is in exp/tri_6a and may be imported into silvius-backend with
   silvius-backend/models/ Models will be about 70MB.
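
As an illustration of step 3, here is a minimal Python sketch of how the word list for corpus.txt might be assembled. The function names `build_corpus_words` and `write_corpus` are hypothetical and not part of this repository. The sketch assumes the grammar output is plain whitespace-separated text and that one word per line is an acceptable layout for corpus.txt.

```python
# Hypothetical helpers, not part of this repository: a sketch of step 3.

# Extra tokens that step 3 suggests adding as commands.
FILLER_TOKENS = ["<unk>", "[BREATH]", "[COUGH]", "[NOISE]", "[SMACK]", "[UH]", "[UM]"]


def build_corpus_words(grammar_output, include_fillers=True):
    """Return the sorted, de-duplicated words found in grammar_output.

    grammar_output is assumed to be whitespace-separated text, such as the
    output of the silvius frontend grammar command mentioned in step 3.
    """
    words = set(grammar_output.split())
    if include_fillers:
        words.update(FILLER_TOKENS)
    return sorted(words)


def write_corpus(words, path="corpus.txt"):
    """Write one word per line (newlines count as whitespace separators)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(words) + "\n")
```

For example, `write_corpus(build_corpus_words("save file save open file"))` would write the three distinct command words plus the seven extra tokens to corpus.txt.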

Please join the silvius mailing list (forum/silvius) if you have any questions!