Commit

notes
zxie committed Sep 22, 2015
1 parent d866ea7 commit c885989
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion README.md
@@ -1,14 +1,16 @@
# stanford-ctc
Neural net code for lexicon-free speech recognition with connectionist temporal classification

This repository contains code for training a bi-directional RNN using the CTC loss function.
We assume you have separately prepared a dataset of speech utterances with audio features and text transcriptions.
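
For readers unfamiliar with the CTC loss, here is a minimal NumPy sketch of the standard forward (alpha) recursion that it is built on. This is an illustration only, not the code in this repository: the function name `ctc_neg_log_likelihood`, the argument layout, and the choice of index 0 for the blank symbol are all assumptions made for the example.

```python
import numpy as np


def ctc_neg_log_likelihood(probs, labels, blank=0):
    """Negative log-likelihood of `labels` under CTC (illustrative sketch).

    probs  : (T, K) array of per-frame softmax outputs.
    labels : list of label indices, without blanks.
    blank  : index of the blank symbol (assumed to be 0 here).
    """
    T = probs.shape[0]

    # Interleave blanks around every label: [blank, l1, blank, l2, ..., blank].
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S = len(ext)

    # alpha[t, s] = total probability of all alignments of the first t+1
    # frames that end at position s of the extended label sequence.
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, ext[0]]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            total = alpha[t - 1, s]              # stay on the same symbol
            if s > 0:
                total += alpha[t - 1, s - 1]     # advance by one position
            # Skipping the intermediate blank is allowed only between two
            # different non-blank labels.
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[t - 1, s - 2]
            alpha[t, s] = total * probs[t, ext[s]]

    # Valid alignments end on the last label or on the trailing blank.
    p = alpha[T - 1, S - 1]
    if S > 1:
        p += alpha[T - 1, S - 2]
    return -np.log(p)
```

The sketch works with raw probabilities, so it will underflow on long utterances; practical implementations rescale `alpha` at each frame or work in the log domain.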

For more information please see the [project page](http://deeplearning.stanford.edu/lexfree/) and the [character language modeling repository](https://github.com/zxie/nn)

Our neural net code runs on the GPU using [Cudamat](https://github.com/cudamat/cudamat).
We use a forked version of Cudamat that adds an extra function, which you can find [here](https://github.com/awni/cudamat). If you need a more recent version of Cudamat, you can likely take just the extra function and apply it as a patch to the most recent version.

The latest code is in the directory `ctc_fast`; please set your `PYTHONPATH` accordingly. The script `runNNet.py` should be the starting point for training the BRNN model; you'll have to modify `run_cfg.py` and `decoder_config.py`. Unfortunately, the `run*.sh` scripts in `{timit/wsj/swbd}-utils` are outdated, but you can refer to them for reasonable parameter settings.
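
If you would rather not change `PYTHONPATH` in your shell, the same effect can be had from inside Python by prepending the `ctc_fast` directory to `sys.path`. The helper below is a hypothetical sketch, not part of this repository; the name `add_ctc_fast_to_path` and the checkout location in the usage line are assumptions.

```python
import os
import sys


def add_ctc_fast_to_path(repo_root, subdir="ctc_fast"):
    """Prepend <repo_root>/<subdir> to sys.path (hypothetical helper).

    Returns the absolute path that was added. Calling it twice with the
    same arguments does not add a second entry.
    """
    path = os.path.abspath(os.path.join(repo_root, subdir))
    if path not in sys.path:
        sys.path.insert(0, path)
    return path
```

For example, `add_ctc_fast_to_path(os.path.expanduser("~/stanford-ctc"))` assumes the repository was cloned into your home directory; adjust the path to wherever your checkout lives.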

Example `feat#.bin`, `keys#.txt`, and `alis#.txt` files for a small subset of the TIMIT training data can be
found [here](http://deeplearning.stanford.edu/lexfree/timit/).
