It can therefore be used for the evaluation of multi-speaker Automated Speech Recognition (ASR) systems. In particular, this script helps evaluate the CMU Sphinx ASR system, to see how different acoustic and language models compare to each other.
However, the transcripts are not verbatim. For example, quite a few lectures start with an introduction of the speaker, which is not included in the transcripts. Sometimes the speaker deviates from the transcript as well. Therefore, the results will often be slightly worse than one would expect.
The evaluation script relies on SoX and the PocketSphinx Python bindings being installed.
# apt-get install sox libsox-fmt-mp3 python-numpy python-pocketsphinx
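To confirm that both dependencies are visible before running the evaluation, a quick check along the following lines may help. This is only a sketch and is not part of the repository: it assumes Python 3, that the SoX binary is on the PATH under the name `sox`, and that the bindings are importable as `pocketsphinx`. The function name `missing_dependencies` is hypothetical.

```python
import importlib.util
import shutil


def missing_dependencies():
    """Return the names of the dependencies that could not be found."""
    missing = []
    # Assumption: the SoX command-line tool is installed as "sox".
    if shutil.which("sox") is None:
        missing.append("sox")
    # Assumption: the PocketSphinx bindings are importable as "pocketsphinx".
    if importlib.util.find_spec("pocketsphinx") is None:
        missing.append("pocketsphinx")
    return missing


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All dependencies found.")
```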
Start by downloading the dataset using the provided script.
$ cd reith-lectures
Create a configuration file pointing to your acoustic and language models. An example configuration file is given in hub4_and_lm_giga_64k_vp_3gram.ini.example, using the HUB4 acoustic model bundled with Sphinx and a language model derived from the English Gigaword corpus.
Run the evaluation.
$ ./evaluate.py --directory reith-lectures --config sphinx-config.ini
If you want to run the evaluation using only pre-computed transcriptions, use the --lazy flag.
For example, to run the evaluation on transcriptions derived using the example configuration file:
$ ./evaluate.py --directory reith-lectures-hub4-and-lm-giga-64k-vp-3gram --lazy true
Average WER: 0.556791
The full results of the above command are available in reith-lectures-hub4-and-lm-giga-64k-vp-3gram/evaluation-results.txt
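For readers unfamiliar with the metric, word error rate (WER) is commonly defined as the word-level edit distance (substitutions, deletions and insertions) between the reference transcript and the recognised text, divided by the number of words in the reference. The following is a minimal Python 3 sketch of that standard definition. It is not necessarily how evaluate.py computes its figure, and the function name `word_error_rate` and the lowercase, whitespace-based tokenisation are assumptions made for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumption: simple lowercase, whitespace-based tokenisation.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion of a reference word
                current[j - 1] + 1,      # insertion of a hypothesis word
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion over four reference words.
    print(word_error_rate("a b c d", "a x c"))
    # Words spoken but absent from the transcript count as insertions.
    print(word_error_rate("good evening",
                          "please welcome our speaker good evening"))
```

The second call illustrates why non-verbatim transcripts inflate the score: every word of a spoken introduction that is missing from the transcript counts as an insertion error, and WER can exceed 1 when the recognised text is much longer than the reference.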