
Third party:SpeechRecognition:Models:create

Bertrand Benoit edited this page Dec 7, 2019 · 2 revisions



SphinxTrain

You can use SphinxTrain provided by CMU Sphinx. See SphinxTrain documentation.

Hemera Speech Recognition Tool

The Hemera project provides a small speech recognition tool for creating lexical and language models. It currently supports only French, but contributions adding support for other languages are welcome.

Get it

You can get it from source code.

Third-party tools

Tools

The following tools are required: SRILM, LIA_PHON and Sphinx3 (see Installation below).

You must install them (or create symbolic links to them) in [HEMERA_TP_PATH](/Appendix#HEMERA_TP_PATH)/_fromSource, a directory created to help you keep track of the third-party tools you have installed for Hemera.

The createModels.sh script checks that these tools are available.
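The kind of availability check described above can be sketched as follows. This is a minimal illustration, not the actual logic of createModels.sh: the function name `check_tools` is hypothetical, and the `srilm` directory name in the usage comment is an assumption (only `lia_phon` appears on this page).

```shell
# Hypothetical helper: report which of the given paths are missing.
# Prints one "missing: <path>" line per absent path and returns
# non-zero if at least one path does not exist.
check_tools() {
    local status=0 path
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            status=1
        fi
    done
    return "$status"
}

# Example usage (directory names are assumptions):
# check_tools "$HEMERA_TP_PATH/_fromSource/srilm" \
#             "$HEMERA_TP_PATH/_fromSource/lia_phon"
```

Running such a check before launching the script makes it easier to spot a missing installation or a broken symbolic link early.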

Requirements

To begin, you need to prepare your [computer for compiling source code](/Third-party:Prepare_to_compile_Source#Needed_packages).

Installation

SRILM

  • download version 1.5.11 from http://www.speech.sri.com/projects/srilm/download.html
  • uncompress it in [HEMERA_TP_PATH](/Appendix#HEMERA_TP_PATH)/_fromSource
  • follow the INSTALL file (particularly the part about MACHINE_TYPE)
  • for the 32-bit version, run:

make -s SRILM=$PWD World

  • for the 64-bit version, run:

make -s SRILM=$PWD MACHINE_TYPE=i686-m64 World
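The choice between the two commands above can be sketched as a small helper that maps an architecture name to the matching build command. The function name `srilm_make_command` is hypothetical, and the mapping covers only the two cases listed on this page; for any other machine, the MACHINE_TYPE value has to come from SRILM's INSTALL file.

```shell
# Sketch: print the SRILM build command for a given architecture.
# Argument: architecture name (defaults to the output of `uname -m`).
# The command is printed, not executed, so it can be reviewed first.
srilm_make_command() {
    local arch="${1:-$(uname -m)}"
    case "$arch" in
        i?86)
            # 32-bit build: no MACHINE_TYPE override
            echo 'make -s SRILM=$PWD World'
            ;;
        x86_64)
            # 64-bit build: MACHINE_TYPE as given on this page
            echo 'make -s SRILM=$PWD MACHINE_TYPE=i686-m64 World'
            ;;
        *)
            echo "unsupported architecture: $arch (see SRILM INSTALL file)" >&2
            return 1
            ;;
    esac
}
```

The printed command is meant to be run from the uncompressed SRILM directory, since it relies on `$PWD`.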

LIA_PHON

WARNING: this tool does not support the x86_64 architecture; it must be compiled as ix86 (32-bit) even on an x86_64 OS.

If this is your case, you need [additional packages](/Third-party:Prepare_to_compile_Source#Additional_packages_for_compiling_i686_version_on_x86_64_Operating_System). Then use the provided patch to update the Makefile, forcing 32-bit compilation:

patch -N -p1 -s [HEMERA_TP_PATH](/Appendix#HEMERA_TP_PATH)/_fromSource/lia_phon/Makefile < misc/lia_phon_32bits_compile.patch

  • run the following commands (they build the tools, the resources, and the 80k lexicon):

cd [HEMERA_TP_PATH](/Appendix#HEMERA_TP_PATH)/_fromSource/lia_phon

make -s LIA_PHON_REP=$PWD all ressource lex80k
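The order of the LIA_PHON steps above can be summarized with a dry-run sketch that only prints the commands. The function name `lia_phon_steps` is hypothetical; its first argument stands for the value of HEMERA_TP_PATH, and the patch step is printed only for an x86_64 machine, following the warning above.

```shell
# Sketch: print, in order, the commands used to build LIA_PHON.
# Arguments: <tp_path> [arch]
#   tp_path  stands for the value of HEMERA_TP_PATH
#   arch     architecture name (defaults to the output of `uname -m`)
# Nothing is executed; the commands are only listed.
lia_phon_steps() {
    local tp_path="$1"
    local arch="${2:-$(uname -m)}"
    local src="$tp_path/_fromSource/lia_phon"
    if [ "$arch" = "x86_64" ]; then
        # 64-bit OS: the Makefile must first be patched for 32-bit compilation
        echo "patch -N -p1 -s $src/Makefile < misc/lia_phon_32bits_compile.patch"
    fi
    echo "cd $src"
    echo 'make -s LIA_PHON_REP=$PWD all ressource lex80k'
}

# Example usage:
# lia_phon_steps "$HEMERA_TP_PATH"
```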

Sphinx3

  • follow the [install instructions](/Third-party:SpeechRecognition#cmusphinx3_installation)

Instructions

Create your own corpus by editing the following file to fit your needs:

data/hemeraTranscript.txt

Then, launch the script

./createModels.sh

You can use the --copy option to automatically copy the created models to the corresponding directory of [HEMERA_TP_PATH](/Appendix#HEMERA_TP_PATH).

If a tool is unavailable or an error occurs, a message is printed on standard output. Otherwise, the lexical and language models are created under the data/ sub-directory.
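Since messages go to standard output, it can be convenient to keep a copy of the run for later inspection. The sketch below is one possible way to do that; the function name `run_and_log` and the log file name are hypothetical and not part of Hemera.

```shell
# Sketch: run a command, display its standard output and
# keep a copy of that output in a log file.
# Arguments: <log_file> <command> [args...]
run_and_log() {
    local log="$1"
    shift
    "$@" | tee "$log"
}

# Example usage (log file name is an assumption):
# run_and_log createModels.log ./createModels.sh --copy
# ls data/    # inspect the generated models
```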

Category:HemeraBook/en Category:advanced
