Update documentation and README
- Update docstrings for the SphinxEngine class methods and
  properties.
- Add sphinx_engine documentation page.
- Remove the Sphinx engine README.md file.
- Remove section on the Sphinx engine from the README and
  reference the new page on Read the Docs.
drmfinlay committed Sep 24, 2018
1 parent 55643f7 commit 3222075
Showing 8 changed files with 413 additions and 205 deletions.
47 changes: 7 additions & 40 deletions README.md
@@ -26,52 +26,19 @@ There is also a gitter channel:
[![Join the chat at https://gitter.im/sphinx-dragonfly](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/sphinx-dragonfly)


CMU Sphinx and Installation
Installation
----------------------------------------------------------------------------

This fork of dragonfly has an engine implementation using the open source
CMU Pocket Sphinx speech recognition engine. You can read more about the
CMU Sphinx speech recognition projects
[here](https://cmusphinx.github.io/wiki/).

This version of dragonfly should work normally with the DNS and WSR engines
and can be installed for that purpose using something like:
Dragonfly can be installed by cloning this repository and running the
following (or similar) in the root directory:

``` Shell
python setup.py install
```

To use the Pocket Sphinx engine you will need to install the
[sphinxwrapper](https://github.com/Danesprite/sphinxwrapper),
[pyjsgf](https://github.com/Danesprite/pyjsgf), and
[pyaudio](http://people.csail.mit.edu/hubert/pyaudio/) Python packages.

You can install *sphinxwrapper* and *pyjsgf* from the git submodules by
running the following commands:
``` Shell
git clone --recursive https://github.com/Danesprite/dragonfly.git
git submodule foreach python setup.py install
```

Then install dragonfly with the 'sphinx' extra using `pip`, which will
install other dependencies:
``` Shell
pip install .[sphinx]
```

Once it's installed, you'll need to copy the *sphinx_module_loader.py*
script from *dragonfly/examples* into the folder with your grammars and run
it using:
``` Shell
python sphinx_module_loader.py
```

This is the equivalent to the 'core' directory that NatLink uses to load
grammar modules.

There is more information on how the engine works, what the limitations
are, the to-do list and more
[here](dragonfly/engines/backend_sphinx/README.md).
To use the CMU Pocket Sphinx engine, see the
[relevant documentation page](http://dragonfly2.readthedocs.org/en/latest/sphinx_engine.html)
on Read the Docs.


Features
@@ -111,7 +78,7 @@ Existing command modules

The related resources page of Dragonfly's documentation has a
section on
[command modules](http://dragonfly.readthedocs.org/en/latest/related_resources.html#command-modules)
[command modules](http://dragonfly2.readthedocs.org/en/latest/related_resources.html#command-modules)
which lists various sources.


10 changes: 10 additions & 0 deletions documentation/engines.txt
@@ -6,6 +6,13 @@ Dragonfly supports multiple speech recognition engines as its backend.
The *engines* sub-package implements the interface code for each
supported engine.

There is a separate page on the CMU Pocket Sphinx engine:

.. toctree::
:titlesonly:
:maxdepth: 1

sphinx_engine

EngineBase class
----------------------------------------------------------------------------
@@ -31,6 +38,9 @@ Engine backends
.. automodule:: dragonfly.engines.backend_sapi5
:members:

.. automodule:: dragonfly.engines.backend_sphinx
:members:


Dictation container classes
----------------------------------------------------------------------------
1 change: 1 addition & 0 deletions documentation/index.txt
@@ -11,6 +11,7 @@ It currently supports the following speech recognition engines:
- *Dragon NaturallySpeaking* (DNS), a product of *Nuance*
- *Windows Speech Recognition* (WSR), included with Microsoft
Windows Vista, Windows 7, and freely available for Windows XP
- *CMU Pocket Sphinx* (with caveats)

Dragonfly's documentation is available online at
`Read the Docs <http://dragonfly2.readthedocs.org/en/latest/>`_.
292 changes: 292 additions & 0 deletions documentation/sphinx_engine.txt
@@ -0,0 +1,292 @@
.. _RefSphinxEngine:

CMU Pocket Sphinx dragonfly engine
============================================================================

This version of dragonfly contains an engine implementation using the open
source, cross-platform CMU Pocket Sphinx speech recognition engine. You can
read more about the CMU Sphinx speech recognition projects on the
`CMU Sphinx wiki`_.


Setup
----------------------------------------------------------------------------

There are three dependencies for using the Pocket Sphinx engine:

- `pyaudio <http://people.csail.mit.edu/hubert/pyaudio/>`_
- `pyjsgf <https://github.com/Danesprite/pyjsgf>`_
- `sphinxwrapper <https://github.com/Danesprite/sphinxwrapper>`_

*sphinxwrapper* must be installed manually at the moment. You can install
it from the git submodule by running the following::

git clone --recursive https://github.com/Danesprite/dragonfly.git
git submodule foreach python setup.py install

You can install the other packages using::

pip install pyjsgf pyaudio

Once the dependencies are installed, you'll need to copy the
`dragonfly/examples/sphinx_module_loader.py`_ script into the folder
with your grammar modules and run it using::

python sphinx_module_loader.py


This is the equivalent of the 'core' directory that NatLink uses to load
grammar modules.


Cross-platform Engine
----------------------------------------------------------------------------

Pocket Sphinx runs on most platforms, including on architectures other than
x86, so it only makes sense that the Pocket Sphinx dragonfly engine
implementation should work on non-Windows platforms like macOS as well as on
Linux distributions. To this end, I've made an effort to mock Windows-only
functionality for non-Windows platforms for the time being, so that the
engine components work correctly regardless of the platform.

Using dragonfly with a non-Windows operating system can already be done with
`Aenea`_ using the existing *NatLink* engine. Aenea communicates with a
separate Windows system running *NatLink* and *DNS* over a network
connection and has server support for Linux (using X11), macOS, and Windows.


Engine Configuration
----------------------------------------------------------------------------

This engine is configurable via the engine configuration module (see the
`default engine config module`_).

To change the engine configuration, create a *config.py* file in the same
directory as *sphinx_module_loader.py* and make your config changes there.

The config module should have each of the following attributes defined:

- ``DECODER_CONFIG`` -- configuration object for the Pocket Sphinx decoder.
- ``LANGUAGE`` -- user language for the engine to use (default: ``"en"``).
- ``NEXT_PART_TIMEOUT`` -- timeout in seconds for speaking the next part of
a rule involving dictation. If set to 0, there will be no timeout.
- ``PYAUDIO_STREAM_KEYWORD_ARGS`` -- keyword arguments dictionary given to
:meth:`PyAudio.open` in :meth:`recognise_forever`. Some values are
also used in :meth:`process_wave_file`. The default values
assume a 16kHz acoustic model is used.
- ``START_ASLEEP`` -- boolean value for whether the engine should start in
a sleep state (default: ``True``).
- ``TRAINING_DATA_DIR`` -- directory to store recorded utterances and
transcriptions for training (default: ``"training/"``). Relative paths
will be interpreted as relative to the module loader's directory. Set to
``None`` to disable training data recording.
- ``WAKE_PHRASE`` -- the keyphrase to listen for when in sleep mode
(default: ``"wake up"``).
- ``WAKE_PHRASE_THRESHOLD`` -- threshold value* for the wake keyphrase
(default: ``1e-20``).
- ``SLEEP_PHRASE`` -- the keyphrase to listen for to enter sleep mode
(default: ``"go to sleep"``).
- ``SLEEP_PHRASE_THRESHOLD`` -- threshold value* for the sleep keyphrase
(default: ``1e-40``).
- ``START_TRAINING_PHRASE`` -- keyphrase to listen for to start a training
session where no processing occurs
(default: ``"start training session"``).
- ``START_TRAINING_THRESHOLD`` -- threshold value* for the start training
keyphrase (default: ``1e-48``).
- ``END_TRAINING_PHRASE`` -- keyphrase to listen for to end a training
session if one is in progress
(default: ``"end training session"``).
- ``END_TRAINING_THRESHOLD`` -- threshold value* for the end training
keyphrase (default: ``1e-45``).

\* Threshold values need to be set for each keyphrase. The `CMU Sphinx LM
tutorial`_ has some advice on keyphrase threshold values.
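
As an illustration, a complete *config.py* might look something like the
following. This is only a sketch: it assumes the decoder configuration
object can be created with the *pocketsphinx* package's
``Decoder.default_config()`` (see the `default engine config module`_ for
how it is actually initialised), and the audio stream values shown simply
match the 16kHz defaults described above::

    # config.py -- example engine configuration module (illustrative
    # values only, not recommendations).
    import pyaudio
    from pocketsphinx import Decoder

    # Assumption: create the decoder configuration with the defaults
    # provided by the pocketsphinx Python package.
    DECODER_CONFIG = Decoder.default_config()

    LANGUAGE = "en"
    NEXT_PART_TIMEOUT = 0  # no timeout

    # Keyword arguments for PyAudio.open(); these values assume a 16kHz
    # acoustic model.
    PYAUDIO_STREAM_KEYWORD_ARGS = {
        "input": True,
        "format": pyaudio.paInt16,
        "channels": 1,
        "rate": 16000,
        "frames_per_buffer": 2048,
    }

    START_ASLEEP = True
    TRAINING_DATA_DIR = "training/"

    WAKE_PHRASE = "wake up"
    WAKE_PHRASE_THRESHOLD = 1e-20
    SLEEP_PHRASE = "go to sleep"
    SLEEP_PHRASE_THRESHOLD = 1e-40

    START_TRAINING_PHRASE = "start training session"
    START_TRAINING_THRESHOLD = 1e-48
    END_TRAINING_PHRASE = "end training session"
    END_TRAINING_THRESHOLD = 1e-45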


Pocket Sphinx Decoder Configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``DECODER_CONFIG`` object initialised in the engine config module can be
used to set various Pocket Sphinx decoder options. For instance, the
following line silences the decoder's log output::

DECODER_CONFIG.set_string("-logfn", os.devnull)

There does not appear to be much documentation on these options outside of
the `pocketsphinx/cmdln_macro.h`_ and `sphinxbase/fe.h`_ header files.
If this is incorrect or has changed, feel free to suggest an edit.

Probably the easiest way of seeing the available options and their default
and current values is to comment out the above line in the engine config
module and examine the decoder's log output.


Changing Models and Dictionaries
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``DECODER_CONFIG`` object can be used to configure the pronunciation
dictionary as well as the acoustic and language models. To do this, add
something like the following in the config module::

DECODER_CONFIG.set_string('-hmm', '/path/to/acoustic-model-folder')
DECODER_CONFIG.set_string('-lm', '/path/to/lm-file.lm')
DECODER_CONFIG.set_string('-dict', '/path/to/dictionary-file.dict')

The language model, acoustic model and pronunciation dictionary should all
use the same language or language variant. See the `CMU Sphinx wiki`_ for
a more detailed explanation of these components.

Engine API
----------------------------------------------------------------------------

.. autoclass:: dragonfly.engines.backend_sphinx.engine.SphinxEngine
:members:
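
As a minimal usage sketch, the engine can be retrieved and started with
dragonfly's ``get_engine`` function. This is roughly what
*sphinx_module_loader.py* does, minus loading your grammar modules::

    from dragonfly import get_engine

    # Get and initialise the Pocket Sphinx engine, then connect it.
    engine = get_engine("sphinx")
    engine.connect()

    # Recognise speech from the microphone in a blocking loop.
    engine.recognise_forever()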



Improving Speech Recognition Accuracy
----------------------------------------------------------------------------

CMU Pocket Sphinx can have some trouble accurately recognising what was
said. To remedy this, you may need to adapt the acoustic model that
Pocket Sphinx is using. This is similar to how Dragon sometimes requires
training. The CMU Sphinx `adaption tutorial`_ covers this topic. There is
also a `YouTube video on model adaption`_.

Adapting your model may not be necessary; there might be other issues with
your setup. There is more information on tuning the recognition accuracy in
the CMU Sphinx `tuning tutorial`_.

By default, the engine will record what you say into wave and transcription
files compatible with the Sphinx acoustic model adaption process. The files
are placed in the directory specified by the engine's ``TRAINING_DATA_DIR``
configuration option.

There are built-in key phrases for starting and ending training sessions
where no grammar rule processing will occur; key phrases themselves will
still be processed. See the ``START_TRAINING_PHRASE`` and
``END_TRAINING_PHRASE`` engine configuration options. One use case for the
training mode is training commands that are dangerous or that take a long
time to execute, perhaps commands that keep getting falsely recognised and
need more training.

To use the training files, you will need to correct any incorrect phrases
in the *training.transcription* file and then use the
`SphinxTrainingHelper`_ bash script to adapt your model. This script makes
the process considerably easier, although you may still encounter problems.
You should be able to play the wave files using most media players (e.g.
VLC, Windows Media Player, aplay) if you need to.

After a successful adaption, you will want to remove the wave and
transcription files, either while the engine is not running or with
``TRAINING_DATA_DIR`` set to ``None``.


Limitations
----------------------------------------------------------------------------

This engine has a few limitations, most notably with the spoken language
support and dragonfly's :class:`Dictation` functionality. That said, most of
the grammar functionality will work perfectly.


Dictation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Unfortunately, the 'Dictation' support that DNS and WSR provide is difficult
to reproduce with the CMU Sphinx engines, which don't support speaking
grammar rules that include :class:`Dictation` elements. Such rules will
still work with this engine; you'll just have to pause between speaking the
grammar and dictation parts of rules that use :class:`Dictation` extras.
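
For example, a command rule using a :class:`Dictation` extra might look
like the following minimal sketch (the rule, grammar and spoken words are
illustrative, not part of the engine)::

    from dragonfly import Grammar, MappingRule, Dictation, Text

    class NoteRule(MappingRule):
        mapping = {
            # With this engine, pause briefly between saying "note" and
            # the free dictation that follows it.
            "note <text>": Text("%(text)s"),
        }
        extras = [Dictation("text")]

    grammar = Grammar("notes")
    grammar.add_rule(NoteRule())
    grammar.load()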

For those interested, this is done by segmenting rules and using a Pocket
Sphinx language model search to recognise the dictation parts.

There is a timeout period for the next parts of such rules. If the timeout
is reached, the engine will process any other matched rules or fail to
recognise altogether. The timeout period is set by the
``NEXT_PART_TIMEOUT`` engine configuration option. If set to 0, the engine
will wait until speech starts again and process the next part if it is
spoken.

'Dictation' output also won't have words properly capitalised as they
are when using DNS; all words will be in lowercase. Additionally,
punctuation words like "comma" or "apostrophe" won't have special output,
although such functionality can be added either through grammars or through
processing of the dictation output.

Unknown words
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

CMU Pocket Sphinx uses pronunciation dictionaries to look up phonetic
representations for words in grammars, language models and key phrases in
order to recognise them. If you use words in your grammars and/or key
phrases that are *not* in the dictionary, a message similar to the
following will be printed:

*grammar 'name' used words not found in the pronunciation dictionary:
notaword*

If you get a message like this, try changing the words in your grammars/key
phrases, either by splitting up the words or by using similar words,
e.g. changing "natlink" to "nat link".

I hope to eventually have words and phoneme strings dynamically added to the
current dictionary and language model using the Pocket Sphinx `ps_add_word`_
function (from Python of course).

Spoken Language Support
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are only a handful of languages with models and dictionaries
`available from SourceForge <https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/>`_,
although it is possible to build your own language model `using lmtool
<http://www.speech.cs.cmu.edu/tools/lmtool-new.html>`_ or
pronunciation dictionary `using lextool
<http://www.speech.cs.cmu.edu/tools/lextool.html>`_.
There is also a CMU Sphinx tutorial on `building language models
<https://cmusphinx.github.io/wiki/tutoriallm/>`_.


Dragonfly Lists and DictLists
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Dragonfly :class:`Lists` and :class:`DictLists` function as normal; the
Pocket Sphinx engine implements them as private rules. On updating a
dragonfly list or dictionary, the grammar it belongs to will be reloaded,
because there is unfortunately no JSGF equivalent for lists.
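
As a brief sketch of how this looks in practice (the rule, list name and
words are illustrative), a rule can reference a :class:`List` that is
updated at runtime::

    from dragonfly import Grammar, MappingRule, List, ListRef, Function

    fruit_list = List("fruit")
    fruit_list.extend(["apple", "banana"])

    def eat_fruit(fruit):
        print("Eating %s." % fruit)

    class FruitRule(MappingRule):
        mapping = {"eat <fruit>": Function(eat_fruit)}
        extras = [ListRef("fruit", fruit_list)]

    grammar = Grammar("fruits")
    grammar.add_rule(FruitRule())
    grammar.load()

    # Modifying the list later will cause this engine to reload the
    # grammar, since there is no JSGF equivalent for lists.
    fruit_list.append("cherry")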


Text-to-speech
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This isn't so much a limitation of CMU Pocket Sphinx as a non-goal of the
project. However, as the natlink and WSR engines both support
text-to-speech, there might as well be some suggestions here in case this
functionality is desired, perhaps utilised by a custom dragonfly action.

The Jasper project contains `a number of Python interface classes
<https://github.com/jasperproject/jasper-client/blob/master/client/tts.py>`_
to popular open source text-to-speech software such as `eSpeak`_,
`Festival`_ and `CMU Flite`_.
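
For instance, here is a minimal sketch of a custom dragonfly action that
wraps the eSpeak command-line program, assuming ``espeak`` is installed
and on your path (the class and its behaviour are illustrative, not part
of dragonfly)::

    import subprocess

    from dragonfly import ActionBase

    class Speak(ActionBase):
        """Custom action that speaks text using the espeak program."""

        def __init__(self, text):
            ActionBase.__init__(self)
            self._text = text

        def _execute(self, data=None):
            # Substitute any extras values into the text, then have
            # espeak speak it.
            subprocess.call(["espeak", self._text % (data or {})])
            return True

    # Usage: Speak("hello world").execute()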


.. Links.
.. _Aenea: https://github.com/dictation-toolbox/aenea
.. _CMU Flite: http://www.festvox.org/flite/
.. _CMU Pocket Sphinx speech recognition engine: https://github.com/cmusphinx/pocketsphinx/
.. _CMU Sphinx LM tutorial: https://cmusphinx.github.io/wiki/tutoriallm/#keyword-lists
.. _CMU Sphinx wiki: https://cmusphinx.github.io/wiki/
.. _Festival: http://www.cstr.ed.ac.uk/projects/festival/
.. _SphinxTrainingHelper: https://github.com/ExpandingDev/SphinxTrainingHelper
.. _YouTube video on model adaption: https://www.youtube.com/watch?v=IAHH6-t9jK0
.. _adaption tutorial: https://cmusphinx.github.io/wiki/tutorialadapt/
.. _default engine config module: https://github.com/Danesprite/dragonfly/blob/master/dragonfly/engines/backend_sphinx/config.py
.. _dragonfly/examples/sphinx_module_loader.py: https://github.com/Danesprite/dragonfly/blob/master/dragonfly/examples/sphinx_module_loader.py
.. _eSpeak: http://espeak.sourceforge.net/
.. _pocketsphinx/cmdln_macro.h: https://github.com/cmusphinx/pocketsphinx/blob/master/include/cmdln_macro.h
.. _ps_add_word: https://cmusphinx.github.io/doc/pocketsphinx/pocketsphinx_8h.html#a5f3c4fcdbef34915c4e785ac9a1c6005
.. _sphinxbase/fe.h: https://github.com/cmusphinx/sphinxbase/blob/master/include/sphinxbase/fe.h
.. _tuning tutorial: https://cmusphinx.github.io/wiki/tutorialtuning/
