Update documentation and README
- Update docstrings for the SphinxEngine class methods and
  properties.
- Add sphinx_engine documentation page.
- Remove the Sphinx engine README.md file.
- Remove section on the Sphinx engine from the README and
  reference the new page on Read the Docs.
drmfinlay committed Sep 24, 2018
1 parent 55643f7 commit 3222075
Showing 8 changed files with 413 additions and 205 deletions.
47 changes: 7 additions & 40 deletions README.md
@@ -26,52 +26,19 @@ There is also a gitter channel:
[![Join the chat at https://gitter.im/sphinx-dragonfly](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/sphinx-dragonfly)


CMU Sphinx and Installation
Installation
----------------------------------------------------------------------------

This fork of dragonfly has an engine implementation using the open source
CMU Pocket Sphinx speech recognition engine. You can read more about the
CMU Sphinx speech recognition projects
[here](https://cmusphinx.github.io/wiki/).

This version of dragonfly should work normally with the DNS and WSR engines
and can be installed for that purpose using something like:
Dragonfly can be installed by cloning this repository and running the
following (or similar) in the root directory:

``` Shell
python setup.py install
```

To use the Pocket Sphinx engine you will need to install the
[sphinxwrapper](https://github.com/Danesprite/sphinxwrapper),
[pyjsgf](https://github.com/Danesprite/pyjsgf), and
[pyaudio](http://people.csail.mit.edu/hubert/pyaudio/) Python packages.

You can install *sphinxwrapper* and *pyjsgf* from the git submodules by
running the following commands:
``` Shell
git clone --recursive https://github.com/Danesprite/dragonfly.git
git submodule foreach python setup.py install
```

Then install dragonfly with the 'sphinx' extra using `pip`, which will
install other dependencies:
``` Shell
pip install .[sphinx]
```

Once it's installed, you'll need to copy the *sphinx_module_loader.py*
script from *dragonfly/examples* into the folder with your grammars and run
it using:
``` Shell
python sphinx_module_loader.py
```

This is the equivalent to the 'core' directory that NatLink uses to load
grammar modules.

There is more information on how the engine works, what the limitations
are, the to-do list and more
[here](dragonfly/engines/backend_sphinx/README.md).
To use the CMU Pocket Sphinx engine, see the
[relevant documentation page](http://dragonfly2.readthedocs.org/en/latest/sphinx_engine.html)
on Read the Docs.


Features
@@ -111,7 +78,7 @@ Existing command modules

The related resources page of Dragonfly's documentation has a
section on
[command modules](http://dragonfly.readthedocs.org/en/latest/related_resources.html#command-modules)
[command modules](http://dragonfly2.readthedocs.org/en/latest/related_resources.html#command-modules)
which lists various sources.


10 changes: 10 additions & 0 deletions documentation/engines.txt
@@ -6,6 +6,13 @@ Dragonfly supports multiple speech recognition engines as its backend.
The *engines* sub-package implements the interface code for each
supported engine.

There is a separate page on the CMU Pocket Sphinx engine:

.. toctree::
:titlesonly:
:maxdepth: 1

sphinx_engine

EngineBase class
----------------------------------------------------------------------------
@@ -31,6 +38,9 @@ Engine backends
.. automodule:: dragonfly.engines.backend_sapi5
:members:

.. automodule:: dragonfly.engines.backend_sphinx
:members:


Dictation container classes
----------------------------------------------------------------------------
1 change: 1 addition & 0 deletions documentation/index.txt
@@ -11,6 +11,7 @@ It currently supports the following speech recognition engines:
- *Dragon NaturallySpeaking* (DNS), a product of *Nuance*
- *Windows Speech Recognition* (WSR), included with Microsoft
Windows Vista, Windows 7, and freely available for Windows XP
- *CMU Pocket Sphinx* (with caveats)

Dragonfly's documentation is available online at
`Read the Docs <http://dragonfly2.readthedocs.org/en/latest/>`_.
292 changes: 292 additions & 0 deletions documentation/sphinx_engine.txt
@@ -0,0 +1,292 @@
.. _RefSphinxEngine:

CMU Pocket Sphinx dragonfly engine
============================================================================

This version of dragonfly contains an engine implementation using the open
source, cross-platform CMU Pocket Sphinx speech recognition engine. You can
read more about the CMU Sphinx speech recognition projects on the
`CMU Sphinx wiki`_.


Setup
----------------------------------------------------------------------------

There are three dependencies for using the Pocket Sphinx engine:

- `pyaudio <http://people.csail.mit.edu/hubert/pyaudio/>`_
- `pyjsgf <https://github.com/Danesprite/pyjsgf>`_
- `sphinxwrapper <https://github.com/Danesprite/sphinxwrapper>`_

*sphinxwrapper* must be installed manually at the moment. You can install
it from the git submodule by running the following::

git clone --recursive https://github.com/Danesprite/dragonfly.git
git submodule foreach python setup.py install

You can install the other packages using::

pip install pyjsgf pyaudio

Once the dependencies are installed, you'll need to copy the
`dragonfly/examples/sphinx_module_loader.py`_ script into the folder
with your grammar modules and run it using::

python sphinx_module_loader.py


This is the equivalent of the 'core' directory that NatLink uses to load
grammar modules.


Cross-platform Engine
----------------------------------------------------------------------------

Pocket Sphinx runs on most platforms, including on architectures other than
x86, so it only makes sense that the Pocket Sphinx dragonfly engine
implementation should work on non-Windows platforms like macOS as well as on
Linux distributions. To this end, I've made an effort to mock Windows-only
functionality for non-Windows platforms for the time being, so that the
engine components work correctly regardless of the platform.

Using dragonfly with a non-Windows operating system can already be done with
`Aenea`_ using the existing *NatLink* engine. Aenea communicates with a
separate Windows system running *NatLink* and *DNS* over a network
connection and has server support for Linux (using X11), macOS, and Windows.


Engine Configuration
----------------------------------------------------------------------------

This engine is configurable via the engine configuration module (see the
`default engine config module`_).

To change the engine configuration, create a *config.py* file in the same
directory as *sphinx_module_loader.py* and make your config changes there.

The config module should have each of the following attributes defined:

- ``DECODER_CONFIG`` -- configuration object for the Pocket Sphinx decoder.
- ``LANGUAGE`` -- user language for the engine to use (default: ``"en"``).
- ``NEXT_PART_TIMEOUT`` -- timeout in seconds for speaking the next part of
a rule involving dictation. If set to 0, there will be no timeout.
- ``PYAUDIO_STREAM_KEYWORD_ARGS`` -- keyword arguments dictionary given to
:meth:`PyAudio.open` in :meth:`recognise_forever`. Some values are
also used in :meth:`process_wave_file`. The default values
assume a 16kHz acoustic model is used.
- ``START_ASLEEP`` -- boolean value for whether the engine should start in
a sleep state (default: ``True``).
- ``TRAINING_DATA_DIR`` -- directory to store recorded utterances and
transcriptions for training (default: ``"training/"``). Relative paths
will be interpreted as relative to the module loader's directory. Set to
``None`` to disable training data recording.
- ``WAKE_PHRASE`` -- the keyphrase to listen for when in sleep mode
(default: ``"wake up"``).
- ``WAKE_PHRASE_THRESHOLD`` -- threshold value* for the wake keyphrase
(default: ``1e-20``).
- ``SLEEP_PHRASE`` -- the keyphrase to listen for to enter sleep mode
(default: ``"go to sleep"``).
- ``SLEEP_PHRASE_THRESHOLD`` -- threshold value* for the sleep keyphrase
(default: ``1e-40``).
- ``START_TRAINING_PHRASE`` -- keyphrase to listen for to start a training
session where no processing occurs
(default: ``"start training session"``).
- ``START_TRAINING_THRESHOLD`` -- threshold value* for the start training
keyphrase (default: ``1e-48``).
- ``END_TRAINING_PHRASE`` -- keyphrase to listen for to end a training
session if one is in progress
(default: ``"end training session"``).
- ``END_TRAINING_THRESHOLD`` -- threshold value* for the end training
keyphrase (default: ``1e-45``).

\* Threshold values need to be set for each keyphrase. The `CMU Sphinx LM
tutorial`_ has some advice on keyphrase threshold values.
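
As an illustration, a complete *config.py* might look something like the
following. This is only a sketch: it assumes the decoder configuration
object can be created with the *pocketsphinx* package's
``Decoder.default_config()`` (see the `default engine config module`_ for
how it is actually initialised), and the audio stream values shown simply
match the 16kHz defaults described above::

    # config.py -- example engine configuration module (illustrative
    # values only, not recommendations).
    import pyaudio
    from pocketsphinx import Decoder

    # Assumption: create the decoder configuration with the defaults
    # provided by the pocketsphinx Python package.
    DECODER_CONFIG = Decoder.default_config()

    LANGUAGE = "en"
    NEXT_PART_TIMEOUT = 0  # no timeout

    # Keyword arguments for PyAudio.open(); these values assume a 16kHz
    # acoustic model.
    PYAUDIO_STREAM_KEYWORD_ARGS = {
        "input": True,
        "format": pyaudio.paInt16,
        "channels": 1,
        "rate": 16000,
        "frames_per_buffer": 2048,
    }

    START_ASLEEP = True
    TRAINING_DATA_DIR = "training/"

    WAKE_PHRASE = "wake up"
    WAKE_PHRASE_THRESHOLD = 1e-20
    SLEEP_PHRASE = "go to sleep"
    SLEEP_PHRASE_THRESHOLD = 1e-40

    START_TRAINING_PHRASE = "start training session"
    START_TRAINING_THRESHOLD = 1e-48
    END_TRAINING_PHRASE = "end training session"
    END_TRAINING_THRESHOLD = 1e-45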


Pocket Sphinx Decoder Configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``DECODER_CONFIG`` object initialised in the engine config module can be
used to set various Pocket Sphinx decoder options. For instance, the
following line silences the decoder's log output::

DECODER_CONFIG.set_string("-logfn", os.devnull)

There does not appear to be much documentation on these options outside of
the `pocketsphinx/cmdln_macro.h`_ and `sphinxbase/fe.h`_ header files.
If this is incorrect or has changed, feel free to suggest an edit.

Probably the easiest way of seeing the available options and their default
and current values is to comment out the above line in the engine config
module and examine the decoder's log output.


Changing Models and Dictionaries
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``DECODER_CONFIG`` object can be used to configure the pronunciation
dictionary as well as the acoustic and language models. To do this, add
something like the following in the config module::

DECODER_CONFIG.set_string('-hmm', '/path/to/acoustic-model-folder')
DECODER_CONFIG.set_string('-lm', '/path/to/lm-file.lm')
DECODER_CONFIG.set_string('-dict', '/path/to/dictionary-file.dict')

The language model, acoustic model and pronunciation dictionary should all
use the same language or language variant. See the `CMU Sphinx wiki`_ for
a more detailed explanation of these components.

Engine API
----------------------------------------------------------------------------

.. autoclass:: dragonfly.engines.backend_sphinx.engine.SphinxEngine
:members:
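
As a minimal usage sketch, the engine can be retrieved and started with
dragonfly's ``get_engine`` function. This is roughly what
*sphinx_module_loader.py* does, minus loading your grammar modules::

    from dragonfly import get_engine

    # Get and initialise the Pocket Sphinx engine, then connect it.
    engine = get_engine("sphinx")
    engine.connect()

    # Recognise speech from the microphone in a blocking loop.
    engine.recognise_forever()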



Improving Speech Recognition Accuracy
----------------------------------------------------------------------------

CMU Pocket Sphinx can have some trouble accurately recognising what was
said. To remedy this, you may need to adapt the acoustic model that
Pocket Sphinx is using. This is similar to how Dragon sometimes requires
training. The CMU Sphinx `adaption tutorial`_ covers this topic. There is
also a `YouTube video on model adaption`_.

Adapting your model may not be necessary; there might be other issues with
your setup. There is more information on tuning the recognition accuracy in
the CMU Sphinx `tuning tutorial`_.

By default, the engine will record what you say into wave and transcription
files compatible with the Sphinx acoustic model adaption process. The files
are placed in the directory specified by the engine's ``TRAINING_DATA_DIR``
configuration option.

There are built-in key phrases for starting and ending training sessions
where no grammar rule processing will occur; key phrases themselves will
still be processed. See the ``START_TRAINING_PHRASE`` and
``END_TRAINING_PHRASE`` engine configuration options. One use case for the
training mode is training commands that are dangerous or that take a long
time to execute, perhaps commands that keep getting falsely recognised and
need more training.

To use the training files, you will need to correct any incorrect phrases
in the *training.transcription* file and then use the
`SphinxTrainingHelper`_ bash script to adapt your model. This script makes
the process considerably easier, although you may still encounter problems.
You should be able to play the wave files using most media players (e.g.
VLC, Windows Media Player, aplay) if you need to.

After a successful adaption, you will want to remove the wave and
transcription files, either while the engine is not running or with
``TRAINING_DATA_DIR`` set to ``None``.


Limitations
----------------------------------------------------------------------------

This engine has a few limitations, most notably with the spoken language
support and dragonfly's :class:`Dictation` functionality. That said, most of
the grammar functionality will work perfectly.


Dictation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Unfortunately, the 'Dictation' support that DNS and WSR provide is difficult
to reproduce with the CMU Sphinx engines, which don't support speaking
grammar rules that include :class:`Dictation` elements. Such rules will
still work with this engine; you'll just have to pause between speaking the
grammar and dictation parts of rules that use :class:`Dictation` extras.
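
For example, a command rule using a :class:`Dictation` extra might look
like the following minimal sketch (the rule, grammar and spoken words are
illustrative, not part of the engine)::

    from dragonfly import Grammar, MappingRule, Dictation, Text

    class NoteRule(MappingRule):
        mapping = {
            # With this engine, pause briefly between saying "note" and
            # the free dictation that follows it.
            "note <text>": Text("%(text)s"),
        }
        extras = [Dictation("text")]

    grammar = Grammar("notes")
    grammar.add_rule(NoteRule())
    grammar.load()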

For those interested, this is done by segmenting rules and using a Pocket
Sphinx language model search to recognise the dictation parts.

There is a timeout period for the next parts of such rules. If the timeout
is reached, the engine will process any other matched rules or fail to
recognise altogether. The timeout period is set by the
``NEXT_PART_TIMEOUT`` engine configuration option. If set to 0, the engine
will wait until speech starts again and process the next part if it is
spoken.

'Dictation' output also won't have words properly capitalised as they
are when using DNS; all words will be in lowercase. Additionally,
punctuation words like "comma" or "apostrophe" won't have special output,
although such functionality can be added either through grammars or through
processing of the dictation output.

Unknown words
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

CMU Pocket Sphinx uses pronunciation dictionaries to look up phonetic
representations for words in grammars, language models and key phrases in
order to recognise them. If you use words in your grammars and/or key
phrases that are *not* in the dictionary, a message similar to the
following will be printed:

*grammar 'name' used words not found in the pronunciation dictionary:
notaword*

If you get a message like this, try changing the words in your grammars/key
phrases, either by splitting up the words or by using similar words,
e.g. changing "natlink" to "nat link".

I hope to eventually have words and phoneme strings dynamically added to the
current dictionary and language model using the Pocket Sphinx `ps_add_word`_
function (from Python of course).

Spoken Language Support
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

There are only a handful of languages with models and dictionaries
`available from SourceForge <https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/>`_,
although it is possible to build your own language model `using lmtool
<http://www.speech.cs.cmu.edu/tools/lmtool-new.html>`_ or
pronunciation dictionary `using lextool
<http://www.speech.cs.cmu.edu/tools/lextool.html>`_.
There is also a CMU Sphinx tutorial on `building language models
<https://cmusphinx.github.io/wiki/tutoriallm/>`_.


Dragonfly Lists and DictLists
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Dragonfly :class:`Lists` and :class:`DictLists` function as normal; the
Pocket Sphinx engine implements them as private rules. On updating a
dragonfly list or dictionary, the grammar it belongs to will be reloaded,
because there is unfortunately no JSGF equivalent for lists.
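
As a brief sketch of how this looks in practice (the rule, list name and
words are illustrative), a rule can reference a :class:`List` that is
updated at runtime::

    from dragonfly import Grammar, MappingRule, List, ListRef, Function

    fruit_list = List("fruit")
    fruit_list.extend(["apple", "banana"])

    def eat_fruit(fruit):
        print("Eating %s." % fruit)

    class FruitRule(MappingRule):
        mapping = {"eat <fruit>": Function(eat_fruit)}
        extras = [ListRef("fruit", fruit_list)]

    grammar = Grammar("fruits")
    grammar.add_rule(FruitRule())
    grammar.load()

    # Modifying the list later will cause this engine to reload the
    # grammar, since there is no JSGF equivalent for lists.
    fruit_list.append("cherry")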


Text-to-speech
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This isn't so much a limitation of CMU Pocket Sphinx as a non-goal of the
project. However, as the natlink and WSR engines both support
text-to-speech, there might as well be some suggestions here in case this
functionality is desired, perhaps utilised by a custom dragonfly action.

The Jasper project contains `a number of Python interface classes
<https://github.com/jasperproject/jasper-client/blob/master/client/tts.py>`_
to popular open source text-to-speech software such as `eSpeak`_,
`Festival`_ and `CMU Flite`_.
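
For instance, here is a minimal sketch of a custom dragonfly action that
wraps the eSpeak command-line program, assuming ``espeak`` is installed
and on your path (the class and its behaviour are illustrative, not part
of dragonfly)::

    import subprocess

    from dragonfly import ActionBase

    class Speak(ActionBase):
        """Custom action that speaks text using the espeak program."""

        def __init__(self, text):
            ActionBase.__init__(self)
            self._text = text

        def _execute(self, data=None):
            # Substitute any extras values into the text, then have
            # espeak speak it.
            subprocess.call(["espeak", self._text % (data or {})])
            return True

    # Usage: Speak("hello world").execute()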


.. Links.
.. _Aenea: https://github.com/dictation-toolbox/aenea
.. _CMU Flite: http://www.festvox.org/flite/
.. _CMU Pocket Sphinx speech recognition engine: https://github.com/cmusphinx/pocketsphinx/
.. _CMU Sphinx LM tutorial: https://cmusphinx.github.io/wiki/tutoriallm/#keyword-lists
.. _CMU Sphinx wiki: https://cmusphinx.github.io/wiki/
.. _Festival: http://www.cstr.ed.ac.uk/projects/festival/
.. _SphinxTrainingHelper: https://github.com/ExpandingDev/SphinxTrainingHelper
.. _YouTube video on model adaption: https://www.youtube.com/watch?v=IAHH6-t9jK0
.. _adaption tutorial: https://cmusphinx.github.io/wiki/tutorialadapt/
.. _default engine config module: https://github.com/Danesprite/dragonfly/blob/master/dragonfly/engines/backend_sphinx/config.py
.. _dragonfly/examples/sphinx_module_loader.py: https://github.com/Danesprite/dragonfly/blob/master/dragonfly/examples/sphinx_module_loader.py
.. _eSpeak: http://espeak.sourceforge.net/
.. _pocketsphinx/cmdln_macro.h: https://github.com/cmusphinx/pocketsphinx/blob/master/include/cmdln_macro.h
.. _ps_add_word: https://cmusphinx.github.io/doc/pocketsphinx/pocketsphinx_8h.html#a5f3c4fcdbef34915c4e785ac9a1c6005
.. _sphinxbase/fe.h: https://github.com/cmusphinx/sphinxbase/blob/master/include/sphinxbase/fe.h
.. _tuning tutorial: https://cmusphinx.github.io/wiki/tutorialtuning/
