An out of the box speach to text recognizer using Vosk speech recognition toolkit. It works offline, does not rely on external services and supports multiple languages.
Use the package manager pip to install the below dependencies.
See also recommended version information on the Vosk website.
pip3 install vosk
pip3 install sounddevice
# maybe these are also needed
sudo apt-get install libportaudio2
sudo apt-get install libasound-dev
# run in your ros2_ws/src folder
git clone https://gitlab.com/bob-ros2/voskros.git
cd ..
colcon build
. install/setup.bash
# get device list
ros2 run voskros vosk -l
# run node with default parameter
ros2 run voskros vosk
# run via launch file
ros2 launch voskros voskros.launch.yaml model:=fr
# run via launch and overide result output topic
ros2 launch voskros voskros.launch.yaml result:=/tts_topic
During first startup Vosk downloads a small sized default model around 50MB training data. It works quite good on small sized devices. Other custom models can be placed in the cache folder.
# list existing models
ls ~/.cache/vosk
Find further models and languages here: https://alphacephei.com/vosk/models
~device (string, default: "")
Device to use as input microphone. Leave it empty to use the default input.
~model (string, default: "en")
Model to be used.
~samplerate (int, default: 0)
Sample rate to use.
~result (std_msgs/String)
Detected result text.
~partial (std_msgs/String)
Detected partial text.
~set_grammar (srv/SetGrammar)
Set Vosk grammar list to only allow certain words. See also srv/SetGrammar.srv for details.
This very simple ROS Node can be used to receive an input command (e.g. from Vosk). If this command matches with one of the rules in the config file the according command will be executed.
# start Vosk if not yet running
# in order to work a microphone must be connected or another input device must be configured
ros2 launch voskros voskros.launch.yaml
# if needed reduce the grammar according to the config in prompter.yaml
# setting the grammar greatly improves the STT detection
ros2 run voskros set_grammar.sh <path>/prompter.yaml /stt
# start prompter node and wait for commands to be executed
ros2 run voskros prompter --ros-args \
-r input:=/stt/result \
-r __ns:=/stt \
-p yaml:=<path>/prompter.yaml
~yaml (string, default: "")
Path to a YAML file containing the prompter configuration.
See config directory for an example.
~cwd (string, default: "")
Working directory where to start programms from the configuration.
~input (std_msgs/String)
Input topic for incommning commands to identify.
~output (std_msgs/String)
Output topic containing stdout produced by the called program.
$ ros2 run voskros set_grammar.sh
Usage: set_grammar.sh <prompter config> [<namespace>]
Produces a JSON Grammar array for vosk from a prompter config yaml.
It calls afterwards the vosk node service set_grammar if a namespace is provided.
The Grammar can also be configured using a vosk node parameter.