date | title |
---|---|
2021-09-09 | Voice Subsystem README |
This readme contains:
- Voice Subsystem description
- Instructions to set up:
  - pocketsphinx dependencies
  - pocketsphinx-python
  - python audio dependencies
- How to run voice recognition node demo
The voice recognition node is the interface between the user and the robot. This node takes raw audio from the system microphone and processes it for recognized voice commands. The NLP model actively filters background noise to improve voice recognition. The keyword trigger phrase "Hey Coborg" serves as a barrier to prevent unwanted command recognition. The voice_recog node also serves as the interface between the ROS framework and the system audio output: other nodes publish to a topic, and the audio then plays through voice_recog.
- /speaker (std_msgs::String): Name of an .mp3 file in Coborg-Platform/catkin_ws/src/voice_recog/src/Sounds/
  - EX: msg.data = "jeez.mp3" plays the jeez.mp3 file out of the system speakers.
- /voice_cmd (std_msgs::Int8): Integer code for the recognized voice command
  - 0 - RESTART: Powers up the robot arm, triggered by "Hey Coborg ... Start up"
  - 1 - TARGET: Move to the identified target, triggered by "Hey Coborg ... Go Here"
  - 2 - HOME: Return to the Home (Compact) position, triggered by "Hey Coborg ... Come Back"
  - 3 - READY: Get into the Ready position in front of the user, triggered by "Hey Coborg ... Get Ready"
  - 4 - CELEBRATE: Plays some music, triggered by "Hey Coborg ... Successful Fall Validation Demonstration"
  - 9 - STOP: Kills power to the robot arm, triggered by "Stop Stop Stop"
- /feedback_voice (std_msgs::Int32): Integer code for the current state of the voice node
  - 10 - IDLE: Waiting for the voice command trigger "Hey Coborg"
  - 11 - INIT: The voice node is setting up
  - 12 - PROCESSING: Heard "Hey Coborg", processing the following audio for recognized commands
  - 13 - COMPLETED: Command recognition completed (either successfully recognized or no command interpreted)
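The codes above can be collected in one place so that other nodes do not hard-code magic numbers. The following is a minimal sketch, not the voice_recog implementation: the codes and phrases come from the lists above, while `VoiceCommand`, `VoiceState`, `TRIGGER`, `COMMAND_PHRASES`, and `match_command` are hypothetical names introduced for illustration, and the plain-text matching is an assumed stand-in for the actual pocketsphinx recognition.

```python
from enum import IntEnum
from typing import Optional


class VoiceCommand(IntEnum):
    """Codes published on /voice_cmd (std_msgs::Int8)."""
    RESTART = 0
    TARGET = 1
    HOME = 2
    READY = 3
    CELEBRATE = 4
    STOP = 9


class VoiceState(IntEnum):
    """Codes published on /feedback_voice (std_msgs::Int32)."""
    IDLE = 10
    INIT = 11
    PROCESSING = 12
    COMPLETED = 13


TRIGGER = "hey coborg"

# Phrases that must follow the trigger (taken from the list above).
COMMAND_PHRASES = {
    "start up": VoiceCommand.RESTART,
    "go here": VoiceCommand.TARGET,
    "come back": VoiceCommand.HOME,
    "get ready": VoiceCommand.READY,
    "successful fall validation demonstration": VoiceCommand.CELEBRATE,
}


def match_command(transcript: str) -> Optional[VoiceCommand]:
    """Return the command for an already-transcribed utterance, or None."""
    # Lowercase and collapse whitespace so spacing differences do not matter.
    text = " ".join(transcript.lower().split())
    # STOP is the only command listed without the "Hey Coborg" trigger.
    if "stop stop stop" in text:
        return VoiceCommand.STOP
    if not text.startswith(TRIGGER):
        return None  # the trigger phrase gates every other command
    rest = text[len(TRIGGER):].strip()
    for phrase, command in COMMAND_PHRASES.items():
        if phrase in rest:
            return command
    return None
```

With this sketch, `match_command("Hey Coborg go here")` returns `VoiceCommand.TARGET`, while `match_command("go here")` returns `None` because the trigger phrase is missing.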
- Ubuntu: 18.04
- ROS: Melodic
- Python: 3.6.9
- Pocketsphinx-python: subdependencies of Sphinxbase and Pocketsphinx
- Python Packages: PyAudio, Pydub, rospkg
sudo apt update
sudo apt dist-upgrade
sudo apt install bison
sudo apt install swig
sudo apt install pavucontrol linux-sound-base alsa-base alsa-utils
sudo apt install pulseaudio
sudo apt install libpulse-dev
sudo apt install osspd
sudo apt install ffmpeg
(Some dependencies may be installed by default)
cd
git clone --recursive https://github.com/cmusphinx/pocketsphinx-python.git
Note: make sure you include the --recursive flag when cloning the pocketsphinx-python repo.
Note 2: the pocketsphinx-python repo can be deleted once the setup steps are completed.
cd pocketsphinx-python
python3 setup.py install
sudo python3 setup.py install
cd deps/sphinxbase
./autogen.sh
./configure
make
sudo make install
cd ../pocketsphinx
./autogen.sh
./configure
make
sudo make install
export LD_LIBRARY_PATH=/usr/local/lib
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
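One way to confirm that the installed libraries are visible after the steps above is to ask Python's `ctypes.util.find_library` for them. This is only a sketch: it assumes the installed library names are `sphinxbase` and `pocketsphinx`, and `find_shared_libs` is a hypothetical helper, not part of this repo.

```python
from ctypes.util import find_library
from typing import Dict, List, Optional


def find_shared_libs(names: List[str]) -> Dict[str, Optional[str]]:
    """Map each library name to the file name the system lookup resolves, or None."""
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    # Assumed library names; adjust if your build installs different ones.
    for name, resolved in find_shared_libs(["sphinxbase", "pocketsphinx"]).items():
        print(f"lib{name}: {resolved or 'NOT FOUND'}")
```

If either library is reported as not found, re-check the `sudo make install` steps and the exported paths in the current shell.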
- PyAudio:
sudo apt-get install portaudio19-dev python-pyaudio
pip install pyaudio
- Pydub:
pip install pydub
- rospkg:
pip install rospkg
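Before launching the node, it can help to check that the Python packages are importable by the interpreter you intend to use. The snippet below is a small sketch using only the standard library; `REQUIRED_MODULES` and `missing_modules` are hypothetical names, and the list holds the import names (for example `pyaudio` for the PyAudio package).

```python
import importlib.util
from typing import Iterable, List

# Import names, which can differ from the pip package names.
REQUIRED_MODULES = ["pocketsphinx", "pyaudio", "pydub", "rospkg"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("All modules found" if not missing else f"Missing: {', '.join(missing)}")
```

Run it with `python3` so that the check applies to the same interpreter the voice node uses.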
- Install dependencies:
pip3 install tensorflow==1.8
pip3 install tensor2tensor==1.6.6
- Clone g2p-seq2seq repo:
git clone https://github.com/cmusphinx/g2p-seq2seq.git
- Follow instructions in repo: https://github.com/cmusphinx/g2p-seq2seq
source [Coborg-Platform]/catkin_ws/devel/setup.bash
roslaunch voice_recog voice.launch
Now say "Hey Coborg" and let the fun begin
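To watch the demo from another node, a small rospy listener can subscribe to /voice_cmd and request a sound on /speaker. This is a hedged sketch, not code from this repo: the node name `voice_cmd_listener`, the `SOUND_FOR_COMMAND` mapping, and the `sound_for` helper are assumptions made for illustration, and "jeez.mp3" is used only because it is the one file named in this README.

```python
from typing import Optional

# Hypothetical mapping from /voice_cmd codes to sound files; treat the
# choice of file for code 9 (STOP) as a placeholder.
SOUND_FOR_COMMAND = {9: "jeez.mp3"}


def sound_for(code: int) -> Optional[str]:
    """Pick the sound file to request for a /voice_cmd code, if any."""
    return SOUND_FOR_COMMAND.get(code)


def main() -> None:
    # Imported here so the helper above can be used without a ROS install.
    import rospy
    from std_msgs.msg import Int8, String

    rospy.init_node("voice_cmd_listener")
    speaker = rospy.Publisher("/speaker", String, queue_size=1)

    def on_command(msg: Int8) -> None:
        rospy.loginfo("Received voice command code %d", msg.data)
        filename = sound_for(msg.data)
        if filename is not None:
            # voice_recog plays the named file from its Sounds directory.
            speaker.publish(String(data=filename))

    rospy.Subscriber("/voice_cmd", Int8, on_command)
    rospy.spin()

# In a sourced ROS environment, call main() to start the listener.
```

Keeping the ROS imports inside `main()` lets the mapping be checked on a machine without ROS; on the robot, call `main()` after sourcing the workspace and launching voice.launch.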
- PocketSphinx-Python Repo: https://github.com/cmusphinx/pocketsphinx-python
- Building a PocketSphinx language model: https://cmusphinx.github.io/wiki/tutoriallm/
- g2p-seq2seq Repo: https://github.com/cmusphinx/g2p-seq2seq