The Bhojpur Speech
is an advanced, high-performance audio data processing engine that applies artificial intelligence techniques to speech recognition and speech synthesis. It is used within the Bhojpur.NET Platform to deliver distributed applications and services in various fields (e.g., voice pathology). It leverages the Vosk framework, which also works in offline mode.
- Offline mode automated speech recognition and speech synthesis
- A web-based application (using Python) for online speech recognition
- Python-based and Go-based software frameworks using C/C++ libraries
- Advanced tools (e.g., Oscilloscope, Recorder, Player) for data processing
- Utilities to build speech training models
Please note that this software framework is based on Python >= 3.8 and Go >= 1.17, so please install these runtimes if you plan to build any custom applications.
It is assumed that portaudio will be used to capture audio inputs from your local machine. However, we have Go libraries to support serial port, portmidi, and miniport as well.
You need the OpenFST, Kaldi, and Vosk software libraries to build the server engine. These libraries can also be used during custom development of speech training models.
On macOS, you can run the following commands to install these key dependencies. On a Linux server, you can use the apt-get or yum command to install the same packages.
brew install openfst automake sox subversion ffmpeg portaudio portmidi mpg123
sudo pip3 install numpy flask openfst pyttsx3 sseclient
fstinfo --help
On macOS, a software developer needs to copy libvosk.dylib into the /usr/local/lib folder so that the Go programs can detect the library.
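As a quick sanity check, you can verify that the library is actually discoverable before building anything. The sketch below is an assumption-laden helper (the directory and file name follow the macOS instructions above; on Linux the file would typically be libvosk.so):

```python
import ctypes.util
import os

def locate_vosk(search_dirs=("/usr/local/lib",), name="libvosk.dylib"):
    """Return the first path where the Vosk shared library is found, else
    fall back to the dynamic loader's own search. The defaults mirror the
    macOS setup described above and are assumptions, not requirements."""
    for d in search_dirs:
        candidate = os.path.join(d, name)
        if os.path.isfile(candidate):
            return candidate
    # Let the loader search its standard paths (DYLD/LD_LIBRARY_PATH etc.).
    return ctypes.util.find_library("vosk")

print(locate_vosk())
```

If this prints None, the Go programs will most likely fail to load Vosk as well.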
First, issue the following command in a new Terminal window to install the webspeech server engine. Also, download the Vosk models.
python3 -m pip install aiortc aiohttp aiorpc vosk
Please note that the evdev dependency is available on Linux operating systems only. It is required for keyboard device events.
sudo pip3 install evdev
Also, you must have the mpg123 (>= v1.29.3) MPEG audio player installed.
sudo apt-get install -y mpg123
sudo tools/install.sh [username]
sudo tools/install-vosk.sh
Type the following command in a new Terminal window to run the webspeech server engine.
webspeech.py
You can open the http://localhost:8026 URL in a web browser to access the application.
Type the following command in a new Terminal window to run the webspeech command line.
webspeech_cli.py -H localhost -P 8026 -o
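For readers building their own client, the flag handling in the invocation above can be sketched with the standard argparse module. The meanings of -H (host) and -P (port) are inferred from the example; -o is modeled here as a plain boolean switch, so check the CLI's own --help for its actual meaning:

```python
import argparse

# A sketch of how webspeech_cli.py's flags might be parsed; the flag
# semantics are inferred from the example invocation, not confirmed.
parser = argparse.ArgumentParser(prog="webspeech_cli.py")
parser.add_argument("-H", "--host", default="localhost",
                    help="webspeech server host")
parser.add_argument("-P", "--port", type=int, default=8026,
                    help="webspeech server port")
parser.add_argument("-o", action="store_true",
                    help="optional switch (see the CLI's --help)")

args = parser.parse_args(["-H", "localhost", "-P", "8026", "-o"])
print(args.host, args.port, args.o)  # → localhost 8026 True
```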
First, the automated speech recognition engine must be built using Python.
cd pkg/server
pip3 install -r requirements.txt
Then, it should be started in a new Terminal window.
python3 ./pkg/server/websocket/asr_server.py /usr/local/lib/vosk/vosk-model-small-en-us-0.15
Please note that the vosk-model-small-en-us-0.15 model must already be downloaded and installed on your system; otherwise, please specify your own path. Typically, the automated speech recognition engine listens at the ws://localhost:2700 address/port.
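A client of this server receives a stream of JSON messages: in-progress hypotheses shaped like {"partial": ...} and finalized utterances shaped like {"text": ...}, following Vosk's result convention. A minimal stdlib sketch that folds such a message stream into one transcript:

```python
import json

def fold_transcript(messages):
    """Collect finalized Vosk results ({"text": ...}) into one transcript.

    Partial hypotheses ({"partial": ...}) are superseded by later messages,
    so they are ignored here.
    """
    pieces = []
    for raw in messages:
        msg = json.loads(raw)
        if msg.get("text"):  # a finalized utterance
            pieces.append(msg["text"])
    return " ".join(pieces)

stream = [
    '{"partial": "hello"}',
    '{"text": "hello world"}',
    '{"partial": "how"}',
    '{"text": "how are you"}',
]
print(fold_transcript(stream))  # → hello world how are you
```

In a real client, each `raw` would be a message received over the websocket connection to ws://localhost:2700.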
Firstly, download the Kaldi source code and run the following commands in a new Terminal window.
git clone https://github.com/kaldi-asr/kaldi.git
cd kaldi/tools/; make; cd ../src; ./configure; make
./configure --use-cuda=no
Now, edit the cmd.sh file under kaldi/egs/mini_librispeech/s5 (for example). In fact, you could choose any other folder under kaldi/egs or choose to make something of your own. For data preparation, please refer here. For additional datasets, you can find some more models to practice with.
You could try to connect the personal computing device's microphone directly by running the following command in a new Terminal window.
python3 ./pkg/server/websocket/test_microphone.py -u ws://localhost:2700
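The microphone test above streams raw audio to the server in small chunks; the same pattern for a prerecorded WAV file can be sketched with the standard library (the chunk size of 4000 frames is an arbitrary choice for illustration):

```python
import wave

def wav_chunks(path, frames_per_chunk=4000):
    """Yield raw PCM data from a WAV file in fixed-size chunks, the way a
    streaming ASR client would feed audio to the recognizer."""
    with wave.open(path, "rb") as wf:
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            yield data
```

Each yielded chunk would then be sent over the websocket connection, e.g. for chunk in wav_chunks("python/example/test.wav"): ws.send(chunk) (the websocket object here is hypothetical).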
Alternatively, you could run the following Go program (i.e., transcribe) to test automated speech transcription methods using Vosk.
go run internal/transcribe/main.go -f ./python/example/test.wav
Our Speech-to-Text framework is designed to work using Python and Go bindings. During training model development, you could also look into the eSpeak or eSpeak NG frameworks. First, you need to install the required software libraries on your local machine. For example:
brew install espeak
You can try out the web-based user interface of the remote speaker Go application built using the eSpeak framework.
go run ./internal/espeak/web/main.go
It is performed in offline mode using the eSpeak framework and our Go language bindings. (The sample text below is Hindi for "My name is Bhojpur Consulting.")
speechtext "मेरा नाम भोजपुर कंसल्टिंग है"
speechplay audios/test_hi.wav
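The speechtext helper above goes through our Go bindings; the same idea can be sketched in Python by shelling out to the espeak command line. The -v (voice) and -w (write WAV instead of playing) flags are standard espeak options, but the "hi" voice name is an assumption to verify against `espeak --voices`:

```python
import subprocess

def espeak_to_wav(text, out_path, voice="hi"):
    """Build an espeak command that synthesizes `text` into a WAV file.

    The command is returned rather than executed so the sketch stays
    runnable without espeak installed; uncomment the run() call to use it.
    """
    cmd = ["espeak", "-v", voice, "-w", out_path, text]
    # subprocess.run(cmd, check=True)  # requires espeak on the PATH
    return cmd

print(espeak_to_wav("मेरा नाम भोजपुर कंसल्टिंग है", "audios/test_hi.wav"))
```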
A sample Python program (i.e., speaker) is included in this repository. It is based on the pyttsx3 or Coqui STT library.
sudo pip3 install pyttsx3 stt
python3 ./internal/speaker/main.py
We have Google-enabled language translation capabilities integrated. You could try the following program; it also detects the language used. (The sample argument below is Hindi for "I am working for Bhojpur Consulting.")
go run ./internal/translate/main.go "मैं भोजपुर कंसल्टिंग के लिए काम कर रहा हूँ"
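The language-detection part can be illustrated with a tiny stdlib heuristic: counting which Unicode script the input's letters belong to. This is only a sketch (a real detector, like the one behind the translation service, does far more); it merely separates, say, Devanagari from Latin text:

```python
import unicodedata

def dominant_script(text):
    """Return the most common Unicode script name among the letters in
    `text` (e.g. 'DEVANAGARI' or 'LATIN'), or None for empty input."""
    counts = {}
    for ch in text:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if not name:
            continue
        script = name.split()[0]  # 'DEVANAGARI LETTER MA' -> 'DEVANAGARI'
        counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else None

print(dominant_script("मैं भोजपुर कंसल्टिंग के लिए काम कर रहा हूँ"))  # → DEVANAGARI
print(dominant_script("I work for Bhojpur Consulting"))  # → LATIN
```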