Bhojpur Speech - Processing Engine

Bhojpur Speech is a high-performance audio data processing engine that uses artificial intelligence techniques for speech recognition and speech synthesis. It is used within the Bhojpur.NET Platform to deliver distributed applications or services in various fields (e.g. voice pathology). It builds on the Vosk framework, which also works in offline mode.

Key Features

Offline mode automated speech recognition and speech synthesis
A web-based application (using Python) for online speech recognition
Python-based and Go-based software framework using C/C++ libraries
Advanced tools (e.g. Oscilloscope, Recorder, Player) for data processing
Utilities to build speech training models

Prerequisites

This software framework requires Python >= 3.8 and Go >= 1.17. Install both runtimes if you plan to build any custom applications.

It is assumed that portaudio will be used to capture audio input from your local machine. Go libraries are also available to support serial port, portmidi, and miniport.

You need the OpenFST, Kaldi, and Vosk software libraries to build the server engine. These libraries can also be used when developing custom speech training models.

On macOS, you can run the following commands to install these key dependencies. On a Linux server, the apt-get or yum command can likely be used to install the equivalent packages.

brew install openfst automake sox subversion ffmpeg portaudio portmidi mpg123
sudo pip3 install numpy flask openfst pyttsx3 sseclient
fstinfo --help

On macOS, a software developer needs to copy libvosk.dylib into the /usr/local/lib folder so that the Go programs can detect the library.
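
Before building anything, it can help to confirm that the runtimes and tools listed above are actually reachable. The following is a minimal sketch, not part of this repository: the function name check_prerequisites and the list of tools it looks for are assumptions chosen for illustration, and it only checks that each tool is on PATH, not its version (for example, it does not verify Go >= 1.17).

```python
import shutil
import sys

# Minimum Python version stated in the Prerequisites section.
MIN_PYTHON = (3, 8)

# Command-line tools mentioned in this README; the exact list is an assumption.
REQUIRED_TOOLS = ("go", "fstinfo", "ffmpeg", "mpg123")


def check_prerequisites(tools=REQUIRED_TOOLS, which=shutil.which,
                        python_version=None):
    """Return a list of human-readable problems; an empty list means OK."""
    version = tuple(python_version or sys.version_info[:2])[:2]
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            "Python %d.%d found, but >= %d.%d is required"
            % (version + MIN_PYTHON)
        )
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if which(tool) is None:
            problems.append("'%s' was not found on PATH" % tool)
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

Running the file directly prints one warning per missing item and nothing when everything is found.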

Speech Recognition Framework

Installation

First, run the following command in a new Terminal window to install the webspeech server engine. Then download the Vosk models.

python3 -m pip install aiortc aiohttp aiorpc vosk

WebSpeech Application

Note that the evdev dependency is available on Linux operating systems only. It is required for keyboard device events.

sudo pip3 install evdev

You must also have the mpg123 MPEG audio player (>= v1.29.3) installed.

sudo apt-get install -y mpg123
sudo tools/install.sh [username]
sudo tools/install-vosk.sh

Type the following command in a new Terminal window to run the webspeech server engine.

webspeech.py

You can open http://localhost:8026 URL in a web browser to access the application.

Type the following command in a new Terminal window to run the webspeech command-line client.

webspeech_cli.py -H localhost -P 8026 -o

Server-side Speech Recognition

First, install the Python dependencies of the automated speech recognition engine.

cd pkg/server
pip3 install -r requirements.txt

Then, start it in a new Terminal window.

python3 ./pkg/server/websocket/asr_server.py /usr/local/lib/vosk/vosk-model-small-en-us-0.15

This command assumes that the vosk-model-small-en-us-0.15 model is already downloaded and installed on your system. Otherwise, specify the path to your own model.
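
Because a wrong model path is an easy mistake to make, a launcher script could validate it before starting the server. The sketch below is illustrative only: the function resolve_model_dir and the VOSK_MODEL_DIR environment variable are hypothetical names, not something this repository is known to define.

```python
import os

# Default location used in the command above.
DEFAULT_MODEL_DIR = "/usr/local/lib/vosk/vosk-model-small-en-us-0.15"


def resolve_model_dir(explicit=None, env=os.environ):
    """Pick a model directory and fail early if it does not exist.

    Order: explicit argument, then the (hypothetical) VOSK_MODEL_DIR
    environment variable, then the default path.
    """
    candidate = explicit or env.get("VOSK_MODEL_DIR") or DEFAULT_MODEL_DIR
    candidate = os.path.abspath(os.path.expanduser(candidate))
    if not os.path.isdir(candidate):
        raise FileNotFoundError("Vosk model directory not found: " + candidate)
    return candidate
```

Failing early with a clear message is usually easier to diagnose than an error raised later while the model is being loaded.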

Typically, the automated speech recognition engine listens at the ws://localhost:2700 address.
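
To illustrate what a client sends over that WebSocket, here is a minimal sketch that prepares the message sequence for one WAV file. It assumes the server follows the sequence used by the upstream Vosk WebSocket server examples (a config message with the sample rate, raw audio chunks, then an end-of-file marker); the asr_server.py in this repository may differ. The function name vosk_messages is hypothetical, and the sketch deliberately stops short of opening a connection.

```python
import json
import wave


def vosk_messages(wav_path, chunk_seconds=0.2):
    """Yield the messages a client would send for one WAV file.

    Text messages are yielded as str, audio chunks as bytes.
    """
    with wave.open(wav_path, "rb") as wav:
        rate = wav.getframerate()
        # 1. Tell the server the sample rate of the audio that follows.
        yield json.dumps({"config": {"sample_rate": rate}})
        # 2. Stream raw PCM frames in small chunks.
        frames_per_chunk = max(1, int(rate * chunk_seconds))
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            yield data
        # 3. Signal end of stream so the server emits its final result.
        yield json.dumps({"eof": 1})
```

Each yielded item can be passed to the send method of whichever WebSocket client library you use, reading one reply from the server after each audio chunk.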

Speech Recognition Training

Firstly, download the Kaldi source code and run the following commands in a new Terminal window.

git clone https://github.com/kaldi-asr/kaldi.git
cd kaldi/tools/; make; cd ../src; ./configure; make
./configure --use-cuda=no

Next, edit the cmd.sh file under, for example, kaldi/egs/mini_librispeech/s5. You can choose any other folder under kaldi/egs or create a recipe of your own.

For data preparation, please refer here

For additional datasets, you can find some more models to practice.

Client-side Speech Recognition

You can connect your computer's microphone directly to the recognition server by running the following command in a new Terminal window.

python3 ./pkg/server/websocket/test_microphone.py -u ws://localhost:2700
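
While audio is streaming, a Vosk recognizer reports interim hypotheses under a "partial" key and finalized utterances under a "text" key. As a minimal sketch, assuming the server forwards those JSON results unchanged, a client could assemble a transcript as follows; collect_transcript is a hypothetical helper, not part of test_microphone.py.

```python
import json


def collect_transcript(messages):
    """Join the finalized utterances from a stream of JSON result strings."""
    utterances = []
    for raw in messages:
        result = json.loads(raw)
        # Interim hypotheses carry a "partial" key and are skipped here.
        text = result.get("text", "").strip()
        if text:
            utterances.append(text)
    return " ".join(utterances)
```

Skipping partial results keeps the transcript free of duplicates, since each partial hypothesis is later superseded by a final result.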

Go-based Speech Transcription

You can run the following Go program (i.e. transcribe) to test automated speech transcription using Vosk.

go run internal/transcribe/main.go -f ./python/example/test.wav

Speech Synthesis Framework

Our Text-to-Speech framework is designed to work using Python and Go bindings. During training model development, you could also look into the eSpeak or eSpeak NG frameworks. First, you need to install the required software libraries on your local machine. For example:

brew install espeak

Client-side Speech Synthesis

You can try our web-based user interface for the remote speaker Go application, which is built using the eSpeak framework.

go run ./internal/espeak/web/main.go

Go-based Speech Synthesis

Speech synthesis is performed in offline mode using the eSpeak framework and our Go language bindings.

speechtext "मेरा नाम भोजपुर कंसल्टिंग है"
speechplay audios/test_hi.wav
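
The speechtext and speechplay tools use this repository's Go bindings. For comparison only, a similar offline result can be sketched in Python by calling the eSpeak command-line tool. This is not how the Go tools are implemented; the helper names are hypothetical, and the sketch assumes the espeak binary installed earlier, with its -v (voice), -s (speed in words per minute), and -w (write WAV file) options.

```python
import shutil
import subprocess


def espeak_command(text, wav_path, voice="hi", speed_wpm=150, binary="espeak"):
    """Build the argument list for writing synthesized speech to a WAV file."""
    return [binary, "-v", voice, "-s", str(speed_wpm), "-w", wav_path, text]


def synthesize(text, wav_path, **options):
    """Run eSpeak if it is installed; return False when it is missing."""
    command = espeak_command(text, wav_path, **options)
    if shutil.which(command[0]) is None:
        return False  # eSpeak is not installed on this machine.
    # Passing a list (no shell) keeps the text from being shell-interpreted.
    subprocess.run(command, check=True)
    return True
```

With eSpeak NG, passing binary="espeak-ng" should work the same way, since it accepts the same basic options.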

Python-based Speech Synthesis

A sample Python program (i.e. speaker) is included in this repository. It is based on pyttsx3 or coqui STT library.

sudo pip3 install pyttsx3 stt
python3 ./internal/speaker/main.py
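
For orientation, the core of a pyttsx3-based speaker can be very small. The sketch below is not the code in internal/speaker/main.py; the speak function and its default rate are assumptions for illustration, and only the standard pyttsx3 calls init, setProperty, say, and runAndWait are used.

```python
def speak(text, engine=None, rate=150):
    """Speak `text` aloud through a pyttsx3-style engine."""
    if engine is None:
        import pyttsx3  # Imported lazily so the module loads without it.
        engine = pyttsx3.init()
    engine.setProperty("rate", rate)  # Speaking rate in words per minute.
    engine.say(text)
    engine.runAndWait()  # Blocks until the queued utterance has been spoken.
    return engine
```

Returning the engine lets a caller reuse it for later utterances instead of initializing a new one each time.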

Speech Translation Framework

Language translation capabilities backed by Google are integrated into the framework. You can try the following program, which also detects the language of the input text.

go run ./internal/translate/main.go "मैं भोजपुर कंसल्टिंग के लिए काम कर रहा हूँ"
