# Speech Recognition with Python Libraries: Vosk, SpeechRecognition and Pocketsphinx

<div style="border: 5px ridge; padding:5%"> 
<h1>Table of Contents</h1>
<hr style="border:2px solid black" />

<h3><a href="#online">1. Online Speech Recognition with SpeechRecognition</a></h3>
<h3><a href="#vosk">2. Offline Speech Recognition with Vosk</a></h3>
<h3><a href="#pocketsphinx">3. Offline Speech Recognition with SpeechRecognition and Pocketsphinx</a></h3>
<h3><a href="#results">4. Results</a></h3>
</div>

<hr style="border:2px solid black" />
<h1><a id="online">Online Speech Recognition with SpeechRecognition</a></h1>

## Import Libraries

In [1]:
import os
import sys
import time
import speech_recognition as sr
# install with `pip install SpeechRecognition`

## Specify the file name to recognize and language

In [2]:
# name of the audio file to recognize (sr.AudioFile reads WAV, AIFF or FLAC)
audio_filename = "audio/test.wav"
# name of the text file to write recognized text
text_filename = "audio/test.txt"
# language of speech
language = 'en-US'

## Reading a file

In [3]:
if not os.path.exists(audio_filename):
    print(f"File '{audio_filename}' doesn't exist")
    sys.exit()

print(f"Reading your file '{audio_filename}'...")
audio_file = sr.AudioFile(audio_filename)
r = sr.Recognizer()

with audio_file as af:
    r.adjust_for_ambient_noise(af)  # calibrate the energy threshold; this consumes the first second of audio
    audio = r.record(af)

print(f"'{audio_filename}' file was successfully read and cleaned from noise")

Reading your file 'audio/test.wav'...
'audio/test.wav' file was successfully read and cleaned from noise


## Recognize

In [4]:
print('Start converting to text. It may take some time...')
start_time = time.time()

Start converting to text. It may take some time...


In [5]:
# recognize speech using Google API
try:
    text = r.recognize_google(audio, language=language)
except sr.UnknownValueError:
    print("Google could not understand audio")
    sys.exit()
except sr.RequestError as e:
    print("Google error; {0}".format(e))
    sys.exit()

In [6]:
time_elapsed = time.strftime(
    '%H:%M:%S', time.gmtime(time.time() - start_time))
print(f'Done! Elapsed time = {time_elapsed}')

Done! Elapsed time = 00:00:01
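
Every recognizer in this notebook reports `00:00:01`, because `%H:%M:%S` drops fractions of a second. A minimal sketch of a finer-grained timer is shown below. `format_elapsed` is a hypothetical helper that is not part of the notebook, and the `sum(...)` line is only a stand-in for the recognition call.

```python
import time


def format_elapsed(seconds: float) -> str:
    """Format a duration as HH:MM:SS.mmm without wrapping at 24 hours."""
    total_ms = round(seconds * 1000)
    total_seconds, ms = divmod(total_ms, 1000)
    minutes, secs = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


# perf_counter() is a monotonic clock intended for measuring intervals
start = time.perf_counter()
sum(range(100_000))  # stand-in for the recognition call (assumption)
print(f"Done! Elapsed time = {format_elapsed(time.perf_counter() - start)}")
```

With millisecond resolution the three engines can be compared on speed as well as on accuracy.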


In [7]:
print("\tGoogle thinks you said:\n")
print(text)

	Google thinks you said:

best bossk speech recognition Library


In [8]:
print(f"Saving text to '{text_filename}'...")
with open(text_filename, "w", encoding="utf-8") as text_file:
    text_file.write(text)
print("Text successfully saved")

Saving text to 'audio/test.txt'...
Text successfully saved


<hr style="border:2px solid black" />
<h1><a id="vosk">Offline Speech Recognition with Vosk</a></h1>

## Import Libraries

In [9]:
import os
import sys
import time
import wave
import json
from vosk import Model, KaldiRecognizer, SetLogLevel
# install with `pip install vosk`

# 0 keeps the default Kaldi log output; use -1 to silence it
SetLogLevel(0)

## Specify the file name to recognize and the path to the vosk model

In [10]:
# name of the audio file to recognize (must be WAV: mono, 16-bit PCM)
audio_filename = "audio/test.wav"
# path to vosk model downloaded from
# https://alphacephei.com/vosk/models
model_path = "models/vosk-model-en-us-0.21"

# name of the text file to write recognized text
text_filename = "audio/test.txt"

## Reading a file and a model

In [11]:
if not os.path.exists(audio_filename):
    print(f"File '{audio_filename}' doesn't exist")
    sys.exit()

print(f"Reading your file '{audio_filename}'...")
wf = wave.open(audio_filename, "rb")
print(f"'{audio_filename}' file was successfully read")

Reading your file 'audio/test.wav'...
'audio/test.wav' file was successfully read
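
`wave.open` succeeds for any PCM WAV file, but Vosk expects mono, 16-bit, uncompressed audio, so a file in another layout can open without error and still be recognized poorly. The sketch below checks the layout before recognition. `wav_format_problems` is a hypothetical helper, and the stereo demo file is generated only so the example runs on its own.

```python
import os
import tempfile
import wave


def wav_format_problems(path: str) -> list[str]:
    """Return reasons the file may not suit Vosk (empty list if none)."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            problems.append(f"expected mono, got {wf.getnchannels()} channels")
        if wf.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {8 * wf.getsampwidth()}-bit")
        if wf.getcomptype() != "NONE":
            problems.append(f"expected uncompressed PCM, got {wf.getcomptype()}")
    return problems


# Demo input (assumption): 0.1 s of stereo silence written to a temp folder
demo_path = os.path.join(tempfile.mkdtemp(), "stereo.wav")
with wave.open(demo_path, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00" * 4 * 1600)  # 4 bytes per stereo 16-bit frame

print(wav_format_problems(demo_path))
```

In the notebook, the same function could be called on `audio_filename` right after the existence check, printing the problems and stopping if the list is not empty.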


In [12]:
if not os.path.exists(model_path):
    print(f"Please download the model from https://alphacephei.com/vosk/models and unpack as {model_path}")
    sys.exit()

print(f"Reading your vosk model '{model_path}'...")
model = Model(model_path)
rec = KaldiRecognizer(model, wf.getframerate())
rec.SetWords(True)
print(f"'{model_path}' model was successfully read")

Reading your vosk model 'models/vosk-model-en-us-0.21'...
'models/vosk-model-en-us-0.21' model was successfully read


## Recognize

In [13]:
print('Start converting to text. It may take some time...')
start_time = time.time()

Start converting to text. It may take some time...


In [14]:
results = []

# recognize speech using vosk model
while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    if rec.AcceptWaveform(data):
        part_result = json.loads(rec.Result())
        results.append(part_result)

part_result = json.loads(rec.FinalResult())
results.append(part_result)

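`results` is a list of JSON dictionaries, one per recognized utterance. Because `rec.SetWords(True)` was called, each of them has the following structure:

```
{'result': [{'conf': 0.849133,   # confidence
             'end': 4.5,         # end time, seconds
             'start': 4.05,      # start time, seconds
             'word': 'test'}],   # recognized word
 'text': 'test'}                 # full text of the utterance
```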

In [15]:
results

[{'result': [{'conf': 1.0, 'end': 2.55, 'start': 1.98, 'word': 'deus'},
   {'conf': 1.0, 'end': 3.33, 'start': 2.64, 'word': 'vos'},
   {'conf': 1.0, 'end': 4.44, 'start': 3.75, 'word': 'speech'},
   {'conf': 1.0, 'end': 5.25, 'start': 4.44, 'word': 'recognition'},
   {'conf': 1.0, 'end': 6.03, 'start': 5.25, 'word': 'library'}],
  'text': 'deus vos speech recognition library'}]
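
The word-level entries can be used for more than building the final string, for example to print timestamps or to summarize confidence. The sketch below works on a copy of the output shown above. `iter_words` and `mean_confidence` are hypothetical helpers, and the `.get("result", [])` fallback assumes that an utterance with no recognized words may lack the `'result'` key.

```python
# Copy of the notebook output above, used as sample data
sample_results = [
    {
        "result": [
            {"conf": 1.0, "end": 2.55, "start": 1.98, "word": "deus"},
            {"conf": 1.0, "end": 3.33, "start": 2.64, "word": "vos"},
            {"conf": 1.0, "end": 4.44, "start": 3.75, "word": "speech"},
            {"conf": 1.0, "end": 5.25, "start": 4.44, "word": "recognition"},
            {"conf": 1.0, "end": 6.03, "start": 5.25, "word": "library"},
        ],
        "text": "deus vos speech recognition library",
    }
]


def iter_words(results):
    """Yield every word entry across all utterances."""
    for part in results:
        # Assumption: an empty utterance may have no 'result' key
        yield from part.get("result", [])


def mean_confidence(results):
    """Average word confidence, or 0.0 when nothing was recognized."""
    words = list(iter_words(results))
    return sum(w["conf"] for w in words) / len(words) if words else 0.0


for w in iter_words(sample_results):
    print(f"{w['start']:6.2f}-{w['end']:6.2f}  {w['word']:<12} conf={w['conf']:.2f}")
print(f"mean confidence: {mean_confidence(sample_results):.2f}")
```

Note that Vosk reports a confidence of 1.0 for "deus" and "vos" even though both words are wrong, so confidence alone does not flag these errors.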

In [16]:
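# forming a final string from the text of each utterance
# (the loop variable is not named `r`, which is used for the Recognizer elsewhere)
text = ''
for part in results:
    text += part['text'] + ' '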

In [17]:
time_elapsed = time.strftime(
    '%H:%M:%S', time.gmtime(time.time() - start_time))
print(f'Done! Elapsed time = {time_elapsed}')

Done! Elapsed time = 00:00:01


In [18]:
print("\tVosk thinks you said:\n")
print(text)

	Vosk thinks you said:

deus vos speech recognition library 


In [19]:
print(f"Saving text to '{text_filename}'...")
with open(text_filename, "w", encoding="utf-8") as text_file:
    text_file.write(text)
print("Text successfully saved")

Saving text to 'audio/test.txt'...
Text successfully saved


<hr style="border:2px solid black" />
<h1><a id="pocketsphinx">Offline Speech Recognition with SpeechRecognition and Pocketsphinx</a></h1>

## Import Libraries

In [20]:
import os
import sys
import time
import speech_recognition as sr
# install with `pip install SpeechRecognition pocketsphinx`
# (recognize_sphinx() needs the pocketsphinx package)

## Specify the file name to recognize and language

In [21]:
# name of the audio file to recognize (sr.AudioFile reads WAV, AIFF or FLAC)
audio_filename = "audio/test.wav"
# name of the text file to write recognized text
text_filename = "audio/test.txt"
# language of speech
language = 'en-US'

## Reading a file

In [22]:
if not os.path.exists(audio_filename):
    print(f"File '{audio_filename}' doesn't exist")
    sys.exit()
    
print(f"Reading your file '{audio_filename}'...")
audio_file = sr.AudioFile(audio_filename)
r = sr.Recognizer()

with audio_file as af:
    r.adjust_for_ambient_noise(af)  # calibrate the energy threshold; this consumes the first second of audio
    audio = r.record(af)

print(f"'{audio_filename}' file was successfully read and cleaned from noise")

Reading your file 'audio/test.wav'...
'audio/test.wav' file was successfully read and cleaned from noise


## Recognize

In [23]:
print('Start converting to text. It may take some time...')
start_time = time.time()

Start converting to text. It may take some time...


In [24]:
# recognize speech using Sphinx
try:
    text = r.recognize_sphinx(audio, language=language)
except sr.UnknownValueError:
    print("Sphinx could not understand audio")
    sys.exit()
except sr.RequestError as e:
    print("Sphinx error; {0}".format(e))
    sys.exit()

In [25]:
time_elapsed = time.strftime(
    '%H:%M:%S', time.gmtime(time.time() - start_time))
print(f'Done! Elapsed time = {time_elapsed}')

Done! Elapsed time = 00:00:01


In [26]:
print("\tSphinx thinks you said:\n")
print(text)

	Sphinx thinks you said:

that's bosco speech from the ignition library


In [27]:
print(f"Saving text to '{text_filename}'...")
with open(text_filename, "w", encoding="utf-8") as text_file:
    text_file.write(text)
print("Text successfully saved")

Saving text to 'audio/test.txt'...
Text successfully saved


<hr style="border:2px solid black" />
<h1><a id="results">Results</a></h1>

The table compares the text that each method recognized in `audio/test.wav` with the initial text.


| Method | Recognized Text |
| ----------- | ----------- |
| Initial text (reference) | test vosk speech recognition library |
| Google API with `recognize_google()` | best bossk speech recognition Library |
| Vosk | deus vos speech recognition library |
| SpeechRecognition with `recognize_sphinx()` | that's bosco speech from the ignition library |
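
One way to put a number on the rows above is the word error rate (WER): the word-level edit distance between the initial text and the recognized text, divided by the number of words in the initial text. The notebook does not compute this metric. The sketch below is an illustration under two assumptions: the initial text is treated as the reference, and case is ignored. `word_error_rate` is a hypothetical helper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "test vosk speech recognition library"
hypotheses = {
    "recognize_google()": "best bossk speech recognition Library",
    "vosk": "deus vos speech recognition library",
    "recognize_sphinx()": "that's bosco speech from the ignition library",
}
for name, hypothesis in hypotheses.items():
    print(f"{name:<20} WER = {word_error_rate(reference, hypothesis):.2f}")
```

On this single five-word phrase, Google and Vosk each get two words wrong (WER 0.40), while Sphinx needs five edits (WER 1.00). One short recording is too small a sample to rank the engines in general.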