# Speech Recognition With Python

In [1]:
! pip install SpeechRecognition



In [2]:
import speech_recognition as sr
sr.__version__

'3.8.1'

# Section 1: Using the Recognizer Class

Each Recognizer instance has seven methods for recognizing speech from an audio source using various APIs. These are:

- recognize_bing(): Microsoft Bing Speech
- recognize_google(): Google Web Speech API
- recognize_google_cloud(): Google Cloud Speech - requires installation of the google-cloud-speech package
- recognize_houndify(): Houndify by SoundHound
- recognize_ibm(): IBM Speech to Text
- recognize_sphinx(): CMU Sphinx - requires installing PocketSphinx
- recognize_wit(): Wit.ai


Of the seven, only recognize_sphinx() works offline with the CMU Sphinx engine. The other six all require an internet connection.
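Because most of these methods depend on a network connection, it can be useful to fall back to another engine when one fails. The following is a minimal sketch of that idea, not part of the SpeechRecognition library: `first_successful`, `fake_online`, and `fake_offline` are hypothetical names, and the two stand-in functions only simulate recognizers so the example runs without audio or network access.

```python
def first_successful(recognizers, audio, recoverable=(Exception,)):
    """Try each (name, callable) pair in order.

    Returns (name, text) from the first callable that does not raise
    one of the `recoverable` exception types.
    """
    errors = []
    for name, recognize in recognizers:
        try:
            return name, recognize(audio)
        except recoverable as exc:
            # Remember the failure and move on to the next engine.
            errors.append((name, exc))
    raise RuntimeError(f"all recognizers failed: {errors}")


# Hypothetical stand-ins for an online and an offline recognizer.
def fake_online(audio):
    raise ConnectionError("no network")


def fake_offline(audio):
    return "hello world"


engine, text = first_successful(
    [("online", fake_online), ("offline", fake_offline)],
    audio=b"",
)
print(engine, text)
```

With the real library, the pairs could plausibly be `("google", r.recognize_google)` and `("sphinx", r.recognize_sphinx)`, with `recoverable=(sr.RequestError,)`, since `sr.RequestError` is the exception raised when an API cannot be reached.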

In [3]:
r = sr.Recognizer()

### Using record() to Capture Data From a File

In [4]:
harvard = sr.AudioFile('OSR_us_000_0061_8k.wav')
with harvard as source:
    audio = r.record(source)

In [5]:
type(audio)

speech_recognition.AudioData

**Use recognize_google() to attempt to recognize any speech in the audio**

In [6]:
r.recognize_google(audio)

'the mute muffled the high tones of the horn the gold ring fits only a pierced ear the Old pay and was covered with hard fudge watch the log float in the wide river the Node on the stalk of wheat grew daily the heap of fallen leaves were set on fire right fast if you want to finish early his shirt was clean but one button was gone the barrel of beer was a brew of malt and hops tin cans are absent from store shelves'

*Harvard Sentences.*
    
These phrases were published by the IEEE in 1965 for use in speech intelligibility testing of telephone lines. They are still used in VoIP and cellular testing today.

The Harvard Sentences comprise 72 lists of ten phrases. You can find freely available recordings of these phrases on the Open Speech Repository website. Recordings are available in English, Mandarin Chinese, French, and Hindi. They provide an excellent source of free material for testing your code.

### Capturing Segments With offset and duration

In [7]:
with harvard as source:
    audio = r.record(source, duration=8)

r.recognize_google(audio)

'the mute muffled the high tones of the horn the gold ring'

What happens if we record twice in a row, four seconds each time, inside the same with block? The second call returns the four seconds of audio that follow the first four seconds, because the file stream keeps its position between calls.

In [8]:
with harvard as source:
    audio1 = r.record(source, duration=4)
    audio2 = r.record(source, duration=4)

In [9]:
r.recognize_google(audio1)

'the mute muffled the Hightown'

In [10]:
r.recognize_google(audio2)

'the gold ring'

Notice that audio2 contains only a portion of the second phrase in the file, and that audio1 was cut off mid-phrase, so "high tones" came back as "Hightown". When specifying a duration, the recording might stop mid-phrase or even mid-word, which can hurt the accuracy of the transcription.

In [11]:
# To capture the rest of the second phrase in the file, start at an offset of eight seconds and record for four seconds.

with harvard as source:
    audio = r.record(source, offset=8, duration=4)

In [12]:
r.recognize_google(audio)

'only a pierced ear'
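To make the arithmetic behind offset and duration concrete, here is a rough sketch using only the standard library's `wave` module. It is not how `record()` is implemented internally. It only illustrates how seconds map to frame positions in a WAV file. `make_wav` and `read_segment` are hypothetical helpers, and the audio is generated silence rather than the Harvard recording.

```python
import io
import wave


def make_wav(seconds, rate=8000):
    """Build an in-memory mono 16-bit WAV of silence (placeholder audio)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


def read_segment(wav_file, offset=0.0, duration=None):
    """Return (raw_bytes, sample_rate) for the requested slice of the file."""
    with wave.open(wav_file, "rb") as w:
        rate = w.getframerate()
        # Convert the offset in seconds to a frame index, clamped to the file.
        start = min(int(offset * rate), w.getnframes())
        w.setpos(start)
        remaining = w.getnframes() - start
        count = remaining if duration is None else min(int(duration * rate), remaining)
        return w.readframes(count), rate


# A 12-second file, read from second 8 for 4 seconds.
segment, rate = read_segment(make_wav(12), offset=8, duration=4)
seconds = len(segment) / (2 * rate)  # 2 bytes per mono 16-bit frame
print(seconds)

# Asking for more audio than remains simply returns what is left.
tail, _ = read_segment(make_wav(12), offset=10, duration=4)
tail_seconds = len(tail) / (2 * rate)
print(tail_seconds)
```

The second call shows why a duration that runs past the end of the file is harmless: the slice is clamped to the frames that remain.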

# Section 2:  Effect of Noise on Speech Recognition

All audio recordings contain some degree of noise, and unhandled noise can wreck the accuracy of speech recognition apps.

In [13]:
# This file contains the phrase "the stale smell of old beer lingers" spoken with a loud jackhammer in the background.

jackhammer = sr.AudioFile('jackhammer.wav')
with jackhammer as source:
    audio = r.record(source)


In [14]:
r.recognize_google(audio)

'the stale smell of beer drinkers'

### Dealing with noise

In [15]:
with jackhammer as source:
    r.adjust_for_ambient_noise(source)
    audio = r.record(source)

In [16]:
r.recognize_google(audio)

'still smell with your fingers'

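This is worse, not better. By default, adjust_for_ambient_noise() reads the first second of the stream to calibrate the recognizer, so the start of the phrase was consumed before record() began. A shorter calibration window leaves more of the phrase intact.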
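To make the calibration step less opaque, the sketch below measures the root-mean-square (RMS) energy of a clean tone and of the same tone with added noise. This is only an illustration of the kind of energy measurement that noise calibration relies on. It does not reproduce the library's actual update rule for the recognizer's `energy_threshold`. The helpers `rms` and `tone` and the synthetic signals are assumptions made for this example.

```python
import math
import random
from array import array


def rms(samples):
    """Root-mean-square amplitude of a sequence of signed PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def tone(freq_hz, seconds, rate=8000, amplitude=3000):
    """Generate a 16-bit sine tone as a stand-in for clean speech."""
    n = int(seconds * rate)
    return array(
        "h",
        (int(amplitude * math.sin(2 * math.pi * freq_hz * i / rate)) for i in range(n)),
    )


rng = random.Random(0)  # fixed seed so the run is repeatable
clean = tone(220, 0.5)
# Add uniform noise as a crude stand-in for the jackhammer.
noisy = array("h", (s + rng.randint(-8000, 8000) for s in clean))

print(f"clean RMS: {rms(clean):.0f}")
print(f"noisy RMS: {rms(noisy):.0f}")
```

A threshold calibrated on the noisy signal would sit well above the energy of the clean tone, which is one way to see why heavy background noise makes it hard to separate speech from its surroundings.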

In [17]:
with jackhammer as source:
    r.adjust_for_ambient_noise(source, duration=0.5)
    audio = r.record(source)

In [18]:
r.recognize_google(audio)

'the stale smell of your wrinkles'

Not good enough yet. Let's look at all the alternatives the API returned.

In [19]:
r.recognize_google(audio, show_all=True)

{'alternative': [{'transcript': 'the stale smell of your wrinkles',
   'confidence': 0.68223208},
  {'transcript': 'the stale smell of your fingers'},
  {'transcript': 'the steel smell your fingers'},
  {'transcript': 'the stale smell your fingers'},
  {'transcript': 'the stale smell for wrinkles'}],
 'final': True}

With show_all=True, recognize_google() returns a dictionary whose 'alternative' key holds a list of possible transcripts. The structure of this response may vary from API to API and is mainly useful for debugging.
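If you do want to use this response programmatically, a small helper can pick the most confident transcript. The sketch below works on the dictionary shown above. `best_transcript` is a hypothetical helper, and it assumes that alternatives without a 'confidence' key should rank lowest and that the response may be empty when nothing was recognized.

```python
def best_transcript(response):
    """Return the transcript with the highest confidence, or None.

    Alternatives without a 'confidence' key are treated as confidence 0.0
    (an assumption made for this sketch).
    """
    if not isinstance(response, dict):
        return None  # e.g. an empty result when nothing was recognized
    alternatives = response.get("alternative", [])
    if not alternatives:
        return None
    best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
    return best.get("transcript")


# The response from the cell above, copied verbatim.
response = {
    "alternative": [
        {"transcript": "the stale smell of your wrinkles", "confidence": 0.68223208},
        {"transcript": "the stale smell of your fingers"},
        {"transcript": "the steel smell your fingers"},
        {"transcript": "the stale smell your fingers"},
        {"transcript": "the stale smell for wrinkles"},
    ],
    "final": True,
}

print(best_transcript(response))
```

Here only the first alternative carries a confidence score, so the helper returns the same transcript that recognize_google() gives without show_all.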