# Speech recognizer

### Introduction:

In this notebook I try to understand the speech recognition process from end to end. Rather than training my own model to turn audio into text (language is complicated, and I would first need to understand how sound itself works), I use an existing API to predict the words. I will probably use the Google speech API for this, although it has its limitations and may change over time.

### Importing libraries:

The only import in this trial is the **speech recognition** library. The PyAudio package must also be installed, because the Microphone class (the one that handles communication between the user and the microphone) depends on it.

In [12]:
import speech_recognition as sr

###  First trial: Static audio

Before using the microphone, I load an example recording of the Harvard sentences and set it as the source.

In [None]:
#Create the recognizer (r) used in the cells below
r = sr.Recognizer()
#Get the audio file from the repo
audio_trial = sr.AudioFile('harvard_trial.wav')
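Before transcribing a file, it can help to confirm that it is a readable WAV file and to check how long it is. The sketch below uses only the standard-library `wave` module. The helper names `write_test_tone` and `describe_wav` and the synthetic file `tone_demo.wav` are made up for illustration; the generated tone simply stands in for `harvard_trial.wav` so the example runs on its own.

```python
import math
import os
import struct
import tempfile
import wave


def write_test_tone(path, seconds=1.0, rate=16000, freq=440.0):
    """Write a mono 16-bit sine tone so the example has a file to inspect."""
    n_frames = int(seconds * rate)
    samples = (
        int(12000 * math.sin(2 * math.pi * freq * i / rate))
        for i in range(n_frames)
    )
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 2 bytes = 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(b"".join(struct.pack("<h", s) for s in samples))


def describe_wav(path):
    """Return basic properties of a WAV file as a dictionary."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


# Hypothetical demo file; replace with the path of a real recording.
demo_path = os.path.join(tempfile.gettempdir(), "tone_demo.wav")
write_test_tone(demo_path)
info = describe_wav(demo_path)
print(info)
```

Calling `describe_wav('harvard_trial.wav')` in the same way would show the properties of the real recording, assuming the file is a standard PCM WAV.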

The next cell reads the audio from the file and stores it in a variable called 'audio'

In [16]:
# audio = r.record(audio_trial) will not work: the AudioFile has to be opened with a `with` block before it can be read, so instead:
with audio_trial as source:
    audio = r.record(source)

Now let's send the audio to the **Google API**

In [10]:
#.recognize_google returns a string with the transcription of the source audio
r.recognize_google(audio)

'the mute muscled the high tones of the horn the gold ring fits only a pierced-ear the old pan was covered with hard fudge watch the log float in the wide river the node on the stalk of wheat grew daily the Heap of fallen leaves was set on fire right fast if you want to finish early his shirt was clean but one button was gone the barrel of beer was a brew of malt and hops tin cans are absent from store shelves'

###  Second trial: Noise

Noise matters too, so let's see how to deal with it. We will use the same source together with the **adjust_for_ambient_noise** method. From now on I will not explain each function's parameters in depth, as there are too many to cover them all.

In [18]:
with audio_trial as source:
    r.adjust_for_ambient_noise(source, duration=0.5)
    audio = r.record(source)
r.recognize_google(audio)

'the mute muscled the high tones of the horn the gold ring fits only a pierced-ear the old pan was covered with hard fudge watch the log float in the wide river the node on the stalk of wheat grew daily the Heap of fallen leaves was set on fire right fast if you want to finish early his shirt was clean but one button was gone the barrel of beer was a brew of malt and hops tin cans are absent from store shelves'

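In this case we got the same result, because the audio was cleanly recorded and the calibration only lasts 0.5 seconds. The calibration reads (and consumes) the first 0.5 seconds of the file, but no words were lost here.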
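To make the idea of calibration more concrete, here is a simplified sketch of how a speech threshold could be estimated from the opening part of a recording. It is an illustration only and not the library's actual implementation: the helper names `rms` and `calibrate_threshold`, the `ratio` of 1.5, and the tiny sample rate are all assumptions chosen to keep the numbers readable.

```python
import math


def rms(samples):
    """Root-mean-square energy of a sequence of PCM sample values."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate_threshold(samples, rate, duration=0.5, ratio=1.5):
    """Estimate a speech threshold from the first `duration` seconds.

    Returns the threshold and the samples left over after calibration,
    mirroring the fact that the calibrated part is consumed.
    """
    n = int(rate * duration)
    noise, rest = samples[:n], samples[n:]
    return rms(noise) * ratio, rest


rate = 100                    # hypothetical sample rate (samples per second)
background = [10, -10] * 25   # 0.5 s of quiet background
speech = [500, -500] * 50     # 1.0 s of loud "speech"

threshold, remaining = calibrate_threshold(background + speech, rate)
print(threshold, len(remaining))   # 15.0 100
print(rms(speech) > threshold)     # True
```

The quiet half second sets the threshold, and everything after it is what remains to be recorded. If the speech had started within that first half second, those samples would have been used for calibration instead of being transcribed.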

###  Third trial: Microphone

In [20]:
#Instead of reading from a file, we now need these two classes
rec = sr.Recognizer()
mic = sr.Microphone()

The microphone instance **does not record** by itself; instead, it sets up the source to record from (as AudioFile did with the Harvard example)

In [33]:
#This keeps running until the user stops talking, storing the data in the audio variable
with mic as source:
    audio = rec.listen(source)
print(type(audio))
print(audio)

<class 'speech_recognition.AudioData'>
<speech_recognition.AudioData object at 0x0000014A195369C8>
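How does listening know when the user has stopped talking? Roughly, capture starts once the sound energy rises above a threshold and ends after a long enough pause below it (the recognizer exposes a `pause_threshold` attribute for that pause, 0.8 seconds by default). The sketch below illustrates the idea on a list of per-chunk energy values. It is a simplified illustration and not the library's code: the function `capture_phrase`, the 0.1 second chunk length, and the energy numbers are all hypothetical.

```python
def capture_phrase(chunk_energies, threshold, chunk_seconds=0.1, pause_seconds=0.8):
    """Return (start, end) chunk indices of the first phrase, or None.

    `start` is the first chunk louder than `threshold`; `end` is exclusive
    and points just past the last loud chunk before a long enough pause.
    """
    quiet_needed = int(round(pause_seconds / chunk_seconds))
    start = None
    quiet_run = 0
    for i, energy in enumerate(chunk_energies):
        if start is None:
            # Still waiting for speech to begin.
            if energy > threshold:
                start = i
            continue
        if energy > threshold:
            quiet_run = 0          # speech resumed, reset the pause counter
        else:
            quiet_run += 1
            if quiet_run >= quiet_needed:
                return start, i - quiet_run + 1
    if start is None:
        return None                # nothing ever exceeded the threshold
    # Stream ended before a full pause; drop any trailing quiet chunks.
    return start, len(chunk_energies) - quiet_run


# Quiet, speech, a short pause, more speech, then a long silence.
energies = [5] * 3 + [400] * 4 + [5] * 2 + [350] * 3 + [5] * 10
print(capture_phrase(energies, threshold=300))   # (3, 12)
```

The short two-chunk pause in the middle does not end the phrase, because it is shorter than `pause_seconds`; only the long silence at the end does.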


Let's improve it by calibrating for ambient noise first

In [None]:
with mic as source:
    rec.adjust_for_ambient_noise(source, duration=0.5)
    #Listen with the same recognizer that was just calibrated
    audio = rec.listen(source)

### Now that we have finished the first part (understanding the sound), let's get to work!