# Speech Recognition


## Introduction to the Hidden Markov Model (HMM)

A Hidden Markov Model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process. 
A Markov process, or Markov chain, is a model based on a random process, usually defined as a family of random variables (see <a href="https://en.wikipedia.org/wiki/Stochastic_process">Stochastic process</a> for more information). 
A Markov process describes a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.
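
To make the Markov property concrete, here is a minimal sketch of a two-state chain. The states and probabilities are invented for illustration and are not taken from any speech model; `sequence_probability` is a hypothetical helper name.

```python
# Toy two-state Markov chain; the states and numbers are made up.
initial = {"sunny": 0.8, "rainy": 0.2}
transition = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}


def sequence_probability(states):
    """Probability of visiting `states` in order under the chain above."""
    prob = initial[states[0]]
    for previous, current in zip(states, states[1:]):
        # Markov property: each step depends only on the previous state.
        prob *= transition[previous][current]
    return prob


p = sequence_probability(["sunny", "sunny", "rainy"])
print(p)  # 0.8 * 0.9 * 0.1
```

The probability of the whole sequence is simply the product of one initial probability and one transition probability per step.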

Suppose we have a process X whose states cannot be observed directly. An HMM assumes that there is another, observable process Y whose behavior depends on X. The goal is to learn about X by observing Y. HMMs are widely used in signal processing, among many other fields. 
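
One common way to "learn about X by observing Y" is the Viterbi algorithm, which finds the most likely sequence of hidden states for a given sequence of observations. The sketch below uses a small textbook-style toy model (hidden health states, observed symptoms); all names and probabilities are assumptions for illustration, not part of a speech recognizer.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely hidden state path, probability of that path)."""
    # best[t][s] = (probability of the best path ending in s at step t,
    #               previous state on that path)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, prev = max((best[-1][p][0] * trans_p[p][s], p) for p in states)
            column[s] = (prob * emit_p[s][obs], prev)
        best.append(column)

    # Backtrack from the most likely final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Hidden process X: health. Observed process Y: symptoms.
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}

path, prob = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
print(path, prob)
```

In speech recognition the hidden states would correspond to units such as phonemes and the observations to acoustic feature vectors, but the decoding idea is the same.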

Many speech recognition systems rely on these models. The approach assumes that a speech signal, when viewed on a short enough timescale, can be reasonably approximated as a stationary process, that is, a process whose statistical properties do not change over time.

In a typical HMM-based system, the speech signal is divided into 10 ms fragments. The power spectrum of each fragment is mapped to a vector of real numbers known as cepstral coefficients. To decode the speech into text, groups of vectors are matched to one or more phonemes. This calculation requires training, since the sound of a phoneme varies from speaker to speaker. A special algorithm is then applied to determine the most likely words that produce the given sequence of phonemes.
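
The framing and feature extraction steps can be sketched as follows. This is a deliberately simplified illustration using only the standard library: it computes a plain real cepstrum with a naive DFT, whereas real systems usually use overlapping windows and mel-frequency cepstral coefficients. The function names, the 8 kHz sample rate, and the synthetic sine wave are all assumptions made for the example.

```python
import cmath
import math


def split_into_frames(samples, sample_rate, frame_ms=10):
    """Cut the signal into consecutive, non-overlapping frames of frame_ms."""
    frame_len = sample_rate * frame_ms // 1000
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def dft(values):
    """Naive discrete Fourier transform (fine for very short frames)."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n)
                for t, v in enumerate(values))
            for k in range(n)]


def real_cepstrum(frame, n_coeffs=13):
    """First n_coeffs of the inverse DFT of the log power spectrum."""
    n = len(frame)
    # Small constant avoids log(0) for empty frequency bins.
    log_power = [math.log(abs(x) ** 2 + 1e-12) for x in dft(frame)]
    return [sum(lp * cmath.exp(2j * math.pi * k * t / n)
                for k, lp in enumerate(log_power)).real / n
            for t in range(n_coeffs)]


# Synthetic stand-in for speech: 50 ms of a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
samples = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(400)]

frames = split_into_frames(samples, sample_rate)
features = [real_cepstrum(frame) for frame in frames]
print(len(frames), len(features[0]))
```

Each 10 ms frame ends up as one short vector of real numbers, which is the kind of observation an HMM would then match against phonemes.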

## Importing the libraries

### The Speech Recognition library

This library performs speech recognition and supports several engines and APIs, such as the Google Web Speech API, Google Cloud Speech, IBM Speech to Text, and CMU Sphinx.

In [2]:
import speech_recognition as sr

### The speech

In [3]:
speech = './tests/Céline-Paris.wav'

### Initializing the speech recognition

In [4]:
r = sr.Recognizer()

In [6]:
with sr.AudioFile(speech) as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    # recognize (convert from speech to text)
    text = r.recognize_sphinx(audio_data)
    print(text)

it was in on that badly


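Well, I forgot to set the language: recognize_sphinx defaults to American English (en-US), so the French audio was decoded as English.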

In [8]:
with sr.AudioFile(speech) as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    # recognize (convert from speech to text)
    text = r.recognize_sphinx(audio_data, language="fr-FR")
    print(text)

je vous allez me rendre à paris


That already looks better. The original sentence was: "je voudrais me rendre à paris"
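
To put a number on "better", one option is the word error rate (WER): the word-level edit distance between the reference sentence and the transcription, divided by the number of reference words. The helper below is a minimal sketch written for this notebook (the name `word_error_rate` is hypothetical and not part of the SpeechRecognition library); it compares the two outputs shown above against the original sentence.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # previous[j] = edit distance between the processed ref words and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(previous[j] + 1,          # word missing from hypothesis
                               current[j - 1] + 1,       # extra word in hypothesis
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1] / len(ref)


reference = "je voudrais me rendre à paris"
wer_en = word_error_rate(reference, "it was in on that badly")
wer_fr = word_error_rate(reference, "je vous allez me rendre à paris")
print(f"en-US model: {wer_en:.2f}, fr-FR model: {wer_fr:.2f}")
```

A lower value is better: 0 means a perfect transcription, and values of 1 or more mean that at least as many word edits are needed as there are words in the reference.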

In [11]:
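def get_sentance(audiospeech):
    """Transcribe a French audio file with Sphinx and return the text."""
    # use the function argument, not the global `speech` variable
    with sr.AudioFile(audiospeech) as source:
        # load the whole file into memory
        audio_data = r.record(source)
        # recognize (convert from speech to text)
        text = r.recognize_sphinx(audio_data, language="fr-FR")
    return text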

### Let's try

In [None]:
get_sentance(speech)

## Sources

https://realpython.com/python-speech-recognition/

https://en.wikipedia.org/wiki/Hidden_Markov_model

http://igm.univ-mlv.fr/~dr/XPOSE2012/HiddenMarkovModel/description.html

https://cmusphinx.github.io/wiki/tutorialconcepts/ 

https://pypi.org/project/SpeechRecognition/ 