GitHub - FarzadForuozanfar/Speech-Recognition: I recorded 10 voice samples of myself saying the same words and compared them with 10 samples from another person. I was able to find a threshold that distinguishes and recognizes my own voice.

Project Description:

In this project, I first converted my recorded sounds to text using the librosa, dtw, and speech_recognition libraries. This is itself a way of recognizing speech, the so-called text-dependent approach. A more reliable approach uses Mel or cepstral coefficients (for example, Mel-frequency cepstral coefficients, MFCCs).

DTW (Dynamic Time Warping):

In time series analysis, dynamic time warping (DTW) is an algorithm for measuring similarity between two temporal sequences, which may vary in speed. For instance, similarities in walking could be detected using DTW, even if one person was walking faster than the other, or if there were accelerations and decelerations during the course of an observation. DTW has been applied to temporal sequences of video, audio, and graphics data — indeed, any data that can be turned into a linear sequence can be analyzed with DTW. A well-known application has been automatic speech recognition, to cope with different speaking speeds. Other applications include speaker recognition and online signature recognition. It can also be used in partial shape matching applications.

In general, DTW is a method that calculates an optimal match between two given sequences (e.g. time series) subject to the following restrictions and rules:

Every index from the first sequence must be matched with one or more indices from the other sequence, and vice versa
The first index from the first sequence must be matched with the first index from the other sequence (but it does not have to be its only match)
The last index from the first sequence must be matched with the last index from the other sequence (but it does not have to be its only match)
The mapping of the indices from the first sequence to indices from the other sequence must be monotonically increasing, and vice versa, i.e. if j>i are indices from the first sequence, then there must not be two indices l>k in the other sequence, such that index i is matched with index l and index j is matched with index k, and vice versa
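
The rules above can be sketched as a small dynamic-programming routine. This is only an illustrative sketch in plain Python, not the code used in this repository: the function name `dtw_distance`, the absolute-difference local cost, and the choice of step pattern are assumptions made for the example. It works on short lists of numbers rather than on real audio features.

```python
import math


def dtw_distance(a, b):
    """Return (total_cost, path) for two non-empty 1-D numeric sequences.

    Hypothetical helper for illustration; local cost is |a[i] - b[j]|.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    # The extra row/column of infinities forces the first indices to match.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Only "up", "left" and "diagonal" predecessors are allowed,
            # which keeps the mapping monotonically increasing.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])

    # Backtrack from the last pair of indices to the first pair.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    # The second sequence is the first one with one element repeated.
    total, path = dtw_distance([1, 2, 3], [1, 2, 2, 3])
    print(total, path)
```

In the usage example the second sequence is just a "slower" version of the first, so the accumulated cost is zero even though the lengths differ. The returned path starts at the first pair of indices, ends at the last pair, and never moves backwards, which corresponds to the boundary and monotonicity rules listed above.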

My voice signals, plotted with `matplotlib.pyplot`:

Another person's voice signals, plotted with `matplotlib.pyplot`:

Comparison of two audio signals (mine and another person's) using `dtw.plot()`:

Distances between my own voice recordings, calculated with `dtw.distance()`:

Distances between my voice recordings and another person's, calculated with `dtw.distance()`:
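
One possible way to turn the two sets of distances into a decision threshold is sketched below. The README does not say how the threshold was actually chosen, so the midpoint rule, the function names `pick_threshold` and `is_my_voice`, and the numbers in the usage example are all assumptions made purely for illustration; the numbers are placeholders, not measurements from this project.

```python
def pick_threshold(same_speaker, other_speaker):
    """Return a threshold halfway between the two groups of distances.

    same_speaker:  DTW distances between recordings of my own voice.
    other_speaker: DTW distances between my recordings and another person's.
    Returns None when the groups overlap, because no single cut-off
    separates them in that case.
    """
    worst_same = max(same_speaker)    # largest distance among my own recordings
    best_other = min(other_speaker)   # smallest distance to the other speaker
    if worst_same >= best_other:
        return None
    return (worst_same + best_other) / 2.0


def is_my_voice(distance, threshold):
    """Accept a recording when its distance falls below the threshold."""
    return distance < threshold


if __name__ == "__main__":
    # Placeholder values only; replace with distances computed from real audio.
    same = [1.0, 1.5, 2.0]
    other = [4.0, 5.0, 6.0]
    threshold = pick_threshold(same, other)
    print(threshold, is_my_voice(1.8, threshold), is_my_voice(4.5, threshold))
```

If the two groups of distances overlap, the function returns None instead of a number, which signals that a simple threshold cannot separate the speakers and that different features or more recordings would be needed.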
