Flask-Application-Development-for-Speech-Recognition

What is Flask?

Flask is a Python web framework that makes it easy to build web applications. It is built on the Werkzeug WSGI (Web Server Gateway Interface) toolkit and the Jinja2 template engine. Read more about Flask here.

What is Speech Recognition?

Speech recognition is the task of transcribing spoken audio into its corresponding text. Various Python libraries can do this, including SpeechRecognition, deepspeech, google-cloud-speech, watson-developer-cloud, and wit. You can look at this and this website to learn more about speech recognition; they really helped me understand the concepts and workings behind these libraries.

Speech to Text Transcription using Flask application

This repository shows how to develop a Flask application for speech recognition, in which you can either upload an audio file or record your own audio to get the transcribed text. I have used the SpeechRecognition library to recognize the speech and convert it into text, but you can use any library that suits your requirements. You can check this link for a step-by-step guide to installing a Flask web application. You can also go through this GitHub example to explore more options for speech recognition.

For installation of the SpeechRecognition library, see this.
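To make the upload flow concrete, here is a minimal sketch of a Flask endpoint that accepts an audio file and returns its transcript. It is not the code in app.py: the route `/transcribe`, the form field name `audio`, and the functions `create_app` and `transcribe_with_speechrecognition` are assumptions chosen for illustration. The default transcriber uses SpeechRecognition's `recognize_google`, which needs network access and a WAV, AIFF, or FLAC file.

```python
import os
import tempfile

from flask import Flask, jsonify, request


def transcribe_with_speechrecognition(path):
    """Transcribe a WAV/AIFF/FLAC file with the SpeechRecognition package."""
    # Imported lazily so the web app can be created without the package installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    # Sends the audio to Google's web speech API; requires network access.
    return recognizer.recognize_google(audio)


def create_app(transcribe=transcribe_with_speechrecognition):
    """Build the app; `transcribe` is any callable mapping a file path to text."""
    app = Flask(__name__)

    @app.route("/transcribe", methods=["POST"])
    def transcribe_upload():
        upload = request.files.get("audio")  # "audio" is an assumed field name
        if upload is None or upload.filename == "":
            return jsonify(error="no audio file uploaded"), 400

        # Save the upload to a temporary file so the transcriber can open it by path.
        suffix = os.path.splitext(upload.filename)[1]
        fd, path = tempfile.mkstemp(suffix=suffix)
        os.close(fd)
        try:
            upload.save(path)
            text = transcribe(path)
        finally:
            os.remove(path)
        return jsonify(text=text)

    return app
```

To try it locally, one option is to call `create_app().run(debug=True)` and post a file to `/transcribe`. Passing the transcriber as a parameter keeps the web layer independent of the recognition library, so another library can be swapped in without touching the route.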

Speech/Audio Analysis

Number of words per minute

This feature reports the number of words spoken per minute in the audio, that is, the speaking rate. It tells us whether the person is speaking slowly (few words per minute) or quickly (many words per minute).
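As a rough illustration, the words-per-minute figure can be computed from a transcript and the audio duration. The sketch below is not taken from app.py; the function names are hypothetical, and it assumes words are separated by whitespace and that the audio is a WAV file readable by Python's standard `wave` module.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def words_per_minute(transcript, duration_seconds):
    """Return the speaking rate: whitespace-separated words per minute."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    word_count = len(transcript.split())
    return word_count * 60.0 / duration_seconds
```

For example, a six-word transcript spoken over three seconds corresponds to 120 words per minute.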

Energy of Audio

The energy of an audio signal reflects its intensity (loudness) rather than its pitch. The graph below shows the intensity of an audio clip: high intensity means the audio is clearly audible, in other words it contains some voice, whereas low intensity or low energy means the audio contains no voice or is not clearly audible.
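One common way to measure this kind of energy is the root-mean-square (RMS) amplitude of short, consecutive frames. The sketch below uses only the standard library and is an assumption about how such a measure could be computed, not the method used in this repository. The function names and the threshold are hypothetical, and the reader only handles 16-bit mono WAV files.

```python
import array
import math
import sys
import wave


def read_mono_16bit_wav(path):
    """Return (samples, sample_rate) for a 16-bit mono WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV files")
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples, rate


def frame_rms(samples, frame_size):
    """Return the RMS amplitude of each consecutive frame of frame_size samples."""
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    energies = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        energies.append(math.sqrt(sum(s * s for s in frame) / len(frame)))
    return energies


def voiced_frames(energies, threshold):
    """Mark frames whose RMS reaches the (assumed, tunable) threshold."""
    return [energy >= threshold for energy in energies]
```

Frames with a low RMS value correspond to silence or barely audible audio, while frames above the chosen threshold are likely to contain voice. A suitable threshold depends on the recording and has to be tuned.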

Filler Words

Filler words are words or sounds that carry little meaning of their own but are used to fill gaps between sentences while speaking, for example: right, okay, but, umm, yeah, so, yes. Their frequency can indicate whether the person is speaking fluently or relying on filler words to complete sentences. The graph below shows the count of filler words in one of the audio files I used.
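Counting filler words in a transcript can be sketched with the standard library alone. The filler list below is taken from the examples above. The function name `count_filler_words` and the simple tokenization (lowercased runs of letters and apostrophes) are assumptions for illustration, not the repository's implementation.

```python
import re
from collections import Counter

# Filler words listed in this README; extend as needed.
FILLER_WORDS = frozenset({"right", "okay", "but", "umm", "yeah", "so", "yes"})


def count_filler_words(transcript, fillers=FILLER_WORDS):
    """Return a Counter of how often each filler word occurs in the transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return Counter(word for word in words if word in fillers)
```

Because the result is a `Counter`, `count_filler_words(text).most_common()` gives the fillers sorted by frequency, which is a convenient input for a bar chart.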

You can check out the video I made to demonstrate this speech recognition task with Flask using this link, or watch it on my LinkedIn profile.

If you found this helpful, please upvote it; if you find anything wrong, let me know. I'm open to any kind of feedback. Thanks for your time.

