## Project Name : Text Characterization using Speech Recognition

#### In this project we use the SpeechRecognition library to transcribe speech, either from an imported audio file or spoken directly, and then perform different operations on the resulting text.

### Timeline of the project :
- Importing Libraries
- Using Speech Recognition 
- Observing the audio data
- Analyzing the lyrics of a song

In [2]:
import pandas as pd
import numpy as np
import speech_recognition as sr

In [5]:
import IPython.display as ipd

In [3]:
recognizer = sr.Recognizer()
audio_file = sr.AudioFile("voice-data.wav")
type(audio_file)

speech_recognition.AudioFile

### APIs : Several APIs are available for speech-to-text conversion. Some of them are :
- IBM Speech to Text
- CMU Sphinx
- Google Web Speech API (`recognize_google`)
- Google Cloud Speech
- and more

### Google Web Speech API : 

This is a free web service provided by Google, accessed here through the recognizer's `recognize_google` method. It converts recorded speech to text, which we can then process like any other string.

In [6]:
with audio_file as source:
    # Read the whole file; use a separate name so audio_file is not overwritten
    audio_data = recognizer.record(source)
    result = recognizer.recognize_google(audio_data=audio_data)

`record` accepts two optional parameters that select which part of the audio is read :

1) `duration` : Limits how many seconds of audio are read. For example, `duration=5` reads only the first 5 seconds of the audio file.

In [None]:
with sr.AudioFile("voice-data.wav") as source:
    # Read only the first 5 seconds of the file
    audio_data = recognizer.record(source, duration=5.0)
    result1 = recognizer.recognize_google(audio_data=audio_data)

2) `offset` : Skips the given number of seconds at the start of the audio file. For example, `offset=2` skips the first 2 seconds and reads from there on.

In [None]:
with sr.AudioFile("voice-data.wav") as source:
    # Skip the first 2 seconds, then read to the end of the file
    audio_data = recognizer.record(source, offset=2.0)
    result2 = recognizer.recognize_google(audio_data=audio_data)

Combining both : the offset is applied first, so this reads 5 seconds of audio starting at the 2-second mark.

In [None]:
with sr.AudioFile("voice-data.wav") as source:
    # Skip 2 seconds, then read the next 5 seconds
    audio_data = recognizer.record(source, duration=5.0, offset=2.0)
    result3 = recognizer.recognize_google(audio_data=audio_data)

Comparing all the results

In [None]:
print(result)
print(result1)
print(result2)
print(result3)

#### Effect of Noise

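Noise in the background of a recording can introduce errors in the transcription, so it helps to account for it before recognition. Note that `adjust_for_ambient_noise` does not filter the audio: it reads the first `duration` seconds of the source to calibrate the recognizer's energy threshold, so that portion (0.5 seconds here) is consumed and is not included in the subsequent `record` call.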

In [None]:
with sr.AudioFile("voice-data.wav") as source:
    # Calibrate on the first 0.5 seconds, then record the rest of the file
    recognizer.adjust_for_ambient_noise(source, duration=0.5)
    audio = recognizer.record(source)

result4 = recognizer.recognize_google(audio)

In [None]:
print(result4)
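
As a rough illustration of what an energy threshold means, the sketch below measures the root-mean-square (RMS) energy of a noise-only segment and uses it to set a threshold that louder audio must exceed. The synthetic samples, the helper `rms`, and the margin factor of 1.5 are all assumptions chosen for the example; this is a simplified picture, not the library's actual calibration code.

```python
import math
import random


def rms(samples):
    """Root-mean-square energy of a sequence of numeric samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


random.seed(0)
# Hypothetical data: quiet background hiss, then a louder tone mixed with hiss.
noise = [random.gauss(0, 100) for _ in range(4000)]
speech = [3000 * math.sin(i / 5) + random.gauss(0, 100) for i in range(4000)]

# Calibrate on the noise-only segment and require a margin above it.
threshold = 1.5 * rms(noise)

print(rms(noise) > threshold)   # False: background stays below the threshold
print(rms(speech) > threshold)  # True: the louder segment exceeds it
```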

In [None]:
# Split on any whitespace so repeated spaces do not produce empty strings
result_str = result.split()

In [None]:
result_str

In [None]:
print(result_str)

#### How many different words are used?

In [None]:
unique_words = set(result_str)
print(unique_words)

In [None]:
print("The number of different words used: ",len(unique_words))
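
A plain `set` treats "The" and "the" as two different words. The sketch below shows one way to normalize tokens before counting distinct words; the sample `transcript` is made up and stands in for `result`.

```python
import string

# Hypothetical transcript standing in for `result`.
transcript = "The rain in Spain stays mainly in the plain"

# Lowercase each token and strip surrounding punctuation before deduplicating.
words = [w.strip(string.punctuation).lower() for w in transcript.split()]
words = [w for w in words if w]  # drop tokens that were only punctuation
unique_words = set(words)

print(len(words))         # 9 tokens in total
print(len(unique_words))  # 7 distinct words
```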

#### Counting how often each word is repeated

First we store each unique word in a dictionary with an initial count of 0, then we increment the count every time the word occurs.

In [None]:
# Initialize the count of every unique word to 0
word_dict = {}  # An empty dictionary
for word in result_str:
    word_dict[word] = 0
print(word_dict)

In [None]:
for word in result_str:
    word_dict[word] = word_dict[word] + 1
print("Number of times each word was spoken: ", word_dict)
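
The two loops above can also be written with `collections.Counter` from the standard library, which builds the same word-to-count mapping in one step. The word list here is a made-up stand-in for `result_str`.

```python
from collections import Counter

# Hypothetical word list standing in for `result_str`.
words = "the cat and the dog and the bird".split()

# Counter tallies every element of the list in a single pass.
word_counts = Counter(words)

print(word_counts["the"])          # 3
print(word_counts.most_common(2))  # [('the', 3), ('and', 2)]
```

Since `most_common()` returns a list of `(word, count)` pairs, it could be passed to `pd.DataFrame(..., columns=['Word', 'Repetition'])` to get a table already sorted by frequency.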

In [None]:
cols= ['Repetition']
count_df= pd.DataFrame.from_dict(word_dict,orient ='index',columns=cols)

In [None]:
count_df= count_df.reset_index()

In [None]:
count_df= count_df.rename(columns = {'index':'Word'})

In [None]:
count_df

#### Counting the number of words spoken per minute

In [None]:
print("Total number of words: ",len(result_str))

In [None]:
print("Total length of audio: 3.08 minutes ")

In [None]:
print("Total number of words spoken per minute : ",(len(result_str)/3.08))
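
Instead of hard-coding the audio length, the duration of a WAV file can be read from its header. The sketch below assumes a standard PCM WAV file and uses the standard-library `wave` module; the helper names `wav_minutes` and `words_per_minute` and the in-memory silent recording are assumptions made so the example runs on its own.

```python
import io
import wave


def wav_minutes(wav_file):
    """Length of a WAV file in minutes, from its frame count and sample rate."""
    with wave.open(wav_file, "rb") as wav:
        return wav.getnframes() / wav.getframerate() / 60


def words_per_minute(text, minutes):
    """Whitespace-separated word count divided by the duration in minutes."""
    return len(text.split()) / minutes


# Hypothetical stand-in for "voice-data.wav": 30 seconds of silence in memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(8000)
    wav.writeframes(b"\x00\x00" * 8000 * 30)
buf.seek(0)

minutes = wav_minutes(buf)
print(minutes)                                               # 0.5
print(words_per_minute("one two three four five", minutes))  # 10.0
```

With the real recording, `wav_minutes("voice-data.wav")` would take the place of the in-memory buffer, and `result` would take the place of the sample sentence.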

#### The speaking rate is approximately 79 words per minute