## Project Name: Text Characterization using Speech Recognition

#### In this project we use the SpeechRecognition library to transcribe speech, either from an imported audio file or spoken directly, and then perform several operations on the resulting text.

### Timeline of the project:
- Importing Libraries
- Using Speech Recognition 
- Observing the audio data
- Analyzing the lyrics of a song

In [1]:
import pandas as pd
import numpy as np
import speech_recognition as sr

In [2]:
import IPython.display as ipd

In [9]:
recognizer = sr.Recognizer()
audio_file = sr.AudioFile("voice-data.wav")
type(audio_file)

speech_recognition.AudioFile

### APIs: Several APIs can perform speech-to-text conversion, including:
- IBM Speech to Text
- CMU Sphinx
- Google Web Speech API (`recognize_google`)
- Google Cloud Speech

And more.

### Google Web Speech API:

This is a free web service provided by Google, called through the recognizer's `recognize_google` method. It converts audio speech to text, which we can then process further.

In [3]:
recognizer = sr.Recognizer()
audio_file = sr.AudioFile("voice-data.wav")
with audio_file as source:
    audio_file = recognizer.record(source)
    result=recognizer.recognize_google(audio_data=audio_file)

In [7]:
result

"awesome would you be then this meeting in so that we don't have nobody"

Two parameters of `record` control which part of the audio is read:

1) duration: limits how many seconds of audio are read. For example, `duration=5` reads only the first 5 seconds of the audio file. The cell below uses `duration=10`.

In [7]:
recognizer = sr.Recognizer()
audio_file = sr.AudioFile("voice-data.wav")
with audio_file as source:
    audio_file = recognizer.record(source,duration=10)
    result1=recognizer.recognize_google(audio_data=audio_file)

In [8]:
result1

"awesome would you be then this meeting in so that we don't have nobody"

2) offset: skips audio at the start of the file. For example, if you don't want the first 2 seconds, set `offset=2` and reading starts 2 seconds in.

In [9]:
recognizer = sr.Recognizer()
audio_file = sr.AudioFile("voice-data.wav")
with audio_file as source:
    audio_file = recognizer.record(source,offset=2)
    result2=recognizer.recognize_google(audio_data=audio_file)

In [10]:
result2

"awesome okay would you be then locking this meeting in so that we don't have nobody"

Combining both: skip the first 2 seconds, then read the next 5 seconds

In [12]:
recognizer = sr.Recognizer()
audio_file = sr.AudioFile("voice-data.wav")
with audio_file as source:
    audio_file = recognizer.record(source, duration= 5.0, offset = 2.0)
    result3 = recognizer.recognize_google(audio_data=audio_file)

Comparing all the results

In [13]:
print(result)
print(result1)
print(result2)
print(result3)

awesome okay would you be then clocking this meeting in so that we don't have nobody cuz once the division of penis.
awesome would you be then this meeting in so that we don't have nobody
awesome okay would you be then locking this meeting in so that we don't have nobody
awesome okay would you be then
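Since `offset` and `duration` together select a time window, a long recording could be covered by a sequence of consecutive windows. The sketch below only computes those windows. `chunk_windows` is a hypothetical helper, and the 25-second length is an assumed example value, not a property of `voice-data.wav`.

```python
def chunk_windows(total_seconds, chunk_seconds):
    """Return (offset, duration) pairs that cover total_seconds in order."""
    total_seconds = float(total_seconds)
    chunk_seconds = float(chunk_seconds)
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    windows = []
    offset = 0.0
    while offset < total_seconds:
        # The last window is shortened so it does not run past the end.
        duration = min(chunk_seconds, total_seconds - offset)
        windows.append((offset, duration))
        offset += chunk_seconds
    return windows


print(chunk_windows(25, 10))
# [(0.0, 10.0), (10.0, 10.0), (20.0, 5.0)]
```

Each pair could then be passed as `offset` and `duration` to `recognizer.record` on a freshly opened `AudioFile`.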


#### Effect of Noise

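Background noise can introduce errors in the transcription, so the recognizer should account for it. `adjust_for_ambient_noise` calibrates the recognizer's energy threshold using the start of the audio (1 second by default). That portion is consumed, so the `record` call that follows begins after it.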

In [31]:
recognizer = sr.Recognizer()
audio_file = sr.AudioFile("voice-data.wav")
with audio_file as source:
    recognizer.adjust_for_ambient_noise(source)
    audio = recognizer.record(source)

result4= recognizer.recognize_google(audio)

In [32]:
print(result4)

awesome would you be then this meeting in so that we don't have nobody


In [33]:
result_str= result.split(' ')

In [34]:
result_str

['awesome',
 'okay',
 'would',
 'you',
 'be',
 'then',
 'clocking',
 'this',
 'meeting',
 'in',
 'so',
 'that',
 'we',
 "don't",
 'have',
 'nobody',
 'cuz',
 'once',
 'the',
 'division',
 'of',
 'penis.']

In [35]:
print(result_str)

['awesome', 'okay', 'would', 'you', 'be', 'then', 'clocking', 'this', 'meeting', 'in', 'so', 'that', 'we', "don't", 'have', 'nobody', 'cuz', 'once', 'the', 'division', 'of', 'penis.']
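Splitting on spaces leaves punctuation attached to words: the last token in the list above keeps its trailing period, so it would be counted separately from the same word without one. A minimal sketch of a stricter tokenizer, using a made-up sentence and a hypothetical helper named `tokenize`:

```python
import re


def tokenize(text):
    """Lowercase the text and return its words without surrounding punctuation."""
    # Letters and digits, optionally joined by apostrophes (e.g. "don't").
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())


print(tokenize("Okay, we don't have the room. Okay?"))
# ['okay', 'we', "don't", 'have', 'the', 'room', 'okay']
```

The pattern assumes English text. Other languages would need a different character class.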


#### How many different words are used?

In [36]:
unique_words = set(result_str)
print(unique_words)

{'okay', 'in', "don't", 'that', 'you', 'awesome', 'cuz', 'clocking', 'of', 'have', 'once', 'be', 'meeting', 'we', 'the', 'nobody', 'penis.', 'so', 'would', 'then', 'division', 'this'}


In [37]:
print("The number of different words used: ",len(unique_words))

The number of different words used:  22


#### Count the repetition of words

First we store each unique word in a dictionary with an initial count of zero

In [38]:
# Initialise the count of every word in the transcript to zero
word_dict = {}  # an empty dictionary
for word in result_str:
    word_dict[word] = 0
print(word_dict)

{'awesome': 0, 'okay': 0, 'would': 0, 'you': 0, 'be': 0, 'then': 0, 'clocking': 0, 'this': 0, 'meeting': 0, 'in': 0, 'so': 0, 'that': 0, 'we': 0, "don't": 0, 'have': 0, 'nobody': 0, 'cuz': 0, 'once': 0, 'the': 0, 'division': 0, 'of': 0, 'penis.': 0}


In [39]:
for word in result_str:
    word_dict[word] = word_dict[word] + 1
print("The count for each word spoken number of times are: ",word_dict)

The count for each word spoken number of times are:  {'awesome': 1, 'okay': 1, 'would': 1, 'you': 1, 'be': 1, 'then': 1, 'clocking': 1, 'this': 1, 'meeting': 1, 'in': 1, 'so': 1, 'that': 1, 'we': 1, "don't": 1, 'have': 1, 'nobody': 1, 'cuz': 1, 'once': 1, 'the': 1, 'division': 1, 'of': 1, 'penis.': 1}
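The two loops above can also be written with `collections.Counter` from the standard library, which builds the same word-to-count mapping in one step. The sentence below is a made-up example chosen so that some words repeat.

```python
from collections import Counter

words = "so that we know that we can".split()

# Counter maps each distinct word to the number of times it occurs.
word_counts = Counter(words)

print(word_counts["that"])         # 2
print(word_counts.most_common(2))  # [('that', 2), ('we', 2)]
```

A `Counter` is a `dict` subclass, so it can be passed to `pd.DataFrame.from_dict` in the same way as `word_dict`.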


In [40]:
cols= ['Repetition']
count_df= pd.DataFrame.from_dict(word_dict,orient ='index',columns=cols)

In [41]:
count_df= count_df.reset_index()

In [42]:
count_df= count_df.rename(columns = {'index':'Word'})

In [43]:
count_df

| | Word | Repetition |
|---|---|---|
| 0 | awesome | 1 |
| 1 | okay | 1 |
| 2 | would | 1 |
| 3 | you | 1 |
| 4 | be | 1 |
| 5 | then | 1 |
| 6 | clocking | 1 |
| 7 | this | 1 |
| 8 | meeting | 1 |
| 9 | in | 1 |


#### Counting the number of words spoken per minute

In [44]:
print("Total number of words: ",len(result_str))

Total number of words:  22


In [45]:
print("Total length of audio: 3.08 minutes ")

Total length of audio: 3.08 minutes 


In [46]:
print("Total number of words spoken per minute : ",(len(result_str)/3.08))

Total number of words spoken per minute :  7.142857142857142
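The audio length above is typed in by hand. For an uncompressed WAV file it could instead be read from the file header with the standard-library `wave` module. The sketch below assumes such a file. `wav_duration_minutes` and `words_per_minute` are hypothetical helper names.

```python
import wave


def wav_duration_minutes(path):
    """Return the length of a WAV file in minutes."""
    with wave.open(path, "rb") as wav:
        # Number of frames divided by frames per second gives seconds.
        seconds = wav.getnframes() / wav.getframerate()
    return seconds / 60


def words_per_minute(word_count, minutes):
    """Return the average number of words spoken per minute."""
    return word_count / minutes
```

With the data in this notebook the call would be `words_per_minute(len(result_str), wav_duration_minutes("voice-data.wav"))`. The rate is only meaningful when the transcript covers the whole recording.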
