## Project Name : Text Characterization using Speech Recognition

#### In this project we will see how to use speech recognition libraries to characterize audio speech files, either imported from disk or spoken aloud. We will then perform several operations on the text produced by the speech-to-text conversion.

### Timeline of the project :
- Importing Libraries
- Using Speech Recognition 
- Observing the audio data
- Analyzing the transcribed text

In [1]:
pip install SpeechRecognition


Note: you may need to restart the kernel to use updated packages.


In [2]:
import pandas as pd 
import numpy as np
import speech_recognition as sr

In [3]:
import IPython.display as ipd

In [4]:
recognizer = sr.Recognizer()
audio_file = sr.AudioFile("voice-data.wav")
type(audio_file)

speech_recognition.AudioFile

### APIs : Several APIs perform speech-to-text conversion, including :
- IBM Speech to Text
- CMU Sphinx
- Google Web Speech API (`recognize_google`)
- Google Cloud Speech
- And more

### Google Web Speech API : 

This is a free web service provided by Google, available in the library as `recognize_google`. It converts audio speech files to text, which we can then process further.

In [5]:
with audio_file as source:
    # Read the whole file into an AudioData object (kept separate from the AudioFile)
    audio_data = recognizer.record(source)
    result = recognizer.recognize_google(audio_data=audio_data)
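Because the Google Web Speech API is a remote service, a call can fail when the audio is unintelligible or the service cannot be reached. The following is a minimal sketch of a wrapper that handles both cases. The helper name `transcribe_file` is hypothetical. `sr.UnknownValueError` and `sr.RequestError` are the exceptions the library raises for these two situations.

```python
def transcribe_file(path, duration=None, offset=None):
    """Return the transcript of an audio file, or None if recognition fails.

    Hypothetical helper for illustration; not part of the library.
    """
    # Imported lazily so the helper can be defined before the package is installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio_data = recognizer.record(source, duration=duration, offset=offset)

    try:
        return recognizer.recognize_google(audio_data)
    except sr.UnknownValueError:
        # The service could not make sense of the audio.
        print("Speech was unintelligible")
    except sr.RequestError as error:
        # The request itself failed, for example because there is no network.
        print(f"Could not reach the recognition service: {error}")
    return None
```

Calling `transcribe_file("voice-data.wav")` would then return either the transcript or `None`, without stopping the notebook on a failed request.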

`record` accepts two optional arguments that control which part of the audio is read :

1) Duration : `duration` limits how many seconds of audio are read. For example, `duration=5` reads only the first 5 seconds of the audio file.

In [6]:
audio_file_path = 'voice-data.wav'

# Open the audio file
with sr.AudioFile(audio_file_path) as source:
    # Record only the first 5 seconds
    audio_data1 = recognizer.record(source, duration=5.0)
    result1 = recognizer.recognize_google(audio_data=audio_data1)

2) Offset : `offset` skips the given number of seconds at the start of the audio file. For example, `offset=2` skips the first 2 seconds and reads from there.

In [7]:
audio_file1 = 'voice-data.wav'

# Open the audio file
with sr.AudioFile(audio_file1) as source:
    # Skip the first 2 seconds, then record the rest
    audio_data2 = recognizer.record(source, offset=2.0)
    result2 = recognizer.recognize_google(audio_data=audio_data2)

Combining both : skip the first 2 seconds, then read the next 5 seconds

In [8]:
audio_file2 = 'voice-data.wav'

# Open the audio file
with sr.AudioFile(audio_file2) as source:
    # Skip the first 2 seconds, then record the next 5 seconds
    audio_data3 = recognizer.record(source, duration=5.0, offset=2.0)
    result3 = recognizer.recognize_google(audio_data=audio_data3)
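To make the interaction of the two arguments concrete, here is a small sketch that computes which time window of a file ends up being recorded. The function `recorded_window` is a hypothetical helper that models the behaviour described above; it is not the library's implementation. The total length of 184.8 seconds assumes the 3.08-minute audio used in this notebook.

```python
def recorded_window(total_seconds, offset=None, duration=None):
    """Return the (start, end) window in seconds that would be recorded."""
    # offset skips audio from the start; it cannot go past the end of the file.
    start = min(offset or 0.0, total_seconds)
    # duration counts from the start of the window, not from the start of the file.
    if duration is None:
        end = total_seconds
    else:
        end = min(start + duration, total_seconds)
    return start, end


print(recorded_window(184.8, duration=5.0))              # (0.0, 5.0)
print(recorded_window(184.8, offset=2.0))                # (2.0, 184.8)
print(recorded_window(184.8, offset=2.0, duration=5.0))  # (2.0, 7.0)
```

With both arguments set, the window covers seconds 2 to 7 of the file rather than seconds 2 to 5.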

Comparing all the results

In [9]:
print(result)
print(result1)
print(result2)
print(result3)

would you be then you know locking this meeting in so that we don't have more participants because once the division of team is done in stuff like that what does it what is it ok if people if more people join in the game of the game and stuff like that most people would know about it so we playing code names guys it's a fun board game just enjoy yourself understand most of you will be joining us cooperative society
then would you be then
would you be then you know locking this meeting in so that we don't have more participants because once the division of team is done in stuff like that what does it what is it ok people if more people join in the game I think people why don't we start rules of the game and stuff like that most people would know about it so we playing code names guys it's a fun board game just enjoy yourself understand most of you will be joining us operations will be joining as operator
would you be then you know lock


#### Effect of Noise

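Noise in the background can introduce errors into the results, so it helps to account for it before recognition. `adjust_for_ambient_noise` reads the first `duration` seconds of the source to calibrate the recognizer's energy threshold. Because it reads from the same source, the `record` call that follows starts after that calibration window.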

In [10]:
audio_file2 = 'voice-data.wav'

# Open the audio file
with sr.AudioFile(audio_file2) as source:
    recognizer.adjust_for_ambient_noise(source, duration=0.5)
    audio = recognizer.record(source)

result4= recognizer.recognize_google(audio)

In [11]:
print(result4)

awesome OK then you know locking this meeting in so that we don't have more participants because one of the division of team is done in stuff like that ok what does it what is it ok people if more people join in the game I think we have about 18 people why don't we start rules of the game and stuff like that most people would know about it so we playing code names guys it's a fun board game just enjoy yourself don't understand how many of you know the games and most of you will be joining as Cooperative you can see operators here so both of the things will be joining as operator


In [12]:
result_str= result.split(' ')

In [13]:
result_str

['would',
 'you',
 'be',
 'then',
 'you',
 'know',
 'locking',
 'this',
 'meeting',
 'in',
 'so',
 'that',
 'we',
 "don't",
 'have',
 'more',
 'participants',
 'because',
 'once',
 'the',
 'division',
 'of',
 'team',
 'is',
 'done',
 'in',
 'stuff',
 'like',
 'that',
 'what',
 'does',
 'it',
 'what',
 'is',
 'it',
 'ok',
 'if',
 'people',
 'if',
 'more',
 'people',
 'join',
 'in',
 'the',
 'game',
 'of',
 'the',
 'game',
 'and',
 'stuff',
 'like',
 'that',
 'most',
 'people',
 'would',
 'know',
 'about',
 'it',
 'so',
 'we',
 'playing',
 'code',
 'names',
 'guys',
 "it's",
 'a',
 'fun',
 'board',
 'game',
 'just',
 'enjoy',
 'yourself',
 'understand',
 'most',
 'of',
 'you',
 'will',
 'be',
 'joining',
 'us',
 'cooperative',
 'society']

In [14]:
print(result_str)

['would', 'you', 'be', 'then', 'you', 'know', 'locking', 'this', 'meeting', 'in', 'so', 'that', 'we', "don't", 'have', 'more', 'participants', 'because', 'once', 'the', 'division', 'of', 'team', 'is', 'done', 'in', 'stuff', 'like', 'that', 'what', 'does', 'it', 'what', 'is', 'it', 'ok', 'if', 'people', 'if', 'more', 'people', 'join', 'in', 'the', 'game', 'of', 'the', 'game', 'and', 'stuff', 'like', 'that', 'most', 'people', 'would', 'know', 'about', 'it', 'so', 'we', 'playing', 'code', 'names', 'guys', "it's", 'a', 'fun', 'board', 'game', 'just', 'enjoy', 'yourself', 'understand', 'most', 'of', 'you', 'will', 'be', 'joining', 'us', 'cooperative', 'society']


#### How many different words are used?

In [15]:
unique_words = set(result_str)
print(unique_words)

{'guys', 'know', 'have', 'that', 'team', 'then', 'most', 'so', "don't", 'is', 'playing', 'done', 'ok', 'be', 'if', 'more', 'once', 'us', 'society', 'just', 'understand', 'names', 'we', 'joining', 'board', 'it', 'the', 'about', 'join', 'code', "it's", 'will', 'cooperative', 'a', 'what', 'people', 'yourself', 'because', 'division', 'locking', 'participants', 'like', 'enjoy', 'does', 'game', 'you', 'of', 'would', 'fun', 'and', 'in', 'stuff', 'meeting', 'this'}


In [16]:
print("The number of different words used: ",len(unique_words))

The number of different words used:  54
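The count above depends on how the transcript is split into words. As a hedged sketch, the helper below (a hypothetical `tokenize`, not part of the notebook) lowercases the text, splits on any run of whitespace, and strips punctuation from the edges of each token, so that `Game`, `game,` and `game` count as one word. Apostrophes inside a word, as in `don't`, are kept.

```python
import string


def tokenize(text):
    """Split text into lowercase words with edge punctuation removed."""
    words = []
    # split() with no argument collapses repeated spaces, unlike split(' ').
    for token in text.lower().split():
        token = token.strip(string.punctuation)
        if token:
            words.append(token)
    return words


sample = "Would you  be then, you know"
print(tokenize(sample))            # ['would', 'you', 'be', 'then', 'you', 'know']
print(len(set(tokenize(sample))))  # 5
```

For the transcript in this notebook the result should be the same as with `split(' ')`, since the recognizer returns lowercase, single-spaced text for most words; the normalization matters more for text from other sources.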


#### Count the repetition of words

First we will store each unique word in a dictionary with an initial count of zero

In [17]:
# To count how often each word appears, first add every word to the dictionary with a count of 0
word_dict = {}  # An empty dictionary
for word in result_str:
    word_dict[word] = 0
print(word_dict)

{'would': 0, 'you': 0, 'be': 0, 'then': 0, 'know': 0, 'locking': 0, 'this': 0, 'meeting': 0, 'in': 0, 'so': 0, 'that': 0, 'we': 0, "don't": 0, 'have': 0, 'more': 0, 'participants': 0, 'because': 0, 'once': 0, 'the': 0, 'division': 0, 'of': 0, 'team': 0, 'is': 0, 'done': 0, 'stuff': 0, 'like': 0, 'what': 0, 'does': 0, 'it': 0, 'ok': 0, 'if': 0, 'people': 0, 'join': 0, 'game': 0, 'and': 0, 'most': 0, 'about': 0, 'playing': 0, 'code': 0, 'names': 0, 'guys': 0, "it's": 0, 'a': 0, 'fun': 0, 'board': 0, 'just': 0, 'enjoy': 0, 'yourself': 0, 'understand': 0, 'will': 0, 'joining': 0, 'us': 0, 'cooperative': 0, 'society': 0}


In [18]:
for word in result_str:
    word_dict[word] = word_dict[word] + 1
print("The count for each word spoken number of times are: ",word_dict)

The count for each word spoken number of times are:  {'would': 2, 'you': 3, 'be': 2, 'then': 1, 'know': 2, 'locking': 1, 'this': 1, 'meeting': 1, 'in': 3, 'so': 2, 'that': 3, 'we': 2, "don't": 1, 'have': 1, 'more': 2, 'participants': 1, 'because': 1, 'once': 1, 'the': 3, 'division': 1, 'of': 3, 'team': 1, 'is': 2, 'done': 1, 'stuff': 2, 'like': 2, 'what': 2, 'does': 1, 'it': 3, 'ok': 1, 'if': 2, 'people': 3, 'join': 1, 'game': 3, 'and': 1, 'most': 2, 'about': 1, 'playing': 1, 'code': 1, 'names': 1, 'guys': 1, "it's": 1, 'a': 1, 'fun': 1, 'board': 1, 'just': 1, 'enjoy': 1, 'yourself': 1, 'understand': 1, 'will': 1, 'joining': 1, 'us': 1, 'cooperative': 1, 'society': 1}
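The two loops above can also be written with `collections.Counter` from the standard library, which initializes and increments the counts in one step. The sketch below uses a short sample taken from the start of the transcript rather than the full `result_str` list.

```python
from collections import Counter

# The first ten words of the transcript, used here as a stand-in for result_str
sample_words = "would you be then you know locking this meeting in".split()

word_counts = Counter(sample_words)

print(word_counts["you"])          # 2
print(word_counts["absent"])       # 0, missing words count as zero
print(word_counts.most_common(1))  # [('you', 2)]
```

`most_common(n)` returns the `n` most frequent words, which is a quick way to see the repetitions without building a table first.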


In [19]:
cols= ['Repetition']
count_df= pd.DataFrame.from_dict(word_dict,orient ='index',columns=cols)

In [20]:
count_df= count_df.reset_index()

In [21]:
count_df= count_df.rename(columns = {'index':'Word'})

In [22]:
count_df

| | Word | Repetition |
|---|---|---|
| 0 | would | 2 |
| 1 | you | 3 |
| 2 | be | 2 |
| 3 | then | 1 |
| 4 | know | 2 |
| 5 | locking | 1 |
| 6 | this | 1 |
| 7 | meeting | 1 |
| 8 | in | 3 |
| 9 | so | 2 |


#### Counting the number of words spoken per minute

In [23]:
print("Total number of words: ",len(result_str))

Total number of words:  82


In [24]:
print("Total length of audio: 3.08 minutes ")

Total length of audio: 3.08 minutes 


In [25]:
print("Total number of words spoken per minute : ",(len(result_str)/3.08))

Total number of words spoken per minute :  26.623376623376622
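The audio length above is typed in by hand. As a sketch under the assumption that the recording is an uncompressed PCM WAV file, the length can instead be read from the file with the standard-library `wave` module. The helper names `wav_duration_minutes` and `words_per_minute` are hypothetical.

```python
import wave


def wav_duration_minutes(path):
    """Return the length of a PCM WAV file in minutes."""
    with wave.open(path, "rb") as wav_file:
        # Number of frames divided by frames per second gives seconds.
        seconds = wav_file.getnframes() / wav_file.getframerate()
    return seconds / 60


def words_per_minute(words, minutes):
    """Return the average number of words spoken per minute."""
    return len(words) / minutes
```

In the notebook this could be used as `words_per_minute(result_str, wav_duration_minutes('voice-data.wav'))`, which keeps the rate consistent with the actual file if the audio changes.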


#### The speaker averages approximately 27 words per minute