<a href="https://colab.research.google.com/github/MK316/workspace/blob/main/ASR01/ASR_speechrecognition.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Generate a sound and run SR (Speech Recognition)

: Create speech using gTTS and check its recognition using {Whisper}

## [1] Setting up {gTTS}

In [56]:
#@markdown {gTTS}, etts(text), ktts(text)
%%capture
!pip install gTTS
from gtts import gTTS
from IPython.display import Audio

def etts(text):
  text_to_say = text

  gtts_object = gTTS(text = text_to_say,
                     lang = "en",
                    slow = False)
  
  gtts_object.save("E-audio.wav")
  return Audio("E-audio.wav")


def ktts(text):
  text_to_say = text

  gtts_object = gTTS(text = text_to_say,
                     lang = "ko",
                    slow = False)
  
  gtts_object.save("K-audio.wav")
  return Audio("K-audio.wav")

##Note: we'll create a folder named "audio" and save generated files in it.

## [2] Sound generation using {gTTS}

In [81]:
from IPython.display import Audio

language = "ko" #@param "en" language = ["en","ko"]
mytext = input("Type text.")
fname = input("Type file name without .wav: e.g., right (for 'right.wav'")

if language == "en":
  etts(mytext)
  mpath = '/content/' + language + "_" + fname + '.wav'
  old_name = r"/content/E-audio.wav"
  new_name = mpath
  os.rename(old_name, new_name)

elif language == "ko":
  ktts(mytext)
  mpath = '/content/' + language + "_" + fname + '.wav'
  old_name = r"/content/K-audio.wav"
  new_name = mpath
  os.rename(old_name, new_name)

print("Text-to-Speech: %s"%mytext) 
print("Audio filename: %s"%mpath)

Audio(mpath)


Type text.heart
Type file name without .wav: e.g., right (for 'right.wav'heart
Text-to-Speech: heart
Audio filename: /content/ko_heart.wav


time: 8.85 s (started: 2022-12-29 18:01:32 +00:00)


##[3] Speech Recognition

In [82]:
#@markdown Install SR tool
%%capture
!pip install git+https://github.com/openai/whisper.git 

import whisper
model = whisper.load_model('base.en') 

time: 6.65 s (started: 2022-12-29 18:01:59 +00:00)


Note: Run this to add audio files from your computer

- Remove '#' to run the code.

In [None]:
# from google.colab import files
# uploaded = files.upload()

List all audio files in current folder (/content/ (left panel))

In [101]:
#@markdown Refer to the following list files:
import os
import pandas as pd

dir_path = '/content/'
dir_files = os.listdir(dir_path)
str1 = 'wav'
flist = []

for i in range(0, len(dir_files)):
  str2 = dir_files[i]
  if str1 in str2:
    flist.append(str2)
flist

df = pd.DataFrame()
n = len(flist)
nt = n + 1
fn = range(1, nt)
df['ID'] = fn
df['Filename'] = flist

df

Unnamed: 0,ID,Filename
0,1,ko_why.wav
1,2,E-audio.wav
2,3,ko_today.wav
3,4,en_test.wav
4,5,ko_test.wav
5,6,ko_newfile.wav
6,7,thelainbow.wav
7,8,ko_heart.wav


time: 35.4 ms (started: 2022-12-29 18:29:31 +00:00)


Run Speech Recognitioin: type ID number (e.g., 1, 2, or 3)

In [84]:
#@markdown Runtime set up
%%capture
!pip install ipython-autotime
%load_ext autotime

time: 5.66 s (started: 2022-12-29 18:02:17 +00:00)


In [93]:
#@markdown Recognition result: Type ID number from the above table (e.g., 1, 2, 3, ...)

rname = input("Type ID")
ind = int(rname) - 1
myf = df['Filename'][ind]
result = model.transcribe(myf, language="en",fp16=False)
print('Filename: %s'%myf)
print('='*30)

print("Speech-to-text (recognized): %s"%result["text"])  

Type ID1
Filename: ko_why.wav
Speech-to-text (recognized):  Why?
time: 4.86 s (started: 2022-12-29 18:19:59 +00:00)


In [94]:
import time
import pandas as pd

def measure_time(function):
  start = time.time()
  function()
  end = time.time()
  return end - start

# your code here
  print("Hello, world!")

runtime = measure_time(code_to_measure)
print(f"Runtime: {runtime} seconds")

Type ID1
Filename: ko_why.wav
Hello, world!
Runtime: 11.530438661575317 seconds
time: 11.5 s (started: 2022-12-29 18:21:20 +00:00)


Description: we'll create a list that will store runtime for each file. 

Run R1 (only once)
Run R2 (repeat) - this data will be stored in the runtimedata(list)

In [127]:
# R1
runtimedata = []

time: 552 µs (started: 2022-12-29 18:49:44 +00:00)


In [130]:
#@markdown # R2
import time
import pandas as pd

def measure_time(function):
  start = time.time()
  function()
  end = time.time()
  return end - start

rname = input("Type ID: ")
ind = int(rname) - 1
myf = df['Filename'][ind]

def code_to_measure():
# your code here  
  result = model.transcribe(myf, language="en",fp16=False)

runtime = measure_time(code_to_measure)
print(f"Runtime: {runtime} seconds")

print('Filename: %s'%myf)
print('='*30)
print("Speech-to-text (recognized): %s"%result["text"])  

# df1 = pd.DataFrame({"function": ["code_to_measure"], "runtime": [runtime]})
print("="*30)
data = {myf: round(runtime,3)}
runtimedata.append(data)
print(runtimedata)

Type ID: 3
Runtime: 4.832329273223877 seconds
Filename: ko_today.wav
Speech-to-text (recognized):  Why?
[{'ko_why.wav': 2.462}, {'E-audio.wav': 2.825}, {'ko_today.wav': 4.832}]
time: 6.99 s (started: 2022-12-29 18:51:25 +00:00)


When done, run R3 below to create a dataframe with runtimedata list

In [160]:
len(runtimedata)

3

time: 9.35 ms (started: 2022-12-29 19:09:27 +00:00)


Todo from here.

In [159]:
rundf = pd.DataFrame()
keydata = []
valuedata = []
n = len(runtimedata) + 1

for i in range(0, n):
  current = runtimedata[i]
  for key, value in current.items():
      a1 = key
      a2 = value
      keydata.append(a1)
      valuedata.append(a2)

rundf['FILENAME'] = keydata
rundf['RUNTIME'] = valuedata

rundf

IndexError: ignored

time: 35.7 ms (started: 2022-12-29 19:09:05 +00:00)
