<a href="https://colab.research.google.com/github/MK316/workspace/blob/main/ASR01/ASR03_Recognition_SK.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🔍 Audio file recognition process

Note: Before you start, change the Colab runtime type to GPU (Runtime > Change runtime type).

* SE data: model `base.en`
* SK data: models `base.en` and `medium.en` (2 result files)

=> Better results with **medium.en**

* Audio files from GitHub: https://github.com/MK316/workspace/tree/main/ASR01/SE

## Getting ready

In [1]:
#@markdown Import {IPython.display} and {Pandas}
from IPython.display import Audio
import pandas as pd

* Text file (sentence list) from GitHub: https://raw.githubusercontent.com/MK316/workspace/main/ASR01/data/rainbow_sentences.csv

In [2]:
#@markdown Sentence texts to bring
url = "https://raw.githubusercontent.com/MK316/workspace/main/ASR01/data/rainbow_sentences.csv"
data = pd.read_csv(url)
data.head()

Unnamed: 0,SN,Sentence
0,S1,When the sunlight strikes raindrops in the air...
1,S2,The rainbow is a division of white light into ...
2,S3,"These take the shape of a long round arch, wit..."
3,S4,"There is, according to legend, a boiling pot o..."
4,S5,"People look, but no one ever finds it."


## 📌 *File* handling: Mount Google Drive to bring in a zip file

**Note:** Expand the following cells and run them one at a time to set up file handling.

- Audio files: get a zip file ready
- Create a new folder and unzip audio files to the new folder

In [3]:
# File location: Google drive 
from google.colab import drive 
drive.mount('/content/drive')

!ls "/content/drive/MyDrive/asrdata"
!pwd

Mounted at /content/drive
SE.zip	SK.zip
/content


✏️ Create a new folder named "SK" and unzip 'SK.zip' into it (for the SE data, use "SE" and 'SE.zip' instead)

In [4]:
!mkdir SK

!unzip "/content/drive/MyDrive/asrdata/SK.zip" -d "/content/SK/"

Archive:  /content/drive/MyDrive/asrdata/SK.zip
  inflating: /content/SK/SK01.wav    
  inflating: /content/SK/SK02.wav    
  inflating: /content/SK/SK03.wav    
  inflating: /content/SK/SK04.wav    
  inflating: /content/SK/SK05.wav    
  inflating: /content/SK/SK06.wav    
  inflating: /content/SK/SK07.wav    
  inflating: /content/SK/SK08.wav    
  inflating: /content/SK/SK09.wav    
  inflating: /content/SK/SK10.wav    
  inflating: /content/SK/SK11.wav    
  inflating: /content/SK/SK12.wav    
  inflating: /content/SK/SK13.wav    
  inflating: /content/SK/SK14.wav    
  inflating: /content/SK/SK15.wav    
  inflating: /content/SK/SK16.wav    
  inflating: /content/SK/SK17.wav    
  inflating: /content/SK/SK18.wav    
  inflating: /content/SK/SK19.wav    
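
The create-folder-and-unzip step can also be done in plain Python, which avoids the error `!mkdir` raises when the folder already exists. The following is only a minimal sketch using the standard `zipfile` module; the helper name `unzip_audio` is hypothetical and is not part of the original notebook.

```python
import zipfile
from pathlib import Path

def unzip_audio(zip_path, dest_dir):
    """Extract everything in zip_path into dest_dir; return the sorted .wav file names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # no error if the folder already exists
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    # Only top-level .wav files are listed, matching the flat layout of SK.zip
    return sorted(p.name for p in dest.glob("*.wav"))

# Possible call with the paths used in this notebook (needs the mounted drive):
# unzip_audio("/content/drive/MyDrive/asrdata/SK.zip", "/content/SK")
```

The returned list gives a quick check that all 19 files arrived before unmounting the drive.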


Unmount Google drive

In [5]:
from google.colab import drive
drive.flush_and_unmount()

## ASR to install

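Model: `medium.en` in this run. To use `base.en` instead, swap which of the two `load_model` lines is commented out.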

In [6]:
#@markdown Install SR tool
%%capture
!pip install git+https://github.com/openai/whisper.git 

import whisper
# model = whisper.load_model('base.en') 
model = whisper.load_model('medium.en') 

✏️ List the audio files in the "SE" or "SK" folder (19 files) as a dataframe

In [None]:
#@markdown Create a file list as a dataframe
import os
import pandas as pd

dir_path = '/content/SK/'

# Keep only files whose names end in .wav, sorted by name (SK01.wav, SK02.wav, ...)
flist = sorted(f for f in os.listdir(dir_path) if f.lower().endswith('.wav'))

# ID is a 1-based number used later to pick a file
df = pd.DataFrame({'ID': range(1, len(flist) + 1), 'Filename': flist})

# print(df.to_string(index=False))
df

In [8]:
#@markdown Runtime set up
%%capture
!pip install ipython-autotime
%load_ext autotime

time: 539 µs (started: 2023-01-09 03:37:47 +00:00)


In [9]:
#@markdown ✏️ Change directory to the audio file folder
import os
os.chdir('/content/SK/')

time: 711 µs (started: 2023-01-09 03:37:49 +00:00)


In [12]:
#@markdown Testing ASR (single file): Type a number between 1~19
rname = input("Type ID")
ind = int(rname) - 1
myf = df['Filename'][ind]
result = model.transcribe(myf, language="en",fp16=False)
print('Filename: %s'%myf)
print('='*30)

print("Speech-to-text (recognized): %s"%result["text"])  

Type ID1
Filename: SK01.wav
Speech-to-text (recognized):  When the sunlight strikes raindrops in the air, behold, the commuters are n Eye V
time: 7.56 s (started: 2023-01-09 03:38:50 +00:00)


In [14]:
## R1 (Run this only once!) Starts an empty list for collecting runtimes; to fill it, remove the hashtag from the runtimedata lines in cell A
runtimedata = []

time: 13.8 ms (started: 2023-01-09 03:39:15 +00:00)


# 💙 Ready to run: Sentence recognition using ASR and getting the result file

In [15]:
#@markdown # A. Single file to process (same as the above): ID = sentence number
import time
import pandas as pd

def measure_time(function):
  start = time.time()
  function()
  end = time.time()
  return end - start

rname = input("Type ID: ")
ind = int(rname) - 1
myf = df['Filename'][ind]

def code_to_measure():
  # Transcribe the selected file and print the recognized text
  result = model.transcribe(myf, language="en",fp16=False)
  print('='*30)
  print("Speech-to-text (recognized): %s"%result['text']) 

runtime = round(measure_time(code_to_measure),3)
print(f"Runtime: {runtime} seconds")

print('Filename: %s'%myf)

# Keep the file name and its runtime together
df1 = pd.DataFrame({"file": [myf], "runtime": [runtime]})
print("="*30)
# Named runtime_entry so the sentence dataframe `data` loaded earlier is not overwritten
runtime_entry = {myf: runtime}
# runtimedata.append(runtime_entry)
# print(runtimedata)

Type ID: 1
Speech-to-text (recognized):  When the sunlight strikes raindrops in the air, the sun weochemists from callback does not willup Dietarius Leo
Runtime: 5.4 seconds
Filename: SK01.wav
time: 7.66 s (started: 2023-01-09 03:39:27 +00:00)
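
The timing wrapper above measures a function that takes no arguments and discards its return value, so the recognized text has to be printed or stored from inside the measured function. A possible alternative, sketched here under the assumption that a small generic helper is acceptable (the name `timed` is hypothetical), returns both the result and the elapsed time and uses `time.perf_counter`, the standard-library clock intended for measuring intervals.

```python
import time

def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs); return (result, elapsed seconds rounded to 3 places)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = round(time.perf_counter() - start, 3)
    return result, elapsed

# In this notebook the call could look like this
# (assumes `model` and `myf` from the cells above):
# result, runtime = timed(model.transcribe, myf, language="en", fp16=False)
# print(result["text"], runtime)
```

Because the result is returned, no nested function or shared list is needed to get the text out of the measured call.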


In [16]:
#@markdown # B. Processing the SK folder (19 files) and creating a result table (Filename, Runtime, Recognized text) as **df1**

import time
import pandas as pd

fname = []    # audio file names
rt = []       # transcription time per file, in seconds
rectext = []  # recognized text per file

# File names are relative, so the working directory must be the audio folder
for myf in df['Filename']:
  # Time only the transcription call
  start = time.time()
  result = model.transcribe(myf, language="en", fp16=False)
  runtime = round(time.time() - start, 3)

  fname.append(myf)
  rt.append(runtime)  # kept as a number so it can be summed or averaged later
  rectext.append(result['text'])

df1 = pd.DataFrame({'Filename': fname, 'Runtime': rt, 'Recognized': rectext})

df1.head()

Unnamed: 0,Filename,Runtime,Recognized
0,SK01.wav,5.36,When the sunlight strikes raindrops in the ai...
1,SK02.wav,0.953,Ta rainbow EJ division of white light into ma...
2,SK03.wav,4.879,"T-z take the shape away long round arch, with..."
3,SK04.wav,1.0,"Theor is, according to legend, a building put..."
4,SK05.wav,9.779,Peepoh roop!


time: 44.1 s (started: 2023-01-09 03:39:48 +00:00)


In [17]:
# Result file to csv
import os

os.chdir('/content')
!pwd
df1.to_csv('SK_result_m.csv', index=False)

/content
time: 206 ms (started: 2023-01-09 03:40:48 +00:00)


* The result file of SK data (as of 0106): [github link](https://raw.githubusercontent.com/MK316/workspace/main/ASR01/results/SE_result_0106.csv)

* The result file of SK data (as of 0109): [github link](https://raw.githubusercontent.com/MK316/workspace/main/ASR01/results/SE_result_0109.csv)
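
The notebook does not score the recognized text itself. One common way to compare a result file against the reference sentences, or the `base.en` results against the `medium.en` results, is word error rate (WER): the word-level edit distance divided by the number of reference words. The sketch below is an assumption about how such scoring could be done, not the method used to produce the files above; the function names `normalize` and `word_error_rate` are hypothetical.

```python
import re

def normalize(text):
    """Lowercase, replace punctuation with spaces, and split into words."""
    return re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split()

def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference is empty")
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # reference word missing (deletion)
                            curr[j - 1] + 1,       # extra recognized word (insertion)
                            prev[j - 1] + cost))   # match or substitution
        prev = curr
    return prev[-1] / len(ref)

# Sentence S5 and its recognized text from SK05.wav shown above
print(word_error_rate("People look, but no one ever finds it.", "Peepoh roop!"))  # 1.0
```

A WER of 0 means every reference word was recognized in order; values can exceed 1 when the recognized text contains many extra words.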