# Automatic Speech Recognition (ASR)

## Theoretical Aspects

**References:** 
- [Nvidia Developers Blog: Guide to ASR Technology](https://developer.nvidia.com/blog/essential-guide-to-automatic-speech-recognition-technology/)

Automatic speech recognition (ASR) converts spoken language (an audio signal) into written text, which is often then used as a command.

ASR is commonly seen in user-facing applications such as virtual agents and live captioning.

Natural Language Processing (NLP) is central to the ASR pipeline: besides being applied in the language model, it is also used at the end of the pipeline to augment the generated transcripts with punctuation and capitalization.

Once the transcript has been post-processed with NLP, the text can be used for downstream language modelling tasks such as:
- Sentiment Analysis
- Text Analytics
- Text Summarization
- Question Answering

Speech recognition algorithms can be implemented either in the traditional way, using statistical algorithms, or with deep learning techniques such as neural networks that convert speech into text.


### Traditional ASR algorithms

#### Hidden Markov models (HMM)
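
One standard way to decode with an HMM is the Viterbi algorithm, which finds the most likely sequence of hidden states for a sequence of observations. The sketch below is a toy illustration only: the two states (silence, speech), the two observation symbols (low or high energy) and every probability are assumed values, not taken from any real acoustic model.

```python
import numpy as np


def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for `obs`.

    obs:     observation indices, length T
    start_p: (S,)   initial state probabilities
    trans_p: (S, S) trans_p[i, j] = P(state j at t+1 | state i at t)
    emit_p:  (S, O) emit_p[i, k]  = P(observation k | state i)
    """
    n_steps, n_states = len(obs), len(start_p)
    # Work in log space so long sequences do not underflow.
    log_trans, log_emit = np.log(trans_p), np.log(emit_p)
    log_delta = np.zeros((n_steps, n_states))
    backptr = np.zeros((n_steps, n_states), dtype=int)

    log_delta[0] = np.log(start_p) + log_emit[:, obs[0]]
    for t in range(1, n_steps):
        # scores[i, j]: best log-probability of being in i at t-1, then moving to j
        scores = log_delta[t - 1][:, None] + log_trans
        backptr[t] = scores.argmax(axis=0)
        log_delta[t] = scores.max(axis=0) + log_emit[:, obs[t]]

    # Follow the back-pointers from the best final state.
    path = [int(log_delta[-1].argmax())]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]


# Hypothetical toy model: state 0 = silence, state 1 = speech;
# observation 0 = low energy frame, observation 1 = high energy frame.
start_p = np.array([0.8, 0.2])
trans_p = np.array([[0.7, 0.3],
                    [0.2, 0.8]])
emit_p = np.array([[0.9, 0.1],
                   [0.2, 0.8]])

path = viterbi([0, 0, 1, 1, 0], start_p, trans_p, emit_p)
print(path)  # [0, 0, 1, 1, 0]
```

In a traditional ASR system the hidden states would correspond to sub-phoneme units and the observations to acoustic feature vectors, but the dynamic-programming recursion has the same shape.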

#### Dynamic time warping (DTW) 
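
DTW measures how similar two sequences are when one may be spoken faster or slower than the other, by finding the cheapest monotonic alignment between them. The following is a minimal sketch under simplifying assumptions: each frame is a single number and the local cost is the absolute difference, whereas a real system would compare feature vectors (for example MFCCs). The function name `dtw_distance` is illustrative.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)

    # cost[i, j]: cheapest alignment of a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(
                cost[i - 1, j],      # a advances, b stays on the same frame
                cost[i, j - 1],      # b advances, a stays on the same frame
                cost[i - 1, j - 1],  # both advance
            )
    return float(cost[n, m])


# The second sequence is the first one "spoken" at half speed.
print(dtw_distance([1, 2, 3, 4], [1, 1, 2, 2, 3, 3, 4, 4]))  # 0.0
# A shifted sequence cannot be aligned for free.
print(dtw_distance([1, 2, 3], [2, 3, 4]))  # 2.0
```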

## Implementation

In [82]:
import os
from pathlib import Path
import pandas as pd

In [83]:
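# Common Voice 16.1 delta release; the English files live under 'en'
dataset_path = Path('dataset/cv-corpus-16.1-delta-2023-12-06/')
file_duration_path = dataset_path / 'en' / 'clip_durations.tsv'
validated_clips_metadata_path = dataset_path / 'en' / 'validated.tsv'
# Debugging helpers: list the dataset directory and check that the file exists
# for child in dataset_path.iterdir():
#     print(child)
# file_duration_path.exists()

# Metadata of the validated clips, indexed by clip file name
metadata = pd.read_table(validated_clips_metadata_path)
metadata = metadata.set_index('path')

# Clip durations in milliseconds, indexed by clip file name
clip_durations = pd.read_table(file_duration_path)
clip_durations = clip_durations.set_index('clip')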

In [84]:
metadata

path,client_id,sentence,up_votes,down_votes,age,gender,accents,variant,locale,segment
common_voice_en_39584989.mp3,00cb430b113e0ac0ec056de203bfcefbd45ea814b48460...,These locomotives are serviced at Washwood Heath.,2,0,,,,,en,
common_voice_en_39576542.mp3,02cc5a0c68b0ac69c83a24da884f3e7069f63b0078cddb...,"Here, she turned her inspirations towards writ...",3,0,,,"England English,Esturine, from the region arou...",,en,
common_voice_en_39582342.mp3,02e97f2f112f01eee1db675e7eb23850294dc865524581...,There is also a garnet mine in West Redding.,2,0,,,,,en,
common_voice_en_38497561.mp3,02fa98c5a9a3e74e0014dec1e4825a9e29f7b918de2278...,"They hastily drive to the plane, and flee from...",2,0,,,"canadian - toronto english,Canadian English",,en,
common_voice_en_39263187.mp3,04347480ab0b18a8f9c3285107d106945cb2b14d430cdc...,They found one such edge in Fairbairn's system.,2,1,,,,,en,
...,...,...,...,...,...,...,...,...,...,...
common_voice_en_38852923.mp3,cf166b2a376b2518887f5a2e3c347a69f495260309ac27...,She seems to have played little part in politics.,4,0,twenties,female,"Southern African (South Africa, Zimbabwe, Nami...",,en,
common_voice_en_38853034.mp3,cf166b2a376b2518887f5a2e3c347a69f495260309ac27...,He made his film debut from movie Uma.,2,0,twenties,female,"Southern African (South Africa, Zimbabwe, Nami...",,en,
common_voice_en_38855973.mp3,cf166b2a376b2518887f5a2e3c347a69f495260309ac27...,There are no known remaining records of the fi...,2,0,twenties,female,"Southern African (South Africa, Zimbabwe, Nami...",,en,
common_voice_en_38855982.mp3,cf166b2a376b2518887f5a2e3c347a69f495260309ac27...,They can be described as medium-sized.,2,0,twenties,female,"Southern African (South Africa, Zimbabwe, Nami...",,en,


In [85]:
clip_durations

clip,duration[ms]
common_voice_en_38739592.mp3,7056
common_voice_en_38964148.mp3,3492
common_voice_en_38694636.mp3,3636
common_voice_en_38627562.mp3,5256
common_voice_en_39017255.mp3,4608
...,...
common_voice_en_39237544.mp3,2196
common_voice_en_39575177.mp3,8496
common_voice_en_38558646.mp3,5400
common_voice_en_39228365.mp3,4788


In [86]:
# Attach each clip's duration by joining on the shared file-name index
metadata = metadata.join(clip_durations)
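
`DataFrame.join` aligns rows on the index and performs a left join by default, so every row of the left frame is kept and rows without a match get `NaN`. The toy frames below are made up purely to illustrate that behaviour; the names `meta`, `durations` and the file names are hypothetical and not part of the dataset.

```python
import pandas as pd

# Hypothetical stand-in for the validated metadata, indexed by file name.
meta = pd.DataFrame(
    {"path": ["a.mp3", "b.mp3", "c.mp3"], "sentence": ["one", "two", "three"]}
).set_index("path")

# Hypothetical stand-in for the durations table: different order, one clip missing.
durations = pd.DataFrame(
    {"clip": ["c.mp3", "a.mp3"], "duration[ms]": [3000, 1000]}
).set_index("clip")

joined = meta.join(durations)  # left join on the index
print(joined)

# Count clips that did not find a duration.
n_missing = int(joined["duration[ms]"].isna().sum())
print(n_missing)  # 1
```

A similar `isna().sum()` check on the real `metadata` frame is a cheap way to confirm that every validated clip received a duration.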

In [87]:
if 'client_id' in metadata.columns:
    print("Removing Client-id ...")
    metadata = metadata.drop('client_id', axis=1)
    print("Removal successful ...")
metadata

Removing Client-id ...
Removal successful ...


path,sentence,up_votes,down_votes,age,gender,accents,variant,locale,segment,duration[ms]
common_voice_en_39584989.mp3,These locomotives are serviced at Washwood Heath.,2,0,,,,,en,,5976
common_voice_en_39576542.mp3,"Here, she turned her inspirations towards writ...",3,0,,,"England English,Esturine, from the region arou...",,en,,6084
common_voice_en_39582342.mp3,There is also a garnet mine in West Redding.,2,0,,,,,en,,5184
common_voice_en_38497561.mp3,"They hastily drive to the plane, and flee from...",2,0,,,"canadian - toronto english,Canadian English",,en,,6516
common_voice_en_39263187.mp3,They found one such edge in Fairbairn's system.,2,1,,,,,en,,4896
...,...,...,...,...,...,...,...,...,...,...
common_voice_en_38852923.mp3,She seems to have played little part in politics.,4,0,twenties,female,"Southern African (South Africa, Zimbabwe, Nami...",,en,,5148
common_voice_en_38853034.mp3,He made his film debut from movie Uma.,2,0,twenties,female,"Southern African (South Africa, Zimbabwe, Nami...",,en,,4860
common_voice_en_38855973.mp3,There are no known remaining records of the fi...,2,0,twenties,female,"Southern African (South Africa, Zimbabwe, Nami...",,en,,7056
common_voice_en_38855982.mp3,They can be described as medium-sized.,2,0,twenties,female,"Southern African (South Africa, Zimbabwe, Nami...",,en,,5148


In [88]:
print("Removing completely NULL columns")
metadata = metadata.dropna(how='all', axis=1)
metadata

Removing completely NULL columns


path,sentence,up_votes,down_votes,age,gender,accents,locale,duration[ms]
common_voice_en_39584989.mp3,These locomotives are serviced at Washwood Heath.,2,0,,,,en,5976
common_voice_en_39576542.mp3,"Here, she turned her inspirations towards writ...",3,0,,,"England English,Esturine, from the region arou...",en,6084
common_voice_en_39582342.mp3,There is also a garnet mine in West Redding.,2,0,,,,en,5184
common_voice_en_38497561.mp3,"They hastily drive to the plane, and flee from...",2,0,,,"canadian - toronto english,Canadian English",en,6516
common_voice_en_39263187.mp3,They found one such edge in Fairbairn's system.,2,1,,,,en,4896
...,...,...,...,...,...,...,...,...
common_voice_en_38852923.mp3,She seems to have played little part in politics.,4,0,twenties,female,"Southern African (South Africa, Zimbabwe, Nami...",en,5148
common_voice_en_38853034.mp3,He made his film debut from movie Uma.,2,0,twenties,female,"Southern African (South Africa, Zimbabwe, Nami...",en,4860
common_voice_en_38855973.mp3,There are no known remaining records of the fi...,2,0,twenties,female,"Southern African (South Africa, Zimbabwe, Nami...",en,7056
common_voice_en_38855982.mp3,They can be described as medium-sized.,2,0,twenties,female,"Southern African (South Africa, Zimbabwe, Nami...",en,5148
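
With `how='all'` and `axis=1`, `dropna` removes only the columns in which every value is missing; columns that are merely sparse are kept. The small frame below is an invented example (the values are not from the dataset) that sketches the difference.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    {
        "sentence": ["one", "two", "three"],
        "age": [np.nan, "twenties", np.nan],  # partly missing: kept
        "variant": [np.nan, np.nan, np.nan],  # entirely missing: dropped
    }
)

cleaned = toy.dropna(how="all", axis=1)
print(list(cleaned.columns))  # ['sentence', 'age']
```

This matches the output above, where `variant` and `segment` disappear while the partly filled `age`, `gender` and `accents` columns remain.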
