# Collaborators

**Authors**
- Daniyal Hassan -- **i17-0411**
- Rafsha Mazhar -- **i17-0028**
- Saad Zahoor -- **i17-0046**

**Mentions**
- Taha Firoz -- **i17-0323** 
- Mohammad Saad -- **i17-0033** 

# Tasks


- [x] Setup g2p on colab
- [x] Prepare data for g2p training
- [x] Train g2p
- [x] Prepare vocabulary from transcription
- [x] Decode phonetic dictionary from the trained g2p
- [x] Run Kaldi on colab **[thanks to SAAD]**
- [X] Prepare sample project on chunk of data to test kaldi
- [X] Test Kaldi on sample **[FAILED & ABANDONED]**
- [X] Write script to prepare data in folder hierarchy
- [X] Language Model
- [X] Acoustic Model
- [X] Train ASR
- [X] Evaluate ASR
- [X] Build a user friendly decoder (PyKaldi)

# Resources



[Urdu Language Resource](http://www.cle.org.pk/software/ling_resources.htm)

[Arabic Kaldi Repo](https://github.com/Anwarvic/Arabic-Speech-Recognition/tree/master/Kaldi)

[General Kaldi Tutorial](https://eleanorchodroff.com/tutorial/kaldi/training-acoustic-models.html)

[Official Kaldi for Dummies](https://kaldi-asr.org/doc/kaldi_for_dummies.html)

[CMU Sphinx Tutorial](https://cmusphinx.github.io/wiki/tutorial/)

[Arabic ASR Demo](https://www.youtube.com/watch?v=eChY_3sbNAQ)

[Arabic ASR News Article](https://medium.com/@omar.merghany95/how-we-built-arabic-speech-recognition-system-using-kaldi-a10a54678180)

[g2p Training Guide](https://github.com/cmusphinx/g2p-seq2seq/issues/93)

# Project Constants and Imports

In [None]:
import os
from google.colab import files
from google.colab import drive
from os import path

MAX_ALLOCATED = 26998

home = '/content/drive/My Drive/ASR'

# Kaldi Paths
kaldi_tar = path.join(home, 'KS.tar.gz')
kaldi_root = '/content/libs/kaldi'
kaldi_tools = path.join(kaldi_root, 'tools')
kaldi_egs = path.join(kaldi_root, 'egs')
model_name = 'kaldi_b_urdu'
prj_root = path.join(kaldi_egs, model_name)

rsc = path.join(home, "dataset")
scripts_root = path.join(rsc,"final")

# Dataset Paths
nuces_corpora = path.join(home, "downsampled-audio-files")
rumi_corpora = path.join(home, "RUMI/Corpus/Recordings")

# Flags
F_RUMI_CORPORA = False

# Mounting Google Drive

In [None]:
drive.mount('/content/drive', force_remount=True)

# Training g2p-seq2seq & Preparing Phonetic Dictionary

In [None]:
# Preparing words: flatten the Unicode transcription into a list of tokens
words = []
with open("/content/drive/My Drive/ASR/dataset/UrduPhoneticSpeechCorpus/Transcription-UNICODE-Arabic.txt", "r", encoding="utf-8") as f:
    for line in f:
        words += line.strip().split(' ')

In [None]:
# Preparing phonemes: skip the first two lines, then split each line on '##'
phonemes = []
with open("/content/drive/My Drive/ASR/dataset/UrduPhoneticSpeechCorpus/Transcription-CISAMPA.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()[2:]
for line in lines:
    temp = line.strip().split('##')
    phonemes += temp[1:-1]  # drop the fields before the first and after the last '##'

In [None]:
# Preparing training dataset with words and phonemes [<word> <phoneme>]
length = len(phonemes)
f = open("/content/drive/My Drive/ASR/dataset/training-data.txt", "w")
for i in range(length):
    f.write(words[i] + ' ' + phonemes[i] + os.linesep)
f.close()
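
The loop above indexes `words` by the length of `phonemes`, so a length mismatch between the two lists would either raise an `IndexError` or silently misalign pairs. A minimal sketch of a guard, assuming the two lists are meant to align one-to-one; `build_g2p_lines` and the toy data are hypothetical and not part of the original notebook:

```python
def build_g2p_lines(words, phonemes):
    """Pair each word with its phoneme string as '<word> <phonemes>' lines."""
    if len(words) != len(phonemes):
        raise ValueError(
            f"{len(words)} words but {len(phonemes)} phoneme entries"
        )
    return [f"{word} {phones}" for word, phones in zip(words, phonemes)]


# Hypothetical toy data, not taken from the corpus
sample_words = ["kitab", "qalam"]
sample_phonemes = ["K I T A B", "Q A L A M"]
lines = build_g2p_lines(sample_words, sample_phonemes)
print("\n".join(lines))
```

Failing early on a mismatch is probably preferable to training g2p on shifted pairs, since the model would still train without complaint.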

In [None]:
# Cloning Repo [OPTIONAL]
! cd "/content/drive/My Drive/ASR" && git clone https://github.com/cmusphinx/g2p-seq2seq.git

In [None]:
# Installing g2p-seq2seq & Downgrading tensorflow [IMPORTANT]
! cd "/content/drive/My Drive/ASR/g2p-seq2seq/" && python3 setup.py install
! pip install tensorflow==1.8.0

In [None]:
# Train model
! g2p-seq2seq --train '/content/drive/My Drive/ASR/dataset/training-data.txt' --model_dir '/content/drive/My Drive/ASR/phonetic-model-1200'

In [None]:
# Test Model in Interactive Mode [OPTIONAL]
! g2p-seq2seq --interactive --model_dir '/content/drive/My Drive/ASR/phonetic-model-4000'

In [None]:
# Preparing Vocabulary to decode via the trained model
vocabulary = []
with open('/content/drive/My Drive/ASR/dataset/final/corpus.txt') as f:
    for line in f:
        vocabulary += line.strip().split(' ')

# Drop empty tokens and duplicates
vocabulary = list(set(filter(None, vocabulary)))

In [None]:
# Conforming prepared vocabulary into dataset for decoding
f = open('/content/drive/My Drive/ASR/dataset/final/vocabulary.txt',"w")

for unique_word in vocabulary:
    f.write(unique_word + os.linesep)
f.close()
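
The vocabulary cells above could also be folded into one helper that returns a sorted list, so that `vocabulary.txt` comes out in the same order on every run. This is only a sketch: `build_vocabulary` is a hypothetical name, and `str.split()` with no argument splits on any whitespace, which differs slightly from the `split(' ')` used in the notebook.

```python
def build_vocabulary(lines):
    """Return the sorted list of unique, non-empty tokens across all lines."""
    tokens = set()
    for line in lines:
        # split() with no arguments never yields empty strings
        tokens.update(line.split())
    return sorted(tokens)


# Hypothetical toy corpus
sample_corpus = ["a b  c\n", "b d\n", "\n"]
vocab = build_vocabulary(sample_corpus)
print(vocab)
```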

In [None]:
# decoding the prepared vocabulary and preparing our phonetic dictionary
! g2p-seq2seq --decode '/content/drive/My Drive/ASR/dataset/final/vocabulary.txt' --model_dir '/content/drive/My Drive/ASR/phonetic-model-4000' --output '/content/drive/My Drive/ASR/dataset/final/lexicon.txt'

# Preparing Speech Corpus & Checking Missing Files

In [None]:
# Extracting sentences from transcription file
sentences = []
with open('/content/drive/My Drive/ASR/dataset/transcription.csv') as f:
    lines = f.readlines()
for line in lines[:MAX_ALLOCATED]:
    temp = line.split('     ')
    sentences.append(temp[1].strip())

# Remove duplicate sentences (ordering is not preserved)
sentences = list(set(sentences))

In [None]:
# Write sentences to sentences.txt [One sentence per line]
f = open('/content/drive/My Drive/ASR/dataset/sentences.txt',"w")
for sentence in sentences:
    f.write(sentence + os.linesep)
f.close()

In [None]:
# Store missing audio files in filenames_u set
filenames_t = set()
with open('/content/drive/My Drive/ASR/dataset/transcription.csv') as f:
    lines = f.readlines()
for line in lines[:MAX_ALLOCATED]:
    filenames_t.add(line.split('     ')[0].strip() + '.wav')

# Audio files actually present on Drive
filenames_a = set(os.listdir('/content/drive/My Drive/ASR/downsampled-audio-files'))

# Transcribed utterances with no matching audio file
filenames_u = filenames_t.difference(filenames_a)
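
The missing-file check above is a set difference between the expected and the available file names. A small self-contained sketch of the same idea, using made-up IDs; `find_missing_audio` is a hypothetical helper, not something the notebook defines:

```python
def find_missing_audio(transcribed_ids, available_files):
    """Return the sorted .wav names that are transcribed but not on disk."""
    expected = {f"{utt_id.strip()}.wav" for utt_id in transcribed_ids}
    return sorted(expected - set(available_files))


# Hypothetical utterance IDs and directory listing
ids = ["001FEN029", "001FEN030 ", "001FEN032"]
on_disk = ["001FEN029.wav", "001FEN032.wav", "notes.txt"]
missing = find_missing_audio(ids, on_disk)
print(missing)
```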

# Setting Up Kaldi

In [None]:
# We probably need to install these again
! apt install -qq g++ automake autoconf libtool subversion sox gawk

g++ is already the newest version (4:7.4.0-1ubuntu2.3).
g++ set to manually installed.
The following package was automatically installed and is no longer required:
  libnvidia-common-440
Use 'apt autoremove' to remove it.
The following additional packages will be installed:
  autotools-dev file libapr1 libaprutil1 libmagic-mgc libmagic1
  libopencore-amrnb0 libopencore-amrwb0 libserf-1-1 libsigsegv2
  libsox-fmt-alsa libsox-fmt-base libsox3 libsvn1 m4
Suggested packages:
  autoconf-archive gnu-standards autoconf-doc gettext gawk-doc libsox-fmt-all
  libtool-doc gcj-jdk m4-doc db5.3-util libapache2-mod-svn subversion-tools
The following NEW packages will be installed:
  autoconf automake autotools-dev file gawk libapr1 libaprutil1 libmagic-mgc
  libmagic1 libopencore-amrnb0 libopencore-amrwb0 libserf-1-1 libsigsegv2
  libsox-fmt-alsa libsox-fmt-base libsox3 libsvn1 libtool m4 sox subversion
0 upgraded, 21 newly installed, 0 to remove and 43 not upgraded.
Need to get 4,696 kB of archives

In [None]:
# Untar the Kaldi tarball; it unpacks to libs/kaldi under the current directory (/content)
! tar xvzf '$home/KS.tar.gz'

Streaming output truncated to the last 5000 lines.
libs/kaldi/egs/gop/s5/local/remove_phone_markers.pl
libs/kaldi/egs/gop/s5/local/make_testcase.sh
libs/kaldi/egs/gop/s5/steps
libs/kaldi/egs/gop/s5/cmd.sh
libs/kaldi/egs/gop/s5/run.sh
libs/kaldi/egs/gop/s5/utils
libs/kaldi/egs/gop/s5/path.sh
libs/kaldi/egs/gop/README.md
libs/kaldi/egs/chime5/
libs/kaldi/egs/chime5/s5/
libs/kaldi/egs/chime5/s5/local/
libs/kaldi/egs/chime5/s5/local/check_tools.sh
libs/kaldi/egs/chime5/s5/local/prepare_data.sh
libs/kaldi/egs/chime5/s5/local/score_for_submit.sh
libs/kaldi/egs/chime5/s5/local/run_recog.sh
libs/kaldi/egs/chime5/s5/local/run_wpe.sh
libs/kaldi/egs/chime5/s5/local/nnet3/
libs/kaldi/egs/chime5/s5/local/nnet3/compare_wer.sh
libs/kaldi/egs/chime5/s5/local/nnet3/run_ivector_common.sh
libs/kaldi/egs/chime5/s5/local/train_lms_srilm.sh
libs/kaldi/egs/chime5/s5/local/chain/
libs/kaldi/egs/chime5/s5/local/chain/tuning/
libs/kaldi/egs/chime5/s5/local/chain/tuning/run_tdnn_1a.sh
libs/kaldi/eg

In [None]:
%cd '$kaldi_tools'

/content/libs/kaldi/tools


In [None]:
# Now we can just check if everything still works
! chmod 755 extras/install_mkl.sh
! ./extras/install_mkl.sh
! chmod 755 extras/check_dependencies.sh
! ./extras/check_dependencies.sh 

./extras/install_mkl.sh: Your system is using debian-style package management.
+ apt-get update
Get:1 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ InRelease [3,626 B]
Ign:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
Get:3 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
Get:4 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran35/ Packages [93.1 kB]
Ign:5 https://developer.download.nvidia.com/compute/mach

# Kaldi Folder Hierarchy

In [None]:
'''
kaldi_b_urdu
│   run.sh
│   run-custom.sh
│   cmd.sh
│   cmd-custom.sh
|   path.sh
|   path-custom.sh
└───audio
│   └───speaker-1
|   |   |   000000.wav
|   |   |   000001.wav
|   |   |   ...  
│   └───speaker-2
|   |   |   000000.wav
|   |   |   000001.wav
|   |   |   ...
|   └───speaker-3
|   |   |   000000.wav
|   |   |   000001.wav
|   |   |   ...
└───conf
│   |   decode.config
│   |   mfcc.conf
└───custom_scripts
│   |   mfcc_cmvn.sh
│   |   mmi_sgmm2.sh
│   |   mono.sh
│   |   prep_lm.sh
│   |   sgmm2.sh
│   |   tri1.sh
│   |   tri2.sh
│   |   tri3.sh
└───data
│   └───test
|   |   |   spk2gender
|   |   |   text
|   |   |   utt2spk
|   |   |   wav.scp
│   └───train
|   |   |   spk2gender
|   |   |   text
|   |   |   utt2spk
|   |   |   wav.scp
│   └───local
|   |   |   corpus.txt
|   |   |   score.sh
|   |   └───dict
|   |   |   |   lexicon.txt
|   |   |   |   nonsilence_phones.txt
|   |   |   |   optional_silence.txt
|   |   |   |   silence_phones.txt
└───steps
|   | <files from steps folder>
└───utils
|   | <files from utils folder>
'''

# Testing On Sample [ABANDONED]

In [None]:
# copy sample to kaldi/egs
! cp -avr '$home/sample' '$kaldi_egs'

In [None]:
# some final touches
! cp -avr '$home/srilm-1.7.3.tar.gz' '$kaldi_tools/srilm.tgz'
! chmod 755 '$kaldi_tools/extras/install_srilm.sh'
! sh '$kaldi_tools/extras/install_srilm.sh'
! chmod -R 755 /content/libs/kaldi

In [None]:
# exec run.sh
! cd '$kaldi_egs/sample' && sudo ./run.sh

# Data Preparation [In collaboration with Taha]

In [None]:
! mkdir -p "$kaldi_root" "$kaldi_tools" "$kaldi_egs" "$prj_root"

In [None]:
! mkdir -p "$prj_root/audio" "$prj_root/conf" "$prj_root/data" "$prj_root/data/test" \
            "$prj_root/data/train" "$prj_root/data/local" "$prj_root/data/local/dict"

In [None]:
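import pandas as pd
from csv import reader
from itertools import chain
import re
with open(path.join(rsc,'speakers_list.csv')) as csv_file:
    # Parse a range such as "1 - 100" into a tuple of two ints
    parse_range = lambda rng: tuple(map(int, re.search(r'(\d*) - (\d*)', rng).group(1, 2)))
    rows = [[*row[1:3],row[3].strip()] for row in list(reader(csv_file, delimiter=','))[1:]]
    map_name = lambda name: name[2:4] + name[-4:-2]
    parsed_rows = [(row[0], ''.join(map_name(row[1].replace(" ","").upper())),parse_range(row[2]))for row in rows]

# Use splitext rather than str.strip(".wav"): strip removes any of the characters
# '.', 'w', 'a', 'v' from both ends of the name instead of removing the extension.
# NOTE: str.find returns -1 (truthy) when '_' is absent, so this filter only skips names starting with '_'.
dataset_filenames = set([path.splitext(audio)[0] for audio in os.listdir(nuces_corpora) if audio.find('_')])
data_df = pd.read_csv(path.join(rsc, "_transcription.csv"), header=None)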

In [None]:
speaker_files = {}
for gender, name, row_range in parsed_rows:
  assigned_audio = set(data_df[row_range[0]-1:row_range[1]-1][0])
  submitted_audio = list(dataset_filenames.intersection(assigned_audio))
  file_text = {x[0]: x[1:][0].strip() for x in data_df.loc[data_df[0].isin(submitted_audio)].itertuples(index=False)}
  speaker_files[name] = {"audio":file_text, "gender":gender}
  
speaker_files = {k: v for k, v in speaker_files.items() if len(v["audio"])>0}

# Make sure all names are unique
assert(len(speaker_files.keys()) == len(set(speaker_files.keys())))

In [None]:
from random import shuffle
train_size = 0.9
shuffle_num = 3
split_speakers = {}

for speaker, data in speaker_files.items():
  files = list(data["audio"].items())
  for i in range(shuffle_num):
    shuffle(files)
  train_idx = int(len(files)*train_size)
  split_speakers[speaker] = {"audio":{"train":files[:train_idx],"test":files[train_idx:]}, "gender": data["gender"]}
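
The split above uses the global random state without a seed, so the train/test partition changes on every run. A single shuffle already yields a uniformly random permutation, so repeating it adds nothing. A minimal sketch of a reproducible variant follows; `split_train_test` and its `seed` parameter are hypothetical additions, not part of the original notebook:

```python
import random


def split_train_test(items, train_size=0.9, seed=0):
    """Shuffle a copy of items with a fixed seed and cut it into train/test."""
    rng = random.Random(seed)  # private generator, leaves global state alone
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_size)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical (utt_id, transcript) pairs
files = [(f"utt{i:03d}", f"text {i}") for i in range(10)]
train, test = split_train_test(files, train_size=0.9, seed=42)
print(len(train), len(test))
```

With a fixed seed, the same utterances land in the test set each time, which makes results from different training runs comparable.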

In [None]:
utterances = {"test":[],"train":[]}
text = {"test":[],"train":[]}
utt_spk = {"test":[],"train":[]}
spk_gndr = {"test":[],"train":[]}

for speaker, data in split_speakers.items():
  audio, gender = data.values()

  for set_type, audio_set in audio.items():
    spk_gndr[set_type].append((speaker, gender))

    for utt_id, transcript in audio_set:
      utt_id = speaker+utt_id[4:]+utt_id[3]+utt_id[:3] # Remove rumi names[001F...] and add nuces names
      file_path = path.join(prj_root, "audio",speaker, f"{utt_id}.wav")
      utterances[set_type].append((f"{utt_id}",file_path))
      text[set_type].append((f"{utt_id}",transcript))
      utt_spk[set_type].append((f"{utt_id}",speaker))

key=lambda x: x[0]

for set_type in ["test","train"]:
  text[set_type].sort(key=key)
  utt_spk[set_type].sort(key=key)
  spk_gndr[set_type].sort(key=key)
  utterances[set_type].sort(key=key)

  text[set_type] = map(lambda row:f"{row[0]} {row[1]}",text[set_type])
  utt_spk[set_type] = map(lambda row:f"{row[0]} {row[1]}",utt_spk[set_type])
  spk_gndr[set_type] = map(lambda row:f"{row[0]} {row[1]}",spk_gndr[set_type])
  utterances[set_type] = map(lambda row:f"{row[0]} {row[1]}",utterances[set_type])
  
  with open(path.join(prj_root, "data", set_type,"wav.scp"), "w") as f:
    f.write('\n'.join(utterances[set_type])+'\n')

  with open(path.join(prj_root, "data", set_type,"text"), "w") as f:
    f.write('\n'.join(text[set_type])+'\n')

  with open(path.join(prj_root, "data", set_type,"utt2spk"), "w") as f:
    f.write('\n'.join(utt_spk[set_type])+'\n')

  with open(path.join(prj_root, "data", set_type,"spk2gender"), "w") as f:
    f.write('\n'.join(spk_gndr[set_type])+'\n')
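
Kaldi data directories also carry an `spk2utt` file, which is normally derived from `utt2spk` with `utils/utt2spk_to_spk2utt.pl`. As an illustration of what that mapping contains, here is a rough Python equivalent; `utt2spk_to_spk2utt` is a hypothetical helper, the sample IDs are illustrative, and Kaldi's own script remains the safer choice in a real run:

```python
from collections import defaultdict


def utt2spk_to_spk2utt(utt2spk_lines):
    """Invert '<utt_id> <speaker>' lines into '<speaker> <utt_id> ...' lines."""
    spk2utt = defaultdict(list)
    for line in utt2spk_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        utt_id, speaker = parts
        spk2utt[speaker].append(utt_id)
    return [
        f"{speaker} {' '.join(sorted(utts))}"
        for speaker, utts in sorted(spk2utt.items())
    ]


# Illustrative utt2spk content
sample_utt2spk = [
    "FAAKEN020M031 FAAK",
    "FAAKCN024M030 FAAK",
    "XXXXEN001M001 XXXX",
]
spk2utt_lines = utt2spk_to_spk2utt(sample_utt2spk)
print("\n".join(spk2utt_lines))
```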
    

In [None]:
# Clear audio folder before copying audio files
! rm -rf "$prj_root/audio" && mkdir "$prj_root/audio" 

In [None]:
if path.exists(path.join(home,"downsampled-audio-files.tar.gz")):
  ! tar xvzf '$home/downsampled-audio-files.tar.gz' -C '$prj_root'
else:  
  import shutil
  import subprocess
  from tqdm.notebook import tqdm

  for speaker, data in tqdm(speaker_files.items()):
    audio, gender = data.values()
    parent = path.join(prj_root, 'audio', speaker)
    os.mkdir(parent)
    for utt_id in tqdm(audio.keys()):
      src_path = path.join(nuces_corpora, f"{utt_id}.wav")

      utt_id = speaker+utt_id[4:]+utt_id[3]+utt_id[:3]
      dst_path = path.join(parent, f"{utt_id}.wav")
      subprocess.run(["touch",dst_path])
      shutil.copy(src_path, dst_path)
  if F_RUMI_CORPORA:
    ! cp -avr '$rumi_corpora/.' '$prj_root/audio'
  ! cd "$prj_root" && tar -czvf "$home/downsampled-audio-files.tar.gz" "audio" 
  

Streaming output truncated to the last 5000 lines.
audio/FAAK/FAAKWN092M030.wav
audio/FAAK/FAAKCN039M031.wav
audio/FAAK/FAAKSN008M030.wav
audio/FAAK/FAAKPN152M031.wav
audio/FAAK/FAAKSN038M030.wav
audio/FAAK/FAAKSN040M030.wav
audio/FAAK/FAAKWN037M030.wav
audio/FAAK/FAAKWN041M030.wav
audio/FAAK/FAAKSN013M030.wav
audio/FAAK/FAAKEN020M031.wav
audio/FAAK/FAAKWN009M030.wav
audio/FAAK/FAAKRN084M030.wav
audio/FAAK/FAAKEN055M031.wav
audio/FAAK/FAAKPN015F032.wav
audio/FAAK/FAAKCN024M030.wav
audio/FAAK/FAAKSN038M031.wav
audio/FAAK/FAAKRN049M030.wav
audio/FAAK/FAAKCN023M030.wav
audio/FAAK/FAAKWN094M031.wav
audio/FAAK/FAAKRN076M030.wav
audio/FAAK/FAAKEN064M031.wav
audio/FAAK/FAAKWN045M030.wav
audio/FAAK/FAAKCN037M030.wav
audio/FAAK/FAAKPN075M030.wav
audio/FAAK/FAAKSN022M030.wav
audio/FAAK/FAAKPN100M031.wav
audio/FAAK/FAAKPN084M030.wav
audio/FAAK/FAAKCN056M031.wav
audio/FAAK/FAAKCN050M031.wav
audio/FAAK/FAAKPN185M031.wav
audio/FAAK/FAAKWN115M031.wav
audio/FAAK/FAAKSN026M030.wav
audio/F
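
The renaming expression `speaker+utt_id[4:]+utt_id[3]+utt_id[:3]` moves the three-digit prefix and the gender letter to the end and puts the speaker code first, so that utterance IDs start with the speaker ID. A small sketch of that transformation in isolation; the source ID `030MWN092` is an assumption inferred from the renamed files in the listing above, and `rename_utterance` is a hypothetical name:

```python
def rename_utterance(speaker, utt_id):
    """Prefix the speaker code and move the leading number/gender to the end."""
    return speaker + utt_id[4:] + utt_id[3] + utt_id[:3]


# Assumed original ID layout: <3 digits><gender letter><5-character code>
renamed = rename_utterance("FAAK", "030MWN092")
print(renamed)
```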

In [None]:
if F_RUMI_CORPORA:
  ! cp '$rsc/final/lexicon.txt' '$prj_root/data/local/dict/lexicon.txt'
  ! cp '$rsc/final/corpus.txt' '$prj_root/data/local/corpus.txt' 
else:
  ! cp '$rsc/phonetic-dictionary.txt' '$prj_root/data/local/dict/lexicon.txt'
  ! cp '$rsc/corpus.txt' '$prj_root/data/local/corpus.txt'

! cut -d ' ' -f 2- "$prj_root"/data/local/dict/lexicon.txt | sed 's/ /\n/g' | sed '/UNK/d' | sed '/SIL/d' | sort -u > "$prj_root"/data/local/dict/nonsilence_phones.txt
! echo -e 'SIL\nUNK' > "$prj_root"/data/local/dict/silence_phones.txt
! echo 'SIL' > "$prj_root"/data/local/dict/optional_silence.txt

! echo -e '--use-energy=false\n--sample-frequency=16000' > "$prj_root"/conf/mfcc.conf
! echo -e 'first_beam=10.0\nbeam=13.0\nlattice_beam=6.0' > "$prj_root"/conf/decode.config

! ln -s "$kaldi_egs/wsj/s5/steps" "$prj_root"
! ln -s "$kaldi_egs/wsj/s5/utils" "$prj_root"

! cp -r '$scripts_root/custom_scripts' '$prj_root'
! cp -a '$scripts_root/scripts/cmd-custom.sh' '$scripts_root/scripts/path-custom.sh' '$scripts_root/scripts/run-custom.sh' '$prj_root'
! cp -a '$rsc/sample/cmd.sh' '$rsc/sample/path.sh' '$rsc/sample/run.sh' '$prj_root'
! cp -a '$kaldi_egs/voxforge/s5/local/score.sh' '$prj_root/data/local/score.sh' 
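
The `cut | sed | sort -u` pipeline above collects every phone that appears after the word column of `lexicon.txt` and drops the silence phones. A rough Python sketch of the same idea is shown below. Note the differences: it removes `SIL` and `UNK` by exact match, whereas `sed '/UNK/d'` deletes any phone containing that substring, and Python's sort order may differ from `sort -u` depending on locale. `nonsilence_phones` and the toy lexicon are hypothetical.

```python
def nonsilence_phones(lexicon_lines, silence=("SIL", "UNK")):
    """Collect the unique phones in a lexicon, excluding silence phones."""
    phones = set()
    for line in lexicon_lines:
        fields = line.split()
        phones.update(fields[1:])  # the first field is the word itself
    return sorted(p for p in phones if p not in silence)


# Hypothetical lexicon entries in '<word> <phone> <phone> ...' format
sample_lexicon = [
    "<UNK> UNK",
    "!SIL SIL",
    "kitab K I T A B",
    "qalam Q A L A M",
]
phones = nonsilence_phones(sample_lexicon)
print(phones)
```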

# Running On NUCES Dataset

In [None]:
# Restoring progress from checkpoint
! tar xvzf '$home/checkpoint_tri3.tar.gz' -C '$kaldi_egs'

kaldi_b_urdu/
kaldi_b_urdu/run.sh
kaldi_b_urdu/custom_scripts/
kaldi_b_urdu/custom_scripts/tri3.sh
kaldi_b_urdu/custom_scripts/sgmm2.sh
kaldi_b_urdu/custom_scripts/mfcc_cmvn.sh
kaldi_b_urdu/custom_scripts/tri2.sh
kaldi_b_urdu/custom_scripts/mmi_sgmm2.sh
kaldi_b_urdu/custom_scripts/prep_lm.sh
kaldi_b_urdu/custom_scripts/tri1.sh
kaldi_b_urdu/custom_scripts/mono.sh
kaldi_b_urdu/data/
kaldi_b_urdu/data/local/
kaldi_b_urdu/data/local/tmp/
kaldi_b_urdu/data/local/tmp/lm.arpa
kaldi_b_urdu/data/local/lang/
kaldi_b_urdu/data/local/lang/lexiconp_disambig.txt
kaldi_b_urdu/data/local/lang/lex_ndisambig
kaldi_b_urdu/data/local/lang/align_lexicon.txt
kaldi_b_urdu/data/local/lang/phone_map.txt
kaldi_b_urdu/data/local/lang/lexiconp.txt
kaldi_b_urdu/data/local/score.sh
kaldi_b_urdu/data/local/corpus.txt
kaldi_b_urdu/data/local/dict/
kaldi_b_urdu/data/local/dict/nonsilence_phones.txt
kaldi_b_urdu/data/local/dict/silence_phones.txt
kaldi_b_urdu/data/local/dict/lexicon.txt
kaldi_b_urdu/data/local/dict/lex

In [None]:
# Some final touches
! chmod -R 755 /content/libs/kaldi
! cp -avr '$home/srilm-1.7.3.tar.gz' '$kaldi_tools/srilm.tgz'
! cd '$kaldi_tools' && sh '$kaldi_tools/extras/install_srilm.sh'

'/content/drive/My Drive/ASR/srilm-1.7.3.tar.gz' -> '/content/libs/kaldi/tools/srilm.tgz'
Installing libLBFGS library to support MaxEnt LMs
--2020-06-17 04:57:53--  https://github.com/downloads/chokkan/liblbfgs/liblbfgs-1.10.tar.gz
Resolving github.com (github.com)... 140.82.113.4
Connecting to github.com (github.com)|140.82.113.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github.s3.amazonaws.com/downloads/chokkan/liblbfgs/liblbfgs-1.10.tar.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAISTNZFOVBIJMK3TQ%2F20200617%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20200617T045754Z&X-Amz-Expires=300&X-Amz-SignedHeaders=host&X-Amz-Signature=bca0c288d05bc1185d510979ab4ce9b086029037ce2ce301c40bb5c8dc6d2ddb [following]
--2020-06-17 04:57:54--  https://github.s3.amazonaws.com/downloads/chokkan/liblbfgs/liblbfgs-1.10.tar.gz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAISTNZFOVBIJMK3TQ%2F20200617%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=

## Automatic Approach

In [None]:
# Say bismillah, close your eyes and exec run-custom.sh
! cd '$prj_root' && ./run-custom.sh

## Manual Step-by-Step Approach

In [None]:
%cd '$prj_root/custom_scripts'

/content/libs/kaldi/egs/kaldi_b_urdu/custom_scripts


In [None]:
!./prep_lm.sh

In [None]:
!./mfcc_cmvn.sh

In [None]:
!./mono.sh

In [None]:
!./tri1.sh

In [None]:
!./tri2.sh

In [None]:
!cd "$kaldi_egs" && tar --exclude='audio' -czvf "$home/checkpoint_tri2.tar.gz" "kaldi_b_urdu"

In [None]:
!./tri3.sh

In [None]:
!cd "$kaldi_egs" && tar --exclude='audio' -czvf "$home/checkpoint_tri3.tar.gz" "kaldi_b_urdu"

In [None]:
!./sgmm2.sh

In [None]:
!cd "$kaldi_egs" && tar --exclude='audio' -czvf "$home/checkpoint_sgmm2.tar.gz" "kaldi_b_urdu"

In [None]:
!./mmi_sgmm2.sh

In [None]:
!cd "$kaldi_egs" && tar --exclude='audio' -czvf "$home/checkpoint_mmisgmm2.tar.gz" "kaldi_b_urdu"

In [None]:
!cd "$kaldi_egs" && tar -czvf "$home/final_trained.tar.gz" "kaldi_b_urdu"
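
The checkpoint cells above archive the project while leaving out the bulky `audio` folder. For reference, the same thing can be sketched with Python's standard `tarfile` module, whose `add` method accepts a `filter` callable that returns `None` to skip a member. `make_checkpoint` is a hypothetical helper, and the demo runs on a throwaway directory rather than the real project:

```python
import os
import tarfile
import tempfile


def make_checkpoint(project_dir, archive_path, exclude_dirs=("audio",)):
    """Write project_dir to a .tar.gz, skipping any excluded directory names."""

    def skip_excluded(tarinfo):
        parts = tarinfo.name.split("/")
        return None if any(p in exclude_dirs for p in parts) else tarinfo

    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(project_dir, arcname=os.path.basename(project_dir),
                filter=skip_excluded)


# Demo on a temporary stand-in for the project folder
with tempfile.TemporaryDirectory() as tmp:
    prj = os.path.join(tmp, "kaldi_b_urdu")
    os.makedirs(os.path.join(prj, "audio"))
    os.makedirs(os.path.join(prj, "exp"))
    open(os.path.join(prj, "audio", "a.wav"), "w").close()
    open(os.path.join(prj, "exp", "final.mdl"), "w").close()
    archive = os.path.join(tmp, "checkpoint.tar.gz")
    make_checkpoint(prj, archive)
    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())
print(names)
```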

## Kaldi Specifics

In [None]:
# Clean the files for retraining
! rm '$prj_root/data/test/utt2dur'
! rm '$prj_root/data/test/utt2num_frames'

! rm '$prj_root/data/train/utt2num_frames'
! rm '$prj_root/data/train/utt2dur'

## Commands That May Come in Handy

In [None]:
# For the sake of report

# ! cd '/content/libs/kaldi/egs/kaldi_b_urdu/utils' && sh 'fix_data_dir.sh' '/content/libs/kaldi/egs/kaldi_b_urdu/data/train'
# ! cd '/content/kaldi/egs/sample' && sudo ./utils/utt2spk_to_spk2utt.pl data/train/utt2spk > data/train/spk2utt
# ! cd '/content/kaldi/egs/sample' && ./utils/utt2spk_to_spk2utt.pl
# ! find '/content/libs/kaldi/' -name "fix_data_dir.sh"
# ! cd '/content/drive/My Drive/ASR' && mkdir 'phonetic-model-4000'
# ! cp -avr '/content/drive/My Drive/ASR/model/.' '/content/drive/My Drive/ASR/phonetic-model-400000/'
# ! sh /content/libs/kaldi/egs/sample/utils/fix_data_dir.sh
# ! rm -rf '/content/libs/kaldi/egs/kaldi_b_urdu/content'
# ! rm -rf '/content/libs/kaldi/egs/kaldi_b_urdu/data/test/data'
# !cd "$kaldi_egs" && tar -czvf "$home/checkpoint.tar.gz" "kaldi_b_urdu"

## Creating a Checkpoint

In [None]:
! cd "$kaldi_egs" && tar -czvf "$home/checkpoint.tar.gz" "kaldi_b_urdu"

# Setup PyKaldi

In [None]:
import sys
# install conda
! wget https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh && bash Miniconda3-4.5.4-Linux-x86_64.sh -bfp /usr/local

# add to path
sys.path.append('/usr/local/lib/python3.6/site-packages')

# install pyKaldi
! conda install -c pykaldi pykaldi

--2020-07-29 10:33:39--  https://repo.continuum.io/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh
Resolving repo.continuum.io (repo.continuum.io)... 104.18.201.79, 104.18.200.79, 2606:4700::6812:c84f, ...
Connecting to repo.continuum.io (repo.continuum.io)|104.18.201.79|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://repo.anaconda.com/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh [following]
--2020-07-29 10:33:39--  https://repo.anaconda.com/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.131.3, 104.16.130.3, 2606:4700::6810:8203, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.131.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 58468498 (56M) [application/x-sh]
Saving to: ‘Miniconda3-4.5.4-Linux-x86_64.sh’


2020-07-29 10:33:40 (176 MB/s) - ‘Miniconda3-4.5.4-Linux-x86_64.sh’ saved [58468498/58468498]

PREFIX=/usr/local
installing: python-3.6.5-hc3

# Audio Decoding Using Trained Models

In [None]:
# get recognizer.py
! cp -av '$home/recognizer.py' '/content/recognizer.py'
%cd /content

'/content/drive/My Drive/ASR/recognizer.py' -> '/content/recognizer.py'
/content


## Using Mono Trained Model

In [None]:
! python recognizer.py mono '$home/test_recordings'

/content/libs/kaldi/egs/kaldi_b_urdu/exp/mono
/content/libs/kaldi/src/featbin/apply-cmvn-sliding --cmn-window=1000000000 --center=true ark:- ark:- 
/content/libs/kaldi/src/featbin/compute-mfcc-feats --allow-downsample --config=/content/libs/kaldi/egs/kaldi_b_urdu/conf/mfcc.conf scp:wav.scp ark:- 
/content/libs/kaldi/src/featbin/add-deltas ark:- ark:- 
Audio file: 001FEN029
Trancription: اسلام تلیگو ملیالم گیا ہے
Audio file: 001FEN030
Trancription: اسمبلی کو نکالا گیا ہے ت
Audio file: 001FEN032
Trancription: یہ واضح اشا ساتھ اور دایان نے مل کر دیا
LOG (compute-mfcc-feats[5.5.707~1-c9d8b]:main():compute-mfcc-feats.cc:181) Processed 10 utterances
Audio file: 001FPN001
Trancription: مسٹراسوکن آئکون تھیں
LOG (compute-mfcc-feats[5.5.707~1-c9d8b]:main():compute-mfcc-feats.cc:185)  Done 11 out of 11 utterances.
Audio file: 001FPN002
Trancription: میرے پاس موبائل فون نہیں ہے
Audio file: 001FPN003
Trancription: فیس بک میکرز بنایا جن کی سوشل میڈیا پر افسوس ہے
Audio file: 001FPN004
Trancription: ف

## Using Tri1 Trained Model

In [None]:
! python recognizer.py tri1 '$home/test_recordings'

/content/libs/kaldi/egs/kaldi_b_urdu/exp/tri1
/content/libs/kaldi/src/featbin/compute-mfcc-feats --allow-downsample --config=/content/libs/kaldi/egs/kaldi_b_urdu/conf/mfcc.conf scp:wav.scp ark:- 
/content/libs/kaldi/src/featbin/add-deltas ark:- ark:- 
/content/libs/kaldi/src/featbin/apply-cmvn-sliding --cmn-window=1000000000 --center=true ark:- ark:- 
Audio file: 001FEN029
Trancription: مسلم والی تو میرا نام کا خیال ہے
Audio file: 001FEN030
Trancription: مسلم والی تو میرا نام داری حال ہے
Audio file: 001FEN032
Trancription: یہ پروجیکٹ نقشہ ساتھ اور داری حال میں مل کر کیا
LOG (compute-mfcc-feats[5.5.707~1-c9d8b]:main():compute-mfcc-feats.cc:181) Processed 10 utterances
Audio file: 001FPN001
Trancription: میرے پاس موبائل فون نہیں ہے
LOG (compute-mfcc-feats[5.5.707~1-c9d8b]:main():compute-mfcc-feats.cc:185)  Done 11 out of 11 utterances.
Audio file: 001FPN002
Trancription: میرے پاس موبائل فون نہیں ہے
Audio file: 001FPN003
Trancription: فیس بک دیکھنے رہا جن کی ایک سوشل میڈیا پر فورڈ ہے
Audi

## Using Tri2 Trained Model

In [None]:
! python recognizer.py tri2 '$home/test_recordings'

/content/libs/kaldi/egs/kaldi_b_urdu/exp/tri2
/content/libs/kaldi/src/featbin/transform-feats /content/libs/kaldi/egs/kaldi_b_urdu/exp/tri2/final.mat ark:- ark:- 
/content/libs/kaldi/src/featbin/compute-mfcc-feats --allow-downsample --config=/content/libs/kaldi/egs/kaldi_b_urdu/conf/mfcc.conf scp:wav.scp ark:- 
/content/libs/kaldi/src/featbin/apply-cmvn-sliding --cmn-window=1000000000 --center=true ark:- ark:- 
/content/libs/kaldi/src/featbin/splice-feats ark:- ark:- 
Audio file: 001FEN029
Trancription: اس علم ہو لیکن میرا نام ڈال یاد ہے
LOG (compute-mfcc-feats[5.5.707~1-c9d8b]:main():compute-mfcc-feats.cc:181) Processed 10 utterances
Audio file: 001FEN030
Trancription: مثلا والی کم میرا نام دار یاد ہے ت
Audio file: 001FEN032
Trancription: یہ پروجیکٹ بخشا ساتھ اور داڑھی والے مل کر کیا
Audio file: 001FEN033
Trancription: الامکان ہے اور یہ بجے تک شاہ اور سات نے کیا
Audio file: 001FEN034
Trancription: اب وقت شکریہ خدا حافظ
LOG (compute-mfcc-feats[5.5.707~1-c9d8b]:main():compute-mfcc-feats

## Using Tri3 Trained Model

In [None]:
! python recognizer.py tri3 '$home/test_recordings'

/content/libs/kaldi/egs/kaldi_b_urdu/exp/tri3
/content/libs/kaldi/src/featbin/compute-mfcc-feats --allow-downsample --config=/content/libs/kaldi/egs/kaldi_b_urdu/conf/mfcc.conf scp:wav.scp ark:- 
/content/libs/kaldi/src/featbin/apply-cmvn-sliding --cmn-window=1000000000 --center=true ark:- ark:- 
/content/libs/kaldi/src/featbin/splice-feats ark:- ark:- 
/content/libs/kaldi/src/featbin/transform-feats /content/libs/kaldi/egs/kaldi_b_urdu/exp/tri3/final.mat ark:- ark:- 
Audio file: 001FEN029
Trancription: اس چلا ملک قوم میرا نام کا خیال ہے
LOG (compute-mfcc-feats[5.5.707~1-c9d8b]:main():compute-mfcc-feats.cc:181) Processed 10 utterances
Audio file: 001FEN030
Trancription: اشک سلام علی کون میرا نام دار یاد ہے ت
Audio file: 001FEN032
Trancription: یہ پروجیکٹ پر شاہ ساتھ اور داڑھی والے مل کر کیا
Audio file: 001FEN033
Trancription: الامکان ہے اور یہ پروجکٹ افشا اور سواد نے کیا
Audio file: 001FEN034
Trancription: وہ اب شکریہ خدا حافظ
LOG (compute-mfcc-feats[5.5.707~1-c9d8b]:main():compute-mfc
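
The decoder output above interleaves `Audio file:` / `Trancription:` pairs with Kaldi `LOG` lines, which makes the models hard to compare by eye. A minimal sketch of a parser that collects the pairs into a dictionary is given below. It assumes the output format stays exactly as shown, including the script's spelling of `Trancription:`. `parse_recognizer_output` and the sample hypotheses are hypothetical.

```python
def parse_recognizer_output(lines):
    """Map each audio file ID to its decoded text, ignoring other lines."""
    results = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith("Audio file:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("Trancription:") and current is not None:
            results[current] = line.split(":", 1)[1].strip()
            current = None
    return results


# Hypothetical output with placeholder hypotheses
sample_output = [
    "Audio file: 001FEN029",
    "Trancription: hello world",
    "LOG (compute-mfcc-feats:main()) Processed 10 utterances",
    "Audio file: 001FEN030",
    "Trancription: second utterance",
]
decoded = parse_recognizer_output(sample_output)
print(decoded)
```

Running this once per model (mono, tri1, tri2, tri3) would give dictionaries keyed by the same file IDs, which can then be lined up side by side.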