## Description
This notebook shows how to use a pre-trained conformer CTC model with [icefall](https://github.com/k2-fsa/icefall), using HLG decoding, n-gram LM rescoring and attention-decoder rescoring. A pre-trained model is available on [Hugging Face](https://huggingface.co/csukuangfj/icefall-asr-librispeech-conformer-ctc-jit-bpe-500-2021-11-09/tree/main).

## Environment setup

To use a pre-trained model with icefall, we have to install the following dependencies:

- [k2][k2], for FSA operations
- [torchaudio][audio], for reading sound files
- [kaldifeat][kaldifeat], for extracting features from a single sound
  file or multiple sound files

**NOTE**: [lhotse][lhotse] is only needed at training time, for data preparation.


[k2]: https://github.com/k2-fsa/k2
[audio]: https://github.com/pytorch/audio
[kaldifeat]: https://github.com/csukuangfj/kaldifeat
[lhotse]: https://github.com/lhotse-speech/lhotse
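Before installing anything, it can help to see which of the packages above are already importable in the runtime. The following is a minimal sketch that uses only the standard-library `importlib.util.find_spec`. The module list and the helper name `missing_modules` are assumptions made for illustration and are not part of icefall.

```python
import importlib.util

# Top-level import names of the packages installed in this notebook.
# lhotse is listed too, although it is only needed for training.
REQUIRED = ["torch", "torchaudio", "k2", "kaldifeat", "lhotse"]


def missing_modules(names):
    """Return the names in `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
print("missing:", ", ".join(missing) if missing else "none")
```

Re-running the cell after the installs below should report `none`.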

### Install PyTorch and torchaudio

In [None]:
! nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0


In [None]:
! pip install torch==1.7.1+cu101 torchaudio==0.7.2 torchvision==0.8.2 torchtext==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in links: https://download.pytorch.org/whl/torch_stable.html
Collecting torch==1.7.1+cu101
  Downloading https://download.pytorch.org/whl/cu101/torch-1.7.1%2Bcu101-cp38-cp38-linux_x86_64.whl (735.4 MB)
Collecting torchaudio==0.7.2
  Downloading torchaudio-0.7.2-cp38-cp38-manylinux1_x86_64.whl (7.6 MB)
Collecting torchvision==0.8.2
  Downloading https://download.pytorch.org/whl/cu92/torchvision-0.8.2%2Bcu92-cp38-cp38-linux_x86_64.whl (12.5 MB)
Collecting torchtext==0.8.1
  Downloading torchtext-0.8.1-cp38-cp38-manylinux1_x86_64.whl (7.0 MB)
Installing collected packages: torch, torchvision, torchtext, torchaudio
  Attemptin

In [None]:
! pip install lhotse

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting lhotse
  Downloading lhotse-1.11.0-py3-none-any.whl (588 kB)
Collecting cytoolz>=0.10.1
  Downloading cytoolz-0.12.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.8 MB)
Collecting lilcom>=1.1.0
  Downloading lilcom-1.5.1.tar.gz (45 kB)
Collecting dataclasses
  Downloading dataclasses-0.6-py3-none-any.whl (14 kB)
Collecting intervaltree>=3.1.0
  Downloading intervaltree-3.1.0.tar.gz (32 kB)
Building wheels for collected packages: intervaltree, lilcom
  Building wheel for intervaltree (setup.py) ... done
  Created wheel for intervaltree: filename=intervaltree-3.1.0-py2.py3-none-any.whl size=26118 sha256=ee6c5635c65c44618fe723188b68c158f03001766bf9a92481083a8c32bbd58c
  St

### Install k2

In [None]:
! pip install k2==1.17

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting k2==1.17
  Downloading k2-1.17-py38-none-any.whl (72.7 MB)
Installing collected packages: k2
Successfully installed k2-1.17


Check that k2 was installed successfully:

In [None]:
! python3 -m k2.version

Collecting environment information...

k2 version: 1.17
Build type: Release
Git SHA1: 3dc222f981b9fdbc8061b3782c3b385514a2d444
Git date: Mon Jul 4 02:13:04 2022
Cuda used to build k2: 10.1
cuDNN used to build k2: 8.0.2
Python version used to build k2: 3.8
OS used to build k2: Ubuntu 18.04.6 LTS
CMake version: 3.23.2
GCC version: 5.5.0
CMAKE_CUDA_FLAGS:   -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w  --expt-extended-lambda -gencode arch=compute_35,code=sm_35  -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w  --expt-extended-lambda -gencode arch=compute_50,code=sm_50  -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w  --expt-extended-lambda -gencode arch=compute_60,code=sm_60  -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w  --expt-extended-lambda -gencode arch=compute_61,code=sm_61  -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w  --expt-extended-lambda -gencode arch=compute_70,code=sm_70  -lineinfo --expt-extended-lambda -use_fast_

### Install kaldifeat

In [None]:
! pip install kaldifeat==1.21

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting kaldifeat==1.21
  Downloading kaldifeat-1.21.tar.gz (482 kB)
Building wheels for collected packages: kaldifeat
  Building wheel for kaldifeat (setup.py) ... done
  Created wheel for kaldifeat: filename=kaldifeat-1.21-cp38-cp38-linux_x86_64.whl size=269348 sha256=c9b8070d9512c5dee9443db1ed49328ebab9cd1d12e347c674b2ce6729faa8d6
  Stored in directory: /root/.cache/pip/wheels/cd/32/ad/768cf2700e58c7899d3668e14e5a513c45e69e36f258b9d0d3
Successfully built kaldifeat
Installing collected packages: kaldifeat
Successfully installed kaldifeat-1.21


To check that kaldifeat was installed successfully, run:

In [None]:
! python3 -c "import kaldifeat; print(kaldifeat.__version__)"

1.21
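kaldifeat turns each sound file into a sequence of fbank frames. The decoding log further below reports `'feature_dim': 80` and `'subsampling_factor': 4`. The following is a rough sketch of how many frames to expect. It assumes Kaldi's default 10 ms frame shift with `snip_edges=False`, which gives `(num_samples + shift // 2) // shift` frames. It also assumes the conformer front end is two kernel-3, stride-2 convolutions. Both of these are assumptions about the model configuration, and the helper names are hypothetical.

```python
def num_fbank_frames(num_samples, sample_rate=16000, frame_shift_ms=10.0):
    """Kaldi-style frame count with snip_edges=False (assumed configuration)."""
    shift = int(sample_rate * frame_shift_ms / 1000)  # 160 samples at 16 kHz
    return (num_samples + shift // 2) // shift


def num_encoder_frames(num_frames):
    """Length after two kernel-3, stride-2 convolutions (roughly 4x subsampling)."""
    return ((num_frames - 1) // 2 - 1) // 2


# A 3.5 s clip at 16 kHz has 56000 samples.
frames = num_fbank_frames(56000)
print(frames, num_encoder_frames(frames))  # 350 86
```

The CTC output therefore has roughly one label per 40 ms of audio. This is why very short clips leave the decoder few frames to work with.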


### Install icefall

icefall is a collection of Python scripts. All you need to do is download its source code and set the `PYTHONPATH` environment variable to point to it.
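The decoding cell later in this notebook sets `PYTHONPATH=/content/icefall` inline for a single command. As an alternative, the variable can be exported once from Python. The following is a small sketch, assuming the clone location `/content/icefall` used below. `prepend_pythonpath` is a hypothetical helper. Shell commands started with `!` should inherit `os.environ` from the kernel.

```python
import os

ICEFALL_DIR = "/content/icefall"  # clone location used by the cells in this notebook


def prepend_pythonpath(path, env):
    """Return a copy of `env` with `path` placed first on PYTHONPATH."""
    new_env = dict(env)
    existing = new_env.get("PYTHONPATH", "")
    new_env["PYTHONPATH"] = path if not existing else path + os.pathsep + existing
    return new_env


# Export for the current kernel; subsequent `!` commands see the updated value.
os.environ.update(prepend_pythonpath(ICEFALL_DIR, os.environ))
print(os.environ["PYTHONPATH"])
```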

In [None]:
! git clone https://github.com/k2-fsa/icefall


Cloning into 'icefall'...
remote: Enumerating objects: 10001, done.[K
remote: Counting objects: 100% (82/82), done.[K
remote: Compressing objects: 100% (63/63), done.[K
remote: Total 10001 (delta 31), reused 47 (delta 15), pack-reused 9919[K
Receiving objects: 100% (10001/10001), 11.91 MiB | 33.15 MiB/s, done.
Resolving deltas: 100% (6829/6829), done.


In [None]:
! pip install -q kaldialign 'sentencepiece>=0.1.96'

### Load the data from My Drive

In [None]:
import os, sys
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


## Download pre-trained conformer CTC model

To keep the following steps simple, we download the model to
`icefall/egs/librispeech/ASR/tmp1`.

In [None]:
! apt-get install -y -qq tree sox git-lfs

Selecting previously unselected package libopencore-amrnb0:amd64.
(Reading database ... 124016 files and directories currently installed.)
Preparing to unpack .../0-libopencore-amrnb0_0.1.3-2.1_amd64.deb ...
Unpacking libopencore-amrnb0:amd64 (0.1.3-2.1) ...
Selecting previously unselected package libopencore-amrwb0:amd64.
Preparing to unpack .../1-libopencore-amrwb0_0.1.3-2.1_amd64.deb ...
Unpacking libopencore-amrwb0:amd64 (0.1.3-2.1) ...
Selecting previously unselected package libmagic-mgc.
Preparing to unpack .../2-libmagic-mgc_1%3a5.32-2ubuntu0.4_amd64.deb ...
Unpacking libmagic-mgc (1:5.32-2ubuntu0.4) ...
Selecting previously unselected package libmagic1:amd64.
Preparing to unpack .../3-libmagic1_1%3a5.32-2ubuntu0.4_amd64.deb ...
Unpacking libmagic1:amd64 (1:5.32-2ubuntu0.4) ...
Selecting previously unselected package libsox3:amd64.
Preparing to unpack .../4-libsox3_14.4.2-3ubuntu0.18.04.1_amd64.deb ...
Unpacking libsox3:amd64 (14.4.2-3ubuntu0.18.04.1) ...
Selecting previously un

In [None]:
! cd /content/icefall/egs/librispeech/ASR && \
  mkdir -p tmp1 && \
  cd tmp1 && \
  git lfs install && \
  git clone -v https://huggingface.co/pkufool/icefall_asr_librispeech_conformer_ctc && \
  cd .. && \
  tree tmp1



Updated git hooks.
Git LFS initialized.
Cloning into 'icefall_asr_librispeech_conformer_ctc'...
POST git-upload-pack (165 bytes)
remote: Enumerating objects: 55, done.[K
remote: Counting objects: 100% (55/55), done.[K
remote: Compressing objects: 100% (51/51), done.[K
remote: Total 55 (delta 14), reused 0 (delta 0), pack-reused 0[K
Unpacking objects: 100% (55/55), done.
tcmalloc: large alloc 1471086592 bytes == 0x560e743e0000 @  0x7fae09ab22a4 0x560e3899578f 0x560e389728db 0x560e389275b3 0x560e388cb34a 0x560e388cb806 0x560e388e8ad1 0x560e388e9069 0x560e388e9593 0x560e3898e482 0x560e3882ecc2 0x560e38815a75 0x560e38816735 0x560e3881573a 0x7fae08df9c87 0x560e3881578a
tcmalloc: large alloc 2206621696 bytes == 0x560ecbed0000 @  0x7fae09ab22a4 0x560e3899578f 0x560e389728db 0x560e389275b3 0x560e388cb34a 0x560e388cb806 0x560e388e8ad1 0x560e388e9069 0x560e388e9593 0x560e3898e482 0x560e3882ecc2 0x560e38815a75 0x560e38816735 0x560e3881573a 0x7fae08df9c87 0x560e3881578a
tcmalloc: large alloc 3
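If Git LFS was not active during the clone, the large files end up as small text "pointer" stubs instead of real weights. The following is a quick sanity check, sketched under the assumption that the relevant files are the two paths later passed to `pretrained.py`. `missing_or_pointer` is a hypothetical helper.

```python
from pathlib import Path

# Files passed to pretrained.py later in this notebook (relative to the model repo).
MODEL_DIR = Path("/content/icefall/egs/librispeech/ASR/tmp1/icefall_asr_librispeech_conformer_ctc")
EXPECTED = ["exp/pretrained.pt", "data/lang_bpe/bpe.model"]


def missing_or_pointer(model_dir, relative_paths):
    """Return the expected files that are absent or still Git LFS pointer stubs."""
    problems = []
    for rel in relative_paths:
        path = Path(model_dir) / rel
        if not path.is_file():
            problems.append(rel)
            continue
        # An LFS pointer is a small text file that begins with this header line.
        with path.open("rb") as f:
            if f.read(64).startswith(b"version https://git-lfs"):
                problems.append(rel)
    return problems


print(missing_or_pointer(MODEL_DIR, EXPECTED) or "all model files present")
```

If a pointer stub is reported, running `git lfs pull` inside the model directory should fetch the real file.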

In [None]:
! soxi /content/drive/MyDrive/Neurological_Signals/PEC_7_ses1_Naming_object_1.wav


Input File     : '/content/drive/MyDrive/Neurological_Signals/PEC_7_ses1_Naming_object_1.wav'
Channels       : 1
Sample Rate    : 24000
Precision      : 16-bit
Duration       : 00:00:03.50 = 84000 samples ~ 262.5 CDDA sectors
File Size      : 168k
Bit Rate       : 384k
Sample Encoding: 16-bit Signed Integer PCM
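The soxi fields are related by simple arithmetic. The same arithmetic shows what resampling does to a file: the duration stays the same, while the sample count scales with the new rate. The values below are copied from the output above.

```python
# Values copied from the soxi output above.
sample_rate = 24000   # Hz
num_samples = 84000
bits_per_sample = 16
channels = 1

duration_s = num_samples / sample_rate                       # 3.5 s
bit_rate = sample_rate * bits_per_sample * channels          # 384000 bit/s, soxi's "384k"
data_bytes = num_samples * channels * bits_per_sample // 8   # 168000 bytes of PCM data
samples_at_16k = round(duration_s * 16000)                   # sample count after resampling

print(duration_s, bit_rate, data_bytes, samples_at_16k)  # 3.5 384000 168000 56000
```

soxi's "168k" file size also includes the small WAV header on top of the PCM data.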



### Resample the audio from 48000 Hz to 16000 Hz for decoding

In [None]:
import glob
from pathlib import Path
import os

In [None]:
# original dataset path
path1 = "/content/drive/MyDrive/NLS7/"

# output path for the resampled files
path2 = "/content/drive/MyDrive/NLS7/NLS7_16k/"

audio_paths = glob.glob(os.path.join(path1, '*.wav'))
len(audio_paths)


381

In [None]:
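import subprocess

# Create the output folder if it does not exist yet.
os.makedirs(path2, exist_ok=True)

audio_paths = sorted(glob.glob(os.path.join(path1, '*.wav')))

count = 0
for file_name_original in audio_paths:
  # e.g. ".../NLS7/foo.wav" -> ".../NLS7/NLS7_16k/foo-16k.wav"
  name = Path(file_name_original).stem
  file_name_resample = os.path.join(path2, name + '-16k.wav')
  # Passing arguments as a list handles paths with spaces safely;
  # check=True stops the loop if sox fails on a file.
  subprocess.run(['sox', file_name_original, '-r', '16000', file_name_resample], check=True)
  count += 1

  print(count)

print(count)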



1
2
3
...
276
277
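sox converts whatever rate the originals have. However, the decoding log below reports `'sample_rate': 16000`, so it is worth checking the outputs before decoding. The following is a minimal sketch using the standard-library `wave` module. It assumes the resampled files are plain PCM WAV, and it reuses the `path2` folder and the `-16k.wav` suffix from the cell above. `wav_header` and `not_16k` are hypothetical helpers.

```python
import glob
import os
import wave

OUT_DIR = "/content/drive/MyDrive/NLS7/NLS7_16k"  # path2 in the resampling cell


def wav_header(path):
    """Return (sample_rate, channels, sample_width_bytes, num_frames) of a PCM WAV file."""
    with wave.open(path, "rb") as w:
        return w.getframerate(), w.getnchannels(), w.getsampwidth(), w.getnframes()


def not_16k(paths):
    """Return the paths whose header reports a sample rate other than 16000 Hz."""
    return [p for p in paths if wav_header(p)[0] != 16000]


files = sorted(glob.glob(os.path.join(OUT_DIR, "*-16k.wav")))
print(len(files), "resampled files; not at 16 kHz:", not_16k(files))
```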


### CTC Decoding
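`--method ctc-decoding` decodes from the CTC output alone, without HLG or an n-gram LM. In icefall this step goes through a k2 lattice search. The simpler greedy (best-path) function below is not icefall's implementation. It only illustrates the CTC collapse rule that maps per-frame labels to BPE token IDs: merge consecutive repeats, then drop blanks. The blank ID is assumed here to be 0.

```python
def ctc_greedy_collapse(frame_ids, blank=0):
    """Apply the CTC collapse rule to a per-frame label sequence."""
    out = []
    prev = None
    for t in frame_ids:
        # Emit a token only when it differs from the previous frame and is not blank.
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out


# A blank between two 7s keeps them as separate tokens.
print(ctc_greedy_collapse([0, 7, 7, 0, 7, 3, 3]))  # [7, 7, 3]
```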

In [None]:
import glob
from pathlib import Path
import os
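The `pretrained.py` call in the cell that builds `audio_paths` decodes a single file. However, its log shows the positional arguments being collected into a `sound_files` list. The following is a sketch that builds the same command for several files at once and runs it with `subprocess` instead of a `!` line. It assumes that `sound_files` accepts multiple paths. `pretrained_cmd` and `decode` are hypothetical helper names, and every flag is copied from the notebook's own command.

```python
import os
import subprocess

# Locations used by the `!` commands in this notebook.
ASR_DIR = "/content/icefall/egs/librispeech/ASR"
MODEL_DIR = "./tmp1/icefall_asr_librispeech_conformer_ctc"  # relative to ASR_DIR


def pretrained_cmd(sound_files):
    """Hypothetical helper: argument list for CTC decoding of several files in one run."""
    return [
        "python3", "./conformer_ctc/pretrained.py",
        "--method", "ctc-decoding",
        "--checkpoint", f"{MODEL_DIR}/exp/pretrained.pt",
        "--bpe-model", f"{MODEL_DIR}/data/lang_bpe/bpe.model",
        "--num-classes", "5000",
        *sound_files,
    ]


def decode(sound_files):
    """Hypothetical helper: run pretrained.py from ASR_DIR with icefall on PYTHONPATH."""
    env = {**os.environ, "PYTHONPATH": "/content/icefall"}
    return subprocess.run(pretrained_cmd(sound_files), cwd=ASR_DIR, env=env, check=True)


# Usage, once `audio_paths` has been built:
# decode(audio_paths)
```

Loading the model once for all files avoids paying the checkpoint-loading cost per clip.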



In [None]:
path3 = '/content/drive/MyDrive/DATA_NEW/test/test_dataset_16k'

audio_paths = sorted(glob.glob(os.path.join(path3, '*.wav')))

print(audio_paths)

! cd /content/icefall/egs/librispeech/ASR && \
    PYTHONPATH=/content/icefall python3 ./conformer_ctc/pretrained.py \
      --method ctc-decoding \
      --checkpoint ./tmp1/icefall_asr_librispeech_conformer_ctc/exp/pretrained.pt \
      --bpe-model ./tmp1/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/bpe.model \
      --num-classes 5000 \
      '/content/drive/MyDrive/DATA_NEW/test/test_dataset_16k/AD_1_ses1_CookieThief-16k.wav'




['/content/drive/MyDrive/DATA_NEW/test/test_dataset_16k/AD_1_ses1_Blue_green-16k.wav', '/content/drive/MyDrive/DATA_NEW/test/test_dataset_16k/AD_1_ses1_CookieThief-16k.wav']
2022-12-27 02:06:11,170 INFO [pretrained.py:258] {'sample_rate': 16000, 'subsampling_factor': 4, 'vgg_frontend': False, 'use_feat_batchnorm': True, 'feature_dim': 80, 'nhead': 8, 'attention_dim': 512, 'num_decoder_layers': 0, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'checkpoint': './tmp1/icefall_asr_librispeech_conformer_ctc/exp/pretrained.pt', 'words_file': None, 'HLG': None, 'bpe_model': './tmp1/icefall_asr_librispeech_conformer_ctc/data/lang_bpe/bpe.model', 'method': 'ctc-decoding', 'G': None, 'num_paths': 100, 'ngram_lm_scale': 1.3, 'attention_decoder_scale': 1.2, 'nbest_scale': 0.5, 'sos_id': 1, 'num_classes': 5000, 'eos_id': 1, 'sound_files': ['/content/drive/MyDrive/DATA_NEW/test/test_dataset_16k/AD_1_ses1_CookieThief-16k.wav']}
2022