# Yesno recipe in icefall

This notebook shows you how to set up the environment to use [icefall][icefall] for training and decoding.
It also describes how to use a pre-trained model to decode wave files.


We use the [yesno] dataset as an example.

[icefall]: https://github.com/k2-fsa/icefall
[yesno]: https://www.openslr.org/1/

## Environment setup

### Install PyTorch and torchaudio

In [1]:
import torch
print(torch.__version__)

2.0.1+cu117


Colab pre-installs PyTorch, so we don't need to install it here.

According to https://pytorch.org/audio/main/installation.html#compatibility-matrix, PyTorch 2.0.1 pairs with torchaudio 2.0.2, so we install that version.

In [2]:
! pip install torchaudio==2.0.2
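
The version lookup described above can be sketched in Python. This is a minimal illustration, not part of icefall: the helper name `matching_torchaudio` and the lookup table are assumptions, and the table holds only the single pairing stated in this notebook. Add further rows from the compatibility matrix as needed.

```python
# Partial torch -> torchaudio pairing. Only the row mentioned in this
# notebook is filled in; extend it from the official compatibility matrix.
TORCHAUDIO_FOR_TORCH = {
    "2.0.1": "2.0.2",
}


def matching_torchaudio(torch_version: str) -> str:
    """Return the torchaudio version paired with a torch version string.

    The local build tag (for example "+cu117") is dropped before the lookup.
    """
    base = torch_version.split("+", 1)[0]
    if base not in TORCHAUDIO_FOR_TORCH:
        raise ValueError(f"no torchaudio pairing recorded for torch {base}")
    return TORCHAUDIO_FOR_TORCH[base]


print(f"pip install torchaudio=={matching_torchaudio('2.0.1+cu117')}")
```

In a live notebook you could pass `torch.__version__` instead of the literal string.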



### Install k2

We are going to install k2 by following https://k2-fsa.github.io/k2/installation/from_wheels.html.


In [5]:
! pip install k2==1.24.3.dev20230718+cuda11.7.torch2.0.1 -f https://k2-fsa.github.io/k2/cuda.html

Looking in links: https://k2-fsa.github.io/k2/cuda.html
Collecting k2==1.24.3.dev20230718+cuda11.7.torch2.0.1
  Downloading https://huggingface.co/csukuangfj/k2/resolve/main/cuda/k2-1.24.3.dev20230718%2Bcuda11.7.torch2.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (92.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 92.9/92.9 MB 4.6 MB/s eta 0:00:00
Installing collected packages: k2
  Attempting uninstall: k2
    Found existing installation: k2 1.24.3.dev20230718+cuda11.8.torch2.0.1
    Uninstalling k2-1.24.3.dev20230718+cuda11.8.torch2.0.1:
      Successfully uninstalled k2-1.24.3.dev20230718+cuda11.8.torch2.0.1
Successfully installed k2-1.24.3.dev20230718+cuda11.7.torch2.0.1
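
The k2 wheel name above encodes the k2 build, the CUDA version, and the PyTorch version. The sketch below shows how such a requirement string could be assembled. The helper `k2_requirement` is hypothetical, and the pattern is inferred only from the one wheel installed above, so check the k2 wheel index for the combinations that actually exist.

```python
def k2_requirement(k2_version: str, cuda_version: str, torch_version: str) -> str:
    """Build a pip requirement string following the pattern seen above.

    torch_version may carry a local build tag such as "+cu117"; it is dropped.
    """
    torch_base = torch_version.split("+", 1)[0]
    return f"k2=={k2_version}+cuda{cuda_version}.torch{torch_base}"


requirement = k2_requirement("1.24.3.dev20230718", "11.7", "2.0.1+cu117")
print(f"pip install {requirement} -f https://k2-fsa.github.io/k2/cuda.html")
```

Note that the CUDA version in the wheel name should match the CUDA build of PyTorch (`cu117` here), which is why the pre-installed `cuda11.8` build was replaced in the output above.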


## Inference with a pre-trained model


Check that k2 was installed successfully:

In [6]:
! python3 -m k2.version

Collecting environment information...

k2 version: 1.24.3
Build type: Release
Git SHA1: e400fa3b456faf8afe0ee5bfe572946b4921a3db
Git date: Sat Jul 15 04:21:50 2023
Cuda used to build k2: 11.7
cuDNN used to build k2: 
Python version used to build k2: 3.10
OS used to build k2: CentOS Linux release 7.9.2009 (Core)
CMake version: 3.26.4
GCC version: 9.3.1
CMAKE_CUDA_FLAGS:  -Wno-deprecated-gpu-targets   -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w  --expt-extended-lambda -gencode arch=compute_35,code=sm_35  -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w  --expt-extended-lambda -gencode arch=compute_50,code=sm_50  -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w  --expt-extended-lambda -gencode arch=compute_60,code=sm_60  -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w  --expt-extended-lambda -gencode arch=compute_61,code=sm_61  -lineinfo --expt-extended-lambda -use_fast_math -Xptxas=-w  --expt-extended-lambda -gencode arch=compute_70,code=sm_7

### Install lhotse
[lhotse][lhotse] is used for data preparation.

[lhotse]: https://github.com/lhotse-speech/lhotse

Normally, we would use `pip install lhotse`. However, the yesno recipe was added only recently and has not been released to PyPI yet, so we install the latest unreleased version from GitHub here.

In [7]:
# ! pip install lhotse
! pip install git+https://github.com/lhotse-speech/lhotse

Collecting git+https://github.com/lhotse-speech/lhotse
  Cloning https://github.com/lhotse-speech/lhotse to /tmp/pip-req-build-8zqkao5p
  Running command git clone --filter=blob:none --quiet https://github.com/lhotse-speech/lhotse /tmp/pip-req-build-8zqkao5p
  Resolved https://github.com/lhotse-speech/lhotse to commit 1b68036d20e5a674c45c01d7719ddacc2d6742ac
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting cytoolz>=0.10.1 (from lhotse==1.23.0.dev0+git.1b68036.clean)
  Downloading cytoolz-0.12.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.0/2.0 MB 13.2 MB/s eta 0:00:00
Collecting intervaltree>=3.1.0 (from lhotse==1.23.0.dev0+git.1b68036.clean)
  Downloading intervaltree-3.1.0.tar.gz (32 kB)
  Preparing metadata (setup.py) ... do

### Install icefall

[icefall][icefall] is a collection of Python scripts.
You don't need to install it. Instead, you
get its source code, install its dependencies, and
point the `PYTHONPATH` environment variable at it.

[icefall]: https://github.com/k2-fsa/icefall

In [8]:
! pwd

/content


In [9]:
! git clone https://github.com/k2-fsa/icefall

Cloning into 'icefall'...
remote: Enumerating objects: 17594, done.
remote: Counting objects: 100% (1110/1110), done.
remote: Compressing objects: 100% (653/653), done.
remote: Total 17594 (delta 594), reused 756 (delta 358), pack-reused 16484
Receiving objects: 100% (17594/17594), 18.82 MiB | 18.07 MiB/s, done.
Resolving deltas: 100% (11870/11870), done.


Now install dependencies of `icefall`:

In [10]:
! cd icefall && \
  pip install -r requirements.txt

Collecting kaldifst>1.7.0 (from -r requirements.txt (line 1))
  Downloading kaldifst-1.7.10-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 9.5/9.5 MB 32.4 MB/s eta 0:00:00
Collecting kaldilm (from -r requirements.txt (line 2))
  Downloading kaldilm-1.15.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 MB 20.8 MB/s eta 0:00:00
Collecting kaldialign (from -r requirements.txt (line 3))
  Downloading kaldialign-0.9.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (91 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 91.8/91.8 kB 13.2 MB/s eta 0:00:00
Collecting num2words (from -r requirements.txt (line 4))
  Downloading num2words-0.5.13-py3-none-any.whl (143 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━

## Data preparation

We have set up the environment. Now it is time to prepare the data for training and decoding.

As we just said, `icefall` is a collection of Python scripts and we have to set up the `PYTHONPATH` variable to use it. Remember that `icefall` was downloaded to
`/content/icefall`, so we use

```
export PYTHONPATH=/content/icefall:$PYTHONPATH
```

**HINT**: You can have several versions of `icefall` in your virtual environment. To switch to a specific version of `icefall`, just change the `PYTHONPATH` environment variable.
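
If you prefer to launch the recipe scripts from Python rather than with `! export ... &&`, the same switching idea can be sketched as below. The helper `prepend_pythonpath` and the second checkout path `/content/icefall-v2` are hypothetical, used only for illustration.

```python
import os


def prepend_pythonpath(path, env=None):
    """Return a copy of env with path placed first in PYTHONPATH.

    An existing entry equal to path is removed so it is not duplicated.
    """
    env = dict(os.environ if env is None else env)
    current = env.get("PYTHONPATH", "")
    parts = [p for p in current.split(os.pathsep) if p and p != path]
    env["PYTHONPATH"] = os.pathsep.join([path] + parts)
    return env


# Switching between two checkouts only changes which path comes first.
env = prepend_pythonpath("/content/icefall", {"PYTHONPATH": "/content/icefall-v2"})
print(env["PYTHONPATH"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run`, so the change affects only that child process and leaves the notebook's own environment untouched.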

In [11]:
# To remove the following warning message
# 2023-07-27 05:03:07.156920: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
! pip uninstall -y tensorflow

Found existing installation: tensorflow 2.15.0
Uninstalling tensorflow-2.15.0:
  Successfully uninstalled tensorflow-2.15.0


In [12]:
! export PYTHONPATH=/content/icefall:$PYTHONPATH && \
  cd /content/icefall/egs/yesno/ASR && \
  rm -rf data && \
  ./prepare.sh

2024-04-14 11:35:30 (prepare.sh:27:main) dl_dir: /content/icefall/egs/yesno/ASR/download
2024-04-14 11:35:30 (prepare.sh:30:main) Stage 0: Download data
/content/icefall/egs/yesno/ASR/download/waves_yesno.tar.gz: 100% 4.70M/4.70M [00:00<00:00, 42.1MB/s]
2024-04-14 11:35:33 (prepare.sh:39:main) Stage 1: Prepare yesno manifest
2024-04-14 11:35:37 (prepare.sh:45:main) Stage 2: Compute fbank for yesno
2024-04-14 11:35:43,568 INFO [compute_fbank_yesno.py:65] Processing train
Extracting and storing features: 100% 90/90 [00:00<00:00, 172.64it/s]
2024-04-14 11:35:44,114 INFO [compute_fbank_yesno.py:65] Processing test
Extracting and storing features: 100% 30/30 [00:00<00:00, 298.40it/s]
2024-04-14 11:35:45 (prepare.sh:51:main) Stage 3: Prepare lang
2024-04-14 11:35:50,878 INFO [prepare_lang_fst.py:174] Building standard CTC topology
2024-04-14 11:35:50,879 INFO [prepare_lang_fst.py:183] Building L
2024-04-14 11:35:50,879 INFO [prepare_lang_fst.py:191] Building HL
2024-04-14 11:35:50,880 INFO [

## Training

In [27]:
! export PYTHONPATH=/content/icefall:$PYTHONPATH && \
  cd /content/icefall/egs/yesno/ASR && \
  ./tdnn/train.py

2024-04-14 11:45:36,621 INFO [train.py:481] Training started
2024-04-14 11:45:36,621 INFO [train.py:482] {'exp_dir': PosixPath('tdnn/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'lr': 0.01, 'feature_dim': 23, 'weight_decay': 1e-06, 'start_epoch': 0, 'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 10, 'reset_interval': 20, 'valid_interval': 10, 'beam_size': 10, 'reduction': 'sum', 'use_double_scores': True, 'world_size': 1, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 15, 'seed': 42, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 30.0, 'bucketing_sampler': False, 'num_buckets': 10, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': False, 'return_cuts': True, 'num_workers': 2, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'e400fa3b456faf8afe0ee5bfe572946b4921a3db', 'k2-git-

## Decoding

In [28]:
! export PYTHONPATH=/content/icefall:$PYTHONPATH && \
  cd /content/icefall/egs/yesno/ASR && \
  ./tdnn/decode.py

2024-04-14 11:46:14,729 INFO [decode.py:262] Decoding started
2024-04-14 11:46:14,729 INFO [decode.py:263] {'exp_dir': PosixPath('tdnn/exp'), 'lang_dir': PosixPath('data/lang_phone'), 'feature_dim': 23, 'search_beam': 20, 'output_beam': 8, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'epoch': 14, 'avg': 2, 'export': False, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 30.0, 'bucketing_sampler': False, 'num_buckets': 10, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': False, 'return_cuts': True, 'num_workers': 2, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'e400fa3b456faf8afe0ee5bfe572946b4921a3db', 'k2-git-date': 'Sat Jul 15 04:21:50 2023', 'lhotse-version': '1.23.0.dev+git.1b68036.clean', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'maste
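
The decoding parameters above include `'epoch': 14` and `'avg': 2`. One plausible reading, assumed here and not confirmed by the log itself, is that decoding averages the model parameters of the last `avg` epoch checkpoints ending at `epoch`. The sketch below illustrates that idea with plain Python numbers standing in for tensors; both function names are hypothetical and this is not icefall's implementation.

```python
def epochs_to_average(epoch: int, avg: int) -> list:
    """Epoch indices covered when averaging `avg` checkpoints ending at `epoch`."""
    start = max(epoch - avg + 1, 0)
    return list(range(start, epoch + 1))


def average_params(checkpoints: list) -> dict:
    """Element-wise mean of parameter dictionaries that share the same keys."""
    n = len(checkpoints)
    return {key: sum(c[key] for c in checkpoints) / n for key in checkpoints[0]}


print(epochs_to_average(14, 2))

# Toy "checkpoints" with one scalar parameter each.
print(average_params([{"w": 1.0}, {"w": 3.0}]))
```

Under this assumption, training for `num_epochs: 15` starting from epoch 0 leaves epoch 14 as the final checkpoint, which matches the default `epoch` value in the log.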

### Show the decoding result

In [29]:
! cd /content/icefall/egs/yesno/ASR && \
  cat tdnn/exp/recogs-test_set.txt

0_0_0_1_0_0_0_1-0:	ref=['NO', 'NO', 'NO', 'YES', 'NO', 'NO', 'NO', 'YES']
0_0_0_1_0_0_0_1-0:	hyp=['NO', 'NO', 'NO', 'YES', 'NO', 'NO', 'NO', 'YES']
0_0_1_0_0_0_1_0-1:	ref=['NO', 'NO', 'YES', 'NO', 'NO', 'NO', 'YES', 'NO']
0_0_1_0_0_0_1_0-1:	hyp=['NO', 'NO', 'YES', 'NO', 'NO', 'NO', 'YES', 'NO']
0_0_1_0_0_1_1_1-2:	ref=['NO', 'NO', 'YES', 'NO', 'NO', 'YES', 'YES', 'YES']
0_0_1_0_0_1_1_1-2:	hyp=['NO', 'NO', 'YES', 'NO', 'NO', 'YES', 'YES', 'YES']
0_0_1_0_1_0_0_1-3:	ref=['NO', 'NO', 'YES', 'NO', 'YES', 'NO', 'NO', 'YES']
0_0_1_0_1_0_0_1-3:	hyp=['NO', 'NO', 'YES', 'NO', 'YES', 'NO', 'NO', 'YES']
0_0_1_1_0_0_0_1-4:	ref=['NO', 'NO', 'YES', 'YES', 'NO', 'NO', 'NO', 'YES']
0_0_1_1_0_0_0_1-4:	hyp=['NO', 'NO', 'YES', 'YES', 'NO', 'NO', 'NO', 'YES']
0_0_1_1_0_1_1_0-5:	ref=['NO', 'NO', 'YES', 'YES', 'NO', 'YES', 'YES', 'NO']
0_0_1_1_0_1_1_0-5:	hyp=['NO', 'NO', 'YES', 'YES', 'NO', 'YES', 'YES', 'NO']
0_0_1_1_1_0_0_0-6:	ref=['NO', 'NO', 'YES', 'YES', 'YES', 'NO', 'NO', 'NO']
0_0_1_1_1_0_0_0-6:	hyp=['
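
Each utterance in this file occupies two lines, one with `ref=` and one with `hyp=`, and the word lists are written as Python list literals. A minimal sketch for loading them, assuming exactly the line layout shown above (the function name `parse_recogs` is hypothetical):

```python
import ast
from collections import defaultdict


def parse_recogs(lines):
    """Map each utterance ID to {"ref": [...], "hyp": [...]}."""
    results = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        utt_id, _, rest = line.partition(":")
        kind, _, words = rest.strip().partition("=")
        if kind not in ("ref", "hyp"):
            continue  # skip anything that is not a ref/hyp line
        results[utt_id][kind] = ast.literal_eval(words)
    return dict(results)


sample = [
    "0_0_0_1_0_0_0_1-0:\tref=['NO', 'NO', 'NO', 'YES', 'NO', 'NO', 'NO', 'YES']",
    "0_0_0_1_0_0_0_1-0:\thyp=['NO', 'NO', 'NO', 'YES', 'NO', 'NO', 'NO', 'YES']",
]
parsed = parse_recogs(sample)
mismatched = [utt for utt, r in parsed.items() if r.get("ref") != r.get("hyp")]
print(len(parsed), "utterances,", len(mismatched), "with differences")
```

To run it on the real file, pass an open file object for `tdnn/exp/recogs-test_set.txt` instead of `sample`.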

### Show the detailed WER

In [30]:
! cd /content/icefall/egs/yesno/ASR && \
  cat tdnn/exp/errs-test_set.txt

%WER = 0.42
Errors: 0 insertions, 1 deletions, 0 substitutions, over 240 reference words (239 correct)
Search below for sections starting with PER-UTT DETAILS:, SUBSTITUTIONS:, DELETIONS:, INSERTIONS:, PER-WORD STATS:

PER-UTT DETAILS: corr or (ref->hyp)  
0_0_0_1_0_0_0_1-0:	NO NO NO YES NO NO NO YES
0_0_1_0_0_0_1_0-1:	NO NO YES NO NO NO YES NO
0_0_1_0_0_1_1_1-2:	NO NO YES NO NO YES YES YES
0_0_1_0_1_0_0_1-3:	NO NO YES NO YES NO NO YES
0_0_1_1_0_0_0_1-4:	NO NO YES YES NO NO NO YES
0_0_1_1_0_1_1_0-5:	NO NO YES YES NO YES YES NO
0_0_1_1_1_0_0_0-6:	NO NO YES YES YES NO NO NO
0_0_1_1_1_1_0_0-7:	NO NO YES YES YES YES NO NO
0_1_0_0_0_1_0_0-8:	NO YES NO NO NO YES NO NO
0_1_0_0_1_0_1_0-9:	NO YES NO NO YES NO YES NO
0_1_0_1_0_0_0_0-10:	NO YES NO YES NO NO NO (NO->*)
0_1_0_1_1_1_0_0-11:	NO YES NO YES YES YES NO NO
0_1_1_0_0_1_1_1-12:	NO YES YES NO NO YES YES YES
0_1_1_1_0_0_1_0-13:	NO YES YES YES NO NO YES NO
0_1_1_1_1_0_1_0-14:	NO YES YES YES YES NO YES NO
1_0_0_0_0_0_0_0-15:	YES NO NO NO NO NO
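
The WER reported at the top of this file follows from the error counts: insertions, deletions, and substitutions divided by the number of reference words. The sketch below reproduces that arithmetic and counts the errors for utterance `0_1_0_1_0_0_0_0-10` with a standard word-level edit distance. It is an illustration under those definitions, not the scoring code icefall uses.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (insertions + deletions + substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def wer_percent(errors, ref_words):
    """Word error rate as a percentage."""
    return 100.0 * errors / ref_words


# Utterance 0_1_0_1_0_0_0_0-10: the final NO was deleted.
ref = "NO YES NO YES NO NO NO NO".split()
hyp = "NO YES NO YES NO NO NO".split()
print(edit_distance(ref, hyp))

# 0 insertions + 1 deletion + 0 substitutions over 240 reference words.
print(f"%WER = {wer_percent(0 + 1 + 0, 240):.2f}")
```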