# Demo Notebook

## Setting the dataset link

Let's use a sample dataset in the expected format to demonstrate how the inference pipeline is launched. The sample dataset is stored on Google Drive as a **.zip** archive.

In [None]:
YOUR_DATASET_LINK="https://drive.google.com/file/d/1vqqNkDrv6rsc83QQmDPXbLW94KvTqXxT/view?usp=sharing"

Export the link to an environment variable for convenience.

In [None]:
%env DATASET=$YOUR_DATASET_LINK

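Here is a minimal plain-Python sketch of the same step, for running the pipeline outside IPython. `%env` sets the variable in the notebook process's `os.environ`, and processes started with `!python ...` inherit that environment. The `${oc.env:DATASET}` syntax in the inference command is OmegaConf's built-in environment resolver. `resolve_env_interpolation` is a hypothetical, simplified stand-in that only illustrates the substitution; it is not the library's implementation.

```python
import os
import re

YOUR_DATASET_LINK = "https://drive.google.com/file/d/1vqqNkDrv6rsc83QQmDPXbLW94KvTqXxT/view?usp=sharing"

# Plain-Python equivalent of `%env DATASET=$YOUR_DATASET_LINK`:
# child processes launched afterwards inherit this variable.
os.environ["DATASET"] = YOUR_DATASET_LINK

# Hypothetical, simplified stand-in for OmegaConf's `oc.env` resolver.
# It handles only the plain ${oc.env:NAME} form, without default values.
_ENV_PATTERN = re.compile(r"\$\{oc\.env:([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env_interpolation(value: str) -> str:
    """Replace every ${oc.env:NAME} in `value` with os.environ[NAME] (KeyError if unset)."""
    return _ENV_PATTERN.sub(lambda match: os.environ[match.group(1)], value)


print(resolve_env_interpolation("data_dir=${oc.env:DATASET}"))
```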


## Downloading and installing requirements

Clone the repository with the code.

In [None]:
!git clone https://github.com/ayazvaliev/asr_hw.git

Change the notebook's current working directory to the repository root.

In [None]:
import os
os.chdir('asr_hw')

Install all required libraries.

In [None]:
!python -m pip install -r requirements.txt

Download all resources required for inference. To read more about the script's arguments, run:
```
python load_resources.py --help
```

In [None]:
!python load_resources.py --output output_dir --inference_only
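After the script finishes, a quick check can confirm that the resources the inference command relies on are in place. This is a sketch. The two paths (`output_dir/ckpt/model_best.pth` and `output_dir/lm_guidance`) come from the inference command, and the helper name `missing_inference_resources` is hypothetical.

```python
from pathlib import Path


def missing_inference_resources(output_dir="output_dir"):
    """Return the expected resource paths under `output_dir` that do not exist."""
    root = Path(output_dir)
    expected = {
        root / "ckpt" / "model_best.pth": Path.is_file,  # used as inferencer.from_pretrained
        root / "lm_guidance": Path.is_dir,               # used as lm_guidance_dir
    }
    return [str(path) for path, exists in expected.items() if not exists(path)]


missing = missing_inference_resources()
if missing:
    print("Missing resources:", missing)
```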

## Inference

Launch inference. Specify the following parameters on the command line:
- `data_dir` - dataset directory in the expected format: audio files are located in `data_dir/audio`, and (optional) transcriptions in `data_dir/transcriptions`. In this demo it is set to the dataset link exported above.
- `inferencer.save_path` - directory for saving predictions. Each prediction is saved as a `.txt` file named after the corresponding audio file.
- `inferencer.from_pretrained` - path to the checkpoint used for inference. The checkpoint is fetched by the `load_resources.py` script.
- `lm_guidance_dir` - directory that enables LM-guided decoding. All utilities needed for it are fetched by the `load_resources.py` script.
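For a dataset stored locally, the layout described above can be checked before launching inference. The sketch uses a hypothetical helper, `pair_audio_with_transcriptions`, which lists the files in `data_dir/audio` and matches each one with a transcription. It assumes that transcriptions are `.txt` files sharing the audio file's stem (e.g. `audio/0001.wav` and `transcriptions/0001.txt`), which the layout description does not spell out.

```python
from pathlib import Path


def pair_audio_with_transcriptions(data_dir):
    """Map each audio file name to its transcription path, or None if absent.

    Assumption: a transcription is `<audio stem>.txt` inside data_dir/transcriptions.
    """
    data_dir = Path(data_dir)
    audio_dir = data_dir / "audio"
    transcription_dir = data_dir / "transcriptions"  # optional directory
    if not audio_dir.is_dir():
        raise FileNotFoundError(f"expected audio files in {audio_dir}")
    pairs = {}
    for audio_path in sorted(p for p in audio_dir.iterdir() if p.is_file()):
        text_path = transcription_dir / f"{audio_path.stem}.txt"
        pairs[audio_path.name] = text_path if text_path.is_file() else None
    return pairs
```

Audio files mapped to `None` have no ground truth, so they can still be transcribed but cannot be scored.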

In [None]:
!python inference.py 'data_dir=${oc.env:DATASET}' inferencer.save_path=saved_preds inferencer.from_pretrained=output_dir/ckpt/model_best.pth lm_guidance_dir=output_dir/lm_guidance

Optionally, calculate and log CER/WER metrics of the obtained predictions against the ground-truth transcriptions. To read more about the script's arguments, run:
```
python calc_metrics.py --help
```

In [None]:
!python calc_metrics.py --predictions saved_preds --ground_truth sample_dataset/transcriptions
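CER and WER are conventionally defined as the Levenshtein (edit) distance between the reference and the prediction, divided by the reference length. CER computes this over characters and WER over words. The sketch below implements that definition with hypothetical helpers. It assumes predictions and ground-truth files are `.txt` files with matching stems, averages per-utterance scores, and applies no text normalization, so its numbers may differ from those reported by `calc_metrics.py`.

```python
from pathlib import Path


def load_texts(directory):
    """Map file stem -> stripped contents for every .txt file in `directory`."""
    return {
        path.stem: path.read_text(encoding="utf-8").strip()
        for path in sorted(Path(directory).glob("*.txt"))
    }


def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i] + [0] * len(hypothesis)
        for j, hyp_item in enumerate(hypothesis, start=1):
            current[j] = min(
                previous[j] + 1,                           # deletion
                current[j - 1] + 1,                        # insertion
                previous[j - 1] + (ref_item != hyp_item),  # substitution or match
            )
        previous = current
    return previous[-1]


def error_rate(reference, hypothesis):
    """Edit distance divided by reference length."""
    if len(reference) == 0:
        # Convention chosen for this sketch: empty reference -> 0 if prediction is empty, else 1.
        return 0.0 if len(hypothesis) == 0 else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


def cer(reference, hypothesis):
    return error_rate(reference, hypothesis)


def wer(reference, hypothesis):
    return error_rate(reference.split(), hypothesis.split())


def mean_metrics(predictions_dir, ground_truth_dir):
    """Average per-utterance (CER, WER) over file stems present in both directories."""
    predictions = load_texts(predictions_dir)
    references = load_texts(ground_truth_dir)
    common = sorted(predictions.keys() & references.keys())
    if not common:
        raise ValueError("no matching prediction/ground-truth file names")
    cers = [cer(references[key], predictions[key]) for key in common]
    wers = [wer(references[key], predictions[key]) for key in common]
    return sum(cers) / len(cers), sum(wers) / len(wers)
```

With the paths used in this notebook, the call would be `mean_metrics("saved_preds", "sample_dataset/transcriptions")`.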