This repository implements the full pipeline required for the HSE DLA 2025 ASR homework. It builds on the official PyTorch template and extends it.
- Python: 3.12 (required by Torch 2.7, which is used on Blackwell GPUs)
- Package manager: uv
Clone the repository and install all dependencies:

    uv sync

The repository expects the following resources, all downloadable via notebook or CLI commands:
- Trained checkpoints (hosted on HuggingFace):
  - Final model: https://huggingface.co/aspisov/asr/resolve/main/model_best-193347.pth
- KenLM language model + vocabulary:
  - https://openslr.org/resources/11/3-gram.arpa.gz
  - https://openslr.org/resources/11/librispeech-vocab.txt
Place them under saved/ or run the commands embedded in the demo notebook (see below).
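If you prefer to fetch the files from a script instead of the notebook, the download step could be sketched roughly as follows. The URLs are the ones listed above; the flat `saved/` layout, the file names derived from the URLs, and the helper names `plan_downloads` and `download_all` are assumptions made for this example, not part of the repository.

```python
from pathlib import Path
from urllib.request import urlretrieve

# URLs as listed in this README.
RESOURCE_URLS = [
    "https://huggingface.co/aspisov/asr/resolve/main/model_best-193347.pth",
    "https://openslr.org/resources/11/3-gram.arpa.gz",
    "https://openslr.org/resources/11/librispeech-vocab.txt",
]


def plan_downloads(dest_dir="saved"):
    """Map each URL to a local path under dest_dir (assumed flat layout)."""
    dest = Path(dest_dir)
    # The local file name is simply the last path segment of the URL.
    return [(url, dest / url.rsplit("/", 1)[-1]) for url in RESOURCE_URLS]


def download_all(dest_dir="saved"):
    """Download every resource that is not already present locally."""
    for url, path in plan_downloads(dest_dir):
        if path.exists():
            continue  # skip files fetched on a previous run
        path.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(path))
```

Calling `download_all()` from the repository root would place all three files under `saved/`. Whether the configs expect exactly these file names there is not confirmed here, so check the paths in the inference configs before relying on this.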
The main schedule trains on the combined LibriSpeech splits:

    uv run python3 train.py -cn=train.yaml

To run inference on the LibriSpeech test sets:

    # test-clean
    uv run python3 inference.py -cn=inference_test_clean.yaml inferencer.from_pretrained=path/to/checkpoint.pth
    # test-other
    uv run python3 inference.py -cn=inference_test_other.yaml inferencer.from_pretrained=path/to/checkpoint.pth

To evaluate on an arbitrary folder structured as required by the homework (audio + optional transcriptions), run:

    uv run python3 inference.py -cn=inference_custom.yaml inferencer.from_pretrained=path/to/checkpoint.pth custom_dataset.dataset_root=path/to/dataset

notebooks/demo_notebook.ipynb is designed to run on Google Colab from a clean environment. It covers:
- Cloning the repo and installing dependencies via uv
- Downloading checkpoints, KenLM files, and vocab
- Running inference on a custom dataset as well as LibriSpeech test-clean and test-other
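For the custom-dataset mode mentioned above (audio plus optional transcriptions), here is a minimal sketch of how such a folder might be indexed. The subfolder names `audio` and `transcriptions`, the matching of files by stem, the accepted extensions, and the function name `index_custom_dataset` are all assumptions for illustration; the repository's own dataset class may work differently.

```python
from pathlib import Path

AUDIO_EXTENSIONS = {".wav", ".flac", ".mp3"}  # assumed set of audio formats


def index_custom_dataset(dataset_root):
    """Pair each audio file with its transcription (or None) by file stem."""
    root = Path(dataset_root)
    audio_dir = root / "audio"          # assumed subfolder name
    text_dir = root / "transcriptions"  # assumed subfolder name, may be absent
    entries = []
    for audio_path in sorted(audio_dir.iterdir()):
        if audio_path.suffix.lower() not in AUDIO_EXTENSIONS:
            continue  # ignore non-audio files
        text_path = text_dir / f"{audio_path.stem}.txt"
        # Transcriptions are optional: utterances without one get text=None.
        text = (
            text_path.read_text(encoding="utf-8").strip()
            if text_path.exists()
            else None
        )
        entries.append({"path": str(audio_path), "text": text})
    return entries
```

Entries with `text=None` can still be transcribed; they just cannot contribute to CER/WER.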
| Experiment | Checkpoint | CER % (test-clean) | WER % (test-clean) | CER % (test-other) | WER % (test-other) |
|---|---|---|---|---|---|
| Full training (20M-parameter Conformer) | model_best-193347.pth | 5.2 | 14.27 | 15.02 | 35.04 |
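
For readers unfamiliar with the metrics in the table, the sketch below shows the standard definitions: WER is the word-level edit distance divided by the number of reference words, and CER is the same at character level. It assumes results are reported as percentages and uses an assumed convention for empty references; the repository's metric code may differ in text normalization or in how scores are averaged across utterances.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, computed row by row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate in percent."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        # Assumed convention: empty reference scores 0 only if the hypothesis is empty too.
        return 100.0 if hyp_words else 0.0
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate in percent."""
    if not reference:
        return 100.0 if hypothesis else 0.0
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)
```

For example, `wer("the cat sat", "the bat sat")` counts one substitution over three reference words.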
Logging is done via CometML. You can find the report here.