Automatic Speech Recognition Homework

Overview

This repository implements the full pipeline for the HSE DLA 2025 ASR homework. It builds on and extends the official PyTorch project template.

Environment & Installation

Python: 3.12 (used together with Torch 2.7, which supports Blackwell GPUs)
Package manager: uv

Clone the repository and install all dependencies:

git clone https://github.com/aspisov/asr.git
cd asr
uv sync

Artifacts

The repository expects the following resources, all of which can be downloaded from the demo notebook or from the command line:

Trained checkpoint (hosted on Hugging Face):
- Final model: https://huggingface.co/aspisov/asr/resolve/main/model_best-193347.pth
KenLM-compatible 3-gram language model (ARPA format) and vocabulary, both from OpenSLR:
- https://openslr.org/resources/11/3-gram.arpa.gz
- https://openslr.org/resources/11/librispeech-vocab.txt

Place them under saved/ or run the commands embedded in the demo notebook (see below).
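If you prefer a script to the notebook, the downloads can be sketched with the Python standard library alone. This is a minimal sketch, not part of the repository: `ARTIFACTS` and `fetch` are hypothetical names, and unpacking the `.gz` ARPA file is an assumption (some decoders expect a plain `.arpa` file; yours may read the archive directly).

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

# URLs copied from the Artifacts list above.
ARTIFACTS = [
    "https://huggingface.co/aspisov/asr/resolve/main/model_best-193347.pth",
    "https://openslr.org/resources/11/3-gram.arpa.gz",
    "https://openslr.org/resources/11/librispeech-vocab.txt",
]


def fetch(url: str, dest_dir: Path, decompress_gz: bool = True) -> Path:
    """Hypothetical helper: download `url` into `dest_dir`, optionally gunzip it.

    Files that already exist are not downloaded again.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / url.rsplit("/", 1)[-1]
    if not target.exists():
        # Write to a temporary name first so an interrupted download
        # is not mistaken for a finished one on the next run.
        partial = target.with_name(target.name + ".part")
        with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        partial.replace(target)
    if decompress_gz and target.suffix == ".gz":
        unpacked = target.with_suffix("")  # 3-gram.arpa.gz -> 3-gram.arpa
        if not unpacked.exists():
            with gzip.open(target, "rb") as src, open(unpacked, "wb") as dst:
                shutil.copyfileobj(src, dst)
        return unpacked
    return target
```

For example, `for url in ARTIFACTS: fetch(url, Path("saved"))` places all three files under saved/. Because existing files are skipped, the loop can be rerun safely after a failure.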

Training

The main training schedule trains on the combined LibriSpeech splits:

uv run python3 train.py -cn=train.yaml

Evaluation & Inference

LibriSpeech evaluation scripts

# test-clean
uv run python3 inference.py -cn=inference_test_clean.yaml inferencer.from_pretrained=path/to/checkpoint.pth

# test-other
uv run python3 inference.py -cn=inference_test_other.yaml inferencer.from_pretrained=path/to/checkpoint.pth
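Because the two LibriSpeech runs differ only in the config name, a small driver can run them back to back. This is a sketch under that assumption: `SPLIT_CONFIGS`, `build_command`, and `evaluate_all` are hypothetical helpers that simply reproduce the two commands above through `subprocess`.

```python
import subprocess
import sys

# Config names taken from the two evaluation commands above.
SPLIT_CONFIGS = {
    "test-clean": "inference_test_clean.yaml",
    "test-other": "inference_test_other.yaml",
}


def build_command(config_name: str, checkpoint: str) -> list[str]:
    """Argv for one `uv run` inference call; the checkpoint is passed as a Hydra override."""
    return [
        "uv", "run", "python3", "inference.py",
        f"-cn={config_name}",
        f"inferencer.from_pretrained={checkpoint}",
    ]


def evaluate_all(checkpoint: str, dry_run: bool = False) -> list[list[str]]:
    """Run the evaluation on every split, or only print the commands if dry_run is True."""
    commands = [build_command(cfg, checkpoint) for cfg in SPLIT_CONFIGS.values()]
    for cmd in commands:
        print(" ".join(cmd), file=sys.stderr)
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing run
    return commands
```

`evaluate_all("saved/model_best-193347.pth", dry_run=True)` only prints the commands; drop `dry_run` to execute them.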

Custom directory inference

To evaluate on an arbitrary folder structured as required by the homework (audio + optional transcriptions), run:

uv run python3 inference.py -cn=inference_custom.yaml inferencer.from_pretrained=path/to/checkpoint.pth custom_dataset.dataset_root=path/to/dataset

Demo Notebook

notebooks/demo_notebook.ipynb is designed to run on Google Colab from a clean environment. It covers:

- Cloning the repo and installing dependencies via uv
- Downloading checkpoints, KenLM files, and the vocabulary
- Running inference on a LibriSpeech-based custom dataset, as well as on LibriSpeech test-clean and test-other

Experiments & Results

Experiment	Checkpoint	CER, % (test-clean)	WER, % (test-clean)	CER, % (test-other)	WER, % (test-other)
Full training (20M-parameter Conformer)	model_best-193347.pth	5.2	14.27	15.02	35.04
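CER and WER are edit-distance metrics: the Levenshtein distance between the hypothesis and the reference, at the character or word level, divided by the reference length (multiply by 100 to get percentages like those in the table). The sketch below computes both for a single utterance in plain Python. It is an illustration rather than the repository's metric code; how per-utterance values are aggregated into the table is not stated here, and dividing by `max(len, 1)` to survive empty references is an assumption.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance where substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (or free match)
            )
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for one utterance."""
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one utterance (whitespace tokenization)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)
```

For instance, `wer("the cat sat", "the cat sit")` is 1/3: one substituted word out of three reference words.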

Logging

Logging is done via CometML. You can find the report here.

License

MIT
