scribe-eval is the open-source evaluation framework introduced in the SCRIBE
paper (Diagnostic Evaluation and Rich Transcription Models for Indic ASR,
accepted at Interspeech 2026). It provides fine-grained error metrics for
ASR systems on Indic languages (Malayalam, Kannada, Hindi, ...) and on
domain-specific transcription (legal, medical, technical).
Token categories are decomposed into base classes (WORD, NUMERAL, PUNCT) and optional domain classes (LEGAL, MEDICAL, TECH, or custom). Domain-critical terminology is shielded from incorrect splitting and tracked separately — so a single misrecognized legal term doesn't inflate your general WER.
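
To make the idea of shielding concrete, here is a minimal sketch of a tokenizer that keeps protected spans whole while splitting everything else on punctuation. It is not scribe-eval's implementation: the names `SHIELD_PATTERNS`, `GENERIC_RE`, and `tokenize` and both patterns are hypothetical, chosen only to cover the example sentence.

```python
import re

# Hypothetical shield patterns, for illustration only; these are NOT the
# patterns bundled with scribe-eval.
SHIELD_PATTERNS = [
    r"\b(?:u/s|r/w)(?!\w)",          # legal abbreviations containing a slash
    r"\b\d{1,2}\.\d{1,2}\.\d{4}\b",  # dates such as 22.05.2023
]
SHIELD_RE = re.compile("|".join(f"(?:{p})" for p in SHIELD_PATTERNS))

# Fallback splitter: a run of word characters, or one punctuation mark.
# Adequate for Latin-script text only; Indic scripts use combining marks
# that \w may not cover, so a real tokenizer needs more care here.
GENERIC_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split text into tokens, emitting each shielded span as one token."""
    tokens = []
    pos = 0
    for match in SHIELD_RE.finditer(text):
        # Split the unshielded text before this span generically.
        tokens.extend(GENERIC_RE.findall(text[pos:match.start()]))
        # Emit the shielded span untouched.
        tokens.append(match.group())
        pos = match.end()
    tokens.extend(GENERIC_RE.findall(text[pos:]))
    return tokens


print(GENERIC_RE.findall("u/s"))  # -> ['u', '/', 's'] without shielding
print(tokenize("charged u/s 302 IPC on 22.05.2023"))
# -> ['charged', 'u/s', '302', 'IPC', 'on', '22.05.2023']
```

Without the shield pass, `u/s` would count as three tokens and a single misrecognition could register as several word and punctuation errors.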

scribe-eval is not yet on PyPI, so install it from source for now (pip install scribe-eval will work once published):

    git clone https://github.com/adalat-ai-tech/scribe-eval.git
    cd scribe-eval
    pip install -e .                  # core library
    pip install -e '.[visualizer]'    # adds Streamlit UI
    pip install -e '.[charts]'        # adds matplotlib charts

Quick start, scoring a single reference/hypothesis pair with the bundled legal domain:

    from scribe import text_error_rates, DomainConfig

    # The hypothesis misrecognizes "302" as "303".
    ref = "charged u/s 302 IPC on 22.05.2023"
    hyp = "charged u/s 303 IPC on 22.05.2023"

    report = text_error_rates(ref, hyp, DomainConfig.legal())

    # One error rate per token category
    print(f"WER: {report['WORD']['error_rate']:.2%}")
    print(f"LER: {report['LEGAL']['error_rate']:.2%}")
    print(f"NER: {report['NUMERAL']['error_rate']:.2%}")

Features:

- Domain-aware tokenization: shields domain terms from punctuation splitting and tracks their errors separately
- Sandhi correction detection: identifies merged/split words common in Indic ASR
- Normalized error rates: a combined denominator prevents misleading metrics for sparse categories
- Batch evaluation: processes JSONL files with per-sample detail and dataset-level aggregation
- Interactive visualizer: Streamlit UI with color-coded alignment, TER/Accuracy metric tiles, category breakdown chart, frequent-error tables, and per-sample drill-down

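
The effect of a combined denominator can be sketched with a few lines of arithmetic. This sketch assumes that "normalized" means dividing each category's error count by the total number of reference tokens across all categories; that reading is an assumption, not taken from the library, and the function name `error_rates` and the counts below are made up for illustration.

```python
def error_rates(errors: dict[str, int],
                ref_counts: dict[str, int]) -> dict[str, dict[str, float]]:
    """Compare per-category and combined-denominator error rates.

    ASSUMPTION: the combined denominator is the total number of reference
    tokens over all categories. scribe-eval's exact definition may differ.
    """
    total = sum(ref_counts.values())
    report = {}
    for category, n_ref in ref_counts.items():
        n_err = errors.get(category, 0)
        report[category] = {
            # Denominator: reference tokens of this category only.
            "per_category": n_err / n_ref if n_ref else 0.0,
            # Denominator: reference tokens of every category.
            "normalized": n_err / total if total else 0.0,
        }
    return report


# Made-up counts: a sparse LEGAL category with a single error.
rates = error_rates(
    errors={"WORD": 2, "NUMERAL": 0, "LEGAL": 1},
    ref_counts={"WORD": 40, "NUMERAL": 6, "LEGAL": 2},
)
for category, values in rates.items():
    print(category, f"{values['per_category']:.2%}", f"{values['normalized']:.2%}")
```

With only two LEGAL tokens in the reference, one error gives a per-category rate of 50%, while the same error over all 48 tokens is about 2%. The per-category figure is still useful, but it swings sharply when a category is rare.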
| Category | Type | Label | Description |
|---|---|---|---|
| WORD | base | WER | General words (Indic and English) |
| NUMERAL | base | NER | Numbers, dates, times (302, 22.05.2023, 10:30) |
| PUNCT | base | PER | Punctuation marks |
| LEGAL | domain | LER | Indian legal terminology (u/s, r/w, PW1, Ext.A) |
| MEDICAL | domain | MER | Medical units and dosages (mg, ml, 500mg) |
| TECH | domain | TchER | Technical abbreviations (API, SDK, v1.0) |
| Custom | domain | configurable | Define your own with lists or regex patterns |

Factory methods for bundled domains: DomainConfig.legal(), DomainConfig.medical(), DomainConfig.technical().

File-based and custom inline configs are also supported. See docs/domain-configuration.md.

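
As a rough illustration of what "lists or regex patterns" can mean for a custom category, the sketch below classifies tokens against a term list and a set of patterns before falling back to the base classes. It does not use the DomainConfig API (see docs/domain-configuration.md for that); `CategoryRule`, `classify`, and the FINANCE example are hypothetical, and the fallback rules for NUMERAL and PUNCT are simplifications.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CategoryRule:
    """Hypothetical stand-in for one custom domain category."""
    name: str
    terms: set[str] = field(default_factory=set)        # exact terms, case-insensitive
    patterns: list[str] = field(default_factory=list)   # regexes that must match the whole token

    def matches(self, token: str) -> bool:
        if token.lower() in {t.lower() for t in self.terms}:
            return True
        return any(re.fullmatch(p, token) for p in self.patterns)


def classify(token: str, rules: list[CategoryRule]) -> str:
    """Return the first matching domain category, else a base class."""
    for rule in rules:
        if rule.matches(token):
            return rule.name
    # Simplified base-class fallbacks (assumptions, not the library's rules).
    if re.fullmatch(r"\d+(?:[.:/]\d+)*", token):   # 302, 22.05.2023, 10:30
        return "NUMERAL"
    if re.fullmatch(r"[^\w\s]+", token):
        return "PUNCT"
    return "WORD"


finance = CategoryRule(
    name="FINANCE",
    terms={"EBITDA", "YoY"},
    patterns=[r"Q[1-4]", r"FY\d{2}"],
)
for tok in ["Q3", "ebitda", "22.05.2023", ",", "revenue"]:
    print(tok, classify(tok, [finance]))
```

Domain rules are checked before the base classes so that a token such as FY24 is counted as a domain term rather than as a general word.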
Runnable scripts under examples/ demonstrate alignment,
single-sample reports, domain-config patterns, and full batch evaluation.
See examples/README.md for the full index.

    uv run examples/batch_evaluate.py --analysis --chart

See docs/batch-processing.md for the Python API, CLI arguments, and output schema.

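
For readers who want to see the shape of a batch run before opening the docs, here is a self-contained sketch of per-sample scoring with dataset-level aggregation over JSONL input. It is not the scribe-eval batch API: the field names `reference` and `hypothesis`, the function names, and the plain whitespace tokenization are all assumptions, and it computes only an undifferentiated word error rate.

```python
import json
from typing import Iterable


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def evaluate_jsonl(lines: Iterable[str]) -> dict:
    """Score each JSONL record, then pool the counts over the dataset.

    ASSUMPTION: each record has "reference" and "hypothesis" string fields.
    """
    samples = []
    total_errors = 0
    total_ref_words = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        ref = record["reference"].split()
        hyp = record["hypothesis"].split()
        errors = word_edit_distance(ref, hyp)
        samples.append({
            "errors": errors,
            "ref_words": len(ref),
            "wer": errors / len(ref) if ref else 0.0,
        })
        total_errors += errors
        total_ref_words += len(ref)
    dataset_wer = total_errors / total_ref_words if total_ref_words else 0.0
    return {"samples": samples, "dataset_wer": dataset_wer}


demo = [
    '{"reference": "charged u/s 302 IPC", "hypothesis": "charged u/s 303 IPC"}',
    '{"reference": "on 22.05.2023", "hypothesis": "on 22.05.2023"}',
]
result = evaluate_jsonl(demo)
print([s["wer"] for s in result["samples"]], result["dataset_wer"])
```

The dataset figure pools error and word counts across samples instead of averaging the per-sample rates, so short utterances do not carry the same weight as long ones. In practice the same function could be fed an open file handle, since a file iterates line by line.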

    scribe-visualizer    # requires the [visualizer] extra (see Installation)

See docs/visualizer.md.

Core: jiwer>=4.0.0, levenshtein>=0.27.1, tabulate>=0.9.0
Optional extras: matplotlib (for [charts]), streamlit and pandas (for [visualizer]).

Development setup:

    git clone https://github.com/adalat-ai-tech/scribe-eval.git
    cd scribe-eval
    uv sync --all-extras --dev    # core + [charts] + [visualizer] + [dev]

Run the tests with pytest:

    uv run pytest                          # full suite
    uv run pytest tests/test_analysis.py   # one file
    uv run pytest -k sandhi                # name pattern (-k matches by substring)
    uv run pytest -v                       # verbose, with each test name
    uv run pytest --cov=scribe             # with coverage

Tests are organised one file per library module under tests/, plus tests/test_paper_cases.py for end-to-end golden cases from the SCRIBE paper. pytest itself ships with the [dev] extra, so uv sync --all-extras --dev (above) is required first.


    uv run ruff check src tests examples     # lint
    uv run ruff format src tests examples    # auto-format

See docs/architecture.md for the module map and a glossary of project-specific terminology (sandhi, combined denominator, TER, Accuracy, ...).

The SCRIBE paper is accepted at Interspeech 2026. A preprint is available on arXiv: https://arxiv.org/abs/2605.20712

    @article{manohar2026scribe,
      title={SCRIBE: Diagnostic Evaluation and Rich Transcription Models for Indic ASR},
      author={Manohar, Kavya and Bhattacharya, Arghya and Juvekar, Kush and Nethil, Kumarmanas},
      journal={arXiv preprint arXiv:2605.20712},
      year={2026}
    }

To cite the software itself, use the "Cite this repository" button on GitHub (see CITATION.cff) or the Zenodo DOI: 10.5281/zenodo.20765419.

Licensed under the Apache License 2.0.
Developed as part of the Adalat AI initiative for Indic language technologies.