SCRIBE — Diagnostic Evaluation for Indic & Domain-Specific ASR

scribe-eval is the open-source evaluation framework introduced in the SCRIBE paper (Diagnostic Evaluation and Rich Transcription Models for Indic ASR, accepted at Interspeech 2026). It provides fine-grained error metrics for ASR systems on Indic languages (Malayalam, Kannada, Hindi, ...) and on domain-specific transcription (legal, medical, technical).

Tokens are classified into base categories (WORD, NUMERAL, PUNCT) and optional domain categories (LEGAL, MEDICAL, TECH, or custom). Domain-critical terms are shielded from being split at punctuation and are scored separately, so a single misrecognized legal term doesn't inflate your general WER.
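
To make the shielding idea concrete, here is a minimal sketch of a tokenizer that tries domain patterns before the generic ones. It is not scribe-eval's implementation: the regular expressions and the `tokenize` helper are assumptions made for illustration, and only the category names come from this README. It is shown on English text only; Python's `\w` does not match Indic combining vowel signs, so real Indic input would need a different word pattern.

```python
import re

# Hypothetical patterns for illustration; NOT scribe-eval's bundled legal config.
LEGAL = r"(?<!\w)(?:u/s|r/w|PW\d+|Ext\.[A-Z]\d*)(?!\w)"
NUMERAL = r"\d+(?:[.:/-]\d+)*"   # 302, 22.05.2023, 10:30
WORD = r"\w+"
PUNCT = r"[^\w\s]"

# Alternation order matters: domain terms are tried first, so they are
# matched whole before the generic WORD/PUNCT rules can split them.
TOKEN_RE = re.compile(
    rf"(?P<LEGAL>{LEGAL})|(?P<NUMERAL>{NUMERAL})|(?P<WORD>{WORD})|(?P<PUNCT>{PUNCT})"
)


def tokenize(text):
    """Return (token, category) pairs for the given text."""
    return [(m.group(), m.lastgroup) for m in TOKEN_RE.finditer(text)]


# A naive word/punctuation tokenizer breaks the legal abbreviation apart.
naive = re.findall(r"\w+|[^\w\s]", "u/s 302")

shielded = tokenize("charged u/s 302 IPC on 22.05.2023")
```

Here `naive` contains `u`, `/`, and `s` as three separate tokens, while `shielded` keeps `u/s` as a single LEGAL token and `22.05.2023` as a single NUMERAL token.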

Installation

scribe-eval is not yet on PyPI, so install from source for now (pip install scribe-eval will work once it is published):

git clone https://github.com/adalat-ai-tech/scribe-eval.git
cd scribe-eval
pip install -e .                 # core library
pip install -e '.[visualizer]'   # adds Streamlit UI
pip install -e '.[charts]'       # adds matplotlib charts

Quick Start

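from scribe import text_error_rates, DomainConfig

# Reference and hypothesis differ only in the section number (302 vs 303)
ref = "charged u/s 302 IPC on 22.05.2023"
hyp = "charged u/s 303 IPC on 22.05.2023"

# Score the pair with the bundled legal domain config
report = text_error_rates(ref, hyp, DomainConfig.legal())

# The report is keyed by category name; each entry carries an error rate
print(f"WER: {report['WORD']['error_rate']:.2%}")
print(f"LER: {report['LEGAL']['error_rate']:.2%}")
print(f"NER: {report['NUMERAL']['error_rate']:.2%}")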

Features

  • Domain-aware tokenization — shield domain terms from punctuation splitting; track errors separately
  • Sandhi correction detection — identifies merged/split words common in Indic ASR
  • Normalized error rates — combined denominator prevents misleading metrics for sparse categories
  • Batch evaluation — process JSONL files with per-sample detail and dataset-level aggregation
  • Interactive visualizer — Streamlit UI with color-coded alignment, TER/Accuracy metric tiles, category breakdown chart, frequent-error tables, and per-sample drill-down
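
As a rough illustration of the merged/split-word idea in the list above, the sketch below flags places where two adjacent reference tokens were written as one hypothesis token, or the reverse. The `find_merge_split` helper is hypothetical and much simpler than real sandhi handling: it only catches exact concatenations of two tokens, whereas sandhi often changes characters at the join. It does not use or describe scribe-eval's own detector.

```python
def find_merge_split(ref_tokens, hyp_tokens):
    """Greedy left-to-right scan for exact two-token merges and splits.

    Assumption: tokens are otherwise aligned one-to-one; anything that is
    neither a match, a merge, nor a split is treated as a substitution.
    """
    events = []
    i = j = 0
    while i < len(ref_tokens) and j < len(hyp_tokens):
        r, h = ref_tokens[i], hyp_tokens[j]
        if r == h:
            i, j = i + 1, j + 1
        elif i + 1 < len(ref_tokens) and r + ref_tokens[i + 1] == h:
            # Hypothesis merged two reference words into one
            events.append(("merge", (r, ref_tokens[i + 1]), h))
            i, j = i + 2, j + 1
        elif j + 1 < len(hyp_tokens) and h + hyp_tokens[j + 1] == r:
            # Hypothesis split one reference word into two
            events.append(("split", r, (h, hyp_tokens[j + 1])))
            i, j = i + 1, j + 2
        else:
            # Ordinary substitution: skip past it on both sides
            i, j = i + 1, j + 1
    return events


merged = find_merge_split("काम हो गया".split(), "काम होगया".split())
split = find_merge_split("काम होगया".split(), "काम हो गया".split())
```

Flagging these cases separately lets a report distinguish a spacing difference from a genuinely misrecognized word, which a plain WER would count as one substitution plus one deletion or insertion.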

Token Categories

| Category | Type | Label | Description |
|---|---|---|---|
| WORD | base | WER | General words (Indic and English) |
| NUMERAL | base | NER | Numbers, dates, times (302, 22.05.2023, 10:30) |
| PUNCT | base | PER | Punctuation marks |
| LEGAL | domain | LER | Indian legal terminology (u/s, r/w, PW1, Ext.A) |
| MEDICAL | domain | MER | Medical units and dosages (mg, ml, 500mg) |
| TECH | domain | TchER | Technical abbreviations (API, SDK, v1.0) |
| Custom | domain | configurable | Define your own with lists or regex patterns |
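
The labels above are per-category error rates. The Features section mentions a combined denominator without giving the formula, so the sketch below shows one plausible reading: divide each category's error count by the total number of reference tokens across all categories, instead of by that category's own count. The `category_error_rates` helper and the counts are made up for illustration, and scribe-eval's actual normalization may differ.

```python
def category_error_rates(ref_counts, error_counts):
    """Compare a per-category denominator with a combined one.

    ref_counts:   reference tokens per category
    error_counts: errors per category (missing categories count as 0)
    """
    total_ref = sum(ref_counts.values())
    rates = {}
    for category, n_ref in ref_counts.items():
        errors = error_counts.get(category, 0)
        rates[category] = {
            # Denominator is this category's own reference tokens
            "per_category": errors / n_ref if n_ref else 0.0,
            # Assumed "combined" form: denominator is all reference tokens
            "combined": errors / total_ref if total_ref else 0.0,
        }
    return rates


# Made-up counts: a 100-token reference containing a single legal term
ref_counts = {"WORD": 97, "NUMERAL": 2, "LEGAL": 1}
error_counts = {"WORD": 3, "LEGAL": 1}
rates = category_error_rates(ref_counts, error_counts)
```

With these counts the lone legal term gives a per-category rate of 100%, but only 1% under the combined denominator. Under this reading the combined rates also add up to the overall token error rate (4 errors in 100 tokens).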

Domain Configuration

Factory methods for bundled domains: DomainConfig.legal(), DomainConfig.medical(), DomainConfig.technical()

File-based and custom inline configs are also supported. See docs/domain-configuration.md.

Examples

Runnable scripts under examples/ demonstrate alignment, single-sample reports, domain-config patterns, and full batch evaluation. See examples/README.md for the full index.

Batch Processing

uv run examples/batch_evaluate.py --analysis --chart

See docs/batch-processing.md for the Python API, CLI arguments, and output schema.
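
For readers who want a feel for what per-sample detail plus dataset-level aggregation involves, here is a standalone sketch that does not use scribe-eval at all. The JSONL field names `reference` and `hypothesis`, the `evaluate_jsonl` helper, and the pooled (total errors over total reference words) aggregation are all assumptions; the real input and output schema is the one documented in docs/batch-processing.md.

```python
import io
import json


def word_edit_distance(ref, hyp):
    """Levenshtein distance between two word lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = curr
    return prev[-1]


def evaluate_jsonl(lines):
    """Score each JSONL record, then pool errors across the dataset."""
    samples, total_errors, total_words = [], 0, 0
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        ref = record["reference"].split()    # hypothetical field name
        hyp = record["hypothesis"].split()   # hypothetical field name
        errors = word_edit_distance(ref, hyp)
        samples.append({"errors": errors, "ref_words": len(ref)})
        total_errors += errors
        total_words += len(ref)
    dataset_wer = total_errors / total_words if total_words else 0.0
    return {"samples": samples, "dataset_wer": dataset_wer}


# In-memory stand-in for a JSONL file
data = io.StringIO(
    '{"reference": "charged u/s 302 IPC", "hypothesis": "charged u/s 303 IPC"}\n'
    '{"reference": "on 22.05.2023", "hypothesis": "on 22.05.2023"}\n'
)
result = evaluate_jsonl(data)
```

Pooling errors before dividing weights long samples more heavily than averaging per-sample rates would; which of the two the library reports is not stated here.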

Interactive Visualizer

scribe-visualizer    # requires the [visualizer] extra (see Installation)

See docs/visualizer.md.

Dependencies

Core: jiwer>=4.0.0, levenshtein>=0.27.1, tabulate>=0.9.0

Optional extras: matplotlib (for [charts]), streamlit and pandas (for [visualizer]).

Development

git clone https://github.com/adalat-ai-tech/scribe-eval.git
cd scribe-eval
uv sync --all-extras --dev    # core + [charts] + [visualizer] + [dev]

Running tests

uv run pytest                              # full suite
uv run pytest tests/test_analysis.py       # one file
uv run pytest -k sandhi                    # name pattern (-k matches by substring)
uv run pytest -v                           # verbose, with each test name
uv run pytest --cov=scribe                 # with coverage

Tests are organised one file per library module under tests/, plus tests/test_paper_cases.py for end-to-end golden cases from the SCRIBE paper. pytest itself ships with the [dev] extra, so uv sync --all-extras --dev (above) is required first.

Lint and format

uv run ruff check src tests examples       # lint
uv run ruff format src tests examples      # auto-format

See docs/architecture.md for the module map and a glossary of project-specific terminology (sandhi, combined denominator, TER, Accuracy, ...).

Citation

The SCRIBE paper has been accepted at Interspeech 2026. A preprint is available on arXiv: https://arxiv.org/abs/2605.20712

@article{manohar2026scribe,
  title={SCRIBE: Diagnostic Evaluation and Rich Transcription Models for Indic ASR},
  author={Manohar, Kavya and Bhattacharya, Arghya and Juvekar, Kush and Nethil, Kumarmanas},
  journal={arXiv preprint arXiv:2605.20712},
  year={2026}
}

To cite the software itself, use the "Cite this repository" button on GitHub (see CITATION.cff) or the Zenodo DOI: 10.5281/zenodo.20765419.

License

Licensed under the Apache License 2.0.

Acknowledgements

Developed as part of the Adalat AI initiative for Indic language technologies.
