Inverse Text Normalization

A simple, deterministic, and extensible approach to inverse text normalization (ITN) for numbers.

Overview

This package converts raw spoken-form text (speech recognition output) into user-friendly written-form text. It works best for converting spoken numbers into digits, and for other translation tasks that do not change word order. A CSV file defines the basic rules for transforming spoken tokens into written tokens, and extra pre- and post-processing may be applied for more specific formatting requirements, e.g. dates, measurements, and money.
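To make the idea of a rule table concrete, here is a minimal sketch of CSV-driven conversion. It is not this package's implementation or API: the column names (`spoken`, `written`, `kind`), the tiny rule set, and the functions `load_rules` and `spoken_to_written` are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical rule table. The columns and rows are assumptions for this
# sketch, not the schema of the CSV file shipped with the package.
RULES_CSV = """\
spoken,written,kind
one,1,unit
two,2,unit
three,3,unit
four,4,unit
five,5,unit
six,6,unit
seven,7,unit
eight,8,unit
nine,9,unit
ten,10,unit
twenty,20,tens
thirty,30,tens
hundred,100,scale
thousand,1000,scale
"""


def load_rules(csv_text):
    """Parse the rule table into {spoken token: (numeric value, kind)}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["spoken"]: (int(row["written"]), row["kind"]) for row in reader}


def spoken_to_written(text, rules):
    """Replace runs of number words with digits; keep other tokens in order."""
    out = []
    total = current = 0
    in_number = False
    # The trailing None is a sentinel that flushes a number at the end of the text.
    for token in text.split() + [None]:
        rule = rules.get(token.lower()) if token is not None else None
        if rule is None:
            if in_number:
                out.append(str(total + current))
                total = current = 0
                in_number = False
            if token is not None:
                out.append(token)
            continue
        value, kind = rule
        in_number = True
        if kind == "scale" and value == 100:
            # "hundred" multiplies the group collected so far.
            current = max(current, 1) * value
        elif kind == "scale":
            # Larger scales close the current group and add it to the total.
            total += max(current, 1) * value
            current = 0
        else:
            current += value
    return " ".join(out)


rules = load_rules(RULES_CSV)
print(spoken_to_written("i have twenty three apples", rules))  # i have 23 apples
print(spoken_to_written("in two thousand twenty two", rules))  # in 2022
```

Because the sketch only rewrites tokens in place, word order is preserved, which matches the kind of task described above. It is deliberately naive: for example, it sums adjacent digit words ("one two" becomes 3), which a real rule set would need to handle differently.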


Terminal screenshot: example conversions, produced by running the example script in this repository.

Installation

This package supports Python 3.7 and later.

To install from PyPI:

pip install itnpy2

To install locally in editable mode, from the root folder of a clone of this repository:

pip install -e .

Tests

To run tests, use pytest in the root folder of this repository:

pytest

Issues

This package has been verified on a limited set of test cases. If you find a translation mistake, feel free to open a pull request that adds the input, expected output, and mistake to failing.csv. Thanks!
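If you prefer to script the update, the following sketch appends one row to a CSV file. The helper `append_failing_case` is hypothetical, and the column order (input, expected output, mistake) is an assumption taken from the sentence above; check the existing rows in failing.csv before relying on it.

```python
import csv


def append_failing_case(path, spoken, expected, mistake):
    """Append one failing case as a CSV row.

    Assumed column order: input, expected output, mistake (actual output).
    """
    # newline="" lets the csv module control line endings on every platform.
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([spoken, expected, mistake])
```

For example, `append_failing_case("failing.csv", "one oh one", "101", "1 oh 1")` would record a hypothetical mistranslation; the values here are made up for illustration.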

Citation

If you find this work useful, please consider citing it.

@misc{hsu2022itn,
  title        = {A simple, deterministic, and extensible approach to inverse text normalization for numbers},
  author       = {Brandhsu},
  howpublished = {https://github.com/barseghyanartur/itnpy},
  year         = {2022}
}
