Add initial doc for text_normalization
Tuan Lai committed Jun 28, 2021
1 parent 4c53304 commit 86c8cb4
Showing 1 changed file with 15 additions and 0 deletions.
15 changes: 15 additions & 0 deletions docs/source/nlp/text_normalization.rst
@@ -0,0 +1,15 @@
.. _text_normalization:

Text Normalization Models
==========================
Text normalization is the task of converting a written text into its spoken form. For example,
``$123`` should be verbalized as ``one hundred twenty three dollars``, while ``123 King Ave``
should be verbalized as ``one twenty three King Avenue``. Text normalization is typically used as
a pre-processing step for a range of speech applications, such as text-to-speech synthesis (TTS).

Data format
------------------

The data needs to be stored in tab-separated files (``.tsv``) with three columns: the first is the
"semiotic class" of the token, the second is the input (written) token, and the third is the output
(spoken) form. One example is the dataset used in the `Google Text Normalization Challenge <https://www.kaggle.com/google-nlu/text-normalization>`_.
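
As a rough illustration of the three-column layout described above, the sketch below parses a small
in-memory sample with Python's standard ``csv`` module. The ``Row`` type, the ``read_rows`` helper,
and the sample rows (including the class labels ``MONEY`` and ``PLAIN``) are hypothetical names
chosen for this example; they are not part of any library API, and a real dataset may contain rows
that need different handling.

```python
import csv
import io
from typing import Iterable, List, NamedTuple


class Row(NamedTuple):
    """One line of the .tsv file (hypothetical helper type)."""
    semiotic_class: str  # column 1: semiotic class of the token
    written: str         # column 2: input token
    spoken: str          # column 3: output (verbalized) form


def read_rows(lines: Iterable[str]) -> List[Row]:
    """Parse tab-separated lines into Row records.

    Assumption: lines that do not have exactly three columns are skipped.
    """
    # QUOTE_NONE keeps quote characters in tokens as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if len(fields) != 3:
            continue
        rows.append(Row(*fields))
    return rows


# Illustrative sample only; the labels and rows are made up for this sketch.
SAMPLE = (
    "MONEY\t$123\tone hundred twenty three dollars\n"
    "PLAIN\tKing\tKing\n"
)

rows = read_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row.semiotic_class, row.written, row.spoken, sep=" | ")
```

To read from disk instead, the same helper can be given an open file handle, for example
``open(path, encoding="utf-8", newline="")``, where ``path`` points to a ``.tsv`` file.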
