Normatex - Russian text normalization
This is a set of Finite-State Transducers (FSTs) for normalization of Russian texts for speech synthesis, machine translation and other natural language processing tasks.
The FSTs are developed using Unitex, a corpus processor.
To normalize a Russian text:
- Copy your text to the Corpus folder, open it in Unitex and preprocess it with the following resources:
Graphs/Preprocessing/Sentence/SentenceUniver.grf in MERGE mode
Graphs/Preprocessing/Replace/replace.grf in REPLACE mode
- Apply lexical resources:
- the full version of the Russian computational morphological dictionary developed at CIS, Munich:
- Create a cascade (Text\Apply CasSys Cascade... menu, New) to sequentially apply the following FSTs to your text in REPLACE mode:
- Launch the cascade of FSTs.
- The normalized text is in
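
The difference between MERGE mode, REPLACE mode and a cascade can be illustrated with a small Python sketch. It is a toy model built on regular expressions, not Unitex itself and not the Normatex graphs: the function names and all rules below are hypothetical, and real FSTs also consult the dictionaries applied in the lexical step.

```python
import re


def apply_merge(pattern, output, text):
    """MERGE-style pass: keep the matched input and insert the output after it."""
    return re.sub(pattern, lambda m: m.group(0) + output, text)


def apply_replace(pattern, output, text):
    """REPLACE-style pass: write the output instead of the matched input."""
    return re.sub(pattern, lambda m: output, text)


def run_cascade(rules, text):
    """Apply REPLACE-style rules in order; each rule sees the previous rule's result."""
    for pattern, output in rules:
        text = apply_replace(pattern, output, text)
    return text


# Hypothetical toy rules (assumptions for illustration only).
SENTENCE_END = r"[.!?](?=\s+[А-ЯЁ])"   # sentence-final mark before a capital letter
EXTRA_SPACES = r"[ \t]+"               # runs of spaces or tabs
CASCADE = [
    (r"№\s*", "номер "),               # expand the number sign
    (r"\b5\b", "пять"),                # spell out one digit
]


def normalize(text):
    """Preprocess (merge, then replace), then run the cascade."""
    text = apply_merge(SENTENCE_END, "{S}", text)
    text = apply_replace(EXTRA_SPACES, " ", text)
    return run_cascade(CASCADE, text)
```

With these toy rules, `normalize("Это № 5.  Это тест.")` returns `"Это номер пять.{S} Это тест."`: the merge pass keeps the period and adds the `{S}` sentence delimiter, while every later pass overwrites what it matches. The order of rules in a cascade matters, because each rule works on the text already rewritten by the rules before it.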