
Normatex - Russian text normalization

This is a set of finite-state transducers (FSTs) for normalizing Russian text for speech synthesis, machine translation, and other natural language processing tasks.

The FSTs are developed using Unitex, a corpus processor.

To normalize a Russian text:

  1. Copy your text (e.g. example.txt) to the Corpus folder, open it in Unitex, and preprocess it with the following resources:
  • apply Graphs/Preprocessing/Sentence/SentenceUniver.grf in MERGE mode
  • apply Graphs/Preprocessing/Replace/replace.grf in REPLACE mode
  2. Apply lexical resources:
  3. Create a cascade (Text\Apply CasSys Cascade... menu, New) that sequentially applies the following FSTs to your text in REPLACE mode:
  • Graphs/numbers.fst2
  • Graphs/abbr/abbr_w.fst2
  • Graphs/abbr/acronyms_w.fst2
  • Graphs/Postprocessing/replace.fst2
  4. Launch the cascade of FSTs.
  5. The normalized text is in Corpus/example_csc/example_4_0.snt.
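
The output file name in the last step appears to combine the corpus file name with the number of graphs in the cascade (four here, hence `example_4_0.snt`). The Python sketch below is not part of Normatex. It assumes that this naming pattern, generalized from the single example above, holds for other file names and cascade lengths. The helper name `expected_output` is hypothetical.

```python
from pathlib import PurePosixPath

# The four cascade graphs listed in the steps above, in application order.
CASCADE = [
    "Graphs/numbers.fst2",
    "Graphs/abbr/abbr_w.fst2",
    "Graphs/abbr/acronyms_w.fst2",
    "Graphs/Postprocessing/replace.fst2",
]


def expected_output(corpus_file, cascade=CASCADE):
    """Guess where the normalized text ends up for a given corpus file.

    Assumption: the result is written to <stem>_csc/<stem>_<N>_0.snt next to
    the corpus file, where N is the number of graphs in the cascade. This is
    inferred from the README example only, not from Unitex documentation.
    """
    path = PurePosixPath(corpus_file)
    stem = path.stem  # "example" for "Corpus/example.txt"
    return str(path.parent / f"{stem}_csc" / f"{stem}_{len(cascade)}_0.snt")


if __name__ == "__main__":
    print(expected_output("Corpus/example.txt"))
```

For `Corpus/example.txt` and the four graphs above, this yields the path given in the last step. If you add or remove graphs from the cascade, check the `_csc` folder directly rather than relying on the guessed name.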