Normatex - Russian text normalization

This is a set of Finite-State Transducers (FSTs) for normalization of Russian texts for speech synthesis, machine translation and other natural language processing tasks.

The FSTs are developed using Unitex, a corpus processor.

To normalize a Russian text:

  1. Copy your text (e.g. example.txt) to the Corpus folder, open it in Unitex, and preprocess it with the following resources:
  • apply Graphs/Preprocessing/Sentence/SentenceUniver.grf in MERGE mode
  • apply Graphs/Preprocessing/Replace/replace.grf in REPLACE mode
  2. Apply lexical resources:
  3. Create a cascade (Text\Apply CasSys Cascade... menu, New) that applies the following FSTs to your text sequentially in REPLACE mode:
  • Graphs/numbers.fst2
  • Graphs/abbr/abbr_w.fst2
  • Graphs/abbr/acronyms_w.fst2
  • Graphs/Postprocessing/replace.fst2
  4. Launch the cascade of FSTs.
  5. The normalized text is in Corpus/example_csc/example_4_0.snt.
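
The output file name in the last step appears to combine the input file's base name with the number of FSTs in the cascade (four here). The sketch below is a small, hypothetical helper, not part of Normatex or Unitex, that derives that path for other input files. It assumes the pattern `<name>_csc/<name>_<number of FSTs>_0.snt` generalizes from the single example above. Check the result against what Unitex actually writes.

```python
from pathlib import PurePosixPath

# Cascade order as listed in the steps above (paths relative to the repository root).
CASCADE = [
    "Graphs/numbers.fst2",
    "Graphs/abbr/abbr_w.fst2",
    "Graphs/abbr/acronyms_w.fst2",
    "Graphs/Postprocessing/replace.fst2",
]


def expected_output_path(text_name, cascade=CASCADE, corpus_dir="Corpus"):
    """Guess where the normalized text lands once the whole cascade has run.

    Assumption: the naming pattern is inferred from the README's one example
    (example.txt with four FSTs gives example_csc/example_4_0.snt).
    """
    stem = PurePosixPath(text_name).stem  # "example.txt" -> "example"
    file_name = f"{stem}_{len(cascade)}_0.snt"
    return str(PurePosixPath(corpus_dir) / f"{stem}_csc" / file_name)


print(expected_output_path("example.txt"))
```

For `example.txt` and the four FSTs listed above, this reproduces the path given in the last step.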

Slides
