Implementing text normalization for Farsi(Persian) language.

Amir79Naziri/TextNormalization_Project

Text Normalizer

This project implements text normalization for the Farsi (Persian) language.

It supports the following normalization types:

  • numbers
  • dates
  • times
  • currency
  • measurements (physical quantities)
  • phone numbers and ID numbers
  • punctuation
  • miscellaneous abbreviations

Normalization is available for both text-to-speech and speech-to-text, selected by version: TTSv1 (default), TTSv2, or STT.
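To illustrate what number normalization for text-to-speech involves, here is a minimal sketch that expands integers from 0 to 999 into Persian words. It is not taken from this project's code: the names `number_to_words` and `normalize_numbers` are hypothetical, and the 0 to 999 range is an assumption made to keep the example short.

```python
import re

# Persian number words (illustrative tables, not the project's own data).
ONES = ["صفر", "یک", "دو", "سه", "چهار", "پنج", "شش", "هفت", "هشت", "نه"]
TEENS = ["ده", "یازده", "دوازده", "سیزده", "چهارده",
         "پانزده", "شانزده", "هفده", "هجده", "نوزده"]
TENS = {2: "بیست", 3: "سی", 4: "چهل", 5: "پنجاه",
        6: "شصت", 7: "هفتاد", 8: "هشتاد", 9: "نود"}
HUNDREDS = {1: "صد", 2: "دویست", 3: "سیصد", 4: "چهارصد", 5: "پانصد",
            6: "ششصد", 7: "هفتصد", 8: "هشتصد", 9: "نهصد"}


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999 in Persian."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0..999")
    if n == 0:
        return ONES[0]
    parts = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        parts.append(HUNDREDS[hundreds])
    if 10 <= rest <= 19:
        parts.append(TEENS[rest - 10])
    else:
        tens, ones = divmod(rest, 10)
        if tens:
            parts.append(TENS[tens])
        if ones:
            parts.append(ONES[ones])
    # Persian joins the components with " و " ("and").
    return " و ".join(parts)


def normalize_numbers(text: str) -> str:
    """Replace each digit run (ASCII or Persian digits) with its spoken form."""
    def replace(match: re.Match) -> str:
        value = int(match.group())  # int() also accepts Persian digits
        # Numbers outside the supported range are left unchanged.
        return number_to_words(value) if value <= 999 else match.group()
    return re.sub(r"\d+", replace, text)
```

For example, `normalize_numbers("۲۱ کتاب")` would rewrite the digits as words and leave the rest of the sentence untouched. A full normalizer would also need to handle larger numbers, decimals, and ordinals.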

Usage

python main.py [input file address] [output file address] [version] [type1, type2, ....]

Examples

Normalize all types for text-to-speech version 1:

python main.py inp.txt out.txt 

Normalize only times and dates for text-to-speech version 2:

python main.py inp.txt out.txt TTSv2 -t -d

  1. When one or more types are declared, the normalizer is limited to the declared types.
  2. TTS version 1 and TTS version 2 differ in how punctuation is normalized.
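The command-line behavior described above can be sketched with `argparse`. This is an assumption about how such an interface could be parsed, not the actual contents of `main.py`: the function name `parse_cli` and the type labels are hypothetical, and only the `-t` and `-d` flags are modeled because they are the only ones shown in the examples.

```python
import argparse

# Labels for the normalization types listed in this README (hypothetical names).
ALL_TYPES = ["number", "date", "time", "currency", "measurement",
             "phone_id", "punctuation", "abbreviation"]


def parse_cli(argv):
    """Parse: input output [version] [-t] [-d]; return (input, output, version, types)."""
    parser = argparse.ArgumentParser(description="Farsi text normalizer (sketch)")
    parser.add_argument("input", help="input file path")
    parser.add_argument("output", help="output file path")
    # The version is optional and defaults to TTSv1, as stated in the README.
    parser.add_argument("version", nargs="?", default="TTSv1",
                        choices=["TTSv1", "TTSv2", "STT"])
    # Only the two flags that appear in the README examples are modeled here.
    parser.add_argument("-t", dest="time", action="store_true",
                        help="normalize times")
    parser.add_argument("-d", dest="date", action="store_true",
                        help="normalize dates")
    args = parser.parse_args(argv)

    selected = []
    if args.time:
        selected.append("time")
    if args.date:
        selected.append("date")
    # With no type flag, every type is normalized; otherwise only the declared ones.
    types = selected or list(ALL_TYPES)
    return args.input, args.output, args.version, types
```

Under these assumptions, `parse_cli(["inp.txt", "out.txt"])` selects TTSv1 with all types, while `parse_cli(["inp.txt", "out.txt", "TTSv2", "-t", "-d"])` selects TTSv2 with only times and dates.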
