amasad/arabish

Arabish (beta)

Arabic transliteration in Python. Similar to Yamli.com, Google Ta3reeb, and Microsoft Maren.

Why

Because there isn't an open source transliteration project available, and it's not that hard!
I'm sure there are some corner cases that make it harder and harder to reach 100% accuracy, but it seems fairly easy to get to 80%.

Approach

  1. Start with a list of simple mappings, each from one or two English letters to a single Arabic letter.
  2. Extend the English keys in the mapping with vowel-suffixed variants, so that the Harakaat (short vowels) are simply ignored.
  3. Take an English word that phonetically represents an Arabic word.
  4. Construct the set of all possible Arabic words (valid or not) using a recursive search algorithm.
  5. Use word frequency to pick the word from that set that is most likely to occur.
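
The steps above can be sketched roughly as follows. This is an illustration and not the code from this repository: the tiny mapping, the made-up frequency counts, and the names `with_vowel_keys`, `candidates`, and `transliterate` are all assumptions chosen to keep the example short.

```python
# Step 1: a hypothetical, deliberately tiny mapping from one or two
# English letters to the Arabic letters they may stand for.
BASE_MAPPING = {
    "s": ["س", "ص"],
    "sh": ["ش"],
    "h": ["ه", "ح"],
    "l": ["ل"],
    "m": ["م"],
    "a": ["ا"],
}
VOWELS = "aeiou"

# Made-up counts standing in for a real frequency list.
FREQUENCIES = {"سلام": 120, "سلم": 40, "سالم": 15}


def with_vowel_keys(mapping):
    """Step 2: add keys such as 'sa' or 'li' that map to the same Arabic
    letters as 's' or 'l', so a trailing short vowel is ignored."""
    extended = dict(mapping)
    for key, letters in mapping.items():
        for vowel in VOWELS:
            extended.setdefault(key + vowel, letters)
    return extended


def candidates(word, mapping, max_key_len):
    """Step 4: recursively yield every Arabic spelling (valid or not)
    obtained by splitting `word` into consecutive mapping keys."""
    if not word:
        yield ""
        return
    for size in range(1, min(max_key_len, len(word)) + 1):
        for letter in mapping.get(word[:size], ()):
            for rest in candidates(word[size:], mapping, max_key_len):
                yield letter + rest


def transliterate(word, mapping, frequencies):
    """Step 5: return the most frequent known candidate, or None."""
    max_key_len = max(len(key) for key in mapping)
    options = set(candidates(word.lower(), mapping, max_key_len))
    known = [option for option in options if option in frequencies]
    if not known:
        return None
    return max(known, key=lambda option: frequencies[option])


MAPPING = with_vowel_keys(BASE_MAPPING)
print(transliterate("salam", MAPPING, FREQUENCIES))
```

The number of candidates grows quickly with word length, since every split point and every ambiguous letter multiplies the possibilities; in this sketch the frequency lookup is what filters out the spellings that are not real words.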

Current state

I'm very pleased, even surprised, with the initial results. With a better training corpus and some simple tweaks to the rules, we can get to at least 80% of the accuracy of Yamli or similar services. The current training corpus is a frequency list based on words from opensubtitles.org and is mostly classical Arabic.

See TODO.txt
