Skip to content

annabonazzi/yiddish

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

yiddish

Script to transliterate Yiddish text into Latin (English) alphabet. The script takes in text typed by the user and returns a transliterated version (Hebrew alphabet -> Latin alphabet).

A user-friendly web-based version of this script based on Trinket (https://trinket.io) is available at this website: http://tinyurl.com/yiddish-reader

yiddish_main.py does the input, transliteration, and output tasks.

yiddish_fix_issues.py does the most text-intensive part of the job, which is:

  • Editing the transliterated text
  • Fixing common issues deriving from transliteration mistakes
  • Completely changing Hebrew words commonly used in Yiddish which make no sense when transliterated directly (e.g. 'Sholem Aleykhem' would actually look like 'shlum elikhm' if transliterated directly). NB: This is a transliterator, not a translator.

Main problem to be addressed: the current code doesn't know how to deal with diacritic marks, i.e. the tiny dots/lines above or under Hebrew letters. It skips them entirely. This means that the program is unable to tell the difference between these letters:

  • A, O and silent A (אַ, אָ, א) are all considered A
  • F and P (פּ, פֿ) are both considered F
  • AY and EY (יי, ײַ) are both considered EY.

The temporary solution is a long list of conditions intoduced to deal with the most common mistakes / ambiguities (e.g. 'babe' [grandma] should actully be 'bobe', so it is automaticaly changed to 'bobe'). Although this process does catch most common situations, it's still full of gaps.

Next steps:

  • Try and fix this issue at the root by getting the script to recognize diacritics.
  • Introducing a larger dictionary of Hebrew words used in Yiddish which have their own peculiar transliteration

About

Yiddish transliteration tool

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages