epitran.exceptions.DatafileError: Header is ["Prth", "Phon"] instead of ["Orth", "Phon"]. #69

Closed
ItsSeaJay opened this issue Mar 10, 2021 · 3 comments

Comments

@ItsSeaJay

I'm trying to use epitran to obtain correct phonetic pronunciations of French words. I did eventually get it working with the fra-Latn preprocessor; however, its performance is lackluster. It gives me very literal transcriptions that never use the uvular "ʁ" or the "." syllable separator:

  • "acteur" ("actor") comes out as "atyr" (should be "ak.tœʁ")
  • "actrice" ("actress") comes out as "aktriz" (should be "ak.tʁis")
  • "chat" ("cat") comes out as "ʃa", which is correct, but at least once I got a trailing symbol, as in "ʃat"
  • "chien" ("dog") comes out as "ʃjâ" when it should be "ʃjɛ̃"

After these mixed results, I looked at the documentation and noticed a more phonetic mode, "fra-Latn-np". Whenever I try to use it to transliterate a word, I get the following error:

```
Traceback (most recent call last):
  File "main.py", line 6, in <module>
    epi = epitran.Epitran('fra-Latn-np')
  File "/home/callum/.local/share/virtualenvs/first625-xxpZk1TH/lib/python3.8/site-packages/epitran/_epitran.py", line 46, in __init__
    self.epi = SimpleEpitran(code, preproc, postproc, ligatures, rev, rev_preproc, rev_postproc, tones=tones)
  File "/home/callum/.local/share/virtualenvs/first625-xxpZk1TH/lib/python3.8/site-packages/epitran/simple.py", line 43, in __init__
    self.g2p = self._load_g2p_map(code, False)
  File "/home/callum/.local/share/virtualenvs/first625-xxpZk1TH/lib/python3.8/site-packages/epitran/simple.py", line 100, in _load_g2p_map
    raise DatafileError('Header is ["{}", "{}"] instead of ["Orth", "Phon"].'.format(orth, phon))
epitran.exceptions.DatafileError: Header is ["Prth", "Phon"] instead of ["Orth", "Phon"].
```

I'm not sure what causes it, but looking in the installed epitran directory I also found an undocumented "fra-Latn-p" preprocessor, which sometimes does better and sometimes worse. Could you please explain what is going on here?
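The error is raised by the header check in `_load_g2p_map` (simple.py, per the traceback): the first row of the mode's mapping CSV must read `Orth,Phon`, and the `fra-Latn-np` file evidently starts with the misspelled `Prth`. A minimal local workaround, assuming the mapping file is a plain UTF-8 CSV whose first line is the header, is to rewrite just that line. The helpers `read_header` and `fix_header` are hypothetical (not part of Epitran), and you would need to confirm where the CSV sits inside the installed package yourself.

```python
import csv
from pathlib import Path

# The header Epitran's error message says it expects.
EXPECTED_HEADER = ["Orth", "Phon"]


def read_header(path: Path) -> list:
    """Return the first CSV row of a mapping file (tolerates a UTF-8 BOM)."""
    with path.open(encoding="utf-8-sig", newline="") as f:
        return next(csv.reader(f), [])


def fix_header(path: Path) -> bool:
    """Replace a bad header row with Orth,Phon; return True if the file changed.

    Only the first line is rewritten; every mapping row is kept as-is,
    so no entries are requoted or reordered.
    """
    lines = path.read_text(encoding="utf-8-sig").splitlines(keepends=True)
    if not lines:
        return False
    header = next(csv.reader([lines[0]]))
    if header == EXPECTED_HEADER:
        return False
    lines[0] = ",".join(EXPECTED_HEADER) + "\n"
    path.write_text("".join(lines), encoding="utf-8")
    return True
```

Patching files inside site-packages is only a stopgap: a reinstall or upgrade overwrites it, so the durable fix is correcting the shipped CSV upstream.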

Here is my code:

```python
import sys
from google_trans_new import google_translator
import epitran

translator = google_translator()
epi = epitran.Epitran('fra-Latn-np')

# Translate the first system argument
#translated_text = translator.translate(sys.argv[1], lang_src='en', lang_tgt='fr')
# Get the IPA pronunciation
#ipa_symbols = epi.transliterate(translated_text)

#print(translated_text)
#print(ipa_symbols)
print(epi.transliterate(sys.argv[1]))
```
@dmort27
Owner

dmort27 commented Mar 11, 2021

As noted in the README, support for French is not very good. This is partly due to ambiguities in French orthography and partly due to insufficient work devoted to the modules. The use of /r/ rather than /ʁ/ is intentional, however. One early use of Epitran was producing representations that were relatively close to etymologically related forms in other languages. Since the French rhotic was historically /r/ (and still is in some dialects), treating it this way reduced the distance between French and those languages.

If you can provide me with more test cases, I can update 'fra-Latn' so it passes them.
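Test cases can be as simple as a list of spelling and expected-IPA pairs checked against any transliteration function. The sketch below is hypothetical (not Epitran's own test format). It reuses the expected forms quoted at the top of this issue and, as an assumption, ignores "." syllable boundaries when comparing, since the outputs reported here do not mark them.

```python
import unicodedata
from typing import Callable, Iterable, List, Tuple

# (orthography, expected IPA) pairs taken from the examples in this issue.
CASES = [
    ("acteur", "ak.tœʁ"),
    ("actrice", "ak.tʁis"),
    ("chat", "ʃa"),
    ("chien", "ʃjɛ̃"),
]


def normalize(ipa: str) -> str:
    """Compose to NFC and drop '.' syllable boundaries before comparing."""
    return unicodedata.normalize("NFC", ipa).replace(".", "")


def failures(transliterate: Callable[[str], str],
             cases: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (word, expected, got) for every case whose output differs."""
    bad = []
    for word, expected in cases:
        got = transliterate(word)
        if normalize(got) != normalize(expected):
            bad.append((word, expected, got))
    return bad
```

With Epitran installed, `failures(epitran.Epitran('fra-Latn').transliterate, CASES)` would list the words whose output still differs from the expected form.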

@ItsSeaJay
Author

I should have read further into the README before trying this the other day. I'm currently writing test cases for the fra-Latn preprocessor; they should be available at https://github.com/ItsSeaJay/epitran-fr-Latn-testcases
I should also mention that, for my purposes, I need the uvular /ʁ/. The whole reason I'm using this module is to automagically produce language flashcards for Anki, and for that I need accurate native pronunciations. I notice that the README mentions a downloadable bilingual or monolingual dictionary for Chinese. Is there anything like that for French? Cheers.
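Because /r/ is a deliberate choice in the French map, a user who needs /ʁ/ can post-process the output string. A minimal sketch, assuming the French output uses "r" only for the rhotic; `to_uvular` is a hypothetical helper applied to whatever `transliterate` returns, not an Epitran option, and it does nothing about the other mismatches listed above (vowels, nasals, syllable dots).

```python
# Hypothetical post-processing step, not an Epitran feature:
# map the alveolar trill "r" to the uvular fricative "ʁ".
UVULAR = str.maketrans({"r": "ʁ"})


def to_uvular(ipa: str) -> str:
    """Replace every 'r' in an IPA string with 'ʁ'."""
    return ipa.translate(UVULAR)
```

For example, `to_uvular(epi.transliterate(word))` applies the substitution after Epitran has run.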

@ftyers
Contributor

ftyers commented Mar 24, 2021

@ItsSeaJay there is https://github.com/kylebgorman/wikipron
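WikiPron distributes pronunciation dictionaries mined from Wiktionary as TSV files. Assuming each line holds a word, a tab, and space-separated IPA segments (check this against the file you actually download), a dictionary lookup with a rule-based fallback might look like the sketch below; `load_wikipron` and `pronounce` are hypothetical helpers.

```python
from typing import Callable, Dict, TextIO


def load_wikipron(f: TextIO) -> Dict[str, str]:
    """Parse word<TAB>space-separated phones into {word: joined IPA}.

    Keeps the first pronunciation listed for each word and removes the
    spaces between segments, e.g. "ʃ a" becomes "ʃa".
    """
    lexicon: Dict[str, str] = {}
    for line in f:
        line = line.rstrip("\r\n")
        if not line:
            continue
        word, phones = line.split("\t")[:2]
        lexicon.setdefault(word, phones.replace(" ", ""))
    return lexicon


def pronounce(word: str, lexicon: Dict[str, str],
              fallback: Callable[[str], str]) -> str:
    """Prefer the dictionary entry; otherwise fall back to rule-based G2P."""
    return lexicon.get(word.lower()) or fallback(word)
```

Passing `epi.transliterate` as `fallback` means Epitran is consulted only for words missing from the dictionary. Lowercasing the lookup key is an assumption about how the dictionary's headwords are cased.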

dmort27 closed this as completed May 9, 2021

3 participants