epitran.exceptions.DatafileError: Header is ["Prth", "Phon"] instead of ["Orth", "Phon"]. #69

ItsSeaJay · 2021-03-10T12:50:22Z

I'm trying to use epitran to obtain the correct phonetic pronunciations of French words. I did get it working eventually through the use of the fra-Latn preprocessor, but its performance is lackluster. It seems to give me very literal transcriptions that never use the uvular "ʁ" sound or the syllable separator ".":

"acteur" ("actor") comes out as "atyr" (should be "ak.tœʁ")
"actrice" ("actress") comes out as "aktriz" (should be "ak.tʁis")
"chat" ("cat") comes out as "ʃa", which is correct, but at least one time when I tried it I got trailing symbols, like "ʃat"
"chien" ("dog") comes out as "ʃjâ" when it should be "ʃjɛ̃"

So after having mixed performance with that, I looked at the documentation and noticed there was a more phonetic translator "fra-Latn-np". Upon attempting to use this to translate any given word, I get the following error:

Traceback (most recent call last):
  File "main.py", line 6, in <module>
    epi = epitran.Epitran('fra-Latn-np')
  File "/home/callum/.local/share/virtualenvs/first625-xxpZk1TH/lib/python3.8/site-packages/epitran/_epitran.py", line 46, in __init__
    self.epi = SimpleEpitran(code, preproc, postproc, ligatures, rev, rev_preproc, rev_postproc, tones=tones)
  File "/home/callum/.local/share/virtualenvs/first625-xxpZk1TH/lib/python3.8/site-packages/epitran/simple.py", line 43, in __init__
    self.g2p = self._load_g2p_map(code, False)
  File "/home/callum/.local/share/virtualenvs/first625-xxpZk1TH/lib/python3.8/site-packages/epitran/simple.py", line 100, in _load_g2p_map
    raise DatafileError('Header is ["{}", "{}"] instead of ["Orth", "Phon"].'.format(orth, phon))
epitran.exceptions.DatafileError: Header is ["Prth", "Phon"] instead of ["Orth", "Phon"].

I'm not sure what causes it, but looking in that directory there is also an undocumented "fra-Latn-p" preprocessor, which does better on some words and worse on others. Could you please explain what is going on here?
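
For anyone hitting the same error: the traceback shows `_load_g2p_map` raising when the first row of the map file is not ["Orth", "Phon"], so a header beginning with "Prth" (presumably a typo in the fra-Latn-np data file) is enough to trigger it. Below is a minimal sketch of an equivalent check using only the standard library. The function name `check_map_header` and the sample rows are hypothetical; this is not epitran's own code.

```python
import csv
import io

EXPECTED_HEADER = ["Orth", "Phon"]


def check_map_header(lines):
    """Return the header row if it starts with Orth, Phon; otherwise raise ValueError.

    `lines` is any iterable of CSV lines, e.g. an open file or io.StringIO.
    """
    header = next(csv.reader(lines), [])
    # Pad so that a short or empty header still yields a readable message.
    orth, phon = (header + ["", ""])[:2]
    if [orth, phon] != EXPECTED_HEADER:
        raise ValueError(
            'Header is ["{}", "{}"] instead of ["Orth", "Phon"].'.format(orth, phon)
        )
    return header


# Hypothetical two-line map files: one with a valid header, one with the typo.
good = io.StringIO("Orth,Phon\nch,ʃ\n")
bad = io.StringIO("Prth,Phon\nch,ʃ\n")

print(check_map_header(good))
try:
    check_map_header(bad)
except ValueError as err:
    print(err)
```

Running a check like this over the installed map files would show which ones carry a malformed header, without constructing an `Epitran` object for each code.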

Here is my code:

import sys
from google_trans_new import google_translator
import epitran

translator = google_translator()
epi = epitran.Epitran('fra-Latn-np')

# Translate the first system argument
#translated_text = translator.translate(sys.argv[1], lang_src='en', lang_tgt='fr')
# Get the IPA pronunciation
#ipa_symbols = epi.transliterate(translated_text)

#print(translated_text)
#print(ipa_symbols)
print(epi.transliterate(sys.argv[1]))

dmort27 · 2021-03-11T14:57:14Z

As is noted in the README, support for French is not very good. This is partly due to ambiguities in the French orthography and partly due to insufficient work being devoted to the modules. The use of /r/ rather than /ʁ/ is intentional, however. One use of Epitran, early in its history, was producing representations that were relatively close to etymologically related forms in other languages. Since French was historically /r/ (and still is in some dialects), the distance between French and other languages was reduced by treating it this way.

If you can provide me with more test cases, I can update 'fra-Latn' so it passes them.
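
Test cases for this purpose can be as simple as (word, expected IPA) pairs. Here is a minimal sketch of a checker, assuming the pairs quoted earlier in this thread. The names `find_failures`, `CASES`, and `REPORTED` are hypothetical, and `REPORTED.get` stands in for `epi.transliterate` so the sketch runs without epitran installed. Note that the expected forms below use /ʁ/ and "." separators, which 'fra-Latn' does not aim to produce.

```python
import unicodedata


def find_failures(transliterate, cases):
    """Return (word, expected, actual) for every case where the output differs.

    Both sides are NFC-normalized before comparison so that precomposed and
    combining forms of the same symbol (e.g. a nasalized vowel) compare equal.
    """
    failures = []
    for word, expected in cases:
        actual = transliterate(word)
        same = (
            actual is not None
            and unicodedata.normalize("NFC", actual)
            == unicodedata.normalize("NFC", expected)
        )
        if not same:
            failures.append((word, expected, actual))
    return failures


# Expected pronunciations as given in the original report.
CASES = [
    ("acteur", "ak.tœʁ"),
    ("actrice", "ak.tʁis"),
    ("chat", "ʃa"),
]

# Outputs reported in this thread, used as a stand-in for a real transliterator.
REPORTED = {"acteur": "atyr", "actrice": "aktriz", "chat": "ʃa"}

for word, expected, actual in find_failures(REPORTED.get, CASES):
    print("{}: expected {}, got {}".format(word, expected, actual))
```

With epitran installed, passing `epitran.Epitran('fra-Latn').transliterate` in place of `REPORTED.get` would run the same cases against the real module.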

ItsSeaJay · 2021-03-12T11:41:39Z

I should have read the README file further before attempting what I was attempting the other day. I'm currently working on a test case module for the fra-Latn preprocessor that should be available at https://github.com/ItsSeaJay/epitran-fr-Latn-testcases
I should also mention that for my purposes I need to use uvular /ʁ/. The whole reason I'm working with this module is so I can automagically produce language flashcards for Anki, and for that I need correct native pronunciations. I notice that in the README file you mention a downloadable bilingual or monolingual dictionary for Chinese. Is there anything like that for the French language? Cheers.
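
Since the /r/ in the output is a deliberate choice, one workaround is to post-process the transcription. A minimal sketch, assuming every "r" in the output stands for the French rhotic; the helper name `to_uvular` is hypothetical and is not part of epitran:

```python
def to_uvular(ipa):
    """Replace the alveolar trill symbol with the uvular fricative symbol."""
    # Assumption: no "r" in the transcription represents anything other
    # than the French rhotic, so a blanket replacement is safe.
    return ipa.replace("r", "ʁ")


# "aktriz" is the output reported for "actrice" earlier in this thread.
print(to_uvular("aktriz"))
```

This only swaps the rhotic symbol. It does not fix the other discrepancies in the original report, such as the final consonant of "actrice".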

ftyers · 2021-03-24T20:36:50Z

@ItsSeaJay there is https://github.com/kylebgorman/wikipron
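
A pronunciation list of that kind can be used as a plain lookup table. A minimal sketch, assuming a TSV layout of one word and one pronunciation per line, with the pronunciation's segments separated by spaces; check the project's own documentation for the actual format. The function name `load_pron_tsv` and the sample lines are hypothetical, written here for illustration and not copied from any real data file.

```python
from collections import defaultdict


def load_pron_tsv(lines):
    """Map each word to a list of pronunciations, with segment spaces removed.

    `lines` is any iterable of "word<TAB>pronunciation" strings, such as an
    open file. A word may appear on several lines with different pronunciations.
    """
    prons = defaultdict(list)
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, pron = parts[0], parts[1]
        prons[word].append(pron.replace(" ", ""))
    return dict(prons)


# Illustrative sample in the assumed layout.
sample = ["chat\tʃ a\n", "acteur\ta k t œ ʁ\n"]
table = load_pron_tsv(sample)
print(table.get("acteur"))
```

For flashcards, a dictionary lookup like this could be tried first, falling back to rule-based transliteration for words the list does not cover.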

dmort27 closed this as completed May 9, 2021
