Incorrect grapheme-phoneme alignment in word_to_tuple response #44

Closed
mashabelyi opened this issue Mar 14, 2020 · 2 comments

@mashabelyi

Thank you for this great tool!
I was hoping to use Epitran to extract the frequencies of grapheme-phoneme alignments in different languages, but I am running into issues when using the word_to_tuples and word_to_segs methods.

Here is the output of epi.word_to_tuples for the English word "tough":

('L', 0, 't', 't', [('t', <map object at 0x113817c50>)])
('L', 0, 'o', 'ʌ', [('ʌ', <map object at 0x113817250>)])
('L', 0, 'u', 'f', [('f', <map object at 0x1120a06d0>)])
('L', 0, 'g', '', [(-1, [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])])
('L', 0, 'h', '', [(-1, [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])])

Here is the output for "choice":

('L', 0, 'c', 't͡ʃ', [('t͡ʃ', <map object at 0x11380cad0>)])
('L', 0, 'h', 'o', [('o', <map object at 0x11380c5d0>)])
('L', 0, 'o', 'j', [('j', <map object at 0x11380cb10>)])
('L', 0, 'i', 's', [('s', <map object at 0x1120a0fd0>)])
('L', 0, 'c', '', [(-1, [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])])
('L', 0, 'e', '', [(-1, [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])])

I'd expect the phoneme /f/ in "tough" to correspond to either g or h, and the phoneme /s/ in "choice" to correspond to the second c. However, that's not the case: /f/ is paired with u, and /s/ is paired with i. Is this expected behavior or a bug?
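
For illustration, both outputs above match what a purely positional pairing would produce: the i-th grapheme is paired with the i-th phoneme, and leftover graphemes get an empty string. The sketch below only reproduces that pattern; it is an assumption based on the two examples in this thread, not Epitran's actual code, and `positional_pairs` is a hypothetical helper name.

```python
from itertools import zip_longest

def positional_pairs(graphemes, phonemes):
    """Pair the i-th grapheme with the i-th phoneme, padding the shorter side with ''.

    Hypothetical helper: it mimics the pattern seen in the output above and
    knows nothing about which letters actually spell which sounds.
    """
    return list(zip_longest(graphemes, phonemes, fillvalue=""))

# Phoneme lists are copied from the output quoted in this issue.
print(positional_pairs("tough", ["t", "ʌ", "f"]))
print(positional_pairs("choice", ["t͡ʃ", "o", "j", "s"]))
```

Under this reading, /f/ ends up next to u simply because both sit in third position, which is why the pairs cannot be used as alignment counts.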

@dmort27
Owner

dmort27 commented Apr 3, 2020

Sorry for the late response. The answer is that Epitran was not made to do what you want to do (extract phoneme-grapheme alignments). The behavior you see is expected: these methods were added with a very specific application in mind, which did not require accurate alignments between the two representations, only some alignment. Perhaps this code should be removed. In any case, Epitran, because of its architecture, will only get you part of the way to phoneme-grapheme alignments (it gives you the phonemic representations). You must do the rest with an aligner.
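
To make "do the rest with an aligner" concrete, here is a minimal sketch of one possible approach: a Needleman-Wunsch style global alignment between the spelling and a phoneme list. Everything in it is an assumption for illustration. The `COMPATIBLE` table is hand-written for this one word, the scores are arbitrary, and `align` is a hypothetical helper rather than part of Epitran. A practical aligner would learn its scores from data and would support many-to-one pairs such as gh to /f/.

```python
# Hypothetical, hand-written grapheme/phoneme compatibility pairs.
# A real aligner would estimate these from a pronunciation dictionary.
COMPATIBLE = {("t", "t"), ("o", "ʌ"), ("g", "f"), ("h", "f")}

def score(grapheme, phoneme):
    """Reward plausible pairs, penalize implausible ones (arbitrary values)."""
    return 2 if (grapheme, phoneme) in COMPATIBLE else -2

def align(graphemes, phonemes, gap=-1):
    """One-to-one global alignment; '' marks an unpaired grapheme or phoneme."""
    n, m = len(graphemes), len(phonemes)
    # best[i][j] = best score for aligning graphemes[:i] with phonemes[:j]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = i * gap
    for j in range(1, m + 1):
        best[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + score(graphemes[i - 1], phonemes[j - 1]),
                best[i - 1][j] + gap,  # grapheme left unpaired (silent letter)
                best[i][j - 1] + gap,  # phoneme with no grapheme
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and best[i][j] == (
            best[i - 1][j - 1] + score(graphemes[i - 1], phonemes[j - 1])
        ):
            pairs.append((graphemes[i - 1], phonemes[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and best[i][j] == best[i - 1][j] + gap:
            pairs.append((graphemes[i - 1], ""))
            i -= 1
        else:
            pairs.append(("", phonemes[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs

# Phoneme list copied from the word_to_tuples output quoted in this issue.
print(align(list("tough"), ["t", "ʌ", "f"]))
```

With this toy table, /f/ is attached to g or h rather than to u, and u is left unpaired. Because the alignment is strictly one-to-one, it still cannot express that gh jointly spells /f/.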

@dmort27 dmort27 closed this as completed Apr 3, 2020
@mashabelyi
Author

Got it, thanks for your response.
