Function xsampa_list() in _epitran.py deletes things a lot #48

Open
m-wiesner opened this issue May 13, 2020 · 2 comments

Comments

@m-wiesner

For instance, in Cebuano:

felix --> [e, l, i]
x --> []

In Swedish:

och --> []

I fixed this (I think) by simply replacing the commented-out line below with the uncommented one. Maybe this is horribly wrong, but it seems to work now.

# ipa_segs = self.ft.ipa_segs(self.epi.strict_trans(word, normpunc, ligaturize))
ipa_segs = self.ft.segs_safe(self.epi.transliterate(word, normpunc, ligaturize))

@dmort27
Owner

dmort27 commented May 14, 2020

The deletion is by design. The applications for which this method was originally designed required that only segments converted from orthography to IPA be present in the X-SAMPA output. The Epitran.strict_trans method does that. Epitran.transliterate, by contrast, allows every character that cannot be mapped to IPA to "pass through" to the output. In noisy data this can produce unexpected results: many of the output segments will not be valid IPA and so cannot be converted to X-SAMPA.
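
To make the difference concrete, here is a minimal sketch of the two behaviors. It does not call Epitran or PanPhon; the tables TOY_G2P and TOY_IPA_TO_XSAMPA and all three function names are hypothetical, and the toy map deliberately omits "f" and "x" so that it mirrors the Cebuano output reported above.

```python
# Hypothetical tables for illustration only; not Epitran's or PanPhon's data.
TOY_G2P = {"e": "e", "l": "l", "i": "i"}            # orthography -> IPA
TOY_IPA_TO_XSAMPA = {"e": "e", "l": "l", "i": "i"}  # IPA -> X-SAMPA


def strict_segments(word):
    """Strict behavior: characters with no mapping are dropped."""
    return [TOY_G2P[ch] for ch in word if ch in TOY_G2P]


def passthrough_segments(word):
    """Pass-through behavior: characters with no mapping are copied as-is."""
    return [TOY_G2P.get(ch, ch) for ch in word]


def unconvertible(segments):
    """Segments that the toy IPA -> X-SAMPA table cannot convert."""
    return [seg for seg in segments if seg not in TOY_IPA_TO_XSAMPA]


noisy_word = "felix#1"
strict = strict_segments(noisy_word)
loose = passthrough_segments(noisy_word)
leftover = unconvertible(loose)

print(strict)    # only the mapped segments survive
print(loose)     # every input character survives
print(leftover)  # the part of the pass-through output with no X-SAMPA form
```

With the strict variant, everything that reaches the X-SAMPA step is convertible; with the pass-through variant, the caller has to decide what to do with the leftover characters.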

(I'm confused by the Swedish example, though. This appears to be due to errors in the mapping file.)

If you want something like this, the best solution is to add another method rather than change the existing one, which already does what we want it to do. Submit a pull request and I'll add this.
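
As a rough sketch of what "add another method" could look like, the toy class below keeps the strict method untouched and adds a separate pass-through variant next to it. ToyG2P, segment_list, and segment_list_passthrough are hypothetical names chosen for illustration; they are not Epitran's actual class or method names, and the mapping is made up.

```python
class ToyG2P:
    """Hypothetical stand-in for a transliterator; not Epitran's real class."""

    def __init__(self, mapping):
        # mapping: dict from orthographic character to output segment
        self.mapping = mapping

    def segment_list(self, word):
        """Existing behavior, left unchanged: drop unmapped characters."""
        return [self.mapping[ch] for ch in word if ch in self.mapping]

    def segment_list_passthrough(self, word):
        """New, separate method: keep unmapped characters as they are."""
        return [self.mapping.get(ch, ch) for ch in word]


g2p = ToyG2P({"e": "e", "l": "l", "i": "i"})
strict_result = g2p.segment_list("felix")
passthrough_result = g2p.segment_list_passthrough("felix")
```

Existing callers that rely on the strict output are unaffected, and anyone who wants the lossless output opts in by calling the new method.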

@m-wiesner
Author

Thanks for the answer. I thought it might be something like that at first, but the Swedish example also seemed strange to me.
