The deletion is by design. The applications for which this method was originally designed required that only segments converted from orthography to IPA be present in the X-SAMPA output. The Epitran.strict_trans method does that. Epitran.transliterate, by contrast, allows every character that cannot be mapped to IPA to "pass through" to the output. In noisy data this can produce unexpected results: many of the output segments will not be valid IPA and so cannot be converted to X-SAMPA.
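To make the difference concrete, here is a minimal toy sketch of the two behaviours. It does not call Epitran; `TOY_MAP`, `strict_sketch`, and `passthrough_sketch` are hypothetical names, and the mapping is an invented identity mapping chosen only to mirror the `felix` output reported in this issue, not Epitran's real Cebuano data.

```python
# Toy orthography-to-IPA mapping. This is an assumption for illustration,
# NOT the contents of any real Epitran mapping file.
TOY_MAP = {"e": "e", "l": "l", "i": "i"}


def strict_sketch(word, mapping=TOY_MAP):
    """Keep only characters that have a mapping; silently drop the rest."""
    return "".join(mapping[ch] for ch in word if ch in mapping)


def passthrough_sketch(word, mapping=TOY_MAP):
    """Map what can be mapped; copy unmapped characters through unchanged."""
    return "".join(mapping.get(ch, ch) for ch in word)


print(strict_sketch("felix"))       # eli
print(passthrough_sketch("felix"))  # felix
```

In the strict variant the unmapped `f` and `x` disappear, which is the deletion described above. In the pass-through variant they survive, but nothing guarantees that they are valid IPA.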
(I'm confused by the Swedish example, though. This appears to be due to errors in the mapping file.)
If you want something like this, the best solution is to add another method rather than change the existing one, which already does what we want it to do. Submit a pull request and I'll add this.
For instance, in Cebuano:
felix --> [e, l, i]
x --> []
In Swedish:
och --> []
I fixed this (I think) by simply replacing the commented-out line below with the uncommented one. Maybe this is horribly wrong, but it seems to work now.
# ipa_segs = self.ft.ipa_segs(self.epi.strict_trans(word, normpunc, ligaturize))
ipa_segs = self.ft.segs_safe(self.epi.transliterate(word, normpunc, ligaturize))
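If the pass-through behaviour were kept as a separate method instead of replacing the existing line, it might look roughly like the sketch below. The function names `strict_ipa_segs` and `passthrough_ipa_segs` are hypothetical, and the sketch assumes only what the snippet above shows: an `epi` object exposing `strict_trans` and `transliterate`, and an `ft` object exposing `ipa_segs` and `segs_safe`, each called with the same arguments as in the snippet.

```python
def strict_ipa_segs(epi, ft, word, normpunc=False, ligaturize=False):
    """Existing behaviour: unmapped characters are dropped before segmentation."""
    return ft.ipa_segs(epi.strict_trans(word, normpunc, ligaturize))


def passthrough_ipa_segs(epi, ft, word, normpunc=False, ligaturize=False):
    """Proposed alternative: unmapped characters pass through to the output.

    Callers should expect that some returned segments may not be valid IPA
    and therefore may not convert to X-SAMPA.
    """
    return ft.segs_safe(epi.transliterate(word, normpunc, ligaturize))
```

Keeping both functions side by side leaves the strict path untouched for applications that rely on it, while letting callers with noisy input opt in to the pass-through variant.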