Function xsampa_list() in _epitran.py deletes things a lot #48

m-wiesner · 2020-05-13T22:40:05Z

For instance, in Cebuano:

felix --> [e, l, i]
x --> []

In Swedish:

och --> []

I think I fixed this by simply replacing the commented-out line below with the uncommented one. Maybe this is horribly wrong, but it seems to work now.

# ipa_segs = self.ft.ipa_segs(self.epi.strict_trans(word, normpunc, ligaturize))
ipa_segs = self.ft.segs_safe(self.epi.transliterate(word, normpunc, ligaturize))

dmort27 · 2020-05-14T01:58:31Z

The deletion is by design. The applications for which this method was originally designed required that only segments converted from orthography to IPA be present in the X-SAMPA output. The Epitran.strict_trans method does that. Epitran.transliterate, by contrast, allows every character that cannot be mapped to IPA to "pass through" to the output. In noisy data this can produce some unexpected results. For example, many of the output segments will not be valid IPA and cannot be converted to X-SAMPA.
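To make the difference concrete, here is a minimal, self-contained sketch of the two behaviours. It does not call Epitran at all: the mapping table and both function names are made up for illustration, and the real methods work from rule-based mapping files rather than one character at a time. It assumes, purely for the example, that "f" and "x" have no mapping.

```python
# Toy grapheme-to-IPA table. Hypothetical: NOT Epitran's actual Cebuano mapping.
TOY_G2P = {"e": "e", "l": "l", "i": "i"}


def strict_sketch(word, table=TOY_G2P):
    """Keep only characters that have a mapping (the strict behaviour)."""
    return [table[ch] for ch in word if ch in table]


def pass_through_sketch(word, table=TOY_G2P):
    """Map what can be mapped; let everything else through unchanged."""
    return [table.get(ch, ch) for ch in word]


print(strict_sketch("felix"))        # unmapped characters are dropped
print(strict_sketch("x"))            # nothing is mapped, so the list is empty
print(pass_through_sketch("felix"))  # unmapped characters survive as-is
```

Under that assumption, the strict variant reproduces the kind of output reported above for "felix" and "x", while the pass-through variant keeps characters that may not be valid IPA.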

(I'm confused by the Swedish example, though. This appears to be due to errors in the mapping file.)

If you want something like this, the best solution is to add another method rather than change the existing one, which already does what we want it to do. Submit a pull request and I'll add this.
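A rough sketch of what "add another method" could look like, using a toy stand-in class rather than the real Epitran class. The class name, the method name xsampa_list_passthrough, and both lookup tables are hypothetical; an actual pull request would build on the library's existing transliterate and X-SAMPA conversion code.

```python
class ToyTransliterator:
    """Hypothetical stand-in for the Epitran class; all tables are toy data."""

    G2P = {"e": "e", "l": "l", "i": "i", "c": "ʃ"}            # orthography -> IPA
    IPA_TO_XSAMPA = {"e": "e", "l": "l", "i": "i", "ʃ": "S"}  # IPA -> X-SAMPA

    def xsampa_list(self, word):
        """Existing behaviour, left untouched: unmapped characters are dropped."""
        return [self.IPA_TO_XSAMPA[self.G2P[ch]] for ch in word if ch in self.G2P]

    def xsampa_list_passthrough(self, word):
        """New, separate method: unmapped characters pass through unchanged."""
        out = []
        for ch in word:
            ipa = self.G2P.get(ch, ch)
            # Segments with no X-SAMPA equivalent are emitted as they are.
            out.append(self.IPA_TO_XSAMPA.get(ipa, ipa))
        return out


t = ToyTransliterator()
print(t.xsampa_list("felix"))
print(t.xsampa_list_passthrough("felix"))
```

Keeping the two methods side by side means callers that rely on the strict output are unaffected, and the pass-through behaviour is something a caller opts into.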

m-wiesner · 2020-05-14T02:05:11Z

Thanks for the answer. I thought it might be something like that at first, but the Swedish example also seemed strange to me.

ruohoruotsi mentioned this issue Nov 30, 2021

Best way to get a delimited transliteration with XSampa #99

Closed
