German transliteration issues #64
Comments
These are clearly bugs, bugs that should have been caught by my tests. Let me look into this.
These are both due, to some degree, to the same problem (a rule introduced by a PR I should have vetted more carefully). I think I have fixed it, but I need to do more testing.
I have uploaded a new version of Epitran to PyPI that fixes the bugs you mentioned. However, I believe there are other bugs in the German modules (dealing, for example, with vowel length). If you are willing to look into this, I will try to fix them.
I just checked a decent number of examples, and it looks like the /s/ is fixed in the environments I mentioned above. I also went through some examples involving vowel length: it's OK when there's an /h/ in the orthography, but there is a bug with the letter /ß/. Here are a couple of examples (epi2 was instantiated with 'deu-Latn-nar'):
In both pairs, the second word should have a long vowel. Here are a couple of examples in the other direction:
In the first two, I'd expect the final vowel to be short, and in the last one I'd expect the first vowel to be short. I'd also expect the /s/ to be transcribed as [z] in the second example. Thanks for all your help so far. Let me know if you need more examples and I can go digging!
A couple more examples that might be helpful:
In these examples, the last vowel is the one that should be long, and the others short.
In these examples, all the initial vowels should be short, as they are followed by a consonant cluster. This follows the same rule that vowels before doubled consonant letters are always short, except that German orthography doesn't double /z/, /k/, /ch/ or /pf/: it writes /tz/ and /ck/ instead of doubled /z/ and /k/, and leaves /ch/ and /pf/ undoubled.
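The orthographic cues mentioned so far in this thread (a following /h/ or /ß/ for long vowels; doubled consonant letters and /tz/, /ck/, /ch/, /pf/ for short ones) can be collected into a small lookup that flags the vowel at a given position. This is a hypothetical helper, not Epitran's German rule set: `vowel_length_cue` and its return values are made up for illustration, and returning `None` leaves the decision to other rules.

```python
from typing import Optional

# Hypothetical heuristic encoding the orthographic cues discussed in this thread.
# It is NOT Epitran's rule set, and real German spelling has many exceptions.
VOWEL_LETTERS = set("aeiouäöüy")
# Sequences the thread treats like doubled consonants. Words such as 'Buch'
# (long /u/ before 'ch') show that 'ch' in particular is not a reliable cue.
SHORT_CUES = ("tz", "ck", "ch", "pf")


def vowel_length_cue(word: str, i: int) -> Optional[str]:
    """Return 'long', 'short', or None (no cue) for the vowel letter at index i."""
    w = word.lower()
    if w[i] not in VOWEL_LETTERS:
        raise ValueError(f"{word[i]!r} at index {i} is not a vowel letter")
    rest = w[i + 1:]
    # A following 'h' or 'ß' marks a long vowel (e.g. 'sehr', 'Straße').
    if rest.startswith(("h", "ß")):
        return "long"
    # 'tz', 'ck', 'ch', 'pf' stand in for doubled consonants.
    if rest.startswith(SHORT_CUES):
        return "short"
    # A doubled consonant letter marks a short vowel (e.g. 'Busse').
    if len(rest) >= 2 and rest[0] == rest[1] and rest[0] not in VOWEL_LETTERS:
        return "short"
    return None  # no orthographic cue; other rules must decide


for word, i in [("Busse", 1), ("Straße", 3), ("Katze", 1), ("Stock", 2), ("Mond", 1)]:
    print(word, vowel_length_cue(word, i))
```

Words like 'Mond' fall through to `None`, which is exactly the kind of case the later comments try to handle with stress and syllable-structure rules.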
Thank you, this is helpful. Is vowel length in German better described in terms of lengthening or of shortening?
The most helpful way of sharing this information would be in terms of tests, for example:

Busse → busə

Any more examples you could provide would be good, as well as rules that describe why vowels are long or short in a particular context. Thanks!
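Test cases in the "spelling → expected" style requested above can be kept machine-checkable as a small table plus a comparison loop. This is a hypothetical harness, not part of Epitran's own test suite; `CASES`, `run_cases` and `report` are illustrative names, and the only case listed is the one quoted in the comment. With Epitran installed, one would presumably pass something like `epitran.Epitran('deu-Latn-nar').transliterate` as the callable.

```python
from typing import Callable, List, Tuple

# (spelling, expected transcription) pairs in the "Busse → busə" style.
# Only the case quoted in the thread is listed; more would be added the same way.
CASES: List[Tuple[str, str]] = [
    ("Busse", "busə"),
]


def run_cases(
    transliterate: Callable[[str], str], cases: List[Tuple[str, str]]
) -> List[Tuple[str, str, str]]:
    """Return (word, expected, got) for every case whose output differs."""
    failures = []
    for word, expected in cases:
        got = transliterate(word)
        if got != expected:
            failures.append((word, expected, got))
    return failures


def report(failures: List[Tuple[str, str, str]]) -> str:
    """Format failures one per line as 'word → expected (got actual)'."""
    return "\n".join(f"{w} → {exp} (got {got})" for w, exp, got in failures)


def stub(word: str) -> str:
    # Stand-in transliterator for demonstration only.
    return "bʊsə"


print(report(run_cases(stub, CASES)))
```

Keeping the expectations as plain data means contributors can add examples (and a short note on the rule they illustrate) without touching the checking code.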
The literature more often talks about a tense/lax distinction, which is conflated with the long/short distinction because short tense vowels occur so infrequently. You'd then have tense (long) vowels as the default, with a 'laxing' (shortening) process triggered by various orthographic contexts.
You're right about 'so': it should be long. My mistake there. But the /o/ should definitely be short in 'also'. This makes generalizing the rule a little harder; the more I think about it, the more exceptions there seem to be. My gut feeling is that since the stress falls on the /a/, the /o/ is not 'allowed' to be long. The same rule could then apply to kreativ, sozial and Lokomotive, where the stress falls on a later syllable. This seems to align with what this document is saying. Now that I think about it, many exceptions to the rules could be explained by how frequently a word occurs in everyday speech, but I guess that kind of transliteration logic is out of scope for Epitran. As I continue my research and come across more interesting cases, I'll report back as soon as possible.
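The two proposals in this comment (tense/long vowels by default, with laxing/shortening triggered either by an orthographic context or by the vowel being unstressed) can be combined into one per-syllable length decision. This is a hypothetical sketch of the commenter's hypothesis, not Epitran behavior: `vowel_lengths` is an illustrative name, and syllabification, stress placement and cue detection are assumed to be supplied as inputs.

```python
from typing import Iterable, List

# Hypothetical model: every vowel is tense (long) by default, and "laxing"
# (shortening) applies when its syllable is unstressed or when an orthographic
# cue (doubled consonant, cluster, ...) was detected in that syllable.
# Syllable count, stressed index and cue positions are assumed inputs.


def vowel_lengths(
    n_syllables: int, stressed: int, laxing_cues: Iterable[int] = ()
) -> List[str]:
    """Return 'long' or 'short' for each syllable's vowel."""
    if not 0 <= stressed < n_syllables:
        raise ValueError("stressed syllable index out of range")
    cues = set(laxing_cues)
    return [
        "long" if i == stressed and i not in cues else "short"
        for i in range(n_syllables)
    ]


print(vowel_lengths(1, 0))                   # 'so': stressed, no cue
print(vowel_lengths(2, 0, laxing_cues={0}))  # 'also': cluster 'ls' shortens the /a/
print(vowel_lengths(5, 3))                   # 'Lokomotive', stress on 'ti'
```

Under this model only a stressed syllable can carry a long vowel, which covers 'so', 'also', kreativ, sozial and Lokomotive; whether frequency-driven exceptions should override it is left open, as in the comment.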
Sorry to have dropped this. It seems the situation in German is not unlike that of English (there are tense and lax vowels; the tense vowels are long and the lax vowels are short), except that the correlation is imperfect in German. Is this correct? I was working with sources that described the German distinction in terms of length rather than vowel quality, but I'd be willing to change how this works in Epitran if you can point me to the literature I should follow.
Hi @dmort27, sorry for the late response. I got caught up with other topics and am only now making the rounds back to German transliteration. According to sections 1.3 and 1.4 of the document in my last comment, the tense/lax distinction is often conflated with the long/short distinction. This differs from English, where (if my memory isn't failing me) you can have both long lax vowels and short tense vowels. In German, according to the document, there can be short tense vowels but no long lax vowels (section 1.4). What's interesting to me is that all the examples of short tense vowels they give are words of foreign origin, but I can't find a rule anywhere saying this holds across the board.

I also stumbled across an interesting post here, which would explain the cases where the short-vowel-before-double-consonant rule doesn't apply; see the second comment in the link. Even though I'm not sure how he concluded that the 'd' in Mond is a suffix, the idea that a long syllable coda goes with a short nucleus, and vice versa, is a better rule than simply relying on the orthography.
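The coda/nucleus trade-off from the linked post can be sketched as a simple length-balancing check on a syllable's coda. This is a hypothetical illustration, not Epitran code: `nucleus_is_long` is a made-up name, the "two or more consonant phonemes" threshold for a long coda is an assumption, and the optional `suffix_len` parameter mirrors the post's treatment of the 'd' in Mond as a suffix.

```python
from typing import Sequence

# Hypothetical sketch of the coda/nucleus trade-off: a "long" coda (assumed
# here to mean two or more consonant phonemes) goes with a short nucleus,
# and a short coda with a long nucleus. Segments trailing the coda that are
# analyzed as a suffix (suffix_len of them) are ignored.


def nucleus_is_long(coda: Sequence[str], suffix_len: int = 0) -> bool:
    """Predict a long nucleus unless the non-suffix part of the coda is long."""
    if not 0 <= suffix_len <= len(coda):
        raise ValueError("suffix_len must be between 0 and len(coda)")
    core = coda[: len(coda) - suffix_len]
    return len(core) <= 1


print(nucleus_is_long(["n", "d"]))                # 'Mond' read naively: short
print(nucleus_is_long(["n", "d"], suffix_len=1))  # 'd' as a suffix: long
print(nucleus_is_long(["l", "t"]))                # 'alt': short
```

Because the rule works on phonemes rather than letters, a spelling like 'Bett' (a single /t/ phoneme) would still need the orthographic doubled-consonant cue to come out short.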
Hello,
I came across what I believe to be a bug in the German transliteration of the grapheme 's'. It occurs with both the 'deu-Latn' and the 'deu-Latn-nar' dictionaries. Take, for example, the word 'sehr':
Here, epi1 was initialized with the 'deu-Latn' dictionary and epi3 with the 'deu-Latn-nar' dictionary. In both cases I would expect the 's' in 'sehr' to be transliterated as [z]. I know that [s] is also possible here in southern German dialects, and I do see that transliteration when using the 'deu-Latn-np' dictionary. However, after consulting all my sources, I can't find a case where it would be transliterated as [t͡s].
Another example would be the word 'Stock':
In the case of the 'deu-Latn' example, I can understand why this may be transliterated as [s], but at least with the narrow transliteration I would expect [ʃ]. As far as I know, [s] only occurs in this environment in northern German dialects.
Would you mind investigating this with me? So far I've looked at dozens of examples (I'm transliterating a large corpus), and it seems to happen across the board, with no exceptions. I also made sure I had pip-installed the latest version of Epitran.
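The expectations in this report (word-initial 's' before a vowel as [z], as in 'sehr'; initial 'st'/'sp' as [ʃ] in the narrow, standard transcription, as in 'Stock') can be written down as a toy rule and used as a regression check against the transliterator's output. `initial_s` is a hypothetical helper, not Epitran's implementation, and it deliberately ignores the dialectal variants mentioned above (southern [s] for 'sehr', northern [s] before t/p).

```python
# Toy rule for the expected standard (non-dialectal) realization of a
# word-initial 's', based on the expectations in this report.
# Hypothetical helper, not Epitran code; dialectal variants are out of scope.
VOWEL_LETTERS = set("aeiouäöüy")


def initial_s(word: str) -> str:
    """Return the expected IPA for the word-initial 's' (or 'sch') of `word`."""
    w = word.lower()
    if not w.startswith("s"):
        raise ValueError(f"{word!r} does not start with 's'")
    if w.startswith(("sch", "st", "sp")):
        return "ʃ"  # 'Stock' → [ʃ]; 'sch' as a whole is also [ʃ]
    if len(w) > 1 and w[1] in VOWEL_LETTERS:
        return "z"  # 'sehr' → [z], never [t͡s]
    return "s"  # e.g. loanwords such as 'Skandal'


for word in ("sehr", "Stock", "Skandal"):
    print(word, initial_s(word))
```

Comparing this expectation with the first segment the transliterator produces for each corpus word would make it easy to confirm whether the problem really occurs across the board.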