…rving for vowels).
Codecov Report
@@           Coverage Diff           @@
##           master     #123   +/-   ##
=======================================
  Coverage   99.56%   99.56%
=======================================
  Files           8        8
  Lines         696      696
=======================================
  Hits          693      693
  Misses          3        3
I've run the local commands for release and the checks, but please note that most of the steps for releasing (such as bumping the version number, preparing for PyPI, etc.) are not in place yet. I can take care of that once the changes are approved.
I am against that, for reasons of the parsing procedure: if they qualify as a cluster, we can't handle them; we can only handle them if they are a consonant. I'm happy with this PR, but also note that there is no way to have diphthongs aliased, as a diphthong is again a derived sound, so there is no way to define it. Here, you need to go to the data in …
Sorry, my comment was not clear: the ȹ and ȸ digraphs are used for labiodental plosives, which would make it easier to annotate labiodental affricates with things like ȹs. As for the diphthongs, that is one more thing to discuss.
Thanks very much for this @tresoldi. I have a few comments.
As usual this is a Unicode problem of pre-composed vs. composed. It should be part of the normalization, but we'd better wait for the next version (this was my fault, I looked at the list of problems and parsed the grapheme in my mind, it didn't occur to me that it could be a pre-composed one...)
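To make the pre-composed vs. composed distinction concrete, here is a minimal sketch using Python's standard `unicodedata` module. The choice of NFD as the target form and the helper name `same_grapheme` are assumptions for illustration only, not a description of what the CLTS code does.

```python
import unicodedata


def same_grapheme(a: str, b: str, form: str = "NFD") -> bool:
    """Compare two strings after Unicode normalization (hypothetical helper)."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


precomposed = "\u00e3"  # LATIN SMALL LETTER A WITH TILDE: one code point
decomposed = "a\u0303"  # "a" + COMBINING TILDE: two code points

print(precomposed == decomposed)               # False: raw strings differ
print(same_grapheme(precomposed, decomposed))  # True: equal after NFD
print(len(unicodedata.normalize("NFD", precomposed)))  # 2
```

Both strings render identically, which is why a pre-composed grapheme is easy to miss when reading a list of problem cases by eye.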
Yes, this is an example for normalization. In general, we need to always be aware of where to fix problems. We have the following:
Keeping in mind where a problem needs to be fixed will help in the future; we'll have to adjust our labels accordingly. I won't lift a finger to solve problems like "ej" in the code, nor to handle triphthongs, but if any of you wants to adjust the code accordingly here, feel free to do so. I think, however, it's more important to make an explicit list of accepted consonant combinations for clusters (listing all nasal+stop, stop+nasal, etc., whatever you want), as those are currently produced in an erratic fashion.
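An explicit list of accepted cluster combinations could look roughly like the sketch below. Everything here is hypothetical: the name `ACCEPTED_CLUSTERS`, the function `is_accepted_cluster`, and the two listed patterns (taken from the examples in the comment above) are illustrations, not existing CLTS code or an agreed inventory.

```python
# Hypothetical whitelist of accepted two-consonant cluster patterns,
# keyed by the manner of articulation of each member. The entries are
# examples only; the real list would have to be agreed upon.
ACCEPTED_CLUSTERS = {
    ("nasal", "stop"),
    ("stop", "nasal"),
}


def is_accepted_cluster(first_manner: str, second_manner: str) -> bool:
    """Return True only for combinations that were explicitly listed."""
    return (first_manner, second_manner) in ACCEPTED_CLUSTERS


print(is_accepted_cluster("nasal", "stop"))       # True
print(is_accepted_cluster("fricative", "trill"))  # False
```

With a closed list, any combination that is not listed is rejected outright instead of being generated implicitly.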
IMPORTANT: less rounded and more rounded as "roundedness" should be given the preference; "rounding" is a main feature of a sound and, per definitionem, main features can't be modified via diacritic, unless you add FULL CHARACTERS WITH DIACRITICS in vowels.tsv! This is not up for discussion: there is a workaround, and since the character is duplicated, roundedness should be deleted. These lines are ignored in the code by now anyway.
I fully agree here. Triphthongs should probably only have two patterns: with a trailing schwa or with a central vocoid between approximants. For consonant clusters, I would only really accept sibilants+plosives+liquids, but I trust Cormac might convince me here. 😉 No matter what, the priority should however be adding more normalizations and other transcription systems. Now that I know the code in more detail, it should not take me too long to do a PR with my unified feature system, which would be my priority in terms of CLTS innovations.
This PR relates to most of the stuff discussed in #121
In detail:
… linguolabial as a value for feature place of consonants, as well as the most important sounds to the catalog. There are some issues for discussion here, as per IPA all linguolabial consonants need a diacritic (i.e., there is no linguolabial consonant with its own, diacritic-less representation), which in turn makes things a bit complex when setting an alias. As such, nothing like U032B (combining arches) was implemented: the only diacritic for linguolabial place of articulation is the standard IPA U033C (the seagull).
… into a single feature "relative_articulation" (possibly not the best name), as we were allowing for things like "advanced retracted centralized open front vowel". This is now fixed.
… t̟ and ŋ˗).
… like mˑ.
… s̻ and z̻).
ł (Polish letter) is now normalized to ɬ (voiceless alveolar lateral fricative).
… ᶑ (voiced retroflex implosive), b̪ (voiced labio-dental stop), and p̪ (voiceless labio-dental stop) to BIPA.
… "labialized-velar" to "labio-velar" and "labialized-palatal" to "labio-palatal" in CLTS; all the transcription data and systems were updated (using sed from the command line; I've checked many times and don't think there are any false positives or negatives in the replacements).
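For reference, the kind of rename described in the last item can be sketched in Python so that false positives are easier to rule out. This is not the sed command that was actually run; the function name `rename_values` and the boundary rule (a value only counts if it is not glued to another word character or hyphen) are assumptions for illustration.

```python
import re

# Old feature value -> new feature value, as named in this PR.
RENAMES = {
    "labialized-velar": "labio-velar",
    "labialized-palatal": "labio-palatal",
}


def rename_values(text: str) -> str:
    """Replace whole feature values only, never parts of longer tokens."""
    for old, new in RENAMES.items():
        # Lookarounds reject matches adjacent to a word character or hyphen.
        pattern = r"(?<![\w-])" + re.escape(old) + r"(?![\w-])"
        text = re.sub(pattern, new, text)
    return text


print(rename_values("voiced labialized-velar approximant consonant"))
# voiced labio-velar approximant consonant
print(rename_values("labialized voiced velar stop consonant"))
# labialized voiced velar stop consonant
```

The second call shows the case that matters most: a sound with labialization as a secondary articulation and velar place is left untouched, because the two words are not joined into the single hyphenated value.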
Things I didn't implement from the issue:
… IPA for quite some time; its implementation, if really necessary, should be part of specific TranscriptionData and TranscriptionSystems, not CLTS/BIPA.
… more, it should be included in ad-hoc transcription data, not in BIPA), and diacritics for place of articulation are better treated as part of the catalog of sounds than of the catalog of diacritics (as place of articulation is one of the essential features).
… ʌʶ); while I believe there is room for them, I agree with @cormacanderson that this is questionable as a general feature, and we should discuss this in more detail; one potential source can be found here, but many more references turn up in the most basic Google search (mostly describing dialects; I couldn't find a clear-cut case where it is phonemic).
… ɗ from dental to alveolar (even though alveolar as a default place of articulation, requiring a diacritic for the dental, makes sense to me); we need to discuss this in more detail, especially considering that the transcription data we are linking to seem to use it as dental.
feature to all consonants; while I don't oppose this from an articulatory point
of view, it should be further discussed (if we go for the feature only for
approximants, things are more complicated, as we'd need to either set
approximants as a different sound type from consonants or to change the code
in order to implement the limitation).
… the glyph), as it is currently not possible to have a one-size-fits-all solution; any apparently simple change can result in unintended consequences (the easiest solution is probably to just default to one position and manually list the alternatives in the sound catalog, but once more this is something to be discussed and agreed upon).
for labio-dental affricates, there is no rush in doing so and it is
probably a good idea to only include them in a second release (Cormac is
also waiting for an answer from Anne-Maria). They are not
formally IPA, but my opinion is that they would fit very well in BIPA considering
that [i] there is no independent glyph for those sounds, [ii] the graphical
solution is very good, and [iii] the symbols have been in use for quite some
time
… ı (dotless i U0131) as an alias of ɯ U026F: this is really a matter of Turkish orthography and not phonological transcription, and if necessary should be part of a Turkish orthographic profile.
supports two-sound clusters by design. I can understand objections to that,
but this should really be first discussed with @LinguList.
nasalized nasal stops) are part of the design of CLTS; this can be changed
by checking for redundant features, but it is not something that can be
implemented with five minutes of coding. In any case, @LinguList should be
part of this discussion.
part of transcription data / orthographic profiles dealing with PIE, not
BIPA.
… rounding and roundedness should not be different features, which would mean adding a continuum of unrounded, less-rounded, rounded, and more-rounded values. However, I didn't change that as it would break some datasets such as Eurasian (which might be problematic anyway, with its "more-rounded unrounded" vowels which, from a quick inspection, likely come from problems in parsing the diacritic for "more-rounded" as one for "rounded", see the case of Bulgarian in their website) and it is something that should be investigated (for example, should we just take this as aliases for protrusion and compression, or endo- and exolabial?).
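The single continuum mentioned in the last item can be sketched as an ordered enumeration. This is a hypothetical illustration only: the names `Rounding` and `merge_rounding` do not exist in CLTS, and treating any roundedness modifier on an unrounded vowel as contradictory is an assumption made here to show how cases like "more-rounded unrounded" would surface.

```python
from enum import IntEnum
from typing import Optional


class Rounding(IntEnum):
    """Hypothetical ordered scale merging 'rounding' and 'roundedness'."""
    UNROUNDED = 0
    LESS_ROUNDED = 1
    ROUNDED = 2
    MORE_ROUNDED = 3


_BY_NAME = {
    "unrounded": Rounding.UNROUNDED,
    "less-rounded": Rounding.LESS_ROUNDED,
    "rounded": Rounding.ROUNDED,
    "more-rounded": Rounding.MORE_ROUNDED,
}


def merge_rounding(rounding: str, roundedness: Optional[str] = None) -> Rounding:
    """Collapse the two current features into one value on the scale."""
    if roundedness is None:
        return _BY_NAME[rounding]
    if rounding == "unrounded":
        # Assumption: a degree of rounding on an unrounded vowel is an error.
        raise ValueError(f"contradictory combination: {roundedness} {rounding}")
    return _BY_NAME[roundedness]


print(merge_rounding("rounded").name)                  # ROUNDED
print(merge_rounding("rounded", "more-rounded").name)  # MORE_ROUNDED
```

Calling `merge_rounding("unrounded", "more-rounded")` raises `ValueError` in this sketch, which is one way the questionable entries in existing datasets could be listed before any change is made.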
Many things from the issue need to be discussed, perhaps individually;
among those:
… ⁰²)
… ɛʲ, and also long such as oːʷ)
… ɔʰ)
As I said, most if not all of the other issues are related to individual
transcription systems/data; I've added to CLTS/BIPA those that I found important
and necessary, but the remaining ones should probably be kept in their
specific contexts.