Shaping bugs for Cham? #376
Comments
I can't reproduce any of the non-USE issues; only the two USE issues that were already confirmed reproduce for me. Ben, please clarify what environment you're running these tests in. From a developer's point of view, I see three issues reported:
More about issues 2 and 3: these have also been discussed on Twitter: https://twitter.com/lianghai/status/794637250361368576 Basically, the USE expects medial consonant signs and vowel signs to be stored in the standard visual order, and only one medial consonant sign is allowed on each side: The vowel sign order requirement is debatable (don't vowel signs need a logical order at all?) but it does conform to the Unicode Standard; see the core spec 9.0, section 12.3 Gurmukhi, the last paragraph of "Encoding Principles" on page 474:
However, the Unicode Standard is actually not self-consistent. See table 16-15 "Cham Syllabic Structure" on page 645, which shows that the AA sign (U+AA29), an above-base mark, is expected to follow any other vowel sign (including the U sign, a below-base mark). As for medial consonant signs, the Unicode Standard clearly specifies that multiple medial consonant signs on the same side are valid. My test string is "ꨀꨀꨩꨀꨳꨀꨳꨩꨀꨶꨩꨄꨰꨃꨩꨆꨭꨩꨆꨩꨭ꩑ꨩꨆꨵꨶ". Tested in Chrome 55.0.2883.87 on macOS Sierra 10.12.1, with a modified Noto Sans Cham (the correct script tag "cham" added as the USE requires, and U+25CC mapped to "@" so that the dotted circle, which is missing from Noto Sans Cham, is available and visible). |
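When comparing stored orders in a test string like the one above, it can help to list the code points explicitly. The following is a minimal sketch using only Python's standard `unicodedata` module; the helper name `describe` is made up for illustration and is not part of any shaper.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]

# Example: the base letter A followed by the vowel sign AA.
for code, name in describe("\uAA00\uAA29"):
    print(code, name)
```

Passing a whole test string to `describe` shows the exact stored order of bases, medials and vowel signs, which is what the shaper actually sees.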
In my understanding AA29 can serve either as a long vowel mark -aa-, when alone on a consonant, or as a vowel lengthener to act with other vowel diacritics, when it's typed/stored after that diacritic. I wonder if Unicode makes provision for things that can be in more than one category.
If Burmese is ever migrated to USE, we'll have the same situation there. Medial Wa and medial Ha both go underneath, and can be combined on one base. I'd be surprised if there are not other scripts that need similar arrangements. But this seems to be a USE issue. Regarding issue 1, I'm on Mavericks, and I gather the version of USE it uses might not be as up-to-date as on other setups, so that could explain why you can't reproduce it. |
I understand it as simply a pure vowel sign AA that hints at a long vowel when combined with other vowel signs, so there isn't a significant logical order in this case. It can also be analyzed as a pure vowel lengthener that lengthens all vowels, including consonant letters' inherent vowel A (the nasal consonant letters' inherent vowel UE can then be analyzed as a higher-level conditioned variant of A).
The current model of Unicode
Yes, this is apparently a USE issue.
Uh… OS X Mavericks has nothing to do with USE. If you test in the latest Chrome, Chrome uses its own HarfBuzz, independent of the operating system, to shape complex scripts. If you test in OS X's native text fields, OS X Mavericks' Core Text doesn't have any support for USE yet. Ben, you really, REALLY need to explain very clearly what software you run all those tests in. Saying "I'm on Mavericks" is not helpful at all. Screenshots (of the tested string in the environment, preferably with the software's About window) are helpful if you're not sure what to explain… |
Sorry, I thought I'd mentioned, I do all my font testing on Firefox. I'm currently on 50.0.1. I'll get to Win10 later this week, try Chrome too, and post images of each. |
Okay, I can confirm Firefox 50.1.0 behaves like what Ben described: So it seems to be a Firefox bug. How does Firefox behave differently if it uses the latest (?) HarfBuzz directly? This is the patched Noto Sans Cham used in my tests: |
Firefox 50 doesn't have the most recent harfbuzz. An update to harfbuzz 1.3.3 landed for Firefox 52 (https://bugzilla.mozilla.org/show_bug.cgi?id=1313097), so you might like to test with Firefox Developer Edition (https://www.mozilla.org/en-US/firefox/developer/) to see whether this affects the result. |
Confirmed: Firefox Developer Edition 52.0a2 has the same result as Chrome's (how recent HarfBuzz behaves). So those HarfBuzz-related issues (issue 1 in my first comment) are because of an older version of HarfBuzz in the current stable release of Firefox. |
Thanks for checking that :-) Re issue 1, did you try numeral + mark? In Win 10, USE is not accepting that in Notepad, but it's fine in Edge. It still leaves the medial La + medial Wa combination, and the below-vowel + AA29 combination needing to be fixed, which seem now to be a USE issue, and I've informed Andrew G. |
The second last syllable in my test string is |
Distinguishing medial consonants from Consonant_Subjoined seems a bad idea for SE Asian scripts. While I'm not sure I can get two consecutive medials in Tai Tham, I get similar rendering problems as can be seen in the word for 'iron' at http://www.wrdingham.co.uk/lanna/renderer_test.htm#surprises. The problem also shows up in MS Edge. Tai Tham also suffers from the vowel ordering issue. |
Can we go ahead and relax the medial consonant behaviour to allow more than one? I tested with this and it looks good to me: hb-ot-shape-complex-use-machine.rl
|
This is a USE issue. Andrew Glass is aware and plans to implement an override to allow two medials in the below-base slot. |
It's an issue with the USE spec, but HarfBuzz and every other shaper need to implement the spec somehow. This is a HarfBuzz implementation fix for this particular problem, which we can easily do now, just as we did an override for Chakma split vowels. Andrew can update the spec, but HarfBuzz still needs to implement it at some point. Sooner is always nice, especially when we have fonts in the pipeline that need it. |
Awesome, I see :) |
@behdad Any way we can get these two issues fixed ASAP? We want to get this Cham delivered, but it needs correct shaping. My local build of Firefox with a patched harfbuzz has been great for my own testing, but I'd like to get harfbuzz updated for real. Just to recap, the two issues @ohbendy originally reported that are still present:
My local quick fix for these was just to totally relax the strictness of the USE spec in hb-ot-shape-complex-use-machine.rl:
Of course if we want to make these explicit exceptions it needs more, but this works as a quick fix. |
Which thread was that? I can review and adjust HarfBuzz. |
I was in touch with Andrew privately by email a while ago.
Ben Mitchell
On 14 Apr 2017, at 00:08, Behdad Esfahbod wrote:
This is a USE issue. Andrew Glass is aware and plans to implement an override to allow two medials in the below-base slot.
Which thread was that? I can review and adjust HarfBuzz.
|
Ok I just emailed Andrew to get a clarification of what he will be changing, so I can match. |
Yeah, don't like to loosen up the regex more than needed. |
Here's Andrew's response to me in private: It’s not my recollection that I committed USE to a particular solution for Cham. ... In principle, I think we may have been too restrictive on the medials. The real need here was to limit the prebase medials, the others aren’t such a problem. So I think that the Cham case would be fixed that way. I don’t expect any negative impact on other writing systems based on change the limit, since it is merely relaxing a restriction. Another option was to play games with the overrides. That is cheaper to implement, but isn’t really the right solution and doesn’t help other writing systems. I keep hoping to have time to think about the problems for Tai Tham. Ideally (from my perspective) , there would be one set of cluster model changes to fix the Cham medials and the monosyllabic Tai Tham issues. Focusing on Tai Tham is on my spare-time calendar but about fourth on the list. I’m not keen at the point, on trying to address Tai Tham polysyllabic clusters. This is more of a philosophical debate on orthographic clusters and reading order – to be hashed out at some point with Richard and others. |
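To illustrate what "relaxing the limit" on medials means, here is a toy sketch in Python. It is not the real USE grammar and not the Ragel code in hb-ot-shape-complex-use-machine.rl: the category letters, the `STRICT` and `RELAXED` patterns, and the treatment of medial ra as pre-base are all assumptions made for illustration.

```python
import re

# Hypothetical, heavily simplified category map (not the real USE tables).
CATEGORY = {
    "\uAA06": "B",  # CHAM LETTER KA: base
    "\uAA34": "p",  # medial ra: treated as pre-base here (assumption)
    "\uAA35": "b",  # medial la: below-base
    "\uAA36": "b",  # medial wa: below-base
    "\uAA29": "V",  # vowel sign AA
    "\uAA2D": "V",  # vowel sign U
}

# Toy "one medial per side" rule versus a rule that keeps the pre-base
# limit but allows any number of below-base medials.
STRICT = re.compile(r"Bp?b?V*")
RELAXED = re.compile(r"Bp?b*V*")

def categorize(cluster):
    """Map each character of a cluster to its toy category letter."""
    return "".join(CATEGORY[ch] for ch in cluster)

def accepts(pattern, cluster):
    """True if the whole cluster matches the given toy pattern."""
    return pattern.fullmatch(categorize(cluster)) is not None

lwa = "\uAA06\uAA35\uAA36"  # ka + medial la + medial wa
print(accepts(STRICT, lwa), accepts(RELAXED, lwa))
```

In this toy model the -lwa cluster is rejected by the strict pattern and accepted by the relaxed one, while two pre-base medials are still rejected by both, which mirrors the idea of limiting only the pre-base medials.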
Part of #376 Also see https://github.com/roozbehp/unicode-data/issues/6 Test added, using NotoSansCham built from Noto Phase III sources.
Ok, fixed the U+AA29 issue. Working on other issue. |
The consonant ꨀ (U+AA00) behaves in the same way as Burmese အ or Thai อ in that it can stand alone as a glottal stop consonant, or take diacritic marks as a vowel carrier (functioning like an independent vowel). In my tests, typing any diacritic mark after it is treated as an invalid combination, and dotted circles are inserted to carry the mark. It can also appear with medial ya ꨳ (U+AA33) or medial wa ꨶ (U+AA36) plus a diacritic, but these combinations also generate dotted circles.
Unicode Cham proposal N3120 suggests "that applications permit ꨀ to bear any of the vowel signs".
In one text, the independent vowel ꨄ (U+AA04) is modified by a preceding ꨰ vowel (U+AA30).
In several manuscripts, ꨃ (U+AA03) is modified with the vowel lengthener ◌ꨩ (U+AA29).
N3120 mentions “Four of the other independent vowels are also attested bearing matras”, so generalising this means any combination of independent vowel plus diacritic should be allowed.
Combination of vowel ◌ꨭ (U+AA2D) with vowel lengthener ◌ꨩ (U+AA29)
This generates a dotted circle, though N3120 notes this combination is a long -uu- vowel, with AA29 typed/stored last.
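The ordering conflict can be shown with a small toy check in Python. This is only a sketch: the class letters and the `VISUAL_ORDER` rule (above-base signs before below-base signs) are assumptions that stand in for the shaper's real vowel slots.

```python
import re

# Hypothetical positional classes for two Cham vowel signs.
POSITION = {
    "\uAA29": "a",  # vowel sign AA: above-base mark
    "\uAA2D": "b",  # vowel sign U: below-base mark
}

# Assumed visual-order rule: above-base signs come before below-base signs.
VISUAL_ORDER = re.compile(r"a*b*")

def in_visual_order(vowels):
    """True if the vowel signs are stored in the assumed visual order."""
    classes = "".join(POSITION[ch] for ch in vowels)
    return VISUAL_ORDER.fullmatch(classes) is not None

# The long -uu- sequence stores U first and AA29 last.
print(in_visual_order("\uAA2D\uAA29"))
```

Under this assumed rule the U + AA29 sequence fails the check even though it is the stored order described for the long -uu- vowel, which is the kind of mismatch that ends in a dotted circle.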
Numerals with diacritics
I’ve found evidence of this in a number of manuscripts, but numeral + mark has created shaping errors in my tests with dotted circles being inserted.
Initially I thought the four issues above were USE bugs, but Andrew Glass advised me that they are Firefox/HarfBuzz issues, since "on Windows, independent vowels, consonants and digits are all given the same base class and will permit the standard clusters to form".
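The "same base class" idea can be sketched with Unicode general categories. This is a simplified illustration in Python, not how any shaper is implemented: `is_base` and `split_clusters` are hypothetical helpers, and treating every `Lo` and `Nd` character as a base is an assumption.

```python
import unicodedata

def is_base(ch):
    """Treat letters (Lo) and decimal digits (Nd) as one shared base class."""
    return unicodedata.category(ch) in ("Lo", "Nd")

def split_clusters(text):
    """Group each base with the marks that follow it."""
    clusters = []
    for ch in text:
        if is_base(ch) or not clusters:
            # A new base starts a cluster; so does a leading mark with no
            # base, which is where a shaper would insert a dotted circle.
            clusters.append(ch)
        else:
            clusters[-1] += ch
    return clusters

# Independent vowel + AA29, then digit one + AA29.
print(len(split_clusters("\uAA03\uAA29\uAA51\uAA29")))
```

With letters and digits in one class, the independent vowel and the numeral each keep their following mark in the same cluster instead of leaving the mark stranded.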
A consonant can carry medial La (U+AA35) and medial Wa (U+AA36) together, but USE disallows this combination. Andrew Glass mentions that this can be remedied, so HarfBuzz will also need to allow this combination.
N3120 mentions "Three medial clusters occur: ◌ꨴꨶ -rwa, ◌ꨵꨳ -lya, and ◌ꨵꨶ -lwa".