-
Notifications
You must be signed in to change notification settings - Fork 608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reordering of hamza above and fatha #509
Comments
I think this is result of HarfBuzz performing Unicode normalization on the input which re-orders the combining marks (because Unicode combining classes for Arabic marks are serioisly broken). There was a discussion on the mailing list a while ago but no solution was implemented. One way to work around this is to insert U+034F COMBINING GRAPHEME JOINER between the hamza and fatha to prevent their reordering:
Note this this re-ordering can happen if any layer performed text normalization, so use of CGJ is safer anyway. |
If you think the combining classes are seriously broken (and I agree), then that's an argument for not doing any normalization in text rendering. Yes, inserting CGJ makes the text more useful, and works around Harfbuzz's normalization, but as I mentioned, that text does not render properly with Windows 7, so it's not a completely satisfactory path. |
From Unicode point of view that can be seen as a font bug since two canonically equivalent strings should be rendered the same, U+064A,U+064E,U+0670,U+0653,U+0640,U+0654,U+064E,U+0627 and U+064A,U+064E,U+0670,U+0653,U+0640,U+064E,U+0654,U+0627 are canonically equivalent. HarfBuzz might be rendering them wrong, but at least it renders them the same which is not the case with Uniscribe. |
We do normalization using our custom-tailored combining classes. That said, no set of classes is adequate for Arabic. Arabic needs something more involved, and Roozbeh had a proposal, which we haven't implemented yet: I'll read it again and see 1. if it fixes your case, and 2. how hard it is to implement. |
Two points:
|
Yes, that's broken as well. Here's a quick fix I can commit while we explore more:
|
Input: <U+064A,U+064E,U+0670,U+0653,U+0640,U+0654,U+064E,U+0627>
Font: KFGQPC Uthmanic Script HAFS (available at http://fonts.qurancomplex.gov.sa/?page_id=42). (you will need the fix for issue 505, or a version of the font with the GSUB lookups de-interleaved).
With Uniscribe, the final glyphs are in the order [hamza above, fatha], and mkmk positions them correctly (second and third glyphs in the log):
1: <U+064A,U+064E,U+0670,U+0653,U+0640,U+0654,U+064E,U+0627>
1: [
{"g":"afii57415.zz04","cl":7,"dx":0,"dy":0,"ax":481,"ay":0},
{"g":"afii57454","cl":4,"dx":25,"dy":975,"ax":0,"ay":0},
{"g":"uni0654","cl":4,"dx":-50,"dy":50,"ax":0,"ay":0},
{"g":"afii57440","cl":4,"dx":0,"dy":0,"ax":650,"ay":0},
{"g":"uni0670_uni0653","cl":0,"dx":75,"dy":400,"ax":0,"ay":0},
{"g":"afii57454","cl":0,"dx":750,"dy":1125,"ax":0,"ay":0},
{"g":"afii57450.calt","cl":0,"dx":0,"dy":0,"ax":1331,"ay":0}]
With Harfbuzz, the final glyphs are reordered [fatha, hamza above] and mkmk no longer operates on them:
1: <U+064A,U+064E,U+0670,U+0653,U+0640,U+0654,U+064E,U+0627>
1: [{"g":"afii57415.zz04","cl":7,"dx":0,"dy":0,"ax":481,"ay":0},
{"g":"uni0654","cl":4,"dx":-50,"dy":50,"ax":0,"ay":0},
{"g":"afii57454","cl":4,"dx":75,"dy":500,"ax":0,"ay":0},
{"g":"afii57440","cl":4,"dx":0,"dy":0,"ax":650,"ay":0},
{"g":"uni0670_uni0653","cl":0,"dx":75,"dy":400,"ax":0,"ay":0},
{"g":"afii57454","cl":0,"dx":750,"dy":1125,"ax":0,"ay":0},
{"g":"afii57450.calt","cl":0,"dx":0,"dy":0,"ax":1331,"ay":0}]
I am guessing that Harfbuzz is doing a reordering based on canonical equivalence (since ccc(hamza above) = 230 and ccc(fatha) = 30), while Uniscribe does not.
Adding a CGJ after the hamza above prevents Harfbuzz's reordering and leads to the expected result; however, that totally breaks the rendering with Uniscribe on Windows 7, so it's not an entirely pleasant workaround.
The text was updated successfully, but these errors were encountered: