Reordering of hamza above and fatha #509

emuller-amazon · 2017-07-14T17:21:46Z

Input: <U+064A,U+064E,U+0670,U+0653,U+0640,U+0654,U+064E,U+0627>
Font: KFGQPC Uthmanic Script HAFS (available at http://fonts.qurancomplex.gov.sa/?page_id=42). You will need the fix for issue 505, or a version of the font with the GSUB lookups de-interleaved.

With Uniscribe, the final glyphs are in the order [hamza above, fatha], and mkmk positions them correctly (second and third glyphs in the log):

1: <U+064A,U+064E,U+0670,U+0653,U+0640,U+0654,U+064E,U+0627>

1: [
{"g":"afii57415.zz04","cl":7,"dx":0,"dy":0,"ax":481,"ay":0},
{"g":"afii57454","cl":4,"dx":25,"dy":975,"ax":0,"ay":0},
{"g":"uni0654","cl":4,"dx":-50,"dy":50,"ax":0,"ay":0},
{"g":"afii57440","cl":4,"dx":0,"dy":0,"ax":650,"ay":0},
{"g":"uni0670_uni0653","cl":0,"dx":75,"dy":400,"ax":0,"ay":0},
{"g":"afii57454","cl":0,"dx":750,"dy":1125,"ax":0,"ay":0},
{"g":"afii57450.calt","cl":0,"dx":0,"dy":0,"ax":1331,"ay":0}]

With Harfbuzz, the final glyphs are reordered [fatha, hamza above] and mkmk no longer operates on them:

1: <U+064A,U+064E,U+0670,U+0653,U+0640,U+0654,U+064E,U+0627>

1: [{"g":"afii57415.zz04","cl":7,"dx":0,"dy":0,"ax":481,"ay":0},
{"g":"uni0654","cl":4,"dx":-50,"dy":50,"ax":0,"ay":0},
{"g":"afii57454","cl":4,"dx":75,"dy":500,"ax":0,"ay":0},
{"g":"afii57440","cl":4,"dx":0,"dy":0,"ax":650,"ay":0},
{"g":"uni0670_uni0653","cl":0,"dx":75,"dy":400,"ax":0,"ay":0},
{"g":"afii57454","cl":0,"dx":750,"dy":1125,"ax":0,"ay":0},
{"g":"afii57450.calt","cl":0,"dx":0,"dy":0,"ax":1331,"ay":0}]

I am guessing that Harfbuzz is doing a reordering based on canonical equivalence (since ccc(hamza above) = 230 and ccc(fatha) = 30), while Uniscribe does not.
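To make the guess above concrete, here is a minimal Python sketch of canonical reordering on the short sequence tatweel, hamza above, fatha. It uses only the standard `unicodedata` module. The helper names `canonical_reorder` and `fmt` are made up for illustration, and the sketch covers only the reordering step of normalization (no decomposition or composition); it is not HarfBuzz's actual code.

```python
import unicodedata


def canonical_reorder(text: str) -> str:
    """Stable-sort every run of marks (ccc != 0) by canonical combining class.

    Hypothetical helper: it models only the reordering step of Unicode
    normalization, not decomposition or composition.
    """
    out, run = [], []
    for ch in text:
        if unicodedata.combining(ch) == 0:
            # A starter ends the current run of marks; flush the run sorted.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)


def fmt(text: str) -> str:
    """Render a string as a comma-separated list of U+XXXX code points."""
    return ",".join(f"U+{ord(c):04X}" for c in text)


TATWEEL_HAMZA_FATHA = "\u0640\u0654\u064E"

# ccc(hamza above) and ccc(fatha)
print(unicodedata.combining("\u0654"), unicodedata.combining("\u064E"))  # 230 30
# fatha (ccc 30) sorts ahead of hamza above (ccc 230)
print(fmt(canonical_reorder(TATWEEL_HAMZA_FATHA)))  # U+0640,U+064E,U+0654
```

Python's `sorted` is stable, so marks with equal classes keep their relative order, as canonical ordering requires.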

Adding a CGJ after the hamza above prevents Harfbuzz's reordering and leads to the expected result; however, that totally breaks the rendering with Uniscribe on Windows 7, so it's not an entirely pleasant workaround.

khaledhosny · 2017-07-15T00:26:27Z

I think this is the result of HarfBuzz performing Unicode normalization on the input, which reorders the combining marks (because the Unicode combining classes for Arabic marks are seriously broken). There was a discussion on the mailing list a while ago, but no solution was implemented. One way to work around this is to insert U+034F COMBINING GRAPHEME JOINER between the hamza and the fatha to prevent their reordering:

hb-unicode-encode U+064A,U+064E,U+0670,U+0653,U+0640,U+0654,U+034F,U+064E,U+0627 | \
hb-view 'UthmanicHafs1 Ver09.otf'

Note that this reordering can happen if any layer performs text normalization, so using CGJ is safer anyway.
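The effect of the CGJ can be checked outside any shaper. The sketch below uses Python's standard `unicodedata.normalize` as a stand-in for "any layer that normalizes"; the helper name `survives` is hypothetical. CGJ has combining class 0, so it splits the mark run and the two marks are never compared with each other.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, combining class 0

without_cgj = "\u0640\u0654\u064E"           # tatweel, hamza above, fatha
with_cgj = "\u0640\u0654" + CGJ + "\u064E"   # same, with CGJ between the marks


def survives(text: str, form: str) -> bool:
    """True if normalizing to `form` leaves the text unchanged."""
    return unicodedata.normalize(form, text) == text


for form in ("NFC", "NFD"):
    # Without CGJ the marks are reordered; with CGJ the order is preserved.
    print(form, survives(without_cgj, form), survives(with_cgj, form))
```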

emuller-amazon · 2017-07-15T01:16:07Z

If you think the combining classes are seriously broken (and I agree), then that's an argument for not doing any normalization in text rendering.

Yes, inserting CGJ makes the text more useful, and works around Harfbuzz's normalization, but as I mentioned, that text does not render properly with Windows 7, so it's not a completely satisfactory path.

khaledhosny · 2017-07-15T01:37:48Z

From Unicode's point of view, that can be seen as a font bug, since two canonically equivalent strings should be rendered the same: U+064A,U+064E,U+0670,U+0653,U+0640,U+0654,U+064E,U+0627 and U+064A,U+064E,U+0670,U+0653,U+0640,U+064E,U+0654,U+0627 are canonically equivalent. HarfBuzz might be rendering them wrong, but at least it renders them the same, which is not the case with Uniscribe.
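The equivalence claim can be verified mechanically: two strings are canonically equivalent when their NFD forms are identical. A small sketch with Python's standard library follows; `from_codepoints` and `canonically_equivalent` are illustrative names, not part of any existing tool.

```python
import unicodedata


def from_codepoints(spec: str) -> str:
    """Build a string from a 'U+064A,U+064E,...' style list."""
    return "".join(chr(int(cp.strip()[2:], 16)) for cp in spec.split(","))


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent iff their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


hamza_first = from_codepoints(
    "U+064A,U+064E,U+0670,U+0653,U+0640,U+0654,U+064E,U+0627")
fatha_first = from_codepoints(
    "U+064A,U+064E,U+0670,U+0653,U+0640,U+064E,U+0654,U+0627")

print(hamza_first == fatha_first)                        # False: different code point order
print(canonically_equivalent(hamza_first, fatha_first))  # True
```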

behdad · 2017-08-09T23:19:13Z

> If you think the combining classes are seriously broken (and I agree), then that's an argument for not doing any normalization in text rendering.

We do normalization using our custom-tailored combining classes. That said, no set of classes is adequate for Arabic. Arabic needs something more involved, and Roozbeh had a proposal, which we haven't implemented yet:
http://unicode.org/L2/L2014/14127-arabic-marks-order.pdf

I'll read it again and see 1. if it fixes your case, and 2. how hard it is to implement.

roozbehp · 2017-08-10T00:09:16Z

Two points:

1. My document has since advanced, and the UTC agreed in its last meeting that it should advance to a proposed draft UTR. The latest version is at http://www.unicode.org/L2/L2017/17253-arabic-ordering.pdf
2. Eric, do you see the same problem with just the sequence U+0640,U+0654,U+064E (tatweel, combining hamza above, fatha)? That's a simple-enough sequence, and if HarfBuzz doesn't work on that, there's something easily fixable, even if Behdad can't get to do all of L2/17-253.

behdad · 2017-08-10T00:22:48Z

> Eric, do you see the same problem with just the sequence U+0640,U+0654,U+064E (tatweel, combining hamza above, fatha)? That's a simple-enough sequence, and if HarfBuzz doesn't work on that, there's something easily fixable, even if Behdad can't get to do all of L2/17-253.

Yes, that's broken as well. Here's a quick fix I can commit while we explore more:

diff --git a/src/hb-unicode-private.hh b/src/hb-unicode-private.hh
index aa86a72c..34513e13 100644
--- a/src/hb-unicode-private.hh
+++ b/src/hb-unicode-private.hh
@@ -105,6 +105,10 @@ HB_UNICODE_FUNCS_IMPLEMENT_CALLBACKS_SIMPLE
   inline unsigned int
   modified_combining_class (hb_codepoint_t unicode)
   {
+    /* XXX This hack belongs to the Arabic shaper:
+     * Put HAMZA ABOVE in the same class as SHADDA. */
+    if (unlikely (unicode == 0x0654u)) unicode = 0x0651u;
+
     /* XXX This hack belongs to the Myanmar shaper. */
     if (unlikely (unicode == 0x1037u)) unicode = 0x103Au;

Part of #509
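Why aliasing HAMZA ABOVE to SHADDA helps can be modeled in a few lines of Python. This is a rough sketch, not the HarfBuzz implementation: it assumes the tailored ("modified") classes place shadda ahead of fatha, and the numeric values in `ASSUMED_MODIFIED_CLASS` are illustrative assumptions where only the relative order matters. With plain Unicode classes, shadda (ccc 33) would still sort after fatha (ccc 30), so the hack relies on that tailoring.

```python
import unicodedata

HAMZA_ABOVE, SHADDA, FATHA = 0x0654, 0x0651, 0x064E

# ASSUMPTION: a tailored table in which shadda sorts ahead of fatha.
# The numbers are illustrative; only their relative order matters here.
ASSUMED_MODIFIED_CLASS = {SHADDA: 27, FATHA: 31}


def modified_class(cp: int, hamza_hack: bool) -> int:
    """Model of the patched lookup: optionally alias HAMZA ABOVE to SHADDA."""
    if hamza_hack and cp == HAMZA_ABOVE:
        cp = SHADDA
    return ASSUMED_MODIFIED_CLASS.get(cp, unicodedata.combining(chr(cp)))


def reorder(text: str, hamza_hack: bool) -> str:
    """Stable-sort each run of marks by the modeled modified class."""
    def key(ch: str) -> int:
        return modified_class(ord(ch), hamza_hack)

    out, run = [], []
    for ch in text:
        if key(ch) == 0:
            # A class-0 character ends the current run of marks.
            out.extend(sorted(run, key=key))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=key))
    return "".join(out)


seq = "\u0640\u0654\u064E"  # tatweel, hamza above, fatha
print(reorder(seq, hamza_hack=False) == seq)  # False: fatha moves ahead of hamza
print(reorder(seq, hamza_hack=True) == seq)   # True: the input order is kept
```

Under these assumptions, the hack keeps hamza above ahead of fatha, so the glyph order matches the Uniscribe output that mkmk expects.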

behdad added a commit that referenced this issue Aug 10, 2017

Treat HAMZA ABOVE similar to SHADD for sorting purposes

5a33057

Part of #509

khaledhosny mentioned this issue Oct 3, 2017

Review Proposed Draft: Unicode Technical Report #53: Unicode Arabic Mark Ordering Algorithm w3c/alreq#143

Closed

behdad closed this as completed in ab8d70e Oct 4, 2017
