Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reordering of hamza above and fatha #509

Closed
emuller-amazon opened this issue Jul 14, 2017 · 6 comments
Closed

Reordering of hamza above and fatha #509

emuller-amazon opened this issue Jul 14, 2017 · 6 comments

Comments

@emuller-amazon
Copy link
Contributor

Input: <U+064A,U+064E,U+0670,U+0653,U+0640,U+0654,U+064E,U+0627>
Font: KFGQPC Uthmanic Script HAFS (available at http://fonts.qurancomplex.gov.sa/?page_id=42). (you will need the fix for issue 505, or a version of the font with the GSUB lookups de-interleaved).

With Uniscribe, the final glyphs are in the order [hamza above, fatha], and mkmk positions them correctly (second and third glyphs in the log):

1: <U+064A,U+064E,U+0670,U+0653,U+0640,U+0654,U+064E,U+0627>

1: [
{"g":"afii57415.zz04","cl":7,"dx":0,"dy":0,"ax":481,"ay":0},
{"g":"afii57454","cl":4,"dx":25,"dy":975,"ax":0,"ay":0},
{"g":"uni0654","cl":4,"dx":-50,"dy":50,"ax":0,"ay":0},
{"g":"afii57440","cl":4,"dx":0,"dy":0,"ax":650,"ay":0},
{"g":"uni0670_uni0653","cl":0,"dx":75,"dy":400,"ax":0,"ay":0},
{"g":"afii57454","cl":0,"dx":750,"dy":1125,"ax":0,"ay":0},
{"g":"afii57450.calt","cl":0,"dx":0,"dy":0,"ax":1331,"ay":0}]

test2 txt_uthmanichafs1 ver09 otf_uniscribe

With Harfbuzz, the final glyphs are reordered [fatha, hamza above] and mkmk no longer operates on them:

1: <U+064A,U+064E,U+0670,U+0653,U+0640,U+0654,U+064E,U+0627>

1: [{"g":"afii57415.zz04","cl":7,"dx":0,"dy":0,"ax":481,"ay":0},
{"g":"uni0654","cl":4,"dx":-50,"dy":50,"ax":0,"ay":0},
{"g":"afii57454","cl":4,"dx":75,"dy":500,"ax":0,"ay":0},
{"g":"afii57440","cl":4,"dx":0,"dy":0,"ax":650,"ay":0},
{"g":"uni0670_uni0653","cl":0,"dx":75,"dy":400,"ax":0,"ay":0},
{"g":"afii57454","cl":0,"dx":750,"dy":1125,"ax":0,"ay":0},
{"g":"afii57450.calt","cl":0,"dx":0,"dy":0,"ax":1331,"ay":0}]

test2 txt_fixed ttf_ot

I am guessing that Harfbuzz is doing a reordering based on canonical equivalence (since ccc(hamza above) = 230 and ccc(fatha) = 30), while Uniscribe does not.

Adding a CGJ after the hamza above prevents Harfbuzz's reordering and leads to the expected result; however, that totally breaks the rendering with Uniscribe on Windows 7, so it's not an entirely pleasant workaround.

@khaledhosny
Copy link
Collaborator

I think this is result of HarfBuzz performing Unicode normalization on the input which re-orders the combining marks (because Unicode combining classes for Arabic marks are serioisly broken). There was a discussion on the mailing list a while ago but no solution was implemented. One way to work around this is to insert U+034F COMBINING GRAPHEME JOINER between the hamza and fatha to prevent their reordering:

hb-unicode-encode U+064A,U+064E,U+0670,U+0653,U+0640,U+0654,U+034F,U+064E,U+0627 | \
hb-view 'UthmanicHafs1 Ver09.otf'

Note this this re-ordering can happen if any layer performed text normalization, so use of CGJ is safer anyway.

@emuller-amazon
Copy link
Contributor Author

If you think the combining classes are seriously broken (and I agree), then that's an argument for not doing any normalization in text rendering.

Yes, inserting CGJ makes the text more useful, and works around Harfbuzz's normalization, but as I mentioned, that text does not render properly with Windows 7, so it's not a completely satisfactory path.

@khaledhosny
Copy link
Collaborator

From Unicode point of view that can be seen as a font bug since two canonically equivalent strings should be rendered the same, U+064A,U+064E,U+0670,U+0653,U+0640,U+0654,U+064E,U+0627 and U+064A,U+064E,U+0670,U+0653,U+0640,U+064E,U+0654,U+0627 are canonically equivalent. HarfBuzz might be rendering them wrong, but at least it renders them the same which is not the case with Uniscribe.

@behdad
Copy link
Member

behdad commented Aug 9, 2017

If you think the combining classes are seriously broken (and I agree), then that's an argument for not doing any normalization in text rendering.

We do normalization using our custom-tailored combining classes. That said, no set of classes is adequate for Arabic. Arabic needs something more involved, and Roozbeh had a proposal, which we haven't implemented yet:
http://unicode.org/L2/L2014/14127-arabic-marks-order.pdf

I'll read it again and see 1. if it fixes your case, and 2. how hard it is to implement.

@roozbehp
Copy link
Collaborator

Two points:

  1. My documents has since advanced and UTC agreed in its last meeting that it should advance to a proposed draft UTR. Latest version is at http://www.unicode.org/L2/L2017/17253-arabic-ordering.pdf

  2. Eric, do you see the same problem with just the sequence U+0640,U+0654,U+064E (taweel, combining hamza above, fatha>? That's a simple-enough sequence, and if HarfBuzz doesn't work on that, there's something easily fixable, even if Behdad can't get to do all of L2/17-253.

@behdad
Copy link
Member

behdad commented Aug 10, 2017

Eric, do you see the same problem with just the sequence U+0640,U+0654,U+064E (taweel, combining hamza above, fatha>? That's a simple-enough sequence, and if HarfBuzz doesn't work on that, there's something easily fixable, even if Behdad can't get to do all of L2/17-253.

Yes, that's broken as well. Here's a quick fix I can commit while we explore more:

diff --git a/src/hb-unicode-private.hh b/src/hb-unicode-private.hh
index aa86a72c..34513e13 100644
--- a/src/hb-unicode-private.hh
+++ b/src/hb-unicode-private.hh
@@ -105,6 +105,10 @@ HB_UNICODE_FUNCS_IMPLEMENT_CALLBACKS_SIMPLE
   inline unsigned int
   modified_combining_class (hb_codepoint_t unicode)
   {
+    /* XXX This hack belongs to the Arabic shaper:
+     * Put HAMZA ABOVE in the same class as SHADDA. */
+    if (unlikely (unicode == 0x0654u)) unicode = 0x0651u;
+
     /* XXX This hack belongs to the Myanmar shaper. */
     if (unlikely (unicode == 0x1037u)) unicode = 0x103Au;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants