U+0670 superscript alef should be written with horizontal spacing when input after fathah #217

adamiturabi · 2021-04-17T10:03:59Z

When U+0670 superscript alef is input after a fathah, I believe it should be written with horizontal spacing after the fathah. This image should illustrate what I mean:

However, even in the "better image" the vertical positioning is not correct: it is too high in ذلك and too low in هذا. (I faked it with U+202F in the former and with tatweel in the latter.)

Thank you for your continued support to this great typeface.

khaledhosny · 2021-04-17T10:47:38Z

In the first case, the small alef should be placed on a tatweel, هَـٰذا; in the second case it should be placed on a no-break space, ذَ ٰلك (U+00A0, though copying from here might turn it into a regular space):

I can support NNBSP (U+202F) as an alternate base as well.
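To make the recommended encodings concrete, here is a small Python sketch that spells out both words code point by code point, plus the NNBSP variant offered as an alternate base. The constant names and the `describe` helper are illustrative only; the sequences themselves follow the description above (small alef on a tatweel for هَـٰذا, on a no-break space for ذَ ٰلك).

```python
import unicodedata

# Code points discussed in the thread (names are illustrative)
HEH, FATHA, TATWEEL, DAGGER_ALEF = "\u0647", "\u064E", "\u0640", "\u0670"
THAL, ALEF, LAM, KAF = "\u0630", "\u0627", "\u0644", "\u0643"
NBSP, NNBSP = "\u00A0", "\u202F"

# هَـٰذا: heh joins to the tatweel, which carries the small alef
hadha = HEH + FATHA + TATWEEL + DAGGER_ALEF + THAL + ALEF
# ذَ ٰلك: thal does not join to the following letter, so a no-break space is the seat
dhalika = THAL + FATHA + NBSP + DAGGER_ALEF + LAM + KAF
# The same word with NNBSP (U+202F) as the seat instead
dhalika_nnbsp = dhalika.replace(NBSP, NNBSP)

def describe(word: str) -> str:
    """One line per character: code point and Unicode name."""
    return "\n".join(f"U+{ord(c):04X} {unicodedata.name(c)}" for c in word)

for word in (hadha, dhalika, dhalika_nnbsp):
    print(describe(word), end="\n\n")
```

Printing the names is a quick way to check that a copied string still contains U+00A0 rather than a plain space, which is the copy-and-paste pitfall mentioned above.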

The issue arises from the fact that small alef (and other small letters) have different rules in Quranic orthography: sometimes they are used as combining marks and other times as standalone letters that don't affect letter joining. For the second case, Unicode recommends using a base character instead of encoding an alternate set of small letters with different properties.

adamiturabi · 2021-04-17T17:07:36Z

Thanks for the quick reply. The main uses of superscript alef that I see in the Quran are:

- on the vowel letters و and ى, for example عَلَىٰ، صَلَوٰة;
- after a fathah, in which case it floats, as in هذا and ذلك.

It seems to me that both usages would be supported if the sequence U+064E U+0670 (fathah followed by superscript alef) offsets the superscript alef horizontally without affecting letter joining. If U+0670 is input without U+064E, it can be placed vertically on top of the letter.

Calibri Arabic handles it this way as far as I can see.

Is there any other usage besides these two that wouldn't be handled?

Thanks again.

khaledhosny · 2021-04-17T23:18:12Z

The problem is that depending on the presence of fatha is a hack and goes against Unicode making the small alef a combining mark. Using a character as a seat is more reliable (you can have هـٰذا without requiring a fatha) and is actually what Unicode recommends (I can't find the text right now, so you will have to take my word on this). I used to do the fatha hack but was later convinced it is better not to. Using kashida/NBSP/NNBSP is likely to work in more fonts (even if with less optimal rendering) than the fatha hack.

adamiturabi · 2021-04-18T06:48:40Z

Thanks again for the detailed explanation. I happily take your word on it. However, do you think it would be harmless to add the fatha + dagger alef hack (in addition to explicit input over tatweel/NBSP)? It would separate character input from glyph typesetting, which, as I understand it, is an underlying tenet of Unicode philosophy.

It will also make the behavior similar to how Amiri deals with inline hamza in words like خطيءة. Amiri correctly joins the ي with the ة, unlike most other fonts, which require superscript hamza over a tatweel.

moyogo · 2021-04-18T09:19:20Z

I think the use of tatweel and no-break space was proposed in L2/09-358 and there is a UTC action item 139-A60 for a formal proposal.

Chapter 9 of Unicode 13.0.0 doesn't mention this use of tatweel or no-break space, but it is similar to the use of hamza above on tatweel.

khaledhosny · 2021-04-20T14:29:11Z

> it will separate character input from glyph typesetting, which, as I understand, is an underlying tenet of Unicode philosophy.

I already had this before (even before Calibri Arabic was designed), but I removed it for the reasons above. I'd rather people followed a standard way to encode this sequence (with reasonable fallback for fonts that don't handle it nicely) than depend on font-specific hacks. I'd have preferred a more semantic way to encode this, but Unicode seems to be reluctant (the cleanest way would be a separate character, and I encourage you to work on a proposal to Unicode if you feel strongly enough about this issue).

> It will also make the behavior similar to how Amiri deals with inline hamza in words like خطيءة.

This is also another non-standard feature of Amiri that I wish to drop at some point for the exact same reasons.

> I think the use of tatweel and no-break space was proposed in L2/09-358 and there is a UTC action item 139-A60 for a formal proposal.

Thanks @moyogo for the links.

adamiturabi · 2021-04-21T08:01:03Z

Thank you @khaledhosny and @moyogo.

> I'd have preferred a more semantic way to encode this, but Unicode seems to be reluctant (the cleanest way would be a separate character, and I encourage you to work on a proposal to Unicode if you feel strongly enough about this issue).

There are a couple of Unicode documents by Thomas Milo discussing this issue:
https://unicode.org/L2/L2014/14109-inline-chars.pdf
https://unicode.org/L2/L2013/13226-koran-ortho.pdf

> The Unicode 13.0.0 chapter 9 doesn't mention this use of tatweel or no-break space but this is similar to the use of hamza above on tatweel.

> > It will also make the behavior similar to how Amiri deals with inline hamza in words like خطيءة.
>
> This is also another non-standard feature of Amiri that I wish to drop at some point for the exact same reasons.

There are rare cases in non-Quranic script where superscript hamzah over a tatweel character will not suffice. For example, لَءَّال la22aal (pearl-seller) will break the mandatory lam-alef ligature if written with tatweel: لَـَّٔال.
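The lam-alef problem can be illustrated with a deliberately simplified check, sketched in Python below. The function `lam_alef_adjacent` is hypothetical: it only asks whether a lam is followed by an alef with nothing but nonspacing marks (general category Mn) in between, which roughly approximates the context a lam-alef ligature needs; real shaping engines apply more rules. The mark order in the example strings is chosen for illustration and is not taken from the thread.

```python
import unicodedata

LAM, ALEF, FATHA, SHADDA = "\u0644", "\u0627", "\u064E", "\u0651"
HAMZA, HAMZA_ABOVE, TATWEEL = "\u0621", "\u0654", "\u0640"

def lam_alef_adjacent(text: str) -> bool:
    """Hypothetical, simplified check: is some lam followed by an alef
    with only nonspacing marks (category Mn) in between?"""
    for i, ch in enumerate(text):
        if ch != LAM:
            continue
        j = i + 1
        # Skip combining marks such as fatha, shadda, hamza above
        while j < len(text) and unicodedata.category(text[j]) == "Mn":
            j += 1
        if j < len(text) and text[j] == ALEF:
            return True
    return False

# Illustrative encodings of la22aal
with_hamza = LAM + FATHA + HAMZA + SHADDA + FATHA + ALEF + LAM
with_tatweel = LAM + FATHA + TATWEEL + HAMZA_ABOVE + SHADDA + FATHA + ALEF + LAM

for label, word in (("lam + alef", LAM + ALEF),
                    ("standalone hamza U+0621", with_hamza),
                    ("tatweel + hamza above", with_tatweel)):
    print(f"{label}: {lam_alef_adjacent(word)}")
```

Neither U+0621 (a letter, category Lo) nor U+0640 (category Lm) is a mark, so in both encodings the lam is no longer directly followed by the alef and the ligature context is lost.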

It seems a complicated situation that you are definitely more qualified to address. As a user, however, semantic encoding is quite nice to have.

Thanks for discussing.

adamiturabi · 2021-04-28T01:20:38Z

@khaledhosny @moyogo I hope it’s ok if I re-open this discussion a bit. I appreciate your point about not wanting to have a font-specific hack.

Doing some research, I found this description of U+034F COMBINING GRAPHEME JOINER (CGJ): https://en.wikipedia.org/wiki/Combining_Grapheme_Joiner

The discussion on the rendering of Hebrew diacritics seems quite relevant.

Could we use CGJ in the case of dagger alef and hamza? Here is how it could potentially be used:

Dagger alef:

Input sequences:

- heh + dagger
- heh + fatha + dagger
- heh + fatha + CGJ + dagger
- heh + CGJ + dagger
- heh + dagger + thal
- heh + CGJ + dagger + thal
- heh + fatha + CGJ + thal
- thal + dagger
- thal + CGJ + dagger
- thal + fatha + CGJ + dagger
- waw + dagger
- waw + CGJ + dagger

This way, one common method can be used for both joining and non-joining characters (dal, thal, waw, etc.), instead of using tatweel for joining characters and NBSP for non-joining ones. Also, we would not be relying on the presence of fatha to determine whether to offset the dagger horizontally. (I now appreciate your point about wanting هـٰذا to be displayed without a fatha on the heh.)
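The difference between the three possible seats comes down to their Arabic joining types. The Python sketch below uses a hypothetical helper, `seat_joining_type`, valid only for the characters in this discussion: it hard-codes TATWEEL as join-causing (C), as listed in the Unicode file ArabicShaping.txt, and otherwise applies that file's default rule that unlisted characters of category Mn, Me or Cf are transparent (T) and all others non-joining (U). It is not a general joining-type lookup (it is wrong for Arabic letters, ZWJ and ZWNJ).

```python
import unicodedata

HEH, THAL, WAW = "\u0647", "\u0630", "\u0648"
DAGGER_ALEF = "\u0670"
TATWEEL, NBSP, CGJ = "\u0640", "\u00A0", "\u034F"

def seat_joining_type(ch: str) -> str:
    """Simplified joining type for seat characters and marks only."""
    if ch == TATWEEL:
        return "C"  # join-causing: extends the connection on both sides
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return "T"  # transparent: joining skips over it
    return "U"      # non-joining, e.g. NO-BREAK SPACE (category Zs)

# Current practice: the seat depends on whether the base letter joins
current = {
    "heh + tatweel + dagger + thal": HEH + TATWEEL + DAGGER_ALEF + THAL,
    "thal + NBSP + dagger": THAL + NBSP + DAGGER_ALEF,
}
# CGJ scheme from the list above: one marker for every base letter
proposed = {
    "heh + CGJ + dagger + thal": HEH + CGJ + DAGGER_ALEF + THAL,
    "thal + CGJ + dagger": THAL + CGJ + DAGGER_ALEF,
    "waw + CGJ + dagger": WAW + CGJ + DAGGER_ALEF,
}

for label, seq in {**current, **proposed}.items():
    seat = seq[1]  # the character right after the base letter
    print(f"{label:32} {unicodedata.name(seat)}: {seat_joining_type(seat)}")
```

Because CGJ is a nonspacing mark (category Mn), it is transparent to joining, so the base letter joins (or does not join) to its neighbours exactly as it would without it; that is what lets one marker cover both joining and non-joining letters.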

Floating hamza

The implementation for hamza is a bit muddier, since users now expect the standalone hamza (uni0621) to break the joining of characters.

But one possible method could be to use CGJ with uni0654 "hamza above". If CGJ comes before uni0654, the hamza will appear above the baseline without affecting the joining of the previous character to the next one.

Input sequences:

- meem + fatha + lam + fatha + CGJ + hamza above + fathatan + alef
- lam + fatha + CGJ + hamza above + shaddah + fatha + alef
- sheen + yeh + CGJ + hamza above + alef
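One property of CGJ relevant to the Hebrew discussion mentioned above is that it has canonical combining class 0, so Unicode normalization never reorders marks across it. The Python sketch below (using only the standard `unicodedata` module) normalizes the second input sequence above, with and without the CGJ. Without it, all four marks form one combining sequence on the lam and canonical ordering sorts them by combining class (fatha 30, shaddah 33, hamza above 230), interleaving the lam's fatha with the marks meant for the floating hamza; with it, only the marks after the CGJ are reordered among themselves. This illustrates normalization behaviour only; it says nothing about how a particular font renders either string.

```python
import unicodedata

LAM, ALEF, FATHA, SHADDA = "\u0644", "\u0627", "\u064E", "\u0651"
HAMZA_ABOVE, CGJ = "\u0654", "\u034F"

# lam + fatha + CGJ + hamza above + shaddah + fatha + alef
with_cgj = LAM + FATHA + CGJ + HAMZA_ABOVE + SHADDA + FATHA + ALEF
# The same characters without CGJ: one combining sequence on the lam
without_cgj = LAM + FATHA + HAMZA_ABOVE + SHADDA + FATHA + ALEF

for label, s in (("without CGJ", without_cgj), ("with CGJ", with_cgj)):
    nfc = unicodedata.normalize("NFC", s)
    ccc = [unicodedata.combining(c) for c in nfc]  # canonical combining classes
    print(label, [unicodedata.name(c) for c in nfc], ccc)
```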

If you think this idea has merit, I can try creating a formal proposal. Please let me know what you think, as I'm only a user and haven't studied Unicode development in detail.

Thank you.

khaledhosny · 2021-05-02T00:24:51Z

Using CGJ is not a bad idea. I don't personally care much which method is used; all I care about is a standardized way that can represent the text reliably. The font can be made to produce the same output for any of these solutions.

adamiturabi · 2021-05-03T01:22:01Z

I've written a draft proposal here: https://github.com/adamiturabi/arabic-inline-unicode/blob/main/index.pdf

I'd appreciate it if you could take a look. Also, please mention it to anyone else who might be interested in this implementation and might be able to give it some traction.

Thanks.

khaledhosny · 2021-05-12T04:06:22Z

Looks good. A few comments:

- It is not that fonts consider hamza to be a breaking (non-joining) character; it is Unicode that specifies this, and OpenType shaping engines enforce it (as they rely on Unicode for joining behaviour). Fonts that want to change this behaviour will have to jump through many hoops to achieve it.
- The use of ـئـ for medial seat-less hamza is specified by the Arabic Academy in Cairo (مجمع اللغة العربية) as part of its effort to "simplify" hamza rules; it is not just people being lazy.
- For dagger alef, a comparison can be made with small waw and small yeh, which both have combining and non-combining variants atomically encoded.

adamiturabi · 2021-05-14T04:05:23Z

Thank you. I've incorporated your feedback. You can see the diffs here: adamiturabi/arabic-inline-unicode@7a5179d

The updated PDF is in the same location: https://github.com/adamiturabi/arabic-inline-unicode/blob/main/index.pdf

Regarding your last point, I wasn't sure exactly what you meant by making a comparison, since we are not proposing a separate encoding for a breaking dagger alef. But under the CGJ scheme, the breaking "small waw" and "small yeh" would technically no longer be needed, so I've mentioned that.

Also, attempting to tag @roozbehp here.

roozbehp · 2021-05-15T07:42:03Z

What is the question for me?

adamiturabi · 2021-05-15T13:42:53Z

> What is the question for me?

@roozbehp Thanks for responding. I see that you have some prior work regarding proposing the handling of Arabic inline characters:

> I think the use of tatweel and no-break space was proposed in L2/09-358 and there is a UTC action item 139-A60 for a formal proposal.

I've written a document on this issue and a proposed solution, matching one in L2/09-358R, here: https://github.com/adamiturabi/arabic-inline-unicode/blob/main/index.pdf

It would be great if you could provide feedback and recommend how to proceed with proposing a solution to Unicode.

khaledhosny closed this as completed in 9bed23a Apr 17, 2021

adamiturabi mentioned this issue May 2, 2021

TATWEEL and HAMZA ABOVE break lam-alef ligature rastikerdar/vazirmatn#197

Open

khaledhosny added the quran label Mar 12, 2022
