Overlapping Hebrew vowel signs #12

Open
dscorbett opened this issue Apr 15, 2021 · 7 comments

Comments

@dscorbett

Fonts

NotoRashiHebrew-Regular.otf
NotoSansHebrew-Regular.otf
NotoSerifHebrew-Regular.otf

Where the fonts came from, and when

Site: https://github.com/googlefonts/noto-fonts/blob/81b283b55b3e5b80ec0e410d4b246d3573e1c7de/unhinted/otf/NotoRashiHebrew/NotoRashiHebrew-Regular.otf
Site: https://github.com/googlefonts/noto-fonts/blob/81b283b55b3e5b80ec0e410d4b246d3573e1c7de/unhinted/otf/NotoSansHebrew/NotoSansHebrew-Regular.otf
Site: https://github.com/googlefonts/noto-fonts/blob/81b283b55b3e5b80ec0e410d4b246d3573e1c7de/unhinted/otf/NotoSerifHebrew/NotoSerifHebrew-Regular.otf
Date: 2021-04-15

Font versions

Noto Rashi Hebrew: Version 1.002
Noto Sans Hebrew: Version 3.000
Noto Serif Hebrew: Version 2.000

Issue

Various inflections of “Jerusalem” in the Tanakh include two vowel signs side by side after the lamed, but in Noto, the vowel signs overlap. See Firefox bug 662055 or ask @bdenckla for more information.

Character data

לְַמלְָמלִַםלִָם
U+05DC HEBREW LETTER LAMED
U+05B0 HEBREW POINT SHEVA
U+05B7 HEBREW POINT PATAH
U+05DE HEBREW LETTER MEM
U+05DC HEBREW LETTER LAMED
U+05B0 HEBREW POINT SHEVA
U+05B8 HEBREW POINT QAMATS
U+05DE HEBREW LETTER MEM
U+05DC HEBREW LETTER LAMED
U+05B4 HEBREW POINT HIRIQ
U+05B7 HEBREW POINT PATAH
U+05DD HEBREW LETTER FINAL MEM
U+05DC HEBREW LETTER LAMED
U+05B4 HEBREW POINT HIRIQ
U+05B8 HEBREW POINT QAMATS
U+05DD HEBREW LETTER FINAL MEM
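
To reproduce the test string without copy-pasting it, the listing above can be rebuilt from its code points. This is only a sketch using Python's standard `unicodedata` module; the names `CLUSTERS` and `TEST_STRING` are illustrative and not part of the original report.

```python
import unicodedata

# Each entry: lamed, two vowel points in the order listed above, then the next letter.
CLUSTERS = [
    "\u05DC\u05B0\u05B7\u05DE",  # lamed, sheva, patah, mem
    "\u05DC\u05B0\u05B8\u05DE",  # lamed, sheva, qamats, mem
    "\u05DC\u05B4\u05B7\u05DD",  # lamed, hiriq, patah, final mem
    "\u05DC\u05B4\u05B8\u05DD",  # lamed, hiriq, qamats, final mem
]
TEST_STRING = "".join(CLUSTERS)

# Print one line per code point, in the same format as the listing above.
for ch in TEST_STRING:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```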

Screenshots

לְַמלְָמלִַםלִָם
לְַמלְָמלִַםלִָם
לְַמלְָמלִַםלִָם

@bdenckla

bdenckla commented Apr 15, 2021

Thanks for at-including me on this, @dscorbett. Whenever I see a bug I always wonder whether and how it could have been caught automatically. Are the Noto fonts tested by fontbakery? If so, these bugs could be "formally" caught by passing the handy test string you made to a collision check done by @simoncozens' collidoscope. Simon and I are both using collidoscope under fontbakery on our own projects but I don't think there is any "official" use of collidoscope in fontbakery yet.

On the substance of the bugs themselves, I'm not sure how important these bugs are, because I'm not sure how common it is to expect these special cases to be handled in fonts for non-Biblical Hebrew. (I'm guessing the Noto fonts only strive to support non-Biblical Hebrew.)

These special cases do not include trope (aka cantillation or accent) marks. Thus one could say they are not Biblical Hebrew, according to one definition of Biblical Hebrew. These special cases contain only vowel marks.

But the way that these special cases use the vowel marks is specific to a Biblical situation known as ketiv/qere. A ketiv/qere situation occurs when what is traditionally read aloud for a word (its qere) is incompatible with the consonants that are traditionally written for that word (its ketiv). Ketiv/qere situations are notated in a variety of ways, but in the cases we're concerned with here, the way they are notated is by putting the vowel marks of the (implied) qere on the consonants of the ketiv. So that's how you end up with the "illegal" situation of two vowels on a single consonant (lamed): in the (implied) qere, there's a yod consonant after the lamed that renders the situation at least sensible if not "legal."

While fonts for non-Biblical Hebrew should probably have some degree of support for vowel marks (though even that is debatable), it may be unreasonable to expect such fonts to handle this weird, Biblically-specific use of vowel marks.

@marekjez86

@dscorbett : thank you -- keep these coming :-) no matter if they are non-Biblical Hebrew or Biblical Hebrew ... Sooner or later I'd like Noto to support Biblical Hebrew
@bdenckla : we are using fontbakery when testing the newly built fonts but not using @simoncozens' collidoscope. Thanks for pointing this out. Good to know. Sooner or later we might want to start using it

@bdenckla

bdenckla commented Apr 26, 2021

Note that the example string is given in normalized (e.g. NFD) order, which does not correspond to the desired RTL visual order of the marks (vowels). Some shapers (notably HarfBuzz) transiently reorder the code points of this string into the desired order before the font ever sees it. Other shapers (notably MS DirectWrite, descended from MS Uniscribe) leave the string alone.

Thus, what behavior one should expect from the example string depends on the shaper in use.
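
As a rough illustration of why the normalized order differs from the visual one, here is a sketch using Python's standard `unicodedata` module. It only demonstrates Unicode canonical reordering, not what any particular shaper does; the variable names are illustrative.

```python
import unicodedata

LAMED, SHEVA, HIRIQ, PATAH, QAMATS = "\u05DC", "\u05B0", "\u05B4", "\u05B7", "\u05B8"

# Canonical combining classes drive the reordering: lower classes sort first.
for mark in (SHEVA, HIRIQ, PATAH, QAMATS):
    print(unicodedata.name(mark), unicodedata.combining(mark))

# Marks typed in the desired visual order: "ah vowel" first, "extra vowel" second.
visual_order = LAMED + PATAH + SHEVA
normalized = unicodedata.normalize("NFD", visual_order)

# Normalization swaps the two marks, giving the order used in the example string.
print(normalized == LAMED + SHEVA + PATAH)
```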

So this bug report could be improved in the following ways:

  1. Specify what shaper (and while you're at it, OS & application) was used to generate the actual outputs (image).
  2. Specify what the expected behavior looks like. This can be done using a more Biblically-capable font like Taamey Frank CLM, using the same shaper, OS & application.
  3. Possibly show actual & expected under a differently-behaving shaper/OS/application "triple." For example I think you will find that both actual and expected look bad under DirectWrite/Windows/MS Word, where again "expected" is defined by a font like Taamey Frank CLM.
  4. Possibly include outputs (images) for a version of the example string where the marks are in the desired RTL visual mark (vowel) order. This version of the example string will behave the same as the original example string on any normalizing "triple." Examples of normalizing triples would be any triple where HarfBuzz is the shaper or where the application is a web browser. So it would really only be interesting to see the output on a "triple" like DirectWrite/Windows/MS Word.
  5. Possibly include outputs (images) for a version of the example string where the marks are in the desired RTL visual mark (vowel) order and this order uses CGJ to make the order robust to normalization. I.e. IMO it is important that Hebrew fonts, if they claim to handle these cases, also handle them in robust (CGJ-containing) versions of these strings.

@bdenckla

bdenckla commented Apr 26, 2021

Making some of my suggestions above concrete, here is the original example string and 2 other versions of it, rendered in Taamey Frank CLM under the "triple" of DirectWrite/Windows/MS Word:
image
The two other versions of the example string shown above are:

  1. Code points in desired RTL order, i.e. "extra vowel" (sheva or hiriq) after "ah vowel" (patach or qamats).
  2. Code points in desired RTL order and with CGJ before the "extra vowel," making that order robust to normalization.
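
For reference, the three versions can be built for one lamed cluster and checked for robustness to normalization. This is a sketch with Python's standard `unicodedata` module; `variants` and `survives_normalization` are hypothetical names used only for illustration, and the sheva/patah pair stands in for all four cases.

```python
import unicodedata

LAMED, SHEVA, PATAH = "\u05DC", "\u05B0", "\u05B7"
CGJ = "\u034F"  # COMBINING GRAPHEME JOINER, combining class 0

variants = {
    "original (normalized order)": LAMED + SHEVA + PATAH,
    "desired order": LAMED + PATAH + SHEVA,
    "desired order with CGJ": LAMED + PATAH + CGJ + SHEVA,
}

def survives_normalization(text: str) -> bool:
    """Return True if neither NFC nor NFD changes the code point order."""
    return all(unicodedata.normalize(form, text) == text for form in ("NFC", "NFD"))

for label, text in variants.items():
    print(f"{label}: survives normalization = {survives_normalization(text)}")
```

Only the plain "desired order" version is changed by normalization. CGJ has combining class 0, so it blocks the reordering of the marks on either side of it.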

Note that beyond the basic issues of visual order and collision, there are more subtle micro-positioning issues at play here, not all of which are well-handled by Taamey Frank CLM (TFC). In particular, TFC lacks rules to handle the sheva cases the same as the hiriq cases, and lacks rules to handle the CGJ cases the same as the non-CGJ cases.

@moshfrid

moshfrid commented Jun 4, 2022

There is another overlapping issue when it comes to the letter kuf sofit with a vowel underneath it (i.e. ךָּ, which is decently common), but only in the NotoSerifHebrew Version.
image

@bdenckla

bdenckla commented Jun 8, 2022

Just to clarify what @moshfrid reports above (assuming I understand it), it is referring to the code point sequence:

  • U+05DA HEBREW LETTER FINAL KAF
  • U+05BC HEBREW POINT DAGESH OR MAPIQ
  • U+05B8 HEBREW POINT QAMATS

Or, hopefully equivalently, it refers to the same code point sequence with the two "HEBREW POINT" elements reversed. That is the normalized order, which most people find unintuitive, though apparently not the authors of the Unicode Standard, unless the combining classes they chose were not what they intended.
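
The equivalence of the two orders can be checked with a short sketch using Python's standard `unicodedata` module. It shows only that the two sequences are canonically equivalent; whether a given font and shaper render them identically is a separate question. The variable names are illustrative.

```python
import unicodedata

FINAL_KAF, DAGESH, QAMATS = "\u05DA", "\u05BC", "\u05B8"

dagesh_first = FINAL_KAF + DAGESH + QAMATS   # the order listed above
qamats_first = FINAL_KAF + QAMATS + DAGESH   # the two points reversed

# Qamats has a lower combining class than dagesh, so it sorts first when normalized.
print(unicodedata.combining(QAMATS), unicodedata.combining(DAGESH))

nfd_a = unicodedata.normalize("NFD", dagesh_first)
nfd_b = unicodedata.normalize("NFD", qamats_first)

# Both inputs normalize to the same string, i.e. they are canonically equivalent.
print(nfd_a == nfd_b == qamats_first)
```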

And, to further clarify, the appearance is expected to be something like this:

image

(Tastes may differ as to the details, for instance tastes may differ as to whether the qamats should float upward a bit compared to this example, but of course all agree that collision (overlap) between the qamats and the dagesh should not occur!)

This construction appears about a hundred times in the Hebrew Bible; I don't have the exact number. An early example is the word אַרְאֶֽךָּ׃, shown as an image below:

image

This example is from בראשית יב,א (Genesis 12:1).

@simoncozens simoncozens transferred this issue from notofonts/noto-fonts Jun 20, 2022
@Elendil03

Elendil03 commented Dec 30, 2022

The thing is: Noto basically already supports Biblical Hebrew. Cantillation marks and unique letters such as the nun hafuḵa (U+05C6) have practically no use in regular Hebrew.
It'd be a shame to pass up this opportunity, especially since there is virtually no free Hebrew serif font that supports Biblical Hebrew and has multiple weights (which would be needed to highlight individual words or phrases in a Biblical text, for example). Of course, there are a couple of super-specific exceptions in Biblical Hebrew (https://www.win.tue.nl/~aeb/natlang/hebrew/hebrew_bible.html), but none of these are nearly as common as the issue at hand.


7 participants