Kannada Ra-Virama-ZWJ gives different results from Windows 10 #435
Comments
I apologize, I did not see #341 before posting. I did test in Word 2016 on Windows 10, and Word gives the same result as Notepad on Windows 10.
According to the discussion in #341, this sequence is undefined in Kannada, so I am marking this not as a bug but as an enhancement request to match what Windows does.
But it looks to us like Uniscribe is reordering the broken sequence (Ra,Virama,ZWJ) into the correct one (Ra,ZWJ,Virama) before processing lookups in the font. I've asked @PeterCon to confirm, before we implement.
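To make the suspected behaviour concrete, here is a minimal Python sketch of that kind of pre-shaping reordering. It is only an illustration of the idea described above, not Uniscribe's or HarfBuzz's actual code; the function name `reorder_legacy_ra_zwj` is hypothetical, and operating on a plain string (rather than a shaper's glyph buffer) is a simplifying assumption.

```python
# Code points involved (from the Unicode character database).
RA = "\u0cb0"      # KANNADA LETTER RA
VIRAMA = "\u0ccd"  # KANNADA SIGN VIRAMA
ZWJ = "\u200d"     # ZERO WIDTH JOINER


def reorder_legacy_ra_zwj(text: str) -> str:
    """Rewrite <RA, VIRAMA, ZWJ> as <RA, ZWJ, VIRAMA>.

    Hypothetical helper: a real shaper would do this on its internal
    buffer before applying the font's lookups, not on a Python string.
    """
    return text.replace(RA + VIRAMA + ZWJ, RA + ZWJ + VIRAMA)


if __name__ == "__main__":
    ka = "\u0c95"  # KANNADA LETTER KA, used here only as a sample consonant
    fixed = reorder_legacy_ra_zwj(RA + VIRAMA + ZWJ + ka)
    print(" ".join(f"U+{ord(ch):04X}" for ch in fixed))
```

Sequences that do not contain the legacy pattern, such as a plain Ra,Virama,consonant, pass through unchanged.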
Here's what Peter Constable wrote to me:

For Kannada character sequences < RA, VIRAMA, consonant >, there’s potential ambiguity as to whether the display should be [ gRA, gConsonant.subjoined ] or [ gConsonant, gReph ]. On pages 499-500 of Unicode 10.0 (section 12.8, http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf), it specifies that a sequence of < RA, ZWJ, VIRAMA, consonant > be used to represent text that needs to be rendered [ gRA, gConsonant.subjoined ].

However, things were not always specified that way. If you look back in Unicode 4.0, section 9.8 (http://www.unicode.org/versions/Unicode4.0.0/ch09.pdf), it was actually specified the other way around: the sequence < RA, VIRAMA, ZWJ, Consonant > was specified to represent [ gRA, gConsonant.subjoined ]. This was changed in Unicode 5 after it came to light that there were inconsistent specifications for different Indic scripts.

This all arose as a result of various things involving Indic scripts all happening in 2004, one of which was publication of a draft for a Sri Lanka standard (http://www.unicode.org/cgi-bin/GetMatchingDocs.pl?L2/04-131). This came up for discussion at the UTC meeting in June 2004 (see http://www.unicode.org/cgi-bin/GetL2Ref.pl?99-C37). At the time, I was working on updates to the Indic shaping engine in Uniscribe, and I was aware of some of the inconsistencies (e.g., the draft for the Sri Lanka standard specifying the opposite of what Unicode had specified for Kannada RA and for Bangla ya-phalaa). So, I was given a UTC action item to prepare a doc regarding the general issue for Indic scripts. That eventually led to issuing Public Review Issue #37 (http://www.unicode.org/review/pr-37.pdf), which proposed having a consistent specification of ZWJ sequences across Indic scripts. (See the last page for specific changes for Kannada and Bangla.) That proposal was adopted at the UTC meeting in August 2004 (http://www.unicode.org/cgi-bin/GetL2Ref.pl?100-C22).

So, Kannada sequences < RA, VIRAMA, ZWJ, Consonant > aren’t currently recommended, but earlier on they were. To accommodate previously-existing docs, Uniscribe does have a special-case behaviour:
HarfBuzz differed from Microsoft when displaying Ra,H,ZWJ sequences. Details at harfbuzz/harfbuzz#435
The character sequence Ra-Virama-ZWJ in Kannada script renders differently in HarfBuzz than in Windows 10. Specifically, with HarfBuzz the virama is visibly rendered, just as if the ZWJ were replaced by ZWNJ. In Windows 10, the virama causes the following consonant to take its sub form.
The file harfbuzz.png was generated from hb-view on Ubuntu Xenial, using the latest git sources. The file windows10.png was generated on Windows 10 using Notepad. The font in both cases was Noto Sans Kannada (Regular), version 1.04. The source text is the file renderdiff.txt. The word in the second line of the example comes from the Kannada Wikipedia wordlist that was used to test HarfBuzz.
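For anyone who wants to reproduce the comparison, the following Python sketch builds the sequences under discussion and lists their code points. It is not a copy of renderdiff.txt, whose contents are not reproduced here; the choice of KA as the following consonant, the `VARIANTS` table, and the `describe` helper are assumptions made for illustration.

```python
import unicodedata

RA = "\u0cb0"      # KANNADA LETTER RA
VIRAMA = "\u0ccd"  # KANNADA SIGN VIRAMA
KA = "\u0c95"      # KANNADA LETTER KA (sample consonant, an assumption)
ZWJ = "\u200d"     # ZERO WIDTH JOINER
ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER

# Hypothetical test variants; the labels are descriptive only.
VARIANTS = {
    "Ra, Virama, ZWJ (sequence reported here)": RA + VIRAMA + ZWJ + KA,
    "Ra, ZWJ, Virama": RA + ZWJ + VIRAMA + KA,
    "Ra, Virama, ZWNJ": RA + VIRAMA + ZWNJ + KA,
    "Ra, Virama": RA + VIRAMA + KA,
}


def describe(text: str) -> list[str]:
    """Return one 'U+XXXX NAME' line per character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    for label, text in VARIANTS.items():
        print(label)
        for line in describe(text):
            print("    " + line)
```

Writing each variant on its own line of a UTF-8 text file gives an input that can be rendered with hb-view and pasted into Notepad for a side-by-side comparison.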
I also tested with the fonts
with various combinations of Notepad, LibreOffice 5.1, LibreOffice 5.2, Windows 10, Windows 7, and Ubuntu Xenial with HarfBuzz both as packaged by Ubuntu and compiled from source. The difference seemed to depend on whether the OpenType rendering was done by HarfBuzz or by Microsoft DirectWrite (I don't think Uniscribe or USE would have been involved).
renderdiff.txt