Kannada Ra-Virama-ZWJ gives different results from Windows 10 #435

devosb · 2017-03-04T01:45:21Z

The character sequence Ra-Virama-ZWJ in Kannada script renders differently than it does on Windows 10. Specifically, with HarfBuzz the virama is rendered visibly, just as if the ZWJ were replaced by ZWNJ. On Windows 10, the virama instead causes a sub form to appear.

The file harfbuzz.png was generated from hb-view on Ubuntu Xenial, using the latest git sources. The file windows10.png was generated on Windows 10 using Notepad. The font in both cases was Noto Sans Kannada (Regular), version 1.04. The source text is the file renderdiff.txt. The word in the second line of the example comes from the Kannada Wikipedia wordlist that was used to test HarfBuzz.

I also tested with the following fonts:

Noto Sans Kannada
Noto Sans Kannada UI
Noto Serif Kannada
Nirmala UI (Windows 10 only)
Tunga (Windows 10 and 7 only)
Lohit Kannada

with various combinations of Notepad, LibreOffice 5.1, LibreOffice 5.2, Windows 10, Windows 7, and Ubuntu Xenial with HarfBuzz, both as packaged by Ubuntu and compiled from source. The difference seemed to come down to whether HarfBuzz or Microsoft DirectWrite was doing the OpenType rendering (I don't think Uniscribe or USE would have been involved).

renderdiff.txt

devosb · 2017-03-06T15:27:43Z

I apologize, I did not see #341 before posting. I did test in Word 2016, on Windows 10, and Word gives the same result as Notepad on Windows 10.

behdad · 2017-07-14T13:46:04Z

According to the discussion in #341, this sequence is undefined in Kannada, so I'm not marking this as a bug but as an enhancement request to match what Windows does.

behdad · 2017-10-03T13:39:09Z

But it looks to us like Uniscribe is reordering the broken sequence (Ra,Virama,ZWJ) into the correct one (Ra,ZWJ,Virama) before processing lookups in the font. I've asked @PeterCon to confirm, before we implement.

behdad · 2017-10-10T18:10:27Z

Here's what Peter Constable wrote to me:

For Kannada character sequences < RA, VIRAMA, consonant >, there’s potential ambiguity as to whether the display should be [ gRA, gConsonant.subjoined ] or [ gConsonant, gReph ]. On pages 499-500 of Unicode 10.0 (section 12.8 — http://www.unicode.org/versions/Unicode10.0.0/ch12.pdf), it specifies that a sequence of < RA, ZWJ, VIRAMA, consonant > be used to represent text that needs to be rendered [gRA, gConsonant.subjoined ].

However, things were not always specified that way. If you look back in Unicode 4.0, section 9.8 (http://www.unicode.org/versions/Unicode4.0.0/ch09.pdf), it was actually specified the other way around: the sequence < RA, VIRAMA, ZWJ, Consonant > was specified to represent [ gRA, gConsonant.subjoined ]. This was changed in Unicode 5 after it came to light that there were inconsistent specifications for different Indic scripts.

This all arose as a result of various things involving Indic scripts all happening in 2004, one of which was publication of a draft for a Sri Lanka standard (http://www.unicode.org/cgi-bin/GetMatchingDocs.pl?L2/04-131). This came up for discussion at the UTC meeting in June 2004 (see http://www.unicode.org/cgi-bin/GetL2Ref.pl?99-C37). At the time, I was working on updates to the Indic shaping engine in Uniscribe, and I was aware of some of the inconsistencies (e.g., the draft for the Sri Lanka standard specifying the opposite of what Unicode had specified for Kannada RA and for Bangla ya-phalaa). So, I was given a UTC action item to prepare a doc regarding the general issue for Indic scripts. That eventually led to issuing Public Review Issue #37 (http://www.unicode.org/review/pr-37.pdf), which proposed having a consistent specification of ZWJ sequences across Indic scripts. (See the last page for specific changes for Kannada and Bangla.) That proposal was adopted at the UTC meeting in August 2004 (http://www.unicode.org/cgi-bin/GetL2Ref.pl?100-C22).

So, Kannada sequences < RA, VIRAMA, ZWJ, Consonant > aren’t currently recommended, but earlier on they were.
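The spellings described above can be told apart mechanically. Below is a minimal C sketch, assuming the text is already decoded to Unicode code points. The enum and `classify_ra` are hypothetical names, not HarfBuzz or Uniscribe API. The consonant test is a simplified range check (U+0C95..U+0CB9) that also accepts a few unassigned code points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KN_RA     0x0CB0u /* KANNADA LETTER RA */
#define KN_VIRAMA 0x0CCDu /* KANNADA SIGN VIRAMA */
#define ZWJ       0x200Du /* ZERO WIDTH JOINER */

/* Simplified test for Kannada consonants KA..HA (U+0C95..U+0CB9).
 * Note: this range also contains a few unassigned code points. */
static int is_kannada_consonant(uint32_t cp) {
    return cp >= 0x0C95u && cp <= 0x0CB9u;
}

/* Hypothetical labels for the spellings discussed above. */
typedef enum {
    RA_SUBJOINED_CURRENT, /* <RA, ZWJ, VIRAMA, C>: Unicode 5.0+ spelling of [gRA, gC.subjoined] */
    RA_SUBJOINED_LEGACY,  /* <RA, VIRAMA, ZWJ, C>: Unicode 4.0 spelling of the same display */
    RA_VIRAMA_CONSONANT,  /* <RA, VIRAMA, C>: potentially ambiguous (subjoined C or reph) */
    RA_OTHER              /* anything else */
} ra_spelling;

/* Classify the code points at the start of s (length n). */
static ra_spelling classify_ra(const uint32_t *s, size_t n) {
    assert(s != NULL || n == 0);
    if (n >= 4 && s[0] == KN_RA && s[1] == ZWJ && s[2] == KN_VIRAMA &&
        is_kannada_consonant(s[3]))
        return RA_SUBJOINED_CURRENT;
    if (n >= 4 && s[0] == KN_RA && s[1] == KN_VIRAMA && s[2] == ZWJ &&
        is_kannada_consonant(s[3]))
        return RA_SUBJOINED_LEGACY;
    if (n >= 3 && s[0] == KN_RA && s[1] == KN_VIRAMA &&
        is_kannada_consonant(s[2]))
        return RA_VIRAMA_CONSONANT;
    return RA_OTHER;
}
```

For example, `classify_ra` returns `RA_SUBJOINED_LEGACY` for `{0x0CB0, 0x0CCD, 0x200D, 0x0CAF}` (RA, VIRAMA, ZWJ, YA), which is the sequence from this issue.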

To accommodate previously-existing docs, Uniscribe does have a special-case behaviour:

             // For compatibility with legacy useage in Kannada,
             // Ra+h+ZWJ must behave like Ra+ZWJ+h...
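Uniscribe's special case amounts to swapping the VIRAMA and the ZWJ when they follow RA. Below is a minimal C sketch, assuming a buffer of decoded code points. `kannada_fix_legacy_ra` is a hypothetical name, and this is not the code from HarfBuzz commit fa48ccb or from Uniscribe. A real shaper applies such a rule per syllable on its internal buffer and keeps cluster information in sync, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KN_RA     0x0CB0u /* KANNADA LETTER RA */
#define KN_VIRAMA 0x0CCDu /* KANNADA SIGN VIRAMA (the "h" in Ra+h+ZWJ) */
#define ZWJ       0x200Du /* ZERO WIDTH JOINER */

/* Hypothetical helper: rewrite every legacy <RA, VIRAMA, ZWJ> run into the
 * Unicode 5.0+ order <RA, ZWJ, VIRAMA>, in place. Returns the number of
 * runs rewritten. Works on raw code points only. */
static size_t kannada_fix_legacy_ra(uint32_t *s, size_t n) {
    size_t fixed = 0;
    assert(s != NULL || n == 0);
    for (size_t i = 0; i + 2 < n; i++) {
        if (s[i] == KN_RA && s[i + 1] == KN_VIRAMA && s[i + 2] == ZWJ) {
            s[i + 1] = ZWJ;       /* swap VIRAMA and ZWJ */
            s[i + 2] = KN_VIRAMA;
            fixed++;
            i += 2;               /* skip past the rewritten run */
        }
    }
    return fixed;
}
```

After the call, `{0x0CB0, 0x0CCD, 0x200D, 0x0CAF}` becomes `{0x0CB0, 0x200D, 0x0CCD, 0x0CAF}` and the function returns 1. Rewriting the input before any font lookups run means the rest of the pipeline only ever sees the spelling that Unicode 5.0 and later recommend, so fonts need no extra rules for the legacy order.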


behdad mentioned this issue Jul 14, 2017: Kannada text issue with ZERO WIDTH JOINER #341 (Closed)

behdad self-assigned this Jul 14, 2017

behdad added the enhancement label Jul 14, 2017

behdad added the Android label Jul 14, 2017

behdad closed this as completed in fa48ccb Oct 12, 2017

devosb added a commit to nlci/knda-font-badami that referenced this issue Oct 16, 2017: "Test data for a HarfBuzz bug" (023340b)
HarfBuzz differed from Microsoft when displaying Ra,H,ZWJ sequences. Details at harfbuzz/harfbuzz#435

adrianwong mentioned this issue May 24, 2019: [Indic] Kannada "Ra, Halant, ZWJ" legacy behaviour n8willis/opentype-shaping-documents#61 (Closed)

dscorbett mentioned this issue Dec 5, 2019: Kannada <RA ZWNJ VIRAMA CONSONANT> wrong form #2018 (Open)

MayuraVerma mentioned this issue Dec 6, 2019: Kannada <VIRAMA CONSONANT> wrong form #2064 (Closed)
