
Using a nukta with Tamil script does not render well #738

Open
devosb opened this issue Mar 3, 2021 · 8 comments

devosb commented Mar 3, 2021

For some languages written in the Tamil script, a nukta is needed. Following L2/15-256, a nukta was added to the Grantha block at U+1133B COMBINING BINDU BELOW, right next to the existing U+1133C GRANTHA SIGN NUKTA.

I am trying to support at least one of these nuktas in the font Thiruvalluvar (latest builds). Notepad and Word (both of which use DirectWrite for shaping) show:

[Screenshot: Notepad rendering]

while Chrome and other applications that use HarfBuzz, including Edge, show:

[Screenshot: Chrome rendering]

Both screenshots are from Windows 10 (updated yesterday):

Edition: Windows 10 Pro
Version: 20H2
Installed on: 2020-09-01
OS build: 19042.844
Experience: Windows Feature Experience Pack 120.2212.551.0

The text is the same in both images. In each line, the text before and after the dash is the same, except that the nukta is U+1133B before the dash and U+1133C after it. The code points before the dash on the first line are

0b95 0bc6 1133b
0b95 1133b 0bc6
0b95 1133b 0bc6 1133b

and on the second line (also before the dash) they are

0ba4 1133b 0bcd 0b95 1133b 0bbe 0b95 0bbf 1133b 0b9e 0bc1 1133b 0b95 0bc1 1133b 0bcd 0b95 0bc1 1133b

This data comes from an HTML file; more test data is in the repo.

Shaping with HarfBuzz is perfect. Shaping with DirectWrite produces dotted circles; U+1133C is not positioned, while U+1133B seems to be. Using U+1133B also produces fewer dotted circles.

As a test case, I tried creating a ligature (ka_umatra_nukta_virama) (you can see my efforts commented out) and tried various substitution features (ccmp, nukt, akhn, calt, rclt, haln), but nothing seemed to work. The test ligature was not checked into the repo because it did not work.

Is there something I am missing in the font, or is there an issue with DirectWrite? The OpenType code is at least somewhat correct, as HarfBuzz displays what I want it to display. Both nuktas have Script_Extensions of Grantha and Tamil, but they have different Script properties (U+1133B is Inherited, U+1133C is Grantha). I wonder if this Script classification accounts for the difference in behaviour between the two nuktas.
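Python's standard `unicodedata` module can show which character properties the two nuktas share. It does not expose Script or Script_Extensions, so this minimal sketch only compares the properties it does report: name, General_Category, Canonical_Combining_Class, and Bidi_Class. With a Python whose Unicode data is 11.0 or later (Python 3.7+), both characters should come out as nonspacing marks with combining class 7 (Nukta). That is consistent with Script being the main property difference, but it does not prove that Script explains the rendering.

```python
import unicodedata

NUKTAS = {
    "U+1133B": "\U0001133B",  # COMBINING BINDU BELOW
    "U+1133C": "\U0001133C",  # GRANTHA SIGN NUKTA
}


def describe(ch: str) -> dict:
    """Collect the properties Python's unicodedata module exposes for ch."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),  # General_Category
        "ccc": unicodedata.combining(ch),      # Canonical_Combining_Class
        "bidi": unicodedata.bidirectional(ch), # Bidi_Class
    }


if __name__ == "__main__":
    for label, ch in NUKTAS.items():
        print(label, describe(ch))
```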

The font also has Graphite tables, but they would not have been used in either test case. The two code points do look different; the font supports multiple character variants. This issue is focused on the reordering, substitution, and positioning that DirectWrite is (or is not) doing. Once those items are resolved, I can add the glyph variants to whichever code point works best.

devosb (Author) commented Apr 14, 2021

Adobe InDesign has a similar, but not identical, issue.

devosb (Author) commented Aug 17, 2021

While trying different code points to see which might shape correctly, I added more code points to the font, so there are now three different code points for the nukta:

  1. U+1133C GRANTHA SIGN NUKTA
  2. U+1133B COMBINING BINDU BELOW
  3. U+0323 COMBINING DOT BELOW

Some of the languages that use this font also need a dot above the text. Visually, the VIRAMA looks like what is desired. However, DirectWrite (and maybe HarfBuzz) does not shape conjuncts well for the sequence CONSONANT, MATRA, VIRAMA. This is not too surprising: the VIRAMA is a vowel killer, so why would you have both a vowel (the MATRA) and a VIRAMA? In this case, though, the dot above represents another sound. So two other code points were added to the font to represent the dot above the text. Thus there are also three options for the dot above:

  1. U+0BCD TAMIL SIGN VIRAMA
  2. U+0B82 TAMIL SIGN ANUSVARA
  3. U+0307 COMBINING DOT ABOVE
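When a nukta and a dot-above candidate occur together, their Canonical_Combining_Class values decide whether Unicode normalization reorders them, which could matter if text passes through NFC before it reaches the shaper (this thread does not establish whether DirectWrite or HarfBuzz normalize their input). The minimal Python sketch below prints the class of each candidate listed above and normalizes illustrative sequences built on U+0B95 TAMIL LETTER KA; these sequences are chosen here for illustration and are not taken from the test file.

```python
import unicodedata
from typing import List

KA = "\u0B95"  # TAMIL LETTER KA

# The three nukta candidates and three dot-above candidates listed above.
NUKTA_OPTIONS = ["\U0001133C", "\U0001133B", "\u0323"]
DOT_ABOVE_OPTIONS = ["\u0BCD", "\u0B82", "\u0307"]


def label(ch: str) -> str:
    """Code point, name, and canonical combining class of ch."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} (ccc={unicodedata.combining(ch)})"


def nfc_codes(text: str) -> List[str]:
    """Return the NFC form of text as a list of hex code points."""
    return [f"{ord(ch):04X}" for ch in unicodedata.normalize("NFC", text)]


if __name__ == "__main__":
    for ch in NUKTA_OPTIONS + DOT_ABOVE_OPTIONS:
        print(label(ch))
    # Write the dot above before the nukta; canonical ordering swaps adjacent
    # marks whenever the first has a higher nonzero combining class.
    for dot in DOT_ABOVE_OPTIONS:
        for nukta in NUKTA_OPTIONS:
            seq = KA + dot + nukta
            print([f"{ord(c):04X}" for c in seq], "->", nfc_codes(seq))
```

For example, KA + VIRAMA (class 9) + U+1133B (class 7) normalizes to KA + U+1133B + VIRAMA, whereas U+0B82 TAMIL SIGN ANUSVARA has class 0 and is never reordered past a following mark.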

The best (but not perfect) shaping seems to be obtained with U+1133B and U+0307.

devosb (Author) commented Aug 17, 2021

While it would be nice to have the languages listed in the original Unicode proposal (which @PeterCon can help with) be part of the OpenType spec, the bigger issue is getting the shaping to work.

PeterCon (Collaborator) commented Oct 7, 2021

Thanks for this product feedback. It would be good for you to also report this using the Windows Feedback app in Windows 10.

devosb (Author) commented Oct 8, 2021

I did use the Feedback app to report this issue about 5 months ago; it was upvoted about 3 months ago.

devosb (Author) commented Jun 2, 2022

The issue does not seem to be fixed when testing today with Windows 11 Insider. Windows details:

Edition: Windows 11 Pro Insider Preview
Version: 22H2
Installed on: 2022-06-02
OS build: 25131.1000
Experience: Windows Feature Experience Pack 1000.25131.1000.0

devosb (Author) commented Jul 6, 2022

Still does not work today.

Edition: Windows 11 Pro Insider Preview
Version: 22H2
Installed on: 2022-07-06
OS build: 25151.1010
Experience: Windows Feature Experience Pack 1000.25151.1010.0

PeterCon (Collaborator) commented Jul 7, 2022

Thanks, @devosb. One issue confronting us is that Grantha is currently shaped by the Universal Shaping Engine, while Tamil is shaped by the older Indic engine.
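This fits the Script-property difference noted in the original report. The sketch below is a deliberately naive script itemizer in Python. It is an illustration under stated assumptions, not a description of how DirectWrite or HarfBuzz actually itemize text: it looks only at the Script property (hard-coding the values quoted in the issue for the two nuktas), lets Inherited marks join the preceding run, and ignores Script_Extensions. Under those assumptions, U+1133C splits off into a separate Grantha run, which in such an itemizer would go to a different shaping engine from its Tamil base, while U+1133B stays in the Tamil run. The names `script_of` and `itemize` are hypothetical.

```python
from typing import List, Tuple


def script_of(ch: str) -> str:
    """Hypothetical, simplified Script lookup (not the full Unicode table).

    The nukta values are the ones quoted in the issue; the Tamil block is
    treated as Tamil and Combining Diacritical Marks as Inherited.
    """
    cp = ord(ch)
    if cp == 0x1133B:
        return "Inherited"
    if cp == 0x1133C:
        return "Grantha"
    if 0x0B80 <= cp <= 0x0BFF:
        return "Tamil"
    if 0x0300 <= cp <= 0x036F:
        return "Inherited"
    return "Unknown"


def itemize(text: str) -> List[Tuple[str, str]]:
    """Split text into (script, run) pairs using only the Script property.

    Inherited characters join the preceding run; Script_Extensions is
    deliberately ignored to show what a Script-only itemizer would do.
    """
    runs: List[Tuple[str, str]] = []
    for ch in text:
        script = script_of(ch)
        if script == "Inherited" and runs:
            prev_script, prev_text = runs[-1]
            runs[-1] = (prev_script, prev_text + ch)
        elif runs and runs[-1][0] == script:
            runs[-1] = (script, runs[-1][1] + ch)
        else:
            runs.append((script, ch))
    return runs


if __name__ == "__main__":
    for nukta in ("\U0001133B", "\U0001133C"):
        text = "\u0B95\u0BC6" + nukta  # KA, VOWEL SIGN E, nukta
        print(f"U+{ord(nukta):04X}:",
              [(s, [f"{ord(c):04X}" for c in r]) for s, r in itemize(text)])
```

An itemizer that consulted Script_Extensions (Grantha and Tamil for both nuktas) could keep U+1133C in the Tamil run as well; the sketch omits that step on purpose.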

miloush self-assigned this Jul 8, 2022

3 participants