Shorthand format controls should not be ignored #503

Closed
dscorbett opened this issue Jul 2, 2017 · 7 comments

Comments

@dscorbett
Collaborator

Duployan is a very complicated script and OpenType does not support it. Therefore, Duployan fonts processed by HarfBuzz will have simple non-interacting glyphs. The shorthand format controls (U+1BCA0 through U+1BCA3) are required for proper spelling, so if a font has glyphs for them, HarfBuzz should display them. Dotted square fallback glyphs are better than nothing.

@behdad
Member

behdad commented Jul 10, 2017

OK, after some brief research, I think I agree with you. But I would prefer that Unicode document them that way and, ideally, provide a sample representative glyph.

Are these allowed everywhere? I suppose not. Do they make a semantic difference? I suppose yes. It sounds to me like these should not really be Default_Ignorable to begin with. They are probably like ZWNJ, badly encoded.

For example, some Indic scripts have such representative glyph shapes for lone virama whereas in actual writing such a shape does not exist.

cc @roozbehp

@roozbehp
Collaborator

> Are these allowed everywhere? I suppose not. Do they make a semantic difference? I suppose yes. It sounds to me like these should not really be Default_Ignorable to begin with. They are probably like ZWNJ, badly encoded.

I don't understand that logic. Default_Ignorable is independent of where something is allowed and semantic differences. It means "if you don't know what to do with it, don't show a tofu".

That doesn't mean I agree or disagree with the property assignment at the moment, but I know that Unicode folks discuss Default_Ignorable properties in detail for new characters.

Anyway, since the original reporter is basically talking about fallback mode and not properly-shaped text, I suggest we (continue to?) follow Unicode at the moment. This issue should be litigated at UTC level.

@dscorbett
Collaborator Author

I agree that default ignorable code points should not produce tofu; I am only requesting that these controls be displayed if the font has glyphs for them.
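
A minimal sketch of the policy being requested, written as plain Python rather than HarfBuzz code. The function name `render_decision`, the `font_cmap` mapping, and the three result labels are hypothetical and exist only for illustration; the code point range comes from the issue description.

```python
# Shorthand format controls, U+1BCA0 through U+1BCA3 (range taken from the issue text).
SHORTHAND_FORMAT_CONTROLS = range(0x1BCA0, 0x1BCA4)


def render_decision(codepoint, font_cmap, is_default_ignorable):
    """Hypothetical policy: return 'glyph', 'hidden', or 'tofu' for one code point.

    font_cmap is any container of code points the font has glyphs for.
    """
    if codepoint in font_cmap:
        # Requested exception: show these controls when the font maps them.
        if is_default_ignorable and codepoint not in SHORTHAND_FORMAT_CONTROLS:
            return "hidden"
        return "glyph"
    # No glyph in the font: default ignorables never fall back to tofu.
    return "hidden" if is_default_ignorable else "tofu"
```

Under this sketch, a shorthand format control is drawn only when the font maps it, and is silently dropped otherwise, so no tofu ever appears for it.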

@behdad
Member

behdad commented Jul 16, 2017

> > Are these allowed everywhere? I suppose not. Do they make a semantic difference? I suppose yes. It sounds to me like these should not really be Default_Ignorable to begin with. They are probably like ZWNJ, badly encoded.
>
> I don't understand that logic. Default_Ignorable is independent of where something is allowed and semantic differences. It means "if you don't know what to do with it, don't show a tofu".

That's not my recollection of what Default_Ignorable means. I believe it means: if you don't know what to do with this, ignore it. I.e., NOT just for display, but for all other processes, including search. Hence my question re making semantic difference.

> That doesn't mean I agree or disagree with the property assignment at the moment, but I know that Unicode folks discuss Default_Ignorable properties in detail for new characters.
>
> Anyway, since the original reporter is basically talking about fallback mode and not properly-shaped text, I suggest we (continue to?) follow Unicode at the moment. This issue should be litigated at UTC level.

Agreed. @dscorbett can you submit feedback to UTC, requesting clarification regarding rendering of these if the font does not have smarts? Possibly also nominal shapes for these. You can do that here:

http://www.unicode.org/reporting.html

@behdad
Member

behdad commented Jul 16, 2017

> I agree that default ignorable code points should not produce tofu; I am only requesting that these controls be displayed if the font has glyphs for them.

We currently don't have that kind of behavior for anything. It's either show or hide.
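
To illustrate the all-or-nothing behavior described here, a rough sketch follows. It is not HarfBuzz's implementation: `visible_codepoints`, `ignorable`, and `preserve_ignorables` are made-up names, and the single switch applies to every ignorable code point regardless of what the font contains.

```python
def visible_codepoints(codepoints, ignorable, preserve_ignorables=False):
    """All-or-nothing sketch: one switch decides the fate of every ignorable code point."""
    if preserve_ignorables:
        # "Show": keep everything, ignorable or not.
        return list(codepoints)
    # "Hide": drop every ignorable code point, even if the font has a glyph for it.
    return [cp for cp in codepoints if cp not in ignorable]
```

For a run such as `[0x1BC00, 0x1BCA0]` with `ignorable={0x1BCA0}`, the control is either kept along with every other ignorable or dropped along with them; there is no per-character middle ground that consults the font.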

@dscorbett
Collaborator Author

I have submitted feedback to the UTC.

@roozbehp
Collaborator

> That's not my recollection of what Default_Ignorable means. I believe it means: if you don't know what to do with this, ignore it. I.e., NOT just for display, but for all other processes, including search. Hence my question re making semantic difference.

From http://www.unicode.org/versions/Unicode10.0.0/ch05.pdf:

> Default Ignorable Code Point. The list of characters which should be ignored for display in fallback rendering is given by a character property: Default_Ignorable_Code_Point (DI). Those characters include almost all format characters, all variation selectors, and a few other exceptional characters, such as Hangul fillers. The exact list is defined in DerivedCoreProperties.txt in the Unicode Character Database.

Also:

> Default Ignorable Code Points. Normally, characters outside the repertoire of supported characters for an implementation would be graphical characters displayed with a fallback glyph, such as a black box. However, certain special-use characters, such as format controls or variation selectors, do not have visible glyphs of their own, although they may have an effect on the display of other characters. When such a special-use character is not supported by an implementation, it should not be displayed with a visible fallback glyph, but instead simply not be rendered at all. The list of such characters which should not be rendered with a fallback glyph is defined by the Default_Ignorable_Code_Point property in the Unicode Character Database. For more information, see Section 5.21, Ignoring Characters in Processing.
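
The fallback rule in the quoted text can be sketched roughly as follows. This is an illustration only: the range table is a small hand-picked sample rather than the full property (which lives in DerivedCoreProperties.txt), and `fallback_render`, `supported`, and the use of U+25A1 as a stand-in fallback glyph are assumptions made for the example.

```python
# Partial, hand-picked sample of Default_Ignorable_Code_Point ranges.
# Assumption: a real implementation would load the full list from
# DerivedCoreProperties.txt instead of hardcoding it.
SAMPLE_DEFAULT_IGNORABLE = [
    (0x00AD, 0x00AD),    # SOFT HYPHEN
    (0x115F, 0x1160),    # Hangul fillers
    (0x200B, 0x200F),    # zero-width characters and directional marks
    (0xFE00, 0xFE0F),    # variation selectors
    (0x1BCA0, 0x1BCA3),  # shorthand format controls
]


def is_default_ignorable(cp):
    """True if cp falls in one of the sample ranges above."""
    return any(lo <= cp <= hi for lo, hi in SAMPLE_DEFAULT_IGNORABLE)


def fallback_render(text, supported):
    """Render text given the set of code points the implementation supports."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp in supported:
            out.append(ch)        # supported: draw normally
        elif is_default_ignorable(cp):
            continue              # unsupported DI: not rendered at all
        else:
            out.append("\u25A1")  # unsupported graphic character: fallback glyph
    return "".join(out)
```

With only `a` and `b` supported, `fallback_render("a\u200cb\u0915", {ord("a"), ord("b")})` drops the ZWNJ and replaces the unsupported Devanagari letter with the fallback square.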


3 participants