Why is there a requirement that 'cvXX' feature has the same number of variants for all glyphs? #778

Closed
khaledhosny opened this issue May 28, 2021 · 18 comments

Comments

@khaledhosny

khaledhosny commented May 28, 2021

Quoting the spec:

Within each 'cvXX' feature, the number of variants should be identical for all glyphs.

That requirement seems peculiar. What is the rationale behind it, and is there any implementation that actually enforces it? I have built several fonts with cvXX features and hadn’t paid attention to this requirement; do I need to revise my fonts?



@khaledhosny khaledhosny changed the title Why there is a requirement that 'cvXX' feature has the same number of variants for all glyphs Why is there a requirement that 'cvXX' feature has the same number of variants for all glyphs? May 28, 2021
@psb1558

psb1558 commented Jul 25, 2021

I am very interested in this topic and hope it won't be forgotten. I suppose the purpose of this requirement is to encourage font makers to design rationally, so there should be matching variants on all design axes. An example posted in a discussion of my JuniusX font was the way Charis SIL handles v with hook, where there are matching variants for lowercase, uppercase, modifier, and small cap on cv62:

sub uni028B from [uni028B.StraightLft uni028B.StraightLftHighHook];
sub uni1DB9 from [uni1DB9.StraightLft uni1DB9.StraightLftHighHook];
sub uni01B2 from [uni01B2.StraightLft uni01B2.StraightLftHighHook];
sub uni028B.sc from [uni028B.StraightLft.sc uni028B.StraightLftHighHook.sc];

But I work in the Middle Ages, where the material doesn’t always cooperate. For example, the Medieval Unicode Font Initiative (MUFI) specification has upper- and lowercase variants of “Insular A,” but not of “Uncial A,” “Open-Top A,” etc. A designer could supply the missing variants so as to make the cvNN features correct, but that would be a little foolish, since the reason they don’t appear in the MUFI spec is that they don’t occur in medieval manuscripts, and so they would (probably) never get used.

I tied myself in knots trying to come up with a way to conform to the OpenType spec while dealing with the MUFI situation, and ended up pleasing no one—so for now JuniusX doesn’t conform to the OpenType spec. I’d like to know if the font will break in some future software environment.

@tiroj

tiroj commented Jul 25, 2021

The requirement is that for each input glyph in an individual cvxx feature, there should be the same number of output variant glyphs. The rationale is two-fold: a) the features are intended for variants of individual characters, not for sets of characters (for which the ssxx features would be appropriate), so are expected to be used for e.g. uppercase A and its diacritic forms, and not for both upper- and lowercase characters unless they happen to follow the same pattern of variants; and b) having the same number of variants for each input glyph in the feature allows for the feature to be applied across a body of text with the same enumerated variant producing predictable results on all glyphs affected by the feature.

If you find yourself dealing with characters that have different numbers of variants, that is an indication that they do not belong in the same Character Variant feature.
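As a rough illustration of the pattern described above, a feature covering uppercase A and its diacritic forms might look like the sketch below. The feature tag cv01 and all glyph names are hypothetical and are not taken from any font discussed here; the only point is that each input glyph has the same number of alternates, listed in the same order.

```fea
# Hypothetical sketch: one character (A) plus its precomposed diacritic forms.
# Every input glyph has exactly two alternates, in the same order, so
# requesting "variant 2" yields the same kind of shape on all of them.
feature cv01 {
    sub A         from [A.alt1 A.alt2];
    sub Aacute    from [Aacute.alt1 Aacute.alt2];
    sub Adieresis from [Adieresis.alt1 Adieresis.alt2];
} cv01;
```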

@tiroj

tiroj commented Jul 25, 2021

@khaledhosny Can you provide examples of your use of cvxx features with different numbers of variants for the input glyphs in a single cvxx feature? I’m having trouble understanding what sort of things would go into such a feature.

It may be that there is a gap between cvxx and ssxx—ignoring the practical issue of the limited number of registered ssxx features—in terms of design. The cvxx features are defined in terms of input—‘What character-related glyphs share one or more variant forms?’—, and the ssxx features are defined in terms of output—‘What set of non-character-related glyphs share a related variant form?’ There isn’t an obvious place to answer ‘What set of non-character-related glyphs share one or more variant forms?’

@psb1558

psb1558 commented Jul 25, 2021

Thanks, John, this is very helpful. I was taking the most permissive possible understanding of this passage:

In practice, if a variation applies to a character in a bicameral script, then the casing-pair character may have the same variation. Also, Unicode includes pre-composed characters for certain base + mark combinations, hence a single abstract character may be incorporated into a number of Unicode characters. Therefore, a variation for a particular abstract character may be applicable to several related Unicode characters. The Character Variant features can be used for sets of related characters in these cases.

But it's going to be more complicated in some cases, and I have a concern about how all this is going to look to a user. For example, here are all the variants for Aa in the MUFI spec:
image
Now it appears that I should have three CVXX features for this system: one for case 1, where you have both an upper- and a lowercase variant, one for cases 2-5, where you have only lowercase variants, and one for cases 6-7, where you have only uppercase variants. It's going to look complicated to the end user.

But to be fair, what I've got now is also complicated, maybe irrational, and I care a good bit about your second rationale, allowing "for the feature to be applied across a body of text with the same enumerated variant producing predictable results on all glyphs affected by the feature."

There doesn't appear to be a solution lacking a downside to the problems presented by (e.g.) MUFI.

@tiroj

tiroj commented Jul 25, 2021

I think you only need two cvxx features: one for uppercase A with three variants and one for lowercase a with five variants. Yes, one of the variants in each case happens to be of similar form, but this doesn’t mean you must put them in the same feature, only that you could, all else being equal. In this case, I would say that the non-equivalent variant sets override the option to put upper- and lowercase in the same cvxx feature.

@psb1558

psb1558 commented Jul 25, 2021

Okay, that makes sense. So the A system might look like this:
image
By grouping variants that are part of case-pairs at the beginning of each sequence one can coordinate their indexes and introduce a little more rationality to the system.
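One way the two-feature arrangement described above could be written, as a sketch only: the tags cv01 and cv02 and every glyph name are hypothetical, and the variants are placeholders rather than the actual MUFI forms. The `.insular` names assume that the insular form is the variant existing in both cases; it is placed first in each list so that index 1 corresponds across the two features.

```fea
# Hypothetical sketch: separate features for uppercase and lowercase.
feature cv01 {
    # Uppercase A: three variants; the case-paired form comes first.
    sub A from [A.insular A.alt2 A.alt3];
} cv01;

feature cv02 {
    # Lowercase a: five variants; the case-paired form comes first.
    sub a from [a.insular a.alt2 a.alt3 a.alt4 a.alt5];
} cv02;
```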

@PeterConstable

is there any implementation that actually enforces this requirement?

It's a recommendation, not a requirement. As such, no implementation should ever be enforcing this.

The feature was recorded as having been registered by Microsoft, but it was originally requested by SIL. @tiroj has captured the gist of the reasoning behind the recommendation: When SIL was creating fonts with broad Latin coverage (Gentium, Doulos, etc.), they had glyph variants for several characters, including cases in which, say, "a" (my hypothetical) plus all of the precomposed accented forms had a systematic set of variants; or "b" and its barred forms; etc. For the sake of discussion, let me refer to these as shared-base sets. And they knew that, for some languages' orthographies, some mix of these variants would be preferred, but they couldn't predict which languages might need which mix. So they wanted a set of features that could be defined on a per-shared-base-set basis, with the feature(s) applied across entire documents. That way, e.g., "a" and any of its precomposed accented forms would get the appropriate variants throughout the document.

If anyone thinks the description of the features can be improved, please suggest wording.

@psb1558

psb1558 commented Jul 26, 2021

Thanks, @PeterConstable, for the explanation, and for the interesting background re: SIL.

I don't know if there's a widely shared understanding of words like "should" in technical specifications. Several of us have misunderstood the sentence quoted by @khaledhosny above as stating a requirement, not a recommendation. Perhaps there would be less misunderstanding if it read, "It is recommended that the number of variants be identical for all glyphs within each 'cvXX' feature."

@tiroj

tiroj commented Jul 26, 2021

@PeterConstable Is it defined anywhere what the behaviour should be if the number of enumerated variants in a cvxx feature is not the same for the input glyphs in the lookup? So, for example, if these lower- and uppercase variants were included in a single cvxx feature

what would be the correct output if a block of text containing both |a| and |A| characters were selected and the character variant feature applied with the enumerated variant 4 or 5? Which form of the uppercase |A| should be displayed in that situation, the default form because there is no corresponding enumerated variant? or the enumerated variant 3 because that is the highest, closest enumerated variant for that character?
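For illustration only, the situation asked about could be sketched in feature syntax as follows. The glyph names and the cv01 tag are hypothetical placeholders chosen for this sketch.

```fea
# Hypothetical sketch: unequal alternate counts within one cvXX feature.
feature cv01 {
    sub a from [a.alt1 a.alt2 a.alt3 a.alt4 a.alt5];  # five variants
    sub A from [A.alt1 A.alt2 A.alt3];                # only three variants
} cv01;
# Requesting variant 4 or 5 over mixed-case text has a target for a,
# but no corresponding alternate exists for A.
```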

@PeterConstable

I don't know if there's a widely shared understanding of words like "should" in technical specifications... Perhaps there would be less misunderstanding if it read, "It is recommended..."

In general, I'd say there's a widely shared understanding: "should" is used for recommendations; requirements are typically stated with words like "must" or "shall". The OT spec consistently (AFAIK) uses "must" and "should" in these ways. Saying "it is recommended" for every recommendation would be verbose.

@PeterConstable

Is it defined anywhere what the behaviour should be if the number of enumerated variants in a cvxx feature is not the same for the input glyphs in the lookup?

If it's not stated in the feature description, or logically implied, then it's not defined anywhere.

IIRC, the idea SIL was after was that entire documents could be marked up to indicate particular alternates from particular 'cvXX' features. E.g., for 'cv01', use the fourth alternate throughout. Two lines of reasoning follow:

  • That would really only work if, in 'cv01', all of the starting glyphs have the same number of alternates and the alternates are consistently arranged—e.g., the fourth alternates of all glyphs are the same kind of alternate and should always co-occur.

  • The 'cvXX' feature description recommends use of type 3 lookups (alternate substitution). If for one starting glyph there are four or more alternates but for another starting glyph there are only three or fewer, what should the application do for the latter when the fourth alternate is being requested? The GSUB spec doesn't define what behaviour is expected. I suspect that's because it was assumed no app would ever request the nth alternate when n is past the last alternate. (I don't know whether it was assumed that type 3 lookups would always be accompanied by a glyph palette UI.)

@psb1558

psb1558 commented Jul 26, 2021

In general, I'd say there's a widely shared understanding: "should" is used for recommendations

In this case no revision is necessary, if a recommendation is your intention. But you seem also to be saying that there is no defined behavior when a font doesn't follow the recommendation, so I wonder if it should be "must" or "shall."

In the cases @tiroj is talking about, HarfBuzz displays the default character instead of the last in the sequence, but I don't know about the other engines.

@PeterConstable

But you seem also to be saying that there is no defined behavior when a font doesn't follow the recommendation, so I wonder if it should be "must" or "shall."

If a font was created with a different number of alternates within a given 'cvXX' feature, it could be workable if the content author knew the details and marked up individual runs as needed rather than the entire document. Still workable, but not the ideal.

@psb1558

psb1558 commented Jul 26, 2021

If a font was created with a different number of alternates within a given 'cvXX' feature, it could be workable if the content author knew the details and marked up individual runs as needed rather than the entire document. Still workable, but not the ideal.

It's sounding deeply inadvisable. This is the kind of guidance I was hoping for. Thanks!

@tiroj

tiroj commented Jul 26, 2021

In the cases @tiroj is talking about, HarfBuzz displays the default character instead of the last in the sequence, but I don't know about the other engines.

I think that is the most sensible approach and a good general principle: if the requested enumerated variant of a glyph is not available in the font, display the default form. This produces predictable results not only in the case under discussion here but also if a user changes to a different font.

@khaledhosny
Author

Thanks everyone. I think I now have a clearer understanding of how cvXX features are supposed to be used. I think I was using them more like the salt feature, which I now see would make them rather redundant.

@PeterCon
Collaborator

PeterCon commented Nov 4, 2021

To help clarify, the following revision is proposed for OT 1.9:

Recommended implementation: A 'cvXX' lookup table maps the GID for the default form of a character to the GIDs for stylistic alternatives of that character. Each 'cvXX' feature uses alternate (GSUB lookup type 3) substitutions. (If there is only one variant for a character, a single-substitution lookup, type 1, can also be used.) A given 'cvXX' feature acts on a single glyph or on multiple glyphs for closely-related characters that have corresponding variants. Within each 'cvXX' feature, the number of variants should be identical for all glyphs, and the ordering of glyphs in lookup arrays should correspond such that the nth variants of all glyphs are corresponding variants.
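A minimal sketch of how the proposed wording might map onto feature syntax, assuming hypothetical glyph names and feature tags: the `from` form compiles to an alternate substitution (GSUB type 3), while the `by` form with one glyph on each side compiles to a single substitution (type 1).

```fea
# Hypothetical sketch: corresponding variants in corresponding positions.
feature cv05 {
    # Alternate substitution (GSUB lookup type 3); the nth alternate of
    # each glyph is the same kind of variant.
    sub b       from [b.alt1 b.alt2];
    sub bstroke from [bstroke.alt1 bstroke.alt2];
} cv05;

feature cv06 {
    # Only one variant per glyph, so single substitution (type 1) suffices.
    sub d      by d.alt;
    sub dcroat by dcroat.alt;
} cv06;
```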

@PeterCon PeterCon added this to the OpenType 1.9 milestone Nov 4, 2021
@PeterCon
Collaborator

PeterCon commented Dec 9, 2021

Addressed in OpenType 1.9. Closing.

@PeterCon PeterCon closed this as completed Dec 9, 2021