Why is there a requirement that 'cvXX' feature has the same number of variants for all glyphs? #778

khaledhosny · 2021-05-28T12:41:07Z

Quoting the spec:

Within each 'cvXX' feature, the number of variants should be identical for all glyphs.

That requirement seems peculiar. What is the rationale behind it, and is there any implementation that actually enforces it? I have built several fonts with cvXX features and hadn’t paid attention to this requirement; do I need to revise my fonts?

Document Details

⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.

ID: 4da11419-0864-96ff-7f67-296680843fc1
Version Independent ID: 1d46dac1-d559-ce3e-2f6f-f47cf8cd9e0f
Content: Registered features, a-e (OpenType 1.8.4) - Typography
Content Source: typographydocs/opentype/spec/features_ae.md
Product: typography
GitHub Login: @PeterCon
Microsoft Alias: alib

psb1558 · 2021-07-25T16:21:12Z

I am very interested in this topic and hope it won't be forgotten. I suppose the purpose of this requirement is to encourage font makers to design rationally, so there should be matching variants on all design axes. An example posted in a discussion of my JuniusX font was the way Charis SIL handles v with hook, where there are matching variants for lowercase, uppercase, modifier, and small cap on cv62:

feature cv62 {
    # Every input glyph lists the same two alternates, in the same order.
    sub uni028B from [uni028B.StraightLft uni028B.StraightLftHighHook];
    sub uni1DB9 from [uni1DB9.StraightLft uni1DB9.StraightLftHighHook];
    sub uni01B2 from [uni01B2.StraightLft uni01B2.StraightLftHighHook];
    sub uni028B.sc from [uni028B.StraightLft.sc uni028B.StraightLftHighHook.sc];
} cv62;

But I work in the Middle Ages, where the material doesn’t always cooperate. For example, the Medieval Unicode Font Initiative (MUFI) specification has upper- and lowercase variants of “Insular A,” but not of “Uncial A,” “Open-Top A,” etc. A designer could supply the missing variants so as to make the cvNN features correct, but that would be a little foolish, since the reason they don’t appear in the MUFI spec is that they don’t occur in medieval manuscripts, and so they would (probably) never get used.

I tied myself in knots trying to come up with a way to conform to the OpenType spec while dealing with the MUFI situation, and ended up pleasing no one—so for now JuniusX doesn’t conform to the OpenType spec. I’d like to know if the font will break in some future software environment.

tiroj · 2021-07-25T18:07:49Z

The requirement is that for each input glyph in an individual cvxx feature, there should be the same number of output variant glyphs. The rationale is two-fold: a) the features are intended for variants of individual characters, not for sets of characters (for which the ssxx features would be appropriate), so are expected to be used for e.g. uppercase A and its diacritic forms, and not for both upper- and lowercase characters unless they happen to follow the same pattern of variants; and b) having the same number of variants for each input glyph in the feature allows for the feature to be applied across a body of text with the same enumerated variant producing predictable results on all glyphs affected by the feature.

If you find yourself dealing with characters that have different numbers of variants, that is an indication that they do not belong in the same Character Variant feature.
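To make the "same number of variants" condition concrete, here is a minimal Python sketch that models a cvXX feature as a mapping from each input glyph to its ordered list of alternates and checks whether the counts agree. The function names and the `mixed` glyph names are hypothetical, made up for illustration; `cv62` mirrors the Charis SIL lines quoted earlier in the thread. This is a plain-data model, not a reading of a real font file.

```python
from typing import Dict, List

Feature = Dict[str, List[str]]  # input glyph -> ordered alternates


def variant_counts(feature: Feature) -> Dict[str, int]:
    """Return the number of alternates listed for each input glyph."""
    return {glyph: len(alts) for glyph, alts in feature.items()}


def has_uniform_variant_count(feature: Feature) -> bool:
    """True when every input glyph has the same number of alternates."""
    return len(set(variant_counts(feature).values())) <= 1


# Mirrors the cv62 substitutions quoted earlier in the thread.
cv62: Feature = {
    "uni028B": ["uni028B.StraightLft", "uni028B.StraightLftHighHook"],
    "uni1DB9": ["uni1DB9.StraightLft", "uni1DB9.StraightLftHighHook"],
    "uni01B2": ["uni01B2.StraightLft", "uni01B2.StraightLftHighHook"],
    "uni028B.sc": ["uni028B.StraightLft.sc", "uni028B.StraightLftHighHook.sc"],
}

# Hypothetical glyph names: two characters with different numbers of variants.
mixed: Feature = {
    "a": [f"a.alt{i}" for i in range(1, 6)],
    "A": [f"A.alt{i}" for i in range(1, 4)],
}

print(has_uniform_variant_count(cv62))   # every glyph has two alternates
print(has_uniform_variant_count(mixed))  # counts differ between a and A
```

A check like this only counts alternates. It cannot tell whether the nth alternates of different glyphs are actually the same kind of variant, which still has to be judged by the designer.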

tiroj · 2021-07-25T18:15:49Z

@khaledhosny Can you provide examples of your use of cvxx features with different numbers of variants for the input glyphs in a single cvxx feature? I’m having trouble understanding what sort of things would go into such a feature.

It may be that there is a gap between cvxx and ssxx—ignoring the practical issue of the limited number of registered ssxx features—in terms of design. The cvxx features are defined in terms of input—‘What character-related glyphs share one or more variant forms?’—, and the ssxx features are defined in terms of output—‘What set of non-character-related glyphs share a related variant form?’ There isn’t an obvious place to answer ‘What set of non-character-related glyphs share one or more variant forms?’

psb1558 · 2021-07-25T19:07:32Z

Thanks, John, this is very helpful. I was taking the most permissive possible understanding of this passage:

In practice, if a variation applies to a character in a bicameral script, then the casing-pair character may have the same variation. Also, Unicode includes pre-composed characters for certain base + mark combinations, hence a single abstract character may be incorporated into a number of Unicode characters. Therefore, a variation for a particular abstract character may be applicable to several related Unicode characters. The Character Variant features can be used for sets of related characters in these cases.

But it's going to be more complicated in some cases, and I have a concern about how all this is going to look to a user. For example, here are all the variants for Aa in the MUFI spec:

Now it appears that I should have three CVXX features for this system: one for case 1, where you have both an upper- and a lowercase variant, one for cases 2-5, where you have only lowercase variants, and one for cases 6-7, where you have only uppercase variants. It's going to look complicated to the end user.

But to be fair, what I've got now is also complicated, maybe irrational, and I care a good bit about your second rationale, allowing "for the feature to be applied across a body of text with the same enumerated variant producing predictable results on all glyphs affected by the feature."

There doesn't appear to be a solution lacking a downside to the problems presented by (e.g.) MUFI.

tiroj · 2021-07-25T19:32:37Z

I think you only need two cvxx features: one for uppercase A with three variants and one for lowercase a with five variants. Yes, one of the variants in each case happens to be of similar form, but this doesn’t mean you must put them in the same feature, only that you could, all else being equal. In this case, I would say that the non-equivalent variant sets override the option to put upper- and lowercase in the same cvxx feature.
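As a rough illustration of that split, the Python sketch below groups input glyphs by how many alternates they have, so that each group could become its own cvxx feature. The glyph names and the helper `split_by_variant_count` are hypothetical. The count is only a proxy: the real criterion discussed here is whether the glyphs belong to the same character and share corresponding variants.

```python
from collections import defaultdict
from typing import Dict, List


def split_by_variant_count(
    alternates: Dict[str, List[str]],
) -> Dict[int, Dict[str, List[str]]]:
    """Group input glyphs by their number of alternates.

    Each resulting group is a candidate for a separate cvxx feature.
    """
    groups: Dict[int, Dict[str, List[str]]] = defaultdict(dict)
    for glyph, alts in alternates.items():
        groups[len(alts)][glyph] = alts
    return dict(groups)


# Hypothetical data: lowercase forms with five variants, uppercase with three.
a_system = {
    "a": [f"a.alt{i}" for i in range(1, 6)],
    "aacute": [f"aacute.alt{i}" for i in range(1, 6)],
    "A": [f"A.alt{i}" for i in range(1, 4)],
    "Aacute": [f"Aacute.alt{i}" for i in range(1, 4)],
}

for count, group in sorted(split_by_variant_count(a_system).items()):
    print(count, sorted(group))
```

With this data the lowercase forms land in one group and the uppercase forms in another, matching the two-feature arrangement suggested above.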

psb1558 · 2021-07-25T19:50:27Z

Okay, that makes sense. So the A system might look like this:

By grouping variants that are part of case-pairs at the beginning of each sequence one can coordinate their indexes and introduce a little more rationality to the system.

PeterConstable · 2021-07-25T20:27:40Z

is there any implementation that actually enforces this requirement?

It's a recommendation, not a requirement. As such, no implementation should ever be enforcing this.

The feature was recorded as having been registered by Microsoft, but it was originally requested by SIL. @tiroj has captured the gist of the reasoning behind the recommendation: When SIL was creating fonts with broad Latin coverage (Gentium, Doulos, etc.), they had glyph variants for several characters, including cases in which, say, "a" (my hypothetical) plus all of the precomposed accented forms had a systematic set of variants; or "b" and its barred forms; etc. For the sake of discussion, let me refer to these as shared-base sets. And they knew that, for some language's orthography, some mix of these variants would be preferred, but they couldn't predict what languages might need what mix. So they wanted a set of features that could be defined on a per-shared-base-set basis, with the feature(s) applied across entire documents. That way, e.g., "a" and any of its precomposed accented forms would get the appropriate variants throughout the document.

If anyone thinks the description of the features can be improved, please suggest wording.

psb1558 · 2021-07-26T10:05:07Z

Thanks, @PeterConstable, for the explanation, and for the interesting background re: SIL.

I don't know if there's a widely shared understanding of words like "should" in technical specifications. Several of us have misunderstood the sentence quoted by @khaledhosny above as stating a requirement, not a recommendation. Perhaps there would be less misunderstanding if it read, "It is recommended that the number of variants be identical for all glyphs within each 'cvXX' feature."

tiroj · 2021-07-26T16:27:15Z

@PeterConstable Is it defined anywhere what the behaviour should be if the number of enumerated variants in a cvxx feature is not the same for the input glyphs in the lookup? So, for example, if these lower- and uppercase variants were included in a single cvxx feature

what would be the correct output if a block of text containing both |a| and |A| characters were selected and the character variant feature applied with the enumerated variant 4 or 5? Which form of the uppercase |A| should be displayed in that situation, the default form because there is no corresponding enumerated variant? or the enumerated variant 3 because that is the highest, closest enumerated variant for that character?

PeterConstable · 2021-07-26T16:36:34Z

I don't know if there's a widely shared understanding of words like "should" in technical specifications... Perhaps there would be less misunderstanding if it read, "It is recommended..."

In general, I'd say there's a widely shared understanding: "should" is used for recommendations; requirements are typically stated with words like "must" or "shall". The OT spec consistently (AFAIK) uses "must" and "should" in these ways. Saying "it is recommended" for every recommendation would be verbose.

PeterConstable · 2021-07-26T16:52:07Z

Is it defined anywhere what the behaviour should be if the number of enumerated variants in a cvxx feature is not the same for the input glyphs in the lookup?

If it's not stated in the feature description, or logically implied, then it's not defined anywhere.

IIRC, the idea SIL was after was that entire documents could be marked up to indicate particular alternates from particular 'cvXX' features. E.g., for 'cv01', use the fourth alternate throughout. Two lines of reasoning follow:

1. That would really only work if, in 'cv01', all of the starting glyphs have the same number of alternates and the alternates are consistently arranged—e.g., the fourth alternates of all glyphs are the same kind of alternate and should always co-occur.
2. The 'cvXX' feature description recommends use of type 3 lookups (alternate substitution). If for one starting glyph there are four or more alternates but for another starting glyph there are only three or fewer, what should the application do for the latter when the fourth alternate is being requested? The GSUB spec doesn't define what behaviour is expected. I suspect that's because it was assumed no app would ever request the nth alternate when n is past the last alternate. (I don't know whether it was assumed that type 3 lookups would always be accompanied by a glyph palette UI.)

psb1558 · 2021-07-26T17:05:07Z

In general, I'd say there's a widely shared understanding: "should" is used for recommendations

In this case no revision is necessary, if a recommendation is your intention. But you seem also to be saying that there is no defined behavior when a font doesn't follow the recommendation, so I wonder if it should be "must" or "shall."

In the cases @tiroj is talking about, HarfBuzz displays the default character instead of the last in the sequence, but I don't know about the other engines.

PeterConstable · 2021-07-26T17:20:19Z

But you seem also to be saying that there is no defined behavior when a font doesn't follow the recommendation, so I wonder if it should be "must" or "shall."

If a font was created with a different number of alternates within a given 'cvXX' feature, it could be workable if the content author knew the details and marked up individual runs as needed rather than the entire document. Still workable, but not the ideal.

psb1558 · 2021-07-26T17:39:14Z

If a font was created with a different number of alternates within a given 'cvXX' feature, it could be workable if the content author knew the details and marked up individual runs as needed rather than the entire document. Still workable, but not the ideal.

It's sounding deeply inadvisable. This is the kind of guidance I was hoping for. Thanks!

tiroj · 2021-07-26T17:49:19Z

In the cases @tiroj is talking about, HarfBuzz displays the default character instead of the last in the sequence, but I don't know about the other engines.

I think that is the most sensible approach and a good general principle: if the requested enumerated variant of a glyph is not available in the font, display the default form. This produces predictable results not only in the case under discussion here but also if a user changes to a different font.
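The fallback principle can be sketched in a few lines of Python. This is only a model of the behaviour described in this thread, not HarfBuzz's code or API. The function `apply_variant` and the glyph names are hypothetical, and the variant index is assumed to be 1-based.

```python
from typing import Dict, List


def apply_variant(
    glyphs: List[str], feature: Dict[str, List[str]], n: int
) -> List[str]:
    """Apply the n-th alternate (1-based) of a cvXX-style feature to a glyph run.

    A glyph keeps its default form when the feature does not cover it or
    when it has fewer than n alternates.
    """
    result = []
    for glyph in glyphs:
        alts = feature.get(glyph, [])
        if 1 <= n <= len(alts):
            result.append(alts[n - 1])
        else:
            result.append(glyph)  # requested variant unavailable: keep default
    return result


# Hypothetical feature: five alternates for "a", three for "A".
cv01 = {
    "a": [f"a.alt{i}" for i in range(1, 6)],
    "A": [f"A.alt{i}" for i in range(1, 4)],
}

print(apply_variant(["a", "A"], cv01, 3))  # ['a.alt3', 'A.alt3']
print(apply_variant(["a", "A"], cv01, 4))  # ['a.alt4', 'A']
```

The same rule covers a switch to a font that lacks the feature entirely: with an empty mapping every glyph simply stays in its default form.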

khaledhosny · 2021-09-10T19:29:21Z

Thanks everyone. I think I now have a clearer understanding of how cvXX features are supposed to be used. I think I was using them more like the salt feature, which I now see would make them rather redundant.

PeterCon · 2021-11-04T22:29:00Z

To help clarify, the following revision is proposed for OT 1.9:

Recommended implementation: A 'cvXX' lookup table maps the GID for the default form of a character to the GIDs for stylistic alternatives of that character. Each 'cvXX' feature uses alternate (GSUB lookup type 3) substitutions. (If there is only one variant for a character, a single-substitution lookup, type 1, can also be used.) A given 'cvXX' feature acts on a single glyph or on multiple glyphs for closely-related characters that have corresponding variants. Within each 'cvXX' feature, the number of variants should be identical for all glyphs, and the ordering of glyphs in lookup arrays should correspond such that the nth variants of all glyphs are corresponding variants.

PeterCon · 2021-12-09T00:37:33Z

Addressed in OpenType 1.9. Closing.

khaledhosny mentioned this issue May 28, 2021

Duplicate Alternates in Character Variants psb1558/Junicode-font#56

Open

khaledhosny changed the title ~~Why there is a requirement that 'cvXX' feature has the same number of variants for all glyphs~~ Why is there a requirement that 'cvXX' feature has the same number of variants for all glyphs? May 28, 2021

PeterCon added the OpenType spec label Jun 1, 2021

psb1558 mentioned this issue Sep 28, 2021

LATIN CAPITAL LETTER I has a dot above psb1558/Junicode-font#68

Open

PeterCon added this to the OpenType 1.9 milestone Nov 4, 2021

PeterCon closed this as completed Dec 9, 2021
