Allow ScriptLangTag values with implied script subtag #978

nedley · 2022-10-06T22:11:54Z

We are in the process of migrating out-of-font metadata to 'meta' tables, and most of the values being migrated have a language subtag but an implied script subtag. We would appreciate it if the spec allowed for ScriptLangTags with a language but no script subtag, in which case a likely script would be inferred using a process like the one described in UTS #35.

Document Details

⚠ Do not edit this section. It is required for learn.microsoft.com ➟ GitHub issue linking.

ID: d8f6ed4e-64e4-2176-f903-c2778419d391
Version Independent ID: bb311713-0340-4c84-180d-b0016a18fefe
Content: meta — Meta table (OpenType 1.9) - Typography
Content Source: typographydocs/opentype/spec/meta.md
Product: typography
GitHub Login: @PeterCon
Microsoft Alias: alib

PeterCon · 2022-10-07T18:02:48Z

In general, this wouldn't be a breaking change for fonts—i.e., any existing fonts would remain conformant. For existing applications, it could mean new fonts provide a dlng or slng value that gets ignored—sub-optimal, but not a terrible problem in the long run.

A bigger concern, though, is maintenance over time: making sure that values put in fonts today continue to be valid and match the expectations of applications into the indefinite future.

A problem will exist if the assumptions as to what script is implicitly inferred from a language subtag change over time. If language tags existed in 1900, "tr" would have been assumed to imply Arabic script, and an Arabic-script font designed for Turkish might have used dlng=tr. But by 1950, that inference would be wrong.

You point to UTS #35, but I don't think that will be a good reference for determining when a script subtag can be omitted: CLDR uses suppress-script data from BCP 47, but also uses "heuristically-derived" values "based on the default content data, [and] the population data". Crucially, the likelyScript values "may change over time". The use of such heuristics suggests low expectation for stability, and significant risk that font data that's useful and conformant today might become un-useful or even non-conformant tomorrow.

BCP 47 suppress-script data may be better. It would be easy to describe and easy for font developers to find and explore (go to https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry and search for "suppress-script"). The suppress-script values can be changed, though proposed changes have to go through a human review process via the ietf-languages mail list, and so probably have a better chance of stability. The fact that linguistic realities can change over time creates uncertainty, however: the record for "tr" has Suppress-Script: Latn today, but would have had a different value in the past (if BCP 47 existed).

The primary intent in the spec of making the script subtag the one required element was to allow for tags that only include a script subtag, which often makes sense as a way to characterize what fonts can support or are designed for. But it also has the benefit of ensuring forward compatibility for font data.

If an implementation is using something like ICU that expects tags that omit likely script subtags, it seems like it would be easy enough to detect these cases in the meta table and remove the script subtag. That doesn't seem like a bad tradeoff for providing better stability for font data.

With all that in mind, do you still think it would be an improvement to remove that constraint?
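As a rough sketch of the normalization step described above (dropping a script subtag that the language already implies before handing the tag to a library that expects it omitted), something like the following could work. The function name and the `SUPPRESS_SCRIPT` table are hypothetical; the table is a tiny hand-copied subset of Suppress-Script values from the IANA registry, and a real implementation would load the full registry instead.

```python
# Hypothetical subset of Suppress-Script values from the IANA
# language subtag registry (an assumption for illustration only).
SUPPRESS_SCRIPT = {"en": "Latn", "tr": "Latn", "ru": "Cyrl"}


def strip_suppressed_script(tag: str) -> str:
    """Drop the script subtag when it equals the language's Suppress-Script.

    Tags without a language-plus-script prefix, or whose script differs
    from the registry value, are returned unchanged.
    """
    subtags = tag.split("-")
    if len(subtags) >= 2:
        language, script = subtags[0].lower(), subtags[1]
        is_script = len(script) == 4 and script.isalpha()
        if is_script and SUPPRESS_SCRIPT.get(language) == script.title():
            # Keep the language and any trailing subtags (region, etc.).
            return "-".join([subtags[0]] + subtags[2:])
    return tag


print(strip_suppressed_script("tr-Latn"))     # script is implied, so it is dropped
print(strip_suppressed_script("tr-Arab"))     # script is not implied, so it is kept
print(strip_suppressed_script("tr-Latn-TR"))  # region survives the removal
```

The font keeps the explicit script subtag, so the stored data stays unambiguous even if the registry value changes later; only the in-memory tag is shortened.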

nedley · 2022-10-07T18:22:07Z

Point taken that the likeliest script for a language can change over time, but note also that determining the scripts supported by the font is a mechanical process given the character set, so an implementation can identify the correct one(s) as needed. Again, our aim is not to suggest that bare language tags are preferred moving forward, but it is awkward to see that our historical use of these fields should be ignored as nonconformant when the values are still accurate, if incomplete.

PeterCon · 2022-10-07T18:30:53Z

> determining scripts supported by the font is a mechanical process given the character set

True, though part of the point of having metadata is to inform without needing to do a detailed analysis of the font.

At any rate, it sounds like the main concern is to relax the constraint so that some existing fonts can be considered conformant. Would you be OK with allowing ScriptLangTags to have only a language subtag but recommending that a script subtag should always be included?

nedley · 2022-10-07T18:39:51Z

We would be fine with that resolution, yes.

tiroj · 2022-10-07T20:11:49Z

> A problem will exist if the assumptions as to what script is implicitly inferred from a language subtag change over time. If language tags existed in 1900, "tr" would have been assumed to imply Arabic script, and an Arabic-script font designed for Turkish might have used dlng=tr. But by 1950, that inference would be wrong.

More immediate examples can be found in Central Asia, where several languages changed script twice during the 20th century, at least one language changed script three times during the same period, and several are in the process of transitioning to a new script right now.

nedley · 2022-10-07T20:24:58Z

Suffice it to say the fonts and data we are dealing with do not suffer from problems of this nature…

[edit: By which I mean languages where the likeliest script is transitioning.]

PeterCon · 2022-10-07T21:53:52Z

> Suffice it to say the fonts and data we are dealing with do not suffer from problems of this nature…

Sure, but the spec still needs to anticipate the general case. My main concerns would be establishing what will provide for longer term stability and then not introducing ambiguity with some font developers getting the impression it's fine to do things that aren't conducive to longer term stability.

nedley · 2022-10-07T22:26:27Z

We are not proposing any change that implies a preference for implied/suppressed script. But if software is to accommodate existing fonts on our platforms it needs to be aware of this until such time as we can make the necessary modifications.

PeterCon · 2022-10-07T22:27:48Z

Got it; I'll work on some wording...

PeterCon · 2022-10-08T01:32:02Z

See OT 1.9.1 alpha for draft revisions addressing this issue.

I've relaxed ScriptLangTag syntax as you requested but with clear statements that a tag without a script subtag is strongly discouraged and that applications are permitted to ignore such tags (allowing existing implementations that do so to remain conformant).

PeterCon · 2022-10-08T01:32:51Z

As we discussed offline, I also added some clarification regarding the intended use of, and distinction between, slng and dlng.

dscorbett · 2022-10-08T17:13:57Z

The new ScriptLangTag grammar disallows some previously allowed tags. If a tag includes any region, variant, extension, or private use subtags, it must now include a language subtag. The first line:

ScriptLangTag = language | script | language "-" script

should be:

ScriptLangTag = (language | script | language "-" script)
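The effect of the missing parentheses can be illustrated with a deliberately simplified regex model. This is only a sketch: it assumes the remaining lines of the rule append optional subtags by concatenation, it models just an optional region, and the subtag patterns and names below are hypothetical stand-ins rather than the spec's definitions. Because alternation binds more loosely than concatenation, the ungrouped form attaches the trailing subtags only to the last alternative.

```python
import re

# Simplified, hypothetical subtag patterns (not the spec's full definitions).
LANGUAGE = r"[A-Za-z]{2,3}"
SCRIPT = r"[A-Za-z]{4}"
REGION = r"(?:[A-Za-z]{2}|[0-9]{3})"

# Ungrouped: the optional region belongs only to 'language "-" script'.
UNGROUPED = re.compile(
    rf"{LANGUAGE}|{SCRIPT}|{LANGUAGE}-{SCRIPT}(?:-{REGION})?"
)

# Grouped: the optional region may follow any of the three alternatives.
GROUPED = re.compile(
    rf"(?:{LANGUAGE}|{SCRIPT}|{LANGUAGE}-{SCRIPT})(?:-{REGION})?"
)


def accepts(pattern: re.Pattern, tag: str) -> bool:
    """Return True when the whole tag matches the pattern."""
    return pattern.fullmatch(tag) is not None


for tag in ("tr", "Latn", "tr-Latn-TR", "Latn-US", "tr-TR"):
    print(tag, accepts(UNGROUPED, tag), accepts(GROUPED, tag))
```

In this model, a script-only tag with a region such as "Latn-US" is rejected by the ungrouped form and accepted by the grouped one, which is the regression described above.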

PeterConstable · 2022-10-08T22:17:42Z

Of course. It was my first thought to wrap in parens, but then I didn't do that. Thanks for the catch.

Fixed.

PeterCon self-assigned this Oct 7, 2022

PeterCon added the OpenType spec label Oct 7, 2022

PeterCon added this to the OpenType 1.9.1 milestone Oct 7, 2022

PeterCon closed this as completed Oct 8, 2022
