Locale formats #6358

Open
aschempp opened this issue Apr 23, 2015 · 2 comments

aschempp commented Apr 23, 2015

[RFC] Locale formats

I have spent the past several days reading up on locales, looking at both our current handling and the Symfony way. I'm writing down what I found so that we are all up to date and can clear up the confusion once and for all. Loosely related to contao/core-bundle#190 and contao/core-bundle#171.

Background

In computing, a locale is a set of parameters that defines the user's language, country and any special variant preferences that the user wants to see in their user interface.

https://en.wikipedia.org/wiki/Locale

This is probably familiar to all of us.

Locale formats

These two formats are relevant for our case.

  1. IETF Language Tag, specified in BCP 47 (currently RFC 5646, which succeeded RFC 4646, RFC 3066 and RFC 1766).
    Used in:
    • HTTP Accept-Language header
    • XML & HTML documents (e.g. <html lang="">)
  2. Locale ID, according to the International Components for Unicode (ICU).
    Used in:
    • PHP Intl extension
    • Symfony Intl component, a replacement layer for the PHP intl extension
    • Recommended format for the Symfony Translation component
    • Returned by Symfony Request::getLanguages() (after parsing the HTTP Accept-Language header)

Transifex uses a Locale ID to represent regional languages (e.g. zh_TW) but a Language Tag to represent language scripts (e.g. zh-Hant) :-(

Differences between Language Tag and Locale ID

As far as I understand, there is no major difference for our use case.

  • A Language Tag uses a dash (-) as delimiter between language, script and region.
  • A Locale ID uses an underscore (_) as delimiter between language, script and country.
  • In a Locale ID, the third subtag is always a country (according to ISO 3166), whereas in a Language Tag it can also be a UN M.49 region code.

A Locale ID also allows specifying further details about a locale, such as the currency, calendar or collation. However, these are (currently) not relevant for our case.

Structure of Language Tag and Locale ID

Apart from the differences noted above, both Language Tag and Locale ID are very similar in their format:

  1. The language subtag specified using a two- or three-letter lowercase code (using ISO 639-1 or ISO 639-2).
  2. An optional script subtag (specified in ISO 15924)
    (Examples: Latn = Latin, Cyrl = Cyrillic, Hans = Chinese Simplified, Hant = Chinese Traditional)
  3. The country or region subtag, commonly using a two-letter ISO 3166 country code.

Best practice is to add subtags only if they add relevant information. As an example, it's not recommended to write en-Latn because English is almost always written in Latin characters.
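To make the shared structure concrete, here is a minimal sketch in Python that splits either format into language, script and region. The function name `parse_locale`, the `LocaleParts` type and the regular expression are assumptions made for illustration; they cover only the three subtags listed above (no variants, extensions or Locale ID keywords) and are not taken from Contao, Symfony or ICU.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: language, optional script, optional region,
# separated by "-" (Language Tag) or "_" (Locale ID).
_LOCALE_RE = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:[-_](?P<script>[A-Za-z]{4}))?"
    r"(?:[-_](?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


class LocaleParts(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def parse_locale(value: str) -> LocaleParts:
    """Split a Language Tag or Locale ID into its three basic subtags."""
    match = _LOCALE_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"Unsupported locale: {value!r}")
    script = match.group("script")
    region = match.group("region")
    return LocaleParts(
        # Conventional casing: language lowercase, script title case,
        # region uppercase.
        language=match.group("language").lower(),
        script=script.title() if script else None,
        region=region.upper() if region else None,
    )
```

For example, `parse_locale("zh-Hant-TW")` and `parse_locale("zh_Hant_TW")` yield the same three parts, which is the point made above: for our purposes the two formats differ mainly in the delimiter.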

Situation in Contao

Just so that everyone is on the same track, I'm quickly writing down the current Contao approach:

  • A page language (tl_page.language) is a Language Tag. It can be either a two-character ISO 639-1 code (de, en) or a five-character language and country code (de-DE, de-CH, en-US).
  • In contrast, language files use Locale IDs. The folder name can be either a two-character ISO 639-1 code (de, en) or a five-character language and country code (de_DE, de_CH, en_US).
  • The languages list (system/config/languages.php) also uses Locale IDs, which means the same applies to the member and user language (tl_member.language / tl_user.language).

Both formats are written case-sensitively. The $GLOBALS['TL_LANGUAGE'] variable is inherited from the page language and is therefore a Language Tag.

Because our language folders use Locale IDs, we convert the representation wherever we try to match a page language to a language folder (str_replace('-', '_', $lang)). Because we rely on Transifex, the package format is somewhat predefined.
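The conversion mentioned above is a plain delimiter swap. A minimal sketch of the same idea in Python follows; both function names are hypothetical, and like the PHP call it does not validate the input or normalize its case.

```python
def language_tag_to_locale_id(tag: str) -> str:
    """Mirror str_replace('-', '_', $lang): swap the delimiter only."""
    return tag.replace("-", "_")


def locale_id_to_language_tag(locale_id: str) -> str:
    """Reverse direction, e.g. for writing an HTML lang attribute."""
    return locale_id.replace("_", "-")
```

With this, a page language such as de-CH maps to the folder name de_CH, and a two-letter code such as de passes through unchanged.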

Situation in Symfony

Translation

The Translation component accepts numerous formats for the translation files (see [1], [2]). It simply tries to find a file with the given locale.

However, loading fallbacks (zh for zh_CN) only works when using a Locale ID with an underscore (see [1]). It also does not support three-level fallbacks: when the locale is zh_Hant_TW, it loads zh_Hant but not zh.
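The fallback behaviour described here can be modelled in a few lines. This Python sketch reproduces only what is stated above (one fallback step, underscore as the only delimiter); the function name is hypothetical and this is not Symfony's actual implementation.

```python
from typing import List


def described_fallback_chain(locale: str) -> List[str]:
    """Return the locale followed by the single fallback described above."""
    chain = [locale]
    # Only the underscore is treated as a delimiter, and only the last
    # subtag is stripped, so there is at most one fallback step.
    if "_" in locale:
        chain.append(locale.rsplit("_", 1)[0])
    return chain
```

Under this model zh_CN falls back to zh, zh_Hant_TW falls back to zh_Hant only, and a Language Tag such as zh-TW gets no fallback at all because it contains no underscore.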

Request

The request has methods for setLocale and getLocale. They are NOT related to the _locale attribute (see HttpKernel). On a call to setLocale, the property is forwarded to the PHP Intl subsystem if available.

The good news is that the PHP Intl extension happily accepts either a dash (-) or an underscore (_) as delimiter and correctly detects language, script and region.

Intl

The Symfony Intl component provides fallback information if PHP Intl is not available. Same as PHP Intl, it uses Locale IDs (see [1]). I have not tested it, but I doubt it will work with Language Tags.

HttpKernel / Magic

The HttpKernel component contains some dependency injection magic with regard to locale handling. There are two listeners in place:

  1. The LocaleListener triggers on kernel.request (priority 16) and calls Request::setLocale if a _locale attribute is found in the current request (see [1]).
  2. The TranslatorListener triggers on kernel.request (priority 10) and sets the locale for the default translator from Request::getLocale (see [1]).
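The ordering matters: Symfony runs listeners with a higher priority first, so the LocaleListener (16) sets the request locale before the TranslatorListener (10) reads it. The Python sketch below simulates only that ordering; `Request`, `Translator`, `LISTENERS` and `handle_request` are simplified stand-ins invented for illustration, not the Symfony classes.

```python
from typing import Callable, Dict, List, Optional, Tuple


class Request:
    """Hypothetical stand-in for the Symfony Request object."""

    def __init__(self, attributes: Dict[str, str], default_locale: str = "en") -> None:
        self.attributes = attributes
        self.locale = default_locale


class Translator:
    """Hypothetical stand-in for the default translator."""

    def __init__(self) -> None:
        self.locale: Optional[str] = None


Listener = Callable[[Request, Translator], None]


def locale_listener(request: Request, translator: Translator) -> None:
    # Copies the _locale attribute (if any) onto the request.
    if "_locale" in request.attributes:
        request.locale = request.attributes["_locale"]


def translator_listener(request: Request, translator: Translator) -> None:
    # Passes the request locale on to the translator.
    translator.locale = request.locale


# Registered in arbitrary order; the priority decides the execution order.
LISTENERS: List[Tuple[int, Listener]] = [
    (10, translator_listener),
    (16, locale_listener),
]


def handle_request(request: Request) -> Translator:
    translator = Translator()
    # Higher priority runs first.
    for _, listener in sorted(LISTENERS, key=lambda entry: entry[0], reverse=True):
        listener(request, translator)
    return translator
```

A request carrying `_locale` = zh-TW therefore ends up with the translator locale zh-TW, unchanged, while a request without the attribute leaves the translator on the default locale.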

Problems

Frontend

In our current implementation (Contao 4.0.0-beta1), the request _locale attribute is parsed from the route path, which means it will be a Language Tag according to the setting in tl_page.language. If zh-TW is set as the page language, the Translator would not find the zh_TW language pack (and it would not load zh either).

If routes do not contain locales (contao.prefix_locale is false), the fallback language will be used for the Symfony Translator. Contao will use the best-matching language from the Accept-Language header to find an appropriate page, and then use that page's language.
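As a rough illustration of "best-matching language from the Accept-Language header", the following Python sketch orders the header entries by their q-values and picks the first one that is available. The function names and the rule of falling back from de-CH to de are assumptions for this example; Contao's actual matching logic may differ.

```python
from typing import List, Optional


def parse_accept_language(header: str) -> List[str]:
    """Return the language tags of an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        parts = [part.strip() for part in item.split(";")]
        tag = parts[0]
        if not tag:
            continue
        quality = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        if quality > 0:
            # Sort by descending quality, then by position in the header.
            weighted.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(weighted)]


def best_match(header: str, available: List[str]) -> Optional[str]:
    """Pick the first preferred tag that is available, else its bare language."""
    lookup = {tag.lower(): tag for tag in available}
    for tag in parse_accept_language(header):
        for candidate in (tag.lower(), tag.lower().split("-")[0]):
            if candidate in lookup:
                return lookup[candidate]
    return None
```

Note that the header delivers Language Tags (with a dash), so the result still has to be converted before it can be matched against Locale ID folder names.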

Backend

Backend paths do not contain language information. For Symfony Translation, the fallback language (en) will always be used. Contao will use the best-matching language from Accept-Language header.
After a user login, the user's language (tl_user.language) will be used to load Contao languages.

Conclusion

For Symfony, it does not matter which locale format you use, because it's just a framework. As long as your routes and translation files match, the Translator will happily load whatever your URL contains.

The current Contao implementation is not really compatible with Symfony Translation, though. It would only be possible to load translations by using Language Tag file names (messages.de-CH.xliff), which is neither the recommended way nor the default in Contao, and is not supported by Transifex.

Solutions

TBD

Tools

@leofeyer
Member

What about this ticket? Can it be closed?

@aschempp
Member Author

No, that's a general info ticket (same as contao/core-bundle#122) with information on things we want to improve sometime in a future release.

@leofeyer leofeyer changed the title [RFC] Locale formats Locale formats Nov 6, 2015
leofeyer referenced this issue Jul 6, 2021
Description
-----------

| Q                | A
| -----------------| ---
| Fixed issues     | Fixes #1957

Since Contao 4.10, the URL prefix is no longer tied to the page language. With this PR, the page language can be any valid Locale ID. A Locale ID can include language, script, country and additional information (e.g. `de_Latn_CH@currency=EUR`); see https://github.com/contao/core-bundle/issues/233.

The Locale ID and ICU information can later be used for a lot more things, such as rewriting the number formatting to use the ICU information (decimal point according to language & country).

Commits
-------

c45065b Use Locale ID for tl_page.language
832f980 Migrate the existing page languages
4372b7e Format locale instead of string replace
7fa7cf7 Support old locale style in legacy routing
f5b0c0c Added support for full locale routing
af35755 Do not migrate the page language
4b5c98a Updated hint for page language field
6d303ea Use a listener for the page language callback
40bfc17 Use method to calculate locale priority
0c22b1b CS
bc4abe5 Fixed tests
95293af Feedback adjustments
3a853b8 Migrate the page languages
de7b8d7 Always convert page language and make sure it starts with two letters
c2ca955 Only support correctly formatted language folders
e149b37 Fix user and member language fields
300d40f Fixed rebase issues
edfff9a CS
bfb0cf6 Merge branch '4.x' into feature/locale
3ad4a00 CS
leofeyer referenced this issue in contao/core-bundle Jul 6, 2021
@leofeyer leofeyer transferred this issue from contao/core-bundle Sep 1, 2023