Locales: Add support for variants of a locale #226

Closed
ocean90 opened this issue Jan 21, 2016 · 50 comments

Member

@ocean90 ocean90 commented Jan 21, 2016

Mentioned this some time ago in #glotpress.

Some ideas on how we could store variants in GP_Locales:

  1. Define new GP_Locale with variant in slug:
$de_formal = new GP_Locale();
$de_formal->english_name = 'German (Formal)';
$de_formal->native_name = 'Deutsch (Sie)';
$de_formal->lang_code_iso_639_1 = 'de';
$de_formal->country_code = 'de';
$de_formal->wp_locale = 'de_DE_formal';
$de_formal->slug = 'de/formal';
$de_formal->google_code = 'de';
$de_formal->facebook_locale = 'de_DE';
  2. Clone existing GP_Locale and add variant as a new property:
$de_formal = clone $de;
$de_formal->english_name = 'German (Formal)';
$de_formal->native_name = 'Deutsch (Sie)';
$de_formal->wp_locale = 'de_DE_formal';
$de_formal->variant = 'formal';
  3. Define variants as a new property of the existing GP_Locale:
$de = new GP_Locale();
$de->english_name = 'German';
$de->native_name = 'Deutsch';
$de->lang_code_iso_639_1 = 'de';
$de->country_code = 'de';
$de->wp_locale = 'de_DE';
$de->slug = 'de';
$de->google_code = 'de';
$de->facebook_locale = 'de_DE';
$de->variants = array(
    'formal' => array(
        'english_name' => 'German (Formal)',
        'native_name' => 'Deutsch (Sie)',
    ),
);

What we're currently doing on api.w.org side: https://meta.trac.wordpress.org/browser/sites/trunk/api.wordpress.org/public_html/translations/lib.php?rev=2319&marks=54-68#L53

Contributor

@toolstack toolstack commented Mar 13, 2016

I might go for option number 4, a combination of 2 & 3 above:

$de = new GP_Locale();
$de->english_name = 'German';
$de->native_name = 'Deutsch';
$de->lang_code_iso_639_1 = 'de';
$de->country_code = 'de';
$de->wp_locale = 'de_DE';
$de->slug = 'de';
$de->google_code = 'de';
$de->facebook_locale = 'de_DE';
$de->variants = array(
    'default' => 'de',
    'formal' => 'de_formal',
);

$de_formal = clone $de;
$de_formal->english_name = 'German (Formal)';
$de_formal->native_name = 'Deutsch (Sie)';
$de_formal->wp_locale = 'de_DE_formal';

That keeps the current structure of the locale's data intact, making it backwards compatible, while still providing a way to find all the variants for a given locale.

Member Author

@ocean90 ocean90 commented Mar 13, 2016

@toolstack I think the important part of this is to answer the question of how we want to store the data in a set. For example, what's the locale value in

function gp_locales_by_project_dropdown( $project_id, $name_and_id, $selected_slug = null, $attrs = array() ) {
?

Or how do we want to display a hierarchy, see

public function name_with_locale( $separator = '→') {
.

[Screenshot: hierarchy]

@ocean90 ocean90 closed this Mar 13, 2016
@ocean90 ocean90 reopened this Mar 13, 2016
Contributor

@toolstack toolstack commented Mar 13, 2016

In a set, using option 4 above, each variant would have its own slug, so the current storage system would work without modification.

As for how to display them, I'd likewise leave the current system in place, which would create German and German (Formal) as entries in the list based on the english_name of the locale.

This would also continue to allow for variants to have unique translation set slugs.

Member Author

@ocean90 ocean90 commented Mar 13, 2016

> In a set, using option 4 above, each variant would have its own slug, so the current storage system would work without modification.

Can you elaborate on that? Because that's the main point. What is the locale and the slug for "German" and "German (Formal)"? (locale = slug of a GP_Locale, slug = slug of a translation set for a GP_Locale)

See also https://github.com/GlotPress/GlotPress-WP/blob/develop/gp-includes/routes/translation.php#L126-L128 which currently uses the locale part to identify the GP_Locale.

Contributor

@toolstack toolstack commented Mar 13, 2016

Slight revision to option 4 then,

$de_formal = clone $de;
$de_formal->english_name = 'German (Formal)';
$de_formal->native_name = 'Deutsch (Sie)';
$de_formal->wp_locale = 'de_DE_formal';
$de_formal->slug = 'de_formal';

The slugs should be different.
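
A minimal sketch of how the variants map could be used to find every variant of a locale under this revision of option 4. `Sketch_Locale`, the `$locales` registry and `sketch_variants_of()` are hypothetical stand-ins for illustration, not GlotPress API; only the property names and values come from the snippets above.

```php
<?php
// Hypothetical stand-in for GP_Locale with only the properties used here.
class Sketch_Locale {
    public $english_name;
    public $slug;
    public $wp_locale;
    public $variants = array(); // variant name => locale slug
}

$de               = new Sketch_Locale();
$de->english_name = 'German';
$de->slug         = 'de';
$de->wp_locale    = 'de_DE';
$de->variants     = array(
    'default' => 'de',
    'formal'  => 'de_formal',
);

// clone copies every property, so only the differing ones are overridden.
$de_formal               = clone $de;
$de_formal->english_name = 'German (Formal)';
$de_formal->wp_locale    = 'de_DE_formal';
$de_formal->slug         = 'de_formal';

// Assumed registry: a flat list of locales keyed by slug.
$locales = array(
    $de->slug        => $de,
    $de_formal->slug => $de_formal,
);

// Returns the locale objects named in the variants map of $slug.
function sketch_variants_of( array $locales, $slug ) {
    $found = array();
    if ( ! isset( $locales[ $slug ] ) ) {
        return $found;
    }
    foreach ( $locales[ $slug ]->variants as $name => $variant_slug ) {
        if ( isset( $locales[ $variant_slug ] ) ) {
            $found[ $name ] = $locales[ $variant_slug ];
        }
    }
    return $found;
}

$variants = sketch_variants_of( $locales, 'de' );
```

Because each variant is a full locale with its own slug, existing code that looks a locale up by slug keeps working, and the map is only consulted when the set of related variants is needed.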

Member Author

@ocean90 ocean90 commented Mar 13, 2016

So you have /glotpress/projects/project/de/default/ for "German" and /glotpress/projects/project/de_formal/default/ for "German (Formal)" (previously: /glotpress/projects/project/de/formal/)?

Contributor

@toolstack toolstack commented Mar 13, 2016

Yes, so you could still have different translation set slugs for each of the variants.

Member Author

@ocean90 ocean90 commented Mar 13, 2016

That looks a bit odd, but in terms of BC it's the best way to go. The slug should be de-formal though, with a - instead of an _.

Contributor

@toolstack toolstack commented Mar 13, 2016

Yea, the dash is probably better.
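
To illustrate the URL shape agreed on above, here is a rough sketch that splits a translation-set path into the locale part and the set slug. It assumes the last two path segments are always `<locale-slug>/<set-slug>`, as in the example URLs in this thread; `sketch_parse_set_path()` is a hypothetical helper and not the actual route handling.

```php
<?php
// Splits "/.../<locale-slug>/<set-slug>/" into its two trailing parts.
// Returns null when the path has fewer than two non-empty segments.
function sketch_parse_set_path( $path ) {
    // Drop empty segments produced by leading/trailing slashes.
    $segments = array_values( array_filter( explode( '/', $path ), 'strlen' ) );
    if ( count( $segments ) < 2 ) {
        return null;
    }
    $set_slug    = array_pop( $segments );
    $locale_slug = array_pop( $segments );
    return array(
        'locale' => $locale_slug,
        'set'    => $set_slug,
    );
}

$german = sketch_parse_set_path( '/glotpress/projects/project/de/default/' );
$formal = sketch_parse_set_path( '/glotpress/projects/project/de-formal/default/' );
```

With the variant folded into the locale slug, the set slug stays free for each variant to use independently.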

Member

@pedro-mendonca pedro-mendonca commented Jul 12, 2017

Hi,
After seeing the current development on the GlotPress blog, I would like to make some comments.

First of all, I'm very glad that real variants with fallback are about to see the light of day.

Second, as a pt_PT Translation Editor, I must say that the proposed root/variant relationship between pt_PT and pt_BR doesn't make any sense. These are completely different locales, with independent guidelines and translation teams, and I believe that a fallback between them is not the appropriate scenario.

As discussed at WCEU, in Polyglots, etc., we do need variants for the pt_PT locale, but only variants of pt_PT itself, not related to any other locale. Example: the AO90 variant already requested in this post.

Lastly, I think the color difference between root and variant isn't clear enough. I suggest making it much more distinct so there is no doubt about it.

Thanks for this very nice work!


@sheilagomes sheilagomes commented Jul 12, 2017

As pt_BR GTE, I agree with Pedro Mendonça on all counts. The "Root" color is too close to the "Current" green and may lead to confusion. Also, pt_BR follows the AO90, which is not always the case in pt_PT.
There are many differences in writing between these locales, in both spelling and grammar, but there are also enough similarities that it would be easy to let things slip by, which would affect the consistency and general quality of the translations.


@vitormadeira vitormadeira commented Jul 12, 2017

As a native Portuguese speaker and a frequent Wikipedia editor, this move reminds me of what is going on with Wikipedia's editing features: it is now possible to have a common root text and a kind of fallback when the user uses the locale switcher. I wonder if that concept might be an inspiration for translation projects such as WordPress, who knows?

I know that there are currently many differences between the ways the Portuguese and Brazilian teams translate the original American English strings into their locales, but I suppose that in the near future it would be great to see some kind of rapprochement on both sides of the Atlantic, bringing the Portugal and Brazil translations a bit closer, even though many differences would have to remain, as there are real differences between the two ways of speaking and writing the same language.

I know that many Portuguese and Brazilian users do not think like this, but I really wanted to leave this statement: after all, the Portuguese language is the same, and I believe that the teams could work on a common base, with local teams then modifying the details that need more attention (kind of what en_GB already has to do when they "translate" their strings from American English).

As for the colours, yes, I too would like to remark that they need attention.

Member

@pedro-mendonca pedro-mendonca commented Jul 12, 2017

@vitormadeira this is not the place to discuss whether or not the languages should be merged.
This is the place where we must explain to the devs how these locales currently relate to each other, and what the Translation Teams need. And as mentioned by @sheilagomes, these are totally different and independent locales, with no fallback relation between them.
Please, let's keep the discussion on the functionality only.
The fact is that pt_PT and pt_BR aren't in any sense root/variant.
I believe that this will probably happen with other proposed root/variant relations, like the Spanish-speaking locales.
I think variants are much more a feature inside the same locale than between locales.


@Soean Soean commented Jul 12, 2017

Hey,
I can't find any info about the formal versions (de_DE -> de_DE_formal) in the blog post. Are they supported as well?


@vitormadeira vitormadeira commented Jul 12, 2017

So sorry Pedro, but I believe that this is a democratic place, and if I want to express an opinion on this matter, I have the very same right to express my pros and cons as you have. Not really an "open source" inspired reply over there...

Member

@pedro-mendonca pedro-mendonca commented Jul 12, 2017

@vitormadeira my comment is strictly about the technical approach for the current topic.
Forking this discussion on how the languages might or could be related someday, in my opinion, generates too much noise around the actual technical topic being discussed here, and that is the reason I say that this isn't the place for having that discussion.
Regards

Contributor

@toolstack toolstack commented Jul 12, 2017

@pedro-mendonca

> Second, as a pt_PT Translation Editor, I must say that the proposed root/variant relationship between pt_PT and pt_BR doesn't make any sense. These are completely different locales, with independent guidelines and translation teams, and I believe that a fallback between them is not the appropriate scenario.

Is it that there is no relationship between the languages, or no relationship between the teams that work on translate.w.org? Here we're interested in knowing whether the languages have a relationship; how that eventually gets deployed to translate.w.org is of course a different question 😄

> As discussed at WCEU, in Polyglots, etc., we do need variants for the pt_PT locale, but only variants of pt_PT itself, not related to any other locale. Example: the AO90 variant already requested in this post.

Once the PR is in place, these kinds of new variants can be put into place easily.

> Lastly, I think the color difference between root and variant isn't clear enough. I suggest making it much more distinct so there is no doubt about it.

I chose a colour that's close to the approved translations to visually keep them related; however, I'm open to any suggestions for a different choice.


@vitormadeira vitormadeira commented Jul 12, 2017

Well, just as @GaryJones said, both the Brazilian and Portuguese teams would need someone to approve all the strings before they could be inherited from the 'parent' Portuguese strings by the variant locales.

Technically, it would be just the same as it is now: the original strings would be the same, but locale differences would be made in the locale variants without automatic inheritance (needing someone to review and approve the strings).


@sheilagomes sheilagomes commented Jul 12, 2017

A question: why isn't this feature handled as a fuzzy translation, marked the same way fuzzy strings are (in purple), with some indication that it's root-based?


@GaryJones GaryJones commented Jul 12, 2017

@vitormadeira I think the point being made by the others is that there is no "Portuguese" parent locale, in the same way there's not an "English" (no country) locale.

Where the English variants differ, and why our situation can't be compared to the Portuguese situation, is that the fallback for plugins is the original plugin strings, and for ~95% of plugins, that happens to be something very close to English (United States).


@vitormadeira vitormadeira commented Jul 12, 2017

As Greg Ross stated on the GlotPress post, this would mean concentrating and helping efforts:

> Correct, all strings in the variant can be uniquely translated independent of the root translation. Once a unique translation for the variant is saved, it will no longer inherit the root’s translation of that string.

https://glotpress.blog/2017/07/12/glotpress-and-locale-variants/#comment-58031
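
The inheritance rule quoted above can be sketched roughly as follows. This is an illustration only: translation sets are modelled as plain arrays mapping an original string to its translation, and `sketch_resolve()` is a hypothetical name, not a function from the PR.

```php
<?php
// Returns the text a variant would display for $original, plus where it
// came from: the variant itself, the root locale, or the untranslated original.
function sketch_resolve( $original, array $variant, array $root ) {
    if ( isset( $variant[ $original ] ) ) {
        // A unique variant translation always wins over the root.
        return array( 'text' => $variant[ $original ], 'source' => 'variant' );
    }
    if ( isset( $root[ $original ] ) ) {
        // Nothing saved for the variant yet: inherit from the root.
        return array( 'text' => $root[ $original ], 'source' => 'root' );
    }
    return array( 'text' => $original, 'source' => 'original' );
}

// Assumed example data: German as root, German (Formal) as variant.
$root = array(
    'Save'         => 'Speichern',
    'Your profile' => 'Dein Profil',
);
$formal = array(
    'Your profile' => 'Ihr Profil',
);

$inherited = sketch_resolve( 'Save', $formal, $root );
$unique    = sketch_resolve( 'Your profile', $formal, $root );
$missing   = sketch_resolve( 'Log in', $formal, $root );
```

Only strings that actually differ need to be translated in the variant; everything else is read from the root at lookup time.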

Contributor

@toolstack toolstack commented Jul 12, 2017

Let me see if this helps (or hurts) a bit.

Think of it this way: if a language is a variant of a root, it doesn't mean they have to be 100% identical (or even 80% or less), but it should mean that a reader of the variant can understand the meaning of the root translation.

Put another way, using our Portuguese example above, would a Brazilian user rather have a Portugal translation or the untranslated English string show up on their screen?

Or would having the Portugal translation as the starting point for the Brazilian translation be helpful?

Something else to keep in mind is that while translate.w.org is by far the largest GP install, it's not the only one, so if possible the choices made here should be useful both for installs with 20+ translators for a language and for installs with just 1.

Contributor

@toolstack toolstack commented Jul 12, 2017

@sheilagomes

> A question: why isn't this feature handled as a fuzzy translation, marked the same way fuzzy strings are (in purple), with some indication that it's root-based?

I did consider using fuzzy, but it's really a separate state as the translation has been approved in the root locale.

Marking both "fuzzy" and "root" the same way could become confusing.

Contributor

@zedejose zedejose commented Jul 12, 2017

> would a Brazilian user rather have a Portugal translation or the untranslated English string show up on their screen?

Probably the untranslated version (and the same is true for a pt_PT user).

> would having the Portugal translation as the starting point for the Brazilian translation be helpful?

No, and vice versa.


@vitormadeira vitormadeira commented Jul 12, 2017

Great approach @toolstack.

I have no doubt that a Brazilian citizen who can't read English would understand some 90% to 99% of a website displayed in Portugal Portuguese, and vice versa: a Portuguese citizen who cannot read English would much prefer a Brazilian-translated website.

As a matter of fact, I just compared the Brazilian and the Portuguese translations for WordPress v4.8 and the difference is really small! I got almost 74% identical results:
https://copyleaks.com/compare/two-documents/a566a653-6bf9-44e4-80c0-ff5db6db2aa3
(I hope this URL is public!)
-> Also here: https://www.diffchecker.com/FVUpzAX8

As I said before, if Wikipedia did it, it should not be that difficult to accept that the Portugal and Brazil teams could work together, making the small changes they need when translating into their locales.

People around here are thinking only of WordPress, but GlotPress is a bit more than just WordPress. That should also be considered in this debate.

Contributor

@toolstack toolstack commented Jul 12, 2017

@GaryJones

> Is there any way for English (*) GTEs to filter / view the inherited strings, in the same way that Waiting strings are flagged up? If not, it means that all of a sudden, all of the plugins will be marked as 100%, without us having implicitly checked the strings.

Not yet, but it is something I expect to add before we merge the PR.

> It would be good to be able to view the inherited strings, and then bulk apply what would be Copy from Original, so that it confirms that they have been checked, even though that won't affect the percentage complete (and everyone can pull down en_* in the meantime anyway).

Doing this would negate the benefit of the variants, as you would effectively just be doing an import from the root locale.

You do bring up an interesting point about how to know if a root string has been reviewed for a variant; that may need some more thought.

Member

@pedro-mendonca pedro-mendonca commented Jul 12, 2017

Agree, maybe some kind of logic as follows:
A translation field and a checkbox.
The translation editor might either fill the translation field or check the checkbox marking the root as valid to the variant.
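
A rough sketch of that field-plus-checkbox logic, which would also give editors a way to filter inherited strings that nobody has looked at yet. The state names, the row layout and `sketch_variant_state()` are assumptions made up for this illustration; nothing like this is confirmed to exist in the PR.

```php
<?php
// Classifies one string of a variant.
//   $own           the variant's own translation, or null
//   $root          the root locale's translation, or null
//   $root_accepted true when the editor ticked "root is valid for this variant"
function sketch_variant_state( $own, $root, $root_accepted ) {
    if ( null !== $own ) {
        return 'translated';
    }
    if ( null === $root ) {
        return 'untranslated';
    }
    return $root_accepted ? 'root-confirmed' : 'root-unreviewed';
}

// Assumed example rows for a German (Formal) variant.
$strings = array(
    'Your profile' => array( 'own' => 'Ihr Profil', 'root' => 'Dein Profil', 'root_accepted' => false ),
    'Save'         => array( 'own' => null, 'root' => 'Speichern', 'root_accepted' => true ),
    'Log in'       => array( 'own' => null, 'root' => 'Anmelden', 'root_accepted' => false ),
    'Widgets'      => array( 'own' => null, 'root' => null, 'root_accepted' => false ),
);

// Collect the inherited strings that still need a human look.
$needs_review = array();
foreach ( $strings as $original => $row ) {
    $state = sketch_variant_state( $row['own'], $row['root'], $row['root_accepted'] );
    if ( 'root-unreviewed' === $state ) {
        $needs_review[] = $original;
    }
}
```

Ticking the checkbox records the review without copying the root text, so the string keeps following later changes in the root.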


@robertsky robertsky commented Jul 12, 2017

This is for Chinese languages.

As far as I know, China and Singapore Chinese have about 90% of words in common, with new words or terms deferring to China's. China and Singapore use the Simplified Chinese character set. Hong Kong and Taiwan use a different set of characters (Traditional Chinese characters) altogether and are completely independent of each other.

If it is possible, can I request to have zh-cn as the root of zh-sg, as the similarities between the two locales are at least 90% (at least when referring to dictionaries)? My concern is the main locale itself. Whoever takes charge of it would affect zh-sg, which is at the moment 100% untranslated.

If it is not possible, that is fine as well. I am raising a request to be a GTE for zh-sg. At the very least, I intend to import the zh-cn WP translations (and at least the two plugins for which I am PTE at the moment) and, as time permits, update the zh-sg glossary with the local variants.


@xavivars xavivars commented Jul 13, 2017

Let me give my point of view (that may or may not be useful for the case of Portuguese).

I'm GTE for Catalan. But I'm also involved in many other open source translations to Catalan and to Valencian (the Catalan variant spoken in the Valencia region, in Spain, and the one I speak and write on a daily basis). In this case, both variants are pretty close, but there are some fundamental differences that we try to apply to all our translations. Moreover, there's been a recent orthographic change made by the Catalan (from Catalunya) language institution that hasn't been approved by the Valencian one, which means that words now written correctly in Catalunya are misspellings in Valencia, and the other way around.

But despite the differences, the important point here is not how close the translated strings are to each other, but what the chances are that someone from Brazil prefers a Portuguese (PT) translation to an English one, and vice versa. I can agree that, from the translator's point of view, it may be useful to have the English sentence as a starting point, but it is really hard to believe that most of the Portuguese-speaking users (in any variant) would prefer English strings inside a Portuguese piece of software over another variant's Portuguese strings. The same thing could apply to Spanish variants (as a native Spanish speaker, I'm 100% sure in this case).

I wouldn't focus that much on what's happening right now in WordPress core, where all variants can be translated to 100% (as they are now) and where strings may be different in both variants. I'd focus instead on plugins and themes, where untranslated content is certainly a bigger pain and where users could benefit from having something closer to their language (a variant of the same language) than English.

My five cents


@vitormadeira vitormadeira commented Jul 13, 2017

@xavivars your message is quite important, from my point of view, as it resembles what might become available for the two main Portuguese variants (Portugal vs Brazil). Of course we have our differences, which must remain and be respected by the users of each country, but having some "premade" material (I mean translations) would be much better than not having anything at all.

This is software translation! So we, as WP translators, must think of users in the first place (as you said - "focus less on what happens in WordPress core") and let go a bit of our "dev" / "webmaster" / "web designer" way of thinking.

As a matter of fact, I believe that whatever path this feature takes, if Portugal and Brazil users could share some root work, it would bring a lot of agility to getting better final work.

One other idea that comes to mind would be to change the way the variants workflow works.
For example: take the two Portuguese variants and a new theme/plugin that arrives in the WP themes/plugins directory (so, zero strings translated yet).
If a Brazilian user translates some strings, and the Portuguese (Portugal) variant has no translations yet, then it would be precious to have the "suggested" translation from the first folks who got there.
Then a PTE / GTE would just have to see what needs attention and modify it or leave it as it is, and only after this human check would the strings be available in the locale.

And the same vice versa: if a Portuguese translator comes first, the strings would be available for the Brazil team to review and change / approve following their local guidelines.

This would be a tremendous help for collaboration and would speed up translations - and it is not, as someone said above, trying to "merge" both variants.

I would like to finish by quoting your precious words: "is really hard to believe that most of the Portuguese speaking users (in any variant), (...) would prefer English string inside a Portuguese piece of software than other variant's Portuguese strings." -> this is so true!

But I would also state that if that is not possible (because local teams do not want it), maybe my idea of changing the translation workflow might be a bit closer to what both teams would prefer?
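
The workflow described above could be sketched like this: strings that exist only in the sibling locale are copied over as suggestions that still need approval, and existing translations are left alone. The array layout and `sketch_suggest_from_sibling()` are hypothetical; `waiting` is used here simply as the name for "not yet approved".

```php
<?php
// Copies translations from $sibling into $target as unapproved suggestions.
//   $target  original => array( 'text' => ..., 'status' => ... )
//   $sibling original => approved translation text
function sketch_suggest_from_sibling( array $target, array $sibling ) {
    foreach ( $sibling as $original => $text ) {
        if ( ! isset( $target[ $original ] ) ) {
            // Never shown to end users until a PTE / GTE reviews it.
            $target[ $original ] = array( 'text' => $text, 'status' => 'waiting' );
        }
    }
    return $target;
}

// Assumed example: pt_PT already has one approved string, pt_BR has two.
$pt_pt = array(
    'Save' => array( 'text' => 'Guardar', 'status' => 'current' ),
);
$pt_br = array(
    'Save'     => 'Salvar',
    'Settings' => 'Configurações',
);

$pt_pt = sketch_suggest_from_sibling( $pt_pt, $pt_br );
```

Unlike a root/variant fallback, nothing from the other locale reaches users here without a human approving it first.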

Contributor

@toolstack toolstack commented Jul 14, 2017

@robertsky thanks for the info on the Chinese languages, I'll update the code.

Contributor

@toolstack toolstack commented Jul 14, 2017

@vitormadeira In the very limited case of two variants effectively being peers, each could be the root of the other. This wouldn't work with 3 or more variants, of course, but with two there's no reason it could not be done.
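
If two locales were each other's root, any lookup that follows root links would need a guard against looping forever. A minimal sketch, assuming a hypothetical `$roots` map from a locale to its root (the pairings below are only the examples raised in this thread, not settled decisions):

```php
<?php
// Returns the fallback order for $locale by following $roots
// (locale => root locale) and stopping at the first repeated locale.
function sketch_fallback_chain( $locale, array $roots ) {
    $chain = array( $locale );
    while ( isset( $roots[ $locale ] ) && ! in_array( $roots[ $locale ], $chain, true ) ) {
        $locale  = $roots[ $locale ];
        $chain[] = $locale;
    }
    return $chain;
}

// Assumed example map: two peers pointing at each other, plus one plain root.
$roots = array(
    'pt_BR' => 'pt_PT',
    'pt_PT' => 'pt_BR',
    'zh_SG' => 'zh_CN',
);

$from_br = sketch_fallback_chain( 'pt_BR', $roots );
$from_pt = sketch_fallback_chain( 'pt_PT', $roots );
$from_sg = sketch_fallback_chain( 'zh_SG', $roots );
```

Each peer falls back to the other exactly once, and a locale with no root simply yields a chain containing only itself.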

@toolstack toolstack modified the milestones: 3.0, Future Aug 21, 2017
toolstack added a commit that referenced this issue Oct 8, 2017
toolstack added a commit that referenced this issue Oct 17, 2017
@ocean90 ocean90 modified the milestones: 3.0, Future Apr 8, 2018
Member Author

@ocean90 ocean90 commented Apr 8, 2018

Moving this out of 3.0 pending further discussions and to give us more time to focus on CLDR first.

Contributor

@toolstack toolstack commented Apr 9, 2018

Not sure more time is what's needed here; the PR has been sitting for months without further discussion taking place.

I'd rather delay 3.0 and land this than just put it off.

@toolstack toolstack modified the milestones: Future, 3.0 Oct 15, 2018
toolstack added a commit that referenced this issue Oct 25, 2018
@ocean90 ocean90 mentioned this issue Mar 28, 2019
3 of 3 tasks complete