
Localizing strings (translations) #136

Closed
Reinmar opened this issue Mar 3, 2016 · 24 comments

Comments

@Reinmar
Member

Reinmar commented Mar 3, 2016

We had a short discussion with @fredck and we came up with something like this.

There will be an editor.t() function, also available in the View class (it will be injected into the constructor). The function will accept two strings:

  1. the string to be localized in English,
  2. optional disambiguation comment for the first string.

Usage:

t( 'Close' );                  // as 'to close' (no disambiguation means that it's the typical meaning of 'close')
t( 'Close', 'distance' );      // as 'close to me'

String identification

Unlike in CKEditor 4, language strings will not be identified by IDs. Instead, the developer will provide the whole text when calling t():

t( 'Press this button to close the dialog.' );

In the source language files, the translations will be referenced by those full original texts:

{
    "Press this button to close the dialog.": "Naciśnij ten guzik by zamknąć okno dialogowe."
}
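Here is a minimal sketch of how a development-mode t() could resolve strings against such a source language file. It assumes the file is loaded as a plain object keyed by the full English text, and that a missing translation falls back to the English source. The proposal does not say how a disambiguation context is stored in the file. Joining it to the text with an EOT (\u0004) character is borrowed from gettext as an assumption. `createTranslator` is a hypothetical name.

```javascript
// Hypothetical helper: returns a t() bound to one language's dictionary.
// Keys are the full English strings. Folding the context into the key with
// gettext's "\u0004" separator is an assumption, not part of the proposal.
function createTranslator( translations ) {
	return function t( text, context ) {
		const key = context ? context + '\u0004' + text : text;

		// Untranslated strings fall back to the English source text.
		return Object.prototype.hasOwnProperty.call( translations, key ) ? translations[ key ] : text;
	};
}

const t = createTranslator( {
	'Press this button to close the dialog.': 'Naciśnij ten guzik by zamknąć okno dialogowe.',
	'Close': 'Zamknij',
	'distance\u0004Close': 'Blisko'
} );

t( 'Press this button to close the dialog.' ); // 'Naciśnij ten guzik by zamknąć okno dialogowe.'
t( 'Close', 'distance' );                      // 'Blisko'
t( 'Click and drag to move the dialog.' );     // 'Click and drag to move the dialog.'
```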

Building

The builder will scan the code for t() calls and replace them with simpler t( num ) calls. This will optimise the contents of language files, because they won't need to contain duplicated strings (in English and in the target language), nor any IDs:

[
    "Naciśnij ten guzik by zamknąć okno dialogowe.",
    "Click and drag to move the dialog." // this string has no translation, so its English source is included
]

These translations will then be referenced from the code as t( 0 ) and t( 1 ).

If a disambiguation comment is provided, the string will, of course, be treated as a separate language entry with its own number.

Thanks to the builder mechanism we will be able to produce optimised language files that contain only the strings actually required. However, this means that t() must always be called with a string literal. You cannot do t( myVariable ).
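The build step above can be sketched as follows. This is only an illustration under stated assumptions. `buildLanguageFile` and `T_CALL` are hypothetical names. Calls are matched with a regular expression that understands only single-quoted literals without escaped quotes. A context is folded into the entry key with gettext's EOT (\u0004) separator.

```javascript
// Hypothetical build step: rewrites t( 'text' ) and t( 'text', 'context' )
// calls as t( n ) and collects the strings for one target language.
// Assumes single-quoted literals without escaped quotes; a real builder would
// use a JavaScript parser rather than a regular expression.
const T_CALL = /\bt\(\s*'([^']*)'\s*(?:,\s*'([^']*)'\s*)?\)/g;

function buildLanguageFile( source, translations ) {
	const ids = new Map(); // entry key -> entry number
	const entries = [];    // entry number -> translated (or English) string

	const code = source.replace( T_CALL, ( match, text, context ) => {
		// The same text with a context is a separate entry with its own number.
		const key = context ? context + '\u0004' + text : text;

		if ( !ids.has( key ) ) {
			ids.set( key, entries.length );

			const hasTranslation = Object.prototype.hasOwnProperty.call( translations, key );
			entries.push( hasTranslation ? translations[ key ] : text );
		}

		return `t( ${ ids.get( key ) } )`;
	} );

	return { code, entries };
}

const { code, entries } = buildLanguageFile(
	"t( 'Press this button to close the dialog.' );\nt( 'Click and drag to move the dialog.' );",
	{ 'Press this button to close the dialog.': 'Naciśnij ten guzik by zamknąć okno dialogowe.' }
);
// code:    "t( 0 );\nt( 1 );"
// entries: [ 'Naciśnij ten guzik by zamknąć okno dialogowe.', 'Click and drag to move the dialog.' ]
```

The scan does not match a call like t( myVariable ), so that string would never reach the language file. This is why t() has to receive literals.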

Disambiguation

The disambiguation comment is needed when a string already exists in the translations but we want to use it with a different meaning. This is unlikely for full sentences but may happen for single words like "close" or "bold".

A developer who plans to use a string should check whether it is already used and, if so, decide which meaning is more likely to be the default one. The alternative (less frequent) meanings should be disambiguated.

Additional params

Sometimes we need to interpolate values into language strings, e.g. "There are %0 items.". t() could accept those values as the last param, in the form of an array (to distinguish it from the disambiguation string):

t( 'There are %0 items.', [ items.length ] );
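Here is a sketch of how t() might tell the optional params apart and interpolate the values. It assumes a trailing array always carries the values and a string is always the context. `parseTArgs` and `interpolate` are hypothetical helpers, and the translation lookup itself is left out.

```javascript
// Hypothetical argument handling for the proposed signatures:
//   t( text ), t( text, context ), t( text, values ), t( text, context, values )
// A trailing array carries the values; a string is the disambiguation context.
function parseTArgs( text, ...rest ) {
	const values = Array.isArray( rest[ rest.length - 1 ] ) ? rest.pop() : [];
	const context = typeof rest[ 0 ] == 'string' ? rest[ 0 ] : undefined;

	return { text, context, values };
}

// Replaces %0, %1, ... with the matching values; placeholders without a
// value are left untouched.
function interpolate( text, values ) {
	return text.replace( /%(\d+)/g, ( match, index ) =>
		Number( index ) < values.length ? String( values[ index ] ) : match );
}

const items = [ 'a', 'b', 'c' ];
const { text, values } = parseTArgs( 'There are %0 items.', [ items.length ] );

interpolate( text, values ); // 'There are 3 items.'
```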
@wimleers

wimleers commented Mar 3, 2016

Pinged @goba, I'm sure he'll have thoughts on this.

@wimleers

wimleers commented Mar 3, 2016

Note that what you call "disambiguation", Drupal calls "context". i.e. the context in which a string is used.

@goba

goba commented Mar 3, 2016

Indeed, Drupal (and gettext in general) has a notion of context, which is a disambiguation string explaining the meaning of (usually) shorter, ambiguous strings. This usage of t() is very similar to Drupal's, and so is your idea of discouraging people from passing variables to t(). I'm not sure what exactly you are gaining compared to the translation system in previous versions, though?

@Reinmar
Member Author

Reinmar commented Mar 3, 2016

I'm not sure what exactly you are gaining compared to the translation system in previous versions

I think that the most significant improvement is the size of the build. We'll get rid of language string IDs (which in CKEditor 4 are e.g. "invalidHtmlLength"), but, more importantly, we'll be able to get rid of unused language strings. The latter will be very important, because most likely there will be one language file for a whole package (CKEditor 5 will be built of multiple npm packages) and each package can contain multiple features. The developer will be able to choose the features they want, on a much more granular level than in CKEditor 4. E.g. there will be a basicstyles package with bold, italic, underline, etc. features. The developer will be able to choose only basicstyles/bold, so translations for strings used in the other features should not be added to the build.
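Here is a minimal sketch of that per-feature selection. It assumes the builder already knows which strings each feature passes to t(), e.g. from scanning its sources, and that a package ships a single language file. `selectTranslations` and the data layout are hypothetical.

```javascript
// Hypothetical selection step: featureStrings maps each feature to the
// strings it passes to t(); only entries needed by the chosen features stay.
function selectTranslations( packageTranslations, featureStrings, selectedFeatures ) {
	const needed = new Set();

	for ( const feature of selectedFeatures ) {
		for ( const text of featureStrings[ feature ] || [] ) {
			needed.add( text );
		}
	}

	return Object.fromEntries(
		Object.entries( packageTranslations ).filter( ( [ text ] ) => needed.has( text ) )
	);
}

// One language file for the whole (hypothetical) basicstyles package.
const basicstylesPl = { 'Bold': 'Pogrubienie', 'Italic': 'Kursywa', 'Underline': 'Podkreślenie' };
const usedStrings = {
	'basicstyles/bold': [ 'Bold' ],
	'basicstyles/italic': [ 'Italic' ],
	'basicstyles/underline': [ 'Underline' ]
};

selectTranslations( basicstylesPl, usedStrings, [ 'basicstyles/bold' ] ); // { Bold: 'Pogrubienie' }
```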

@fredck
Contributor

fredck commented Mar 3, 2016

Drupal calls "context"

I agree... "context" is the right name for it.

@goba

goba commented Mar 3, 2016

Sounds good! Did you think about singular/plural strings? Eg. 'Uploaded 3 images', 'Uploaded 1 image', etc.

@fredck
Contributor

fredck commented Mar 3, 2016

Did you think about singular/plural strings? Eg. 'Uploaded 3 images', 'Uploaded 1 image', etc.

@goba, how are you guys handling this?

@pjasiun

pjasiun commented Mar 3, 2016

What about tagged template strings? The code may look like this:

t`There are ${items.length} items`

In the development version, the t tag would build the key (There are ${0} items) and translate it.

In the build version it could be replaced with: t`0${items.length}`.

It is only a matter of code style, but since tagged template strings are available in ES6, I prefer to have t`There are ${items.length} items` in the code instead of t( 'There are %0 items.', [ items.length ] );.
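To illustrate the development-mode behaviour described above, here is a sketch of a tag function. It rebuilds the key from the literal parts, translates it and substitutes the values back. The `createTag` factory, the dictionary and the ${N} placeholder format of its keys are assumptions for this example.

```javascript
// Hypothetical development-mode tag function. The literal parts of
// t`There are ${ n } items` are joined into the key "There are ${0} items",
// which is translated before the values are substituted back in.
function createTag( translations ) {
	return function t( strings, ...values ) {
		const key = strings.reduce( ( acc, part, i ) => acc + '${' + ( i - 1 ) + '}' + part );
		const translated = Object.prototype.hasOwnProperty.call( translations, key ) ? translations[ key ] : key;

		return translated.replace( /\$\{(\d+)\}/g, ( match, index ) =>
			Number( index ) < values.length ? String( values[ index ] ) : match );
	};
}

const t = createTag( { 'There are ${0} items': 'Jest ${0} elementów' } );
const items = [ 1, 2, 3, 4, 5 ];

t`There are ${ items.length } items`; // 'Jest 5 elementów'
```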

@goba

goba commented Mar 3, 2016

Drupal has a dedicated formatPlural(count, '1 item', '@count items') for this, because languages may have any number of plural forms. See, e.g., how plurals depend on a number's final digits in http://www.russianlessons.net/lessons/lesson11_main.php. So we have a representation of the plural rules, AKA plural formulas (see http://localization-guide.readthedocs.org/en/latest/l10n/pluralforms.html). We translate the combination of the singular and plural English originals into a variable number of translated strings, and then pick from that list based on the appropriate rules for the language.
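For illustration, formula-based selection could look like this in JavaScript. The formulas for English, Polish and Russian are transcribed from the ones commonly listed for gettext's Plural-Forms header. `formatPlural` here is a hypothetical helper, only loosely modelled on Drupal's function and not its actual implementation.

```javascript
// Plural formulas as commonly listed for gettext's Plural-Forms header,
// transcribed to JavaScript. Each returns the index of the form to use.
const pluralRules = {
	en: n => ( n != 1 ? 1 : 0 ),
	pl: n => ( n == 1 ? 0 : n % 10 >= 2 && n % 10 <= 4 && ( n % 100 < 10 || n % 100 >= 20 ) ? 1 : 2 ),
	ru: n => ( n % 10 == 1 && n % 100 != 11 ? 0 : n % 10 >= 2 && n % 10 <= 4 && ( n % 100 < 10 || n % 100 >= 20 ) ? 1 : 2 )
};

// Hypothetical helper loosely modelled on Drupal's formatPlural(): picks the
// translated form for the count and substitutes @count.
function formatPlural( lang, count, forms ) {
	return forms[ pluralRules[ lang ]( count ) ].replace( '@count', String( count ) );
}

formatPlural( 'en', 3, [ '1 item', '@count items' ] );                    // '3 items'
formatPlural( 'pl', 22, [ '1 plik', '@count pliki', '@count plików' ] ); // '22 pliki'
formatPlural( 'pl', 12, [ '1 plik', '@count pliki', '@count plików' ] ); // '12 plików'
```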

@Reinmar
Member Author

Reinmar commented Mar 3, 2016

@pjasiun: That may be a good idea. But I don't get the build format that you proposed:

t`0${items.length}`

We first need the language string number, and then the values required to replace ${0}...${N} in the string taken from the language file. How would the t() function look?

@pjasiun

pjasiun commented Mar 3, 2016

t`0${items.length}${bar}`

is essentially the same as

t( [ '0', '', '' ], items.length, bar );
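Here is one possible answer to "How would the t() function look?" in the build version. The sketch reads the entry number from the first literal part of the tag and fills ${0}...${N} from the remaining arguments. The `compiled` array and its contents are hypothetical. A tag function receives the literal parts as strings (plus a raw property on the array), so the entry number arrives as '0' and has to be converted.

```javascript
// Hypothetical compiled language file: entries are addressed by number and
// use ${0}...${N} placeholders, mirroring the development-mode keys.
const compiled = [ 'Jest ${0} elementów w ${1}.' ];

// Build-mode tag: t`0${ a }${ b }` calls t( [ '0', '', '' ], a, b ), so the
// first literal part holds the entry number and the rest are the values.
function t( strings, ...values ) {
	const entry = compiled[ Number( strings[ 0 ] ) ];

	return entry.replace( /\$\{(\d+)\}/g, ( match, index ) =>
		Number( index ) < values.length ? String( values[ index ] ) : match );
}

const count = 5;
const place = 'koszyku';

t`0${ count }${ place }`;           // 'Jest 5 elementów w koszyku.'
t( [ '0', '', '' ], count, place ); // same result
```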

@Reinmar
Member Author

Reinmar commented Mar 3, 2016

I realised that template tags won't accept additional params (e.g. the context, or the plural form if we decide to implement it).

@pjasiun

pjasiun commented Mar 3, 2016

I think that for a different context we could have a separate string (t`Button{context:toolbar}`), and for plurals we should have a separate string anyway, because we will not be able to predict how the string will look in every language.

But you can always call t() directly if you have a special case which cannot be handled by a template tag.

@AnnaTomanek

I'll leave implementation details to you guys. I assume that the implementation will also take into account creating a tool that converts whatever we come up with into whatever format Transifex needs. In any case, check the Transifex documentation (e.g. http://docs.transifex.com/formats/) to make sure this will not be too much of a hassle for us later.

We definitely need to provide the context for any language string that we create - something that we now have in our meta files. When taken out of context (i.e. viewed in the translation tool and not in the working editor) language strings are often ambiguous and do not mean much to translators.

Also remember that some of our collaborators do not really use CKEditor at all. They are professional or hobbyist translators who help us for many different reasons, and we are grateful for their help. We can't make their lives difficult by providing ambiguous strings; this would only backfire on the quality and quantity of the translations we get.

@Reinmar
Member Author

Reinmar commented Mar 3, 2016

@pjasiun

I think it's not going to work, or at least it has a high chance of causing issues. The t() function will need to handle the params in the order in which a template would pass them, which may make it unusable for manual calls. Plus, we would need hacks like {context:toolbar}.

@Reinmar
Member Author

Reinmar commented Mar 3, 2016

One thing that @AnnaTomanek wrote worries me. If most (if not all) translation strings need the context, we cannot pass that context to t(), because the code would be unreadable. Those descriptions need to be kept outside, most likely in some meta file. We need to pass the context to t() only when it's needed to identify a language string, i.e. where it's used twice with different meanings.

@fredck
Copy link
Contributor

fredck commented Mar 3, 2016

Those descriptions need to be kept outside, most likely in some meta file.

Agreed, and this meta file could be updated by the builder when it updates the language files for translators.

@fredck
Contributor

fredck commented Mar 3, 2016

I think going with plain t() function calls would make it simpler and clearer. Template strings don't exactly fit our needs here.

@goba

goba commented Mar 3, 2016

Why do you need context if not to disambiguate?

@Reinmar
Member Author

Reinmar commented Mar 3, 2016

I'm not sure what data is sufficient for translators. I'm referring to what we have in CKEditor 4: https://github.com/ckeditor/ckeditor-dev/blob/master/dev/langtool/meta/ckeditor.core/meta.txt

If we need all these descriptions, we need to keep them in a separate file. Putting them in t() calls could make the calls awfully long, and we would need to repeat those descriptions every time t() is used.

So this is one type of context: the human-readable, long descriptions of language strings.

Another type of context I see is where a single string is used with two meanings. Then we need to clarify which meaning the t() call refers to. Perhaps we can do that in many ways, but the one I initially thought of was adding another word, i.e. a short context:

t( 'button' );
t( 'button', 'clothing' ); 

// or some other format like:
t( 'button[ctx:clothing]' )

This is another type of context.
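Both short-context notations can be normalised to the same (text, context) pair, so the choice is mostly about readability and what the builder has to parse. The following is a minimal sketch that assumes the inline marker always sits at the end of the string. `parseMessage` and its handling of the `[ctx:...]` syntax are hypothetical.

```javascript
// Hypothetical normaliser: accepts both proposed forms, t( 'button', 'clothing' )
// and t( 'button[ctx:clothing]' ), and returns the same { text, context } pair.
const CTX_SUFFIX = /^([\s\S]*)\[ctx:([^\]]+)\]$/;

function parseMessage( message, context ) {
	const match = CTX_SUFFIX.exec( message );

	return match ? { text: match[ 1 ], context: match[ 2 ] } : { text: message, context };
}

parseMessage( 'button', 'clothing' );   // { text: 'button', context: 'clothing' }
parseMessage( 'button[ctx:clothing]' ); // { text: 'button', context: 'clothing' }
parseMessage( 'button' );               // { text: 'button', context: undefined }
```

The inline form keeps each call to a single string argument. The trade-off is that the marker has to be stripped before the text is shown to translators or users.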

I wonder if we could keep t() calls short, DRY, and still provide enough information for translators.

@goba

goba commented Mar 3, 2016

Right, the button/clothing example is good. I don't think you would need context for most strings in that meta.txt, since they are long. If you don't provide context (as in t('button')) because providing it would be too long, then you have no context. So if you want to map it to something with context, you would need to pick an intermediary identifier, which you seem to be trying to get away from :)


@fredck
Contributor

fredck commented Mar 4, 2016

When it comes to the "plural" problem, I thought about a sample case we face with our dear, lovable Polish. For those who didn't read it elsewhere, here is an example:

  • en: 1 file, 2 files, 3 files ...
  • pl: 1 plik, 2/3/4 pliki, 5 plików ... 11 plików, 12 plików ... 21 plików, 22/23/24 pliki, 25 plików ...

While in English we have 2 cases with a simple rule, in Polish we have 3 cases with a complex algorithm to define which one to use (why not?!).

My proposal is making t() smarter so we can have t( number, '%n file', '%n files', 'context' ) (first param number + 2 mandatory plural forms in English + optional context) which uses a language specific function to determine which option to use. Each language implementation would be able to define its own function, defaulting to the English (most common) implementation.

Then, the source language file would end up with an entry like this (translators will have to be trained to do this right):

"%n file | %n files" : "%n plik | %n pliki | %n plików"

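Here is a sketch of the t( number, '%n file', '%n files' ) idea. It assumes the " | "-separated entry format above and a per-language function that returns the index of the form to use. `createPluralT` and `pluralForms` are hypothetical names. The Polish rule is the commonly used gettext formula, and the optional context param is left out for brevity.

```javascript
// Per-language plural selectors (gettext-style formulas); English is the default.
const pluralForms = {
	en: n => ( n == 1 ? 0 : 1 ),
	pl: n => ( n == 1 ? 0 : n % 10 >= 2 && n % 10 <= 4 && ( n % 100 < 10 || n % 100 >= 20 ) ? 1 : 2 )
};

// Hypothetical plural-aware t( number, singular, plural ): the dictionary key
// is "singular | plural" and the value lists all forms of the target language
// separated by " | ". The optional context parameter is omitted here.
function createPluralT( lang, translations ) {
	const selectForm = pluralForms[ lang ] || pluralForms.en;

	return function t( number, singular, plural ) {
		const key = singular + ' | ' + plural;
		const translated = Object.prototype.hasOwnProperty.call( translations, key );

		// Untranslated strings fall back to the English forms and rule.
		const forms = translated ? translations[ key ].split( ' | ' ) : [ singular, plural ];
		const index = ( translated ? selectForm : pluralForms.en )( number );

		// Guard against a translation that lists fewer forms than expected.
		return forms[ Math.min( index, forms.length - 1 ) ].replace( /%n/g, String( number ) );
	};
}

const t = createPluralT( 'pl', { '%n file | %n files': '%n plik | %n pliki | %n plików' } );

t( 1, '%n file', '%n files' );  // '1 plik'
t( 23, '%n file', '%n files' ); // '23 pliki'
t( 25, '%n file', '%n files' ); // '25 plików'
```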
@goba

goba commented Mar 4, 2016

Right, you definitely need a way to encapsulate that logic for looking up the right variant. Drupal does this with a lookup table generated from the math formula, because we do not allow translations to have their own logic. But if you allow them to have logic, or ship with a set of logic for the languages you explicitly support, that works. As for the structure of the source/target, it would be best to ask your translators what would work for them. Drupal uses gettext .po files for translation transport. These have a standard way to represent plural variants, and our UIs natively support them as well. Since you are making up your own, more compact format, it would be best to verify it with the people you are already working with.


@Reinmar
Member Author

Reinmar commented Jan 10, 2017

The implementation of the translation service is being led in ckeditor/ckeditor5#387, and the discussion started in this topic continues there.
