Localizing strings (translations) #136
Comments
Pinged @goba, I'm sure he'll have thoughts on this. |
Note that what you call "disambiguation", Drupal calls "context". i.e. the context in which a string is used. |
Indeed, Drupal (and gettext in general) has a notion of context, which is a disambiguation string that explains the meaning of (usually) shorter ambiguous strings. This usage of t() is very similar to Drupal's. So is your idea of discouraging people from passing variables to t(). I'm not sure what exactly you are gaining vs. the translation system in previous versions? |
I think that the most significant improvement is the size of the build. We'll get rid of language string ids (which in CKE4 are e.g. "invalidHtmlLength"), but more importantly we'll be able to get rid of unused language strings. The latter will be very important, because most likely there will be one language file for a whole package (CKEditor 5 will be built of multiple npm packages) and each package can contain multiple features. The developer will be able to choose the features he/she wants and it will be done on a much more granular level than in CKEditor 4. E.g. there will be a |
I agree... "context" is the right way to call it. |
Sounds good! Did you think about singular/plural strings? E.g. 'Uploaded 3 images', 'Uploaded 1 image', etc. |
@goba, how are you guys handling this? |
What about tagged template strings? The code may look like this:
In the dev version In the build version it could be replaced with: It is only a matter of code style, but since tagged template strings are available in ES6, I prefer to have |
So Drupal has a dedicated formatPlural(count, '1 item', '@count items') for this. Then languages may have any number of plurals. See eg. numbers ending in plurals in http://www.russianlessons.net/lessons/lesson11_main.php. So we have a representation of the rules for plurals AKA plural formulas (see http://localization-guide.readthedocs.org/en/latest/l10n/pluralforms.html) and translate the combination of the singular+plural original English to a variable number of translated strings. Then pick based on the language appropriate rules from that list. |
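To make the plural-formula idea concrete, here is a minimal sketch in JavaScript. It is not Drupal's implementation: the `pluralRules` and `pluralTranslations` tables and this `formatPlural()` signature (with a leading language argument) are assumptions made for illustration. The two formulas are the usual gettext ones for English and Russian.

```javascript
// Hypothetical table of plural formulas: each maps a count to a form index.
const pluralRules = {
    en: n => ( n === 1 ? 0 : 1 ),
    ru: n => {
        if ( n % 10 === 1 && n % 100 !== 11 ) return 0;
        if ( n % 10 >= 2 && n % 10 <= 4 && ( n % 100 < 10 || n % 100 >= 20 ) ) return 1;
        return 2;
    }
};

// Hypothetical translations: the English singular is the key,
// the value holds as many forms as the target language needs.
const pluralTranslations = {
    ru: {
        '1 item': [ '@count элемент', '@count элемента', '@count элементов' ]
    }
};

function formatPlural( lang, count, singular, plural ) {
    const translated = ( pluralTranslations[ lang ] || {} )[ singular ];
    // Without a translation, fall back to the English strings and the English rule.
    const forms = translated || [ singular, plural ];
    const rule = translated ? pluralRules[ lang ] : pluralRules.en;

    return forms[ rule( count ) ].replace( '@count', String( count ) );
}

console.log( formatPlural( 'en', 3, '1 item', '@count items' ) );
console.log( formatPlural( 'ru', 21, '1 item', '@count items' ) );
```

The point of the design is that the calling code only ever supplies the two English forms; how many forms the translation has, and which one is picked, is decided per language.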
@pjasiun: That may be a good idea. But I don't get the build format that you proposed:
We first need the language string number and later the values required to replace |
is exactly the same as
|
I realised that the templates won't accept additional params (e.g. the context, or the plural form if we'll decide to implement it). |
I think that for a different context we could have a separate string ( But you can always call |
I'll leave implementation details to you guys. I assume that the implementation will also take into account creating a tool that will be converting whatever we come up with into whatever Transifex needs. In any case, check out Transifex documentation to make sure this will not be too much of a hassle for us later, e.g. in http://docs.transifex.com/formats/ We definitely need to provide the context for any language string that we create - something that we now have in our meta files. When taken out of context (i.e. viewed in the translation tool and not in the working editor) language strings are often ambiguous and do not mean much to translators. Also remember that some of our collaborators do not really use CKEditor at all - they are professional/hobbyist translators who help us for many different reasons and we are grateful for their help - we can't make their lives difficult by providing ambiguous stuff, this will only backfire on the quality and quantity of translations that we get. |
I think it's not gonna work or at least have a high chance of causing issues. The |
One thing that @AnnaTomanek wrote worries me – if most (if not all) translation strings need the context, we cannot pass that context to |
Agree and this could be updated by the builder when updating language files for translators. |
I think going with plain |
Why do you need context if not to disambiguate?
|
I'm not sure what is the sufficient data for translators. I'm referring to what we have in CKEditor 4: https://github.com/ckeditor/ckeditor-dev/blob/master/dev/langtool/meta/ckeditor.core/meta.txt If we need all these descriptions we need to keep those in a separate file, because putting those in So this was one type of the context – the human-readable long descriptions of language strings. Another context I see is where a single string is used with two meanings – then we need to clarify to which meaning the
This makes another type of a context. I wonder if we could keep |
Right, the button, clothing example is good. I don't think you would need On Thu, Mar 3, 2016 at 4:20 PM, Piotrek Koszuliński <
|
When it comes to the "plural" problem, I thought about a sample case we face with our dear lovable Polish. For those who didn't read it elsewhere, an example:
While in English we have 2 cases with a simple rule, in Polish we have 3 cases with a complex algorithm to define which one to use (why not?!). My proposal is making Then, the source language file would end up with an entry like this (translators will have to be trained to do this right):
|
Right, you definitely need a way to encapsulate that logic for looking up On Fri, Mar 4, 2016 at 1:40 PM, Frederico Caldeira Knabben <
|
The implementation of the translation service is being led in ckeditor/ckeditor5#387 and the discussions started in this topic are continued there. |
We had a short discussion with @fredck and we came up with something like this.
There will be an `editor.t()` function, also available in the View class (it will be injected into the constructor). The function will accept two strings:
Usage:
Strings identification
Unlike in CKEditor 4, the language strings will not be identified by ids. Instead, the developer will provide the whole text when calling `t()`:
In the source language files, the translations will be referenced by those full original texts:
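As a rough illustration of lookup by full original text (a sketch only, not the actual CKEditor 5 API), the following assumes a hypothetical `createT()` helper and a hypothetical Polish language file whose keys are the English strings:

```javascript
// Hypothetical source language file for Polish: keys are the full English texts.
const plTranslations = {
    'Bold': 'Pogrubienie',
    'Insert image': 'Wstaw obraz'
};

// Hypothetical helper: returns a t() bound to one language's translations.
function createT( translations ) {
    return function t( text ) {
        const hasTranslation = Object.prototype.hasOwnProperty.call( translations, text );

        // Fall back to the original English text when no translation exists.
        return hasTranslation ? translations[ text ] : text;
    };
}

const t = createT( plTranslations );

console.log( t( 'Bold' ) );
console.log( t( 'Some untranslated text' ) );
```

Because the English text is the key, an untranslated string degrades gracefully to its English original instead of showing an id.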
Building
The builder will scan for `t()` usage and will replace them with simpler `t( num )` calls. That will better optimise the contents of language files, because now they won't need to contain duplicated strings (in English and the target languages) as well as no ids:
Those translations will then be referenced from code as `t( 0 )` and `t( 1 )`.
If the disambiguation comment was provided, it will of course be treated as another language entry, generating a different number.
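The build step described above could be sketched roughly as follows. This is an assumption-laden illustration: `buildTranslations()` is a hypothetical name, it only handles single-quoted literals without a context argument, and a real builder would more likely walk a parsed syntax tree than use a regular expression.

```javascript
// Hypothetical build step: replaces t( 'text' ) with t( num ) and emits
// a language file (an array) that contains only the strings found in the code.
function buildTranslations( source, translations ) {
    const ids = new Map(); // original text -> numeric id
    const callPattern = /\bt\(\s*'([^']*)'\s*\)/g;

    const code = source.replace( callPattern, ( match, text ) => {
        if ( !ids.has( text ) ) {
            ids.set( text, ids.size );
        }

        return `t( ${ ids.get( text ) } )`;
    } );

    // Index in the array equals the id; untranslated strings keep the English text.
    const languageFile = Array.from( ids.keys(), text => translations[ text ] || text );

    return { code, languageFile };
}

const result = buildTranslations(
    "button.label = t( 'Bold' ); panel.title = t( 'Italic' ); other.label = t( 'Bold' );",
    { Bold: 'Pogrubienie', Italic: 'Kursywa', Unused: 'Nieużywane' }
);

console.log( result.code );
console.log( result.languageFile );
```

Note that the `Unused` entry never reaches the output, which is the size gain discussed in this thread, and that the scan can only see string literals.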
Thanks to the builder mechanism we will be able to produce optimised language files with only those strings which are required. However, it means that `t()` must always be called with a string. You cannot do `t( myVariable )`.
Disambiguation
The disambiguation comment is needed for cases where we notice that some string already exists in the translations, but we now want to use it with a different meaning. This is unlikely in the case of full sentences but may happen in the case of single words like "close" or "bold".
The developer who plans to use some string should check if such a string is already used and, if so, verify which meaning has a higher chance of being the default. The alternative (less frequent) meanings should be disambiguated.
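One way to picture this: the default meaning is stored under the plain text, and each alternative meaning gets its own entry keyed by text plus disambiguation string. The key format (`text|context`), the table contents and the function name `tWithContext()` below are all assumptions for illustration, not a decided format.

```javascript
// Hypothetical Polish entries: the default meaning uses the bare text as the key,
// the less frequent meaning is stored under "text|disambiguation".
const contextTranslations = {
    'Close': 'Zamknij',                   // default: close a dialog
    'Close|not far away': 'Blisko'        // alternative meaning
};

function tWithContext( text, context ) {
    const key = context ? `${ text }|${ context }` : text;
    const hasTranslation = Object.prototype.hasOwnProperty.call( contextTranslations, key );

    // Fall back to the original English text when the entry is missing.
    return hasTranslation ? contextTranslations[ key ] : text;
}

console.log( tWithContext( 'Close' ) );
console.log( tWithContext( 'Close', 'not far away' ) );
```

Since the disambiguated entry is a separate key, the builder would naturally assign it a separate number.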
Additional params
It sometimes happens that we need to interpolate values into language strings, e.g. "There are %0 items.". `t()` could accept those values as the last param in the form of an array (to distinguish it from the disambiguation string):
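A minimal sketch of that interpolation, assuming `%0`, `%1`, ... placeholders index into the array. The helper name `interpolate()` is hypothetical, and the translation lookup itself is left out so that only the argument handling is shown.

```javascript
// Replaces %0, %1, ... with the corresponding array items.
// Placeholders without a matching value are left untouched.
function interpolate( text, values = [] ) {
    return text.replace( /%(\d+)/g, ( match, index ) => {
        const i = Number( index );

        return i < values.length ? String( values[ i ] ) : match;
    } );
}

// Sketch of the argument handling only: an array is always the values,
// so an optional disambiguation string before it cannot be confused with them.
function t( text, ...rest ) {
    const values = rest.find( Array.isArray ) || [];

    return interpolate( text, values );
}

console.log( t( 'There are %0 items.', [ 5 ] ) );
console.log( t( 'There are %0 items.', 'items in a list', [ 5 ] ) );
```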