Localisation support #39

Closed
desplesda opened this issue Apr 11, 2016 · 6 comments

@desplesda
Collaborator

So. I've been thinking about how Yarn can support localisations, and this is the most straightforward approach that I've come up with so far. Big thanks to Anna Kipnis of Double Fine and Ron Gilbert of Terrible Toybox for their advice that helped to figure this out.

I'm very interested in comments and discussion on this, because this is a large set of changes to Yarn as a language.

Executive Summary

This issue outlines a system for supporting localisation in Yarn. It includes both a high-level overview of how such a system can work, and the syntax changes needed in the Yarn language.

I propose that localisation work by providing a tool that moves out all localisable text - both lines, and localisable string values - into a per-language file, and replaces them with unique linecodes. Additional syntax should be added to the Yarn language to support this.

Localising lines

Given the following script:

<<if $knows_plot == true>>
    I know about the villain's plot!
<<endif>>

Localisation would involve converting the script into something like this:

<<if $knows_plot == true>>
    {0201}
<<endif>>

In this case, {0201} is a reference to a line that's kept in a separate file. The format of the file doesn't really matter, as long as you can use it and the number "0201" to get back the string "I know about the villain's plot!". It could be a database of some kind, or an Excel spreadsheet. We'll come back to this in a bit.

We can very easily create a tool that identifies lines, replaces them with unique linecodes, and exports the text to a file. However, it isn't without its problems.

The serious disadvantage of doing this is that it adds a burden to the writer if they need to make changes to the script. With the text of the line separated from the logic of the Yarn program, it adds the hassle of switching between the script and the exported line file. We can mitigate this a little by adding a comment containing the text, like so:

<<if $knows_plot == true>>
    {0201} // I know about the villain's plot!
<<endif>>

Running the localisation system would update any comment like this to ensure that the comment attached to the line code correctly represents what's in the line file.
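To make the idea concrete, here is a minimal Python sketch of what such a tool could do. It is not how Yarn Spinner works; the function name `extract_lines`, the four-digit sequential codes, and the rule that any non-blank line not starting with `<<` is a dialogue line are all assumptions made for illustration. It does not handle lines that have already been converted.

```python
def extract_lines(script: str, first_code: int = 201):
    """Replace each dialogue line with a linecode and a trailing comment.

    Returns (new_script, table), where table maps linecode -> original text.
    Hypothetical helper; the parsing rules here are assumptions.
    """
    table = {}
    out = []
    next_code = first_code
    for raw in script.splitlines():
        stripped = raw.strip()
        # Assumption: blank lines and <<commands>> are not localisable text.
        if not stripped or stripped.startswith("<<"):
            out.append(raw)
            continue
        indent = raw[: len(raw) - len(raw.lstrip())]
        code = f"{next_code:04d}"
        next_code += 1
        table[code] = stripped
        # Keep the original text as a comment so the writer can still read it.
        out.append(f"{indent}{{{code}}} // {stripped}")
    return "\n".join(out), table


script = (
    "<<if $knows_plot == true>>\n"
    "    I know about the villain's plot!\n"
    "<<endif>>"
)
new_script, table = extract_lines(script)
print(new_script)
```

The `table` dictionary is what would be written out to the per-language file, in whatever format the host application prefers.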

Localising lines with embedded values

So, that's simple lines. Let's now talk about lines that incorporate other values. Currently, we don't have any syntax for doing this at all; I've added notes in #25 to discuss that.

Using the markup proposed in #25, let's say we have a line that looks like this:

He's going to [$method_of_destruction] the [$target] on [$day_of_week]!

Pretty much any localisation system that permits embedding values in a line does so in a way that allows for re-ordering the embedded values. This matters for languages that order the words of a sentence differently from the original language.

Localising a line therefore means converting the above example into something like this, for English:

He's going to $0 the $1 on $2!

And into this for German:

Er wird das $1 am $2 $0!

(Please excuse the potentially incorrect translation - I'm not a native German speaker.)

In the case of this line, we need to store in the per-language file both the line, and all of the localised values used in the line.

Given this, and the fact that the values in this example come from variables at runtime, we need a way to represent this linecode. I propose that it look like this:

{0202,$method_of_destruction,$target,$day_of_week} // He's going to $0 the $1 on $2!

In this case, we're specifying line 0202, and providing a comma-separated list of the variables used to supply its embedded values. The position of these variables is important, since it's used to specify which variable is used for each of the numbered values. A comment is also added by the localisation tool to indicate both the original line, and the context of the variables used in it.
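As a rough Python sketch of how a runtime might resolve a linecode like this: the names `parse_linecode`, `format_line`, and `render` are hypothetical, the German variable values are made up for the example, and the regular expression assumes linecodes are digits followed by zero or more `,$name` entries.

```python
import re

# Assumed shape: {digits} optionally followed by ,$name entries.
LINECODE = re.compile(r"\{(\d+)((?:,\$\w+)*)\}")


def parse_linecode(text: str):
    """Return (code, variable_names) for the first linecode found in text."""
    match = LINECODE.search(text)
    if match is None:
        raise ValueError("no linecode found")
    names = [name for name in match.group(2).split(",") if name]
    return match.group(1), names


def format_line(template: str, values: list[str]) -> str:
    """Substitute $0, $1, ... with values by position, in one pass."""
    return re.sub(r"\$(\d+)", lambda m: values[int(m.group(1))], template)


def render(text: str, lines: dict[str, str], variables: dict[str, str]) -> str:
    """Look up the localised template and fill it from the variable store."""
    code, names = parse_linecode(text)
    return format_line(lines[code], [variables[name] for name in names])


german = {"0202": "Er wird das $1 am $2 $0!"}
variables = {
    "$method_of_destruction": "zerstören",  # example values, not from the proposal
    "$target": "Stadion",
    "$day_of_week": "Dienstag",
}
print(render("{0202,$method_of_destruction,$target,$day_of_week}", german, variables))
```

Because the template refers to values by position rather than by order of appearance, each language file is free to place them wherever its grammar needs them.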

Localising string values in expressions

We've just talked about embedding string variables in localised lines. However, the values used in these variables need to be localised as well.

The syntax for this could look like this:

<<set $target = {"the stadium"}>>

After localisation processing, the string is extracted and replaced with a code. Additionally, a comment is appended to the line, to indicate what string is represented.

<<set $target = {0541}>> // "the stadium"

By using the brace syntax in the same way as is used in lines, we reinforce the idea that numbers in braces represent localised strings, and their value will change based on which language file is being used.
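The extraction step for string values could be sketched along these lines in Python. The helper `extract_strings` and its pattern are assumptions for illustration only; in particular it does not handle escaped quotes inside the string.

```python
import re

# Assumed syntax: a double-quoted string wrapped in braces, e.g. {"the stadium"}.
STRING_LITERAL = re.compile(r'\{"([^"]*)"\}')


def extract_strings(line: str, first_code: int, table: dict[str, str]) -> str:
    """Replace each {"..."} in line with a linecode and append a comment.

    Extracted strings are recorded in table (linecode -> text).
    """
    found = []
    next_code = first_code

    def replace(match):
        nonlocal next_code
        code = f"{next_code:04d}"
        next_code += 1
        table[code] = match.group(1)
        found.append(match.group(1))
        return "{" + code + "}"

    new_line = STRING_LITERAL.sub(replace, line)
    if found:
        # Keep the original text visible to the writer as a comment.
        new_line += " // " + ", ".join(f'"{text}"' for text in found)
    return new_line


table = {}
print(extract_strings('<<set $target = {"the stadium"}>>', 541, table))
```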

Generating line files

Earlier, I mentioned that the specific format used to store lines doesn't actually matter. There are several formats that exist for storing localised text; these range from simple CSV and Excel spreadsheets to more sophisticated solutions like XLIFF.

It's my intent that Yarn Spinner try to stay within as limited a domain as it can. Users of Yarn may already be using some kind of localisation system that they're comfortable with, and we don't need to try to invent a new one.

I think that it would be OK for Yarn Spinner to put the burden of loading localised strings given a linecode onto the host application. Much as the Dialogue class requires that the host provide functions for implementing variable storage, we could do a similar thing for lines.

We could then provide a simple reference implementation that handles common cases like CSV files, for users who aren't already using a localisation system (which I expect will be most users), while more sophisticated games could plug in their own or a 3rd-party localisation system.
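To illustrate the shape of that split, here is a small Python sketch. The `LineProvider` interface, the `CsvLineProvider` class, and the two-column CSV layout (linecode, text) are all hypothetical; none of this is an existing Yarn Spinner API.

```python
import csv
import io
from typing import Protocol, TextIO


class LineProvider(Protocol):
    """What the host application would supply: linecode in, localised text out."""

    def get_line(self, linecode: str) -> str: ...


class CsvLineProvider:
    """Reference-style implementation backed by a two-column CSV (assumed layout)."""

    def __init__(self, source: TextIO):
        # Linecodes are kept as strings so leading zeros survive.
        self._lines = {row[0]: row[1] for row in csv.reader(source) if len(row) >= 2}

    def get_line(self, linecode: str) -> str:
        try:
            return self._lines[linecode]
        except KeyError:
            raise KeyError(f"no localised text for linecode {linecode}") from None


# One file per language; an in-memory file stands in for a real one here.
english = CsvLineProvider(io.StringIO('0201,"I know about the villain\'s plot!"\n'))
print(english.get_line("0201"))
```

A game with its own localisation pipeline would implement `get_line` against that pipeline instead, and the dialogue system would never need to know which format the text came from.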

Disadvantages

Linecodes can be opaque. We may want to look at letting users define their own linecodes - perhaps a more human-readable representation.

The primary disadvantage of this system is that changing a localised line requires either updating that line in each of the localisation files, or blowing away the entire line and generating a new entry in each of them. However, this is a disadvantage that pretty much any localisation system encounters.


So, those are my thoughts. What do you all think? Is there a use case that I've missed? Do you have any thoughts on how this could be better?

@fpiesche

Worth noting that string substitution in general is asking for trouble when it comes to localization - in a lot of non-English languages, words change depending on their context, most frequently because of the grammatical gender of the associated nouns. This caused no end of headaches on my previous games translation job - we'd have to constantly go back to the script folk and ask them to either rewrite lines or add enumerations to the script, so we could pick a different string from the enumeration to match the word being substituted.

As an example: "A hungry {cat}" vs "A hungry {dog}" in German are "Eine hungrige {Katze}" and "Ein hungriger {Hund}" - and so the entire base phrase would need to be switched out depending on which animal is substituted in. This would also happen with your "Er wird das $1 am $2 $0" example - the base phrase here would have to change depending on the gender of the target noun and the person doing the destroying, and suddenly we're left with 9 possible base phrases ("Er/sie/es wird den/die/das $1 am $2 $0").

I've never put enough thought into this to see if there's a sensible solution to this problem on the tech side short of generating all permutations and translating them separately, and to a degree this is down to the person writing the script and the translator to keep in mind and figure out together, but I figured it might be worth bringing up.

@desplesda
Collaborator Author

Those are some really good points. Thanks!

Another problem with embedded values (which isn't directly related to localization) is that embedding values makes it extremely difficult to generate a voiceover script.

Do you think that it's worth restricting embedded values to those that Yarn Spinner can be aware of, and use that to generate permutations? For example, Yarn Spinner could refuse to localize a line that contains an embedded variable unless that variable is only ever assigned to string constants (and never functions) in the entire program...
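As a rough illustration of what generating permutations could look like, here is a Python sketch. It assumes the set of constant strings each variable can hold is already known, and the name `enumerate_variants` is made up for this example.

```python
import re
from itertools import product


def enumerate_variants(template: str, candidates: list[list[str]]) -> list[str]:
    """Expand a template into every combination of its possible values.

    candidates[i] lists the constant strings that placeholder $i may take.
    """
    variants = []
    for combo in product(*candidates):
        # Single-pass substitution so inserted text is never re-scanned.
        text = re.sub(r"\$(\d+)", lambda m, c=combo: c[int(m.group(1))], template)
        variants.append(text)
    return variants


for variant in enumerate_variants("A hungry $0", [["cat", "dog"]]):
    print(variant)
```

Each generated variant could then be given its own entry in the line file and translated (or recorded) as a complete sentence, at the cost of the number of entries growing with the product of the candidate counts.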

@fpiesche

Assigning a variable to functions might actually even be a valid way of working around the problem (by returning an appropriate string for the line it's on... somehow? I'm not sure how YS actually works under the hood) so I'd say keep that avenue open.

As one option, would it be possible as it stands to translate lines as functions (plausibly with variables passed through as parameters), to make it possible for translations to perform complex operations to generate the "proper" sentence? If not, having the ability to do this would still require the developer and translator to work together fairly closely but would be easier than making per-language exceptions in the script for the source language...

@desplesda
Collaborator Author

Honestly, I'm starting to think that localizing embedded values is an entirely separate problem that should be solved after the infrastructure required to support localising complete lines is added.

We could then gradually introduce support for these more complex situations after shaking out issues with the fundamental approach to how we're dealing with localization as a whole.

@desplesda
Collaborator Author

For those interested: this feature is now under active development.

@desplesda
Collaborator Author

This feature shipped in v1.1, so I'm closing it now. Just over four years since it was opened. Thanks to everyone who was part of this conversation ❤️
