Support translations for 3rd party mods #505

olanti-p · 2021-05-04T20:27:58Z

Summary

SUMMARY: Infrastructure "New translation system with support for 3rd party mods"

Purpose of change

Allow 3rd party mods to ship their own MO files (CleverRaven/Cataclysm-DDA#25566).
Fix #495.
Fix plural forms for some languages on Android.

Describe the solution

Roll out custom runtime localization system (dubbed cata_libintl) that is compatible with gettext MO files.

Key differences from currently used GNU libintl:

Does not support translation domains.
BN only uses "default" translation domain, and rewriting toolchain to support multiple domains and rewriting the code to actually use them would be a huge and bugprone undertaking.
Does not support encoding conversion, operates on UTF-8 only.
BN only uses UTF-8, so that's fine.
Does not depend on locale / environment variables.
Dependence on locale / environment variables caused numerous problems with libintl in the past, as Cata aims to support multiple platforms and compilers. The most recent issue is Scaling factor x2 and x4 resets language to English #495, which seems to be caused by SDL fiddling with locale, and is absent when using cata_libintl.
Supports loading MO files from arbitrary paths.
Or from memory. This is currently used only in tests.
Supports loading multiple MO files into a single "domain".
To enable multi-MO support without the ability to use domains, the new system has to be able to merge MO files "on the fly". It is currently possible to do so manually via gettext utilities (use msgunfmt to decompile base game MO back into PO, then msgcat to concatenate with PO from a mod, then msgfmt to compile back into MO), but the process takes some time (20 seconds on my laptop) and has to be repeated each time game updates or mod list changes.

Also add documentation on how to translate 3rd party mods.

Describe alternatives you've considered

Call gettext utilities from within the game to merge MO files and use some caching algorithm to cut down on re-compilations.
This solution has multiple potential problems:
- It may require special logic to deal with dialect mismatch (I do know that en and en_US are treated differently by libintl, and the same probably applies to msgcat).
- It requires code for calling other process from within the game. The code that'll have to work on all platforms/compilers...
- It requires gettext utilities to be shipped with the game, or accessible to the game. Not a problem on Linux/OSX (may have to ask user to install from apt/brew/whatever), and a bit annoying on Windows (have to include them with the distribution), but I have no idea how it works on Android, or whether GNU gettext utilities have Android binaries.
- We'd have to keep track of various incompatibilities that are bound to show up on some platforms, and that's a whole separate headache I'd rather not deal with.
- On Android instead of libintl Cata uses custom fork of libintl-lite, which would be its own source of fun little inconsistencies someone would have to sort through.
Drop gettext completely and use some alternative localization system (e.g. tables of string id : translation as many games and software projects do), possibly with built-in merge function.
We have over 10 thousand strings in source code and over 30 thousand in JSONs. Moving from gettext would be a pain.
Use some existing lightweight library for working with gettext files.
The only one I've found that wouldn't be a pain to use is spirit-po, but it requires Boost.

Testing

All tests pass.

Benchmarks (Intel Core i5-3230M)

Filters: [libintl][benchmark]
N strings: 43625

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cata_test is a Catch v2.12.0 host application.
Run with -? for options

-------------------------------------------------------------------------------
bench_get_translated_string
-------------------------------------------------------------------------------
cata_libintl_test.cpp:620
...............................................................................

benchmark name                       samples       iterations    estimated
                                     mean          low mean      high mean
                                     std dev       low std dev   high std dev
-------------------------------------------------------------------------------
get_all_strings                                100             1      6,2616 s 
                                        62,7425 ms    62,4491 ms    63,1855 ms 
                                        1,80806 ms    1,33141 ms    2,43433 ms 
                                                                               

File size: 9789435 bytes
N strings: 43625
-------------------------------------------------------------------------------
bench_parse_mo
-------------------------------------------------------------------------------
cata_libintl_test.cpp:651
...............................................................................

benchmark name                       samples       iterations    estimated
                                     mean          low mean      high mean
                                     std dev       low std dev   high std dev
-------------------------------------------------------------------------------
parse MO                                       100             1     2,46908 s 
                                        24,7708 ms    24,7168 ms    24,9087 ms 
                                        408,696 us    196,054 us    809,159 us 
                                                                               

File size: 9789435 bytes
N strings: 43625
-------------------------------------------------------------------------------
bench_asssemble_trans_lib
-------------------------------------------------------------------------------
cata_libintl_test.cpp:669
...............................................................................

benchmark name                       samples       iterations    estimated
                                     mean          low mean      high mean
                                     std dev       low std dev   high std dev
-------------------------------------------------------------------------------
parse MO + assemble lib                        100             1     6,14146 s 
                                        61,5096 ms    61,4134 ms    61,6845 ms 
                                         642,66 us    403,011 us    960,402 us 
                                                                               

===============================================================================
test cases: 3 | 3 passed
assertions: - none -

Testing / benchmarking "in the wild" didn't indicate any significant changes in cpu/memory usage, and MO parsing + string sorting always took less than 50 ms. Reading from disk took 0..500 ms depending on os / hard drive / filesystem cache.

Additional context

Low-priority stuff for future PRs:

Allow translating mod name/description (either by "peeking" into the mod's MO and taking translation from there, or by adding translated name/description to modinfo.json)
Once testing is finished, get rid of libintl code and remove it from build dependencies

This makes both libintl and cata_libintl try to use MO files where dialect is specified and matches language (e.g. fr_FR instead of just fr) before falling back to using dialect-agnostic version.

Cherry-picked from commit CleverRaven/Cataclysm-DDA@35a2d9e Co-authored-by: Zhilkin Serg <ZhilkinSerg@users.noreply.github.com>

Coolthulhu · 2021-05-07T20:01:53Z

data/raw/languages.json

@@ -1,14 +1,14 @@
 [
  {
    "//": "See language.h for documentation",
-    "id": "en",
+    "id": "en_US",


I'll probably have to adjust the Transifex settings regarding language names after this.
From what I can tell, it won't allow en_US to count as en, even if there is no other en.

When dialect is specified in language id, the game picks up both dialect-specific and generic MO files (both old and new implementations), so it should be fine as is

Coolthulhu · 2021-05-07T20:12:45Z

tests/cata_libintl_test.cpp

+//
+// This test reaffirms the assumption that both Transifex's and GNU's plf expressions
+// produce same values for integer numbers.
+TEST_CASE( "gnu_transifex_rules_equal", "[libintl][i18n][.]" )


Dot because slow?

Yeah, 9-10 seconds.

Coolthulhu · 2021-05-07T20:15:41Z

tests/data/cata_libintl/empty_file.mo

@@ -0,0 +1 @@
+��


Not exactly empty. Is it supposed to be "empty MO" with magic numbers at start, truly empty file that ended up not-empty, or something else?

The logic here is that if the file is empty, or is smaller than 4 bytes, it's guaranteed to not be able to hold a magic number, ergo be a MO file.
I should've renamed it to not_a_mo_file.mo or something similar.

Coolthulhu · 2021-05-07T21:10:16Z

Got it working with the docs, but it may be useful to have a "tl;dr" variant of the docs somewhere.
Just the commands, without the context, so that a modder/translator can paste them instead of having to understand things first.

cainiaowu · 2021-05-08T09:38:17Z

For translation string collision, maybe the current extraction script need to be able to add some context like mod id.

olanti-p · 2021-05-08T10:25:27Z

For translation string collision, maybe the current extraction script need to be able to add some context like mod id.

I thought about that; in this case, the game would also have to know under which context to look for translations.

The main roadblock is how the game handles data loaded from JSON.

There is no such "context" right now that could allow to differentiate between whether an item has been loaded from one mod or another, or whether half of its strings come from a completely different mod (e.g. the item's name has been modified via copy-from, but the description stays the same).

Someone would have to go over every type of data loadable from JSON (items, monsters, mutations...) and make sure that every translatable string receives the proper context. This is a lot of work, and is bound to produce various obscure bugs noone would notice until much later.

The ideal solution would be to move away from gettext's "string is a translation lookup key and is also an English translation" to "JSON files contain lookup keys, localization files contain all translations including English" schema used by many games and software projects, so that mod authors could append mod id to all lookup keys in both JSON and localization data, but with 30k+ strings in JSON that would take a lot of work for us and break mod compatibility for everyone else.

vorpal-void · 2021-05-09T00:13:38Z

Got it working with the docs, but it may be useful to have a "tl;dr" variant of the docs somewhere.
Just the commands, without the context, so that a modder/translator can paste them instead of having to understand things first.

@Coolthulhu I've tried to do a little translation with current docs, everything went fine! The most "complex" part is to run scripts properly. Here is simplified steps that most of the modders will be able to follow, this one is for Windows:

Install PoEdit
Install Python
Open Windows CMD and type pip install polib
Copy extract_json_strings.py and dedup_pot_file.py from the lang folder of repository to the mod folder you wish to translate
In the folder mod create batch file with next content:

CD /D %~dp0
md lang
python extract_json_strings.py -i "%cd%" -o "%cd%\lang\extracted_strings.pot" --project Mod_Translation
python dedup_pot_file.py "%cd%\lang\extracted_strings.pot"

After launching this .bat file there will be extracted_strings.pot in the mod's lang directory, which can be edited with Poedit
[Poedit instructions here, there are simple enough already]

the .bat file could be distributed along the scripts too, if it's applicable; in that case step 4 should include that file as well, and step 5 will be obsolete.

olanti-p · 2021-05-09T21:15:11Z

the .bat file could be distributed along the scripts too

Yeah, that should make it easier.

Apparently full paths on Windows freak out Poedit (as : is normally used as file_name:line separator):

Screenshots

and it refuses to work with relative paths for some unknown to me reason:

More screenshots

Same with ./effects.json, .\effects.json, ..\effects.json, .\\effects.json and ..\\effects.json

So seeing as there is no difference either way I think we can use relative path .\ to keep references short and cut down on POT/PO sizes:

python extract_json_strings.py -i .\ -o lang\extracted_strings.pot --project Mod_Translation
python dedup_pot_file.py lang\extracted_strings.pot

Then tweak extract_json_strings.py to deduce project name from first encountered modinfo.json, and the final thing would look like:

@echo off
if not exist lang md lang
python extract_json_strings.py -i .\ -o lang\extracted_strings.pot
python dedup_pot_file.py lang\extracted_strings.pot
echo Done!
pause
@echo on

Alternatively, we could incorporate dedup_pot_file.py into extract_json_strings.py and have -i .\ -o lang\extracted_strings.pot be defaults if no arguments are specified. It'd be somewhat messy from code side, but then extraction would be as simple as "drag-and-drop extract_json_strings.py into mod folder -> double click to run".

Firestorm01X2 · 2021-05-10T12:56:44Z

Link to wiki page:
https://github.com/cataclysmbnteam/Cataclysm-BN/wiki/3rd-party-mods-translation

olanti-p added 8 commits May 4, 2021 22:57

Mark all i18n tests with new tag

e557db2

Add dialect to all language ids

6ce753a

This makes both libintl and cata_libintl try to use MO files where dialect is specified and matches language (e.g. fr_FR instead of just fr) before falling back to using dialect-agnostic version.

Extract lang id -> path candidates conversion

7416ffa

Custom gettext implementation

b9cb292

cata_libintl tests

1706847

Tests for changing language with new translation system

9569132

Allow reloading translations from debug menu

3b12fa7

Add docs on mod translation workflow

2e7aa1d

olanti-p force-pushed the mod-translate-final branch from 6d8c6c2 to 2e7aa1d Compare May 5, 2021 12:29

Aivean and others added 2 commits May 6, 2021 16:57

Enable Catch2 benchmarks

b9d177c

Cherry-picked from commit CleverRaven/Cataclysm-DDA@35a2d9e Co-authored-by: Zhilkin Serg <ZhilkinSerg@users.noreply.github.com>

Use benchmark facilities provided by Catch2

df47fa2

Coolthulhu reviewed May 7, 2021

View reviewed changes

Coolthulhu merged commit fbf53c7 into cataclysmbnteam:upload May 7, 2021

olanti-p deleted the mod-translate-final branch May 8, 2021 10:25

olanti-p mentioned this pull request May 18, 2021

Simplify mod string extraction #530

Merged

BrettDong mentioned this pull request Sep 16, 2021

In-house translation library implementation to support loading multiple MO files CleverRaven/Cataclysm-DDA#51363

Merged

olanti-p mentioned this pull request Nov 27, 2021

Prioritize active mod translation files over base game language file? #1142

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support translations for 3rd party mods #505

Support translations for 3rd party mods #505

olanti-p commented May 4, 2021 •

edited by scarf005

Coolthulhu May 7, 2021

olanti-p May 8, 2021 •

edited

Coolthulhu May 7, 2021

olanti-p May 8, 2021

Coolthulhu May 7, 2021

olanti-p May 8, 2021 •

edited

Coolthulhu commented May 7, 2021

cainiaowu commented May 8, 2021

olanti-p commented May 8, 2021 •

edited

vorpal-void commented May 9, 2021 •

edited

olanti-p commented May 9, 2021 •

edited

Firestorm01X2 commented May 10, 2021

		@@ -0,0 +1 @@
		��

Support translations for 3rd party mods #505

Support translations for 3rd party mods #505

Conversation

olanti-p commented May 4, 2021 • edited by scarf005

Summary

Purpose of change

Describe the solution

Describe alternatives you've considered

Testing

Additional context

Coolthulhu May 7, 2021

Choose a reason for hiding this comment

olanti-p May 8, 2021 • edited

Choose a reason for hiding this comment

Coolthulhu May 7, 2021

Choose a reason for hiding this comment

olanti-p May 8, 2021

Choose a reason for hiding this comment

Coolthulhu May 7, 2021

Choose a reason for hiding this comment

olanti-p May 8, 2021 • edited

Choose a reason for hiding this comment

Coolthulhu commented May 7, 2021

cainiaowu commented May 8, 2021

olanti-p commented May 8, 2021 • edited

vorpal-void commented May 9, 2021 • edited

olanti-p commented May 9, 2021 • edited

Firestorm01X2 commented May 10, 2021

olanti-p commented May 4, 2021 •

edited by scarf005

olanti-p May 8, 2021 •

edited

olanti-p May 8, 2021 •

edited

olanti-p commented May 8, 2021 •

edited

vorpal-void commented May 9, 2021 •

edited

olanti-p commented May 9, 2021 •

edited