# Translate constants file

Try to translate a constants file using the Google Translate API.


## Preparation

Apart from the setup steps described in the _README_ file, the following steps should be performed to prepare the environment:

1. ensure that the _INPUT_, _OUTPUT_, and _TEMP_ directories exist
2. ensure that _INPUT_ contains a mock _strings.py_ file: the one in the _lib/i18n_ folder **cannot** be used
3. if not already present, copy the above-mentioned _INPUT/strings.py_ file into the _TEMP_ directory
4. copy the most recent version of _strings_base.py_ from the _lib/i18n_ folder in the source tree to the _INPUT_ folder
5. copy the most recent version of all _strings_XX.py_ files (where _XX_ is a two-letter language code) from the _lib/i18n_ folder in the source tree to the _INPUT_ folder.
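The preparation steps can be scripted. The following is a minimal sketch, not part of the notebook: `prepare_environment` is a hypothetical helper, and its `source_i18n` argument is assumed to be the path of the _lib/i18n_ folder in your checkout of the source tree. The mock _strings.py_ still has to be written by hand, so the helper only checks that it exists.

```python
import shutil
from pathlib import Path


def prepare_environment(source_i18n, workdir="."):
    """Hypothetical helper automating the preparation steps.

    `source_i18n` is assumed to be the lib/i18n folder of a source checkout;
    returns the names of the files copied into INPUT.
    """
    work = Path(workdir)
    input_dir, output_dir, temp_dir = work / "INPUT", work / "OUTPUT", work / "TEMP"
    # step 1: make sure the working directories exist
    for d in (input_dir, output_dir, temp_dir):
        d.mkdir(parents=True, exist_ok=True)
    # step 2: the mock strings.py has to be provided by hand
    mock = input_dir / "strings.py"
    if not mock.exists():
        raise FileNotFoundError(f"{mock} is missing: provide a mock strings.py")
    # step 3: copy the mock into TEMP unless it is already there
    if not (temp_dir / "strings.py").exists():
        shutil.copy(mock, temp_dir / "strings.py")
    # steps 4 and 5: strings_base.py plus every strings_XX.py
    # (the real lib/i18n/strings.py does not match the glob, so it is skipped)
    copied = []
    for src in sorted(Path(source_i18n).glob("strings_*.py")):
        suffix = src.stem[len("strings_"):]
        if suffix == "base" or (len(suffix) == 2 and suffix.isalpha()):
            shutil.copy(src, input_dir / src.name)
            copied.append(src.name)
    return copied
```

Called from the notebook directory with the path of _lib/i18n_, it leaves the working directories in the state the workflow expects; rerunning it simply refreshes the copies in _INPUT_.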

This should be enough to proceed to the actual workflow.


## Workflow

The usual workflow is:

1. build the template with `build_template()` if needed, that is when the original base module has been modified or the template does not exist yet
2. build the dictionary of non-translatable patterns and its reverse, using the `find_untranslateable()` and `invert_untranslateable(untranslateable)` utilities: both dictionaries will be used later
3. use the untranslateable dictionary to create, in the _TEMP_ directory, a version of the base module (`build_notranslate_module(untranslateable)`) and of any existing translation (`build_notranslate_ex_module(untranslateable)`) where the non-translatable sequences are replaced by symbols that Google is unlikely to translate (mostly numeric, such as `0N0001`)
4. retrieve what has to be translated, that is the list of constant names, by calling `what_to_translate()` and keeping its result
5. translate the constants using `await do_translate(const_list)`, where `const_list` is the result of the above step (`do_translate` is a coroutine, so it must be awaited): this provides a dictionary of translations, most of which, hopefully, are taken from previous work that has already been verified and amended by a human
6. translate the template into a localized module file, named _OUTPUT/strings_XX.py_ where _XX_ is the target language (in lowercase), using `build_translation(translations, rev_untranslateable)`, where `translations` is the dictionary provided by the previous step and `rev_untranslateable` is the reverse dictionary of symbols built at step 2
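The symbol substitution of steps 2, 3 and 6 can be illustrated on a single string held in memory. This is a sketch only: the notebook works on whole files, while `find_symbols`, `protect` and `restore` are hypothetical names that mirror `find_untranslateable()`, `build_notranslate_module()` and the replacement loop in `build_translation()`. The regular expressions are the same priority list used in the prelude.

```python
import re

# same priority list as the notebook: [bold] spans first, then backticked
# placeholders, then bare placeholders
RE_NOXLATE_LIST_PRIORITY = [
    re.compile(r"(\[bold\].+\[/\])"),
    re.compile(r"(`\{[A-Za-z_]+(:[0-9]+)?\}`)"),
    re.compile(r"(\{[A-Za-z_]+(:[0-9]+)?\})"),
]


def find_symbols(text):
    """In-memory variant of find_untranslateable(): sequence -> symbol."""
    unxlate, i = {}, 0
    for regex in RE_NOXLATE_LIST_PRIORITY:
        for m in regex.finditer(text):
            if m.group(0) not in unxlate:
                i += 1
                unxlate[m.group(0)] = f"0N{i:04}"
    return unxlate


def protect(text, unxlate):
    # substitute in symbol order, which is the priority order
    for seq, sym in sorted(unxlate.items(), key=lambda kv: kv[1]):
        text = text.replace(seq, sym)
    return text


def restore(text, unxlate):
    # the translator may lowercase symbols ("0N0001" -> "0n0001")
    for seq, sym in unxlate.items():
        text = text.replace(sym, seq).replace(sym.lower(), seq)
    return text
```

For `"Hello {name}, see [bold]`{file}`[/] now"` the bold span becomes `0N0001` and `{name}` becomes `0N0003`, so only plain words and opaque symbols are sent to the translator. Because the bold span is replaced first, the `{file}` placeholder inside it never gets its own symbol in the protected text.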

Between steps 5 and 6 it might be useful to store the intermediate result using `save_translations(translations, untranslateable)`, which pickles both dictionaries into _TEMP/translations_XX.pickle_: the keys of the first dictionary always hold the `const_list` produced at step 4 and used at step 5, and the dictionary itself is the `translations` dictionary needed at step 6; the second dictionary holds the symbols used for non-translatable sequences, which can be reverted using `invert_untranslateable()`. Note that `load_translations()` reads back only the `translations` dictionary.
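Resuming from the saved state can be sketched as follows. `save_state` writes the same `(translations, untranslateable)` tuple as `save_translations()`, while `resume` is a hypothetical loader that, unlike `load_translations()`, also returns the reversed symbol dictionary, so everything step 6 needs comes out of a single call.

```python
import pickle


def save_state(path, translations, untranslateable):
    """Pickle both dictionaries as a tuple, like save_translations()."""
    with open(path, "wb") as f:
        pickle.dump((translations, untranslateable), f)


def resume(path):
    """Hypothetical loader returning everything needed to resume at step 6."""
    with open(path, "rb") as f:
        translations, untranslateable = pickle.load(f)
    # dict keys keep insertion order, i.e. the order of the original const_list
    const_list = list(translations)
    rev_untranslateable = {sym: seq for seq, sym in untranslateable.items()}
    return const_list, translations, rev_untranslateable
```

In the notebook, the result of `resume(os.path.join(CACHE_DIR, TRANSLATED_CACHE_FILE_PATTERN % LANG_DEST))` could then be passed to `build_translation(translations, rev_untranslateable)` without contacting the translation service again.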

See the **Example** cell below for a working example.

In [None]:
# Parameters

LANG_DEST = 'it'

# --------

INPUT_DIR = "INPUT"
OUTPUT_DIR = "OUTPUT"
CACHE_DIR = "TEMP"
TEMPLATE = "strings.template"

CONST_MODULE = "strings_base"
CONST_MODULE_FILE = f"{CONST_MODULE}.py"
TRANSLATED_MODULE_PATTERN = "strings_%s"
TRANSLATED_MODULE_FILE_PATTERN = f"{TRANSLATED_MODULE_PATTERN}.py"
TRANSLATED_CACHE_FILE_PATTERN = "translations_%s.pickle"

TIME_WAIT_TRANSLATION_MSEC = 250
LANG_SRC = 'en'

# --------

# Prelude

import os
import sys
import re
import time
import asyncio
import random
import pickle
import importlib
import pprint
import shutil

from googletrans import Translator


# we only look for ALL_CAPS constants
RE_VALIDNAME = re.compile("^[A-Z_][A-Z0-9_]*$")

# the pattern for non-translateable strings
# RE_NOXLATE = re.compile(r"((\{[A-Za-z_]+(:[0-9]+)?\})|(\[bold\]`\{[A-Za-z_]+(:[0-9]+)?\}`\[/\])|(`\{[A-Za-z_]+(:[0-9]+)?\}`))")
RE_NOXLATE = re.compile(r"((\[bold\].+\[/\])|(\{[A-Za-z_]+(:[0-9]+)?\})|(`\{[A-Za-z_]+(:[0-9]+)?\}`)|(ANY_[A-Z]+))")

RE_NOXLATE_LIST_PRIORITY = [
    re.compile(r"(\[bold\].+\[/\])"),
    re.compile(r"(`\{[A-Za-z_]+(:[0-9]+)?\}`)"),
    re.compile(r"(\{[A-Za-z_]+(:[0-9]+)?\})"),
]


# find untranslateable strings and build a dictionary of them, in order to
# get a file that has numeric symbols in place of strings that Google might
# be tempted to translate; the dictionary is then inverted to perform the
# correct swaps on the translated expressions. Note that sorting the
# mappings by symbol yields the substitution priority
def find_untranslateable():
    unxlate = {}
    i = 0
    pattern = "0N{i:04}"
    for regex in RE_NOXLATE_LIST_PRIORITY:
        with open(os.path.join(INPUT_DIR, CONST_MODULE_FILE), encoding="utf-8") as f:
            for line in f.readlines():
                for elem in regex.finditer(line):
                    k = elem.group(0)
                    if k not in unxlate:
                        i += 1
                        unxlate[k] = pattern.format(i=i)
    return unxlate

def invert_untranslateable(untranslateable):
    rev_unxlate = {}
    for k in untranslateable:
        rev_unxlate[untranslateable[k]] = k
    return rev_unxlate


# the core function: use Google Translate API to translate strings
async def translate(s, target=LANG_DEST):
    async with Translator() as translator:
        result = await translator.translate(s, src=LANG_SRC, dest=target)
        return result.text


# utilities to dump/restore intermediate work
def save_translations(translations, untranslateable, target=LANG_DEST):
    with open(os.path.join(CACHE_DIR, TRANSLATED_CACHE_FILE_PATTERN % target), 'wb') as f:
        pickle.dump((translations, untranslateable), f)

def load_translations(target=LANG_DEST):
    with open(os.path.join(CACHE_DIR, TRANSLATED_CACHE_FILE_PATTERN % target), 'rb') as f:
        translations, untranslateable = pickle.load(f)
    return translations


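# a utility to force a double-quoted string representation
def dqrepr(s):
    r = repr(s)
    if r.startswith('"'):
        # repr() already chose double quotes (s contains ' but not ")
        return r
    # swap the single-quote delimiters for double quotes, escaping any bare "
    # (repr() never escapes " inside a single-quoted literal)
    return '"' + r[1:-1].replace('"', '\\"') + '"'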


# rebuild template from the original `strings_base` module
def build_template():
    with open(os.path.join(INPUT_DIR, f"{CONST_MODULE}.py"), encoding="utf-8") as f:
        with open(os.path.join(INPUT_DIR, TEMPLATE), 'w', encoding="utf-8") as w:
            for line in f.readlines():
                if "=" in line and not line.strip().startswith("#"):
                    s, e = line.split('=', 1)
                    s = s.strip()
                    e = e.strip()
                    if RE_VALIDNAME.match(s):
                        # keep the f-string prefix; do not reuse `f`, which is the input file
                        prefix = "f" if e.startswith("f") else ""
                        new = f"{s} = {prefix}{{{s}}}\n"
                    else:
                        new = ""
                else:
                    lstrip = line.strip()
                    if lstrip.startswith("#") or lstrip.startswith("from "):
                        new = line.rstrip() + '\n'
                    elif lstrip == "":
                        new = "\n"
                    else:
                        new = ""
                w.write(new)


# a freshly built template is used to decide what has to be translated
def what_to_translate():
    res = []
    with open(os.path.join(INPUT_DIR, TEMPLATE), encoding="utf-8") as f:
        for line in f.readlines():
            # only assignment lines can define a constant
            if "=" not in line:
                continue
            s, _ = line.split('=', 1)
            s = s.strip()
            if RE_VALIDNAME.match(s):
                res.append(s)
    return res


# build a version of the module in the cache dir, with non-translateable
# strings replaced by symbols
def build_notranslate_module(untranslateable):
    with open(os.path.join(INPUT_DIR, f"{CONST_MODULE}.py"), encoding="utf-8") as f:
        text = f.read()
    rev = invert_untranslateable(untranslateable)
    keys = list(rev.keys())
    keys.sort()
    for k in keys:
        text = text.replace(rev[k], k)
    with open(os.path.join(CACHE_DIR, f"{CONST_MODULE}.py"), 'w', encoding="utf-8") as w:
        w.write(text)

# build a version of the existing translation module in the cache dir, with
# non-translateable strings replaced by symbols
def build_notranslate_ex_module(untranslateable, target=LANG_DEST):
    to_cache = os.path.join(INPUT_DIR, f"{TRANSLATED_MODULE_PATTERN}.py" % target)
    if os.path.exists(to_cache):
        with open(to_cache, encoding="utf-8") as f:
            text = f.read()
        rev = invert_untranslateable(untranslateable)
        keys = list(rev.keys())
        keys.sort()
        for k in keys:
            text = text.replace(rev[k], k)
        with open(os.path.join(CACHE_DIR, f"{TRANSLATED_MODULE_PATTERN}.py" % target), 'w', encoding="utf-8") as w:
            w.write(text)


# perform the translation of the provided constant list, using the existing
# translations if any, and using the Google service for missing ones: return
# a dictionary of CONSTANTS: translations
async def do_translate(const_list, target=LANG_DEST, verbose=False):
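    # drop modules imported on previous runs: invalidate_caches() alone does
    # not reload modules that are already present in sys.modules
    for name in (f"{CACHE_DIR}.{CONST_MODULE}", f"{CACHE_DIR}.strings_{target}"):
        sys.modules.pop(name, None)
    importlib.invalidate_caches()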
    try:
        consts = importlib.import_module(f".{CONST_MODULE}", CACHE_DIR)
        to_translate_dic = { k: consts.__dict__[k] for k in const_list }
        del consts
    except Exception as e:
        if verbose:
            print("Error: %s" % e)
        to_translate_dic = {}
    try:
        existing = importlib.import_module(f".strings_{target}", CACHE_DIR)
        existing_translations = { k: existing.__dict__[k] for k in existing.__dict__ if RE_VALIDNAME.match(k) }
        del existing
    except Exception as e:
        if verbose:
            print("Error: %s" % e)
        existing_translations = {}
    to_translate_dic_res = {}
    for x in const_list:
        if x in existing_translations:
            to_translate_dic_res[x] = existing_translations[x]
        else:
            s = to_translate_dic[x]
            t = await translate(s, target)
            to_translate_dic_res[x] = t
            if verbose:
                print("%s = %s --> %s" % (x, repr(s), dqrepr(t)))
            # randomized pause between requests; asyncio.sleep does not block the event loop
            await asyncio.sleep((TIME_WAIT_TRANSLATION_MSEC / 2.0 + random.randint(0, TIME_WAIT_TRANSLATION_MSEC)) / 1000)
    return to_translate_dic_res


# perform the translation on the template text and write a localized
# translation module
def build_translation(translations, unxlate_rev, target=LANG_DEST):
    translations_as_strings = { k: dqrepr(v) for k, v in translations.items() }
    with open(os.path.join(INPUT_DIR, TEMPLATE), encoding="utf-8") as f:
        s = f.read()
        with open(os.path.join(OUTPUT_DIR, TRANSLATED_MODULE_FILE_PATTERN % target), 'w', encoding="utf-8") as w:
            t = s.format(**translations_as_strings)
            for k in unxlate_rev:
                t = t.replace(k, unxlate_rev[k])
                t = t.replace(k.lower(), unxlate_rev[k])
            w.write(t)


# end of prelude.


## Test space

This cell is used for quick tests of the functions defined above: it should be empty in the end.

In [14]:
# Nothing here


## Example

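What follows is the exact workflow described above. The intermediate saving step is performed at the end, after `main()` has been awaited, because the translation code is asynchronous.

> **Note:** To avoid verbose output, comment out the first `do_translate` line and uncomment the second one.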
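Jupyter allows `await` at the top level of a cell, which is why the example can simply write `await main()`; a plain Python script cannot, and needs `asyncio.run()` to start the event loop. The sketch below shows this together with the cache-first loop that `do_translate()` performs. It is self-contained and hypothetical: `fake_translate` stands in for the googletrans call and `translate_missing` is not a notebook function.

```python
import asyncio
import random

WAIT_MSEC = 250  # same role as TIME_WAIT_TRANSLATION_MSEC in the notebook


async def fake_translate(text, target):
    """Stand-in for the googletrans call; it only tags the text."""
    return f"[{target}] {text}"


async def translate_missing(sources, existing, target, translate=fake_translate):
    """Reuse reviewed translations, request only the missing ones."""
    result = {}
    for name, text in sources.items():
        if name in existing:
            result[name] = existing[name]
        else:
            result[name] = await translate(text, target)
            # randomized pause between requests; asyncio.sleep does not
            # block the event loop the way time.sleep would
            delay_ms = WAIT_MSEC / 2 + random.randint(0, WAIT_MSEC)
            await asyncio.sleep(delay_ms / 1000)
    return result


if __name__ == "__main__":
    # in a plain script there is no top-level await: asyncio.run() starts the loop
    print(asyncio.run(translate_missing({"A": "one", "B": "two"}, {"A": "uno"}, "it")))
```

Only `B` triggers a request here, since `A` already has a reviewed translation; inside Jupyter the same coroutine would be awaited directly, because `asyncio.run()` refuses to start while the notebook's own event loop is running.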

In [13]:
LANG_DEST = 'it'

async def main():
    unxlate = find_untranslateable()
    unxlate_rev = invert_untranslateable(unxlate)
    build_template()
    build_notranslate_module(unxlate)
    build_notranslate_ex_module(unxlate, target=LANG_DEST)
    const_list = what_to_translate()
    translations = await do_translate(const_list, target=LANG_DEST, verbose=True)
    # translations = await do_translate(const_list, target=LANG_DEST)
    build_translation(translations, unxlate_rev, target=LANG_DEST)
    return translations, unxlate

translations, unxlate = await main()
# pass the target explicitly: the default was bound when the prelude cell ran
save_translations(translations, unxlate, target=LANG_DEST)

print("OK")

TEST_STRING = 'This is a string that can be used to test 0N0005 translation' --> "Questa è una stringa che può essere utilizzata per testare la traduzione 0N0005"
OK


## Human intervention needed

The translated file, _OUTPUT/strings_XX.py_, is certainly far from perfect: before inclusion in the **When** source tree it should be edited thoroughly and checked for errors and mistranslations. This _jupyter-lab_ instance is powerful enough to provide a pleasant editing experience. Make sure that the files in _INPUT_ are left intact; otherwise they will need to be replaced with the original ones before subsequent runs.
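Part of the review can be automated. The sketch below is a hypothetical check, not part of the notebook, for two common problems: a syntax error introduced while editing, which `ast.parse()` reports, and a `{placeholder}` that was translated or dropped, which would break the string at run time. It parses the modules instead of importing them, so relative imports such as those of the **When** strings modules do not need to resolve.

```python
import ast
import re

RE_VALIDNAME = re.compile(r"^[A-Z_][A-Z0-9_]*$")
# same shape as the notebook's bare-placeholder pattern, non-capturing group
RE_PLACEHOLDER = re.compile(r"\{[A-Za-z_]+(?::[0-9]+)?\}")


def placeholders_by_constant(source):
    """Map each ALL_CAPS constant to the set of placeholders in its literal.

    ast.parse() raises SyntaxError if the module was broken while editing.
    """
    result = {}
    for node in ast.parse(source).body:
        if (isinstance(node, ast.Assign) and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and RE_VALIDNAME.match(node.targets[0].id)):
            literal = ast.get_source_segment(source, node.value)
            result[node.targets[0].id] = set(RE_PLACEHOLDER.findall(literal))
    return result


def check_translation(base_source, translated_source):
    """Return a list of problems; an empty list means no issue was found."""
    base = placeholders_by_constant(base_source)
    translated = placeholders_by_constant(translated_source)
    problems = []
    for name, expected in base.items():
        if name not in translated:
            problems.append(f"{name}: missing")
        elif translated[name] != expected:
            problems.append(
                f"{name}: placeholders {sorted(expected)} != {sorted(translated[name])}"
            )
    return problems
```

The two arguments would be the text of _INPUT/strings_base.py_ and of _OUTPUT/strings_XX.py_, both read with `encoding="utf-8"`. An empty result does not replace a human review: it only says that the structure survived the translation.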