Add more English names and their translations to Simplified Chinese and Japanese #73077

Merged 3 commits on Apr 18, 2024


Contributor

@Qrox Qrox commented Apr 17, 2024

Summary

I18N "Add more English names and their translations to Simplified Chinese and Japanese"

Purpose of change

In #70279 some name snippets were removed without being added to data/names. Also, the current Simplified Chinese and Japanese name lists contain only a few entries.

Describe the solution

Merge the removed names with the list in data/names/en.json. Salvage the names translated before #70279.

First, the removed names from #70279 are appended to the corresponding lists in names/en.json, sorted, and made unique. The given names from MoM are not added, because they lack gender information required by the lists.
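The append, sort, and de-duplicate step described above can be sketched as follows. This is a minimal illustration, not the procedure actually used for the PR: the helper names `merge_names` and `merge_into` are hypothetical, and the `gender` key is an assumption about how the lists record gender (the script in this PR only confirms the `usage` and `name` keys).

```python
def merge_names(existing, extra):
    """Return the sorted union of two name lists with duplicates removed."""
    # sorted() on str uses code point order, which is enough to keep diffs stable.
    return sorted(set(existing) | set(extra))


def merge_into(entries, usage, gender, extra):
    """Merge `extra` into the first entry matching `usage` and `gender`.

    `gender` is None for lists that carry no gender (assumed: family names).
    Returns True if a matching entry was found and updated.
    """
    for entry in entries:
        if entry.get("usage") == usage and entry.get("gender") == gender:
            entry["name"] = merge_names(entry["name"], extra)
            return True
    return False


if __name__ == "__main__":
    # Made-up sample data in the assumed shape of data/names/en.json.
    entries = [
        {"usage": "given", "gender": "male", "name": ["Tom", "Bob"]},
        {"usage": "family", "name": ["Smith"]},
    ]
    merge_into(entries, "given", "male", ["Al", "Tom"])
    print(entries[0]["name"])
```

Names without a matching list (such as given names lacking gender information) are simply not merged, which mirrors why the MoM given names were left out.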

Then the family and given names from names/en.json are converted to translated names using the following script. Before running the script, the .po files from the merge commit f255964 of #70279 are compiled into .mo files.

The script I used to generate the list of translated names
import gettext
import json


class EmptyTranslations(gettext.NullTranslations):
    """Fallback catalog that turns every untranslated message into ""."""

    def gettext(self, message):
        return ""

    def ngettext(self, singular, plural, n):
        return ""

    def pgettext(self, context, message):
        return ""

    def npgettext(self, context, singular, plural, n):
        return ""


# Load the English name lists that serve as the translation source.
with open('data/names/en.json', 'r', encoding='utf-8') as f:
    en_names = json.load(f)

for file in ['zh_CN.po', 'ja.po']:
    lang = file[:-3]  # strip the '.po' suffix to get the language code
    print(f'Converting {lang}...')

    # Looks for lang/mo/<lang>/LC_MESSAGES/cataclysm-dda.mo.
    trans = gettext.translation('cataclysm-dda', 'lang/mo', [lang])
    # Without this fallback, untranslated names would come back in English.
    trans.add_fallback(EmptyTranslations())

    lang_names = []
    for obj in en_names:
        # Only family and given names are converted; other usages are skipped.
        if obj['usage'] in ['family', 'given']:
            new_obj = obj.copy()
            new_obj['name'] = [trans.gettext(name) for name in obj['name']]
            lang_names.append(new_obj)

    # Write to a separate .out.json file so the existing lists are not overwritten.
    with open(f'data/names/{lang}.out.json', 'w', encoding='utf-8') as f:
        json.dump(lang_names, f, ensure_ascii=False, indent=2)
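
The purpose of the EmptyTranslations fallback in the script above may not be obvious, so here is a small self-contained sketch of its effect. `DictTranslations` is a hypothetical stand-in for the compiled .mo catalog (so the example runs without any files), and the sample strings are made up. By default `gettext` returns the original English string when no translation exists; with the fallback attached, untranslated names come back as empty strings, which are easy to drop afterwards.

```python
import gettext


class EmptyTranslations(gettext.NullTranslations):
    """Fallback that maps every message to an empty string."""

    def gettext(self, message):
        return ""


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog standing in for a real .mo file."""

    def __init__(self, names):
        super().__init__()
        self._names = names

    def gettext(self, message):
        if message in self._names:
            return self._names[message]
        # NullTranslations.gettext defers to the fallback if one is set,
        # otherwise it returns the message unchanged.
        return super().gettext(message)


def translate_names(names, trans):
    """Translate each name and drop the ones that came back empty."""
    return [t for t in (trans.gettext(n) for n in names) if t]


if __name__ == "__main__":
    catalog = {"Alice": "translated-Alice"}  # made-up sample entry

    plain = DictTranslations(catalog)
    print(translate_names(["Alice", "Bob"], plain))  # "Bob" leaks through

    strict = DictTranslations(catalog)
    strict.add_fallback(EmptyTranslations())
    print(translate_names(["Alice", "Bob"], strict))  # only translated names
```

The original script keeps the empty strings in its output and relies on the later sort and de-duplication pass to collapse them; filtering them immediately, as `translate_names` does here, is just one alternative.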

For the Japanese names, the generated translated names are appended to the corresponding lists in names/ja.json, sorted, and made unique. Translations that are not names are then removed (they are easy to locate after sorting, because the Japanese translators translated the names using half-width katakana exclusively).
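
The Japanese cleanup above was done by inspecting the sorted lists. As an optional aid, the same idea can be sketched in code: flag every translation that is not made up entirely of half-width katakana. The helper names are hypothetical, and the character range used (U+FF65 to U+FF9F, the half-width katakana letters, sound marks, and middle dot) is an assumption about which characters a name may contain; anything flagged would still need a manual check.

```python
def is_halfwidth_katakana(text):
    """True if `text` is non-empty and uses only half-width katakana characters."""
    return bool(text) and all("\uff65" <= ch <= "\uff9f" for ch in text)


def split_candidates(translations):
    """Split translations into likely names and strings needing manual review."""
    names, suspects = [], []
    for text in translations:
        if is_halfwidth_katakana(text):
            names.append(text)
        else:
            suspects.append(text)
    return names, suspects


if __name__ == "__main__":
    sample = [
        "\uff71\uff98\uff7d",  # three half-width katakana characters
        "\u9280",              # a kanji (the character for "silver")
        "",                    # an untranslated entry
    ]
    names, suspects = split_candidates(sample)
    print(len(names), "likely names,", len(suspects), "to review")
```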

For the Simplified Chinese names, the generated translated names are appended to the corresponding lists in names/zh_CN.json, sorted, and made unique. I then skimmed through the lists and removed translations that are not names.

Describe alternatives you've considered

Make all names Tom.

I decided not to split this PR into smaller ones because the changes can be easily verified by following the above steps.

Testing

Just regular JSON changes, should be fine if the CI tests pass.

Additional context

I need help adding the Russian translations, because some English names were translated to their literal meaning rather than as names. It should be as simple as going through the list of changes and removing any strings that are not real names, so if anyone would like to help, please see #73083. The changes have 2775 insertions(+), 950 deletions(-), so reviewing them should take around 1 hour.

Other languages seem to have very few translated names as of #70279 so they are not included in this PR.

Append names from the removed files into corresponding lists in names/en.json, sort the lists, and remove duplicate entries. The given names from MoM are not added, because they lack gender info required by the lists.
…0279

Family and given names from names/en.json are converted to translated names from the Japanese mo file before CleverRaven#70279 using a script. The translated names are then appended to lists in names/ja.json, sorted, and made unique. Translations that are not names are then removed. (They are easy to locate after sorting, because the Japanese translators translated names using half-width katakana exclusively.)
…verRaven#70279

Family and given names from names/en.json are converted to translated names from the Simplified Chinese mo file before CleverRaven#70279 using a script. The translated names are then appended to lists in names/zh_CN.json, sorted, and made unique. Translations that are not names are then removed manually.
@github-actions github-actions bot added the [JSON], Translation, astyled, and json-styled labels Apr 17, 2024
@Zireael07
Contributor

English names were translated to their literal meaning instead of names

Can you clarify? My Russian is like A1 but I can spot the worst goofs and provide real names with some quick googling.

The most important thing is, do you want otchestvos too or not (e.g. do you want Aleksander or Aleksander Fyedorovich)?

@Qrox
Contributor Author

Qrox commented Apr 17, 2024

English names were translated to their literal meaning instead of names

Some English names such as Silver, Moon are translated literally due to a lack of translation context, so I need to remove those strings from the generated lists.

There are three lists, containing family names, male given names, and female given names, so otchestvos are not a concern here, if I understand correctly.

If you could review the list and remove strings that are not names, I can push the commit now (and thanks for offering to do it!). Replacing those strings with real names, however, would require me to redo the generation so that the English names are listed alongside, and I'm not sure that's really worth it... the problematic strings are very few anyway, judging from the Chinese strings.

@Qrox Qrox changed the title from "Add more English, Simplified Chinese, and Japanese names (help wanted with Russian)" to "Add more English names and their translations to Simplified Chinese and Japanese" Apr 17, 2024
@github-actions github-actions bot added the BasicBuildPassed label Apr 17, 2024
@Maleclypse Maleclypse merged commit 090cf29 into CleverRaven:master Apr 18, 2024
28 checks passed
@Qrox Qrox deleted the names branch April 19, 2024 04:10