Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Object dump import patch cleaning tool #119

Merged
merged 4 commits into from
Jul 20, 2020
Merged

Conversation

AaronVanGeffen
Copy link
Member

The object string dumper and loader scripts introduced in #27 have greatly simplified the workflow for object translations. However, the barrier of entry for use by most translators is still too big. Recent work by @tupaschoal has been working towards making an automated workflow, bridging the Objects and Localisation repositories. However, the holdup has been that Python's method of serialising JSON differs in output from what is already present (#93). Rather than writing our own encoder, I propose the following solution.

I have written a simple Python class, PatchCleaner, that uses the unidiff package to disregard changes that arise in patches due to the formatting changes mentioned. In short, this means it only picks up the changes we need. I have added a heuristic to pick up on cases where a trailing comma was added to a line with an irrelevant language. This needs some field testing, but in my brief testing it has worked well.

The workflow is still suboptimal -- and the script doesn't take language and patch file as its arguments yet:

./language_dump.py -l nl-NL -d nl-NL_dump.json
$EDITOR nl-NL_dump.json
./language_load.py -l nl-NL -i nl-NL_dump.json
git diff > patch.diff
git stash
./language_clean_patch.py | git apply    # no arguments yet, sorry
git commit -m 'Updated object strings for nl-NL'

Again, this removes any need for cherry-picking the diff manually. This cherry-picking gets particularly cumbersome for rides and large scenery objects. Indeed, with this script, this workflow can now be completely automated!

As an aside: personally, I like to order the dump files before working on translations. We can avoid complicating language_dump.py by using jq to this end:

jq 'to_entries | sort_by(.value."reference-name") | from_entries' \
    < nl-NL_dump.json \
    > nl-NL_dump.json

Copy link
Member

@tupaschoal tupaschoal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a very good start, easy enough to follow the code and, from a quick glance, might actually do the trick. Nice job!

Some notes:

  • I've added it to the project for proper tracking
  • It doesn't seem hard at all to make this support target language, target file and input file

language_clean_patch.py Show resolved Hide resolved
language_clean_patch.py Outdated Show resolved Hide resolved
language_clean_patch.py Show resolved Hide resolved
language_clean_patch.py Outdated Show resolved Hide resolved
language_clean_patch.py Show resolved Hide resolved
Copy link
Member

@tupaschoal tupaschoal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me

Comment on lines +9 to +11
SUPPORTED_LANGUAGES = ["ar-EG", "ca-ES", "cs-CZ", "da-DK", "de-DE", "en-GB", "en-US", "es-ES",\
"fi-FI", "fr-FR", "hu-HU", "it-IT", "ja-JP", "ko-KR", "nb-NO", "nl-NL",\
"pl-PL", "pt-BR", "ru-RU", "sv-SE", "tr-TR", "zh-CN", "zh-TW"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future we should probably have all scripts source the languages from a single place

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definitely. I was also thinking we should probably move the language*.py tools into a Python package of some sort. Another item for the to do list…

@AaronVanGeffen AaronVanGeffen merged commit e991be9 into master Jul 20, 2020
@tupaschoal tupaschoal deleted the language-clean-patch branch July 20, 2020 21:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants