-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Object dump import patch cleaning tool #119
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a very good start, easy enough to follow the code and, from a quick glance, might actually do the trick. Nice job!
Some notes:
- I've added it to the project for proper tracking
- It doesn't seem hard at all to make this support target language, target file and input file
d59dda0
to
d503251
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me
SUPPORTED_LANGUAGES = ["ar-EG", "ca-ES", "cs-CZ", "da-DK", "de-DE", "en-GB", "en-US", "es-ES",\ | ||
"fi-FI", "fr-FR", "hu-HU", "it-IT", "ja-JP", "ko-KR", "nb-NO", "nl-NL",\ | ||
"pl-PL", "pt-BR", "ru-RU", "sv-SE", "tr-TR", "zh-CN", "zh-TW"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future we should probably have all scripts source the languages from a single place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, definitely. I was also thinking we should probably move the language*.py tools into a Python package of some sort. Another item for the to do list…
The object string dumper and loader scripts introduced in #27 have greatly simplified the workflow for object translations. However, the barrier of entry for use by most translators is still too big. Recent work by @tupaschoal has been working towards making an automated workflow, bridging the Objects and Localisation repositories. However, the holdup has been that Python's method of serialising JSON differs in output from what is already present (#93). Rather than writing our own encoder, I propose the following solution.
I have written a simple Python class, PatchCleaner, that uses the
unidiff
package to disregard changes that arise in patches due to the formatting changes mentioned. In short, this means it only picks up the changes we need. I have added a heuristic to pick up on cases where a trailing comma was added to a line with an irrelevant language. This needs some field testing, but in my brief testing it has worked well.The workflow is still suboptimal -- and the script doesn't take language and patch file as its arguments yet:
Again, this removes any need for cherry-picking the diff manually. This cherry-picking gets particularly cumbersome for rides and large scenery objects. Indeed, with this script, this workflow can now be completely automated!
As an aside: personally, I like to order the dump files before working on translations. We can avoid complicating
language_dump.py
by usingjq
to this end: