Weblate detects illegal escape sequence in .po files too late #5905

Closed
luebbe opened this issue Apr 23, 2021 · 4 comments
Labels
backlog This is not on the Weblate roadmap for now. Can be prioritized by sponsorship. enhancement Adding or requesting a new feature.

@luebbe
Contributor

luebbe commented Apr 23, 2021

As you might already know, I like to initialize translation components by uploading a full set of files (.pot and .po) to Weblate as a zip archive. ;)

I did this and the component was created without an error message (or at least I didn't notice one).
But, as I found out later, the French .po file contained messages with "\r\n" in their translations.

I would have expected that this:

msgid ""
"Warning!\n"
"\n"
"The application of this product can be dangerous.\n"
"Please use it with care.\n"
"\n"
msgstr ""
"Avertissement !\r\n"
"\r\n"
"L'application de ce produit peut être dangereuse.\r\n"
"Veuillez l'utiliser avec précaution.\r\n"
"\r\n"

would already be detected when the .po file is analyzed for the first time.

What happened instead was that, days later, uploading the .pot file to Weblate via the API failed with a '400 Bad Request'.
Here's a snippet of the application's log of the API request/response:

Request     : multipart/form-data POST 'https://weblate.*****/api/translations/*****/*****/en/file/'
[GET/POST / E] method=source
[FILE / E] file=exe\locale\*****.pot
Push component '*****' source - failed
Status Code 400 'Bad Request'
{"detail":"..........................................................................................................................................................
........................................................................................................................................... done.\n/app/data/vcs/*****/*****/exe\\locale\\fr\\LC_MESSAGES\\*****.po:263: warning: internationalized messages should not contain the '\\r' escape sequence\n/app/data/vcs/*****/*****/exe\\locale\\fr\\LC_MESSAGES\\*****.po:263: warning: internationalized messages should not contain the '\\r' escape sequence

... ad infinitum ;)

I would have expected the upload of the .pot file itself to succeed, because the error is not in the .pot file but in one of the .po files that was uploaded earlier.

Thoughts:

  • Check the .po file for "\r\n" escape sequences when it is uploaded, and either fail the upload or make this a quality check that is visible in the UI.
  • Right now, French translations with the "\r\n" escape sequence are indistinguishable in the UI from translations with the correct "\n"-only escape sequence.
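
A minimal sketch of what such an upload-time check could look like, using only the Python standard library. This is not Weblate code; the function name `find_cr_escapes` and the sample text are made up for illustration, and the scan works on the raw PO text rather than going through a real PO parser.

```python
import re

# A backslash followed by "r", preceded by an even number of backslashes,
# so the escape \r matches but the "r" after an escaped backslash does not.
CR_ESCAPE = re.compile(r'(?<!\\)(?:\\\\)*\\r')


def find_cr_escapes(po_text):
    """Return (line_number, line) pairs for PO lines with a carriage-return escape."""
    hits = []
    for number, line in enumerate(po_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip PO comments
        if CR_ESCAPE.search(stripped):
            hits.append((number, stripped))
    return hits


# Hypothetical sample, shortened from the entry quoted above.
SAMPLE = r'''msgid ""
"Warning!\n"
msgstr ""
"Avertissement !\r\n"
"\r\n"
'''

for number, line in find_cr_escapes(SAMPLE):
    print(f"line {number}: {line}")
```

The line numbers are reported in the same spirit as the gettext warning in the log above, so the offending entries can be located before the file is accepted.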

The component was set up with Weblate 4.5.x and the error when uploading the .pot file occurred with 4.6.0.

@nijel
Member

nijel commented Apr 26, 2021

It seems the translate-toolkit parser is less strict than the gettext one here. I can see several approaches to this:

  • Make the translate-toolkit parser stricter so that it matches the gettext one here (gettext complains about \a, \b, \f, \r and \v)
  • Validate the files using the gettext parser
  • Use pot2po instead of msgmerge and accept such strings
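
To illustrate the first approach, here is a rough sketch of a stricter check on already-decoded strings. It is neither translate-toolkit nor gettext code; the helper `disallowed_escapes` is hypothetical, and the set of characters is simply the one listed above.

```python
# Control characters produced by the escapes listed above, mapped to
# the escape sequence as it would appear in a PO file.
DISALLOWED = {
    "\a": r"\a",
    "\b": r"\b",
    "\f": r"\f",
    "\r": r"\r",
    "\v": r"\v",
}


def disallowed_escapes(text):
    """Return the sorted escape names of disallowed control characters in text."""
    return sorted({name for char, name in DISALLOWED.items() if char in text})


print(disallowed_escapes("Avertissement !\r\n"))
print(disallowed_escapes("Warning!\n"))
```

A parser or quality check could reject or flag any string for which this returns a non-empty list.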

@nijel nijel added backlog This is not on the Weblate roadmap for now. Can be prioritized by sponsorship. enhancement Adding or requesting a new feature. labels Apr 26, 2021
@github-actions

This issue has been added to the backlog. It is not scheduled on the Weblate roadmap, but it might eventually be implemented. In case you need this feature soon, please consider helping or pushing it forward by funding the development.

@luebbe
Contributor Author

luebbe commented May 20, 2021

This turns out to be a real blocker. Today I activated the French translation for a component that didn't have French yet. Automatic translation was turned on and Weblate added a lot of 100% match suggestions to that new component. I confirmed these matches, worked a little on the sources and then uploaded the updated sources. This failed again with a 400 error:

[GET/POST / E] method=source
[FILE / E] file=exe\locale\*****.pot
  Push component '***** User Interface' source - failed
  Status Code 400 'Bad Request'
{"detail":"................................. done.\n/app/data/vcs/*****/*****\\locale\\fr\\LC_MESSAGES\\*****.po:122: warning: internationalized messages should not contain the '\\r' escape sequence\n/app/data/vcs/*****/*****\\locale\\fr\\LC_MESSAGES\\*****.po:122: warning: internationalized messages should not contain the '\\r' escape sequence\n/app/data/vcs/*****/*****\\locale\\fr\\LC_MESSAGES\\*****.po:273: warning: internationalized messages should not contain the '\\r' escape sequence\n"}

Evidently the string with the illegal "\r\n" escape sequence was taken from Weblate's translation memory, so it is going to be suggested for French translations over and over again. In the Weblate UI I can't tell a suggestion with "\r\n" from one with "\n".
I guess I have to remove it from the translation memory somehow. How can I do that?
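
This does not answer the translation memory question, but as a small illustration of the visibility problem: on plain Python strings, the difference can be made visible and normalized roughly as follows. Both helper names are made up, and nothing here uses a Weblate API.

```python
def visible_newlines(text):
    """Render CR and LF as literal escape sequences so they can be told apart."""
    return text.replace("\r", "\\r").replace("\n", "\\n")


def normalize_newlines(text):
    """Convert CRLF and lone CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


suggestion = "Avertissement !\r\n"
print(visible_newlines(suggestion))
print(visible_newlines(normalize_newlines(suggestion)))
```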

I'd suggest validating files more strictly when they are first uploaded. Gettext must have some reason to complain about these escape sequences. Possibly doubled newlines?

@nijel nijel self-assigned this May 20, 2021
@nijel nijel added this to the 4.7 milestone May 20, 2021
@nijel nijel closed this as completed in cc7a770 May 20, 2021
@github-actions

Thank you for your report; the issue you have reported has just been fixed.

  • In case you see a problem with the fix, please comment on this issue.
  • In case you see a similar problem, please open a separate issue.
  • If you are happy with the outcome, don’t hesitate to support Weblate by making a donation.
