UTF-8 Java Properties upload broken #5083

Closed
Nick-kel opened this issue Dec 21, 2020 · 18 comments

@Nick-kel

If I upload something to a language in Weblate, the source language works very well in UTF-8 (e.g. a § is displayed correctly). But if I upload a translation to a "normal" language, it isn't in UTF-8 and in front of a § there's a weird character:
image
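
The stray character in front of the § is what one would expect when UTF-8 bytes are decoded as ISO-8859-1. A minimal Python sketch of that effect follows; the sample string is an assumption for illustration and is not taken from the uploaded file.

```python
# Illustration only: the sample text is made up, not from the reporter's file.
text = "§ 1"

# In UTF-8, "§" (U+00A7) is stored as the two bytes C2 A7.
utf8_bytes = text.encode("utf-8")

# ISO-8859-1 maps every single byte to one character, so the two bytes
# come back as two characters instead of one.
misread = utf8_bytes.decode("iso-8859-1")

print(utf8_bytes.hex(" "))
print(misread)
```

Because ISO-8859-1 assigns a character to every byte value, the wrong decoding never raises an error; it just yields extra characters.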

I'm using the newest version of Weblate

@Nick-kel Nick-kel added the question This is more a question for the support than an issue. label Dec 21, 2020
@github-actions

This issue looks more like a support question than an issue. We strive to answer these reasonably fast, but purchasing the support subscription is not only more responsible and faster for your business but also makes Weblate stronger. In case your question is already answered, making a donation is the right way to say thank you!

@nijel
Member

nijel commented Dec 21, 2020

Maybe the existing file is iso-8859-1 encoded? What file format do you use? How do you upload?

@Nick-kel
Author

Nick-kel commented Dec 21, 2020

The existing file is UTF-8 encoded and I'm using a Java Properties (UTF-8) format. I'm uploading the translation using the "Files" > "Upload translation" button in a language.

@nijel
Member

nijel commented Dec 21, 2020

I mean which upload method do you use...

@Nick-kel
Author

The upload method is "add as translation"

@nijel
Member

nijel commented Dec 22, 2020

Can you please share the uploaded file? It seems that it's detected as ISO-8859-1 instead of UTF-8. Maybe it has an encoding error, or there is something wrong with the detection.

@Nick-kel
Author

@nijel
Member

nijel commented Dec 22, 2020

There is no Unicode character in there which could break...

@Nick-kel
Author

@nijel
Member

nijel commented Dec 23, 2020

That one works fine in my tests. The content looks like something (mis-)detects it as ISO-8859-1. It might be a bug in chardet on the specific file you are using.
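
To rule the file itself in or out before blaming the detector, one option is a strict UTF-8 check that also lists the non-ASCII characters present. This is a stdlib-only sketch and does not use chardet; the helper name `check_utf8` and the file name in the usage comment are hypothetical.

```python
from typing import List, Tuple


def check_utf8(data: bytes) -> Tuple[bool, List[str]]:
    """Return (is_valid_utf8, sorted non-ASCII characters found).

    Hypothetical helper for diagnosis; not part of Weblate or chardet.
    """
    try:
        text = data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # At least one byte sequence is not valid UTF-8.
        return False, []
    return True, sorted({ch for ch in text if ord(ch) > 127})


# Usage (file name is an assumption):
# from pathlib import Path
# print(check_utf8(Path("messages_de.properties").read_bytes()))
```

If the check reports valid UTF-8 with at least one non-ASCII character, the file is a reasonable candidate for reproducing a misdetection.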

@Nick-kel
Author

image
@nijel how to fix this?

@nijel
Member

nijel commented Dec 24, 2020

How to reproduce that?

@GoneTone

GoneTone commented Dec 25, 2020

I also encountered the same problem. When uploading a new Chinese translation, it will be garbled.

File is UTF-8 encoded.

image

@Nick-kel
Author

@nijel I don't know ... I only uploaded the translations, these are my settings:
image
In the source language there's no problem with the encoding, only in other languages

@nijel
Member

nijel commented Dec 26, 2020

The most likely suspect is still chardet here, as the detection is always in place. Unfortunately, the file in #5083 (comment) doesn't show it. Can you reproduce it with that file? Having the actual uploaded file to reproduce this would help...

@GoneTone

@nijel Here are a demo video and the translation files:
https://drive.google.com/drive/folders/1pY2AEZgkvsgW-DOVRN9tN5u4Z_fd99Vn?usp=sharing

@nijel
Member

nijel commented Jan 4, 2021

@GoneTone I can reproduce this if the file format is configured as ISO-8859-1, and that is expected in this case: the replace upload method does no processing of the file. When I switch the file format in Weblate to UTF-8, it is shown correctly and the replace upload works fine. What still seems to be broken is the regular upload, though, which probably matches the behavior @Nick-kel was describing initially.
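
The ISO-8859-1 case can be sketched in Python: a UTF-8 file stored byte for byte and then read under an ISO-8859-1 setting comes out garbled, and the garbling is reversible as long as no characters were dropped or altered along the way. The function names and the sample string are assumptions for illustration, not Weblate code.

```python
def garble(text: str) -> str:
    """Simulate UTF-8 bytes being read under an ISO-8859-1 setting."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair(garbled: str) -> str:
    """Undo the misread; only works if the garbled text is intact."""
    return garbled.encode("iso-8859-1").decode("utf-8")


sample = "中文翻譯"  # made-up sample, not from the shared files
broken = garble(sample)

print(len(sample), len(broken))  # each character became three
print(repair(broken) == sample)
```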

@nijel nijel added bug Something is broken. and removed question This is more a question for the support than an issue. labels Jan 4, 2021
@nijel nijel self-assigned this Jan 4, 2021
@nijel nijel added this to the 4.4.1 milestone Jan 4, 2021
@nijel nijel changed the title from "UTF-8" to "UTF-8 Java Properties upload broken" Jan 4, 2021
@nijel nijel closed this as completed in 315b458 Jan 4, 2021
nijel added a commit that referenced this issue Jan 4, 2021
It seems to be the most common variant used these days, so adjust
autodetection to it.

Issue #5083
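
The idea of preferring UTF-8 during autodetection can be sketched as follows. This is not the code from commit 315b458; it is a minimal illustration under the assumption that strict UTF-8 decoding is tried first and ISO-8859-1 is used only as a fallback. The function name `decode_properties` is hypothetical.

```python
from typing import Tuple


def decode_properties(data: bytes) -> Tuple[str, str]:
    """Return (text, encoding_used), trying UTF-8 before ISO-8859-1.

    Hypothetical sketch, not Weblate's implementation.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 accepts any byte sequence, so this cannot fail.
        return data.decode("iso-8859-1"), "iso-8859-1"
```

Trying UTF-8 first is a fairly safe ordering because non-ASCII ISO-8859-1 text rarely happens to form valid UTF-8 sequences, while the reverse order would accept every file as ISO-8859-1.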
@github-actions

github-actions bot commented Jan 4, 2021

Thank you for your report, the issue you have reported has just been fixed.

  • In case you see a problem with the fix, please comment on this issue.
  • In case you see a similar problem, please open a separate issue.
  • If you are happy with the outcome, don’t hesitate to support Weblate by making a donation.
