UnicodeEncodeError: 'charmap' codec can't encode character '\u0119' #5475

Closed
ToxicTree opened this issue Feb 19, 2021 · 5 comments
Labels
question This is more a question for the support than an issue.

Comments

@ToxicTree

A translation includes a character that stops us from using the platform.

We are not able to push committed translation files.
We can't change files or remove the translation that seems to cause the error.

We recently upgraded to Weblate docker image 4.4.2-2 but it did not solve the issue.

I'm not sure how to reproduce it. Maybe start from source files in a specific format and then enter a Unicode character into a translation?

It would be great if files were converted to UTF-8 by default in some way, at least when a translation added to them needs UTF-8, or if there were a check before saving and committing.
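A check of the kind suggested above could look roughly like this. It is a minimal sketch, not Weblate's code: `find_unencodable` and `check_before_save` are hypothetical names, and the sketch assumes the target file's encoding is already known.

```python
from typing import List, Tuple


def find_unencodable(text: str, encoding: str) -> List[Tuple[int, str]]:
    """Return (position, character) pairs that `encoding` cannot represent."""
    problems = []
    for position, char in enumerate(text):
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            problems.append((position, char))
    return problems


def check_before_save(text: str, encoding: str) -> None:
    """Hypothetical pre-save hook: fail early with a readable message
    instead of failing later when the file is written and committed."""
    problems = find_unencodable(text, encoding)
    if problems:
        details = ", ".join(f"{char!r} at {pos}" for pos, char in problems)
        raise ValueError(f"cannot save as {encoding}: {details}; consider UTF-8")


# UTF-8 can represent every Unicode character, so this passes.
check_before_save("Zażółć gęślą jaźń", "utf-8")
```

Rejecting the string at save time keeps the repository in a state that can still be committed and pushed, which is the failure described in this report.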

Exception Type: UnicodeEncodeError
Exception Value: 'charmap' codec can't encode character '\u0119' in position 54: character maps to <undefined>
Exception Location: /usr/lib/python3.7/encodings/cp1252.py, line 12, in encode
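The error can be reproduced outside Weblate in plain Python: U+0119 (ę) has no code point in Windows-1252 (cp1252), so encoding it fails with the same 'charmap' message, while UTF-8 handles it. The sample string is made up for illustration.

```python
# Hypothetical translation string ending in U+0119 (ę).
text = "Prosz\u0119"

try:
    text.encode("cp1252")  # Windows-1252 has no byte for ę
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(error_message)
print(text.encode("utf-8"))  # UTF-8 encodes ę as two bytes
```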

We are using "Simple CSV" and I am a bit unsure about the encoding because git/weblate seem to change them sometimes.

Is it a problem with the python plugin or is there a workaround to fix this?

@nijel
Member

nijel commented Feb 19, 2021

I think I've recently fixed something in this, but I can't find the actual fix right now. Anyway try testing whether 4.5 (released today) will improve the situation.

The root cause is probably that chardet misdetects the file, see chardet/chardet#185

@nijel nijel added the question This is more a question for the support than an issue. label Feb 19, 2021
@github-actions

This issue looks more like a support question than an issue. We strive to answer these reasonably fast, but purchasing the support subscription is not only more responsible and faster for your business but also makes Weblate stronger. In case your question is already answered, making a donation is the right way to say thank you!

@ToxicTree
Author

> I think I've recently fixed something in this, but I can't find the actual fix right now. Anyway try testing whether 4.5 (released today) will improve the situation.
>
> The root cause is probably that chardet misdetects the file, see chardet/chardet#185

4.5 isn't available through docker-compose pull yet.

I inspected the files with file -i outside the container and saw that a few of them used an encoding other than UTF-8 for some reason. But I have no idea how to convert them to UTF-8 in the Weblate repository so that they work when Weblate needs to push commits.
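Roughly the same check that file -i provides can be done in Python by testing whether each file decodes as UTF-8. This is a minimal sketch: `find_non_utf8` is a hypothetical helper, and the `*.csv` pattern assumes the Simple CSV files mentioned earlier. Pure-ASCII files count as valid UTF-8, which is correct because ASCII is a subset of UTF-8.

```python
from pathlib import Path
from typing import List


def find_non_utf8(root: str, pattern: str = "*.csv") -> List[Path]:
    """Return files under `root` whose bytes are not valid UTF-8."""
    bad = []
    for path in sorted(Path(root).rglob(pattern)):
        try:
            path.read_bytes().decode("utf-8")  # strict decoding
        except UnicodeDecodeError:
            bad.append(path)
    return bad
```

Usage would be something like `find_non_utf8("path/to/checkout")`, where the path is a placeholder for wherever the repository is checked out.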

Are there any tools I could run after opening a shell inside the Weblate container to fix the encoding, or is there a configuration property that forces Python to use UTF-8?
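A possible conversion step, sketched under the assumption that the real source encoding of each file is known (for example from file -i). `convert_to_utf8` is a hypothetical helper, and the converted files would presumably still need to be committed to the repository Weblate works from. On the configuration side, Python 3.7+ has a UTF-8 mode (PYTHONUTF8=1 or -X utf8), but it only changes the default used when no encoding is passed. An encoding chosen explicitly, such as one obtained from detection, still takes precedence, so UTF-8 mode would likely not avoid this error.

```python
from pathlib import Path


def convert_to_utf8(path: str, source_encoding: str) -> None:
    """Rewrite a file in place as UTF-8.

    `source_encoding` must be the file's actual encoding (for example
    "cp1250" for Polish Windows files). Decoding is strict, so bytes that are
    invalid in that encoding raise UnicodeDecodeError instead of being lost,
    but a wrong yet decodable guess would still garble the text.
    """
    file_path = Path(path)
    # Work on raw bytes so line endings are preserved exactly.
    text = file_path.read_bytes().decode(source_encoding)
    file_path.write_bytes(text.encode("utf-8"))
```

For example, `convert_to_utf8("translations/pl.csv", "cp1250")`, where the path is a placeholder.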

@ToxicTree
Author

I managed to fix the problem we had by removing the translations for that file + language.

@github-actions

The issue you have reported is resolved now. If you don’t feel it’s right, please check its labels for a clue and take further steps.

  • In case you see a similar problem, please open a separate issue.
  • If you are happy with the outcome, don’t hesitate to support Weblate by making a donation.
