Encoding issues regarding Chinese player names #30

Open
HenryQuan opened this issue Dec 29, 2023 · 3 comments

@HenryQuan

Hi, I found a very minor issue with Chinese player names while working on a PR that also uses this library, WoWs-Builder-Team/minimap_renderer#14. The issue is specific to the CN realm because that realm allows Chinese characters in player names. As far as I know, this isn't possible elsewhere.

In battle_controller, we may want to re-encode the player name to fix garbled Chinese characters, for example with the line below. This doesn't seem to affect English names.

name=player["name"].encode('ISO8859-1').decode('UTF-8'),
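To illustrate why that round trip works, here is a minimal sketch. It assumes the name reached Python 3 as UTF-8 bytes that were wrongly read as Latin-1 text. The sample names and the `repair_name` helper are hypothetical and not part of this library.

```python
# Hypothetical sample name, used only for illustration.
original = "玩家"

# Simulate the bug: UTF-8 bytes are interpreted as Latin-1 text.
garbled = original.encode("UTF-8").decode("ISO8859-1")


def repair_name(name: str) -> str:
    """Undo a Latin-1 misdecoding of UTF-8 bytes; return the name unchanged on failure."""
    try:
        # Latin-1 maps each character back to the single byte it came from,
        # so encoding recovers the raw bytes, which are then decoded as UTF-8.
        return name.encode("ISO8859-1").decode("UTF-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The name was already correct text, or the bytes are not valid UTF-8.
        return name


repaired = repair_name(garbled)
ascii_name = repair_name("Player_One")  # ASCII is identical in both encodings
```

The try/except is an addition of this sketch: without it, a name that is already correct Chinese text would raise `UnicodeEncodeError`, since those characters have no Latin-1 encoding.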
@Monstrofil
Owner

@HenryQuan any chance you could attach a replay with Chinese names to this issue? I will add some unit tests for that case.

@HenryQuan
Author

@Monstrofil Yes, I have one shared by my friend: 20231214_203306_PFSC210-Marseille_46_Estuary.zip. Oddly, the clan tag can also be in Chinese, yet it is encoded correctly; only the player name is garbled. The Chinese server can be detected by checking whether the realm is CN, and Chinese characters can be detected with something like hanzidentifier.
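A rough sketch of that detection idea, using only the standard library as a stand-in for hanzidentifier. It assumes the realm is available as a plain string such as "CN"; how the replay actually exposes it is not shown in this thread. The helper names `has_cjk` and `fix_cn_name` are hypothetical, and the check only covers the basic CJK Unified Ideographs block (U+4E00 to U+9FFF).

```python
def has_cjk(text: str) -> bool:
    """Return True if text contains a character in U+4E00..U+9FFF."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


def fix_cn_name(realm: str, name: str) -> str:
    """Repair a garbled name only on the CN realm, and only if the result looks Chinese."""
    if realm.upper() != "CN":
        return name
    try:
        candidate = name.encode("ISO8859-1").decode("UTF-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not a Latin-1 misreading of UTF-8 bytes; leave the name alone.
        return name
    # Accept the repaired text only when it actually contains Chinese characters.
    return candidate if has_cjk(candidate) else name


# Simulated garbled input (hypothetical sample name).
garbled = "玩家".encode("UTF-8").decode("ISO8859-1")
fixed = fix_cn_name("CN", garbled)
untouched = fix_cn_name("EU", garbled)
```

Gating on both the realm and the character check is a design choice of this sketch, meant to keep names on other realms from being rewritten.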

Monstrofil pushed a commit that referenced this issue Mar 17, 2024
This patch changes the latin-1 workaround added in #19
because it apparently breaks the utf-8 encoding.

Instead, manually try to convert all bytes to utf-8 and
keep bytes whenever conversion is not possible.

This fixes #30.
@Monstrofil
Owner

@HenryQuan the UTF-8 issue was caused by the old workaround, which was meant to paper over the difference between how Python 2 (which WG still uses) and Python 3 handle strings. I made a slightly better variant that loads the pickle as bytes and then recursively searches for bytes values and tries to decode them as Unicode strings. This approach also has problems, e.g. with empty strings, but at least it handles names properly.
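For readers following along, a minimal sketch of that idea, not the actual patch. `PY2_PAYLOAD` is a hand-written protocol-2 pickle of the kind Python 2 would emit for a dict holding a UTF-8 byte string, and `decode_bytes` is a hypothetical helper name.

```python
import pickle

# Hand-written Python 2 style pickle of {'name': '<UTF-8 bytes of 玩家>'}.
PY2_PAYLOAD = b"\x80\x02}q\x00U\x04nameq\x01U\x06\xe7\x8e\xa9\xe5\xae\xb6q\x02s."


def decode_bytes(obj):
    """Recursively decode bytes as UTF-8, keeping bytes that cannot be decoded."""
    if isinstance(obj, bytes):
        try:
            return obj.decode("utf-8")
        except UnicodeDecodeError:
            return obj  # genuinely binary data stays as bytes
    if isinstance(obj, dict):
        return {decode_bytes(k): decode_bytes(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [decode_bytes(item) for item in obj]
    if isinstance(obj, tuple):
        return tuple(decode_bytes(item) for item in obj)
    return obj


# Old-style workaround: every byte becomes one Latin-1 character, garbling the name.
mojibake = pickle.loads(PY2_PAYLOAD, encoding="latin-1")

# Newer approach: keep Python 2 strings as bytes, then decode what can be decoded.
raw = pickle.loads(PY2_PAYLOAD, encoding="bytes")
decoded = decode_bytes(raw)
```

One limitation visible even in this sketch: `b""` always decodes successfully, so an empty binary field comes back as an empty `str`. Whether that is the exact empty-string problem mentioned above is a guess.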

I added this new solution to 13.2 and also backported it to 12.11 (because the replay you sent was from that version), so please try it once you have time.
