Encoding issues regarding Chinese player names #30

HenryQuan · 2023-12-29T04:58:15Z

Hi, I found a very minor issue with Chinese player names while working on a PR that also uses this library, WoWs-Builder-Team/minimap_renderer#14. The issue is specific to the CN realm because that realm allows Chinese characters in player names. This isn't possible elsewhere as far as I know.

In battle_controller, we may want to re-encode the player name so that Chinese characters come out correctly. The following doesn't seem to affect English names.

name=player["name"].encode('ISO8859-1').decode('UTF-8'),
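For anyone wondering why that one-liner works, here is a minimal sketch, assuming the name was UTF-8 on disk but got decoded as Latin-1 somewhere along the way. The helper name `fix_mojibake` is made up for illustration and is not part of the library. Latin-1 maps every byte to exactly one character, so encoding back to ISO8859-1 recovers the original bytes, which can then be decoded as UTF-8.

```python
def fix_mojibake(name: str) -> str:
    """Undo a Latin-1 mis-decoding of UTF-8 bytes (hypothetical helper)."""
    try:
        return name.encode("ISO8859-1").decode("UTF-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the name is already proper Unicode, or it is genuine
        # Latin-1 text; in both cases leave it unchanged.
        return name


# Simulate the bug: UTF-8 bytes of a Chinese name read as Latin-1.
garbled = "玩家".encode("UTF-8").decode("ISO8859-1")

repaired = fix_mojibake(garbled)
unchanged = fix_mojibake("Player_1")
```

ASCII names pass through untouched because ASCII is encoded identically in Latin-1 and UTF-8, which would explain why English names are unaffected.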


Monstrofil · 2023-12-29T16:14:10Z

@HenryQuan any chance you can attach a replay with Chinese names to this issue? I will add some unit tests for that case.

HenryQuan · 2023-12-29T22:56:42Z

@Monstrofil Yes, I have one shared by my friend, 20231214_203306_PFSC210-Marseille_46_Estuary.zip. Weirdly, the clan tag can also be in Chinese, but it is encoded correctly. Only the player name is wrong. The Chinese server can be detected by checking whether the realm is CN. Chinese characters can be detected using something like hanzidentifier.
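If pulling in hanzidentifier is not desirable, a rough stdlib-only check might look like the sketch below. It is not how hanzidentifier works internally. It only tests for the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), so it misses the extension blocks, and `has_cjk` is a hypothetical name.

```python
def has_cjk(text: str) -> bool:
    """Return True if any character is in the basic CJK Unified Ideographs block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)


chinese_name = has_cjk("玩家")
english_name = has_cjk("Player_1")
```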

This patch changes the latin-1 workaround added in #19 because it apparently breaks UTF-8 decoding. Instead, manually try to convert all bytes to UTF-8 and keep them as bytes whenever conversion is not possible. This fixes #30.

Monstrofil · 2024-03-18T20:08:40Z

@HenryQuan the issue with UTF-8 was caused by the old workaround, which was intended to paper over the difference between how Python 2 (which WG still uses) and Python 3 handle strings. I made a somewhat better variant that loads the pickle as bytes and then recursively searches for bytes and tries to decode them as Unicode strings. This option also has problems, e.g. with empty strings, but at least it handles names properly.

I added this new solution to 13.2 and also backported it to 12.11 (because the replay you sent was from that version), so try it once you have time.
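A minimal sketch of that idea, assuming the data is a Python 2 pickle whose `str` values hold UTF-8 bytes. This is not the library's actual code, and `decode_bytes` is a name invented for the example. `pickle.loads(..., encoding="bytes")` is the standard way to get Python 2 8-bit strings back as `bytes` in Python 3.

```python
import pickle


def decode_bytes(value):
    """Recursively turn bytes into str wherever they are valid UTF-8."""
    if isinstance(value, bytes):
        try:
            return value.decode("utf-8")
        except UnicodeDecodeError:
            return value  # keep real binary data as bytes
    if isinstance(value, dict):
        return {decode_bytes(k): decode_bytes(v) for k, v in value.items()}
    if isinstance(value, list):
        return [decode_bytes(v) for v in value]
    if isinstance(value, tuple):
        return tuple(decode_bytes(v) for v in value)
    return value


# Hand-written protocol 0 pickle of a Python 2 str holding the UTF-8
# bytes of a Chinese name (stands in for data taken from a replay).
py2_pickle = b"S'\\xe7\\x8e\\xa9\\xe5\\xae\\xb6'\n."

raw = pickle.loads(py2_pickle, encoding="bytes")  # bytes, not str
name = decode_bytes(raw)
```

One visible limitation of this approach is that any binary value that happens to be valid UTF-8, including the empty `b""`, ends up as a `str`, so the two cases can no longer be told apart afterwards.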
