Encoding with player names and chat messages #55

Open
lichifeng opened this issue Nov 20, 2017 · 1 comment

Comments

@lichifeng

It seems that game records generated on different Windows versions use different character encodings. This is a particular headache with records from non-Latin users: player names and chat messages do not always display correctly.

I tried to resolve this with mb_detect_encoding() and mb_convert_encoding() but failed. It is hard for mb_detect_encoding() to make a good guess (maybe a player name is too short?).
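One plausible reason detection struggles with short strings is that the same few bytes are valid in several legacy encodings at once. The sketch below illustrates that ambiguity in Python rather than PHP. It does not show how mb_detect_encoding() works internally, and the candidate list and the `plausible_encodings` helper are assumptions made for this example.

```python
# Hypothetical candidate list; a real tool would pick its own set.
CANDIDATES = ["utf-8", "gbk", "cp1252", "cp1251"]


def plausible_encodings(raw: bytes, candidates=CANDIDATES) -> list:
    """Return every candidate encoding that decodes `raw` without error."""
    matches = []
    for encoding in candidates:
        try:
            raw.decode(encoding)  # strict decoding raises on invalid bytes
        except UnicodeDecodeError:
            continue
        matches.append(encoding)
    return matches


# A two-character Chinese name stored as GBK bytes.
short_name = "玩家".encode("gbk")

# More than one encoding accepts these bytes, so validity alone
# cannot tell which one the recording game actually used.
print(plausible_encodings(short_name))
```

With only four bytes to go on, nothing in the data distinguishes the valid candidates, which is why some outside hint about the language is needed.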

Since I mainly use recanalyst to analyze game records from Chinese users, I simply decode the strings extracted from records as GBK (a common encoding for Chinese characters) and then encode them as UTF-8. The result is acceptable for me, but the approach is clearly a hack and not elegant.
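For reference, the workaround amounts to a fixed-encoding transcode. This is a minimal sketch in Python, assuming every string in the record really is GBK. The `gbk_to_utf8` name is made up for this example, and in PHP the equivalent step would be a call to mb_convert_encoding().

```python
def gbk_to_utf8(raw: bytes) -> bytes:
    """Decode bytes as GBK and re-encode them as UTF-8.

    Assumes the input really is GBK. Invalid byte sequences are replaced
    with U+FFFD instead of raising, so records written in another
    encoding come out garbled rather than crashing the analysis.
    """
    text = raw.decode("gbk", errors="replace")
    return text.encode("utf-8")


# Simulate a player name as it might be stored in a record.
stored_name = "玩家".encode("gbk")
print(gbk_to_utf8(stored_name).decode("utf-8"))
```

The obvious limitation is the one described above: records from non-Chinese players pass through the same GBK decode and come out wrong.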

So here is my question: is there a way to know the encoding of strings in records explicitly?

Thanks.

@goto-bus-stop
Owner

Thanks for bringing this up! Recorded games don't store the encoding, but we may be able to guess it, for example by comparing the text from the Objectives tab with some predefined language strings. RecAnalyst does that for map names here. I think it'd be ideal to solve this in RecAnalyst, so that player names will always be returned as UTF-8.
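A rough sketch of that idea, in Python for illustration only: it is not RecAnalyst's code, and the marker phrases below are placeholders rather than actual game strings. Each candidate encoding gets a phrase that would appear in the Objectives text for that language; the phrase is encoded with that encoding and searched for in the raw bytes. The names `MARKERS`, `guess_encoding`, and `decode_with_guess` are hypothetical, as is the cp1252 fallback.

```python
# Hypothetical markers: encoding -> a phrase expected in the Objectives
# text for that language. Placeholders only, not taken from the game.
MARKERS = {
    "gbk": "胜利条件",
    "cp1251": "Условия победы",
    "cp1252": "Bedingungen für den Sieg",
}


def guess_encoding(objectives: bytes, markers=MARKERS):
    """Return the first encoding whose encoded marker occurs in the bytes."""
    for encoding, phrase in markers.items():
        if phrase.encode(encoding) in objectives:
            return encoding
    return None  # nothing matched; the caller decides on a fallback


def decode_with_guess(raw: bytes, objectives: bytes, fallback="cp1252") -> str:
    """Decode a player name or chat line using the guessed encoding."""
    encoding = guess_encoding(objectives) or fallback
    return raw.decode(encoding, errors="replace")


# Simulated record data: Objectives text and a player name, both in GBK.
objectives = "胜利条件: 征服".encode("gbk")
name = "玩家".encode("gbk")
print(guess_encoding(objectives))
print(decode_with_guess(name, objectives))
```

Markers should contain non-ASCII characters, since an ASCII-only phrase encodes to the same bytes in all of these encodings and would match regardless of language.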
