Decoding error of some UTF8 strings in the input map #20
An input map can contain map strings in several encodings, and there is no way to detect the encoding perfectly (although only a handful of encodings are used for SC maps). Most Korean users still use an old version of SCMDraft2 because the latest SCMD2 has many inconveniences: for example, the classic trigedit is practically unusable with non-ASCII strings, and opening an old map whose strings are CP949-encoded with the UTF-8 setting corrupts all of its map strings. The old version of SCMDraft2 uses only the system ANSI encoding (CP949 on Korean Windows) rather than UTF-8 for map strings. Many Korean users have tried to concatenate a unit name into a string and got broken results, because in SC:R only UTF-8 strings can be edited in-game via EUD. So, currently, euddraft only re-encodes unit name strings: any unit name that decodes successfully as CP949 is treated as CP949 and converted to UTF-8. I know this is far from ideal, but I'm not sure what the best way to handle it is. If I can't come up with a better approach, I'll add an opt-in setting.
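A minimal sketch of the heuristic described in the comment above, assuming Python's built-in `cp949` and `utf-8` codecs. The function names `decode_unit_name` and `rewrite_unit_name` are hypothetical, not euddraft's actual code. It shows why the rule misfires: any UTF-8 string whose bytes also happen to be valid CP949 gets decoded as CP949.

```python
def decode_unit_name(raw: bytes) -> str:
    """Hypothetical sketch of the heuristic described above:
    if the bytes decode cleanly as CP949, treat them as CP949;
    otherwise fall back to UTF-8."""
    try:
        return raw.decode("cp949")
    except UnicodeDecodeError:
        # Undecodable bytes become U+FFFD instead of raising.
        return raw.decode("utf-8", errors="replace")


def rewrite_unit_name(raw: bytes) -> bytes:
    """Decode with the heuristic, then store the result back as UTF-8."""
    return decode_unit_name(raw).encode("utf-8")


if __name__ == "__main__":
    for name in ("路障", "大厦"):
        raw = name.encode("utf-8")
        # True means the bytes survived unchanged.
        print(name, rewrite_unit_name(raw) == raw)
```

This prints `路障 True` and then `大厦 False`: the UTF-8 bytes of "路障" are not valid CP949, so they fall through to UTF-8, while the UTF-8 bytes of "大厦" happen to be valid CP949.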
Yes, you could add an option for users to opt in. For example, in main.edd, a `[coding]` section with `decode: utf8` would force euddraft to decode all strings as UTF-8, while `[coding]` with `decode: default` would keep the old default decoding method.
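A rough sketch of how the proposed option could select the decoder. The `[coding]` section and `decode` key are only the proposal from this comment, not a confirmed euddraft setting, and `configparser` stands in here for euddraft's own main.edd parser. `decode_map_string` is a hypothetical helper.

```python
import configparser

# Hypothetical main.edd fragment using the option proposed above.
EDD_TEXT = """
[coding]
decode: utf8
"""


def decode_map_string(raw: bytes, mode: str = "default") -> str:
    """Decode one map string according to the (hypothetical) decode mode."""
    if mode == "utf8":
        # Force UTF-8 for every string.
        return raw.decode("utf-8", errors="replace")
    if mode == "default":
        # Old behaviour: prefer CP949 when the bytes are valid CP949.
        try:
            return raw.decode("cp949")
        except UnicodeDecodeError:
            return raw.decode("utf-8", errors="replace")
    raise ValueError(f"unknown decode mode: {mode!r}")


config = configparser.ConfigParser()
config.read_string(EDD_TEXT)
# Missing section or key keeps the old behaviour.
mode = config.get("coding", "decode", fallback="default")
print(decode_map_string("大厦".encode("utf-8"), mode))
```

Falling back to `"default"` when the key is absent keeps existing projects unchanged, which is the point of making the new behaviour opt-in.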
Another suggestion: I know the old SCMD2 uses the system ANSI encoding, which makes transcoding a headache for everyone. But since the 2019-10-03 version of SCMD2, the STRx section has been available, and it solves this: in that version or newer, as long as the map is saved in SC:R format, SCMD2 encodes all input strings into STRx as UTF-8.
It is possible to use ANSI encoding even in newer versions of SCMD2. There is a custom locale option in the profile settings, and it is required for adding new SC:R features to old maps that are still being maintained. Not all STRx maps use UTF-8, so the headache still exists xd
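Since STRx alone does not guarantee UTF-8, one alternative (not what euddraft does, and not proposed in this thread) is to reverse the order: try strict UTF-8 first and fall back to CP949 only when the bytes are not valid UTF-8. A minimal sketch, assuming Python's built-in codecs; `decode_utf8_first` is a hypothetical name.

```python
def decode_utf8_first(raw: bytes) -> str:
    """Alternative heuristic: try strict UTF-8 first and fall back
    to CP949 only when the bytes are not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp949", errors="replace")


print(decode_utf8_first("大厦".encode("utf-8")))  # UTF-8 map string
print(decode_utf8_first("가".encode("cp949")))    # ANSI (CP949) map string
```

The idea is that CP949 Hangul usually does not form valid UTF-8 sequences, while real UTF-8 text always does. Short CP949 strings can still occasionally collide with valid UTF-8, so this is a heuristic too.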
Oh, got it. Looking forward to the new euddraft version :D
@Chromowolf Sorry for the delay; you can now set the encoding used to decode unit names in the input map with
Btw, I'm not sure whether it's this (I'm using the newest version of SCMD, the 2020-06-24 version): I also read this (Btw, I'm currently using your TrigEditPlus. It's perfect! It can set the codepage to 65001 automatically. I'm just curious about how to set the
Weird. IIRC, there is an error with 65001, but Unicode strings are still displayed correctly.
https://cafe.naver.com/edac/79812
When euddraft compiles an input SCX, I believe it first decodes the strings in the STR or STRx section of the input map (as UTF-8 or otherwise) and then encodes them back into the STR or STRx section as UTF-8, instead of simply copying the bytes. Am I right?
Everything would be fine if euddraft used UTF-8 for both decoding and encoding. However, for some strings the decoding step uses CP949 instead of UTF-8, and the wrongly decoded strings are then encoded back as UTF-8, so the wrong characters are displayed in game.
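A small model of the decode/re-encode pipeline described above, assuming Python's built-in codecs. `recompile_string` is a hypothetical name, not an euddraft function. A UTF-8 round trip leaves valid UTF-8 bytes unchanged, while decoding with the wrong codec changes them.

```python
def recompile_string(raw: bytes, source_codec: str) -> bytes:
    """Model of the pipeline described above: decode the STR/STRx bytes
    with source_codec, then encode the text back as UTF-8."""
    return raw.decode(source_codec).encode("utf-8")


raw = "大厦".encode("utf-8")
print(recompile_string(raw, "utf-8") == raw)  # UTF-8 in, UTF-8 out: unchanged
print(recompile_string(raw, "cp949") == raw)  # wrong codec on the way in
```

This prints `True` and then `False`.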
Example:
![luzhang](https://user-images.githubusercontent.com/37732663/104809900-b8ab3c80-582b-11eb-9eeb-e69fd675ce32.png)
![dasha](https://user-images.githubusercontent.com/37732663/104809901-b9dc6980-582b-11eb-82d3-d54684a044e2.png)
Modify some unit names in the input SCX (using the newest version of SCMD and saving as SC:R):
- "Terran Academy" is renamed to "路障"
- "Terran Armory" is renamed to "大厦"
Note that the UTF-8 bytes for the above 4 characters are:
- 路: E8 B7 AF
- 障: E9 9A 9C
- 大: E5 A4 A7
- 厦: E5 8E A6
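The byte values listed above can be checked with a few lines of Python (3.8 or later, for the separator argument of `bytes.hex`). `utf8_hex` is just an illustrative helper name.

```python
def utf8_hex(text: str) -> str:
    """Space-separated upper-case hex of the UTF-8 encoding of text."""
    return text.encode("utf-8").hex(" ").upper()


for ch in "路障大厦":
    print(ch, utf8_hex(ch))
```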
Open the map in StarCraft; it displays the correct characters:
![inlz](https://user-images.githubusercontent.com/37732663/104809955-438c3700-582c-11eb-83e2-02efe6b0ca99.png)
![inds](https://user-images.githubusercontent.com/37732663/104809956-4555fa80-582c-11eb-839b-b9e7233eae96.png)
Then compile the map with the newest version of euddraft and open the output EUD map in game:
![outlz](https://user-images.githubusercontent.com/37732663/104810006-d036f500-582c-11eb-9f5d-0814791efc8a.png)
![outds](https://user-images.githubusercontent.com/37732663/104810007-d200b880-582c-11eb-9833-d751cc1917aa.png)
"路障" is displayed correctly, but "大厦" is garbled (鸚㎩렑).
I've checked the STRx section of the input map with a hex editor: the strings are "E8 B7 AF E9 9A 9C" and "E5 A4 A7 E5 8E A6". In the output EUD map, the strings are "E8 B7 AF E9 9A 9C" and "E9 B8 9A E3 8E A9 EB A0 91".
Apparently euddraft decodes the second string, "E5 A4 A7 E5 8E A6", as CP949, giving "鸚㎩렑", and then encodes it back as UTF-8, producing "E9 B8 9A E3 8E A9 EB A0 91".
I don't know why this happens. Could you fix this?
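The byte-level reason can be reproduced with Python's `cp949` codec, assuming it behaves like the decoder euddraft uses. CP949 reads the six UTF-8 bytes of "大厦" as three double-byte characters, pairing them differently from the two three-byte UTF-8 sequences. The UTF-8 bytes of "路障" contain a pair that CP949 does not define, so that string fails the CP949 check and is left alone.

```python
raw = bytes.fromhex("E5 A4 A7 E5 8E A6")  # UTF-8 bytes of "大厦"

# Splitting into fixed 2-byte pairs is only valid here because every
# pair happens to be a CP949 double-byte character.
pairs = [raw[i:i + 2] for i in range(0, len(raw), 2)]
for pair in pairs:
    print(pair.hex(" ").upper(), "->", pair.decode("cp949"))

# The UTF-8 bytes of "路障" contain AF E9, which CP949 does not define,
# so a strict CP949 decode fails and the string is treated as UTF-8.
try:
    bytes.fromhex("E8 B7 AF E9 9A 9C").decode("cp949")
except UnicodeDecodeError as exc:
    print("路障 bytes are not valid CP949:", exc.reason)
```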