Nobita's Biohazard Korean translation tries to load file names with the wrong encoding #649

Closed
fdelapena opened this Issue Nov 17, 2015 · 8 comments


@fdelapena
Member

Game download. Note: the zip contains Korean filenames, so watch out when unpacking.

How to reproduce: just play the intro cutscene. It fails at the first conversation after entering Doraemon's magic door, when it tries to load FaceSet/롥릐뚺딁. The log says: Cannot find: FaceSet/[\0x0018][\0x0018]�[\0x0018]l[\0x0018]�[\0x0018].
(The [\0x0018] sequences are written by hand here on purpose, to prevent Markdown from converting them into 0xfffd.)

fdelapena added this to the 0.4 milestone Nov 17, 2015
@Ghabry
Member
Ghabry commented Nov 17, 2015

Hard to extract correctly ;)

@fdelapena
Member

As Zegeri pointed out in chat, --encoding windows-949-2000 seems to convert the string properly.
However, it does not convert \ to the Won sign (it keeps the \).

About another issue, the failure to find 오브젝~1: after a search, this file turned up in a torrent of a Korean IB game, alongside 오브젝트1 (also present in the same torrent).
Update: as carstene1ns pointed out in chat, this looks like the short filename support of non-Unicode Windows (Win95/98/ME), which truncates file names longer than 8 bytes and appends ~1, ~2, etc.
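The short filename explanation can be checked with a little byte arithmetic. In CP949 each Hangul syllable takes 2 bytes, so 오브젝트1 is 4 × 2 + 1 = 9 bytes, one byte too long. Keeping a 6-byte prefix without splitting a syllable leaves 오브젝, and appending ~1 gives 오브젝~1 (exactly 8 bytes). The following is a simplified sketch of that rule, assuming single-digit suffixes and ignoring extensions, uppercasing, illegal characters and collision handling. `IsCp949LeadByte` and `ShortNameAlias` are hypothetical helpers, not Windows or EasyRPG APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if b can start a CP949 double-byte character.
// CP949 lead bytes are 0x81..0xFE; bytes below 0x80 are single-byte ASCII.
bool IsCp949LeadByte(unsigned char b) {
    return b >= 0x81 && b <= 0xFE;
}

// Simplified sketch (assumption, not the exact Win9x algorithm): names of up
// to 8 bytes are kept; longer ones keep at most 6 bytes, never splitting a
// double-byte character, and get "~N" appended. Extensions, uppercasing,
// illegal characters and collision handling are ignored.
std::string ShortNameAlias(const std::string& long_name, int n = 1) {
    assert(n >= 1 && n <= 9);  // this sketch only supports one-digit suffixes
    if (long_name.size() <= 8) {
        return long_name;  // already short enough, no alias needed
    }
    std::string base;
    std::size_t i = 0;
    while (i < long_name.size()) {
        const bool double_byte =
            IsCp949LeadByte(static_cast<unsigned char>(long_name[i])) &&
            i + 1 < long_name.size();
        const std::size_t len = double_byte ? 2 : 1;
        if (base.size() + len > 6) {
            break;  // the next character would not fit in the 6-byte prefix
        }
        base.append(long_name, i, len);
        i += len;
    }
    return base + "~" + std::to_string(n);
}
```

If the player ever has to cope with such aliases, a lookup fallback could try this kind of mapping in reverse. Real Windows also bumps the suffix (~2, ~3, ...) when an alias already exists in the directory.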

@fdelapena
Member

This encoding works with the FaceSet and also handles the Won sign out of the box:
ibm-1363_P110-1997.

Unfortunately, it fails with some other filenames that contain, e.g., middle dots. The following encoding works with those filenames; however, it shows a backslash where a Won sign should be displayed:
windows-949-2000.

Note: to use it in static builds, it needs to be added to icudata, replacing the old table (and reader_util.cpp in liblcf needs to be updated).

fdelapena added a commit to fdelapena/easyrpg-liblcf that referenced this issue Nov 18, 2015
fdelapena: Change Korean encoding to ibm-1363 (936ec9a)
@Ghabry
Member
Ghabry commented Dec 15, 2015

Could we delay this to 0.4.1? Updating ICU everywhere takes a while...

@fdelapena
Member

Sure. It also depends on #673 first, to test and determine which encoding is the right one to use.

fdelapena changed the milestone from 0.4 to 0.4.1 Dec 15, 2015
fdelapena added a commit to fdelapena/easyrpg-liblcf that referenced this issue Dec 17, 2015
fdelapena: Change Korean encoding to windows-949-2000 (dd07947)
@fdelapena
Member

I have uploaded a new icudt56l version that includes all KSC variants (each variant costs only a few KB). After rebuilding all toolchains with ICU 56 and the new icudata, it should be easier to test and fix this issue on all player platforms.

fdelapena added a commit to fdelapena/easyrpg-liblcf that referenced this issue Feb 11, 2016
fdelapena: Change Korean encoding to windows-949-2000 (fdb5913)
@fdelapena
Member

Tested all available working ICU encodings for Korean with the following results:

Encoding                     FaceSet fails?   Middle dot fails?   ₩ or \?
ibm-1363_P110-1997           No               Yes                 ₩
ibm-1363_P11B-1998           No               Yes                 \
ibm-949_P110-1999            Yes              Yes                 ₩
ibm-949_P11A-1999            Yes              Yes                 \
ibm-970_P110_P110-2006_U2    Yes              No                  \
windows-949-2000             No               No                  \

So windows-949-2000 is the only encoding that passes both the FaceSet and the middle dot tests.
Losing Won support is not nice, but it can be worked around in a separate issue.
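One possible workaround, sketched here as an assumption rather than what EasyRPG actually does: since windows-949-2000 decodes byte 0x5C as U+005C, and Korean Windows fonts draw that code point as the Won sign, the player could substitute U+20A9 only when rendering text. `BackslashToWonForDisplay` is a hypothetical helper name. The substitution must not touch file paths, and it must run after message parsing, because RPG Maker escape codes such as \C[n] and \N[n] also start with a backslash.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of EasyRPG/liblcf): replace '\' with the
// Won sign U+20A9 in UTF-8 text that is about to be drawn on screen.
// Do NOT apply it to file paths or to raw message text that still
// contains escape codes like \C[n] or \N[n].
std::string BackslashToWonForDisplay(const std::string& utf8_text) {
    static const std::string kWon = "\xE2\x82\xA9";  // U+20A9 WON SIGN in UTF-8
    std::string out;
    out.reserve(utf8_text.size());
    for (char c : utf8_text) {
        if (c == '\\') {
            // '\' is ASCII, so it never appears inside a multi-byte UTF-8 sequence
            out += kWon;
        } else {
            out += c;
        }
    }
    assert(out.size() >= utf8_text.size());  // each '\' grows from 1 to 3 bytes
    return out;
}
```

Doing the swap at glyph-drawing time keeps the decoded strings identical to what windows-949-2000 produces, so file lookups and escape-code parsing are unaffected.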

It's time to update icudata for all toolchains to include the following Korean mapping tables:

  • ibm-1363_P11B-1998.cnv (windows-949-2000 relies on this one)
  • windows-949-2000.cnv

Previously existing Korean mapping tables in custom icudata can (should) be dropped.

@Ghabry
Member
Ghabry commented Feb 11, 2016

Let's patch ICU to fix this 👍 ;)
