Nobita's Biohazard Korean translation tries to load file name with wrong encoding #649

Closed
fdelapena opened this Issue Nov 17, 2015 · 8 comments

@fdelapena
Contributor

fdelapena commented Nov 17, 2015

Game download. Note: the zip contains Korean file names, so take care when unpacking.

How to reproduce: play the intro cutscene. It fails in the first conversation after entering through Doraemon's magic door, when the game tries to load FaceSet/롥릐뚺딁. The log says: Cannot find: FaceSet/[\0x0018][\0x0018]�[\0x0018]l[\0x0018]�[\0x0018].
(The [\0x0018] markers are written by hand here to prevent Markdown from converting those bytes into 0xfffd.)

@fdelapena fdelapena added this to the 0.4 milestone Nov 17, 2015

@Ghabry

Ghabry Nov 17, 2015

Member

Hard to extract correctly ;)

@fdelapena

fdelapena Nov 18, 2015

Contributor

As Zegeri pointed out in chat, --encoding windows-949-2000 seems to convert the string properly.
However, it does not convert \ to the Won sign (it keeps the \).
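
To make the observation above concrete, here is a minimal sketch. It assumes Python's built-in `cp949` codec as a stand-in for ICU's `windows-949-2000` converter (the two tables may differ in edge cases), and `show_with_won` is a hypothetical display-only helper, not something the player provides.

```python
# Assumption: Python's "cp949" codec stands in for ICU's windows-949-2000.
LEGACY_ENCODING = "cp949"


def decode_name(raw: bytes, encoding: str = LEGACY_ENCODING) -> str:
    """Decode a file name stored as legacy Korean code page bytes."""
    return raw.decode(encoding)


def show_with_won(text: str) -> str:
    """Hypothetical display-only helper: render backslashes as the Won sign."""
    return text.replace("\\", "\u20a9")


# Illustrative bytes only: the FaceSet name from the report, encoded as CP949.
raw = "롥릐뚺딁".encode(LEGACY_ENCODING)
print(decode_name(raw))        # round-trips to the name found on disk
print(decode_name(b"\x5c"))    # byte 0x5C stays a backslash after decoding
print(show_with_won("a\\b"))   # the backslash is swapped only for display
```

Swapping the character only at display time would leave path separators untouched, which is one possible reason to treat the Won sign separately from file name decoding.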

About the other issue, the failure to find 오브젝~1: after a search, this file turns out to exist in a Korean IB game torrent, alongside 오브젝트1 (available in the same torrent).
Update: as carstene1ns pointed out in chat, this looks like the short file name support in non-Unicode Windows (Win95/98/ME), which shortens file names longer than 8 bytes and appends ~1, ~2, etc.
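
The short file name explanation can be checked with a simplified sketch. It assumes the name is measured in CP949 bytes (two per Hangul syllable) and ignores extensions, case folding and invalid characters, which the real Windows algorithm also handles; `short_name` is a hypothetical helper.

```python
def short_name(name: str, index: int = 1, encoding: str = "cp949") -> str:
    """Simplified short-name sketch: names over 8 bytes are cut and get ~N."""
    if len(name.encode(encoding)) <= 8:
        return name  # already fits, no alias needed
    suffix = f"~{index}"
    budget = 8 - len(suffix)  # bytes left for the kept prefix
    kept, used = [], 0
    for ch in name:
        size = len(ch.encode(encoding))
        if used + size > budget:
            break  # never split a multi-byte character
        kept.append(ch)
        used += size
    return "".join(kept) + suffix


print(short_name("오브젝트1"))  # 오브젝~1
```

오브젝트1 is 9 bytes in CP949, so it is cut to the first 6 bytes (오브젝) plus ~1, which matches the file name the game asks for.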

@fdelapena

fdelapena Nov 18, 2015

Contributor

This encoding works with the FaceSet file name and handles the Won sign out of the box:
ibm-1363_P110-1997.

Unfortunately, it fails with some other file names, e.g. ones containing middle dots. The following encoding works with those file names, but it shows a backslash where a Won sign should be displayed:
windows-949-2000.

Note: to use it in static builds, it needs to be added to icudata in place of the old table (and reader_util.cpp in liblcf needs updating).

@Ghabry

Ghabry Dec 15, 2015

Member

Could we delay this to 0.4.1? Updating ICU everywhere takes a while...

@fdelapena

fdelapena Dec 15, 2015

Contributor

Sure. It also depends on #673 first, to test and determine which encoding is the right one to use.

@fdelapena fdelapena modified the milestones: 0.4.1, 0.4 Dec 15, 2015

@fdelapena

fdelapena Dec 24, 2015

Contributor

I have uploaded a new icudt56l version that includes all KSC variants (each variant adds very few KB). Once all toolchains are rebuilt with icu56 and the new icudata, it should be easier to test and fix this issue on all player platforms.

@fdelapena

fdelapena Feb 11, 2016

Contributor

Tested all available working ICU encodings for Korean with the following results:

| Encoding | FaceSet fails? | Middle dot fails? | ₩ or \? |
|---|---|---|---|
| ibm-1363_P110-1997 | No | Yes | |
| ibm-1363_P11B-1998 | No | Yes | \ |
| ibm-949_P110-1999 | Yes | Yes | |
| ibm-949_P11A-1999 | Yes | Yes | \ |
| ibm-970_P110_P110-2006_U2 | Yes | No | \ |
| windows-949-2000 | No | No | \ |

So the only encoding that passes both file name tests is windows-949-2000.
Losing Won sign support is not nice, but that can be handled in a separate issue.

It's time to update icudata for all toolchains to include the following Korean mapping tables:

  • ibm-1363_P11B-1998.cnv (windows-949-2000 relies on this one)
  • windows-949-2000.cnv

Previously existing Korean mapping tables in custom icudata can (should) be dropped.
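
The comparison above was done with ICU converters. As a rough illustration of the same kind of check, the sketch below tries candidate encodings until a decoded name matches a file that exists. It uses Python's built-in codecs as stand-ins (an assumption, since the ICU tables are not available there), and `first_matching_encoding` is a hypothetical helper.

```python
from typing import Iterable, Optional


def first_matching_encoding(
    raw: bytes, existing: Iterable[str], candidates: Iterable[str]
) -> Optional[str]:
    """Return the first candidate whose decoding of raw names an existing file."""
    names = set(existing)
    for encoding in candidates:
        try:
            decoded = raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue  # bytes invalid in this encoding, or codec unknown
        if decoded in names:
            return encoding
    return None


# Illustrative data: one file on disk and its name as CP949 bytes.
on_disk = {"롥릐뚺딁"}
raw = "롥릐뚺딁".encode("cp949")
print(first_matching_encoding(raw, on_disk, ["ascii", "cp949"]))  # cp949
```

A check like this only covers file lookup. Whether byte 0x5C is shown as ₩ or \ still has to be judged separately, as in the last column of the table.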

@Ghabry

Ghabry Feb 11, 2016

Member

Let's patch ICU to fix this 👍 ;)
