
Composite japanese characters cause files to be not found #1530

Closed
SmiVan opened this Issue Dec 6, 2018 · 5 comments

3 participants
@SmiVan
Contributor

SmiVan commented Dec 6, 2018

I'm experiencing a bug caused by the encoding of Japanese characters in filenames in Yume 2kki:


Name of the game: Yume 2kki

Player platform:

macOS 10.13.6

Locale:

ru_RU.UTF-8

Save File and Log

save-and-log.zip

Steps to reproduce:

Navigate to track No. 003 D in the PC music player.


In particular, the Player reports that it cannot find 夢オルゴールFC.

Interestingly, grepping the music folder shows that the track indeed does not exist under that name, but exists under the name 夢オルゴールFC instead.

Looks similar, right?

I then copy-pasted the correct and the wrong names into text files and ran xxd against both of them, producing:

$ xxd correct.txt
00000000: e5a4 a2e3 82aa e383 abe3 82b3 e382 99e3  ................
00000010: 83bc e383 ab46 430a                      .....FC.
$ xxd wrong.txt
00000000: e5a4 a2e3 82aa e383 abe3 82b4 e383 bce3  ................
00000010: 83ab 4643 0a                             ..FC.

which are clearly different.

wrong: e5a4a2e382aae383ab e382b4        e383bce383ab46430a
right: e5a4a2e382aae383ab e382b3 e38299 e383bce383ab46430a

In particular, the difference is in the ゴ character, as shown by grepping the music folder.
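For anyone who wants to reproduce the byte comparison without xxd, here is a minimal C++ sketch. The names `toHex`, `firstDifference`, `kOnDisk` and `kRequested` are made up for illustration and are not part of Player or liblcf. The two byte strings are copied from the dumps above, minus the trailing newline.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Render every byte of s as two lowercase hex digits, like the xxd dump.
std::string toHex(const std::string& s) {
    std::string out;
    char buf[3];
    for (unsigned char c : s) {
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(c));
        out += buf;
    }
    return out;
}

// Offset of the first byte at which a and b differ
// (or the length of the shorter string if one is a prefix of the other).
std::size_t firstDifference(const std::string& a, const std::string& b) {
    std::size_t i = 0;
    while (i < a.size() && i < b.size() && a[i] == b[i]) ++i;
    return i;
}

// The two names as raw UTF-8 bytes. "FC" sits in its own literal so that
// it is not swallowed by the preceding hex escape.
const std::string kOnDisk =
    "\xe5\xa4\xa2\xe3\x82\xaa\xe3\x83\xab"
    "\xe3\x82\xb3\xe3\x82\x99"  // the character as stored in the file name
    "\xe3\x83\xbc\xe3\x83\xab" "FC";
const std::string kRequested =
    "\xe5\xa4\xa2\xe3\x82\xaa\xe3\x83\xab"
    "\xe3\x82\xb4"              // the character as requested by the Player
    "\xe3\x83\xbc\xe3\x83\xab" "FC";
```

Calling `firstDifference(kOnDisk, kRequested)` gives offset 11, which is the `b3` versus `b4` byte, and the on-disk name is three bytes longer because of the extra `e38299` sequence.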

Potentially related issues are #1383 and #839.

@SmiVan SmiVan changed the title from "[FileFinder] Some japanese characters cause files to be not found" to "[FileFinder] Composite japanese characters cause files to be not found" Dec 6, 2018

@SmiVan

Contributor Author

SmiVan commented Dec 6, 2018

As revealed by @fdelapena on IRC, the difference is that in the filename the character is stored as コ + ゙ (U+30B3 followed by the combining voiced sound mark U+3099), while the file finder expects the precomposed ゴ (U+30B4).
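To see this at the code point level rather than the byte level, the UTF-8 sequences from the hex dumps can be decoded by hand. The following is only a sketch: `decodeUtf8` is a hypothetical helper, it assumes well-formed input and does no error checking.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decode well-formed UTF-8 into Unicode code points.
// Minimal sketch: assumes valid input, performs no validation.
std::vector<char32_t> decodeUtf8(const std::string& s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = 1;   // ASCII by default
        char32_t cp = lead;
        if (lead >= 0xF0)      { len = 4; cp = lead & 0x07; }
        else if (lead >= 0xE0) { len = 3; cp = lead & 0x0F; }
        else if (lead >= 0xC0) { len = 2; cp = lead & 0x1F; }
        // Each continuation byte contributes its low six bits.
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

Decoding the on-disk bytes `e382b3 e38299` yields two code points, U+30B3 and U+3099, while the requested bytes `e382b4` yield the single code point U+30B4.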

@fmatthew5876

Contributor

fmatthew5876 commented Dec 6, 2018

There is a patch in #839. That solution involves adding a function to ReaderUtil in liblcf and then changing FileFinder in Player to use it.

@fmatthew5876

Contributor

fmatthew5876 commented Dec 6, 2018

In the patch, Ghabry does:
const Normalizer2* norm = icu::Normalizer2::getNFKCInstance(err);

But this might not be right. There are four forms: NFD, NFC, NFKC, and NFKD.

https://stackoverflow.com/questions/7931204/what-is-normalized-utf-8-all-about
http://icu-project.org/apiref/icu4c/classicu_1_1Normalizer2.html

In this example, the longer string is the one on the file system. For that I think we would need NFD or NFKD?

How can we know what format is used on disk? This looks like NFD/NFKD but others could be NFC/NFKC? Might we have to try them all in a loop in FileFinder::openUTF8()?
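One way to avoid guessing the on-disk form might be to bring both the requested name and each directory entry into the same form before comparing them. The sketch below illustrates only that idea. `composeToy` is a deliberately tiny, hypothetical stand-in that knows three katakana pairs; it is not a real NFC implementation, and real code would call an actual normalizer such as ICU's `Normalizer2`, as the patch does. Names are passed as code points to keep the example short.

```cpp
#include <cstddef>
#include <string>

// Toy stand-in for canonical composition (NOT a real NFC implementation):
// folds <base, U+3099> into the precomposed code point for a few katakana.
std::u32string composeToy(const std::u32string& in) {
    struct Pair { char32_t base, composed; };
    static const Pair kPairs[] = {
        {0x30AB, 0x30AC},  // KA + voiced sound mark -> GA
        {0x30B3, 0x30B4},  // KO + voiced sound mark -> GO
        {0x30CF, 0x30D0},  // HA + voiced sound mark -> BA
    };
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t c = in[i];
        if (i + 1 < in.size() && in[i + 1] == 0x3099) {
            for (const Pair& p : kPairs) {
                if (p.base == c) {
                    c = p.composed;
                    ++i;  // consume the combining mark as well
                    break;
                }
            }
        }
        out.push_back(c);
    }
    return out;
}

// Compare two names after mapping both to the same form, so it does not
// matter which form the caller or the file system happened to use.
bool sameName(const std::u32string& a, const std::u32string& b) {
    return composeToy(a) == composeToy(b);
}
```

With the on-disk name (containing U+30B3 U+3099) and the requested name (containing U+30B4), a plain comparison fails but `sameName` succeeds. As long as both sides go through the same normalization, no loop over all four forms should be needed. Whether a canonical form (NFC/NFD) or a compatibility form (NFKC/NFKD) is the better choice is a separate question.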

@fdelapena fdelapena added the FileFinder label Dec 6, 2018

@SmiVan SmiVan changed the title from "[FileFinder] Composite japanese characters cause files to be not found" to "Composite japanese characters cause files to be not found" Dec 6, 2018

@SmiVan

Contributor Author

SmiVan commented Dec 6, 2018

> In this example, the longer string is the one on the file system. For that I think we would need NFD or NFKD?
>
> How can we know what format is used on disk? This looks like NFD/NFKD but others could be NFC/NFKC?

I've implemented the fix using NFKC, and it works just fine on macOS, so I guess this isn't a problem and we don't need to guess the format.

It still needs to be verified that this works on non-macOS platforms, though.

@fdelapena fdelapena added this to the 0.6.0 (likely) milestone Dec 6, 2018

Ghabry added a commit to SmiVan/EasyRPG-Player that referenced this issue Feb 4, 2019
