Windows encoding issues #247

Closed
smessmer opened this issue Jan 26, 2019 · 6 comments

@smessmer
Member

Encoding issues have been reported for CryFS on Windows. Moving the discussion from #157 to this new issue.

Discussion history so far:

@cfbao:

I've encountered encoding errors with Unicode (Chinese, specifically) characters on Windows 7.

Sometimes (not always), when a Chinese character (typically 3 bytes in UTF-8) is followed by an ASCII character whose binary form starts with two leading zeros, the 3rd byte of the Chinese character would turn into 0xBF = 0b10111111, and the ASCII character would disappear.

For example, 好.txt would turn into 奿txt.
When trying to open/rename/delete this file, the opening application or Windows Explorer would say they cannot find the file.
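To see the byte-level pattern described above, you can dump the UTF-8 encodings of the original and the reported names. The following is a minimal Python sketch that only compares the two names from the report. It does not reproduce the bug, and the helper name `utf8_hex` is made up for illustration.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated hex."""
    return text.encode("utf-8").hex(" ")

original = "好.txt"  # name as created
observed = "奿txt"   # name as shown afterwards under the cp936 locale

print(utf8_hex(original))  # e5 a5 bd 2e 74 78 74
print(utf8_hex(observed))  # e5 a5 bf 74 78 74

# The byte that vanished is '.', 0x2E = 0b00101110 (top two bits are 0),
# and the third byte of 好 changed from 0xBD to 0xBF = 0b10111111.
dot = ord(".")
print(f"{dot:#04x} = {dot:#010b}, top two bits zero: {dot >> 6 == 0}")
```

The first two bytes of 好 survive. Only its third byte and the following '.' are affected, which matches the pattern in the report.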

However, some other characters or character strings do not have this problem. For example, 一.txt and 很好.txt work just fine.

I don't know how CryFS handles character encoding; I'd be interested to know.

@smessmer:

The Chinese character issue is weird. Does the same happen with other DokanY file systems, or only with CryFS?

@cfbao:

I just tested this with Keybase Filesystem, which also uses Dokany, and there's no encoding error as far as I can see. The problem is reproducible in CryFS.

@smessmer:

Is this a Windows 7 only issue? I tried on Linux and Windows 10 but couldn't reproduce it, but maybe I'm doing it wrong. Do you have one of these systems at hand to try?

@cfbao:

It seems to be a code page/system locale issue.
My Windows system locale is normally set to Chinese (cp936), and the encoding issue appears.
When I change the locale to English (cp1252) or Arabic (cp720?), Chinese characters are just fine.
I've now tested this both on my main Win 7 system and a Win 10 VM.
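One hypothesis fits both the cp936 observation and the three example names, although nobody in this thread confirms it: somewhere along the way, the UTF-8 bytes of a name get split into cp936/GBK double-byte units. In GBK, a byte from 0x81 to 0xFE starts a two-byte character, and the trail byte must be at least 0x40. That lines up with the "two leading zeros" condition in the original report. The hypothetical helper `gbk_pairing_hits_invalid_trail` below walks the bytes using that rule and flags any lead byte followed by a byte below 0x40. It is a minimal sketch of the hypothesis, not CryFS code.

```python
def gbk_pairing_hits_invalid_trail(name: str) -> bool:
    """Hypothetical check: walk the UTF-8 bytes of `name` the way a
    cp936 (GBK) decoder would, where 0x81-0xFE starts a two-byte unit.
    Return True if some lead byte is followed by a byte below 0x40,
    which GBK does not accept as a trail byte."""
    data = name.encode("utf-8")
    i = 0
    while i < len(data):
        if 0x81 <= data[i] <= 0xFE:          # GBK lead byte
            if i + 1 < len(data) and data[i + 1] < 0x40:
                return True                  # lead byte + invalid trail
            i += 2                           # consume lead + trail
        else:
            i += 1                           # single-byte unit (ASCII, 0x80, ...)
    return False

for name in ["好.txt", "一.txt", "很好.txt"]:
    print(name, gbk_pairing_hits_invalid_trail(name))
```

Under this assumption, 好.txt is flagged because its 0xBD byte ends up as a lead byte paired with '.'. For 一.txt (E4 B8 80), 0x80 is not a GBK lead byte, so the '.' stays on its own. For 很好.txt, the pairs shift so that 0xBD is consumed as a trail byte.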

This suggests that CryFS isn't fully Unicode on Windows, otherwise its behaviour should be code page independent.
I'm guessing here, but perhaps CryFS is (directly or indirectly) passing "narrow byte" (or "multi byte") character strings into Windows API? This would result in Windows using the locale dependent "ANSI" code page rather than Unicode. To use Unicode on Windows, UTF-16 encoded wide character strings are needed instead.
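The narrow vs. wide distinction can be shown directly: a file name has a single UTF-16 encoding, but its narrow "ANSI" byte form depends on the active code page, and some code pages cannot represent it at all. The following minimal Python sketch uses Python's built-in codecs as stand-ins for the Windows code pages. That substitution is an assumption made for illustration; the sketch does not call the Windows API.

```python
name = "好.txt"

# Wide (UTF-16LE) form: identical regardless of the system locale.
wide = name.encode("utf-16-le")
print(wide.hex(" "))  # 7d 59 2e 00 74 00 78 00 74 00

# Narrow forms: the bytes depend on which encoding/code page is in use.
utf8 = name.encode("utf-8")
ansi_cp936 = name.encode("cp936")
print(utf8.hex(" "), "|", ansi_cp936.hex(" "))

# Some code pages cannot represent the name at all.
try:
    name.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False  # 好 has no cp1252 code point
print("cp1252 can represent the name:", cp1252_ok)
```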

P.S. Another example: files with Arabic names that were created while my system was set to the Arabic locale have now disappeared after I switched the locale back to Chinese.

@smessmer
Member Author

CryFS doesn't do any encoding or decoding. It uses the dokanfuse API, which uses const char* strings that are supposed to be UTF-8. But even if it were a different encoding, this shouldn't happen as long as the dokanfuse API is consistent in how it encodes, because CryFS doesn't re-encode anything.
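The consistency argument can be illustrated with a small round trip. If the same encoding is used in both directions, the bytes pass through unchanged, and the name only changes when the encoder and decoder disagree. In the following minimal Python sketch, UTF-8 and cp936 stand in for the two sides. This is an assumption for illustration and does not describe dokanfuse internals.

```python
name = "好.txt"
raw = name.encode("utf-8")  # bytes as they might be passed around as const char*

# Consistent: both sides agree on UTF-8, so the name survives unchanged.
consistent = raw.decode("utf-8")

# Inconsistent: one side interprets the same bytes as cp936 instead.
mismatched = raw.decode("cp936", errors="replace")

print(consistent == name, mismatched == name)
```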

The Keybase filesystem seems to be written in Go with a small custom C++ Dokan bridge. It doesn't use the dokanfuse API, which is probably why it doesn't run into this bug.

I'll file a bug with the DokanY team.

@smessmer
Member Author

smessmer commented Jan 26, 2019

Actually, @cfbao, can you file it? They're asking for more info, like CPU architecture and OS version, and I couldn't reproduce it: https://github.com/dokan-dev/dokany/issues/new

@smessmer
Member Author

Never mind, I just reproduced it and it does not seem to be dokanfuse's fault. Will try to figure out what the issue is.

@smessmer
Member Author

This should be fixed in 13ad69b and is in the 0.10-rc3 release candidate.

@smessmer
Member Author

Can you check and confirm if this is fixed with 0.10-rc3?

@cfbao

cfbao commented Jan 27, 2019

I just tested with this build, and there don't seem to be any encoding problems now.
Thanks a lot!
