Folder Names with Umlaut are not Supported #79
Comments
I reproduced the issue with unzip on Linux and, after diving into the source code of unzip, I think I understand what's happening. JSZip sets the flag saying "the path is in utf8 !" but doesn't add any extra field, and unzip expects one. I will re-read the "APPENDIX D - Language Encoding (EFS)" of the zip specs tomorrow, and hopefully fix this bug :)
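For reference, the flag mentioned above is bit 11 of the general purpose bit flag (the "language encoding flag", EFS, in the zip specs). A minimal sketch of testing it, assuming the 16-bit flag value has already been read from the entry header; `hasUtf8Flag` is a hypothetical helper name, not something from JSZip:

```javascript
// Bit 11 (0x0800) of the general purpose bit flag: language encoding flag (EFS).
const UTF8_FLAG = 0x0800;

// Hypothetical helper: true when the entry declares its file name
// and comment as UTF-8.
function hasUtf8Flag(generalPurposeBitFlag) {
  return (generalPurposeBitFlag & UTF8_FLAG) !== 0;
}
```

As the comments in this thread show, a reader may still ignore this bit and guess the encoding some other way, so the check only tells you what the archive claims.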
Ok, great. I use Ark on KDE to open the archive.
I spent some time in the source code of unzip (v6.0) to understand what is really going on, and my first conclusion was (almost) wrong :-) This will be long and technical, but I need to share this madness with the world. Feel free to skip to the tl;dr.

On Linux... with unzip :
The comment block on this macro is really helpful :
JSZip generates zip files with the DOS flag, so unzip expects the IBM 437 code page. The UTF8 flag (

I suspect the other archive managers make the same guesses for the encoding, so adding the unicode path extra field is the easy way to be sure that the path is read correctly. If we still have issues with other managers, we will change the "version made by" field to UNIX, for example (but without any extra info, unzip will break the file permissions, so this needs some development/tests). Changing it to NTFS doesn't seem to be a good idea : Windows itself, winrar, winzip, etc. use DOS (more on that below) and the specification says "10 - Windows NTFS" while the unzip code says

On Windows

I also tested on Windows (Seven) with the default compressed folders feature. Windows generated a zip file as DOS using the IBM 437 code page. Of course, I was on an NTFS partition with unicode filenames. If I use characters that are not in this code page, I get a nice :
This post on superuser.com gives a well-explained answer (the links in this post are broken but you can find them on archive.org, here or here for example).

This also means that without a <locale code page used in Windows> to utf8 converter, JSZip won't correctly read zip files generated by the default Windows compressed folders feature if they contain non-ascii file names. And <locale code page used in Windows> seems to range from the IBM 437 code page to the Japanese Shift-JIS code page.

We could/should go the same way as winrar when generating a zip file : with non-ascii characters in the path, replace them with _ but set the correct path as the extra field. I will test it on several archive managers, but that would be a nice fallback on Windows : on non-compatible managers, "I ♥ you.txt" would be displayed as "I _ you.txt" instead of "I GÖÑ you.txt".

TL;DR

If you wish to use unicode in file names :
This patch sets the unicode path extra field. unzip needs at least one extra field to correctly handle unicode path, so using the path is as good as any other information. This could improve the situation with other archive managers too. This field is usually used without the utf8 flag, with a non unicode path in the header (winrar, winzip). This helps (a bit) with the messy Windows' default compressed folders feature but breaks on p7zip which doesn't seek the unicode path extra field. So for now, UTF-8 everywhere ! Fix Stuk#79.
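To make the field described above concrete, here is a rough sketch of building an Info-ZIP Unicode Path Extra Field (header ID 0x7075) in Node.js. This is not the code from the patch: `crc32` and `unicodePathExtraField` are hypothetical helpers written for illustration, and the layout (version byte, CRC-32 of the header file name, then the UTF-8 path) follows my reading of the zip specs.

```javascript
// CRC-32 (reflected polynomial 0xEDB88320), the checksum used by the zip format.
function crc32(buf) {
  let crc = 0xFFFFFFFF;
  for (const byte of buf) {
    crc ^= byte;
    for (let i = 0; i < 8; i++) {
      crc = (crc >>> 1) ^ (0xEDB88320 & -(crc & 1));
    }
  }
  return (crc ^ 0xFFFFFFFF) >>> 0;
}

// Hypothetical helper, not JSZip's implementation.
// headerName:  Buffer with the bytes stored in the header's file name field.
// unicodePath: the real path as a JavaScript string.
function unicodePathExtraField(headerName, unicodePath) {
  const utf8Path = Buffer.from(unicodePath, "utf8");
  const field = Buffer.alloc(9 + utf8Path.length);
  field.writeUInt16LE(0x7075, 0);              // header ID
  field.writeUInt16LE(5 + utf8Path.length, 2); // data size: version + CRC + path
  field.writeUInt8(1, 4);                      // version of this extra field
  field.writeUInt32LE(crc32(headerName), 5);   // CRC-32 of the header file name
  utf8Path.copy(field, 9);                     // UTF-8 path, no trailing NUL
  return field;
}
```

With the "UTF-8 everywhere" approach, the header name and the unicode path are the same bytes, so a call would look like `unicodePathExtraField(Buffer.from(path, "utf8"), path)`. The winrar/winzip style would instead pass the non-unicode header name as the first argument.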
Ok, great, MANY thanks, hoping for a release soon...
With this code you will end up with
ASCII control chars as path: