Folder Names with Umlaut are not Supported #79

Closed
vanthome opened this issue Jan 13, 2014 · 4 comments · Fixed by #82

Comments

@vanthome

With this code, the folder name ends up as ASCII control characters in the path:

var zip = new JSZip();
zip.file("Hello.txt", "Hello World\n");
var img = zip.folder("öäü");
img.file("smile.gif", imgData, {base64: true});
var content = zip.generate();
location.href="data:application/zip;base64,"+content;

@dduponchel
Collaborator

I reproduced the issue with unzip on Linux and, after diving into the source code of unzip, I think I understand what's happening. JSZip sets the flag saying "the path is in utf8 !" but doesn't add any extra field, and unzip expects one. I will re-read "APPENDIX D - Language Encoding (EFS)" of the zip specs tomorrow, and hopefully fix this bug :)
To be sure that I fix your bug: do you use unzip or another archive manager?

@vanthome
Author

Ok, great. I use Ark on KDE to open the Archive.

@dduponchel
Collaborator

I spent some time in the source code of unzip (v6.0) to understand what is really going on, and my first conclusion was (almost) wrong :-) This will be long and technical, but I need to share this madness with the world. Feel free to skip to the tl;dr.

On Linux

... with unzip :

  • in fileio.c:2243, in the "translate the Zip entry filename" part, there is a call to the macro Ext_ASCII_TO_Native
  • in unzpriv.h:3005, the macro uses the version/platform to convert the file name
  • in fileio.c:2306, if there is a Unicode path extra field, the file name is overwritten with its Unicode version

The comment block on this macro is really helpful :

/* Convert filename (and file comment string) into "internal" charset.
 * This macro assumes that Zip entry filenames are coded in OEM (IBM DOS)
 * codepage when made on
 *  -> DOS (this includes 16-bit Windows 3.1)  (FS_FAT_)
 *  -> OS/2                                    (FS_HPFS_)
 *  -> Win95/WinNT with Nico Mak's WinZip      (FS_NTFS_ && hostver == "5.0")
 * EXCEPTIONS:
 *  PKZIP for Windows 2.5, 2.6, and 4.0 flag their entries as "FS_FAT_", but
 *  the filename stored in the local header is coded in Windows ANSI (CP 1252
 *  resp. ISO 8859-1 on US and western Europe locale settings).
 *  Likewise, PKZIP for UNIX 2.51 flags its entries as "FS_FAT_", but the
 *  filenames stored in BOTH the local and the central header are coded
 *  in the local system's codepage (usually ANSI codings like ISO 8859-1).
 *
 * All other ports are assumed to code zip entry filenames in ISO 8859-1.
 */

JSZip generates zip files with the DOS flag, so unzip expects the IBM 437 code page. Without the unicode path extra field, the UTF8 flag (G.pInfo->GPFIsUTF8) doesn't seem to be used to generate the final file name.
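
To make the effect concrete, here is a minimal sketch of what happens when UTF-8 bytes are read as IBM 437. The helper names (`decodeAsCp437`, `misreadUtf8AsCp437`) and the partial lookup table are made up for this illustration; they are not part of JSZip or unzip, and what a given archive manager actually displays may differ.

```javascript
// Partial IBM 437 table: only the high bytes needed for this demo.
// A real converter would need all 128 entries for 0x80..0xFF.
const CP437_PARTIAL = {
  0x99: "Ö", 0xa4: "ñ", 0xa5: "Ñ",
  0xb6: "╢", 0xbc: "╝", 0xc3: "├", 0xe2: "Γ",
};

// Decode a byte sequence as if it were IBM 437 text (hypothetical helper).
function decodeAsCp437(bytes) {
  let out = "";
  for (const b of bytes) {
    if (b < 0x80) {
      out += String.fromCharCode(b); // ASCII range is identical
    } else if (b in CP437_PARTIAL) {
      out += CP437_PARTIAL[b];
    } else {
      out += "?"; // byte not covered by the partial table
    }
  }
  return out;
}

// Encode a name as UTF-8, then read those bytes back as IBM 437.
function misreadUtf8AsCp437(name) {
  return decodeAsCp437(new TextEncoder().encode(name));
}

// "öäü" written with escapes so the source file encoding does not matter.
console.log(misreadUtf8AsCp437("\u00f6\u00e4\u00fc")); // ├╢├ñ├╝
```

Each non-ASCII character becomes two or three unrelated glyphs because every UTF-8 byte is interpreted as a separate IBM 437 character.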

I suspect other archive managers make the same guesses about the encoding, so adding the unicode path extra field is the easy way to be sure that the path is read correctly.
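
For reference, a minimal sketch of how such an extra field can be built, following the layout of the Info-ZIP Unicode Path Extra Field (header ID 0x7075: version byte, CRC-32 of the name stored in the header, then the UTF-8 name). The function names `crc32` and `buildUnicodePathExtraField` are chosen for this example and are not taken from the JSZip source.

```javascript
// Standard CRC-32 lookup table (polynomial 0xEDB88320), as used by zip.
const CRC_TABLE = (() => {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) {
      c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
    }
    table[n] = c >>> 0;
  }
  return table;
})();

function crc32(bytes) {
  let crc = 0xffffffff;
  for (const b of bytes) {
    crc = CRC_TABLE[(crc ^ b) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// headerNameBytes: the file name bytes exactly as stored in the zip header.
// unicodeName: the real path as a JavaScript string.
function buildUnicodePathExtraField(headerNameBytes, unicodeName) {
  const utf8Name = new TextEncoder().encode(unicodeName);
  const dataSize = 1 + 4 + utf8Name.length; // version + CRC-32 + name
  const field = new Uint8Array(4 + dataSize);
  const view = new DataView(field.buffer);
  view.setUint16(0, 0x7075, true); // header ID, little-endian
  view.setUint16(2, dataSize, true); // size of the data that follows
  view.setUint8(4, 1); // version of this extra field
  view.setUint32(5, crc32(headerNameBytes), true); // CRC-32 of header name
  field.set(utf8Name, 9); // UTF-8 encoded path
  return field;
}

// Example: the header name is itself UTF-8, so both arguments match.
const folderName = "\u00f6\u00e4\u00fc/";
const extraField = buildUnicodePathExtraField(
  new TextEncoder().encode(folderName),
  folderName
);
```

The CRC-32 lets a reader detect that the header name was changed by a tool that did not update the extra field, in which case the Unicode name should be ignored.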

If we still have issues with other managers, we will change the "version made by" field to UNIX, for example (but without any extra info, unzip will break the file permissions, so this needs some development/tests). Changing it to NTFS doesn't seem to be a good idea: Windows itself, winrar, winzip, etc. use DOS (more on that below) and the specification says "10 - Windows NTFS" while the unzip code says #define FS_NTFS_ 11...
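
The two fields discussed here can be inspected directly in a central directory header. The sketch below assumes the standard zip layout (signature, then "version made by" at offset 4 and the general purpose bit flag at offset 8, with bit 11 as the UTF-8 flag); `describeCentralHeader` is a hypothetical helper and the sample bytes are illustrative, not taken from a real JSZip archive.

```javascript
// Host system codes as listed in the zip specification (subset).
// Note: the spec uses 10 for Windows NTFS, while unzip defines FS_NTFS_ as 11.
const HOST_SYSTEMS = { 0: "MS-DOS / FAT", 3: "UNIX", 10: "Windows NTFS" };
const UTF8_FLAG = 0x0800; // general purpose bit 11: name and comment are UTF-8

function describeCentralHeader(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (view.getUint32(0, true) !== 0x02014b50) {
    throw new Error("not a central directory file header");
  }
  const versionMadeBy = view.getUint16(4, true);
  const flags = view.getUint16(8, true);
  const host = versionMadeBy >> 8; // upper byte: host system
  return {
    host,
    hostName: HOST_SYSTEMS[host] || "other",
    specVersion: versionMadeBy & 0xff, // lower byte: zip spec version * 10
    utf8: (flags & UTF8_FLAG) !== 0,
  };
}

// First 10 bytes of a header made "as DOS" with the UTF-8 flag set.
const sample = new Uint8Array([
  0x50, 0x4b, 0x01, 0x02, // signature "PK\x01\x02"
  0x14, 0x00, // version made by: spec 2.0, host 0 (DOS)
  0x0a, 0x00, // version needed to extract
  0x00, 0x08, // general purpose bit flag: bit 11 set
]);
console.log(describeCentralHeader(sample));
```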

On Windows

I also tested on Windows 7 with the default compressed folders feature. Windows generated a zip file as DOS using the IBM 437 code page. Of course, I was on an NTFS partition with Unicode file names. If I use characters that are not in this code page, I get a nice:

'C:\♥.txt' cannot be compressed because it includes characters that cannot be used in a compressed folder, such as ♥. You should rename this file or directory.

This post on superuser.com gives a well-explained answer (the links in that post are broken, but archived copies can be found on archive.org).

This also means that without a <locale code page used in Windows> to utf8 converter, JSZip won't correctly read zip files generated by the default Windows compressed folders feature if they contain non-ascii file names. And <locale code page used in Windows> seems to range from the IBM 437 code page to the Japanese Shift-JIS code page.

We could/should go the same way as winrar when generating a zip file: when the path contains non-ascii characters, replace them with _ in the header but set the correct path in the extra field. I will test it on several archive managers, but that would be a nice fallback on Windows: on non-compatible managers, "I ♥ you.txt" would be displayed as "I _ you.txt" rather than "I ΓÖÑ you.txt".
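
A minimal sketch of that fallback, assuming a plain "replace everything outside ASCII" rule; `asciiFallbackName` is a name invented for this example, and winrar's exact replacement rules may differ.

```javascript
// Replace every non-ASCII code point with "_" for the name stored in the
// header. The "u" flag makes the regex work on whole code points, so a
// character outside the BMP yields a single underscore, not two.
function asciiFallbackName(path) {
  return path.replace(/[^\x00-\x7f]/gu, "_");
}

console.log(asciiFallbackName("I \u2665 you.txt")); // I _ you.txt
```

Two different Unicode names can collapse to the same fallback name, so a real implementation would probably need to check for collisions.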

TL;DR

If you wish to use unicode in file names:

  • On Windows, please install / force your users to install an external archive manager; the default one is a mess. I will test _ as a fallback for unicode characters for the worst-case scenario.
  • On linux, the fix is coming :)

dduponchel added a commit to dduponchel/jszip that referenced this issue Jan 19, 2014
This patch sets the unicode path extra field. unzip needs at least one
extra field to correctly handle unicode path, so using the path is as
good as any other information. This could improve the situation with
other archive managers too.

This field is usually used without the utf8 flag, with a non unicode
path in the header (winrar, winzip). This helps (a bit) with the messy
Windows' default compressed folders feature but breaks on p7zip which
doesn't seek the unicode path extra field.

So for now, UTF-8 everywhere !

Fix Stuk#79.
@vanthome
Author

Ok, great, MANY thanks, hoping for a release soon...
