Unpacking zip, encoding corrupts filenames #12

Closed
tkisme opened this issue Jan 4, 2017 · 24 comments

@tkisme

tkisme commented Jan 4, 2017

This happens to the names of files and folders inside zip archives (they are scrambled both when browsing and when unzipping), and to the content of some text files. Other files seem to fare better, even though their names are gibberish.

@tkisme
Author

tkisme commented Jan 4, 2017

@tkisme
Author

tkisme commented Jan 4, 2017

No offence, but Entropy handles this well.

@aonez aonez self-assigned this Jan 9, 2017
@aonez aonez added this to the Look at milestone Jan 9, 2017
@aonez
Owner

aonez commented Jan 9, 2017

thanks for the tip, will look at it

@aonez
Owner

aonez commented Jan 17, 2017

@tkisme can you share a file to test?

@maz-1

maz-1 commented Jan 23, 2017

Maybe use p7zip-natspec or unzip-iconv for encoding autodetection?
https://aur.archlinux.org/packages/p7zip-natspec/
https://aur.archlinux.org/packages/unzip-iconv

@aonez
Owner

aonez commented Jan 23, 2017

Just deleted one of your comments because it contained a file with adult images.

For the battle-net issue, here is the right issue, which already contains a test file: #2

For the zip encoding, here is an all-ages test file, found here.

Note that the bundled macOS compression utility also fails to extract it with the correct encoding.

@aonez
Owner

aonez commented Jan 23, 2017

Just as a test, I opened the Unicode test.zip file, which you can find inside the test file posted before. Entropy did not extract the files with Chinese-encoded names; it omitted them. Also, the adult file you posted before came out unusable with Entropy: no images and no text file when extracted.

@aonez
Owner

aonez commented Jan 23, 2017

Here is another test file. This one is encoded in Greek DOS (code page 737). It was found here.
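
For readers following along, here is a minimal Python sketch of what goes wrong with such an archive. It assumes the reader fell back to CP437 for a name whose bytes were really written in code page 737; the function name `redecode_zip_name` is made up for illustration and is not part of Keka or p7zip.

```python
def redecode_zip_name(name: str, codepage: str) -> str:
    """Undo a CP437 fallback decoding and decode the raw bytes with `codepage`.

    Assumption: `name` was produced by decoding the stored bytes as CP437,
    which is what Python's zipfile does when the UTF-8 flag is not set.
    """
    raw = name.encode("cp437")   # recover the bytes stored in the archive
    return raw.decode(codepage)  # decode them with the code page actually used


if __name__ == "__main__":
    original = "αβγ.txt"
    # Simulate a Greek DOS name being read with the wrong code page.
    garbled = original.encode("cp737").decode("cp437")
    print(garbled, "->", redecode_zip_name(garbled, "cp737"))
```

Python 3.11 and later also accept a `metadata_encoding` argument when opening a `zipfile.ZipFile` for reading, which avoids the round trip.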

@aonez
Owner

aonez commented Jan 23, 2017

Some useful info here: https://sourceforge.net/p/p7zip/discussion/383044/thread/3d213124

No quick fix though.

@tkisme
Author

tkisme commented Jan 24, 2017

Maybe adding iconv is enough? Can't wait to test.

@tkisme
Author

tkisme commented Jan 24, 2017

[Screenshot from The Unarchiver]
Maybe show a window to select the encoding?

@maz-1

maz-1 commented Jan 25, 2017

An official fix could take some time, but as I pasted above, there are already third-party patches that add encoding autodetection support to unzip and p7zip :-)
Anyway, the default 'Archive Utility.app' seems to handle zip files in different encodings well, so you can use the built-in app for zip files.

@tkisme
Author

tkisme commented Jan 26, 2017

I think Keka is meant as a replacement for the default 'Archive Utility.app'. Using different applications to handle different archives is just annoying.
So I think this is a necessary function.

@maz-1

maz-1 commented Jan 28, 2017

I don't get the point. Keka behaves similarly to the built-in archive utility (double-click and decompress); I do not see anything annoying.
Anyway, I tried to build the latest p7zip with the natspec patch and replaced Keka.app/Contents/Resources/keka7z, but still no luck. The patch needs to be fixed for p7zip 16.02.

@aonez
Owner

aonez commented Jan 29, 2017

@maz-1, which natspec library did you use to compile p7zip? Can you share those files?

@maz-1

maz-1 commented Jan 29, 2017

Here is the natspec patch for p7zip 15.14.1
https://aur.archlinux.org/cgit/aur.git/tree/natspec.patch?h=p7zip-natspec
Line 31 doesn't apply on p7zip 16.02; it needs a fix.
Here is the natspec library
https://sourceforge.net/projects/natspec/

@tkisme
Whatever. Why not stop arguing and start finding a solution to the problem?

@aonez
Owner

aonez commented Jan 29, 2017

@maz-1 I've already downloaded both the patch and the library files, and fixed the patch. But I'm unable to link p7zip to the natspec library. How did you do it?

In case you need it, here is the patch: natspec_p7zip1602.patch.zip

@maz-1

maz-1 commented Jan 30, 2017

I made a homebrew formula to build natspec:
https://gist.github.com/maz-1/2c797fa2ccf5aa815017676bbb884a73

Downloading natspec.rb and running "brew install natspec.rb" should install the library.
Then patch p7zip with this (modified a little to link to natspec):
https://gist.github.com/maz-1/f8e0bce8516e11337a758314c07ca423

But the encoding detection is still broken (15.14.1 has the same problem):

/Users/ling/Downloads/p7zip_16.02/bin/7z  x /Users/ling/Downloads/test/Unicode\ test.zip

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=utf8,Utf16=on,HugeFiles=on,64 bits,8 CPUs x64)

Scanning the drive for archives:
1 file, 8458 bytes (9 KiB)

Extracting archive: /Users/ling/Downloads/test/Unicode test.zip
Broken encoding: '(null)' (to) or 'CP936' (from) or UCS2. May be you forget setlocale in main or gconv-modules is missed?
: Invalid argument
…
Broken encoding: '(null)' (to) or 'CP936' (from) or UCS2. May be you forget setlocale in main or gconv-modules is missed?
: Invalid argument
--
Path = /Users/ling/Downloads/test/Unicode test.zip
Type = zip
Physical Size = 8458

  0%Broken encoding: '(null)' (to) or 'CP936' (from) or UCS2. May be you forget setlocale in main or gconv-modules is missed?
: Invalid argument
…
Broken encoding: '(null)' (to) or 'CP936' (from) or UCS2. May be you forget setlocale in main or gconv-modules is missed?
: No such file or directory
Everything is Ok

Folders: 7
Files: 30
Size:       132
Compressed: 8458

@aonez aonez added the zip label Feb 6, 2017
@aonez
Owner

aonez commented Mar 7, 2017

Another file that gets broken, Russian content shared by Колобок via mail: IIS.zip

@aonez aonez changed the title from "Unpacking zip chinese file name corrupt" to "Unpacking zip, encoding corrupts filenames" Mar 7, 2017
@InfinityMe

IIS.zip was probably created by the built-in Windows archiver. I just checked: I unpacked and re-packed the files, then unpacked them again, and the encoding went bad.
WinRAR on Windows, on the other hand, handles such an archive correctly.
[Two screenshots, dated 2017-03-09]
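
One way to tell whether an archive is likely to hit this problem is to check the UTF-8 flag (general purpose bit 11, value 0x800) on each entry: if it is not set, the name's encoding is not declared and the reader has to guess. The Python sketch below lists such entries; the helper names are hypothetical, and whether a given archiver sets the flag is something to verify per file rather than assume.

```python
import io
import zipfile
from typing import List

UTF8_FLAG = 0x800  # general purpose bit 11: entry name is stored as UTF-8


def names_without_utf8_flag(zip_bytes: bytes) -> List[str]:
    """Return names of entries whose encoding is not declared as UTF-8."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if not info.flag_bits & UTF8_FLAG
        ]


def build_sample() -> bytes:
    """Build a small in-memory archive with one ASCII and one non-ASCII name."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("plain.txt", "a")  # ASCII name: Python leaves the flag unset
        zf.writestr("файл.txt", "b")   # non-ASCII name: Python sets the flag
    return buffer.getvalue()


if __name__ == "__main__":
    print(names_without_utf8_flag(build_sample()))
```

Entries reported by this check with non-ASCII bytes in their names are the ones that need a code page guess or a manual choice.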

@aonez
Owner

aonez commented Mar 20, 2018

Should check https://github.com/ethereon/p7zip-hybrid encoding changes

@aonez aonez removed this from the Look at milestone Mar 22, 2018
@aonez aonez added this to the 1.1.1 milestone Mar 22, 2018
@aonez aonez modified the milestones: 1.1.1, 1.1.2 Jun 13, 2018
@aonez aonez modified the milestones: 1.1.2, 1.1.3 Jun 22, 2018
@aonez aonez modified the milestones: 1.1.3, 1.1.4 Aug 9, 2018
@aonez aonez modified the milestones: 1.1.4, 1.2.0 Sep 11, 2018
@aonez aonez modified the milestones: 1.2.0, 1.1.6 Nov 7, 2018
@aonez
Owner

aonez commented Nov 26, 2018

This one should be fixed in 1.1.6!!!

@aonez aonez closed this as completed Nov 26, 2018
@c0494133d4

I am researching whether I could replace 'The Unarchiver' with this great app.

The only concern seems to be zip file name encoding. IMHO, automatic encoding detection does not work 100% of the time, so manual selection should be provided as a last resort.

No offence intended, but is it implemented?
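
To illustrate the "detect first, let the user override" idea in general terms, here is a rough Python sketch. It is not how Keka or The Unarchiver implement it; the function name, the candidate list and its order are all assumptions chosen for the example.

```python
from typing import Optional, Sequence, Tuple


def decode_name(
    raw: bytes,
    manual: Optional[str] = None,
    candidates: Sequence[str] = ("utf-8", "cp936", "cp737", "cp866"),
) -> Tuple[str, str]:
    """Decode a raw zip entry name and report which encoding was used.

    If `manual` is given (the user's choice), it always wins. Otherwise the
    first candidate that decodes without error is taken, which can pick the
    wrong code page, hence the need for the manual override.
    """
    if manual is not None:
        return raw.decode(manual, errors="replace"), manual
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Last resort: CP437, the traditional default for zip entry names.
    return raw.decode("cp437"), "cp437"


if __name__ == "__main__":
    chinese = "中文".encode("cp936")
    russian = "тест".encode("cp866")
    print(decode_name(chinese))                  # automatic guess
    print(decode_name(russian))                  # automatic guess, may be wrong
    print(decode_name(russian, manual="cp866"))  # user-selected encoding
```

Because several legacy code pages accept almost any byte sequence, the first match is only a guess; a selection window lets the user correct it.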

@aonez
Owner

aonez commented Dec 18, 2018

It was implemented in 1.1.6, @ffffwh. It is inspired by the way The Unarchiver does it, so it will feel very (maybe too) familiar. You can test the feature with the example file found in the comment #12 (comment).

aonez pushed a commit that referenced this issue Jan 17, 2019
aonez pushed a commit that referenced this issue Oct 14, 2019
aonez pushed a commit that referenced this issue Oct 14, 2019
aonez pushed a commit that referenced this issue Dec 21, 2020
aonez pushed a commit that referenced this issue Mar 30, 2021
5 participants