Unpacking zip, encoding corrupts filenames #12

Closed
tkisme opened this Issue Jan 4, 2017 · 22 comments

Comments

4 participants
@tkisme

tkisme commented Jan 4, 2017

It's happening to the names of files and folders within zip archives (they are scrambled when viewing or unzipping them), and to the content of some text files. Other files seem to fare better even though their names are gibberish.
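
For illustration, this kind of scrambling typically appears when file-name bytes written in one code page are decoded with another. The sketch below assumes a name stored as CP936 (GBK) and read back as CP437, the ZIP format's legacy default. The sample name is hypothetical, and this is not Keka's or p7zip's actual code.

```python
# Hypothetical file name, stored the way a Chinese-locale archiver might
# write it: as CP936 (GBK) bytes with no marker saying which code page it is.
original = "测试.txt"
raw = original.encode("cp936")

# An extractor that assumes CP437 decodes the same bytes into gibberish.
garbled = raw.decode("cp437")

# CP437 maps every byte to a character, so the wrong decoding is lossless
# and can be undone once the real code page is known.
recovered = garbled.encode("cp437").decode("cp936")

print(repr(garbled))
print(recovered)
```

The round trip only works because nothing was lost in the wrong decode; the hard part in practice is knowing which code page the archive really used.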

tkisme commented Jan 4, 2017

tkisme commented Jan 4, 2017

No offence, but Entropy handles it well.

@aonez aonez added bug p7zip labels Jan 9, 2017

@aonez aonez self-assigned this Jan 9, 2017

@aonez aonez added this to the Look at milestone Jan 9, 2017

Owner

aonez commented Jan 9, 2017

thanks for the tip, will look at it

Owner

aonez commented Jan 17, 2017

@tkisme can you share a file to test?

maz-1 commented Jan 23, 2017

Maybe use p7zip-natspec or unzip-iconv for encoding autodetection?
https://aur.archlinux.org/packages/p7zip-natspec/
https://aur.archlinux.org/packages/unzip-iconv

Owner

aonez commented Jan 23, 2017

Just deleted one of your comments because it contained a file with adult images.

For the battle-net issue, here is the right issue that already contains a test file: #2

For the zip encoding, here is an all-ages test file, found here.

Just note that the bundled macOS compression utility also fails to extract it with the correct encoding.

Owner

aonez commented Jan 23, 2017

Just as a test, I opened the Unicode test.zip file, which you can find inside the test file posted earlier, with Entropy. It did not extract the files with Chinese-encoded names; it omitted them. The adult file you posted earlier was also unusable with Entropy: no images or text file after extraction.

Owner

aonez commented Jan 23, 2017

Here is another test file. This one is encoded in Greek DOS (code page 737). It was found here.

Owner

aonez commented Jan 23, 2017

Some useful info here: https://sourceforge.net/p/p7zip/discussion/383044/thread/3d213124

No quick fix though.

tkisme commented Jan 24, 2017

Maybe adding iconv is enough? Can't wait to test.

tkisme commented Jan 24, 2017

(Screenshot from The Unarchiver.)
Maybe show a window to select the encoding?
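
As a rough sketch of what such a picker could offer, the snippet below decodes the same raw file-name bytes with several candidate encodings and lists the results for the user to choose from. The function name, the sample name, and the candidate list are all hypothetical; this is not how The Unarchiver or Keka actually implements it.

```python
def candidate_names(raw, encodings):
    """Decode raw file-name bytes with each encoding; None marks a failed decode."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


# Hypothetical name as a Greek DOS (code page 737) archiver might store it.
raw_name = "ΕΛΛΗΝΙΚΑ.txt".encode("cp737")

choices = candidate_names(raw_name, ["utf-8", "cp437", "cp737"])
for enc, name in choices.items():
    print(enc, "->", name)
```

Encodings that fail to decode can be hidden from the list, but several single-byte code pages will usually decode without error, which is why a human choice (or a statistical detector) is still needed.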

maz-1 commented Jan 25, 2017

An official fix could take some time, but as I pasted above, there are already third-party patches that add encoding autodetection to unzip and p7zip :-)
Anyway, the default 'Archive Utility.app' seems to handle zip files in different encodings well, so you can use the built-in app for zip files.

tkisme commented Jan 26, 2017

I think Keka is meant to replace the default 'Archive Utility.app'. Using different applications to handle different archives is just annoying.
So I think this is a necessary feature.

maz-1 commented Jan 28, 2017

I don't get the point. Keka behaves much like the built-in Archive Utility (double-click and decompress), so I don't see anything annoying.
Anyway, I tried building the latest p7zip with the natspec patch and replaced Keka.app/Contents/Resources/keka7z with it; still no luck. The patch needs to be fixed for p7zip 16.02.

Owner

aonez commented Jan 29, 2017

@maz-1, which natspec library did you use to compile p7zip? Can you share those files?

maz-1 commented Jan 29, 2017

Here is the natspec patch for p7zip 15.14.1:
https://aur.archlinux.org/cgit/aur.git/tree/natspec.patch?h=p7zip-natspec
Line 31 doesn't apply on p7zip 16.02 and needs a fix.
Here is the natspec library:
https://sourceforge.net/projects/natspec/

@tkisme
Whatever. Why not stop arguing and start finding a solution to the problem?

Owner

aonez commented Jan 29, 2017

@maz-1 I've already downloaded both the patch and the library files, and fixed the patch, but I'm unable to link p7zip to the natspec library. How did you do it?

In case you need it, here is the patch: natspec_p7zip1602.patch.zip

maz-1 commented Jan 30, 2017

I made a homebrew formula to build natspec:
https://gist.github.com/maz-1/2c797fa2ccf5aa815017676bbb884a73

Downloading natspec.rb and running "brew install natspec.rb" should install the library.
Then patch p7zip with this (modified slightly to link against natspec):
https://gist.github.com/maz-1/f8e0bce8516e11337a758314c07ca423

But the encoding detection is still broken (15.14.1 has the same problem):

/Users/ling/Downloads/p7zip_16.02/bin/7z  x /Users/ling/Downloads/test/Unicode\ test.zip

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=utf8,Utf16=on,HugeFiles=on,64 bits,8 CPUs x64)

Scanning the drive for archives:
1 file, 8458 bytes (9 KiB)

Extracting archive: /Users/ling/Downloads/test/Unicode test.zip
Broken encoding: '(null)' (to) or 'CP936' (from) or UCS2. May be you forget setlocale in main or gconv-modules is missed?
: Invalid argument
…
Broken encoding: '(null)' (to) or 'CP936' (from) or UCS2. May be you forget setlocale in main or gconv-modules is missed?
: Invalid argument
--
Path = /Users/ling/Downloads/test/Unicode test.zip
Type = zip
Physical Size = 8458

  0%Broken encoding: '(null)' (to) or 'CP936' (from) or UCS2. May be you forget setlocale in main or gconv-modules is missed?
: Invalid argument
…
Broken encoding: '(null)' (to) or 'CP936' (from) or UCS2. May be you forget setlocale in main or gconv-modules is missed?
: No such file or directory
Everything is Ok

Folders: 7
Files: 30
Size:       132
Compressed: 8458

@aonez aonez added the zip label Feb 6, 2017

Owner

aonez commented Mar 7, 2017

Another file that gets broken, Russian content shared by Колобок via mail: IIS.zip

@aonez aonez changed the title from Unpacking zip chinese file name corrupt to Unpacking zip, encoding corrupts filenames Mar 7, 2017

InfinityMe commented Mar 9, 2017

IIS.zip was probably created by the built-in Windows archiver. I just checked: I unpacked the files, re-packed them, and unpacked again, and the encoding went bad.
WinRAR on Windows, meanwhile, handles such an archive correctly.
2017-03-09 22 49 09
2017-03-09 22 48 29
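
One way to check how an archive like this labels its names is to look at general-purpose bit 11 of each entry, which the ZIP format uses to mark a name as UTF-8. When the bit is clear, the name is in some legacy code page that the extractor has to guess. The sketch below uses Python's standard zipfile module on a small in-memory archive; the helper name and sample entries are hypothetical, and nothing here is confirmed about how IIS.zip itself was written.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: file name is UTF-8


def describe_names(zip_bytes):
    """Return (name, declared_utf8) for every entry in a zip given as bytes."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return [(info.filename, bool(info.flag_bits & UTF8_FLAG))
                for info in zf.infolist()]


# Build a tiny archive in memory. Python's zipfile sets the UTF-8 flag
# only for names that are not plain ASCII.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("plain.txt", "a")
    zf.writestr("Привет.txt", "b")

report = describe_names(buf.getvalue())
for name, is_utf8 in report:
    print(name, "UTF-8 flag set" if is_utf8 else "no UTF-8 flag")
```

An archive whose non-ASCII names come back garbled with the flag clear is a candidate for the code-page guessing discussed earlier in the thread.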

Owner

aonez commented Mar 20, 2018

Should check the encoding changes in https://github.com/ethereon/p7zip-hybrid

@aonez aonez removed this from the Look at milestone Mar 22, 2018

@aonez aonez added this to the 1.1.1 milestone Mar 22, 2018

@aonez aonez modified the milestones: 1.1.1, 1.1.2 Jun 13, 2018

@aonez aonez modified the milestones: 1.1.2, 1.1.3 Jun 22, 2018

@aonez aonez modified the milestones: 1.1.3, 1.1.4 Aug 9, 2018

@aonez aonez modified the milestones: 1.1.4, 1.2.0 Sep 11, 2018

@aonez aonez modified the milestones: 1.2.0, 1.1.6 Nov 7, 2018

Owner

aonez commented Nov 26, 2018

This one should be fixed in 1.1.6!!!

@aonez aonez closed this Nov 26, 2018
