
ZIP archives containing files with non-ascii names are displayed wrongly #102

Open
Yanpas opened this issue Oct 7, 2015 · 17 comments

@Yanpas

Yanpas commented Oct 7, 2015

A patch has already been released for file-roller that replaces p7zip with unzip; see https://bugs.launchpad.net/ubuntu/+source/p7zip/+bug/1382106

Test file from GNOME Bugzilla: https://bugs.launchpad.net/ubuntu/+source/unzip/+bug/580961/+attachment/1803463/+files/%D7%90%D7%A7%D7%95%D7%9C%D7%95%D7%92%D7%99%D7%94%20%D7%9C%D7%9E%D7%94%D7%A0%D7%93%D7%A1%D7%99%D7%9D.zip

GNOME Bugzilla: https://bugzilla.gnome.org/show_bug.cgi?id=306403

@Yanpas
Author

Yanpas commented Oct 7, 2015

See also #5

@sc0w
Member

sc0w commented Jun 9, 2016

p7zip 15.14 doesn't fix this issue

@gapan

gapan commented Jun 9, 2016

Right. I tried the file posted by Yanpas and it still doesn't work with p7zip 15.14.

@monsta
Contributor

monsta commented Feb 15, 2017

Debian Testing has p7zip 16.02 now - did it get better?

@monsta monsta changed the title ZIP archieves containing files with non-ascii names are displayed wrongly ZIP archives containing files with non-ascii names are displayed wrongly Feb 15, 2017
@jinguojie-loongson

No. I tried Fedora 25 with p7zip 16.02 installed, and the error still exists.

@monsta
Contributor

monsta commented Feb 15, 2017

That sucks. Looks like we should look at #5...

@nanxiongchao

nanxiongchao commented Feb 25, 2017

I am fixing p7zip by adding the necessary codepage conversion.
I hope to eliminate this bug in subsequent p7zip versions.

@sc0w
Member

sc0w commented May 17, 2017

This works in a terminal with the file from this report:

env LANG=C 7z l filename.zip | iconv -f CP737 -t UTF-8

אקולוגיה למהנדסים.zip

@vkareh
Member

vkareh commented Jul 9, 2019

@sc0w, with your command on your file, I see this:

$ env LANG=C 7z l אקולוגיה\ למהנדסים.zip | iconv -f CP737 -t UTF-8

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,8 CPUs Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz (906E9),ASM,AES-NI)

Scanning the drive for archives:
1 file, 640957 bytes (626 KiB)

Listing archive: ╫Ρ╫π╫Χ╫ε╫Χ╫Τ╫β╫Φ ╫ε╫η╫Φ╫ι╫Υ╫κ╫β╫ζ.zip

--
Path = ╫Ρ╫π╫Χ╫ε╫Χ╫Τ╫β╫Φ ╫ε╫η╫Φ╫ι╫Υ╫κ╫β╫ζ.zip
Type = zip
Physical Size = 640957

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2010-12-27 16:09:38 D....            0            0  ΑΩΖΝΖΓΚΕ ΝΟΕΡΔΣΚΞ
2010-12-27 16:09:36 ....A       259072       105777  ΑΩΖΝΖΓΚΕ ΝΟΕΡΔΣΚΞ/ΟΖβΓΚΞ ΝΟΒΘΠ ΒΑΩΖΝΖΓΚΕ.doc
2010-12-27 16:09:36 ....A        33792         6613  ΑΩΖΝΖΓΚΕ ΝΟΕΡΔΣΚΞ/ΡΣΦΘ ΝΦαΩ 26.doc
2010-12-27 16:09:36 ....A        32256         5144  ΑΩΖΝΖΓΚΕ ΝΟΕΡΔΣΚΞ/ΣΚΜΖΞ ΦαΩ 20.doc
2010-12-27 16:09:36 ....A       750592       442911  ΑΩΖΝΖΓΚΕ ΝΟΕΡΔΣΚΞ/ΦαΩ 13.doc
2010-12-27 16:09:36 ....A        48128        10545  ΑΩΖΝΖΓΚΕ ΝΟΕΡΔΣΚΞ/ΦαΩ 23.doc
2010-12-27 16:09:36 ....A        59392        16380  ΑΩΖΝΖΓΚΕ ΝΟΕΡΔΣΚΞ/ΦαΩ 26.doc
2010-12-27 16:09:36 ....A        39424         8353  ΑΩΖΝΖΓΚΕ ΝΟΕΡΔΣΚΞ/ΦαΩ 3.doc
2010-12-27 16:09:36 ....A       135168        43980  ΑΩΖΝΖΓΚΕ ΝΟΕΡΔΣΚΞ/ΦαΩΚΞ 20-22.doc
------------------- ----- ------------ ------------  ------------------------
2010-12-27 16:09:38            1357824       639703  8 files, 1 folders

But using unzip gives me a different character set:

$ unzip -l אקולוגיה\ למהנדסים.zip 
Archive:  אקולוגיה למהנדסים.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2010-12-27 16:09   АЧЕМЕВЙД МОДРГСЙН/
   259072  2010-12-27 16:09   АЧЕМЕВЙД МОДРГСЙН/ОЕЩВЙН МОБЗП БАЧЕМЕВЙД.doc
    33792  2010-12-27 16:09   АЧЕМЕВЙД МОДРГСЙН/РСФЗ МФШЧ 26.doc
    32256  2010-12-27 16:09   АЧЕМЕВЙД МОДРГСЙН/СЙЛЕН ФШЧ 20.doc
   750592  2010-12-27 16:09   АЧЕМЕВЙД МОДРГСЙН/ФШЧ 13.doc
    48128  2010-12-27 16:09   АЧЕМЕВЙД МОДРГСЙН/ФШЧ 23.doc
    59392  2010-12-27 16:09   АЧЕМЕВЙД МОДРГСЙН/ФШЧ 26.doc
    39424  2010-12-27 16:09   АЧЕМЕВЙД МОДРГСЙН/ФШЧ 3.doc
   135168  2010-12-27 16:09   АЧЕМЕВЙД МОДРГСЙН/ФШЧЙН 20-22.doc
---------                     -------
  1357824                     9 files

Using unar, I get this:

$ unar אקולוגיה\ למהנדסים.zip 
אקולוגיה למהנדסים.zip: Zip
  %80%97%85%8c%85%82%89%84 %8c%8e%84%90%83%91%89%8d//  (dir)... OK.
  %80%97%85%8c%85%82%89%84 %8c%8e%84%90%83%91%89%8d/%8e%85%99%82%89%8d %8c%8e%81%87%8f %81%80%97%85%8c%85%82%89%84.doc  (259072 B)... OK.
  %80%97%85%8c%85%82%89%84 %8c%8e%84%90%83%91%89%8d/%90%91%94%87 %8c%94%98%97 26.doc  (33792 B)... OK.
  %80%97%85%8c%85%82%89%84 %8c%8e%84%90%83%91%89%8d/%91%89%8b%85%8d %94%98%97 20.doc  (32256 B)... OK.
  %80%97%85%8c%85%82%89%84 %8c%8e%84%90%83%91%89%8d/%94%98%97 13.doc  (750592 B)... OK.
  %80%97%85%8c%85%82%89%84 %8c%8e%84%90%83%91%89%8d/%94%98%97 23.doc  (48128 B)... OK.
  %80%97%85%8c%85%82%89%84 %8c%8e%84%90%83%91%89%8d/%94%98%97 26.doc  (59392 B)... OK.
  %80%97%85%8c%85%82%89%84 %8c%8e%84%90%83%91%89%8d/%94%98%97 3.doc  (39424 B)... OK.
  %80%97%85%8c%85%82%89%84 %8c%8e%84%90%83%91%89%8d/%94%98%97%89%8d 20-22.doc  (135168 B)... OK.
Successfully extracted to "אקולוגיה למהנדסים".

None of those are actually the same character set as the filename. The filename looks like Hebrew characters to me, but the contents look like Greek (7z) and Cyrillic (unzip) characters. I don't know either of those scripts, so I cannot comment further, but it looks like there's no solution for this yet...

@Yanpas can you verify whether the file is encoded with any of these character sets?
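For what it's worth, all three listings look like the same raw name bytes read through different DOS codepages. Here is a small Python sketch (not Engrampa code) that takes the bytes of the top-level folder name from the %XX escapes unar printed above, assuming those escapes are the raw bytes stored in the archive, and decodes them as CP737 (DOS Greek), CP866 (DOS Cyrillic) and CP862 (DOS Hebrew) with Python's built-in codecs. `decode_all` is just an illustrative helper name.

```python
from urllib.parse import unquote_to_bytes

# Raw bytes of the top-level folder name, copied from the %XX escapes that
# unar/lsar printed (assumption: those escapes are the stored name bytes).
RAW = unquote_to_bytes("%80%97%85%8c%85%82%89%84 %8c%8e%84%90%83%91%89%8d")

# DOS ("OEM") codepages that the different tools appear to have picked.
CODEPAGES = {"cp737": "DOS Greek", "cp866": "DOS Cyrillic", "cp862": "DOS Hebrew"}


def decode_all(raw: bytes) -> dict:
    """Decode the same bytes under each candidate codepage."""
    return {cp: raw.decode(cp) for cp in CODEPAGES}


if __name__ == "__main__":
    for cp, text in decode_all(RAW).items():
        print(f"{cp} ({CODEPAGES[cp]}): {text}")
```

When run, the CP737 and CP866 lines should match the Greek (7z) and Cyrillic (unzip) folder names above, and only CP862 gives readable Hebrew.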

@Yanpas
Author

Yanpas commented Jul 9, 2019

Yep, same on Windows with 7-Zip (chcp 866).

@sc0w
Member

sc0w commented Jul 9, 2019

@vkareh unzip doesn't work here:

$ unzip -l default.zip 
Archive:  default.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2010-12-27 16:09   �������� ��������/
   259072  2010-12-27 16:09   �������� ��������/������ ����� ���������.doc
    33792  2010-12-27 16:09   �������� ��������/���� ���� 26.doc
    32256  2010-12-27 16:09   �������� ��������/����� ��� 20.doc
   750592  2010-12-27 16:09   �������� ��������/��� 13.doc
    48128  2010-12-27 16:09   �������� ��������/��� 23.doc
    59392  2010-12-27 16:09   �������� ��������/��� 26.doc
    39424  2010-12-27 16:09   �������� ��������/��� 3.doc
   135168  2010-12-27 16:09   �������� ��������/����� 20-22.doc
---------                     -------
  1357824                     9 files

@vkareh
Member

vkareh commented Jul 9, 2019

@sc0w, what's your echo $LANG?

If I set it to a UTF-8 variant (LANG=en_US.UTF-8 unzip -l default.zip) then it works...

@sc0w
Member

sc0w commented Jul 9, 2019

It works with:

$ env LANG=C 7z l default.zip | iconv -f CP866 -t UTF-8

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,2 CPUs Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz (1067A),ASM)

Scanning the drive for archives:
1 file, 640957 bytes (626 KiB)

Listing archive: default.zip

--
Path = default.zip
Type = zip
Physical Size = 640957

   Date      Time    Attr         Size   Compressed  Name
------------------- ----- ------------ ------------  ------------------------
2010-12-27 16:09:38 D....            0            0  АЧЕМЕВЙД МОДРГСЙН
2010-12-27 16:09:36 ....A       259072       105777  АЧЕМЕВЙД МОДРГСЙН/ОЕЩВЙН МОБЗП БАЧЕМЕВЙД.doc
2010-12-27 16:09:36 ....A        33792         6613  АЧЕМЕВЙД МОДРГСЙН/РСФЗ МФШЧ 26.doc
2010-12-27 16:09:36 ....A        32256         5144  АЧЕМЕВЙД МОДРГСЙН/СЙЛЕН ФШЧ 20.doc
2010-12-27 16:09:36 ....A       750592       442911  АЧЕМЕВЙД МОДРГСЙН/ФШЧ 13.doc
2010-12-27 16:09:36 ....A        48128        10545  АЧЕМЕВЙД МОДРГСЙН/ФШЧ 23.doc
2010-12-27 16:09:36 ....A        59392        16380  АЧЕМЕВЙД МОДРГСЙН/ФШЧ 26.doc
2010-12-27 16:09:36 ....A        39424         8353  АЧЕМЕВЙД МОДРГСЙН/ФШЧ 3.doc
2010-12-27 16:09:36 ....A       135168        43980  АЧЕМЕВЙД МОДРГСЙН/ФШЧЙН 20-22.doc
------------------- ----- ------------ ------------  ------------------------
2010-12-27 16:09:38            1357824       639703  8 files, 1 folders

It doesn't work with unar (lsar is the unar tool for listing the files in an archive):

$ lsar default.zip 
default.zip: Zip
%80%97%85%8c%85%82%89%84 %8c%8e%84%90%83%91%89%8d//
%80%97%85%8c%85%82%89%84 %8c%8e%84%90%83%91%89%8d/%8e%85%99%82%89%8d %8c%8e%81%87%8f %81%80%97%85%8c%85%82%89%84.doc
%80%97%85%8c%85%82%89%84 %8c%8e%84%90%83%91%89%8d/%90%91%94%87 %8c%94%98%97 26.doc
%80%97%85%8c%85%82%89%84 %8c%8e%84%90%83%91%89%8d/%91%89%8b%85%8d %94%98%97 20.doc
%80%97%85%8c%85%82%89%84 %8c%8e%84%90%83%91%89%8d/%94%98%97 13.doc
%80%97%85%8c%85%82%89%84 %8c%8e%84%90%83%91%89%8d/%94%98%97 23.doc
%80%97%85%8c%85%82%89%84 %8c%8e%84%90%83%91%89%8d/%94%98%97 26.doc
%80%97%85%8c%85%82%89%84 %8c%8e%84%90%83%91%89%8d/%94%98%97 3.doc
%80%97%85%8c%85%82%89%84 %8c%8e%84%90%83%91%89%8d/%94%98%97%89%8d 20-22.do

@sc0w
Member

sc0w commented Jul 9, 2019

> @sc0w, what's your echo $LANG?
>
> If I set it to a UTF-8 variant (LANG=en_US.UTF-8 unzip -l default.zip) then it works...

$ echo $LANG
C.UTF-8
$ env LANG=en_US.UTF-8 unzip -l default.zip
Archive:  default.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2010-12-27 16:09   �������� ��������/
   259072  2010-12-27 16:09   �������� ��������/������ ����� ���������.doc
    33792  2010-12-27 16:09   �������� ��������/���� ���� 26.doc
    32256  2010-12-27 16:09   �������� ��������/����� ��� 20.doc
   750592  2010-12-27 16:09   �������� ��������/��� 13.doc
    48128  2010-12-27 16:09   �������� ��������/��� 23.doc
    59392  2010-12-27 16:09   �������� ��������/��� 26.doc
    39424  2010-12-27 16:09   �������� ��������/��� 3.doc
   135168  2010-12-27 16:09   �������� ��������/����� 20-22.doc
---------                     -------
  1357824                     9 files

@vkareh
Member

vkareh commented Jul 9, 2019

LOL, this is going nowhere >_<

Okay, finally found it:

$ LANG=C unzip -l 'אקולוגיה למהנדסים.zip' | iconv -f CP862 -t UTF-8
Archive:  ╫נ╫º╫ץ╫£╫ץ╫ע╫ש╫פ ╫£╫₧╫פ╫á╫ף╫í╫ש╫¥.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2010-12-27 16:09   אקולוגיה למהנדסים/
   259072  2010-12-27 16:09   אקולוגיה למהנדסים/מושגים למבחן באקולוגיה.doc
    33792  2010-12-27 16:09   אקולוגיה למהנדסים/נספח לפרק 26.doc
    32256  2010-12-27 16:09   אקולוגיה למהנדסים/סיכום פרק 20.doc
   750592  2010-12-27 16:09   אקולוגיה למהנדסים/פרק 13.doc
    48128  2010-12-27 16:09   אקולוגיה למהנדסים/פרק 23.doc
    59392  2010-12-27 16:09   אקולוגיה למהנדסים/פרק 26.doc
    39424  2010-12-27 16:09   אקולוגיה למהנדסים/פרק 3.doc
   135168  2010-12-27 16:09   אקולוגיה למהנדסים/פרקים 20-22.doc
---------                     -------
  1357824                     9 files

According to 306403#c25 on GNOME Bugzilla, the original file names are in Hebrew. The "Archive" line looks wrong, but the file names are correct now. Hebrew is read right to left, so each entry shows the folder name and file name (read right to left), then any appended number, followed by .doc (read left to right, since it is in the Latin alphabet).
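The broken "Archive" line is probably the reverse case: the archive's own file name is already UTF-8 on disk, and with LANG=C it is printed byte for byte, so iconv then reads UTF-8 bytes as CP862. A minimal Python check of that idea (`misread` is a made-up helper name):

```python
# The archive's own file name, as it is stored on disk (UTF-8).
ARCHIVE_NAME = "אקולוגיה למהנדסים.zip"


def misread(name: str, codepage: str) -> str:
    """Return the text produced by reading the UTF-8 bytes of `name` as `codepage`."""
    return name.encode("utf-8").decode(codepage)


if __name__ == "__main__":
    print(misread(ARCHIVE_NAME, "cp862"))  # compare with the "Archive:" line of the CP862 run
    print(misread(ARCHIVE_NAME, "cp737"))  # compare with "Listing archive:" in the CP737 run
```

Every Hebrew letter is the two-byte UTF-8 sequence D7 xx, and byte 0xD7 is ╫ in both CP862 and CP737, which would explain the repeating ╫ in those lines.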

The remaining problem is: how do we tell Engrampa to use a specific encoding?
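One possible approach, sketched in Python rather than in Engrampa's own code: honour general purpose bit 11 (UTF-8 names) when the archiver set it, and otherwise decode the raw name bytes with a codepage the user picks, much like unzip -O. Python's zipfile decodes names without bit 11 as CP437, and CP437 maps every byte, so re-encoding with cp437 recovers the original bytes. `entry_names` and `make_legacy_zip` are hypothetical helpers; the latter only builds a test archive with a raw CP862 name.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: entry name is stored as UTF-8


def entry_names(zf: zipfile.ZipFile, codepage: str) -> list:
    """List entry names, decoding legacy (non-UTF-8) names with `codepage`."""
    names = []
    for info in zf.infolist():
        if info.flag_bits & UTF8_FLAG:
            # The archiver declared UTF-8, so zipfile already decoded it correctly.
            names.append(info.filename)
        else:
            # zipfile decoded the raw bytes as CP437; undo that and
            # decode again with the codepage the user asked for.
            raw = info.filename.encode("cp437")
            names.append(raw.decode(codepage, errors="replace"))
    return names


def make_legacy_zip(raw_name: bytes, data: bytes = b"demo") -> bytes:
    """Hypothetical test helper: a one-entry ZIP whose name is `raw_name`
    stored verbatim with bit 11 clear, as old DOS/Windows archivers did."""
    placeholder = b"~" * len(raw_name)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr(placeholder.decode("ascii"), data)
    blob = buf.getvalue()
    # The name occurs once in the local header and once in the central directory.
    if blob.count(placeholder) != 2:
        raise ValueError("placeholder also occurs outside the name fields")
    return blob.replace(placeholder, raw_name)


if __name__ == "__main__":
    blob = make_legacy_zip("פרק 13.doc".encode("cp862"))
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        print(entry_names(zf, "cp866"))  # what a CP866 guess shows
        print(entry_names(zf, "cp862"))  # the intended Hebrew name
```

On Python 3.11 and later, zipfile.ZipFile also accepts a metadata_encoding argument that performs this decoding directly. Which codepage to offer by default is a separate design question this sketch does not answer.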

@MikePryadko

I have the same problem: I need to be able to set the encoding for an archive, like this:

$ unzip -l -O CP866 ГИС\ ЖКХ_Интеграция_v.13.1.1.5\ \(текущие\).zip 
Archive:  ГИС ЖКХ_Интеграция_v.13.1.1.5 (текущие).zip
  Length      Date    Time    Name
---------  ---------- -----   ----
     8760  2020-04-03 11:10   ГИС ЖКХ_Интеграция_v.13.1.1.5 (текущие)/CA-PPAK_2019.pem
     1834  2020-04-03 11:10   ГИС ЖКХ_Интеграция_v.13.1.1.5 (текущие)/CA-SIT_2019.pem
        0  2020-04-03 11:10   ГИС ЖКХ_Интеграция_v.13.1.1.5 (текущие)/hcs_wsdl_xsd_v.13.1.1.5/
        0  2020-04-03 11:10   ГИС ЖКХ_Интеграция_v.13.1.1.5 (текущие)/hcs_wsdl_xsd_v.13.1.1.5/appeals/
     5883  2020-04-03 11:10   ГИС ЖКХ_Интеграция_v.13.1.1.5 (текущие)/hcs_wsdl_xsd_v.13.1.1.5/appeals/hcs-appeals-service-async.wsdl
{-=text trimmed=-}
    37381  2020-04-03 11:10   ГИС ЖКХ_Интеграция_v.13.1.1.5 (текущие)/Шаблон заявки на подключение к информационному взаимодействию с СИТ ГИС ЖКХ кредитных организаций.docx
    37350  2020-04-03 11:10   ГИС ЖКХ_Интеграция_v.13.1.1.5 (текущие)/Шаблон заявки на подключение к информационному взаимодействию с СИТ ГИС ЖКХ РГИС,МИС.docx
    36245  2020-04-03 11:10   ГИС ЖКХ_Интеграция_v.13.1.1.5 (текущие)/Шаблон заявки на подключение к информационному взаимодействию с СИТ ГИС ЖКХ собственных ИС.docx
    36906  2020-04-03 11:10   ГИС ЖКХ_Интеграция_v.13.1.1.5 (текущие)/Шаблон уведомления об окончании тестовых испытаний.docx
---------                     -------
 30225772                     108 files

But with Engrampa I only get some "default" behaviour:

@smesgr

smesgr commented May 27, 2021

Ümläute ßind töll.rtf.zip
Files with Unicode names in a ZIP packed by Mac OS X are not displayed or unpacked properly. I assume the general purpose bit flag (bit 11) described in Appendix D (https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT) isn't used to determine whether UTF-8 (Unicode) encoding is enabled for the ZIP. This works fine with unzip and zipinfo.

The file name should be:
$ zipinfo Ümläute\ ßind\ töll.rtf.zip
Archive: Ümläute ßind töll.rtf.zip
Zip file size: 1058 bytes, number of entries: 2
-rw-r--r-- 2.0 unx 403 bX defN 21-May-27 10:15 Ümläute ßind töll.rtf
-rw-r--r-- 2.0 unx 523 bX defN 21-May-27 10:15 __MACOSX/._Ümläute ßind töll.rtf
2 files, 926 bytes uncompressed, 590 bytes compressed: 36.3%

Tested with Engrampa 1.24.0

See the attached ZIP file.
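To see whether an archive actually sets that flag, one can read the general purpose bit flag straight from the first local file header; the header layout and bit 11 (the language encoding flag) are described in APPNOTE.TXT. A rough Python sketch: it only looks at the first entry, ignores ZIP64 and encryption, and `first_entry_name` is an illustrative name. The thread doesn't show whether the attached macOS archive sets bit 11, so this is a way to check rather than a diagnosis.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a ZIP local file header, little-endian:
# signature, version needed, flags, method, mod time, mod date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")
EFS_BIT = 1 << 11  # general purpose bit 11: names/comments are UTF-8


def first_entry_name(blob: bytes):
    """Return (utf8_flag_set, raw_name_bytes) for the first entry in `blob`."""
    fields = LOCAL_HEADER.unpack_from(blob, 0)
    signature, flags, name_len = fields[0], fields[2], fields[9]
    if signature != b"PK\x03\x04":
        raise ValueError("blob does not start with a ZIP local file header")
    start = LOCAL_HEADER.size
    return bool(flags & EFS_BIT), blob[start:start + name_len]


if __name__ == "__main__":
    # Demo archive: Python's zipfile sets bit 11 itself for non-ASCII names.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("Ümläute ßind töll.rtf", b"demo")
    utf8, raw = first_entry_name(buf.getvalue())
    print(utf8, raw.decode("utf-8" if utf8 else "cp437"))
```

Pointing it at the bytes of a real archive (for example, open(path, "rb").read()) shows whether the archiver declared UTF-8 for the first entry.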
