Ensure unicode characters in zip-compressed filenames work correctly #4702

dagwieers · 2016-09-04T21:07:59Z

ISSUE TYPE

Bugfix Pull Request

COMPONENT NAME

unarchive

SUMMARY

This fixes:

The correct encoding of unicode paths internally (so the filenames we scrape from the output and is returned by zipfile match)
Disable LANG=C for the unzip command (because it breaks the unicode output, unlike on gtar)

Another corner-case we are fixing hoping it doesn't break anything else.
This fixes #4409, #4478 and another unreported issue.

@abadger @jimi-c If this turns out to work fine, this fix needs to go into v2.1.2 and v2.2 !

Another corner-case we are fixing hoping it doesn't break anything else. This fixes: - The correct encoding of unicode paths internally (so the filenames we scrape from the output and is returned by zipfile match) - Disable LANG=C for the unzip command (because it breaks the unicode output, unlike on gtar)

gregdek · 2016-09-04T21:25:07Z

Thanks again to @dagwieers. This module is going into community review.

We notice that you are one of the maintainers (@dagwieers, @pileofrogs) of this module, so when you think it's ready, please comment "shipit" and we will consider it for merging.

[This message brought to you by your friendly Ansibull-bot.]

dagwieers · 2016-09-04T21:25:47Z

shipit

gregdek · 2016-09-05T03:25:07Z

Thanks @dagwieers. Since you are one of the maintainers (@dagwieers, @pileofrogs) of this module and you commented with "shipit", we are marking this PR for inclusion.

[This message brought to you by your friendly Ansibull-bot.]

abadger · 2016-09-06T16:54:30Z

files/unarchive.py

@@ -347,7 +347,7 @@ def is_unarchived(self):
            version = pcs[1]
            ostype = pcs[2]
            size = int(pcs[3])
-            path = pcs[7]
+            path = unicode(pcs[7], 'utf-8')


unicode is not available on python3. This also won't handle non-utf8 filenames but I'm not sure we care about those for now. Do you remember if any of the old bugs were wrt non-utf8 filenames inside of the tar or zip?

proposed code to deal with the python3 issue:

from ansible.module_utils._text import to_text [...] path = to_text(pcs[7], errors='surrogate_or_strict')

Don't we test all code against python3 ? I thought if it was approved by Travis/Shippable, things were fine ?

For most modules there's two sets of tests wrt python3 and modules -- there's a syntax test which attempts to compile the source into bytecode using python3 and integration tests. The compile test won't pick this up because unicode() could be a user defined function. The integration tests only exercise the code paths needed for the integration test to pass and also are only running on the modules which already pass (to help catch regressions). Since unarchive doesn't pass yet, the integration tests aren't being run on it.

Here's the blacklist of integration tags: https://github.com/ansible/ansible/blob/devel/test/utils/shippable/python3-test-tag-blacklist.txt

abadger · 2016-09-06T16:59:15Z

Overall strategy looks okay. I made some comments on the implementation.

abadger · 2016-09-06T16:59:22Z

needs_revision

gregdek · 2016-09-06T21:25:12Z

Thanks @dagwieers for this PR. This PR requires revisions, either because it fails to build or by reviewer request. Please make the suggested revisions. When you are done, please comment with text 'ready_for_review' and we will put this PR back into review.

[This message brought to you by your friendly Ansibull-bot.]

dagwieers · 2016-09-06T21:54:02Z

@abadger ready_for_review

gregdek · 2016-09-07T03:25:11Z

Thanks @dagwieers. To the current maintainers, @dagwieers, @pileofrogs please review according to guidelines (http://docs.ansible.com/ansible/developing_modules.html#module-checklist) and comment with text 'shipit', 'needs_revision' or 'close_me' as appropriate.

[This message brought to you by your friendly Ansibull-bot.]

abadger · 2016-09-09T16:32:32Z

Merged to devel.... There's some conflicts when trying to merge this to stable-2.1. I think the environment setting can be merged pretty straightforwardly. The change to make pcs[7] a text string is harder as to_text isn't available in 2.1. try: unicode(pcs[7], 'utf-8') except UnicodeError: module.fail_json([...]) might be the best thing to do if we want to backpotr that.

dagwieers · 2016-09-09T16:38:37Z

Wouldn't backporting to_text not be easier, especially if it doesn't exist, it won't break anything either ?
Maybe a simplified version of to_text ?

Or leaving it out altogether, especially if v2.2 is going to be out in the coming weeks ?

abadger · 2016-09-09T16:46:01Z

Yeah, I can merge just the environment setting if that's sufficient.

@abadger

Can't port directly because some helper functions are not available in 2.1 commit 1cda0b1 Author: Dag Wieers <dag@wieers.com> Date: Fri Sep 9 18:26:19 2016 +0200 Ensure unicode characters in zip-compressed filenames work correctly (#4702) Another corner-case we are fixing hoping it doesn't break anything else. This fixes: - The correct encoding of unicode paths internally (so the filenames we scrape from the output and is returned by zipfile match) - Disable LANG=C for the unzip command (because it breaks the unicode output, unlike on gtar) * Fix for python3 and other suggestions from @abadger

abadger · 2016-09-09T17:00:57Z

Cherrypicked to 2.1 here: aec6f4d
In the end I was a little wary that not changing pcs[7] to a text type would cause problems. Hopefully using the raw unicode() constructor there will work in the majority of cases. Will work for ascii, should work for utf-8, and not sure if zip files allow non-utf8 in the archived filenames.

dagwieers · 2016-09-09T21:44:48Z

@abadger Thanks ! Let's hope the number of reported issues wrt. unarchive stays low now :-)

dagwieers mentioned this pull request Sep 4, 2016

unarchive module error--cloud not support Chinese #4478

Closed

gregdek added the owner_pr label Sep 4, 2016

gregdek added community_review bugfix_pull_request labels Sep 4, 2016

dagwieers mentioned this pull request Sep 4, 2016

Unarchive fails verifying non-ascii named files in zip archives #4409

Closed

gregdek added shipit and removed community_review labels Sep 5, 2016

abadger reviewed Sep 6, 2016
View reviewed changes

gregdek added needs_revision and removed shipit labels Sep 6, 2016

Fix for python3 and other suggestions from @abadger

3ba7ad7

gregdek added community_review and removed needs_revision labels Sep 7, 2016

abadger merged commit 1cda0b1 into ansible:devel Sep 9, 2016

alikins added the P2 label Sep 9, 2016

dagwieers deleted the patch-7 branch September 9, 2016 20:54

sivel mentioned this pull request Nov 6, 2020

Unarchive fails for remote tar archives when using relative paths ansible/ansible#72517

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Ensure unicode characters in zip-compressed filenames work correctly #4702

Ensure unicode characters in zip-compressed filenames work correctly #4702

dagwieers commented Sep 4, 2016 •

edited

Loading

gregdek commented Sep 4, 2016

dagwieers commented Sep 4, 2016

gregdek commented Sep 5, 2016

abadger Sep 6, 2016

dagwieers Sep 6, 2016

abadger Sep 9, 2016

abadger commented Sep 6, 2016

abadger commented Sep 6, 2016

gregdek commented Sep 6, 2016

dagwieers commented Sep 6, 2016

gregdek commented Sep 7, 2016

abadger commented Sep 9, 2016

dagwieers commented Sep 9, 2016

abadger commented Sep 9, 2016

abadger commented Sep 9, 2016

dagwieers commented Sep 9, 2016

Ensure unicode characters in zip-compressed filenames work correctly #4702

Ensure unicode characters in zip-compressed filenames work correctly #4702

Conversation

dagwieers commented Sep 4, 2016 • edited Loading

ISSUE TYPE

COMPONENT NAME

SUMMARY

gregdek commented Sep 4, 2016

dagwieers commented Sep 4, 2016

gregdek commented Sep 5, 2016

abadger Sep 6, 2016

Choose a reason for hiding this comment

dagwieers Sep 6, 2016

Choose a reason for hiding this comment

abadger Sep 9, 2016

Choose a reason for hiding this comment

abadger commented Sep 6, 2016

abadger commented Sep 6, 2016

gregdek commented Sep 6, 2016

dagwieers commented Sep 6, 2016

gregdek commented Sep 7, 2016

abadger commented Sep 9, 2016

dagwieers commented Sep 9, 2016

abadger commented Sep 9, 2016

abadger commented Sep 9, 2016

dagwieers commented Sep 9, 2016

dagwieers commented Sep 4, 2016 •

edited

Loading