-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Ensure unicode characters in zip-compressed filenames work correctly #4702
Conversation
Another corner-case we are fixing hoping it doesn't break anything else. This fixes: - The correct encoding of unicode paths internally (so the filenames we scrape from the output and is returned by zipfile match) - Disable LANG=C for the unzip command (because it breaks the unicode output, unlike on gtar)
Thanks again to @dagwieers. This module is going into community review. We notice that you are one of the maintainers (@dagwieers, @pileofrogs) of this module, so when you think it's ready, please comment "shipit" and we will consider it for merging. [This message brought to you by your friendly Ansibull-bot.] |
shipit |
Thanks @dagwieers. Since you are one of the maintainers (@dagwieers, @pileofrogs) of this module and you commented with "shipit", we are marking this PR for inclusion. [This message brought to you by your friendly Ansibull-bot.] |
@@ -347,7 +347,7 @@ def is_unarchived(self): | |||
version = pcs[1] | |||
ostype = pcs[2] | |||
size = int(pcs[3]) | |||
path = pcs[7] | |||
path = unicode(pcs[7], 'utf-8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unicode is not available on python3. This also won't handle non-utf8 filenames but I'm not sure we care about those for now. Do you remember if any of the old bugs were wrt non-utf8 filenames inside of the tar or zip?
proposed code to deal with the python3 issue:
from ansible.module_utils._text import to_text
[...]
path = to_text(pcs[7], errors='surrogate_or_strict')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we test all code against python3 ? I thought if it was approved by Travis/Shippable, things were fine ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For most modules there's two sets of tests wrt python3 and modules -- there's a syntax test which attempts to compile the source into bytecode using python3 and integration tests. The compile test won't pick this up because unicode()
could be a user defined function. The integration tests only exercise the code paths needed for the integration test to pass and also are only running on the modules which already pass (to help catch regressions). Since unarchive doesn't pass yet, the integration tests aren't being run on it.
Here's the blacklist of integration tags: https://github.com/ansible/ansible/blob/devel/test/utils/shippable/python3-test-tag-blacklist.txt
Overall strategy looks okay. I made some comments on the implementation. |
needs_revision |
Thanks @dagwieers for this PR. This PR requires revisions, either because it fails to build or by reviewer request. Please make the suggested revisions. When you are done, please comment with text 'ready_for_review' and we will put this PR back into review. [This message brought to you by your friendly Ansibull-bot.] |
@abadger ready_for_review |
Thanks @dagwieers. To the current maintainers, @dagwieers, @pileofrogs please review according to guidelines (http://docs.ansible.com/ansible/developing_modules.html#module-checklist) and comment with text 'shipit', 'needs_revision' or 'close_me' as appropriate. [This message brought to you by your friendly Ansibull-bot.] |
Merged to devel.... There's some conflicts when trying to merge this to stable-2.1. I think the environment setting can be merged pretty straightforwardly. The change to make pcs[7] a text string is harder as to_text isn't available in 2.1. try: unicode(pcs[7], 'utf-8') except UnicodeError: module.fail_json([...]) might be the best thing to do if we want to backpotr that. |
Wouldn't backporting to_text not be easier, especially if it doesn't exist, it won't break anything either ? Or leaving it out altogether, especially if v2.2 is going to be out in the coming weeks ? |
Yeah, I can merge just the environment setting if that's sufficient. |
Can't port directly because some helper functions are not available in 2.1 commit 1cda0b1 Author: Dag Wieers <dag@wieers.com> Date: Fri Sep 9 18:26:19 2016 +0200 Ensure unicode characters in zip-compressed filenames work correctly (#4702) Another corner-case we are fixing hoping it doesn't break anything else. This fixes: - The correct encoding of unicode paths internally (so the filenames we scrape from the output and is returned by zipfile match) - Disable LANG=C for the unzip command (because it breaks the unicode output, unlike on gtar) * Fix for python3 and other suggestions from @abadger
Cherrypicked to 2.1 here: aec6f4d |
@abadger Thanks ! Let's hope the number of reported issues wrt. unarchive stays low now :-) |
ISSUE TYPE
COMPONENT NAME
unarchive
SUMMARY
This fixes:
Another corner-case we are fixing hoping it doesn't break anything else.
This fixes #4409, #4478 and another unreported issue.
@abadger @jimi-c If this turns out to work fine, this fix needs to go into v2.1.2 and v2.2 !