Skip to content
This repository has been archived by the owner on Oct 30, 2018. It is now read-only.

Ensure unicode characters in zip-compressed filenames work correctly #4702

Merged
merged 2 commits into from
Sep 9, 2016

Conversation

dagwieers
Copy link
Contributor

@dagwieers dagwieers commented Sep 4, 2016

ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME

unarchive

SUMMARY

This fixes:

  • The correct encoding of unicode paths internally (so the filenames we scrape from the output and is returned by zipfile match)
  • Disable LANG=C for the unzip command (because it breaks the unicode output, unlike on gtar)

Another corner-case we are fixing hoping it doesn't break anything else.
This fixes #4409, #4478 and another unreported issue.

@abadger @jimi-c If this turns out to work fine, this fix needs to go into v2.1.2 and v2.2 !

Another corner-case we are fixing hoping it doesn't break anything else.

This fixes:
- The correct encoding of unicode paths internally (so the filenames we scrape from the output and is returned by zipfile match)
- Disable LANG=C for the unzip command (because it breaks the unicode output, unlike on gtar)
@gregdek
Copy link
Contributor

gregdek commented Sep 4, 2016

Thanks again to @dagwieers. This module is going into community review.

We notice that you are one of the maintainers (@dagwieers, @pileofrogs) of this module, so when you think it's ready, please comment "shipit" and we will consider it for merging.

[This message brought to you by your friendly Ansibull-bot.]

@dagwieers
Copy link
Contributor Author

shipit

@gregdek
Copy link
Contributor

gregdek commented Sep 5, 2016

Thanks @dagwieers. Since you are one of the maintainers (@dagwieers, @pileofrogs) of this module and you commented with "shipit", we are marking this PR for inclusion.

[This message brought to you by your friendly Ansibull-bot.]

@@ -347,7 +347,7 @@ def is_unarchived(self):
version = pcs[1]
ostype = pcs[2]
size = int(pcs[3])
path = pcs[7]
path = unicode(pcs[7], 'utf-8')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unicode is not available on python3. This also won't handle non-utf8 filenames but I'm not sure we care about those for now. Do you remember if any of the old bugs were wrt non-utf8 filenames inside of the tar or zip?

proposed code to deal with the python3 issue:

from ansible.module_utils._text import to_text
[...]
path = to_text(pcs[7], errors='surrogate_or_strict')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we test all code against python3 ? I thought if it was approved by Travis/Shippable, things were fine ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For most modules there's two sets of tests wrt python3 and modules -- there's a syntax test which attempts to compile the source into bytecode using python3 and integration tests. The compile test won't pick this up because unicode() could be a user defined function. The integration tests only exercise the code paths needed for the integration test to pass and also are only running on the modules which already pass (to help catch regressions). Since unarchive doesn't pass yet, the integration tests aren't being run on it.

Here's the blacklist of integration tags: https://github.com/ansible/ansible/blob/devel/test/utils/shippable/python3-test-tag-blacklist.txt

@abadger
Copy link
Contributor

abadger commented Sep 6, 2016

Overall strategy looks okay. I made some comments on the implementation.

@abadger
Copy link
Contributor

abadger commented Sep 6, 2016

needs_revision

@gregdek
Copy link
Contributor

gregdek commented Sep 6, 2016

Thanks @dagwieers for this PR. This PR requires revisions, either because it fails to build or by reviewer request. Please make the suggested revisions. When you are done, please comment with text 'ready_for_review' and we will put this PR back into review.

[This message brought to you by your friendly Ansibull-bot.]

@dagwieers
Copy link
Contributor Author

@abadger ready_for_review

@gregdek
Copy link
Contributor

gregdek commented Sep 7, 2016

Thanks @dagwieers. To the current maintainers, @dagwieers, @pileofrogs please review according to guidelines (http://docs.ansible.com/ansible/developing_modules.html#module-checklist) and comment with text 'shipit', 'needs_revision' or 'close_me' as appropriate.

[This message brought to you by your friendly Ansibull-bot.]

@abadger abadger merged commit 1cda0b1 into ansible:devel Sep 9, 2016
@abadger
Copy link
Contributor

abadger commented Sep 9, 2016

Merged to devel.... There's some conflicts when trying to merge this to stable-2.1. I think the environment setting can be merged pretty straightforwardly. The change to make pcs[7] a text string is harder as to_text isn't available in 2.1. try: unicode(pcs[7], 'utf-8') except UnicodeError: module.fail_json([...]) might be the best thing to do if we want to backpotr that.

@alikins alikins added the P2 label Sep 9, 2016
@dagwieers
Copy link
Contributor Author

Wouldn't backporting to_text not be easier, especially if it doesn't exist, it won't break anything either ?
Maybe a simplified version of to_text ?

Or leaving it out altogether, especially if v2.2 is going to be out in the coming weeks ?

@abadger
Copy link
Contributor

abadger commented Sep 9, 2016

Yeah, I can merge just the environment setting if that's sufficient.

abadger added a commit that referenced this pull request Sep 9, 2016
Can't port directly because some helper functions are not available in 2.1

commit 1cda0b1
Author: Dag Wieers <dag@wieers.com>
Date:   Fri Sep 9 18:26:19 2016 +0200

    Ensure unicode characters in zip-compressed filenames work correctly (#4702)
    Another corner-case we are fixing hoping it doesn't break anything else.
    This fixes:
    - The correct encoding of unicode paths internally (so the filenames we scrape from the output and is returned by zipfile match)
    - Disable LANG=C for the unzip command (because it breaks the unicode output, unlike on gtar)
    * Fix for python3 and other suggestions from @abadger
@abadger
Copy link
Contributor

abadger commented Sep 9, 2016

Cherrypicked to 2.1 here: aec6f4d
In the end I was a little wary that not changing pcs[7] to a text type would cause problems. Hopefully using the raw unicode() constructor there will work in the majority of cases. Will work for ascii, should work for utf-8, and not sure if zip files allow non-utf8 in the archived filenames.

@dagwieers dagwieers deleted the patch-7 branch September 9, 2016 20:54
@dagwieers
Copy link
Contributor Author

@abadger Thanks ! Let's hope the number of reported issues wrt. unarchive stays low now :-)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unarchive fails verifying non-ascii named files in zip archives
4 participants