This manages compressed files or archives of many compressed files. You can maintain or update .gz, .bz2 compressed files, .zip archives, or tarballs compressed with gzip or bzip2. Possible use cases:
* Back up user home directories
* Ensure large text files are always compressed
* Archive trees for distribution
* Don't include the archive in the archive if it falls within an archived path
* If remove=True and the archive would be in an archived path, fail.
* Fix single-file zip file compression
* Add more documentation about 'state' return
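The two archive-location rules in the commit above can be sketched roughly as follows. This is an illustration, not the module's code: `is_within`, `check_dest` and `collect_files` are hypothetical helper names, and paths are compared after `os.path.abspath` only, so symlinks are not resolved.

```python
import os


def is_within(path, root):
    """Return True if path is root itself or lies somewhere beneath it."""
    path = os.path.abspath(path)
    root = os.path.abspath(root)
    # Compare whole components: '/srv/one-two' is not inside '/srv/one'.
    return path == root or path.startswith(root.rstrip(os.sep) + os.sep)


def check_dest(dest, sources, remove):
    """Return the sources that contain dest; refuse if they would be removed."""
    containing = [s for s in sources if is_within(dest, s)]
    if containing and remove:
        raise ValueError("%s lies inside an archived path and remove=True" % dest)
    return containing


def collect_files(sources, dest):
    """List every file to archive, leaving out the archive itself."""
    dest = os.path.abspath(dest)
    files = []
    for source in sources:
        if os.path.isdir(source):
            for dirpath, _dirnames, filenames in os.walk(source):
                files.extend(os.path.abspath(os.path.join(dirpath, n)) for n in filenames)
        else:
            files.append(os.path.abspath(source))
    # Never add the archive being written to its own contents.
    return [f for f in files if f != dest]
```

Checking the destination before writing anything lets the module fail early with remove=True instead of deleting the tree that holds the archive it just produced.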
Use correct variable in expanduser
Thanks @bendoh for this new module. When this module receives 'shipit' comments from two community members and any 'needs_revision' comments have been resolved, we will mark for inclusion. [This message brought to you by your friendly Ansibull-bot.]
  compression:
    description:
      - The type of compression to use. Can be 'gz', 'bz2', or 'zip'.
    choices: [ 'gz', 'bz2', 'zip' ]
The default is gz, according to the arg spec.
Thanks! Got it.
This conflates compression with archive format. Probably do not want to do that. Maybe:
choices: ['tar.gz', 'tar.bz2', 'zip']
Looking into this further, I also realized that this module attempts to guess whether the user wants to archive or compress. We don't want to do that as the user may really require a single file tarball rather than a gz compressed file. I'll discuss that in the issue rather than as a line comment.
Bikeshedding: the parameter would more accurately be named "format" or "archive_type" rather than "compression".
Also may want to support 'tar' (uncompressed tarfiles) here.
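To make the suggested split between archive formats and plain compressors concrete, here is a rough sketch of how explicit format names could map onto the standard library. The names ('tar', 'tar.gz', 'tar.bz2', 'zip', 'gz', 'bz2') follow the review suggestions and are not a settled parameter design; `write_output` is a hypothetical helper that only handles a single source file. Note that 'tar.gz' of one file and 'gz' of the same file are different outputs: a one-member tarball versus a bare compressed stream.

```python
import bz2
import gzip
import shutil
import tarfile
import zipfile

# Hypothetical format names mapped to tarfile write modes.
TAR_MODES = {"tar": "w", "tar.gz": "w:gz", "tar.bz2": "w:bz2"}
# Plain single-stream compressors (no archive container).
COMPRESSORS = {"gz": gzip.open, "bz2": bz2.open}


def write_output(fmt, src, dest, arcname=None):
    """Write one source file to dest in the requested format."""
    if fmt in TAR_MODES:
        with tarfile.open(dest, TAR_MODES[fmt]) as tar:
            tar.add(src, arcname=arcname)
    elif fmt == "zip":
        with zipfile.ZipFile(dest, "w", zipfile.ZIP_DEFLATED) as zf:
            zf.write(src, arcname=arcname)
    elif fmt in COMPRESSORS:
        # arcname is irrelevant here: a compressed stream has no member names.
        with open(src, "rb") as f_in, COMPRESSORS[fmt](dest, "wb") as f_out:
            shutil.copyfileobj(f_in, f_out)
    else:
        raise ValueError("unsupported format: %s" % fmt)
```

With explicit names like these, the user states whether a container is wanted, so the module never has to guess between "compress" and "archive".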
I was just testing the module. It seems good, but I am from Germany, and if I run it on a directory in path
Can you provide an example? Thanks for testing! Very helpful.
* rename archive -> arcfile (where it's a file descriptor)
* additional return
* simplify logic around 'archive?' flag
* maintain os separator after arcroot
* use function instead of lambda for filter, ensure file exists before filecmp'ing it
* track errored files and fail if there are any
…ules-extras into bendoh-archive-module * 'bendoh-archive-module' of github.com:bendoh/ansible-modules-extras: Add 'default' to docs for 'compression' option
ec856c8 to ef620c7
…rent This fixed a few bugs and simplified the code
I think 'creates' should work like it does in the 'shell' module.
Sure, no problem.
The debug output:
The resulting file contains only 64 KB.
I would also prefer using dest instead of creates.
        expanded_paths = expanded_paths + glob.glob(path)
    else:
        expanded_paths.append(path)
Note: There's a difference between these two code paths. Simply appending the path does not check whether the path exists, whereas glob.glob() will return an empty list if no files or directories match the path.
Worth a comment!
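One way to make the two code paths behave consistently is to check literal paths for existence and report patterns that matched nothing. This is a sketch, not the module's code: `expand_paths` is a hypothetical name, and treating any path containing *, ? or [ as a pattern is an assumption.

```python
import glob
import os
import re

# Assumption: a path is a pattern if it contains *, ? or [.
GLOB_CHARS = re.compile(r"[*?\[]")


def expand_paths(paths):
    """Expand patterns, verify literal paths, and collect what is missing."""
    expanded, missing = [], []
    for path in paths:
        path = os.path.expanduser(path)
        if GLOB_CHARS.search(path):
            matches = glob.glob(path)
            if matches:
                expanded.extend(matches)
            else:
                # A pattern with no matches is reported, not silently dropped.
                missing.append(path)
        elif os.path.exists(path):
            expanded.append(path)
        else:
            missing.append(path)
    return expanded, missing
```

Returning the missing entries separately lets the caller decide whether to fail or to report an incomplete result.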
@bendoh A few review comments came in after I merged this. Could you please raise a fresh PR to fix those issues? Thanks for this new module. We like the pure Python implementation.
# If we actually matched multiple files or TRIED to, then
# treat this as a multi-file archive
archive = globby or os.path.isdir(expanded_paths[0]) or len(expanded_paths) > 1
globby is always False here: it is set to an initial value at the top of the main() function but never changed. Perhaps it just needs to be removed altogether?
It should get set to True when a glob character is detected, so that even if only one file matches the glob, it will generate an archive instead of a simple compressed file. I'll update that in my future PR...
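A sketch of the intended decision, with globby derived from the requested paths rather than left at its initial value. `wants_archive` is a hypothetical helper and the glob-character test is an assumption; it also avoids indexing expanded_paths[0] when nothing matched, which the original expression would do.

```python
import os
import re

# Assumption: a path is a pattern if it contains *, ? or [.
GLOB_CHARS = re.compile(r"[*?\[]")


def wants_archive(requested_paths, expanded_paths):
    """Decide between a multi-file archive and a single compressed file."""
    # A pattern means "possibly many files", even if only one matched.
    globby = any(GLOB_CHARS.search(p) for p in requested_paths)
    if globby or len(expanded_paths) > 1:
        return True
    # A single directory must be archived; a single file is just compressed.
    return len(expanded_paths) == 1 and os.path.isdir(expanded_paths[0])
```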
Thanks for the feedback, I'll open a new PR with fixes soon.
if i < len(arcroot):
    arcroot = os.path.dirname(arcroot[0:i+1])

arcroot += os.sep
xrange is not Python 3 compatible. Also, I'm not sure this does the right thing if, for example, you have expanded_paths = ['/srv/one-two/', '/srv/one-three']; I think it will end up with arcroot = '/srv/one-t/'. We probably need to write some code based around recursively calling os.path.split(), or (a little hackier but usually correct) arcroot.split(os.path.sep).
Except that os.path.dirname is used to determine the actual shared parent directory; it's not just extracting substrings from the path!
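For comparison, here is a sketch that computes the shared parent by whole path components, which sidesteps both the xrange question and the substring concern raised above. `common_root` is a hypothetical helper that treats every entry as a path whose parent directory should be shared. On Python 3.5+, os.path.commonpath performs the component-wise comparison itself.

```python
import os


def common_root(paths, sep=os.sep):
    """Return the deepest directory containing every path, ending in sep.

    Components are compared whole, so '/srv/one-two/' and '/srv/one-three'
    share '/srv/' rather than '/srv/one-t/'.
    """
    if not paths:
        raise ValueError("need at least one path")
    # Drop a trailing separator, then keep only each path's parent components.
    parents = [p.rstrip(sep).split(sep)[:-1] for p in paths]
    common = []
    for components in zip(*parents):
        if any(c != components[0] for c in components):
            break
        common.append(components[0])
    # Absolute paths keep a leading '' component, so they rejoin to '/...'.
    return sep.join(common) + sep if common else ""
```

Usage: common_root(['/srv/one-two/', '/srv/one-three'], sep='/') gives '/srv/', and the trailing separator matches the "maintain os separator after arcroot" behaviour from the commit list.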
A couple of top-level things to note as well:
Right now I favour the latter for a few reasons:
So rather than spending too much time trying to figure out how best to change the UI here to handle both archives and simple compressors, we should probably create a second module that handles pure compression and take it out of this one.
To be honest, I don't think making a distinction between unarchive and uncompress makes a lot of sense to our users. In most (if not all) cases, they want to get the content and not the archive out of a compressed file (if they even understand the difference between an archive and a compressed file, since zip does not make this distinction). My plan was to deprecate unarchive and replace it with a pure-python uncompress module, since we cannot create a pure-python unarchive module as a drop-in replacement anyhow (extra_opts simply ceases to exist, so it breaks stuff). To summarize, it means that this module would become compress instead, and I welcome this contribution!
@abadger Thank you for your feedback. I haven't done a lot of work with Python, so I followed the structure of other modules as examples until I felt a bit more comfortable. I'm not too familiar with Python 2 vs. 3 compatibility issues either, so thanks for pointing them out. I will work on an update tonight and submit another PR. I like the idea of having more explicit output format options, so we know exactly what kind of files we get out. As for your idea of recursively compressing files within a tree with gzip, I think it would add a bulky chunk of code, but it is definitely worth exploring.
@dagwieers I agree that many users wouldn't really make the distinction between a compressed and an archived file, and it seems like splitting hairs to me. This module attempts to smooth over the differences that do exist between those concepts in a predictable way. For parity's sake, however, since you intend to deprecate unarchive in favor of the pure-python uncompress, wouldn't it be logical to rename this module to compress? That way the archive/unarchive commands would be logically grouped (as unavailable/deprecated), and the compress/uncompress commands would be similarly grouped as two pure-python modules. It would then become the practice to use that pair of verbs to describe the concept of "squishing file(s)", which would be more coherent to users. I originally chose archive as the name so that it would be an obvious opposite of the existing unarchive command.
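The idea of recursively compressing files within a tree with gzip (many compressed files rather than one archive) could look roughly like this. It is a sketch only: `gzip_tree` and its rule of skipping files that already end in .gz are assumptions, not anything the module implements.

```python
import gzip
import os
import shutil


def gzip_tree(root, remove=True):
    """Compress each file under root to <file>.gz and return the new paths."""
    created = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # os.walk lists a directory before yielding it, so the .gz files
        # written below are not picked up again in this pass.
        for name in filenames:
            if name.endswith(".gz"):
                continue
            src = os.path.join(dirpath, name)
            dest = src + ".gz"
            with open(src, "rb") as f_in, gzip.open(dest, "wb") as f_out:
                shutil.copyfileobj(f_in, f_out)
            if remove:
                os.remove(src)
            created.append(dest)
    return created
```

Running it a second time on the same tree does nothing, which is the kind of idempotence an Ansible module would need.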
@bendoh That would indeed be my preference, but I don't think my plan was ever vetted by the powers that be ;-) There is some backstory in ansible/ansible-modules-core#3307. BTW, I think your archive (or rather compress) module ought to live in ansible-modules-core, next to unarchive (or preferably the new/future pure-python uncompress).
I've added the archive+compress question to the core meeting agenda so we can get an overall plan sorted out before 2.2.
@bendoh @dagwieers We talked about it at yesterday's meeting, and it seems that we're okay with either a single module or two modules. However, we're concerned about the parameters and how they will affect what the user can do. We're not sure if we have time to review the parameters for 2.2, so we're going to debate what our options for 2.2 are at the meeting tomorrow.
@abadger Thanks for the heads-up. This is related to the specifics of archive/compress, but I would also be interested to know what the plan is for a pure-python uncompress module (and the deprecation of unarchive in favor of it). There are some restrictions if we go pure-python, but we would end up with a first-class idempotent and check-mode/diff-mode supporting module. (Not the current bug-ridden implementation that scrapes output :-))
@dagwieers I don't think it's on our radar to write such a thing ourselves, but we'd certainly consider deprecating unarchive for such a module if it was contributed. Regarding archive, at the meeting yesterday we came to the conclusion that no one on the core team has the time to go over the parameters before feature freeze. Time may or may not free up after feature freeze and before the 2.2 release candidates start; with many new features come many new bugs :-( We decided that since users rely on the parameters staying the same (or being replaced in a backwards-compatible manner), we don't want to ship archive in 2.2 without getting those right. So once stable-2.2 branches, we'll revert archive in that branch but leave it in devel for 2.3. If someone is able to review the parameters and come up with a new design that handles the corner cases (single-file archives and multi-file compression), and we're then able to merge code implementing that before the 2.2 RCs start, we can cherry-pick the changes back into 2.2. If not, we'll make every effort to get that done for 2.3. I'm very sorry that we don't have more time to spend on giving guidance about the parameters now. Thanks for your patience.
I'm just excited to have been able to contribute! Thanks @abadger, look forward to getting this into 2.3. There are a couple of things I need to clean up for old-python compatibility, in any case. And thank you @dagwieers for all your input. |
ISSUE TYPE
COMPONENT NAME
archive
ANSIBLE VERSION
SUMMARY
This module serves two purposes:
My particular use case was that I wanted to ensure a large SQL file stayed in a compressed state, and would update when the non-compressed version was restored. I initially opened a PR ansible/ansible-modules-core#3735 against the file module to achieve this behavior, but it was clear that was not the right place for it.
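The keep-it-compressed behaviour described below (overwrite the .gz, compare compressed sizes, treat a missing source next to an existing .gz as already compressed) can be sketched in a few lines. `ensure_gzipped` is a hypothetical name, and the 'absent' state for the case where neither file exists is an assumption. Comparing sizes is a cheap heuristic: a change to the source that happens to compress to the same size would be reported as unchanged.

```python
import gzip
import os
import shutil


def ensure_gzipped(src, dest, remove=True):
    """Keep dest as a gzip copy of src; return (state, changed)."""
    if not os.path.exists(src):
        # Source gone but compressed copy present: already in the wanted state.
        if os.path.exists(dest):
            return "compress", False
        return "absent", False
    old_size = os.path.getsize(dest) if os.path.exists(dest) else None
    with open(src, "rb") as f_in, gzip.open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    # Heuristic from the description: a different compressed size means a change.
    changed = old_size != os.path.getsize(dest)
    if remove:
        os.remove(src)
    return "compress", changed
```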
This module offers the same functionality and more, by allowing multiple paths, trees, or patterns to be specified for archival. The common root for all of the files is computed and removed from the archived path names.
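A minimal sketch of stripping the common root from archived names, here with zipfile and os.path.commonpath (Python 3.5+). `zip_relative` is hypothetical; it takes already-expanded file paths and returns the root it used.

```python
import os
import zipfile


def zip_relative(files, dest):
    """Zip files into dest, naming each entry relative to their common root."""
    files = [os.path.abspath(f) for f in files]
    # Use parent directories so a single file is stored under its own name.
    root = os.path.commonpath([os.path.dirname(f) for f in files])
    with zipfile.ZipFile(dest, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            zf.write(path, arcname=os.path.relpath(path, root))
    return root
```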
The simple use case is to compress a single file:
When /my/file exists, /my/file.gz is created with the gzip-compressed contents of /my/file. If /my/file.gz already exists, it is overwritten, and the difference in the compressed file size is used to determine whether or not the state has changed. If /my/file.gz exists and the source file /my/file does not, the state is considered 'compress' and unchanged. /my/file is removed once compressed, unless remove=False is specified.
Another use:
Creates a tarball at /my/archive.tgz with the contents of /my/tree. Since all of the files in the specified path share a common base path in /my/tree, the files in the produced archive are relative to /my/tree. The path /my/tree is removed unless remove=False is specified.
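The tree example can be sketched with tarfile: entries are named relative to the tree, and the tree is deleted afterwards unless remove=False. `archive_tree` is a hypothetical helper; it refuses a destination inside the tree, since the archive would otherwise be walked while it is being written. It only stores files, so empty directories are not preserved.

```python
import os
import shutil
import tarfile


def archive_tree(tree, dest, remove=True):
    """Write tree's files to a gzip-compressed tarball at dest."""
    tree = os.path.abspath(tree)
    if os.path.abspath(dest).startswith(tree + os.sep):
        raise ValueError("destination must not be inside the archived tree")
    with tarfile.open(dest, "w:gz") as tar:
        for dirpath, _dirnames, filenames in os.walk(tree):
            for name in filenames:
                full = os.path.join(dirpath, name)
                # Store paths relative to the tree, e.g. 'sub/b.txt'.
                tar.add(full, arcname=os.path.relpath(full, tree))
    if remove:
        shutil.rmtree(tree)
```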