Skip to content
This repository has been archived by the owner on Oct 30, 2018. It is now read-only.

New Module: archive #2323

Merged
merged 22 commits into from
Aug 3, 2016
Merged

New Module: archive #2323

merged 22 commits into from
Aug 3, 2016

Conversation

bendoh
Copy link
Contributor

@bendoh bendoh commented May 27, 2016

ISSUE TYPE
  • New Module Pull Request
COMPONENT NAME

archive

ANSIBLE VERSION
ansible 2.2.0
  config file =
  configured module search path = Default w/o overrides
SUMMARY

This module serves two purposes:

  • Allow management of compressed archives or single files using standard compression formats
  • Act as an analog to the 'unarchive' module which does the opposite.

My particular use case was that I wanted to ensure a large SQL file stayed in a compressed state, and would update when the non-compressed version was restored. I initially opened a PR ansible/ansible-modules-core#3735 against the file module to achieve this behavior, but it was clear that was not the right place for it.

This module offers the same functionality and more, by allowing multiple paths, trees, or patterns to be specified for archival. The common root for all of the files is computed and removed from the archived path names.

The simple use case is to compress a single file:

- name: Compress this file
  archive: path=/my/file

When /my/file exists, /my/file.gz is created with the gzip-compressed contents of /my/file. If the /my/file.gz already exists, it is over-written and the difference in the compressed file size is used to determine whether or not the state has changed. If /my/file.gz exists and the source file /my/file does not, the state is considered compress and unchanged. /my/file is removed once compressed, unless remove=False is specified.

Another use:

- name: Compress this tree
  archive: path=/my/tree creates=/my/archive.tgz

Creates a tarball at /my/archive.tgz with the contents of /my/tree. Since all of the files in the specified path share a common base path in /my/tree, the files in the produced archive are relative to /my/tree. The path /my/tree is removed unless remove=False is specified.

Ben Doherty added 12 commits May 26, 2016 19:54
This manages compressed files or archives of many compressed files. You can maintain or update .gz, .bz2 compressed files, .zip archives, or tarballs compressed with gzip or bzip2.

Possible use cases:

* Back up user home directories
* Ensure large text files are always compressed
* Archive trees for distribution
* Don't include the archive in the archive if it falls within an archived path
* If remove=True and the archive would be in an archived path, fail.
* Fix single-file zip file compression
* Add more documentation about 'state' return
@gregdek
Copy link
Contributor

gregdek commented May 27, 2016

Thanks @bendoh for this new module. When this module receives 'shipit' comments from two community members and any 'needs_revision' comments have been resolved, we will mark for inclusion.

[This message brought to you by your friendly Ansibull-bot.]

compression:
description:
- The type of compression to use. Can be 'gz', 'bz2', or 'zip'.
choices: [ 'gz', 'bz2', 'zip' ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default is gz according arg spec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! got it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conflates compression with archive format. Probably do not want to do that. Maybe:

choices: ['tar.gz', 'tar.bz2', 'zip']

Looking into this further, I also realized that this module attempts to guess whether the user wants to archive or compress. We don't want to do that as the user may really require a single file tarball rather than a gz compressed file. I'll discuss that in the issue rather than as a line comment.

Bikeshedding: the parameter name would be more accurately said to "format" or "archive_type" rather than compression.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also may want to support 'tar' (uncompressed tarfiles) here.

@bendoh bendoh changed the title Initial commit of extras/archive module. New Module: archive May 28, 2016
@raben2
Copy link
Contributor

raben2 commented May 31, 2016

I was just testing the module. Seems good but i am from germany and if i run it on a directory in path
which contains vowel mutation Filenames it fails.
Maybe if you handle filenames as strings it could work.

@bendoh
Copy link
Contributor Author

bendoh commented May 31, 2016

Can you provide an example? Thanks for testing! Very helpful.

Ben Doherty added 3 commits May 31, 2016 18:30
* rename archive -> arcfile (where it's a file descriptor)
* additional return
* simplify logic around 'archive?' flag
* maintain os separator after arcroot
* use function instead of lambda for filter, ensure file exists before file.cmp'ing it
* track errored files and fail if there are any
…ules-extras into bendoh-archive-module

* 'bendoh-archive-module' of github.com:bendoh/ansible-modules-extras:
  Add 'default' to docs for 'compression' option
…rent

This fixed a few bugs and simplified the code
@vazhnov
Copy link

vazhnov commented Jun 1, 2016

I think «creates» must works like in module «shell».
So, please use another name for destination path, «dest» for example.

@raben2
Copy link
Contributor

raben2 commented Jun 1, 2016

Sure no problem
my playbook

- hosts: localhost
  connection: local
  gather_facts: false
  become: false
  tasks:
    - archive:
        path: '/home/test/Downloads/*'
        creates: '/home/test/dl.tar.bz2'
        compression: bz2
      register: bzarchive
    - debug: msg="{{ bzarchive }}"

the debug output

Using /etc/ansible/ansible.cfg as config file
Loaded callback default of type stdout, v2.0

PLAYBOOK: test.yaml ************************************************************
1 plays in ./test.yaml

PLAY [localhost] ***************************************************************

TASK [archive] *****************************************************************
task path: /home/test/scripts/ansible-modules/test.yaml:6
Using module file /home/test/scripts/ansible-modules/ds/files/archive.py
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: test
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo $HOME/.ansible/tmp/ansible-tmp-1464766245.33-145653721695337 `" && echo ansible-tmp-1464766245.33-145653721695337="` echo $HOME/.ansible/tmp/ansible-tmp-1464766245.33-145653721695337 `" ) && sleep 0'
<127.0.0.1> PUT /tmp/tmpco21zM TO /home/test/.ansible/tmp/ansible-tmp-1464766245.33-145653721695337/archive.py
<127.0.0.1> EXEC /bin/sh -c 'chmod -R u+x /home/test/.ansible/tmp/ansible-tmp-1464766245.33-145653721695337/ && sleep 0'
<127.0.0.1> EXEC /bin/sh -c 'LANG=C LC_ALL=C LC_MESSAGES=C /usr/bin/python /home/test/.ansible/tmp/ansible-tmp-1464766245.33-145653721695337/archive.py; rm -rf "/home/test/.ansible/tmp/ansible-tmp-1464766245.33-145653721695337/" > /dev/null 2>&1 && sleep 0'
changed: [127.0.0.1] => {"archived": [], "arcroot": "/home/test/Downloads/", "changed": true, "creates": "/home/test/dl.tar.bz2", "expanded_paths": ["/home/test/Downloads/ArangoDB-2.8.9.tar.bz2", "/home/test/Downloads/content_pack.json", "/home/test/Downloads/virtualbox-5.0_5.0.20-106931~Ubuntu~wily_amd64.deb", "/home/test/Downloads/linux-headers-4.6.0-040600rc7-generic_4.6.0-040600rc7.201605081830_amd64.deb", "/home/test/Downloads/step1(1).png", "/home/test/Downloads/linux-headers-4.6.0-040600rc7_4.6.0-040600rc7.201605081830_all.deb", "/home/test/Downloads/a.png", "/home/test/Downloads/google-talkplugin_current_amd64.deb", "/home/test/Downloads/graylog-plugin-jira-master.zip", "/home/test/Downloads/graylog-plugin-jira-1.0.0.jar", "/home/test/Downloads/linux-image-4.6.0-040600rc7-generic_4.6.0-040600rc7.201605081830_amd64.deb", "/home/test/Downloads/über.JPG", "/home/test/Downloads/linux-firmware_1.157_all.deb", "/home/test/Downloads/step2.png", "/home/test/Downloads/3030068632725216.pdf", "/home/test/Downloads/Oracle_VM_VirtualBox_Extension_Pack-5.0.20-106931.vbox-extpack", "/home/test/Downloads/step1.png", "/home/test/Downloads/3030068638915006.pdf", "/home/test/Downloads/comonea-gunda-1.22.jar"], "invocation": {"module_args": {"backup": null, "compression": "bz2", "content": null, "creates": "/home/test/dl.tar.bz2", "delimiter": null, "directory_mode": null, "follow": false, "force": null, "group": null, "mode": null, "owner": null, "path": ["/home/test/Downloads/*"], "regexp": null, "remote_src": null, "remove": false, "selevel": null, "serole": null, "setype": null, "seuser": null, "src": null}, "module_name": "archive"}, "missing": [], "state": "archive"}

TASK [debug] *******************************************************************
task path: /home/test/scripts/ansible-modules/test.yaml:11
ok: [127.0.0.1] => {
    "msg": {
        "archived": [], 
        "arcroot": "/home/test/Downloads/", 
        "changed": true, 
        "creates": "/home/test/dl.tar.bz2", 
        "expanded_paths": [
            "/home/test/Downloads/ArangoDB-2.8.9.tar.bz2", 
            "/home/test/Downloads/content_pack.json", 
            "/home/test/Downloads/virtualbox-5.0_5.0.20-106931~Ubuntu~wily_amd64.deb", 
            "/home/test/Downloads/linux-headers-4.6.0-040600rc7-generic_4.6.0-040600rc7.201605081830_amd64.deb", 
            "/home/test/Downloads/step1(1).png", 
            "/home/test/Downloads/linux-headers-4.6.0-040600rc7_4.6.0-040600rc7.201605081830_all.deb", 
            "/home/test/Downloads/a.png", 
            "/home/test/Downloads/google-talkplugin_current_amd64.deb", 
            "/home/test/Downloads/graylog-plugin-jira-master.zip", 
            "/home/test/Downloads/graylog-plugin-jira-1.0.0.jar", 
            "/home/test/Downloads/linux-image-4.6.0-040600rc7-generic_4.6.0-040600rc7.201605081830_amd64.deb", 
            "/home/test/Downloads/über.JPG", 
            "/home/test/Downloads/linux-firmware_1.157_all.deb", 
            "/home/test/Downloads/step2.png", 
            "/home/test/Downloads/3030068632725216.pdf", 
            "/home/test/Downloads/Oracle_VM_VirtualBox_Extension_Pack-5.0.20-106931.vbox-extpack", 
            "/home/test/Downloads/step1.png", 
            "/home/test/Downloads/3030068638915006.pdf", 
            "/home/test/Downloads/comonea-gunda-1.22.jar"
        ], 
        "missing": [], 
        "state": "archive"
    }
}

the resulting file contains only 64 kb

@raben2
Copy link
Contributor

raben2 commented Jun 1, 2016

I also would prefer to using dest instead of creates

@gundalow gundalow merged commit 44948d3 into ansible:devel Aug 3, 2016
expanded_paths = expanded_paths + glob.glob(path)
else:
expanded_paths.append(path)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: There's a difference between these two code paths. Simply appending the path does not check if the path exists. glob.glob() will return an empty list if no files and directories matched the path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth a comment!

@gundalow
Copy link
Contributor

gundalow commented Aug 3, 2016

@bendoh A few review comments came in after I merged this. Could you please raise a fresh PR to fix those issues?

Thanks for this new module. We like the pure Python implementation.


# If we actually matched multiple files or TRIED to, then
# treat this as a multi-file archive
archive = globby or os.path.isdir(expanded_paths[0]) or len(expanded_paths) > 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

globby is always False here (globby is set to an initial value at the top of the main() function but then never changed. Perhaps it just needs to be removed altogether?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should get set to true when a glob character is detected; so even only one file matches the glob, it will generate an archive instead of a simple compressed file. I'll update that in my future PR...

@bendoh
Copy link
Contributor Author

bendoh commented Aug 3, 2016

Thanks for the feedback, I'll open a new PR with fixes soon.

if i < len(arcroot):
arcroot = os.path.dirname(arcroot[0:i+1])

arcroot += os.sep
Copy link
Contributor

@abadger abadger Aug 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xrange is not python3 compatible. Not sure this does the right thing if, for example, you have expanded_paths = ['/srv/one-two/', '/srv/one-three']

I think it will end up with arcroot = '/srv/one-t/'. Probably need to write some code based around recursively calling os.path.split() or (a little hackier but usually correct) arcroot.split(os.path.sep)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except that os.path.dirname is used to determine the actual shared parent directory, it's not just extracting substrings from the path!

@abadger
Copy link
Contributor

abadger commented Aug 3, 2016

Couple toplevel things to note as well:

  • style-wise, splitting the one big main() function into separate functions will make it easier to read. Eventually we'll want to unittest modules and separating code which has side effects from code which doesn't will make that easier. But this module has a lot of code that is side effect driven so we may need to leave that until later.
  • This module mixes compression with archiving and attempts to guess whether the user meant one or the other. This isn't a good idea as a ser may want to place a single file into a tar.gz. Couple options:
    • We could make the choice explicit (for instance, if we changed compression to list both tar.gz and gz formats, the code could then recursively gzip all the files it found in the specified paths).
    • We could leave this as archiving only and push uncompressing to its own module.

Right now I favour the latter for a few reasons:

  • Leaving compression out of this module will mirror what unarchive does.
  • I don't see us adding raw uncompress support to the unarchive module, at least anytime soon. unarchive is pretty complex as it is and compression would be an added thing to worry about.
  • There's a module review for uncompress ( Adds uncompress module #1301 and add plugin for uncompress module ansible#13352 ) that just needs a few more wrinkles ironed out. So a compress module will match with that.

So rather than spending too much time trying to figure out how best to change the UI here to handle both archives and simple compressors, we should probably create a second module that handles pure compression and take it out of this one.

@dagwieers
Copy link
Contributor

dagwieers commented Aug 8, 2016

To be honest, I don't think making a distinction between unarchive and uncompress makes a lot of sense to our users. In most (if not all) cases, they want to get the content and not the archive out of a compressed file (if they already understand the difference between an archive and a compressed file, as zip does not make this distinction).

My plan was to deprecate unarchive, and replace it with a pure-python uncompress module since we cannot create a pure-python unarchive module as a drop-in replacement anyhow (extra_opts simply ceases to exist so it breaks stuff).

To summarize, it means that this module would become compress instead, and I welcome this contribution !

@bendoh
Copy link
Contributor Author

bendoh commented Aug 8, 2016

@abadger Thank you for your feedback. I haven't done a lot of work with Python, so I followed the structure of other modules as examples until I felt a bit more comfortable. I'm not too familiar with Python2 vs 3 compatibility issues, either, so thanks for pointing them out. I will work on an update tonight and submit another PR.

I like the idea of having more explicit output format options, so we know exactly what kind of files we get out. In terms of the idea you raise of recursively compressing files within a tree with gzip, I think it would add a bulky chunk of code, but it is definitely worth exploring.

@dagwieers I agree that many users wouldn't really make the distinction between a compressed and an archived file, and it seems like splitting hairs to me. This module attempts to smooth over the differences that do exist between those concepts in a predictable way.

For parity's sake, however, since you intend to deprecate unarchive in favor of the pure-python uncompress, wouldn't it be logical to rename this module to compress? That way the archive/unarchive commands would be logically grouped (as unavailable/deprecated) and the compress/uncompress commands would be similarly grouped, as two pure-python modules. It would then become the practice to use that pair of verbs to describe the concept of "squishing file(s)", and be more coherent to users. The reason I chose archive as the name in the first place was to be an obvious opposite of the existing unarchive command.

@dagwieers
Copy link
Contributor

dagwieers commented Aug 8, 2016

@bendoh That would indeed have my preference, but I don't think my plan was ever vetted by the powers that be ;-) There is some backstory in ansible/ansible-modules-core#3307

BTW I think your archive (or rather compress) module ought to live in ansible-modules-core, next to unarchive (or preferably the new/future pure-python uncompress).

@abadger
Copy link
Contributor

abadger commented Aug 25, 2016

I've added the archive+compress question to the core meeting agenda so we can get an overall plan sorted out before 2.2.

@abadger
Copy link
Contributor

abadger commented Aug 31, 2016

@bendoh @dagwieers We talked about it at yesterdays meeting and it seems that we're okay with either single module or two modules. However we're concerned about the parameters and how they will affect what the user can do. We're not sure if we have time to review the parameters for 2.2 so we're going to debate what our options for 2.2 are at the meeting tomorrow.

@dagwieers
Copy link
Contributor

@abadger Thanks for the heads-up.

This is related to the specifics of archive/compress, but I would also be interested to know what the plan is for a pure-python uncompress module (and the deprecation of unarchive in favor of it). There are some restrictions if we go pure-python, but we would end up with a first-class idempotent and check-mode/diff-mode supporting module. (Not the current bug-ridden implementation that scrapes output :-))

@abadger
Copy link
Contributor

abadger commented Sep 2, 2016

@dagwieers I don't think it's on our radar to write such a thing ourselves but we'd certainly consider deprecating unarchive for such a module if it was contributed.

Regarding archive, at the meeting yesterday we came to the conclusion that no one on the core team has the time to go over the parameters before Feature Freeze. Time may or may not free up after feature freeze and before the 2.2 release candidates start (with many new features come many new bugs :-( We decided that since users rely on the parameters staying the same (or being replaced in a backwards compatible manner) we don't want to ship archive in 2.2 without getting those right. So once stable-2.2 branches, we'll revert archive in that branch but leave it in devel for 2.3. If someone is able to review the parameters and come up with a new design that handles the cornercases (single file archives and multi-file compression) and then we're able to merge code to implement that before the 2.2 rc's start then we can cherry-pick the changes back into 2.2. If not, we'll make every effort to get that done for 2.3.

I'm very sorry that we don't have more time to spend on giving guidance about the parameters now. Thanks for your patience.

@bendoh
Copy link
Contributor Author

bendoh commented Sep 2, 2016

I'm just excited to have been able to contribute! Thanks @abadger, look forward to getting this into 2.3. There are a couple of things I need to clean up for old-python compatibility, in any case. And thank you @dagwieers for all your input.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants