New Module: archive #2323

bendoh · 2016-05-27T00:09:13Z

ISSUE TYPE

New Module Pull Request

COMPONENT NAME

ANSIBLE VERSION

ansible 2.2.0
  config file =
  configured module search path = Default w/o overrides

SUMMARY

This module serves two purposes:

Allow management of compressed archives or single files using standard compression formats
Act as an analog to the 'unarchive' module which does the opposite.

My particular use case was that I wanted to ensure a large SQL file stayed in a compressed state, and would update when the non-compressed version was restored. I initially opened a PR ansible/ansible-modules-core#3735 against the file module to achieve this behavior, but it was clear that was not the right place for it.

This module offers the same functionality and more, by allowing multiple paths, trees, or patterns to be specified for archival. The common root for all of the files is computed and removed from the archived path names.

The simple use case is to compress a single file:

- name: Compress this file
  archive: path=/my/file

When /my/file exists, /my/file.gz is created with the gzip-compressed contents of /my/file. If the /my/file.gz already exists, it is over-written and the difference in the compressed file size is used to determine whether or not the state has changed. If /my/file.gz exists and the source file /my/file does not, the state is considered compress and unchanged. /my/file is removed once compressed, unless remove=False is specified.

Another use:

- name: Compress this tree
  archive: path=/my/tree creates=/my/archive.tgz

Creates a tarball at /my/archive.tgz with the contents of /my/tree. Since all of the files in the specified path share a common base path in /my/tree, the files in the produced archive are relative to /my/tree. The path /my/tree is removed unless remove=False is specified.

This manages compressed files or archives of many compressed files. You can maintain or update .gz, .bz2 compressed files, .zip archives, or tarballs compressed with gzip or bzip2. Possible use cases: * Back up user home directories * Ensure large text files are always compressed * Archive trees for distribution

* Don't include the archive in the archive if it falls within an archived path * If remove=True and the archive would be in an archived path, fail. * Fix single-file zip file compression * Add more documentation about 'state' return

Use correct variable in expanduser

gregdek · 2016-05-27T04:25:08Z

Thanks @bendoh for this new module. When this module receives 'shipit' comments from two community members and any 'needs_revision' comments have been resolved, we will mark for inclusion.

[This message brought to you by your friendly Ansibull-bot.]

resmo · 2016-05-28T12:11:06Z

files/archive.py

+  compression:
+    description:
+      - The type of compression to use. Can be 'gz', 'bz2', or 'zip'.
+    choices: [ 'gz', 'bz2', 'zip' ]


default is gz according arg spec.

Thanks! got it

This conflates compression with archive format. Probably do not want to do that. Maybe:

choices: ['tar.gz', 'tar.bz2', 'zip']

Looking into this further, I also realized that this module attempts to guess whether the user wants to archive or compress. We don't want to do that as the user may really require a single file tarball rather than a gz compressed file. I'll discuss that in the issue rather than as a line comment.

Bikeshedding: the parameter name would be more accurately said to "format" or "archive_type" rather than compression.

Also may want to support 'tar' (uncompressed tarfiles) here.

raben2 · 2016-05-31T21:17:19Z

I was just testing the module. Seems good but i am from germany and if i run it on a directory in path
which contains vowel mutation Filenames it fails.
Maybe if you handle filenames as strings it could work.

bendoh · 2016-05-31T21:22:46Z

Can you provide an example? Thanks for testing! Very helpful.

* rename archive -> arcfile (where it's a file descriptor) * additional return * simplify logic around 'archive?' flag * maintain os separator after arcroot * use function instead of lambda for filter, ensure file exists before file.cmp'ing it * track errored files and fail if there are any

…ules-extras into bendoh-archive-module * 'bendoh-archive-module' of github.com:bendoh/ansible-modules-extras: Add 'default' to docs for 'compression' option

…rent This fixed a few bugs and simplified the code

vazhnov · 2016-06-01T06:19:08Z

I think «creates» must works like in module «shell».
So, please use another name for destination path, «dest» for example.

raben2 · 2016-06-01T07:38:07Z

Sure no problem
my playbook

- hosts: localhost
  connection: local
  gather_facts: false
  become: false
  tasks:
    - archive:
        path: '/home/test/Downloads/*'
        creates: '/home/test/dl.tar.bz2'
        compression: bz2
      register: bzarchive
    - debug: msg="{{ bzarchive }}"

the debug output

Using /etc/ansible/ansible.cfg as config file
Loaded callback default of type stdout, v2.0

PLAYBOOK: test.yaml ************************************************************
1 plays in ./test.yaml

PLAY [localhost] ***************************************************************

TASK [archive] *****************************************************************
task path: /home/test/scripts/ansible-modules/test.yaml:6
Using module file /home/test/scripts/ansible-modules/ds/files/archive.py
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: test
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo $HOME/.ansible/tmp/ansible-tmp-1464766245.33-145653721695337 `" && echo ansible-tmp-1464766245.33-145653721695337="` echo $HOME/.ansible/tmp/ansible-tmp-1464766245.33-145653721695337 `" ) && sleep 0'
<127.0.0.1> PUT /tmp/tmpco21zM TO /home/test/.ansible/tmp/ansible-tmp-1464766245.33-145653721695337/archive.py
<127.0.0.1> EXEC /bin/sh -c 'chmod -R u+x /home/test/.ansible/tmp/ansible-tmp-1464766245.33-145653721695337/ && sleep 0'
<127.0.0.1> EXEC /bin/sh -c 'LANG=C LC_ALL=C LC_MESSAGES=C /usr/bin/python /home/test/.ansible/tmp/ansible-tmp-1464766245.33-145653721695337/archive.py; rm -rf "/home/test/.ansible/tmp/ansible-tmp-1464766245.33-145653721695337/" > /dev/null 2>&1 && sleep 0'
changed: [127.0.0.1] => {"archived": [], "arcroot": "/home/test/Downloads/", "changed": true, "creates": "/home/test/dl.tar.bz2", "expanded_paths": ["/home/test/Downloads/ArangoDB-2.8.9.tar.bz2", "/home/test/Downloads/content_pack.json", "/home/test/Downloads/virtualbox-5.0_5.0.20-106931~Ubuntu~wily_amd64.deb", "/home/test/Downloads/linux-headers-4.6.0-040600rc7-generic_4.6.0-040600rc7.201605081830_amd64.deb", "/home/test/Downloads/step1(1).png", "/home/test/Downloads/linux-headers-4.6.0-040600rc7_4.6.0-040600rc7.201605081830_all.deb", "/home/test/Downloads/a.png", "/home/test/Downloads/google-talkplugin_current_amd64.deb", "/home/test/Downloads/graylog-plugin-jira-master.zip", "/home/test/Downloads/graylog-plugin-jira-1.0.0.jar", "/home/test/Downloads/linux-image-4.6.0-040600rc7-generic_4.6.0-040600rc7.201605081830_amd64.deb", "/home/test/Downloads/über.JPG", "/home/test/Downloads/linux-firmware_1.157_all.deb", "/home/test/Downloads/step2.png", "/home/test/Downloads/3030068632725216.pdf", "/home/test/Downloads/Oracle_VM_VirtualBox_Extension_Pack-5.0.20-106931.vbox-extpack", "/home/test/Downloads/step1.png", "/home/test/Downloads/3030068638915006.pdf", "/home/test/Downloads/comonea-gunda-1.22.jar"], "invocation": {"module_args": {"backup": null, "compression": "bz2", "content": null, "creates": "/home/test/dl.tar.bz2", "delimiter": null, "directory_mode": null, "follow": false, "force": null, "group": null, "mode": null, "owner": null, "path": ["/home/test/Downloads/*"], "regexp": null, "remote_src": null, "remove": false, "selevel": null, "serole": null, "setype": null, "seuser": null, "src": null}, "module_name": "archive"}, "missing": [], "state": "archive"}

TASK [debug] *******************************************************************
task path: /home/test/scripts/ansible-modules/test.yaml:11
ok: [127.0.0.1] => {
    "msg": {
        "archived": [], 
        "arcroot": "/home/test/Downloads/", 
        "changed": true, 
        "creates": "/home/test/dl.tar.bz2", 
        "expanded_paths": [
            "/home/test/Downloads/ArangoDB-2.8.9.tar.bz2", 
            "/home/test/Downloads/content_pack.json", 
            "/home/test/Downloads/virtualbox-5.0_5.0.20-106931~Ubuntu~wily_amd64.deb", 
            "/home/test/Downloads/linux-headers-4.6.0-040600rc7-generic_4.6.0-040600rc7.201605081830_amd64.deb", 
            "/home/test/Downloads/step1(1).png", 
            "/home/test/Downloads/linux-headers-4.6.0-040600rc7_4.6.0-040600rc7.201605081830_all.deb", 
            "/home/test/Downloads/a.png", 
            "/home/test/Downloads/google-talkplugin_current_amd64.deb", 
            "/home/test/Downloads/graylog-plugin-jira-master.zip", 
            "/home/test/Downloads/graylog-plugin-jira-1.0.0.jar", 
            "/home/test/Downloads/linux-image-4.6.0-040600rc7-generic_4.6.0-040600rc7.201605081830_amd64.deb", 
            "/home/test/Downloads/über.JPG", 
            "/home/test/Downloads/linux-firmware_1.157_all.deb", 
            "/home/test/Downloads/step2.png", 
            "/home/test/Downloads/3030068632725216.pdf", 
            "/home/test/Downloads/Oracle_VM_VirtualBox_Extension_Pack-5.0.20-106931.vbox-extpack", 
            "/home/test/Downloads/step1.png", 
            "/home/test/Downloads/3030068638915006.pdf", 
            "/home/test/Downloads/comonea-gunda-1.22.jar"
        ], 
        "missing": [], 
        "state": "archive"
    }
}

the resulting file contains only 64 kb

raben2 · 2016-06-01T07:40:27Z

I also would prefer to using dest instead of creates

abadger · 2016-08-03T18:19:54Z

files/archive.py

+            expanded_paths = expanded_paths + glob.glob(path)
+        else:
+            expanded_paths.append(path)
+


Note: There's a difference between these two code paths. Simply appending the path does not check if the path exists. glob.glob() will return an empty list if no files and directories matched the path.

Worth a comment!

gundalow · 2016-08-03T18:25:32Z

@bendoh A few review comments came in after I merged this. Could you please raise a fresh PR to fix those issues?

Thanks for this new module. We like the pure Python implementation.

abadger · 2016-08-03T18:26:37Z

files/archive.py

+
+    # If we actually matched multiple files or TRIED to, then
+    # treat this as a multi-file archive
+    archive = globby or os.path.isdir(expanded_paths[0]) or len(expanded_paths) > 1


globby is always False here (globby is set to an initial value at the top of the main() function but then never changed. Perhaps it just needs to be removed altogether?)

It should get set to true when a glob character is detected; so even only one file matches the glob, it will generate an archive instead of a simple compressed file. I'll update that in my future PR...

bendoh · 2016-08-03T19:10:53Z

Thanks for the feedback, I'll open a new PR with fixes soon.

abadger · 2016-08-03T19:14:32Z

files/archive.py

+            if i < len(arcroot):
+                arcroot = os.path.dirname(arcroot[0:i+1])
+
+            arcroot += os.sep


xrange is not python3 compatible. Not sure this does the right thing if, for example, you have expanded_paths = ['/srv/one-two/', '/srv/one-three']

I think it will end up with arcroot = '/srv/one-t/'. Probably need to write some code based around recursively calling os.path.split() or (a little hackier but usually correct) arcroot.split(os.path.sep)

Except that os.path.dirname is used to determine the actual shared parent directory, it's not just extracting substrings from the path!

abadger · 2016-08-03T20:22:41Z

Couple toplevel things to note as well:

style-wise, splitting the one big main() function into separate functions will make it easier to read. Eventually we'll want to unittest modules and separating code which has side effects from code which doesn't will make that easier. But this module has a lot of code that is side effect driven so we may need to leave that until later.
This module mixes compression with archiving and attempts to guess whether the user meant one or the other. This isn't a good idea as a ser may want to place a single file into a tar.gz. Couple options:
- We could make the choice explicit (for instance, if we changed compression to list both tar.gz and gz formats, the code could then recursively gzip all the files it found in the specified paths).
- We could leave this as archiving only and push uncompressing to its own module.

Right now I favour the latter for a few reasons:

Leaving compression out of this module will mirror what unarchive does.
I don't see us adding raw uncompress support to the unarchive module, at least anytime soon. unarchive is pretty complex as it is and compression would be an added thing to worry about.
There's a module review for uncompress ( Adds uncompress module #1301 and add plugin for uncompress module ansible#13352 ) that just needs a few more wrinkles ironed out. So a compress module will match with that.

So rather than spending too much time trying to figure out how best to change the UI here to handle both archives and simple compressors, we should probably create a second module that handles pure compression and take it out of this one.

dagwieers · 2016-08-08T22:06:53Z

To be honest, I don't think making a distinction between unarchive and uncompress makes a lot of sense to our users. In most (if not all) cases, they want to get the content and not the archive out of a compressed file (if they already understand the difference between an archive and a compressed file, as zip does not make this distinction).

My plan was to deprecate unarchive, and replace it with a pure-python uncompress module since we cannot create a pure-python unarchive module as a drop-in replacement anyhow (extra_opts simply ceases to exist so it breaks stuff).

To summarize, it means that this module would become compress instead, and I welcome this contribution !

bendoh · 2016-08-08T22:21:13Z

@abadger Thank you for your feedback. I haven't done a lot of work with Python, so I followed the structure of other modules as examples until I felt a bit more comfortable. I'm not too familiar with Python2 vs 3 compatibility issues, either, so thanks for pointing them out. I will work on an update tonight and submit another PR.

I like the idea of having more explicit output format options, so we know exactly what kind of files we get out. In terms of the idea you raise of recursively compressing files within a tree with gzip, I think it would add a bulky chunk of code, but it is definitely worth exploring.

@dagwieers I agree that many users wouldn't really make the distinction between a compressed and an archived file, and it seems like splitting hairs to me. This module attempts to smooth over the differences that do exist between those concepts in a predictable way.

For parity's sake, however, since you intend to deprecate unarchive in favor of the pure-python uncompress, wouldn't it be logical to rename this module to compress? That way the archive/unarchive commands would be logically grouped (as unavailable/deprecated) and the compress/uncompress commands would be similarly grouped, as two pure-python modules. It would then become the practice to use that pair of verbs to describe the concept of "squishing file(s)", and be more coherent to users. The reason I chose archive as the name in the first place was to be an obvious opposite of the existing unarchive command.

dagwieers · 2016-08-08T22:27:10Z

@bendoh That would indeed have my preference, but I don't think my plan was ever vetted by the powers that be ;-) There is some backstory in ansible/ansible-modules-core#3307

BTW I think your archive (or rather compress) module ought to live in ansible-modules-core, next to unarchive (or preferably the new/future pure-python uncompress).

abadger · 2016-08-25T19:35:34Z

I've added the archive+compress question to the core meeting agenda so we can get an overall plan sorted out before 2.2.

abadger · 2016-08-31T16:24:09Z

@bendoh @dagwieers We talked about it at yesterdays meeting and it seems that we're okay with either single module or two modules. However we're concerned about the parameters and how they will affect what the user can do. We're not sure if we have time to review the parameters for 2.2 so we're going to debate what our options for 2.2 are at the meeting tomorrow.

dagwieers · 2016-08-31T17:04:20Z

@abadger Thanks for the heads-up.

This is related to the specifics of archive/compress, but I would also be interested to know what the plan is for a pure-python uncompress module (and the deprecation of unarchive in favor of it). There are some restrictions if we go pure-python, but we would end up with a first-class idempotent and check-mode/diff-mode supporting module. (Not the current bug-ridden implementation that scrapes output :-))

abadger · 2016-09-02T15:25:49Z

@dagwieers I don't think it's on our radar to write such a thing ourselves but we'd certainly consider deprecating unarchive for such a module if it was contributed.

Regarding archive, at the meeting yesterday we came to the conclusion that no one on the core team has the time to go over the parameters before Feature Freeze. Time may or may not free up after feature freeze and before the 2.2 release candidates start (with many new features come many new bugs :-( We decided that since users rely on the parameters staying the same (or being replaced in a backwards compatible manner) we don't want to ship archive in 2.2 without getting those right. So once stable-2.2 branches, we'll revert archive in that branch but leave it in devel for 2.3. If someone is able to review the parameters and come up with a new design that handles the cornercases (single file archives and multi-file compression) and then we're able to merge code to implement that before the 2.2 rc's start then we can cherry-pick the changes back into 2.2. If not, we'll make every effort to get that done for 2.3.

I'm very sorry that we don't have more time to spend on giving guidance about the parameters now. Thanks for your patience.

bendoh · 2016-09-02T15:28:33Z

I'm just excited to have been able to contribute! Thanks @abadger, look forward to getting this into 2.3. There are a couple of things I need to clean up for old-python compatibility, in any case. And thank you @dagwieers for all your input.

Ben Doherty added 12 commits May 26, 2016 19:54

Drop extra double-quote from documentation

431d8c9

Fix write mode for ZipFiles ('wb' is invalid!)

e9b8532

Add RETURN documentation

9cde150

Add compressed file source to successes when succeeds!

ecd60f4

Add license

f482cb4

Fix up for zip files and nesting logic.

cca70b7

* Don't include the archive in the archive if it falls within an archived path * If remove=True and the archive would be in an archived path, fail. * Fix single-file zip file compression * Add more documentation about 'state' return

Accept 'path' as a list argument, expose path and expanded_path,

d3e041d

Use correct variable in expanduser

Don't use <x> if <y> else <z> syntax

6db9caf

Make remove default to false. It's less frightening.

ae35ce5

Use different syntax in lambda

20bfb13

Documentation updates

6e0aac8

gregdek added new_plugin community_review labels May 27, 2016

resmo reviewed May 28, 2016
View reviewed changes

Add 'default' to docs for 'compression' option

ef620c7

bendoh changed the title ~~Initial commit of extras/archive module.~~ New Module: archive May 28, 2016

Ben Doherty added 3 commits May 31, 2016 18:30

Reword comments slightly

d5e861b

Merge branch 'bendoh-archive-module' of github.com:bendoh/ansible-mod…

ec856c8

…ules-extras into bendoh-archive-module * 'bendoh-archive-module' of github.com:bendoh/ansible-modules-extras: Add 'default' to docs for 'compression' option

bendoh force-pushed the bendoh-archive-module branch from ec856c8 to ef620c7 Compare June 1, 2016 03:09

Refactor zip and tarfile loops together, branch where calls are diffe…

0a056ec

…rent This fixed a few bugs and simplified the code

Ben Doherty added 2 commits June 1, 2016 08:16

Change 'creates' parameter to 'dest'

b57b047

Rename 'archive' -> 'arcfile' in compress branch

b9971f1

gundalow merged commit 44948d3 into ansible:devel Aug 3, 2016

abadger reviewed Aug 3, 2016
View reviewed changes

bendoh mentioned this pull request Aug 9, 2016

Updates to archive module based on code review #2699

Merged

abadger mentioned this pull request Aug 25, 2016

Agenda for Public Core Meeting - 2016-08-25 15:00 UTC ansible/community#120

Closed

abadger mentioned this pull request Aug 31, 2016

Agenda for Public Core Meeting - 2016-09-01 15:00 UTC ansible/community#121

Closed

felixfontein mentioned this pull request Oct 20, 2022

archive: better expose requirements ansible-collections/community.general#5392

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

New Module: archive #2323

New Module: archive #2323

bendoh commented May 27, 2016 •

edited

Loading

gregdek commented May 27, 2016

resmo May 28, 2016

bendoh May 28, 2016

abadger Aug 3, 2016

abadger Aug 3, 2016

raben2 commented May 31, 2016

bendoh commented May 31, 2016

vazhnov commented Jun 1, 2016

raben2 commented Jun 1, 2016

raben2 commented Jun 1, 2016

abadger Aug 3, 2016

bendoh Aug 9, 2016

gundalow commented Aug 3, 2016

abadger Aug 3, 2016

bendoh Aug 3, 2016

bendoh commented Aug 3, 2016

abadger Aug 3, 2016 •

edited

Loading

bendoh Aug 9, 2016

abadger commented Aug 3, 2016

dagwieers commented Aug 8, 2016 •

edited

Loading

bendoh commented Aug 8, 2016

dagwieers commented Aug 8, 2016 •

edited

Loading

abadger commented Aug 25, 2016

abadger commented Aug 31, 2016

dagwieers commented Aug 31, 2016

abadger commented Sep 2, 2016

bendoh commented Sep 2, 2016

New Module: archive #2323

New Module: archive #2323

Conversation

bendoh commented May 27, 2016 • edited Loading

ISSUE TYPE

COMPONENT NAME

ANSIBLE VERSION

SUMMARY

gregdek commented May 27, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

raben2 commented May 31, 2016

bendoh commented May 31, 2016

vazhnov commented Jun 1, 2016

raben2 commented Jun 1, 2016

raben2 commented Jun 1, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gundalow commented Aug 3, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bendoh commented Aug 3, 2016

abadger Aug 3, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

abadger commented Aug 3, 2016

dagwieers commented Aug 8, 2016 • edited Loading

bendoh commented Aug 8, 2016

dagwieers commented Aug 8, 2016 • edited Loading

abadger commented Aug 25, 2016

abadger commented Aug 31, 2016

dagwieers commented Aug 31, 2016

abadger commented Sep 2, 2016

bendoh commented Sep 2, 2016

bendoh commented May 27, 2016 •

edited

Loading

abadger Aug 3, 2016 •

edited

Loading

dagwieers commented Aug 8, 2016 •

edited

Loading

dagwieers commented Aug 8, 2016 •

edited

Loading