New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite s3_sync as aws_s3_sync #39344

Open
wants to merge 4 commits into
base: devel
from

Conversation

@flowerysong
Contributor

flowerysong commented Apr 26, 2018

SUMMARY

While it's usually better than using aws_s3 in a loop, s3_sync is still very inefficient and uses much more time and memory than is necessary. This is a new module because fixing the excessive memory usage requires incompatible changes to the returned JSON, and adding pull mode with support for the standard file parameters conflicts with the 'mode' parameter. This also allowed me to make other parts of the interface more like the aws_s3 and find modules without worrying about backward compatibility.

Core logic changes (fixes #36376, fixes #37959):

  • use list_objects_v2 to find existing S3 objects instead of individual head_object calls on candidate keys
  • don't build and return five slightly different copies of the same file list
  • don't calculate MD5 checksums unless they're actually going to be used for comparison

Other noteworthy changes:

  • pull mode
  • support for regex patterns/excludes
  • revamped MIME type/encoding support (fixes #28973, closes #28969)
  • support for setting extra args via the metadata key
ISSUE TYPE
  • New Module Pull Request
COMPONENT NAME

aws_s3_sync

ANSIBLE VERSION

2.8

ADDITIONAL INFORMATION

The integration tests have been somewhat unstable in my local testing; AWS seems to sporadically return NoSuchBucket for recently created buckets, even if several tasks using that bucket have already completed. This hasn't happened on any of the Shippable runs, however.

Some benchmarks (syncing 9999 files):

Initial sync with no files in s3:
    s3_sync: 84m31.091s
    aws_s3_sync: 31m49.289s

Resync with no changes:
    s3_sync: 15m45.314s
    aws_s3_sync: 0m21.785s
@ansibot

This comment has been minimized.

Contributor

ansibot commented Apr 26, 2018

@ansibot

This comment has been minimized.

Contributor

ansibot commented Apr 26, 2018

The test ansible-test sanity --test pylint [explain] failed with 2 errors:

lib/ansible/modules/cloud/amazon/aws_s3_sync.py:620:34: undefined-variable Undefined variable 'params'
lib/ansible/modules/cloud/amazon/aws_s3_sync.py:626:52: undefined-variable Undefined variable 'params'

click here for bot help

@flowerysong flowerysong force-pushed the flowerysong:aws_s3_sync branch 4 times, most recently Apr 26, 2018

@jborean93 jborean93 removed the needs_triage label Apr 26, 2018

@flowerysong flowerysong force-pushed the flowerysong:aws_s3_sync branch Apr 26, 2018

@ryansb ryansb self-requested a review Apr 27, 2018

@flowerysong flowerysong force-pushed the flowerysong:aws_s3_sync branch Apr 29, 2018

@ansibot

This comment has been minimized.

Contributor

ansibot commented Apr 29, 2018

@ansibot

This comment has been minimized.

Contributor

ansibot commented Apr 29, 2018

The test ansible-test sanity --test ansible-doc --python 2.6 [explain] failed with 1 error:

lib/ansible/modules/cloud/amazon/aws_s3_sync.py:0:0: has a documentation error formatting or is missing documentation.

The test ansible-test sanity --test ansible-doc --python 2.7 [explain] failed with 1 error:

lib/ansible/modules/cloud/amazon/aws_s3_sync.py:0:0: has a documentation error formatting or is missing documentation.

The test ansible-test sanity --test ansible-doc --python 3.5 [explain] failed with 1 error:

lib/ansible/modules/cloud/amazon/aws_s3_sync.py:0:0: has a documentation error formatting or is missing documentation.

The test ansible-test sanity --test ansible-doc --python 3.6 [explain] failed with 1 error:

lib/ansible/modules/cloud/amazon/aws_s3_sync.py:0:0: has a documentation error formatting or is missing documentation.

The test ansible-test sanity --test ansible-doc --python 3.7 [explain] failed with 1 error:

lib/ansible/modules/cloud/amazon/aws_s3_sync.py:0:0: has a documentation error formatting or is missing documentation.

The test ansible-test sanity --test docs-build [explain] failed with the error:

Command "/usr/bin/python test/sanity/code-smell/docs-build.py" returned exit status 1.
>>> Standard Error
Traceback (most recent call last):
  File "test/sanity/code-smell/docs-build.py", line 101, in <module>
    main()
  File "test/sanity/code-smell/docs-build.py", line 17, in main
    raise subprocess.CalledProcessError(sphinx.returncode, cmd, output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['make', 'singlehtmldocs']' returned non-zero exit status 2.

The test ansible-test sanity --test validate-modules [explain] failed with 12 errors:

lib/ansible/modules/cloud/amazon/aws_s3_sync.py:0:0: E324 Value for "default" from the argument_spec ('never') for "overwrite" does not match the documentation (None)
lib/ansible/modules/cloud/amazon/aws_s3_sync.py:0:0: E324 Value for "default" from the argument_spec (True) for "validate_certs" does not match the documentation (False)
lib/ansible/modules/cloud/amazon/aws_s3_sync.py:0:0: E324 Value for "default" from the argument_spec (['size', 'last_modified']) for "diff_attributes" does not match the documentation (None)
lib/ansible/modules/cloud/amazon/aws_s3_sync.py:0:0: E325 argument_spec for "delete" defines type="bool" but documentation does not
lib/ansible/modules/cloud/amazon/aws_s3_sync.py:0:0: E325 argument_spec for "mime_override" defines type="bool" but documentation does not
lib/ansible/modules/cloud/amazon/aws_s3_sync.py:0:0: E325 argument_spec for "mime_strict" defines type="bool" but documentation does not
lib/ansible/modules/cloud/amazon/aws_s3_sync.py:0:0: E325 argument_spec for "use_regex" defines type="bool" but documentation does not
lib/ansible/modules/cloud/amazon/aws_s3_sync.py:0:0: E325 argument_spec for "validate_certs" defines type="bool" but documentation does not
lib/ansible/modules/cloud/amazon/aws_s3_sync.py:0:0: E326 Value for "choices" from the argument_spec (['always', 'never', 'different', 'newer', 'larger']) for "overwrite" does not match the documentation ([])
lib/ansible/modules/cloud/amazon/aws_s3_sync.py:0:0: E326 Value for "choices" from the argument_spec (['private', 'public-read', 'public-read-write', 'authenticated-read', 'aws-exec-read', 'bucket-owner-read', 'bucket-owner-full-control']) for "permission" does not match the documentation ([])
lib/ansible/modules/cloud/amazon/aws_s3_sync.py:0:0: E326 Value for "choices" from the argument_spec (['push', 'pull']) for "direction" does not match the documentation ([])
lib/ansible/modules/cloud/amazon/aws_s3_sync.py:74:71: E302 DOCUMENTATION is not valid YAML

The test ansible-test sanity --test yamllint [explain] failed with 1 error:

lib/ansible/modules/cloud/amazon/aws_s3_sync.py:74:71: error DOCUMENTATION: syntax error: expected <block end>, but found '<scalar>'

click here for bot help

@flowerysong flowerysong force-pushed the flowerysong:aws_s3_sync branch 2 times, most recently Apr 29, 2018

@ansibot ansibot added needs_ci and removed needs_ci labels Apr 29, 2018

@flowerysong flowerysong force-pushed the flowerysong:aws_s3_sync branch 2 times, most recently Apr 30, 2018

@flowerysong flowerysong force-pushed the flowerysong:aws_s3_sync branch from 4063a86 to e397238 Oct 1, 2018

@ansibot ansibot added core_review and removed needs_revision labels Oct 1, 2018

@flowerysong flowerysong force-pushed the flowerysong:aws_s3_sync branch 2 times, most recently from 342f72b to f2dfcda Oct 1, 2018

@flowerysong flowerysong force-pushed the flowerysong:aws_s3_sync branch from f2dfcda to 01b1bba Oct 1, 2018

@ansibot ansibot added the stale_ci label Oct 10, 2018

@flowerysong flowerysong force-pushed the flowerysong:aws_s3_sync branch 3 times, most recently from d6a1aae to 582e12c Oct 13, 2018

@ansibot

This comment has been minimized.

Contributor

ansibot commented Oct 13, 2018

@ansibot ansibot removed the stale_ci label Oct 13, 2018

@flowerysong flowerysong force-pushed the flowerysong:aws_s3_sync branch from 582e12c to 798f876 Oct 13, 2018

flowerysong added some commits Sep 30, 2018

aws_s3: Improve ETag handling
* Extract ETag calculation into a utility function for reuse by
  aws_s3_sync.
* Reduce code duplication in put/get by restructuring the logic
* Only calculate ETag when overwrite == different
* Fail gracefully when overwrite == different and MD5 isn't available
  (e.g. due to FIPS-140-2).
New-ish module: aws_s3_sync
* interface changes to be more consistent with aws_s3 and find
* more efficient: fewer API calls, less memory usage, less CPU usage
* implements pull mode
* improved MIME type/encoding detection

@ansibot ansibot added the stale_ci label Oct 24, 2018

@flowerysong flowerysong force-pushed the flowerysong:aws_s3_sync branch from 798f876 to 507dece Oct 25, 2018

@ansibot ansibot removed the stale_ci label Oct 25, 2018

@ansibot ansibot added the stale_ci label Nov 5, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment