
using Boto3 in s3.py #21529

Merged
merged 14 commits into from
Aug 11, 2017
Conversation

s-hertel
Contributor

@s-hertel s-hertel commented Feb 16, 2017

ISSUE TYPE
  • Feature Pull Request
COMPONENT NAME

lib/ansible/modules/cloud/amazon/s3.py

ANSIBLE VERSION
ansible 2.3.0 (boto3-s3 13631c256d) last updated 2017/02/16 13:36:21 (GMT -400)
  config file =
  configured module search path = Default w/o overrides
SUMMARY

Updating S3 since all new AWS module pull requests are expected to use boto3. This will also fix signature version bugs (e.g. #21200).

Closes #23757

@ansibot ansibot added WIP This issue/PR is a work in progress. Nevertheless it was shared for getting input from peers. affects_2.3 This issue/PR affects Ansible v2.3 aws cloud committer_review In order to be merged, this PR must follow the certified review workflow. feature_pull_request module This issue/PR relates to a module. needs_triage Needs a first human triage before being processed. labels Feb 16, 2017
@s-hertel s-hertel changed the title [WIP] using Boto3 with s3.py [WIP] using Boto3 in s3.py Feb 16, 2017
@ryansb ryansb removed the needs_triage Needs a first human triage before being processed. label Feb 16, 2017
@s-hertel s-hertel changed the title [WIP] using Boto3 in s3.py using Boto3 in s3.py Feb 23, 2017
@s-hertel s-hertel removed the WIP This issue/PR is a work in progress. Nevertheless it was shared for getting input from peers. label Feb 23, 2017
@ansibot ansibot added needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. and removed committer_review In order to be merged, this PR must follow the certified review workflow. labels Feb 24, 2017
@mattclay
Member

CI failure due to PEP 8 issue:

2017-02-24 22:01:00 ERROR: PEP 8: lib/ansible/modules/cloud/amazon/s3.py: Passes current rule set. Remove from legacy list (test/sanity/pep8/legacy-files.txt).

This means you've resolved the outstanding PEP 8 issues in your PR and the file no longer needs to be listed in the legacy files list. Just remove it from that file as part of your PR.

You can run PEP 8 tests locally with make pep8.

@mattclay mattclay added the ci_verified Changes made in this PR are causing tests to fail. label Feb 25, 2017
@ansibot ansibot added committer_review In order to be merged, this PR must follow the certified review workflow. needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. and removed ci_verified Changes made in this PR are causing tests to fail. needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. committer_review In order to be merged, this PR must follow the certified review workflow. labels Feb 27, 2017
endpoint=walrus
**aws_connect_kwargs
)

Member

CI failure due to PEP 8 issue:

2017-02-27 20:21:42 ERROR: PEP 8: lib/ansible/modules/cloud/amazon/s3.py:807:1: W293 blank line contains whitespace (current)

The PEP 8 tests can be run locally with make pep8.


try:
import boto
Member

Since this module no longer depends on boto, you should update the unit tests to reflect that.

You'll also want to remove boto from the unit test requirements.

Contributor Author

@s-hertel s-hertel Feb 28, 2017


@mattclay Fixed!

@mattclay mattclay added the ci_verified Changes made in this PR are causing tests to fail. label Feb 28, 2017
@ansibot ansibot added needs_ci This PR requires CI testing to be performed. Please close and re-open this PR to trigger CI. and removed ci_verified Changes made in this PR are causing tests to fail. labels Feb 28, 2017
@s-hertel s-hertel force-pushed the boto3-s3 branch 3 times, most recently from ad226ab to 89facbc Compare February 28, 2017 18:35

def list_keys(module, s3, bucket, prefix, marker, max_keys):
paginator = s3.get_paginator('list_objects')

Contributor Author

Good catch!!


def list_keys(module, s3, bucket, prefix, marker, max_keys):
paginator = s3.get_paginator('list_objects')
all_keys = [page for page in paginator.paginate(Bucket=bucket)][0].get('Contents', [])
Contributor

@willthames willthames Aug 7, 2017


These two lines can be simplified (untested, but should work):

keys = [data['Key'] for page in paginator.paginate(Bucket=bucket) for data in page.get('Contents', [])]

As @ryansb appeared to mention in a comment I can't currently see, this seems only to get the first page.

Edit: I can see the comment in the conversation tab, but not in the code tab.
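The difference between the suggested one-liner and the original first-page-only code can be illustrated without AWS access by substituting stub pages for the paginator output (the page dicts below are hypothetical, shaped like the pages boto3's `list_objects` paginator yields):

```python
# Stub pages shaped like boto3 list_objects pages; 'Contents' holds the object metadata.
pages = [
    {'Contents': [{'Key': 'a.txt'}, {'Key': 'b.txt'}]},
    {'Contents': [{'Key': 'c.txt'}]},
    {},  # a page with no 'Contents' key (e.g. an empty final page)
]

# The one-liner from the review: flatten every page into a single list of key names.
keys = [data['Key'] for page in pages for data in page.get('Contents', [])]

# Contrast with the original code, which only ever looked at the first page.
first_page_only = [p for p in pages][0].get('Contents', [])
```

With the stub input, `keys` covers all three objects while `first_page_only` silently drops `c.txt`.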

module.fail_json(msg=str(e), exception=traceback.format_exc())
try:
bucket.delete()
paginator = s3.get_paginator('list_objects')
Contributor

Comments as for list_keys

Fix incorrect conditional.

Remove redundant variable assignment.

Fix s3 list_object pagination to return all pages
@s-hertel
Contributor Author

s-hertel commented Aug 7, 2017

ready_for_review
Thanks for all the feedback! Pagination is now working correctly and the integration tests are passing locally.

Contributor

@willthames willthames left a comment


This can be merged without further changes, but it might be worth considering retries at the very least.

all_keys = bucket_object.get_all_keys(prefix=prefix, marker=marker, max_keys=max_keys)
def paginated_list(s3, bucket):
pg = s3.get_paginator('list_objects_v2')
for page in pg.paginate(Bucket=bucket):
Contributor

You can do return pg.paginate(Bucket=bucket).build_full_result()

Not a blocker though
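For context, `build_full_result()` merges the per-page responses into a single dict, so all keys end up under one `'Contents'` entry. A rough pure-Python imitation (the stub below is illustrative only, not botocore's actual implementation):

```python
def build_full_result(pages):
    # Imitate the relevant part of botocore's PageIterator.build_full_result():
    # concatenate the list-valued 'Contents' entries from every page into one dict.
    result = {'Contents': []}
    for page in pages:
        result['Contents'].extend(page.get('Contents', []))
    return result

# Stub pages shaped like list_objects_v2 responses.
pages = [{'Contents': [{'Key': 'a'}]}, {'Contents': [{'Key': 'b'}]}]
full = build_full_result(pages)
```

The caller then iterates one merged list instead of managing the page loop itself.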

Contributor

Do we need an AWSRetry wrapper around this method?

Contributor Author

@s-hertel s-hertel Aug 8, 2017


I think I'm going to keep the pagination here as-is so I don't have to filter the results in both places where I'm calling this function (since all I want are the key names). Good idea about AWSRetry! I'm making a follow-up PR for that since I don't want to overload this one.


def list_keys(module, s3, bucket, prefix, marker, max_keys):
keys = [key for key in paginated_list(s3, bucket)]
Contributor

Should this have some exception handling? (I suggest here rather than paginated_list as paginated_list might not be able to handle exceptions if it does the retry)

Contributor Author

Totally missed that - good catch.

Also remembered to allow marker/prefix/max_keys to modify what keys are listed
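The marker/prefix/max_keys semantics that commit restores can be sketched in plain Python (a simplified client-side model of how S3 applies these parameters; the helper name is illustrative, not the module's actual code):

```python
def filter_keys(keys, prefix=None, marker=None, max_keys=None):
    # prefix: keep only keys that start with the given prefix.
    if prefix:
        keys = [k for k in keys if k.startswith(prefix)]
    # marker: keep only keys strictly after the marker in lexicographic order.
    if marker:
        keys = [k for k in keys if k > marker]
    # max_keys: cap the number of results returned.
    if max_keys:
        keys = keys[:max_keys]
    return keys

keys = ['logs/1', 'logs/2', 'logs/3', 'tmp/x']
```

For example, `filter_keys(keys, prefix='logs/', max_keys=2)` keeps only the first two `logs/` objects.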
try:
bucket.delete()
# if there are contents then we need to delete them before we can delete the bucket
keys = [{'Key': key} for key in paginated_list(s3, **{'Bucket': bucket})]
Contributor

**{'Bucket': bucket} is equivalent to Bucket=bucket. Please use the latter :)
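The two spellings are identical at the call site; `**{'Bucket': bucket}` just unpacks a dict into keyword arguments, which a trivial function makes easy to see:

```python
def call(**kwargs):
    # Capture whatever keyword arguments the caller passed.
    return kwargs

bucket = 'my-bucket'

# These two calls are equivalent; the plain keyword form is simply easier to read.
a = call(Bucket=bucket)
b = call(**{'Bucket': bucket})
```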

Contributor Author

Fixed :)


module.exit_json(msg="LIST operation complete", s3_keys=keys)
def list_keys(module, s3, bucket, prefix, marker, max_keys):
Contributor

This function seems to be much more complicated than it needs to be. Does anything call this function with non-trivial values for prefix, marker or max_keys? (I'm guessing previously the function called itself to get the next page).

I would argue for using paginator with build_full_result in list_keys_with_backoff and then the calling functions (delete_keys etc.) can just use that directly rather than having to manage the page combination themselves.

Contributor Author

@s-hertel s-hertel Aug 11, 2017


Hrm... I'm not sure if I understand what I should be changing.

The user can specify whatever they want for marker or prefix or max_keys, right? I'm not sure what a non-trivial value would be. Previously this function did not call itself to get the next page - pagination wasn't supported at all. So it's true this function has become more complicated. If there's a more elegant way though I'd definitely like to understand.

If I use build_full_result() then I'll have to iterate through that anyway and pull out all the keys so it doesn't seem very different than what I'm doing now. I will implement that if you have a strong preference, but I'm not sure what the benefit is.

I'm a little confused about the comment about delete_keys. list_keys() and delete_bucket() are calling a function that does the pagination. Is the issue that I'm only getting all the keys rather than all the contents?

Contributor

By non-trivial I just mean values that aren't None or empty strings. I'm not sure how much user control we expect over those settings but I might not have read the parameters carefully enough.

The following untested, somewhat pseudocode illustrates the simpler approach (note `list_objects_v2` results land under `'Contents'`, and Ansible's retry helper is applied as a decorator factory):

@AWSRetry.backoff(**backoff_params)
def list_keys_with_backoff(connection, bucket):
    pg = connection.get_paginator('list_objects_v2')
    return [obj['Key'] for obj in pg.paginate(Bucket=bucket).build_full_result().get('Contents', [])]

def list_keys(connection, bucket):
    try:
        return list_keys_with_backoff(connection, bucket)
    except botocore.exceptions.ClientError as e:
        etc...
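The AWSRetry idea discussed above can be illustrated with a generic backoff decorator (a simplified stand-in for Ansible's `AWSRetry`, not its actual implementation; there is no sleep between attempts here only to keep the example fast):

```python
import functools

def retry(retries=3, exceptions=(Exception,)):
    """Retry the wrapped function up to `retries` extra times on the given exceptions."""
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            for attempt in range(retries + 1):
                try:
                    return f(*args, **kwargs)
                except exceptions:
                    if attempt == retries:
                        raise  # out of attempts: surface the last error
        return wrapper
    return decorator

calls = {'n': 0}

@retry(retries=3, exceptions=(RuntimeError,))
def flaky():
    # Simulate a transiently throttled AWS call that succeeds on the third try.
    calls['n'] += 1
    if calls['n'] < 3:
        raise RuntimeError('throttled')
    return 'ok'
```

A real implementation would also sleep with exponential backoff between attempts and match only retryable error codes.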

@ryansb ryansb merged commit 1de91a9 into ansible:devel Aug 11, 2017
@s-hertel s-hertel deleted the boto3-s3 branch September 28, 2017 19:16
@ansibot ansibot added feature This issue/PR relates to a feature request. and removed feature_pull_request labels Mar 4, 2018
@dagwieers dagwieers added the fortios Fortios community label Feb 22, 2019
@ansible ansible locked and limited conversation to collaborators Apr 26, 2019
Successfully merging this pull request may close these issues.

Ansible 2.3 - S3 module - Bucket not found
10 participants