using Boto3 in s3.py #21529
Conversation
CI failure due to PEP 8 issue:
This means you've resolved the outstanding PEP 8 issues in your PR and the file no longer needs to be listed in the legacy files list. Just remove it from that file as part of your PR. You can run the PEP 8 tests locally with make pep8.
endpoint=walrus,
**aws_connect_kwargs
)
CI failure due to PEP 8 issue:
2017-02-27 20:21:42 ERROR: PEP 8: lib/ansible/modules/cloud/amazon/s3.py:807:1: W293 blank line contains whitespace (current)
The PEP 8 tests can be run locally with make pep8.
try:
    import boto
Since this module no longer depends on boto, you should update the unit tests to reflect that. You'll also want to remove boto from the unit test requirements.
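For illustration only, a minimal sketch of a unit test that exercises key-listing logic against a mocked boto3 client; the list_bucket_keys helper and the test layout are assumptions, not the PR's actual tests:

import unittest
from unittest.mock import MagicMock

def list_bucket_keys(s3, bucket):
    # stand-in for the module's boto3-based key listing
    paginator = s3.get_paginator('list_objects_v2')
    return [data['Key'] for page in paginator.paginate(Bucket=bucket) for data in page.get('Contents', [])]

class TestListBucketKeys(unittest.TestCase):
    def test_collects_keys_across_pages(self):
        # mock the boto3 client so no AWS credentials or network access are needed
        s3 = MagicMock()
        s3.get_paginator.return_value.paginate.return_value = [
            {'Contents': [{'Key': 'a'}, {'Key': 'b'}]},
            {'Contents': [{'Key': 'c'}]},
        ]
        self.assertEqual(list_bucket_keys(s3, 'test-bucket'), ['a', 'b', 'c'])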
@mattclay Fixed!
Force-pushed from ad226ab to 89facbc
def list_keys(module, s3, bucket, prefix, marker, max_keys):
    paginator = s3.get_paginator('list_objects')
Good catch!!
def list_keys(module, s3, bucket, prefix, marker, max_keys):
    paginator = s3.get_paginator('list_objects')
    all_keys = [page for page in paginator.paginate(Bucket=bucket)][0].get('Contents', [])
These two lines can be simplified (untested, but should work):
keys = [data['Key'] for page in paginator.paginate(Bucket=bucket) for data in page.get('Contents', [])]
As @ryansb appeared to mention in a comment I can't currently see, this seems only to get the first page.
Edit: I can see the comment in the conversation tab, but not in the code tab.
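To make the first-page problem concrete, a short illustration (not from the PR itself; variable names assumed):

# Indexing [0] keeps only the first page the paginator returns:
first_page_contents = [page for page in paginator.paginate(Bucket=bucket)][0].get('Contents', [])

# Iterating over every page collects the keys for the whole bucket:
all_keys = [data['Key'] for page in paginator.paginate(Bucket=bucket) for data in page.get('Contents', [])]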
module.fail_json(msg=str(e), exception=traceback.format_exc())
try:
    bucket.delete()
    paginator = s3.get_paginator('list_objects')
Comments as for list_keys
make s3 pep8 and remove from legacy files
fix s3 unit tests
…umentation fix incorrectly documented defaults
Fix logic and use head_object instead of get_object for efficiency. Fix typo in unit test.
Fix incorrect conditional. Remove redundant variable assignment. Fix s3 list_object pagination to return all pages
ready_for_review
This can be merged without further changes, but it might be worth considering retries at the very least.
all_keys = bucket_object.get_all_keys(prefix=prefix, marker=marker, max_keys=max_keys)

def paginated_list(s3, bucket):
    pg = s3.get_paginator('list_objects_v2')
    for page in pg.paginate(Bucket=bucket):
You can do return pg.paginate(Bucket=bucket).build_full_result()
Not a blocker though
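For reference, an untested sketch of what that suggestion would look like (the function shape is assumed from the diff above):

def paginated_list(s3, bucket):
    # build_full_result() walks every page and merges them into a single response dict
    pg = s3.get_paginator('list_objects_v2')
    result = pg.paginate(Bucket=bucket).build_full_result()
    return [obj['Key'] for obj in result.get('Contents', [])]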
Do we need an AWSRetry wrapper around this method?
I think I'm going to keep the pagination here as-is so I don't have to filter the results in both places where I'm calling this function (since all I want are the key names). Good idea about AWSRetry! I'm making a follow-up PR for that since I don't want to overload this one.
def list_keys(module, s3, bucket, prefix, marker, max_keys):
    keys = [key for key in paginated_list(s3, bucket)]
Should this have some exception handling? (I suggest here rather than paginated_list as paginated_list might not be able to handle exceptions if it does the retry)
Totally missed that - good catch.
I also remembered to allow marker/prefix/max_keys to modify which keys are listed.
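As an illustration of that, a hedged sketch (not the PR's actual implementation; the parameter handling is assumed) of feeding those options into the list_objects_v2 paginator:

def paginated_keys(s3, bucket, prefix=None, marker=None, max_keys=None):
    # Only pass the filters the caller actually set; StartAfter plays the role of marker for list_objects_v2
    params = {'Bucket': bucket}
    if prefix:
        params['Prefix'] = prefix
    if marker:
        params['StartAfter'] = marker
    pagination_config = {'MaxItems': max_keys} if max_keys else {}
    pg = s3.get_paginator('list_objects_v2')
    for page in pg.paginate(PaginationConfig=pagination_config, **params):
        for obj in page.get('Contents', []):
            yield obj['Key']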
try:
    bucket.delete()
    # if there are contents then we need to delete them before we can delete the bucket
    keys = [{'Key': key} for key in paginated_list(s3, **{'Bucket': bucket})]
**{'Bucket': bucket} is equivalent to Bucket=bucket. Please use the latter :)
Fixed :)
module.exit_json(msg="LIST operation complete", s3_keys=keys)

def list_keys(module, s3, bucket, prefix, marker, max_keys):
This function seems to be much more complicated than it needs to be. Does anything call this function with non-trivial values for prefix, marker or max_keys? (I'm guessing previously the function called itself to get the next page.)
I would argue for using a paginator with build_full_result in list_keys_with_backoff, and then the calling functions (delete_keys etc.) can just use that directly rather than having to manage the page combination themselves.
Hrm... I'm not sure if I understand what I should be changing.
The user can specify whatever they want for marker, prefix or max_keys, right? I'm not sure what a non-trivial value would be. Previously this function did not call itself to get the next page - pagination wasn't supported at all. So it's true this function has become more complicated. If there's a more elegant way, though, I'd definitely like to understand it.
If I use build_full_result() then I'll have to iterate through that anyway and pull out all the keys, so it doesn't seem very different from what I'm doing now. I will implement that if you have a strong preference, but I'm not sure what the benefit is.
I'm a little confused by the comment about delete_keys. list_keys() and delete_bucket() are calling a function that does the pagination. Is the issue that I'm only getting all the keys rather than all the contents?
By non-trivial I just mean values that aren't None or empty strings. I'm not sure how much user control we expect over those settings, but I might not have read the parameters carefully enough.
The following untested pseudocode illustrates the simpler approach:
@AWSRetry.backoff(**backoff_params)
def list_keys_with_backoff(connection, bucket):
    pg = connection.get_paginator('list_objects_v2')
    # build_full_result() merges every page, so the keys come back in one list
    return [obj['Key'] for obj in pg.paginate(Bucket=bucket).build_full_result().get('Contents', [])]

def list_keys(connection, bucket):
    try:
        return list_keys_with_backoff(connection, bucket)
    except botocore.exceptions.ClientError as e:
        etc...
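To make the second point concrete, a hedged sketch (helper and names assumed, not the PR's code) of a caller consuming that list directly when emptying a bucket before deletion:

def delete_bucket_and_contents(connection, bucket):
    # empty the bucket first; delete_objects accepts at most 1000 keys per call, so chunk the list
    keys = [{'Key': key} for key in list_keys_with_backoff(connection, bucket)]
    for i in range(0, len(keys), 1000):
        connection.delete_objects(Bucket=bucket, Delete={'Objects': keys[i:i + 1000]})
    connection.delete_bucket(Bucket=bucket)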
ISSUE TYPE
COMPONENT NAME
lib/ansible/modules/cloud/amazon/s3.py
ANSIBLE VERSION
SUMMARY
Updating S3 since all new AWS module pull requests are expected to use boto3. This will also fix signature version bugs (e.g. #21200).
Closes #23757
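For context on the signature version point, a minimal sketch (an assumption for illustration, not part of this module) of how boto3 lets you pin Signature Version 4, which some regions require:

import boto3
from botocore.client import Config

# Regions such as eu-central-1 only accept Signature Version 4; boto3 can pin it explicitly
s3 = boto3.client('s3', region_name='eu-central-1', config=Config(signature_version='s3v4'))
print([b['Name'] for b in s3.list_buckets()['Buckets']])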