Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"ExtraArgs" option is not available while uploading a file/folder to S3 via CloudPath #254

Closed
gaisensei opened this issue Aug 18, 2022 · 3 comments · Fixed by #307
Closed

Comments

@gaisensei
Copy link

If users want to add User defined metadata tags while uploading to S3 they should be able to do so, but currently can't find a way for this using CloudPath function upload_from(). It would be great to have this implementation and I can contribute to enabling this feature if needed.

@pjbull
Copy link
Member

pjbull commented Sep 2, 2022

Thanks for filing @gaisensei! @benbemoh has a sample implementation here which is relevant for the discussion.

I think we'd like to support passing the ExtraArgs, but I want to think a bit about the API design we want. Let's use this issue to explore our options.

A few open questions/thoughts on this topic:

  • upload_from and download_to are implemented generically (not per service), so we don't want to change the kwargs to be service specific
  • ExtraArgs can be specified per-operation, whereas implementations like Add boto3 extra_args field for upload operation #253 set it at the S3Client level. It may make sense to set defaults there, but we may also want to support passing these args through to functions explicitly.
  • If we support setting the default ExtraArgs in Client instantiation, is there a way to avoid having a set of long named kwargs like boto3_upload_extra_args for every potential boto3 function. Seems potentially painful to maintain and support if the internals of the clients get refactored
  • Do we want to support just ExtraArgs or other potential kwargs as well? I could see just generically collecting sdk_kwargs in upload_from and download_to
  • As pointed out in Add boto3 extra_args field for upload operation #253, not all the boto3 functions we use accept ExtraArgs in the same way.

@benbemoh
Copy link

Thanks for filing @gaisensei! @benbemoh has a sample implementation here which is relevant for the discussion.

I think we'd like to support passing the ExtraArgs, but I want to think a bit about the API design we want. Let's use this issue to explore our options.

A few open questions/thoughts on this topic:

  • upload_from and download_to are implemented generically (not per service), so we don't want to change the kwargs to be service specific
  • ExtraArgs can be specified per-operation, whereas implementations like Add boto3 extra_args field for upload operation #253 set it at the S3Client level. It may make sense to set defaults there, but we may also want to support passing these args through to functions explicitly.
  • If we support setting the default ExtraArgs in Client instantiation, is there a way to avoid having a set of long named kwargs like boto3_upload_extra_args for every potential boto3 function. Seems potentially painful to maintain and support if the internals of the clients get refactored
  • Do we want to support just ExtraArgs or other potential kwargs as well? I could see just generically collecting sdk_kwargs in upload_from and download_to
  • As pointed out in Add boto3 extra_args field for upload operation #253, not all the boto3 functions we use accept ExtraArgs in the same way.

IMHO, the issue with ExtraArgs in upload_from and download_to methods is that the extra args depends on the implementation: for instance, there exist many aws boto3 methods to upload a file into aws s3 (e.g. boto3.client('s3').upload_file(), boto3.client('s3').upload_fileobj, etc). So if you change the code implementation of upload_from and download_to methods in the future, these extra_args might require a change accordingly.
Besides, a similar context here has been addressed using ExtraArgs in the client instantiation: see boto3_transfer_config and content_type_method: I guess there should be a homogeneous implementation/way to deal with ExtraArgs.

@pjbull
Copy link
Member

pjbull commented Dec 30, 2022

@benbemoh @gaisensei Thanks for the helpful discussion here. We've added the ability to pass extra args to the client in #307.

See the updates to the documentation for how to pass the args to the client:
https://cloudpathlib.drivendata.org/stable/authentication/#other-s3-extraargs-in-boto3

You can test it in version 0.12.0, which is on PyPI now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants