Document difference between S3 object copy vs copy_from vs copy_object #3051

Open
mdavis-xyz opened this issue Oct 21, 2021 · 9 comments
Labels
documentation This is a problem with documentation. feature-request This issue requests a feature. p3 This is a minor priority issue resources s3

Comments

@mdavis-xyz
Contributor

s3.Object has methods copy and copy_from.

Based on the name, I assumed that copy_from would copy from some other key into the key (and bucket) of this s3.Object. I therefore assumed that the other copy function would do the opposite, i.e. copy from this s3.Object to another object. Or maybe the two are the other way around.

But after reading the docs for both, it looks like they both do the same thing. They both copy from another object into this object. Is that correct? What's the point of having two functions that copy in the same direction?

What I want is to copy the existing s3.Object into a different path. I don't want to have to manually instantiate a second s3.Object instance in python, and then pass the bucket and key manually from the first.

i.e. what's the easiest way to copy s3://bucketA/pathA.txt to s3://bucketB/pathB.txt, if I already have s3.Object('bucketA','pathA.txt')?

@mdavis-xyz mdavis-xyz added guidance Question that needs advice or information. needs-triage This issue or PR still needs to be triaged. labels Oct 21, 2021
@stobrien89 stobrien89 added the documentation This is a problem with documentation. label Oct 21, 2021
@stobrien89 stobrien89 self-assigned this Oct 21, 2021
@stobrien89 stobrien89 added investigating This issue is being investigated and/or work is in progress to resolve the issue. needs-review and removed needs-triage This issue or PR still needs to be triaged. investigating This issue is being investigated and/or work is in progress to resolve the issue. labels Oct 21, 2021
@stobrien89
Contributor

Hi @mdavis-xyz,

That's a good point. Both copy and copy_from seem to use CopyObject under the hood. I'm not seeing any discernible differences aside from the fact that they accept arguments in different formats. I'll double-check with the team to clarify.

The easiest way to copy s3://bucketA/pathA.txt to s3://bucketB/pathB.txt would be to access the meta client and use the s3Transfer copy method:
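To make the "different formats" point concrete, here is a minimal sketch of the two call shapes. Both are invoked on the destination s3.Object; the helper names `copy_into` and `copy_from_into` are made up for illustration and are not part of boto3.

```python
def copy_into(dest_obj, src_bucket, src_key):
    """Object.copy: the source is passed as a positional CopySource dict."""
    dest_obj.copy({"Bucket": src_bucket, "Key": src_key})


def copy_from_into(dest_obj, src_bucket, src_key):
    """Object.copy_from: the source is passed as the CopySource keyword."""
    return dest_obj.copy_from(CopySource={"Bucket": src_bucket, "Key": src_key})


# Usage (needs AWS credentials; bucket and key names are placeholders):
#   import boto3
#   dest = boto3.resource("s3").Object("bucketB", "pathB.txt")
#   copy_into(dest, "bucketA", "pathA.txt")
#   copy_from_into(dest, "bucketA", "pathA.txt")
```

In both cases the object you call the method on is the destination, which matches the reading of the docs in the original question.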

import boto3

s3 = boto3.resource('s3')

bucket = s3.Bucket('sourcebucketname')
obj = bucket.Object('sourceobject')

# Source is a dict; destination bucket and key are plain strings.
s3.meta.client.copy({"Bucket": bucket.name, "Key": obj.key}, 'destinationbucket', 'key')

Hope this helps!

@mdavis-xyz
Contributor Author

Can we add a new copy method to s3.Object? One that copies from this object to another?

It seems silly to bother having high-level resources, but then to copy you have to extract the low level client from the service resource or object resource, and then extract not one but two identifiers from the high level resource to pass to the low level call, in a way that is inconsistent with the way that the destination object is passed to the call. This is quite clunky and verbose.

We should be able to do:

obj.copy_to(destinationKey=key, destinationBucket=bucket_name)

but default the destination bucket to the source bucket if omitted:

obj.copy_to(destinationKey=key)

And also:

bucket.copy(sourceKey=key1, destinationKey=key2) # copy within bucket
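Until something like this exists, the requested behaviour can be approximated with a small standalone function. This is only a sketch: `copy_to` is a hypothetical helper, not a boto3 method, and it assumes `source_obj` is an s3.Object (using its `bucket_name`, `key` and `meta.client` attributes).

```python
def copy_to(source_obj, destination_key, destination_bucket=None):
    """Copy source_obj to destination_key (hypothetical helper, not boto3 API).

    The destination bucket defaults to the source object's bucket.
    """
    if destination_bucket is None:
        destination_bucket = source_obj.bucket_name
    copy_source = {"Bucket": source_obj.bucket_name, "Key": source_obj.key}
    # Managed transfer on the underlying low-level client.
    source_obj.meta.client.copy(copy_source, destination_bucket, destination_key)
    return destination_bucket, destination_key


# Usage (needs AWS credentials; names are placeholders):
#   import boto3
#   obj = boto3.resource("s3").Object("bucketA", "pathA.txt")
#   copy_to(obj, "pathB.txt", destination_bucket="bucketB")
#   copy_to(obj, "pathA-copy.txt")  # stays within bucketA
```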

@mdavis-xyz
Contributor Author

The documentation for the low level copy is also a bit confusing.

The S3 client has copy and copy_object. What's the difference?

And why do they use:

s3 = boto3.resource('s3')
s3.meta.client.copy(...)

Instead of

boto3.client('s3').copy()

?

@stobrien89 stobrien89 added feature-request This issue requests a feature. resources s3 and removed needs-review guidance Question that needs advice or information. labels Oct 26, 2021
@stobrien89
Contributor

stobrien89 commented Oct 26, 2021

Hi @mdavis-xyz,

I was able to confirm with the team that the resource .copy action is basically just the s3 transfer copy method I mentioned to you in my last comment, but the action is also somewhat verbose and clunky to use because the resource you perform the action on is actually passed in as the destination for the copy. I don't think we'd add another copy method, but I definitely think we could improve the way the existing copy action is used.

For the low-level copy itself, the docs describe it as "a managed transfer which will perform a multipart copy in multiple threads if necessary." It is a customization built on s3Transfer, which is why you need to access the meta client to use it. copy_object is the official s3 API operation, which isn't the most intuitive to use. The s3 transfer methods (and similarly sync, cp, etc. in the CLI) are there to make usage of some of the s3 APIs a bit easier.

@mdavis-xyz
Contributor Author

Hmm, I'm still not understanding the difference.

How is boto3.resource('s3').meta.client different to boto3.client('s3')? Aren't they identical?

Are you saying that the difference is that copy does multi-threaded multi-part copy if necessary, and copy_from does a single-threaded single-part copy?

@stobrien89
Contributor

stobrien89 commented Nov 2, 2021

Hi @mdavis-xyz,

I thought initially this was a special case where the meta client was needed and that's why it was documented, but that doesn't appear to be the case. It seems to work fine on a standard client as well. And yes, the meta client is just a way to access a service's client from a resource instantiation.

Correct, copy_from is basically S3's copy_object, which is single-threaded, and copy is the multi-threaded, multi-part copy from s3Transfer.
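The distinction can be sketched at the client level as follows. The wrapper names are made up for illustration; the calls inside them are the client's `copy` (managed transfer) and `copy_object` (a single CopyObject request). As far as I know, a single CopyObject request is limited to objects of up to 5 GB, so larger objects need the managed transfer or an explicit multipart copy.

```python
def managed_copy(client, src_bucket, src_key, dst_bucket, dst_key, config=None):
    """Managed transfer: may split into a multipart copy across threads."""
    source = {"Bucket": src_bucket, "Key": src_key}
    # config is an optional boto3.s3.transfer.TransferConfig.
    client.copy(source, dst_bucket, dst_key, Config=config)


def single_request_copy(client, src_bucket, src_key, dst_bucket, dst_key):
    """One CopyObject API call: no threads, no multipart."""
    source = {"Bucket": src_bucket, "Key": src_key}
    return client.copy_object(CopySource=source, Bucket=dst_bucket, Key=dst_key)


# Usage (needs AWS credentials; names and values are placeholders):
#   import boto3
#   from boto3.s3.transfer import TransferConfig
#   client = boto3.client("s3")
#   cfg = TransferConfig(multipart_threshold=64 * 1024 * 1024, max_concurrency=4)
#   managed_copy(client, "bucketA", "pathA.txt", "bucketB", "pathB.txt", config=cfg)
#   single_request_copy(client, "bucketA", "pathA.txt", "bucketB", "pathB.txt")
```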

@stobrien89 stobrien89 removed their assignment Jan 7, 2022
@tim-finnigan tim-finnigan changed the title Document difference between S3 object copy vs copy_from Document difference between S3 object copy vs copy_from vs copy_object Aug 5, 2022
@tim-finnigan tim-finnigan changed the title Document difference between S3 object copy vs copy_from vs copy_object Document difference between S3 object copy vs eiifccugvrnu`hjruceefecbgdgiriicirktbgerrhhdncopy_from vs copy_object Aug 5, 2022
@tim-finnigan tim-finnigan changed the title Document difference between S3 object copy vs eiifccugvrnu`hjruceefecbgdgiriicirktbgerrhhdncopy_from vs copy_object Document difference between S3 object copy vs copy_from vs copy_object Aug 5, 2022
@aBurmeseDev aBurmeseDev added the p2 This is a standard priority issue label Nov 11, 2022
@tim-finnigan tim-finnigan added p3 This is a minor priority issue and removed p2 This is a standard priority issue labels Nov 21, 2022
@odigity

odigity commented Feb 13, 2023

I can't seem to get either to preserve my metadata. (Specifically, LastModified.)

@ghomem

ghomem commented Nov 6, 2023

I too was confused between copy() and copy_object(). I thought the obvious one to use was copy() and started using that one, only to realize that copy_object() is much faster, at least for small files (the situation I tested). In my experience copy_object() is 50% faster for a single process, and if you use multiple parallel processes the effect is larger.
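For anyone who wants to check this on their own data, a rough timing harness might look like the sketch below. `time_copies` is a hypothetical helper; it just times any callable over a list of (source key, destination key) pairs, and the results will depend on object size, region and network.

```python
import time


def time_copies(copy_fn, key_pairs):
    """Return the seconds taken to call copy_fn(src_key, dst_key) for each pair."""
    start = time.perf_counter()
    for src_key, dst_key in key_pairs:
        copy_fn(src_key, dst_key)
    return time.perf_counter() - start


# Usage (needs AWS credentials; bucket and key names are placeholders):
#   import boto3
#   client = boto3.client("s3")
#   pairs = [("small/%d.txt" % i, "copied/%d.txt" % i) for i in range(100)]
#   t_object = time_copies(
#       lambda s, d: client.copy_object(
#           CopySource={"Bucket": "bucketA", "Key": s}, Bucket="bucketA", Key=d),
#       pairs)
#   t_managed = time_copies(
#       lambda s, d: client.copy({"Bucket": "bucketA", "Key": s}, "bucketA", d),
#       pairs)
#   print(t_object, t_managed)
```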

@zhiweio

zhiweio commented Nov 15, 2023

Which one supports copying 'LastModified'?
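As far as I understand, neither does: LastModified is set by S3 when the destination object is written and cannot be supplied by the caller. One possible workaround, sketched below under that assumption, is to record the source timestamp in user metadata during the copy. The helper name and the `source-last-modified` metadata key are made up for illustration. Note that MetadataDirective="REPLACE" replaces the existing metadata, so the sketch carries over the source's user metadata and content type explicitly; other headers (for example cache control) would need the same treatment.

```python
def copy_recording_source_timestamp(client, src_bucket, src_key, dst_bucket, dst_key):
    """Copy an object and store the source's LastModified as user metadata."""
    head = client.head_object(Bucket=src_bucket, Key=src_key)
    # Start from the source's user metadata, since REPLACE would drop it.
    metadata = dict(head.get("Metadata", {}))
    metadata["source-last-modified"] = head["LastModified"].isoformat()
    extra = {}
    if "ContentType" in head:
        extra["ContentType"] = head["ContentType"]
    client.copy_object(
        CopySource={"Bucket": src_bucket, "Key": src_key},
        Bucket=dst_bucket,
        Key=dst_key,
        Metadata=metadata,
        MetadataDirective="REPLACE",
        **extra,
    )
    return metadata


# Usage (needs AWS credentials; names are placeholders):
#   import boto3
#   client = boto3.client("s3")
#   copy_recording_source_timestamp(client, "bucketA", "pathA.txt", "bucketB", "pathB.txt")
```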

7 participants