Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TransferManager.copy() blocks on getObjectMetadata request. #988

Closed
mark-vieira opened this issue Jan 19, 2017 · 8 comments
Closed

TransferManager.copy() blocks on getObjectMetadata request. #988

mark-vieira opened this issue Jan 19, 2017 · 8 comments
Assignees
Labels
feature-request A feature should be added or improved.

Comments

@mark-vieira
Copy link

Calls to TransferManager.copy() aren't really truly async since the method blocks while fetching object metadata for the source object.

ObjectMetadata metadata = srcS3.getObjectMetadata(getObjectMetadataRequest);

Ideally, this request would also be done asynchronously, as creating a large number of copy operations is bottlenecked by how quickly we can fetch object metadata.

@varunnvs92
Copy link
Contributor

Ideally getObjectMetadata operation is pretty quick as it is a HEAD request. Even other operations download and upload get the metadata synchronously.

Is it causing performance issues for you? Can you provide me data for these performance bottlenecks?

@mark-vieira
Copy link
Author

The behavior I'm seeing is for small transfers (think lots of individual HTML files) the metadata request and copy basically take the same amount of time. The queuing up of transfers then gets limited by how fast we can fetch metadata. Effectively, what I'm seeing as that we aren't taking full advantage of TransferManager's executor pool.

@varunnvs92
Copy link
Contributor

varunnvs92 commented Jan 25, 2017

We use content length from object metadata to update TransferProgress and in CopyCallable. So we need to wait for the getObjectMetadata operation to finish before starting the copy in CopyCallable. If we use separate thread for it, the copy thread is still blocking. I don't think there is any big advantage for customers by changing the current behavior.

If you like, send us a PR and we are happy to take a look.

@varunnvs92 varunnvs92 added the feature-request A feature should be added or improved. label Jan 27, 2017
@NikolayAtSony
Copy link
Contributor

If size is known, the similar approach could be used for copy as well: #983

@ejono
Copy link

ejono commented Feb 1, 2017

On a related note, I just noticed that this GetObjectMetadata request is made for the source bucket/key but does not include the source versionId (if present). So if you are copying a non-current version of an object, the latest object version's metadata is incorrectly used in the transfer progress calculations and also in determining whether or not to use multipart copy.

I couldn't find another open issue for this, so I opened #1009

@jakeab
Copy link

jakeab commented Mar 18, 2017

I am running into this now. My workflow is currently:

  1. pull metadata from existing object ((HEAD request) to attach, along with additional metadata, to the copied objects
  2. copy the object to one or more places in parallel using transfer manager.

These are small objects, so the synchronous overhead is not ideal, especially to get data possibly many times that is already known. I understand wanting the transfer manager API to stay simple. I notice some of the methods have a callback interface to get metadata, and I wonder if something like that might make sense.

@billoneil
Copy link

billoneil commented Nov 28, 2018

Another issue with this call not being async is if the object doesn't exist the getObjectMetadata throws a 404 immediately. Since the javadoc says it is async I wouldn't expect this exception to be throw until waitForCopyResult is called. This makes the caller handle exceptions in multiple places.

@debora-ito
Copy link
Member

Hey @mark-vieira @NikolayAtSony @ejono @jakeab @billoneil

the SDK team has reviewed the feature request list for V1, and since they're concentrating efforts on V2 new features they decided to not implement this one in V1. It's still being considered for the TransferManager refactor in V2, see the referenced issue above. I'll go ahead and close this one.

Please feel free to comment on the V2 issue with your use case, and reach out if you have further questions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request A feature should be added or improved.
Projects
None yet
Development

No branches or pull requests

7 participants