Avoid another extra getObjectMetadata request #983

NikolayAtSony · 2017-01-11T19:24:43Z

If client already knows last modified time for S3 object,
there is no reason to pull object metadata from the S3 service
inside TransferManager just to retrieve this information again.

If client already knows last modified time for S3 object, there is no reason to pull object metadata from the S3 service inside TransferManager just to retrieve this information again.

NikolayAtSony · 2017-01-25T23:13:25Z

Will you consider merging it? If not, what concerns do you have?

varunnvs92 · 2017-02-01T04:01:31Z

aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/internal/DownloadImpl.java

@@ -77,7 +74,7 @@ public synchronized ObjectMetadata getObjectMetadata() {
        if (s3Object != null) {
            return s3Object.getObjectMetadata();
        }
-        return objectMetadata;
+        return null;


This will be a breaking change as customers who are getting object metadata before will get null now.

No, they wont. Null will only be returned to the customers who will be using the new API where object metadata is not retrieved before object is downloaded. After that even while using new API, object metadata will be populated properly. It's definitely not a breaking change.

No, the s3Object is only populated for serial downloads. For parallel downloads, the s3Object will always be null. That is the reason we store object metadata and return it for parallel downloads.

varunnvs92 · 2017-02-01T04:25:26Z

aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/model/GetObjectRequest.java

+     * @see GetObjectRequest#GetObjectRequest(String, String, String)
+     * @see GetObjectRequest#GetObjectRequest(String, String, boolean)
+     */
+    public GetObjectRequest(String bucketName, String key, Long lastModifiedTime) {


We model the request and response classes as per service team (Amazon S3 in this case) specifications. The members in GetObjectRequest are based on the S3 Get Object specification. We cannot add it to the request class if we are not sending it over wire.

Well, if you can't accept changes in GetObjectRequest, the other approach is to modify doDownload in TransferManager and pass this timestamp as a separate parameter instead of part of GetObjectRequest. Will you accept such approach?

We use object metadata in 3 different places during Download operation.

Get last modification time.

Get content length

In DownloadImpl task, return metadata when requested by customer. See the other comment.

It is highly unlikely that customers will provide 1 and 2 values. Even then, we need object metadata to use in (3) for parallel downloads. The only solution I see to avoid this call is providing the object metadata as a parameter to download method. This is a rare case as most customers want to download an object from bucket/key values and won't have object metadata. So if we expect them to provide it, they have to make an additional call to get object metadata before calling download method which is not great user experience.

Use case might looks like not being very likely, but it's legit use case and even AWS recommends to use external lookup tables for storing file data attributes outside of S3 (such as key, size, timestamp). Based on this and various other use cases, I can't see a reason why metadata needs to be pulled 100% of the time ahead of object itself. Yes, it might be needed sometimes, but it's not always needed and there is no way to avoid this extra roundtrip now (which btw was avoidable using earlier versions of AWS SDK, so it should be considered as a regression). Refer (for example to 1.9.25) to

aws-sdk-java/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/TransferManager.java

Line 747 in 5907106

private Download doDownload(final GetObjectRequest getObjectRequest,

where it was possible to actually use TransferManager without pulling metadata beforehand.

First, object metadata is available as soon as object download process is started as it's delivered as part of HEAD. For the purpose of pause/resume, there is no reason to pull the metadata beforehand as last modification time could be obtained from the object itself when pause action will be triggered because it's part of S3Object which is about to be put on pause (which might not even happened). If download not even started, it's not needed, right?

Content length could be already known to the client as part of listAvailableObjects() or by other measures and should be used instead of another round trip to the S3 (if available).

As for the point 3, it wasn't the case before (

aws-sdk-java/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/internal/DownloadImpl.java

Line 54 in 5907106

public ObjectMetadata getObjectMetadata() {

), so I fail to see why you think it's imperative that metadata is available in advance of S3Object download start (it was never promised). The whole point of introducing this separate instance of object metadata into DownloadImpl is to achieve the retrieval of last modified time for persisting the object (https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/internal/DownloadImpl.java#L193). Does it really necessary? If download wasn't started, persist it as 0, if started - pull from the S3Object metadata.

Yes, probably the current solution offered by me is a bit straightforward and nicer one is required, but let's decide how to fix it. I'm open to suggestions. It doesn't make sense to sequentially pull the metadata for each object just to know information we already know...

I understand your concerns. I agree that making HEAD request to get metadata for each call might be excessive. I am worried that if we remove object metadata from DownloadImpl, it might be a breaking change. I have to think some more on how to fix it in a clean way. I don't have bandwidth this week to work on this. I will look at this issue next week and get back to you. Thanks.

ejono · 2017-02-06T23:41:50Z

I agree that it doesn't make sense to add lastModifiedTime to GetObjectRequest. However, what about adding a new overloaded version of TransferManager.download() that takes an ObjectMetadata class? If specified and non-null, it can be used in place of having doDownload() make its own GetObjectMetadataRequest. A similar approach can be used in TransferManager.copy() in order to resolve #988.

NikolayAtSony · 2017-02-07T00:09:46Z

The suggestion about passing ObjectMetadata to download() is not very appealing as it's not that the real object metadata is present (with all headers, etc) - it's only range & timestamp that is relevant for TransferManager here. So, aside from range & last modified date there is nothing to fill into it and it seems wrong to partially initialize the object when the whole purpose now for pulling it is to obtain range & timestamp. Range could be in GetObjectRequest and it leaves only timestamp as a potential initialized object inside this ObjectMetadata instance.

Could it be better to just pass this timestamp to the overloaded version of download() and tweak doDownload to only do the pull of metadata if both range & timestamp are not present? If yes, I could create a new pull request for it.

ejono · 2017-02-07T00:16:10Z

My suggestion was coming more from the perspective of fixing it this way in TransferManager.copy() (where you very well might already have a full ObjectMetadata instance that you could then pass to TransferManager.copy()) and then also fixing it similarly here in TransferManager.download(). You could conceivably already have a full ObjectMetadata object in this case too, right? I guess that's not quite what you had in mind though, which is why you are suggesting passing just the lastModifiedTime. Could you please describe a little more about how you might already have this lastModifiedTime but not a full ObjectMetadata object? (I'm not saying there's no case but am just curious what your exact use case is.)

ejono · 2017-02-07T00:20:45Z

Oops, nvm, I just realized that you already explained this above.

millems · 2017-11-10T23:29:07Z

Sorry about the long delay on this one. Did we still want to iterate on this? We're mostly code frozen for transfer manager on 1.11.x at this point, other than for fixing bugs.

We'll make a note to integrate this feedback when we implement 2.0.x's transfer manager: aws/aws-sdk-java-v2#37. Feel free to add any other transfer manager-specific feature requests on that issue as well.

NikolayAtSony · 2017-11-13T17:44:36Z

I think we can close this one and implement this as part of v2 instead. What is the estimated ETA for v2 to be in production status?

millems · 2017-11-13T20:39:24Z

Thanks!

We haven't published a date we'll GA for V2. Once we figure it out, we'll give an announcement on a blog post.

Avoid another extra getObjectMetadata request

4021c5d

If client already knows last modified time for S3 object, there is no reason to pull object metadata from the S3 service inside TransferManager just to retrieve this information again.

shorea requested review from zhangzhx and removed request for zhangzhx January 25, 2017 15:42

shorea assigned zhangzhx Jan 25, 2017

NikolayAtSony mentioned this pull request Jan 31, 2017

TransferManager.copy() blocks on getObjectMetadata request. #988

Closed

varunnvs92 reviewed Feb 1, 2017

View reviewed changes

shorea assigned shorea and unassigned zhangzhx May 15, 2017

millems closed this Nov 13, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid another extra getObjectMetadata request #983

Avoid another extra getObjectMetadata request #983

NikolayAtSony commented Jan 11, 2017

NikolayAtSony commented Jan 25, 2017

varunnvs92 Feb 1, 2017

NikolayAtSony Feb 1, 2017

varunnvs92 Feb 1, 2017

varunnvs92 Feb 1, 2017

NikolayAtSony Feb 1, 2017 •

edited

varunnvs92 Feb 1, 2017

NikolayAtSony Feb 1, 2017

varunnvs92 Feb 3, 2017

ejono commented Feb 6, 2017

NikolayAtSony commented Feb 7, 2017

ejono commented Feb 7, 2017

ejono commented Feb 7, 2017

millems commented Nov 10, 2017

NikolayAtSony commented Nov 13, 2017

millems commented Nov 13, 2017

Avoid another extra getObjectMetadata request #983

Avoid another extra getObjectMetadata request #983

Conversation

NikolayAtSony commented Jan 11, 2017

NikolayAtSony commented Jan 25, 2017

varunnvs92 Feb 1, 2017

Choose a reason for hiding this comment

NikolayAtSony Feb 1, 2017

Choose a reason for hiding this comment

varunnvs92 Feb 1, 2017

Choose a reason for hiding this comment

varunnvs92 Feb 1, 2017

Choose a reason for hiding this comment

NikolayAtSony Feb 1, 2017 • edited

Choose a reason for hiding this comment

varunnvs92 Feb 1, 2017

Choose a reason for hiding this comment

NikolayAtSony Feb 1, 2017

Choose a reason for hiding this comment

varunnvs92 Feb 3, 2017

Choose a reason for hiding this comment

ejono commented Feb 6, 2017

NikolayAtSony commented Feb 7, 2017

ejono commented Feb 7, 2017

ejono commented Feb 7, 2017

millems commented Nov 10, 2017

NikolayAtSony commented Nov 13, 2017

millems commented Nov 13, 2017

NikolayAtSony Feb 1, 2017 •

edited