s3.multipart.MultiPartUpload.upload_part_from_file fails to return ETag of the uploaded part #1305

Closed
vitaly-krugl opened this Issue Feb 6, 2013 · 4 comments

Per the AWS docs, the application must collect the ETag value from each "upload part" request and submit those ETags with the "complete multipart upload" request, instead of relying on the "Multipart Upload Listings" request. For the boto API, this implies the following: 1. upload_part_from_file needs to return the ETag value, and 2. boto needs to provide a way to pass those ETags to a method that implements the "complete multipart upload" operation.

What I think the AWS docs are trying to say is that, because of S3's properties (e.g., eventual consistency), the "Multipart Upload Listings" request is not guaranteed to return all the parts as soon as the last part is uploaded, so using its results to build the "complete multipart upload" request might have an unpredictable outcome.

Here is the supporting information...

Per AWS docs at http://docs.aws.amazon.com/AmazonS3/latest/dev/mpuoverview.html:

From the "Parts Upload" section: "Whenever you upload a part, Amazon S3 returns an ETag header in its response. For each part upload, you must record the part number and the ETag value. You need to include these values in the subsequent request to complete the multipart upload."

and

From the "Multipart Upload Listings" section: "Note Only use the returned listing for verification. You should not use the result of this listing when sending a complete multipart upload request. Instead, maintain your own list of the part numbers you specified when uploading parts and the corresponding ETag values that Amazon S3 returns."
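To make the quoted guidance concrete, here is a minimal sketch of building the "complete multipart upload" request body from a caller-maintained record of part numbers and ETags, rather than from a parts listing. The function name build_complete_body and the dict-based record are assumptions made for illustration; they are not boto API.

```python
from xml.sax.saxutils import escape


def build_complete_body(parts):
    """Build a CompleteMultipartUpload XML body.

    `parts` is a hypothetical caller-maintained dict mapping
    part number -> ETag string exactly as returned by each
    "upload part" response (including the surrounding quotes).
    """
    lines = ["<CompleteMultipartUpload>"]
    # Parts are emitted in ascending part-number order.
    for part_number in sorted(parts):
        lines.append("  <Part>")
        lines.append("    <PartNumber>%d</PartNumber>" % part_number)
        lines.append("    <ETag>%s</ETag>" % escape(parts[part_number]))
        lines.append("  </Part>")
    lines.append("</CompleteMultipartUpload>")
    return "\n".join(lines)


# The record is filled in as each part upload returns, in any order.
recorded = {2: '"bbb"', 1: '"aaa"'}
body = build_complete_body(recorded)
```

The point of the sketch is only that the body is derived from values the application recorded itself, so it does not depend on what a listing request happens to return.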

And per http://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html for the "Upload Part" operation:

Sample Response
The response includes the ETag header. You need to retain this value for use when you send the Complete Multipart Upload request.

HTTP/1.1 200 OK
x-amz-id-2: Vvag1LuByRx9e6j5Onimru9pO4ZVKnJ2Qz7/C1NPcfTWAtRPfTaOFg==
x-amz-request-id: 656c76696e6727732072657175657374
Date:  Mon, 1 Nov 2010 20:34:56 GMT
ETag: "b54357faf0632cce46e942fa68356b38"
Content-Length: 0
Connection: keep-alive
Server: AmazonS3
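As an illustration of what "retain this value" means for the sample response above, the following sketch pulls the ETag header out of a raw header block. The helper name extract_etag is hypothetical, and a real client would normally read the header from its HTTP library's response object instead of parsing text.

```python
SAMPLE_RESPONSE = """HTTP/1.1 200 OK
x-amz-id-2: Vvag1LuByRx9e6j5Onimru9pO4ZVKnJ2Qz7/C1NPcfTWAtRPfTaOFg==
x-amz-request-id: 656c76696e6727732072657175657374
Date:  Mon, 1 Nov 2010 20:34:56 GMT
ETag: "b54357faf0632cce46e942fa68356b38"
Content-Length: 0
Connection: keep-alive
Server: AmazonS3"""


def extract_etag(raw_headers):
    """Return the ETag header value (quotes included), or None if absent."""
    # Skip the status line, then split each header at its first colon.
    for line in raw_headers.splitlines()[1:]:
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "etag":
            return value.strip()
    return None


etag = extract_etag(SAMPLE_RESPONSE)
```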
Owner

garnaat commented Feb 21, 2013

You are right. I propose modifying upload_part_from_file to return a tuple of the part number and the etag. Would that work for you? BTW, it occurs to me that we should probably provide a concurrent multipart uploader in boto. I know s3put has one but it seems like it would be handy as part of the library.

@ghost ghost assigned garnaat Feb 21, 2013

@garnaat: Also, the method that completes the multi-part upload will need to accept a list of partNumber/etag pairs. Would you like me to take a stab at those changes?

upload_part_from_file needs to return whatever info the corresponding S3 API returns. I don't like to use raw tuples for such things: having to remember the order of the fields in client code is error-prone, and it also makes it harder to add more properties later (e.g., partNumber, etag = upload_part_from_file(...) will break if you add another element to the tuple). I prefer either an explicit class instance, as you do for Keys, etc., or at minimum a namedtuple for its instance-member-like access syntax.
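A small sketch of the difference, using hypothetical type and field names (UploadedPart, part_number, etag, size) that are not part of boto: attribute access on a namedtuple keeps working when a field is added later, while positional unpacking into two names does not.

```python
from collections import namedtuple

# Hypothetical return type for an "upload part" call.
UploadedPart = namedtuple("UploadedPart", ["part_number", "etag"])

# A later revision that adds a third field.
UploadedPartV2 = namedtuple("UploadedPartV2", ["part_number", "etag", "size"])

old = UploadedPart(part_number=1, etag='"b54357faf0632cce46e942fa68356b38"')
new = UploadedPartV2(part_number=1, etag=old.etag, size=5242880)

# Attribute access is unaffected by the extra field.
same_etag = old.etag == new.etag

# Two-name positional unpacking breaks once a third field exists.
try:
    part_number, etag = new
    unpack_failed = False
except ValueError:
    unpack_failed = True
```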

POSIX threads (via Python's threading module) should be fine here for concurrency, since socket I/O spends a lot of time in C code (waiting for I/O to complete) with the GIL presumably released, and threads are lighter-weight than full-blown processes.
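A rough sketch of the thread-based approach, collecting part numbers and ETags as the parts complete. Here upload_part is a hypothetical stand-in that performs no network I/O and simply returns a quoted MD5 hex digest as a placeholder ETag; a real implementation would issue the "upload part" request and return the ETag from its response. The sketch uses concurrent.futures, which wraps the threading module.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def upload_part(part_number, data):
    """Hypothetical stand-in for one "upload part" request.

    Returns (part_number, etag). The quoted MD5 digest is only a
    placeholder for the ETag a real response would carry.
    """
    return part_number, '"%s"' % hashlib.md5(data).hexdigest()


def upload_parts_concurrently(chunks, max_workers=4):
    """Upload chunks on worker threads; return {part_number: etag}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(upload_part, number, chunk)
            for number, chunk in enumerate(chunks, start=1)
        ]
        # result() re-raises any exception from the worker thread.
        return dict(future.result() for future in futures)


parts = upload_parts_concurrently([b"first", b"second", b"third"])
```

The returned mapping is the caller-maintained record that would then be passed to whatever method completes the upload.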

Owner

garnaat commented Feb 22, 2013

I would love to have you take a stab at the changes. Thanks!

Ok, working on it

@danielgtaylor danielgtaylor added a commit that referenced this issue Sep 3, 2013

@danielgtaylor danielgtaylor Merge pull request #1675 from tpodowd/mpu_return_part_for_etag
Return the part/key so that the user can manage multipart uploads. Fixes #1675, #1305.
e9a2c59