Add support for copy on multipart_upload with auto-detection on s3_object copy methods #218

Merged
merged 3 commits into aws:master on May 14, 2013
2 participants
LeeGraber (Contributor) commented Apr 3, 2013

As per http://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPartCopy.html, this adds an explicit API on the multipart_upload object (copy_part), as well as auto-detection and handling of files > 5GB for the copy method on an s3_object.
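
Roughly, the usage being added looks like the sketch below. The method and option names follow this pull request's description; the exact signatures (in particular the arguments to copy_part) are illustrative and may differ from the merged code.

```ruby
require 'aws-sdk'

s3  = AWS::S3.new
src = s3.buckets['my-bucket'].objects['big-object']       # possibly larger than 5GB
dst = s3.buckets['my-bucket'].objects['big-object-copy']

# copy on an s3_object: a source larger than 5GB is detected and the copy
# switches to a multipart copy automatically
dst.copy_from(src)

# explicit part-by-part copy on a multipart_upload (argument names illustrative)
upload = dst.multipart_upload
upload.copy_part(src.key, :part_number => 1,
                 :copy_source_range => 'bytes=0-5242879')
upload.complete(:remote_parts)
```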

LeeGraber referenced this pull request Apr 3, 2013: Multipart copy support #187 (closed)

trevorrowe (Member) commented Apr 4, 2013

Looks like a very good first start.

There are only a few changes I would make to the AWS::S3::Client operation you added. We are looking forward to a v2 release of the SDK where AWS::S3::Client will use an API configuration just like the other services. As part of this transition, the S3::Client operations will only accept the params defined directly in the S3 API documentation.

It looks like the current implementation accepts the following options; only two would need to be removed:

  • :bucket_name
  • :key
  • :upload_id
  • :part_number
  • :copy_source
  • :copy_source_range
  • :version_id
  • :first_byte (not in the S3 API)
  • :last_byte (not in the S3 API)

Though not necessary, there are a handful of other options we could add to this method for completeness:

  • :copy_source_if_match
  • :copy_source_if_modified_since
  • :copy_source_if_none_match
  • :copy_source_if_unmodified_since

In addition to that, the method should be renamed to #upload_part_copy to match the naming pattern. Please feel free to alias this to a more natural name. Using the same name as the API documentation aids in discoverability.

I'm still going through the code. I'll leave some comments and questions inline there.
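
For reference, a client-level call along these lines might look like the following sketch. The method name #upload_part_copy and the option names follow this comment rather than a released API, and the bucket/key names are placeholders.

```ruby
require 'aws-sdk'

s3     = AWS::S3.new
dst    = s3.buckets['target-bucket'].objects['target-key']
upload = dst.multipart_upload                    # open a multipart upload

s3.client.upload_part_copy(
  :bucket_name       => 'target-bucket',
  :key               => 'target-key',
  :upload_id         => upload.id,
  :part_number       => 1,
  :copy_source       => 'source-bucket/source-key',
  :copy_source_range => 'bytes=0-5242879')       # inclusive byte range

upload.complete(:remote_parts)
```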

trevorrowe commented on lib/aws/s3/s3_object.rb in f81d2f5, Apr 4, 2013

What about setting threshold to options[:multipart_copy_threshold] || config.s3_multipart_copy_threshold instead of the min of those two?

trevorrowe replied Apr 4, 2013

Also, you don't need the assignment statement in this method (looks like it was probably extracted from elsewhere and got left behind).

LeeGraber replied Apr 4, 2013 (comment minimized)

trevorrowe replied Apr 4, 2013

My original thought was that the code currently prevents me from passing in a runtime option with a value larger than the configured value (e.g. the configured value is 2GB but I really want to use 5GB).

I'm now thinking this maybe shouldn't be an issue at all. If the "threshold" is removed completely then this becomes a non-issue. The threshold exists to switch AWS::S3::S3Object#copy_from between a vanilla #copy_object and a multipart copy, and that switch requires a head request to determine the object's current size. I would prefer to see the head call removed, which makes the threshold irrelevant (see my other comments).

LeeGraber replied Apr 4, 2013 (comment minimized)

trevorrowe commented on lib/aws/s3/client.rb in f81d2f5, Apr 4, 2013

You can omit this process_response block and instead pass the name of an XML parser object to the object_method call on line 1358. You can see examples of this in other methods (like get_bucket_website, which uses XML::GetBucketWebsite). If you open up the aws/s3/client/xml.rb file you should find other XML parsers with a simple grammar for parsing the XML.

trevorrowe commented on lib/aws/s3/s3_object.rb in f81d2f5, Apr 4, 2013

I'm a little concerned about the need here to head the object to get the content length in order to auto-switch from copy object to multipart copy. What about instead accepting a boolean like :use_multipart_copy?

LeeGraber replied Apr 4, 2013 (comment minimized)

trevorrowe replied Apr 4, 2013

My primary issue is that the most common scenario (copying an object smaller than 5GB) would now pay a two-operation cost. I suspect (I could be wrong) that most users know when their object exceeds (or is likely to exceed) 5GB.

How about this process:

  • accept a boolean option that forces a multipart copy
  • in the absence of the :use_multipart_copy option, default to a vanilla copy object
  • rescue runtime errors from S3 that indicate the file was too large, fall back to multipart copy, and log a warning

This provides the benefit of a single-operation copy for the default case, and the ability to force a multipart copy (without needing to determine file size). Thoughts?
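
A rough sketch of that proposed flow (not the merged implementation; the error class, the message check, and the copy_opts / multipart_copy_from helpers are assumptions for illustration):

```ruby
def copy_from(source, options = {})
  # copy_opts and multipart_copy_from are hypothetical helpers
  return multipart_copy_from(source, options) if options[:use_multipart_copy]

  begin
    client.copy_object(copy_opts(source, options))
  rescue AWS::S3::Errors::InvalidRequest => e
    # S3 rejects a plain copy_object when the source is larger than 5GB
    raise unless e.message =~ /larger than the maximum allowable size/
    config.logger.warn('source exceeds 5GB, falling back to multipart copy') if config.logger
    multipart_copy_from(source, options)
  end
end
```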

LeeGraber replied Apr 4, 2013 (comment minimized)

LeeGraber (Contributor) commented Apr 12, 2013

Trevor, I made changes per our discussions, tested the calls, and adjusted my code to be responsible for the head call. I did re-upload a 5GB file (mine was deleted), but I have not added the internal rescue and retry. Please take a look and let me know if we are on the same page.

trevorrowe (Member) commented Apr 12, 2013

That's looking pretty good!

The SDK already deals with failed requests and retries (by default up to 3 times per request) before raising the error. Have you experienced errors that propagated out during a multipart copy?

LeeGraber (Contributor) commented Apr 12, 2013

No problems experienced by me. It is working fine.

The only thing I haven't done, which you recommended, is handle the case where the user doesn't specify :use_multipart_copy and the file is > 5GB: AWS throws an exception there, and I am not catching it and switching to multipart automagically. The caller would have to catch the exception. It is a nice feature to have ... I just didn't implement it. :) Right now the code in my application simply always checks the source length (we don't do copies often enough for me to worry about the perf).
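
The caller-side check described here looks roughly like the sketch below (the :use_multipart_copy option name is the one discussed in this thread; content_length issues the head request):

```ruby
require 'aws-sdk'

FIVE_GB = 5 * 1024 * 1024 * 1024

s3  = AWS::S3.new
src = s3.buckets['my-bucket'].objects['source-key']
dst = s3.buckets['my-bucket'].objects['dest-key']

# content_length makes a HEAD request; only force the multipart path
# when the source is too large for a plain copy
dst.copy_from(src, :use_multipart_copy => src.content_length > FIVE_GB)
```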

Let me know what else you need before you are ready to run it through whatever you do before accepting the push.

Thanks
Lee

trevorrowe (Member) commented Apr 12, 2013

Ideally we should create an integration test. As part of the test we could upload a ~10 MB file and then perform a copy using the multipart copy. Once we have that test then I am happy to merge.
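
A minimal sketch of the kind of test described here (the bucket name is a placeholder, credentials for a real S3 account are assumed, and the test actually added in the merge may look quite different):

```ruby
require 'aws-sdk'
require 'securerandom'

s3     = AWS::S3.new
bucket = s3.buckets['aws-sdk-multipart-copy-test']          # placeholder bucket
src    = bucket.objects["mp-copy-src-#{SecureRandom.hex(4)}"]
dst    = bucket.objects["mp-copy-dst-#{SecureRandom.hex(4)}"]

src.write('x' * (10 * 1024 * 1024))                         # ~10 MB source object
dst.copy_from(src, :use_multipart_copy => true)             # force the multipart path

raise 'copy failed' unless dst.exists? &&
                           dst.content_length == src.content_length
```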

LeeGraber (Contributor) commented Apr 12, 2013

Okay ... I will take a look at your test code. :)

trevorrowe (Member) commented Apr 12, 2013

If you have questions on how to add a cucumber feature, let me know.

trevorrowe merged commit 70d011c into aws:master on May 14, 2013

1 check passed: default (The Travis build passed)
trevorrowe (Member) commented May 14, 2013

I merged your pull request locally so I could add an integration test (93300dc). Your commits should now be on master. Thanks for the excellent work!

LeeGraber (Contributor) commented May 14, 2013

Thanks Trevor. We look forward to switching back to the master branch :)
