Hi Evan, I reviewed your code - please take a look on GitHub. Thanks, Mike
|
Thanks Mike, I'll resolve that bug and run the tests. Thanks for the instructions on how to run the tests. Also can you elaborate on how you ran the resumable upload test in #2? |
I'm getting a strange error when running the ssl tests: "gaierror: [Errno -5] No address associated with hostname". Any idea what this is about? Also, a seemingly unrelated S3 test error: FAIL: test_1_versions (s3.test_versioning.S3VersionTest) Traceback (most recent call last): |
On the tests: One thing I suggest is to get a baseline: sync back to just before you made your changes and run the tests from that code snapshot. See if you get either/both of these failures. If you do, a couple thoughts about what might be causing them: |
Thanks for the tip. I have all the s3 and gs tests passing. The ssl tests
|
All tests are now passing. Would you please have another look? Thanks, |
One last request: can you please do a git rebase so that your comment shows up as the top entry in the log? Otherwise this commit will look like "Merge branch 'develop' of github.com:evanworley/boto into develop". |
Hi Evan, I added a couple more comments. All looks good now; main thing left at this Thanks again, Mike
|
This greatly reduces the wall clock time for an upload when uploading large files.

Situations in which the full MD5 is still calculated in its entirety:
- When resuming a partial upload
- When a name is not specified for the object, because GCS uses the MD5 as the name

Updating one test in ResumableUploadTests to change a byte that has already been uploaded, so a failure will occur. Otherwise the on-the-fly evaluation of the md5 will not cause an error, as it will be computed after mutation has occurred.
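For anyone following the thread, here is a minimal sketch of the idea in the commit message, not the actual boto change. It assumes a hypothetical `send_chunk` callable standing in for the network write, and the names `upload`, `full_md5`, and `start_offset` are made up for illustration. In the common case the MD5 is updated chunk by chunk as the data is sent, so the file is read once. When the object has no name, or the upload is being resumed, the sketch falls back to hashing the whole file in a separate pass first.

```python
import hashlib
import io


def full_md5(fp, chunk_size=8192):
    """Hash the whole stream in a separate pass, then rewind to the start."""
    fp.seek(0)
    digest = hashlib.md5()
    for chunk in iter(lambda: fp.read(chunk_size), b""):
        digest.update(chunk)
    fp.seek(0)
    return digest.hexdigest()


def upload(fp, send_chunk, name=None, start_offset=0, chunk_size=8192):
    """Send fp through send_chunk; return (object name, MD5 hex digest).

    send_chunk is a hypothetical callable standing in for the network write.
    fp is assumed to be positioned at the start of the data.
    """
    if name is None or start_offset > 0:
        # Either the digest is needed before sending (it becomes the name),
        # or part of the data was sent by an earlier attempt and was never
        # hashed here. Both cases need a full pass over the file up front.
        md5_hex = full_md5(fp, chunk_size)
        if name is None:
            name = md5_hex
        fp.seek(start_offset)
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            send_chunk(chunk)
        return name, md5_hex

    # Common case: a single pass, hashing each chunk as it is sent.
    digest = hashlib.md5()
    for chunk in iter(lambda: fp.read(chunk_size), b""):
        digest.update(chunk)
        send_chunk(chunk)
    return name, digest.hexdigest()
```

The single-pass branch is where the wall clock savings would come from: a large file is read from disk once instead of twice.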
Hi Mike, I just finished the rebase. Thanks for all the help and feedback. Evan |
I renamed this change to "Incremental MD5 for uploads" since it actually works for both GS and S3. |
Committed. Thanks for the work on this! Mike
|
Excellent, thanks for all the help/feedback, Mike. |
Hi Evan, One additional thought about the MD5 change you made: Your mod only causes Thanks, Mike
|
Hey Mike, This hadn't even occurred to me, as we use aria2c to download our files Cheers,
|
Thanks guys. I took a look at incremental MD5 computation for downloads, and the only thing needed was a simple check in gsutil. Please have a look at http://code.google.com/p/gsutil/issues/detail?id=55. Just like resumed uploads, resumed downloads still compute the full MD5 at the end. Both resumed uploads and resumed downloads could do incremental MD5 computation if they stored the partial checksum in the tracker, but this is a larger change and I wouldn't want to delay these simple fixes to support incremental MD5 computation for resumed uploads/downloads. Please let me know if you have any questions, or if my patch is missing anything. Evan |
Hi Evan, Something's not right with that patch - a number of gsutil tests fail with PYTHONPATH=./boto:$PYTHONPATH ./gslib/test_commands.py Thanks, Mike
|
Thanks for the quick feedback, Mike. I wasn't aware of these tests, and the mock key object doesn't have the md5 attribute. I updated the check to also check for the md5 attribute's existence. I've attached an updated diff to the gsutil issue, and it passes all the tests on my machine. Evan |
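A rough illustration of the kind of guard described here. This is not the gsutil patch itself, and `MockKey`, `KeyWithMd5`, and `md5_after_download` are hypothetical names. The point is only that a test double may lack the `md5` attribute entirely, so the check has to cover both "attribute missing" and "attribute present but unset" before skipping the full hash.

```python
import hashlib


class MockKey(object):
    """Hypothetical test double with no md5 attribute at all."""
    name = "obj"


class KeyWithMd5(object):
    """Hypothetical key whose md5 may have been filled in during transfer."""

    def __init__(self, md5):
        self.md5 = md5


def md5_after_download(key, data):
    """Reuse key.md5 when it exists and is set; otherwise hash data in full."""
    incremental = getattr(key, "md5", None)
    if incremental:
        return incremental
    return hashlib.md5(data).hexdigest()
```

Using `getattr` with a default covers the mock case without raising `AttributeError`, and the truthiness test covers a key whose `md5` is still `None`.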
Hmm, I don't see the patch, either in email or on GitHub. Can you please Thanks for the continued work on this. Mike
|
Incremental MD5 for GS
I've added support for computing incremental md5s for GS, when possible. There are a few situations that I identified where it wasn't possible, mentioned in the commit notes of the larger commit.
Please let me know if I missed anything. I ran a few manual tests, and everything seems to work as expected. It wasn't clear to me how to run the boto test suite, nor how to run gsutil with a tracker url.
Also, I noticed that the Name of the Key is set to the md5 when the name is missing, so I had to compute the MD5 ahead of time if the name is None.