Fix Content-MD5 header for s3transfer #2221
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Overview
This is a continuation of #1985 to resolve #1979. We're adding two additional tests to prevent future regression with empty bodies, and adding some more info on the underlying issue.
Bug
While the initial PR was addressing a lack of
Content-MD5
headers being added for empty files going to s3, the bug is little more subtle. Currently, using the low level API forPutBucket
will ensure all input, regardless of length, is transformed into a file-like object. When this is passed to our MD5 checksum function, it will always beTrue
and get the header added correctly. These always evaluate toTrue
becausebool
resolves "truthiness" to be if the file-object exists or not.The bug enters when the higher level s3 APIs are being invoked like
s3 sync
in the AWS CLI. This replaces our standard file-like wrappers (StringIO, BytesIO) with s3transfer'sReadFileChunk
. Due to s3transfer implementing__len__
on what would otherwise appear to be a file-like object, we now get different criteria for what makes it truthy.bool
will choose__len__
> 0 as being more specific than if the file exists. That broke our previous assumptions on what would be coming through toconditionally_calculate_md5
.Outcome
We probably should make
ReadFileChunk
adhere closer to the file-like object spec, but at this point it would be a backwards incompatible change. I think the original proposal from @wimglenn is the safest bet to make things operate as expected.