Sync files with the same size but not content #29
This is done by comparing an MD5 sum of the local file with the ETag from S3.
The ETag is the MD5 of the entire file unless the object was created by a
multipart upload.
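As a rough illustration of that comparison (not the code in this commit), here is a minimal Python sketch. The helper names `local_md5`, `is_multipart_etag` and `content_matches` are hypothetical. It assumes the ETag arrives as a string, possibly wrapped in double quotes, and that a multipart ETag can be recognized by the `-<part count>` suffix S3 appends.

```python
import hashlib


def local_md5(path, chunk_size=8192):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_multipart_etag(etag):
    """Multipart ETags look like '<hex>-<part count>', so check for a dash."""
    return "-" in etag.strip('"')


def content_matches(path, etag):
    """Compare a local file with an S3 ETag.

    Returns True or False for a plain MD5 ETag, and None when the ETag
    comes from a multipart upload and so cannot be compared this way.
    """
    if is_multipart_etag(etag):
        return None
    return local_md5(path) == etag.strip('"').lower()
```

Returning `None` for multipart ETags keeps "cannot tell" separate from "different", so a caller can fall back to the size-only check in that case.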
My assumption is that the file size comparison was used to avoid downloading or
opening large files. This commit only runs the MD5 comparison for "small files"
(< 50 kilobytes), for two reasons: it avoids the cost of calculating an MD5 sum
for very large files, and it avoids having to deal with multipart uploads. It
seems reasonable to assume that a file that changes without its size changing
is likely to be small. For example, my use case is syncing a REVISION file that
contains the git revision as a SHA-1. This file is always 41 bytes, but changes
frequently.
The "small file" threshold is 50 kilobytes for now, but could easily be changed.
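The decision described above could be sketched roughly as follows. This is an illustrative Python outline, not the implementation in this commit: `needs_sync` and `SMALL_FILE_LIMIT` are hypothetical names, and it assumes that 50 kilobytes means 50 * 1024 bytes and that the remote size and ETag have already been fetched from S3.

```python
import hashlib
import os

# Assumption: "50 kilobytes" is taken as 50 * 1024 bytes.
SMALL_FILE_LIMIT = 50 * 1024


def needs_sync(path, remote_size, remote_etag, small_file_limit=SMALL_FILE_LIMIT):
    """Decide whether the local file differs from its S3 counterpart."""
    local_size = os.path.getsize(path)

    # A size mismatch is enough to know the file changed.
    if local_size != remote_size:
        return True

    # Same size and not small: skip hashing and treat the file as unchanged.
    if local_size >= small_file_limit:
        return False

    # Same size and small: compare the MD5 sum with the ETag.
    with open(path, "rb") as handle:
        local_digest = hashlib.md5(handle.read()).hexdigest()
    return local_digest != remote_etag.strip('"').lower()
```

Because only files under the limit are hashed, reading the whole file into memory is cheap, and an object that small is unlikely to carry a multipart ETag. Passing a different `small_file_limit` is all it takes to change the threshold.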