
Important Usage Notes

Richard Collette edited this page May 26, 2021 · 3 revisions

About s3FileCacheHoursAgo

When s3FileCacheHoursAgo is enabled, the uploader records a small footprint of each uploaded file in localStorage.awsUploads. Before a file is uploaded, this cache is queried by a key consisting of the file's name, size, MIME type and modification timestamp. The uploader then verifies that the partSize used when uploading matches the partSize currently in use. To prevent false positives, it also calculates the MD5 digest of the first part for final verification. If you specify onlyRetryForSameFileName, a further check confirms that the specified destination file name matches the destination file name used previously.
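The lookup described above can be sketched as follows. This is illustrative only, not Evaporate's exact internal key format; `cacheKey` and `findCachedUpload` are hypothetical names.

```javascript
// Illustrative sketch of the cache lookup (not Evaporate's exact internals).
// The key combines the file's name, size, MIME type and last-modified stamp.
function cacheKey(file) {
  return [file.name, file.size, file.type, file.lastModified].join('-');
}

function findCachedUpload(store, file, partSize) {
  const entry = store[cacheKey(file)];
  // A hit only counts if the cached partSize matches the current one;
  // otherwise the part boundaries (and their MD5 digests) would not line up.
  if (entry && entry.partSize === partSize) {
    return entry;
  }
  return null;
}

// Example usage, with a plain object standing in for localStorage.awsUploads:
const store = {
  'video.mp4-1048576-video/mp4-1621987200000': { partSize: 6291456, uploadId: 'abc123' }
};
const file = { name: 'video.mp4', size: 1048576, type: 'video/mp4', lastModified: 1621987200000 };
findCachedUpload(store, file, 6291456); // matches the cached entry
findCachedUpload(store, file, 5242880); // different partSize → null
```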

If the uploaded file has an unfinished S3 multipart upload ID associated with it, then the uploader queries S3 for the parts that have been uploaded. It then uploads only the unfinished parts.

If the uploaded file has no open multipart upload, then the ETag recorded the last time the file was uploaded to S3 is compared to the ETag of the object currently in S3. If the two ETags match, the file is not uploaded again.

The timestamp of the last time the part was uploaded is compared against a Date calculated as s3FileCacheHoursAgo hours in the past, as a way to gauge 'freshness'. If the last upload is older than the specified number of hours, the file is uploaded again.
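The freshness test reduces to a single cutoff comparison; a sketch (`isFresh` is an illustrative name, not an Evaporate function):

```javascript
// A cached upload older than s3FileCacheHoursAgo hours is considered stale,
// and the file is uploaded again from scratch.
function isFresh(lastUploadedAtMs, s3FileCacheHoursAgo, nowMs = Date.now()) {
  const cutoff = nowMs - s3FileCacheHoursAgo * 60 * 60 * 1000;
  return lastUploadedAtMs >= cutoff;
}

// A part uploaded 2 hours ago, with a 24-hour window:
isFresh(Date.now() - 2 * 60 * 60 * 1000, 24); // → true
```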

It is still possible to have different files with the same name, size and timestamp. In this case, Evaporate calculates the checksum for the first part and compares that to the checksum of the first part of the file to be uploaded. If they differ, the file is uploaded anew.

Note that in order to determine whether the uploaded file is the same as a local file, the uploader issues a HEAD request to S3. The AWS S3 permissions that allow HEAD also allow GET (get object), so your signing URL algorithm may want to decline to sign GET requests. It goes without saying that your AWS IAM credentials and secrets should be protected and never shared.

About AWS Signature Version 4

You can use AWS Signature Version 4. The signerUrl endpoint must return a valid V4 signature. This version of Evaporate sends the part payload as UNSIGNED-PAYLOAD because MD5 checksum calculation is enabled.

Be sure to configure Evaporate with aws_key, awsRegion and cryptoHexEncodedHash256 when enabling Version 4 signatures.

See AWS Signature Version 4 for more information.

AWS S3 Cleanup and Housekeeping

After you initiate a multipart upload and upload one or more parts, you must either complete or abort the multipart upload in order to stop being charged for storage of the uploaded parts. Only after you complete or abort the multipart upload does Amazon S3 free up the parts storage and stop charging you for it. Refer to the AWS Multipart Upload Overview for more information.

The sample S3 bucket policy shown above should configure your S3 bucket to allow cleanup of orphaned multipart uploads, but the cleanup task itself is not part of Evaporate. A separate tool or task needs to query for orphaned multipart uploads and abort them using an appropriate heuristic.
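Such a task would pair S3's ListMultipartUploads and AbortMultipartUpload calls with an age heuristic; the heuristic itself is just a date filter. A sketch of the filtering step, with the SDK calls elided (`orphanedUploads` is an illustrative name):

```javascript
// Pick out multipart uploads that have been open longer than maxAgeDays.
// `uploads` has the shape S3's ListMultipartUploads API returns:
// [{ Key, UploadId, Initiated }, ...]
function orphanedUploads(uploads, maxAgeDays, now = new Date()) {
  const cutoff = now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000;
  return uploads.filter((u) => new Date(u.Initiated).getTime() < cutoff);
}

// Each result would then be passed to S3's AbortMultipartUpload
// with its Key and UploadId.
```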

Refer to this functioning Ruby on Rails rake task for ideas.

As of March 2016, AWS supports cleaning up multipart uploads via S3 Lifecycle Management, in which rules can be added to delete expired and incomplete multipart uploads. For more information, refer to S3 Lifecycle Management Update – Support for Multipart Uploads and Delete Markers.
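Such a rule can be expressed as a lifecycle configuration. This fragment matches the shape S3's PutBucketLifecycleConfiguration API expects; the 7-day threshold is an example, not a recommendation:

```javascript
// Lifecycle rule aborting incomplete multipart uploads after 7 days.
const lifecycle = {
  Rules: [
    {
      ID: 'abort-incomplete-multipart-uploads',
      Status: 'Enabled',
      Filter: {},
      AbortIncompleteMultipartUpload: { DaysAfterInitiation: 7 }
    }
  ]
};

// e.g. with the AWS SDK for JavaScript:
// s3.putBucketLifecycleConfiguration(
//   { Bucket: 'your-bucket', LifecycleConfiguration: lifecycle }, callback);
```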