This repository has been archived by the owner on Jan 19, 2024. It is now read-only.

WeTransfer/wt_s3_signer


wt_s3_signer

An optimized AWS S3 URL signer.

Basic usage

s3_bucket = Aws::S3::Bucket.new('shiny-bucket-name')
ttl_seconds = 7 * 24 * 60 * 60
full_s3_key = 'dir/testobject' # example key, matching the URL shown below

# We suggest caching the S3 client in the application to reuse the cached credentials
s3_client = Aws::S3::Client.new
signer = WT::S3Signer.for_s3_bucket(s3_bucket, client: s3_client, expires_in: ttl_seconds)
url_str = signer.presigned_get_url(object_key: full_s3_key)
#=> https://shiny-bucket-name.s3.eu-west-1.amazonaws.com/dir/testobject?X-Amz-Algorithm...

Why would you want to use it?

The use case is rapidly generating many presigned URLs for the same S3 bucket. The AWS SDK signs URLs correctly, but signing through it involves the following operations:

  • Credential refresh
  • Bucket region discovery (in which region does the bucket reside?)
  • Bucket endpoint discovery (which hostname should be used for the request?)
  • Cleanup of the various edge cases (blacklisted signed headers and so on)

This metadata needs to be retrieved only once as long as the bucket does not change, but the standard SDK may refresh it often. A substantial amount of generic code also runs during each SDK call even though it is not strictly necessary for signing.

Our signer bypasses this repeated work: it performs credential discovery and bucket metadata discovery only once, when you instantiate it. The primary usage pattern is as follows:

signer = WT::S3Signer.for_s3_bucket(my_bucket_resource)
signed_urls = all_object_keys.map do |obj_key|
  signer.presigned_get_url(object_key: obj_key)
end

This will stay performant even if signed_urls contains tens of thousands of entries.

Additionally, we cache all the produced strings very aggressively if they do not change between calls to the signing method. We also derive the signing key only once. This optimizes the signing even more.
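To make the caching idea concrete, here is a minimal sketch of SigV4 query-string presigning that builds every key-independent string once and derives the signing key once. It is not the gem's implementation: the SketchPresigner class, its constructor arguments and the dummy credentials are hypothetical, and it assumes static credentials (no session token), an explicitly supplied host, and a timestamp fixed when the object is created.

```ruby
require "openssl"
require "digest"

# Hypothetical illustration class; NOT the implementation used by wt_s3_signer.
class SketchPresigner
  NOT_UNRESERVED = /[^A-Za-z0-9\-_.~]/n

  def initialize(access_key_id:, secret_access_key:, region:, host:, expires_in:, now: Time.now.utc)
    @host = host
    @secret = secret_access_key
    @region = region
    @datestamp = now.strftime("%Y%m%d")
    timestamp = now.strftime("%Y%m%dT%H%M%SZ")
    scope = "#{@datestamp}/#{region}/s3/aws4_request"

    # Everything built here is identical for every object key, so it is built once.
    # The parameters are already in the sorted order the canonical request requires.
    @query = [
      ["X-Amz-Algorithm", "AWS4-HMAC-SHA256"],
      ["X-Amz-Credential", "#{access_key_id}/#{scope}"],
      ["X-Amz-Date", timestamp],
      ["X-Amz-Expires", expires_in.to_s],
      ["X-Amz-SignedHeaders", "host"]
    ].map { |name, value| "#{name}=#{escape(value)}" }.join("&")
    @canonical_suffix = "\n#{@query}\nhost:#{host}\n\nhost\nUNSIGNED-PAYLOAD"
    @string_to_sign_prefix = "AWS4-HMAC-SHA256\n#{timestamp}\n#{scope}\n"
  end

  # Per-key work: one path escape, one SHA-256 and one HMAC.
  def presigned_get_url(object_key:)
    path = "/" + escape(object_key).gsub("%2F", "/")
    canonical_request = "GET\n#{path}#{@canonical_suffix}"
    string_to_sign = @string_to_sign_prefix + Digest::SHA256.hexdigest(canonical_request)
    signature = OpenSSL::HMAC.hexdigest("sha256", signing_key, string_to_sign)
    "https://#{@host}#{path}?#{@query}&X-Amz-Signature=#{signature}"
  end

  private

  # The four chained HMACs of the SigV4 key derivation, memoized after the first call.
  def signing_key
    @signing_key ||= begin
      k_date = hmac("AWS4#{@secret}", @datestamp)
      k_region = hmac(k_date, @region)
      k_service = hmac(k_region, "s3")
      hmac(k_service, "aws4_request")
    end
  end

  def hmac(key, data)
    OpenSSL::HMAC.digest("sha256", key, data)
  end

  # Percent-encode every byte outside the RFC 3986 unreserved set.
  def escape(str)
    str.b.gsub(NOT_UNRESERVED) { |byte| format("%%%02X", byte.ord) }
  end
end

# Usage with dummy, non-functional credentials (assumed values for illustration only)
sketch = SketchPresigner.new(
  access_key_id: "AKIDEXAMPLE",
  secret_access_key: "not-a-real-secret",
  region: "eu-west-1",
  host: "shiny-bucket-name.s3.eu-west-1.amazonaws.com",
  expires_in: 3600
)
puts sketch.presigned_get_url(object_key: "dir/testobject")
```

In this sketch the only per-URL cost is escaping the key, hashing a short canonical request and computing a single HMAC, which is the kind of saving the paragraph above describes. Because the timestamp is frozen at construction, all URLs from one instance expire at the same moment; a production signer would need to handle clock movement and temporary credentials.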

Here are some benchmarks we ran for comparison. The S3Signer_SDK class executes the same flow through the SDK, instantiating a single Aws::S3::Presigner object and reusing it for every call.

Warming up --------------------------------------
WT::S3::Signer#presigned_get_url
                         9.325k i/100ms
S3Signer_SDK#presigned_get_url
                       154.000  i/100ms
Calculating -------------------------------------
WT::S3::Signer#presigned_get_url
                         81.422k (±18.9%) i/s -    391.650k in   5.042435s
S3Signer_SDK#presigned_get_url
                          1.865k (± 9.3%) i/s -      9.240k in   5.009593s

Comparison:
WT::S3::Signer#presigned_get_url:  81421.7 i/s
S3Signer_SDK#presigned_get_url:     1864.9 i/s - 43.66x  slower