-
Notifications
You must be signed in to change notification settings - Fork 877
Direct s3 download via redirect #293
Direct s3 download via redirect #293
Conversation
Thanks for the contribution! Please make sure the tests pass, both unit tests and pep8. You can run them locally using |
Note, I changed |
Great feature. I actually just went scouring through the existing code looking for this, and I'm very happy to see it in a PR. At |
# offload a lot of expensive I/O and get faster I/O | ||
content_redirect_url = store.content_redirect_url(path) | ||
if content_redirect_url: | ||
return flask.redirect(content_redirect_url, 301) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 302 would be more broadly useful. I'm thinking especially about a backend that generates signed URLs that expire, which is something S3 can actually do. 301 means that future requests should use the content_redirect_url
instead of the url to this view. However, I think most use cases would benefit from having future requests for this resource still come to this URL.
Generally speaking, if we want to
- change the redirect's destination URL from time to time (in order to do load balancing, signed URLs, or any other purpose)
- authorize access
- log access
then 302 is the way to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good, catch we should definitely be using 302
. I'll get right on this.
I suspect this wont merge automatically anymore... I don't mind unbitrotting it, if there is an interest in merging this. Note, I'm using a variation of this on EC2 with S3 in the same region, my instances pull with about 10 MB/s. |
|
||
# If store allows us to just redirect the client let's do that, we'll | ||
# offload a lot of expensive I/O and get faster I/O | ||
content_redirect_url = store.content_redirect_url(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of just checking if the storage driver supports this, I would also enable a config directive ("storage_redirect is True/False").
Yes there is interest to merge this. I left a comment. Sorry for the lag in PR reviewing... Left a comment inline to make this configurable to not break the current behavior. |
Thanks for the feedback I'm all pro configurability...
|
I rebased and squashed all changes into one commit... Added the key Note, regarding region specification, the importance of this is greatly understated in boto documentation. Please let me know if there is any other details I should address... I would like to tweak multipart upload buffer size, but that I'll leave that to another PR :) |
LGTM cc @shin- |
LGTM Thank you for the contribution! |
Allow to configure a s3 region and enable redirecting for faster and cheaper images downloads. Further details: docker-archive/docker-registry#293 Fixes deis#746
Allow to configure a s3 region and enable redirecting for faster and cheaper image downloads. Further details: docker-archive/docker-registry#293 Fixes deis#746
This adds a method to the storage strategy allowing it to provide a URL to which client can be redirected.
For S3 backed storage this offloads I/O to S3 and makes the download go even faster.
For people who don't want to put their registry behind a CDN and persist errors into that CDN (as with cloudflare) this is a decent solution. We could make it optional in config.
I also added S3 region to config. I've always had problems whenever I didn't use the right S3 end-point with boto.
But it could just be who is paranoid, see boto/boto/issues/621
(Credit where credit is due, the redirect idea was prototyped by James Lal, this is just a cleaner implementation).