Skip to content
This repository has been archived by the owner on Sep 12, 2018. It is now read-only.

Direct s3 download via redirect #293

Merged

Conversation

jonasfj
Copy link
Contributor

@jonasfj jonasfj commented Mar 17, 2014

This adds a method to the storage strategy allowing it to provide a URL to which client can be redirected.
For S3 backed storage this offloads I/O to S3 and makes the download go even faster.

For people who don't want to put their registry behind a CDN and persist errors into that CDN (as with cloudflare) this is a decent solution. We could make it optional in config.

I also added S3 region to config. I've always had problems whenever I didn't use the right S3 end-point with boto.
But it could just be who is paranoid, see boto/boto/issues/621

(Credit where credit is due, the redirect idea was prototyped by James Lal, this is just a cleaner implementation).

@shin-
Copy link
Contributor

shin- commented Mar 18, 2014

Thanks for the contribution!

Please make sure the tests pass, both unit tests and pep8. You can run them locally using tox.

@jonasfj
Copy link
Contributor Author

jonasfj commented Mar 18, 2014

Note, I changed s3_region to be an optional configuration key. I think people should specify it, as it ensures the right S3 end-point is used. I've seen this cause intermittent errors with boto in other projects.
(But like I said, it could just be me who is paranoid).

@mhrivnak
Copy link
Contributor

Great feature. I actually just went scouring through the existing code looking for this, and I'm very happy to see it in a PR. At $WORK, we may end up creating a new storage plugin for our purposes, and this feature is a must-have.

# offload a lot of expensive I/O and get faster I/O
content_redirect_url = store.content_redirect_url(path)
if content_redirect_url:
return flask.redirect(content_redirect_url, 301)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 302 would be more broadly useful. I'm thinking especially about a backend that generates signed URLs that expire, which is something S3 can actually do. 301 means that future requests should use the content_redirect_url instead of the url to this view. However, I think most use cases would benefit from having future requests for this resource still come to this URL.

Generally speaking, if we want to

  • change the redirect's destination URL from time to time (in order to do load balancing, signed URLs, or any other purpose)
  • authorize access
  • log access

then 302 is the way to go.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good, catch we should definitely be using 302. I'll get right on this.

@jonasfj
Copy link
Contributor Author

jonasfj commented Apr 14, 2014

I suspect this wont merge automatically anymore... I don't mind unbitrotting it, if there is an interest in merging this.

Note, I'm using a variation of this on EC2 with S3 in the same region, my instances pull with about 10 MB/s.
So big 4GB images aren't much of a problem. Using the old code, I can only imagine the strains it would put on my registry server, when multiple nodes wants to pull a 4G image.


# If store allows us to just redirect the client let's do that, we'll
# offload a lot of expensive I/O and get faster I/O
content_redirect_url = store.content_redirect_url(path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of just checking if the storage driver supports this, I would also enable a config directive ("storage_redirect is True/False").

@samalba
Copy link
Contributor

samalba commented Apr 15, 2014

Yes there is interest to merge this. I left a comment. Sorry for the lag in PR reviewing...

Left a comment inline to make this configurable to not break the current behavior.

@jonasfj
Copy link
Contributor Author

jonasfj commented Apr 15, 2014

Thanks for the feedback I'm all pro configurability...

  • I'll take look at fixing this tomorrow :)

@jonasfj
Copy link
Contributor Author

jonasfj commented Apr 15, 2014

I rebased and squashed all changes into one commit... Added the key storage_redirect to configure if redirects should be used.

Note, regarding region specification, the importance of this is greatly understated in boto documentation.
But for performance and stability this is very important. At least that is my experience..

Please let me know if there is any other details I should address... I would like to tweak multipart upload buffer size, but that I'll leave that to another PR :)

@samalba
Copy link
Contributor

samalba commented Apr 16, 2014

LGTM

cc @shin-

shin- added a commit that referenced this pull request Apr 16, 2014
@shin- shin- merged commit 6732a5f into docker-archive:master Apr 16, 2014
@shin-
Copy link
Contributor

shin- commented Apr 16, 2014

LGTM

Thank you for the contribution!

@jonasfj jonasfj deleted the direct-s3-download-via-redirect branch April 16, 2014 16:45
johanneswuerbach added a commit to johanneswuerbach/deis that referenced this pull request Apr 17, 2014
Allow to configure a s3 region and enable redirecting for
faster and cheaper images downloads.
Further details: docker-archive/docker-registry#293

Fixes deis#746
johanneswuerbach added a commit to johanneswuerbach/deis that referenced this pull request Apr 17, 2014
Allow to configure a s3 region and enable redirecting for
faster and cheaper image downloads.
Further details: docker-archive/docker-registry#293

Fixes deis#746
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants