Support for pull-through cached mirroring #19

Closed
stevvooe opened this issue Jan 2, 2015 · 29 comments

@stevvooe
Collaborator

stevvooe commented Jan 2, 2015

The registry should be able to operate as a pull-through mirroring cache. This means that if a pull operation cannot proceed locally due to missing content, the registry should be able to defer the request to an upstream registry.

Please see docker-archive/docker-registry#658 for some background information on this issue.

Pull

When pulling a manifest, if the content is available locally, it will be served as is. Optionally, the local registry may check the remote registry for updated content using conditional HTTP requests, and update the local copy. If the content is not available locally, the request should be forwarded to the remote registry. If the remote request succeeds, the manifest should be stored locally and then served in response to the local request.

When pulling a blob, if the content is available locally, serve it as is. If it is not available locally, forward the request to the remote registry. If the remote serves the data directly, the data should be forwarded to the client and stored locally, concurrently. If the remote issues a redirect, the local registry should download the data into the local cache and serve it directly.

Push

All push operations are only attempted on the "local" registry. If they fail, they will not be forwarded to the remote registry.

Open Questions
  1. How should authorization behave? Should credentials be forwarded, or should the proxy client obtain its own credentials?
  2. Should we allow one to configure an ACL for outgoing remote requests?
@stevvooe stevvooe added this to the Registry/Beta milestone Jan 2, 2015
@dmp42
Contributor

dmp42 commented Jan 2, 2015

  1. Any credentials sent from the client to the mirror are meant to control access to the mirror itself. They should not be forwarded to the upstream, and the mirror should access the upstream anonymously.
  2. Yes, I believe we should let the admin configure a user/password (instead of the default, anonymous access).
  3. TTLs for manifests should be configurable (defaulting to 5 minutes?).

2 cents...

@stevvooe
Collaborator Author

stevvooe commented Jan 2, 2015

@dmp42 Regarding item 2 above: by ACLs, I meant, do we provide control over which names are actually forwarded? For example, one may never want to forward requests for "mydomain.com/" but allow "library/" to be forwarded.

However, local credential configuration for remote registry makes a lot of sense.
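A name-based forwarding ACL like the one described could be as simple as prefix matching on repository names. This is a sketch under assumptions: `forwardPolicy` is a hypothetical type, deny rules are checked before allow rules, and unmatched names are never forwarded (default-deny).

```go
package main

import (
	"fmt"
	"strings"
)

// forwardPolicy is a hypothetical ACL for outgoing upstream requests: deny
// prefixes are checked first, then allow prefixes; anything unmatched stays
// local (default-deny).
type forwardPolicy struct {
	Deny  []string
	Allow []string
}

// ShouldForward reports whether a pull for the given repository name may be
// forwarded to the upstream registry.
func (p forwardPolicy) ShouldForward(name string) bool {
	for _, prefix := range p.Deny {
		if strings.HasPrefix(name, prefix) {
			return false
		}
	}
	for _, prefix := range p.Allow {
		if strings.HasPrefix(name, prefix) {
			return true
		}
	}
	return false
}

func main() {
	p := forwardPolicy{Deny: []string{"mydomain.com/"}, Allow: []string{"library/"}}
	fmt.Println(p.ShouldForward("library/ubuntu"))   // prints: true
	fmt.Println(p.ShouldForward("mydomain.com/app")) // prints: false
	fmt.Println(p.ShouldForward("someuser/tool"))    // prints: false
}
```

Whether unmatched names should default to forwarded or local is a policy choice; default-deny is used here so that private names can never leak upstream by omission.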

@stevvooe
Collaborator Author

stevvooe commented Jan 2, 2015

@dmp42 It seems like we should employ conditional http requests to the remote for manifest caching. I'll update the description above to capture this.

@stevvooe
Collaborator Author

stevvooe commented Jan 2, 2015

cc @samalba

@samalba

samalba commented Jan 11, 2015

Sorry to be late on this...

I agree the credentials should not be sent upstream; the Mirror should be anonymous (at least in the short term). However, it's important to have some identification (like a modified User-Agent?) so that the upstream can tell a request is coming from a Mirror. I would not make this configurable, for the sake of simplicity.

As for the TTL, I would not make it configurable either. It would be set from the cache TTL provided by the upstream (Cache-Control).

@stevvooe stevvooe added ready and removed 1 - Ready labels Jan 15, 2015
@stevvooe stevvooe modified the milestones: Registry/RC, Registry/Beta Jan 19, 2015
@stevvooe stevvooe added blocked and removed Ready labels Feb 24, 2015
@dmp42
Contributor

dmp42 commented Feb 27, 2015

Interesting feedback on this: the TTL (even for long-lived resources) has to be set to a relatively short value (e.g., under 2 weeks); otherwise, from a legal POV, the mirror is no longer considered "caching" but "hosting".

@stevvooe
Collaborator Author

@dmp42 The main use case that this proposal covers is for an on-premise, private registry. Users can push and pull content to the "local" storage. If content is not found during a pull, the data can be requested from upstream. If the upstream content is found, it is "cached" locally. It's a cache, whether there is a TTL or not. A cache is a place to store things conveniently. A TTL is merely one technique to manage a cache. Conversely, putting a TTL on a datastore does not make it a "cache".

What are the ramifications of "hosting" vs "caching" in relation to docker images?

@dmp42
Contributor

dmp42 commented Feb 28, 2015

@stevvooe I will not speak to you about that without my lawyer being present :-)

@stevvooe stevvooe removed the Urgent label Mar 18, 2015
@stevvooe stevvooe modified the milestones: Registry/2.1, Registry/2.0.0-rc Mar 18, 2015
@noisy

noisy commented Apr 9, 2015

+1

This is a must-have for me!

@munhitsu

+1

@zhitaoli

+1

@stevvooe
Collaborator Author

@zhitaoli @munhitsu @noisy Please take some time to review the proposal in #459.

@mkjaer

mkjaer commented Apr 29, 2015

@stevvooe Will that make it possible to solve the problems on closed networks mentioned here, so that it's possible to pull (from the mirror) without internet access?

@munhitsu

@stevvooe see comments in #459.
It's a good piece of design.

@emopinata

@stevvooe what I would love to see is the registry mirror being able to store a push and make it available for use while still pushing it to the hub. I'm in a situation currently where it's taking a significant amount of time to push images to the hub and need the speed of pushing/pulling to/from a local registry, but still want the images to make their way to the Docker Hub.

@stevvooe
Collaborator Author

@jalmansor That use case is outside the scope of the proxy caching feature. Once the feature is available, it should be trivial to set up such a system. Simply set up the registry as a proxy cache and push your image. It will then be available locally. Have another process come by and push that to the Hub, separately. As long as they have the same name, the caching will work correctly.

@stevenschlansker

An important part of this is to allow the Docker client to have configured fallback mirrors -- once I have a "central" registry and "regional" mirrors, I want to configure fallback to central registry if a regional mirror should fail.

@dmp42
Contributor

dmp42 commented May 21, 2015

@stevenschlansker definitely an interesting idea. Maybe this can be worked on as a second step? At first, it can act like a single reverse proxy cache, where failure handling should be done at the proxy level with proper load balancing etc.

@stevenschlansker

Especially once you are geographically diverse, client-side fallback will be much easier to configure and administer than server-side.

We already have to manage our Docker daemon configuration across all our machines.

We do not have a way to set up HA spanning, say, US west coast and London.

@stevvooe
Collaborator Author

@stevenschlansker Agreed, this is a valid model, but it's outside the scope of this particular feature. This proposal is for proxy-caching, which is different from intelligent mirror selection.

If you're interested in this kind of support, please see the proposal on namespaces, issue #303. We are focused on making the client smarter and more configurable to handle the kinds of topologies you describe.

@noisy

noisy commented May 29, 2015

Is it possible right now to do mirroring like this on docker-distribution?

@dmp42
Contributor

dmp42 commented May 29, 2015

@noisy not yet - but it's coming.

@tobegit3hub

Looking forward to this feature!

I noticed the patch was merged. When can we use it? @RichardScothern @dmp42

@RichardScothern
Contributor

The intention is for 1.7. Watch this PR for updates @tobegit3hub

@dalanlan

Remarkable feature! Is it good to go? @RichardScothern
Let's say I want to run a mirror registry for gcr.io: what exactly should I set MIRROR_SOURCE and MIRROR_SOURCE_INDEX to? That currently seems to work for the v1 registry, though ;-)

@RichardScothern
Contributor

@dalanlan It's currently in review at #779. The documentation is there.

@dalanlan

Saw it. But it's kinda heavy for me to read through, so I just picked up an easy patch from here.

@RichardScothern
Contributor

Fair enough, @dalanlan. There is a markdown documentation file in the pull request that should answer your questions. Please comment there if it's not clear.

@RichardScothern
Contributor

Merged

openshift-publish-robot referenced this issue in openshift/docker-distribution May 21, 2018
Use docker client in the integration tests

Image-registry-commit: 0beb7ee80381f3ebc49aa17e3a9a4391e0f55934
openshift-publish-robot referenced this issue in openshift/docker-distribution Jun 18, 2018
Use docker client in the integration tests

Image-registry-commit: 0beb7ee80381f3ebc49aa17e3a9a4391e0f55934