NG: Image Federation #662
On Thu, Oct 30, 2014 at 05:59:11AM -0700, Aaron Weitekamp wrote:
> So docker pushes to the push-time registry by default, but skips any …

Personally, I'd rather keep this out of the image metadata. Instead, … which you set up in your ~/.dockercfg. Then for pushing you have:

    $ git push rhel7 https://cdn.redhat.com/

to only push layers in isv/app but not rhel7 to cdn.isv.example.com.
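For illustration, per-remote entries in ~/.dockercfg might look something like this (hypothetical keys; today's ~/.dockercfg only stores auth entries):

```json
{
  "remotes": {
    "rhel7": "https://cdn.redhat.com/",
    "isv": "https://cdn.isv.example.com/"
  }
}
```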
Thanks a lot @aweiteka! A couple of questions, and some info about what's going on with v2:
Does image signing (coming with v2) change that situation for them? Also, if I understand correctly, Crane does a 302 to where the actual bits are. So, the company (content owner) has to trust the ISV's registry (/crane) to do what's right here, which to me kind of weakens the "control-point": e.g., I'm not sure I see a difference between 302, proxy-pass, and mirroring from a control standpoint. About the v2 protocol: it's quite likely that layer URLs are going to be namespaced, along these lines (illustrative form):
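```
# illustrative only; the exact v2 layer paths were still being designed
GET /v2/<namespace>/<repository>/layers/<layer_id>
```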
The reason for that change is simpler access control (the flat namespace for layers that we currently have doesn't work well). That doesn't mean content is actually duplicated on the registry: inside the registry mechanics, there are "mount points" for layers into namespaced URLs. Right now, the way I see it, it should be pretty easy to: …
… the part where I'm not that confident is the engine bits that would allow selectively instructing a registry that a given layer is to be found "elsewhere". Actually, I'm wondering whether this is an engine decision to make at all.
On Thu, Oct 30, 2014 at 11:15:00AM -0700, Olivier Gambier wrote:
But the ISV can put their own auth in front of their registry. That …

Hmm. GitHub seems to do fairly well with a flat namespace. Is the …

Ah, maybe you intend to have multiple hierarchies? That would be as …

Not inside the client to check several repositories for a given layer?

I don't see why image hosting would be anyone's decision to make.
Are you suggesting there may be several layers of authentication (authenticate -> 302 -> authenticate again)? Either way, the point still stands: is there actually a strong "control-point" in trusting a downstream registry to do a redirect?
I don't think we are talking about anything requiring authentication, but rather publicly available content.
With caching, that's exactly the kind of stuff people want.
This is exactly what I am saying. Layers must be accessed under a namespace (foo/myimage), like the manifest itself. Doing otherwise (e.g. the flat namespace for layers we have today) is a mess to get right as far as authorization is concerned.
Here:
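```
# a flat, v1-style layer fetch (illustrative)
GET /v1/images/SOMEID/layer
```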
means when wking wants to fetch this, the registry/auth has to decide whether wking is entitled to read SOMEID. And this has to be done for every layer.
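Whereas a namespaced fetch:

```
# illustrative form; exact v2 paths were still in flux
GET /v2/foo/bar/layers/SOMEID
```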
means we need to verify that wking has access to foo/bar - and this authorization is the same for all layers. Whether SOMEID was "authorized" to be made available under foo/bar in the first place is a one-time operation, at push.
This sounds messy. Having to configure your client with multiple registry endpoints in order to pull a single image ends up in situations where you are trying to figure out why you are missing some layers.
You are missing the point.
Now the question raised here is how to allow certain registries to delegate the responsibility of delivering specific layers to other registries.
On Thu, Oct 30, 2014 at 01:00:19PM -0700, Olivier Gambier wrote:
I'm suggesting we skip the 302 entirely. The client would: …
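Roughly, the idea (a sketch with hypothetical configuration and endpoints, not existing docker behavior):

```python
import urllib.error
import urllib.request

# Hypothetical per-client registry list (cf. the ~/.dockercfg remotes above).
REGISTRIES = ["https://cdn.isv.example.com", "https://cdn.redhat.com"]

def fetch_layer(layer_id):
    """Try each configured registry in order and return the first hit."""
    for registry in REGISTRIES:
        url = "{}/v1/images/{}/layer".format(registry, layer_id)
        try:
            with urllib.request.urlopen(url) as response:
                return response.read()
        except urllib.error.URLError:
            continue  # this registry doesn't have the layer; try the next
    raise LookupError("layer {} not found on any registry".format(layer_id))
```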
I'm not trusting anyone to redirect.
I don't think a few extra auth attempts are going to sink the service.
If the ISV wants to handle the extra bandwidth (with my scheme), they …

Ah, I see. If we kept 1: PUT /v1/images/(image_id)/layer or some such, then the auth decision would be something like (for a …

With cheap image-id → repository lookup, I think that should be fairly …
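For instance (a sketch; the helper names are hypothetical):

```python
def may_read_layer(user, image_id, repos_for_image, may_read_repo):
    """Per-layer check: map the bare image ID back to the repositories
    containing it, then apply the per-repository ACL."""
    return any(may_read_repo(user, repo)
               for repo in repos_for_image(image_id))
```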
Registries that want to support stand-alone usage for their users are …
This we agree on ;).
They can't do that with 302s, or proxy-pass, or whatever they like?
Simple answer: no. :-) Blindly trying a bunch of services one after the other for every layer is nonsense.
How about we let @aweiteka speak for himself? This is what they currently do with Crane, if I understand correctly.
This discussion is getting largely off-topic. All this is just muddying the water here: @aweiteka has a use-case, so let's try to see it clearly instead of defending opinionated opinions on things like atomic storages, Mr. @wking :-) To sum it up: …
Hang on ;)
@dmp42 Signing helps, but I am assuming image signing here. Third-party distribution agreements may exist, but I suspect they present a large legal barrier that would kill partnership and innovation. My understanding is that leveraging the layered image format removes this barrier. That said, I'm not a lawyer. ;)
Yes, this is how it works. The content owners control and ensure access to the actual bits. If a third party's redirect service fails, that's the third party's problem. This assumes the content owners also have a redirect service for their direct customers.
This may be a separate but related discussion. It does get tricky. We have an x509 scheme passed to the client that provides access to specific paths on our CDN (Akamai). It would be good to think through and discuss this a bit more.
@dmp42 You rightly picked up that Crane has a flat namespace. We're assuming world-readable access at the application level and controlling authn/authz using the above-mentioned x509 scheme. The proposed namespace has a lot of benefits, so I wouldn't want to discourage that, but I don't know if you can have it both ways. I'm not convinced 302 redirects are ideal, but they work and are flexible. Open to other ideas.
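For illustration (hypothetical file names and path), a client-certificate-authenticated fetch from the CDN could look like:

```console
$ curl --cert client.pem --key client-key.pem \
    "https://cdn.redhat.com/<path-granted-by-cert>/layer"
```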
Pulp does the same thing using symlinks on traditional block storage. Copies are cheap and content is never duplicated.
@wking I'm assuming URL information is never in image metadata. This is registry metadata only; it doesn't go with the layer.
@dmp42 Right, "some mechanism." I don't have strong opinions on the specific way users manage registry metadata. Ultimately we're talking about a method to sync distributed metadata efficiently, reliably, and securely. Pulp, IPFS, bittorrent, etc. all look interesting. @vbatts may have some thoughts here. That may be for V2.$LATER, but I suggest we start v2.0 with some fundamental support.
On Fri, Oct 31, 2014 at 07:25:26AM -0700, Aaron Weitekamp wrote:
Ah, good, that means it won't conflict with image signing :). When …
@wking Good catch. That part of the example suggests the ISV image layers are hosted on the Docker hub. That's a valid use case, but it doesn't match my "pull" example where ISV layers come from cdn.isv.com.
Superseded by distribution/distribution#88.
What is image layer federation?
Image federation is where dependent image layers are served from different servers. For example, an ISV builds on a Red Hat base image: the ISV layers are served from `cdn.isv.com` and the Red Hat layers are served from `cdn.redhat.com`. The content-addressable v2 image format and registry make this an ideal time to consider this model.
Why is it important? Who cares?
Many companies need to host their own bits. It's their control point, and an important legal and provenance issue for them.
How does it work?
A simple example:
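```console
# Illustrative only: image names are hypothetical, hosts are from above.
$ docker pull isv/app
#   isv/app layers on top   ->  served from cdn.isv.com
#   rhel7 base layers       ->  served from cdn.redhat.com
```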
What might an implementation look like?
- Support pushing of metadata only. This assumes the image has already landed on the CDN by some other means.
- When an image based on the above is pushed, the layer upload is skipped (see the sketch below).
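A rough sketch of what that could look like on the wire, reusing the v1 protocol's existing "image already pushed, skipping" behavior (hypothetical flow, not an implemented feature):

```
GET /v1/images/<base_image_id>/json   ->  200
# the registry claims the base layer exists (it lives on the CDN),
# so the client skips PUT /v1/images/<base_image_id>/layer
PUT /v1/images/<isv_image_id>/json    # new metadata is pushed as usual
PUT /v1/images/<isv_image_id>/layer   # only the ISV's own layers upload
```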
Example Implementation
This has been implemented in Crane, a component of Pulp. Red Hat uses this as its production registry. Crane is a read-only implementation of the docker registry protocol. Registry metadata (JSON) is created by the Pulp server.
Crane serves calls to `/v1/repositories/<namespace>/<repository>/images|tags` directly and then redirects (302) any calls to `/v1/images/<image_id>/*`. In the following examples, note the two different URL values.
Red Hat base image
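An illustrative redirect entry (field names approximate Crane's metadata format, not its exact schema):

```json
{
  "type": "pulp-docker-redirect",
  "repository": "redhat/rhel7",
  "url": "https://cdn.redhat.com/v1/images",
  "images": ["<base_image_id>"]
}
```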
A child ISV image file redirects to another URL.
ISV image
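Again illustrative; note that the url value now points at the ISV's CDN:

```json
{
  "type": "pulp-docker-redirect",
  "repository": "isv/app",
  "url": "https://cdn.isv.com/v1/images",
  "images": ["<isv_image_id>"]
}
```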