Distribution reference update #30043

Open
wants to merge 2 commits into
from

Projects

None yet

6 participants

@dmcgowan
Member
  • Updates distribution vendor to pull in changes for upstreamed reference package with support for Docker specific normalization.
  • Updates cli package to use real reference package rather than forked.
  • RepoInfo is used as splitting point, taking in real reference and returning forked reference.
  • Fixes test in push v2 which used invalid reference. Considering creating a test in distribution reference package to handle ambiguity around "incorrect" names under "docker.io/library". Previously test passed when should have failed (was considering docker.io/library/hello-world better match than docker.io/library/busybox/subapp to docker.io/library/busybox/).
  • Updates forked reference package to use upstream normalization.

This is the first part of the work to remove the forked reference package. The work is split up to keep the size of the PR manageable.

Ping @aaronlehmann @stevvooe

@dmcgowan dmcgowan Vendor updated distribution
Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
e62a79b
cli/command/container/create.go
@@ -160,16 +159,21 @@ func createContainer(ctx context.Context, dockerCli *command.DockerCli, config *
}
var trustedRef reference.Canonical
- _, ref, err := reference.ParseIDOrReference(config.Image)
+ var namedRef reference.Named
@aaronlehmann
aaronlehmann Jan 11, 2017 Contributor

nit: combine into one var statement

cli/command/formatter/image.go
if err != nil {
continue
}
if nt, ok := ref.(reference.NamedTagged); ok {
- repoTags[ref.Name()] = append(repoTags[ref.Name()], nt.Tag())
+ repoTags[reference.FamiliarName(ref)] = append(repoTags[reference.FamiliarName(ref)], nt.Tag())
@aaronlehmann
aaronlehmann Jan 11, 2017 edited Contributor

I'd suggest breaking out reference.FamiliarName(ref) into a variable for readability.

@aaronlehmann
Contributor

A few tests need to be updated. They are expecting Error parsing reference but instead they get invalid reference format.

Also, a build test is trying to create an invalid reference and failing.

var err error
- trustedRef, err = image.TrustedReference(ctx, dockerCli, ref, nil)
+ trustedRef, err = image.TrustedReference(ctx, dockerCli, taggedRef, nil)
@aaronlehmann
aaronlehmann Jan 11, 2017 edited Contributor

Below, there's config.Image = trustedRef.String(). Should this be modified to config.Image = reference.FamiliarString(trustedRef) now that trustedRef is a distribution reference?

@dmcgowan
dmcgowan Jan 11, 2017 Member

Probably, reason I did it there was because there are 2 spots further on that rely on the familiar reference. However I think you are right in that the normalized form should be passed into TrustedReference since it resolves repository info.

@dmcgowan
dmcgowan Jan 11, 2017 Member

Will just update TagTrusted to call FamiliarString, will make the code a bit easier to follow.

cli/command/image/push.go
@@ -48,10 +48,10 @@ func runPush(dockerCli *command.DockerCli, remote string) error {
requestPrivilege := command.RegistryAuthenticationPrivilegedFunc(dockerCli, repoInfo.Index, "push")
if command.IsTrusted() {
- return trustedPush(ctx, dockerCli, repoInfo, ref, authConfig, requestPrivilege)
+ return trustedPush(ctx, dockerCli, repoInfo, ref.Familiar(), authConfig, requestPrivilege)
@aaronlehmann
aaronlehmann Jan 11, 2017 Contributor

I'd rather trustedPush did the "familiarization" itself. It's a bit scary to be passing ref.Familiar() to a function because it has the same type as ref, so it would be very easy to accidentally pass an unfamiliarized reference.

@aaronlehmann
aaronlehmann Jan 11, 2017 edited Contributor

(By same type, I mean NormalizedNamed also satisfies the Named interface.)

cli/command/plugin/install.go
@@ -142,14 +130,14 @@ func runInstall(dockerCli *command.DockerCli, opts pluginOptions) error {
remote = trusted.String()
@aaronlehmann
aaronlehmann Jan 11, 2017 Contributor

Should this be the familiarized version?

@dmcgowan
dmcgowan Jan 11, 2017 Member

I did this on purpose since it does not represent a tag but probably better to consistently use the familiar form when passing strings to the client. The daemon will always normalize values from the API anyway.

cli/command/plugin/push.go
- if reference.IsNameOnly(named) {
- named = reference.WithDefaultTag(named)
+ if _, ok := named.(reference.Canonical); ok {
+ return fmt.Errorf("invalid name: %s", named.String())
@aaronlehmann
aaronlehmann Jan 11, 2017 Contributor

named.String() is the same as name, right?

@dmcgowan
dmcgowan Jan 11, 2017 Member

you are right, and this should use name since that is the user input

cli/command/service/trust.go
}
+ taggedRef := reference.EnsureTagged(namedRef.Familiar())
@aaronlehmann
aaronlehmann Jan 11, 2017 Contributor

Again, I think it's better not to pass Familiar() refs around.

cli/command/service/trust.go
}
+ taggedRef := reference.EnsureTagged(namedRef.Familiar())
+
resolvedImage, err := trustedResolveDigest(context.Background(), dockerCli, taggedRef)
if err != nil {
return fmt.Errorf("Failed to resolve image digest using content trust: %v", err)
@aaronlehmann
aaronlehmann Jan 11, 2017 Contributor

This can use errors.Wrap, like the change you made above.

@dmcgowan
Member

Updated to use familiar more consistently. After making the update the familiar form is only used when outputting to user, sending a string to client, or setting a string on a structure for the client. I am wondering if by not using the NormalizedNamed interface we could completely hide it from the reference package interface and rather prefer the FamiliarString and FamiliarName methods.

Still need to push an update for the test failures from error string changes.

@aaronlehmann
Contributor

I am wondering if by not using the NormalizedNamed interface we could completely hide it from the reference package interface and rather prefer the FamiliarString and FamiliarName methods.

I like that idea.

@dmcgowan dmcgowan Remove use of forked reference package for cli
Use resolving to repo info as the split point between the
legitimate reference package and forked reference package.

Signed-off-by: Derek McGowan <derek@mcgstyle.net> (github: dmcgowan)
61738b9
@dmcgowan
Member

Tests have been fixed. Opened docker/distribution#2139 for the normalization change. Also docker/distribution#2137 to handle the ambiguity mentioned in description. I will vendor these changes in the second part of the update, does not need to effect this PR.

Ping @stevvooe @tonistiigi @nishanttotla for additional review.

@stevvooe
Contributor

LGTM

@tonistiigi
Contributor

I don't quite understand how the follow-up of this would look like for the daemon side changes. Will all daemon functions only take in NormalizedNamed call Familiar() on it but then make sure to pack it to NormalizedNamed again for passing to other functions. Seems error-prone and reminds to bugs we used to have with strings and conversion functions.

tbh it also feels very strange to me that engine defaults like index.docker.io and library are defined in distribution repo. Why would anyone running local registry need them and accidentally using them will only result in wrong results. If Docker makes these fields configurable we need another change in distribution.

@dmcgowan
Member

@tonistiigi the NormalizedNamed type is going away. We decided its always best to deal with normalized references and only use the familiar form when interacting with the user or existing storage. In those cases, the FamiliarName and FamiliarString will be used.

@tonistiigi
Contributor

@dmcgowan So there is no safe type at all. Named is the same as string was - unsafe value that needs to be parsed/converted before used. You would need to do ParseNormalized(ref.String())+FamiliarString() in every safe method that takes a reference(FamiliarString being one example of at least 3-4 helper functions).

@dmcgowan
Member

We originally added the type because we thought having the explicit type would make it simpler, but after trying to integrate the package the extra typed proved unnecessary as there was nowhere that needed to be passing around the familiar form of Named. Therefore it seemed cleaner to get rid of the extra type and rather always treat Named as normalized within the engine code. The distribution version of reference.ParseNamed is still accessible through the package but so far I haven't see a use for it.

As for the needs to check everywhere, reference.Named should be treated as the strongly typed form for canonical references since it cannot easily be cast to like a string. If we find the package being misused we can add options to the reference package to disable creation of the non-normalized form of the type through reference.ParseNamed. Either way, let's move forward with reference.Named being our strong reference type and enforce that we are using the canonical form of it everywhere. Then we can actually go ahead with updating the remaining backend interfaces which still use string to pass in the reference. As I said before, only the API handling and storage objects should even care about the difference between normalized and non-normalized, the rest of the code shouldn't. Many of the calls to FamiliarName and FamiliarString in the backend can probably be done away with anyway since it mostly writing logs or sending user facing messages from deep in the backend anyway.

There are 2 helper functions related to familiarizing. We are considering adding Domain and Path directly to the Named interface but didn't want to break the existing interface in this change.

@aaronlehmann
Contributor

I suspect the experimental failure is unrelated, but does anyone know?

23:10:16 --- FAIL: Test (3439.83s)
23:10:16 panic: DockerTrustSuite.TestBuildContextDirIsSymlink test timed out after 5m0s [recovered]

I think the general approach of treating references as always being normalized makes sense. It's a little unfortunate that we're reusing the reference.Named type for this, since there's the potential for someone to end up with an unnormalized reference if they use reference.ParseNamed. Perhaps we should reintroduce reference.NormalizedNamed, and use it everywhere in docker/docker. I know I supported unexporting reference.NormalizedNamed, but my real concern was that we were mixing reference.Named and reference.NormalizedNamed in the codebase, and reference.Named is ambiguous as to whether the reference is normalized or not.

The other thing I've thought about is passing around a meta-object with Normalized() and Familiar() methods that return references. That way you always have access to both forms, and since you have to make an explicit choice, it's harder to get confused about what kind of reference you're dealing with.

One challenge I see with this PR and especially the followup in the daemon is that we're effectively changing the meaning of the Name() method of Named. Instead of returning the familiar form, it will now return the full name. If we miss any places where it's called, or some merges cross, the behavior will be wrong. Either of the ideas above would potentially help.

@dmcgowan
Member

I feel pretty strongly that enforcing reference.Named as always canonical and the only type is the best direction for the code. It removes ambiguity and makes the code much easier to follow. Having helper functions to generate the strings we use for logs and storage is trivial and in most cases the explicitness of the helper function name makes it much clearer. Certainly Name() will get confused with the helper sometimes, not really a way to prevent misuse other than code review, I found multiple occurances in updating the code where FullName() and Name() were not used in the correct context. However I agree that reference.ParseNamed is currently a problem and we should solve it as part of this effort. I am proposing that we update ParseNamed to error out if the input is not in the canonical form and I will update the registry's use of this function to handle this. Adding another type to distinguish the familiar and normalized forms will just end up feeling bloated as only the normalized form is needed in interfaces and internal functions.

@tonistiigi will restricting ParseNamed from creating non-canonical references satisfy your concerns about the type not being strict enough?

@tonistiigi
Contributor

If there is a type that always has fixed form and there is no way to make object of this type without the fixed form that would help. If this object has methods for different variations then this is exactly the current solution.

As discussed offline, moving to this fixed type(removing current ParseNamed) would not be that easy for distribution repo because although these objects are both called "reference" they represent very different concepts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment