Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docker: add docker-archive transport #148

Merged
merged 3 commits into from
Apr 10, 2017
Merged

docker: add docker-archive transport #148

merged 3 commits into from
Apr 10, 2017

Conversation

cyphar
Copy link
Contributor

@cyphar cyphar commented Nov 2, 2016

This transport is effectively just a copy of the docker-daemon://
transport but replacing all of the Docker client code with simple
filesystem code. It currently works as well as docker-daemon:// (from my
testing).

Further refactoring is necessary, this was just a first pass.

Implements #140.

Signed-off-by: Aleksa Sarai asarai@suse.com

@cyphar cyphar changed the title docker: add docker-archive transport [wip] docker: add docker-archive transport Nov 2, 2016
@cyphar
Copy link
Contributor Author

cyphar commented Nov 2, 2016

Note: Currently skopeo copy docker-archive:local ... fails because it looks like the hashes change between pushing and pulling.

./skopeo.new --policy ../skopeo/default-policy.json copy docker://alpine:latest docker-archive:test:latest
Getting image source signatures
Copying blob sha256:3690ec4760f95690944da86dc4496148a63d85c9e3100669a318110092f6862f
 2.16 MB / 2.21 MB [=========================================================>-]
Copying config sha256:baa5d63471ead618ff91ddfacf1e2c81bf0612bfeb1daf00eb0843a41fbfade3
 0 B / 1.25 KB [---------------------------------------------------------------]
Writing manifest to image destination
Storing signatures
% ./skopeo.new --policy ../skopeo/default-policy.json copy docker-archive:test:latest oci:test
Getting image source signatures
Copying blob sha256:011b303988d241a4ae28a6b82b0d8262751ef02910f0ae2265cb637504b72e36
FATA[0000] Error reading blob sha256:011b303988d241a4ae28a6b82b0d8262751ef02910f0ae2265cb637504b72e36: unknown blob sha256:011b303988d241a4ae28a6b82b0d8262751ef02910f0ae2265cb637504b72e36

To be honest, I'm not really sure where 011b303988d241a4ae28a6b82b0d8262751ef02910f0ae2265cb637504b72e36 is coming from...

This does appear to work with docker-daemon so I'm just going to read more of the code to see how it works:

./skopeo.old --policy ../skopeo/default-policy.json copy docker-daemon:test:latest docker-archive:test:latest
Getting image source signatures
Copying blob sha256:011b303988d241a4ae28a6b82b0d8262751ef02910f0ae2265cb637504b72e36
 0 B / 4.82 MB [---------------------------------------------------------------]
Copying config sha256:baa5d63471ead618ff91ddfacf1e2c81bf0612bfeb1daf00eb0843a41fbfade3
 0 B / 1.25 KB [---------------------------------------------------------------]
Writing manifest to image destination
Storing signatures

@cyphar
Copy link
Contributor Author

cyphar commented Nov 2, 2016

Scratch that, it's because sha256:3690ec4760f95690944da86dc4496148a63d85c9e3100669a318110092f6862f is compressed...

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

// Close removes resources associated with an initialized ImageDestination, if any.
func (d *archiveImageDestination) Close() {
d.writer.Close()
// FIXME: We don't clean up here if we errored out...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? It should be the responsibility of the ImageDestination user to always call Close.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking something about cleaning up temporary files we created (we create the tar file even if we fail to actually make a valid image). So if we return an error at some point we have a half-built archive.

}

// archiveReference is an ImageReference for Docker images stored on the local filesystem.
type archiveReference string // FIXME FIXME
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, the docker-daemon: ImageReference implementation is unfinished, and not a good model ATM. We will be fixing that soonish ( #143 ).

If the semantics of the docker-archive: references is that they are simply paths to tarballs, probably see what the dir: transport does for ValidatePolicyConfigurationScope, StringWithinTransport, PolicyConfiguration{Identity,Namespaces} etc.

Note that because these things interact with signature validation, the _transport.go files really should have tests covering every single line if possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add some tests once I've ported docker-daemon over to this refactored interface.

// NewImage returns a types.Image for this reference.
// The caller must call .Close() on the returned Image.
func (ref archiveReference) NewImage(ctx *types.SystemContext) (types.Image, error) {
panic("FIXME FIXME")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably

        src := newImageSource(ctx, ref)
        return image.FromSource(src)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. I just copied this from docker-daemon.

@cyphar cyphar changed the title [wip] docker: add docker-archive transport docker: add docker-archive transport Nov 3, 2016
@cyphar
Copy link
Contributor Author

cyphar commented Nov 3, 2016

@mtrmac I've done a lot of cleanups and implemented quite a few things. Here's the list, maybe you can take another look (specifically to tell me what I need to implement for archive_transport.go -- though I don't think that there's any sense to "signing a docker-archive image" [because in Docker image signatures are stored in a notary setup]):

  • Fixed the previous compression issue -- everything works now within docker-archive. You can push and pull with skopeo to a docker-archive and then manually loading it into Docker confirms that the format is fine.
  • Massive cleanups of the code to use io.ReadSeeker and io.Writer when implementing the generic Docker image format code (namely dockerArchive.Image).
  • Ported docker-daemon to use dockerArchive, so now the code has been completely refactored out.
  • Implemented tarCache which allows us to implement all of the ImageSource code with just single file handle -- and also improving the O(n) lookup time to being O(1). It also makes the code nicer IMO.
  • I also implemented reference parsing using reference.Named, which is what we actually want to use. I also tried implementing some of the Policy things, but as I said before I'm not sure if it makes sense.

PTAL.

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

I’m afraid I have only made one pass through the code, so this is really not comprehensive. The way this PR mixes moving code around and various renames and refactorings makes it very non-obvious to determine how the original and the changed code differs, and after the ~2 hours I have spent on this today I need to stop. Do you by any chance have a working branch where the original effort was done in smaller commits?

I guess the two big items I have noticed are

  1. The seek+jump code is IMHO too fragile to be worth it (@runcom ?)

  2. AFAIK the silently symlink support is necessary

func (d *archiveImageDestination) SupportedManifestMIMETypes() []string {
// XXX: Should this be exposed in docker/archive?
return []string{
manifest.DockerV2Schema2MediaType, // FIXME: Handle others.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments are obsolete now, see https://github.com/mtrmac/image/tree/docker-daemon .

But overall it might be cleanest to keep a single implementation of these methods for both docker-archive: and docker-daemon:, perhaps by embedding the shared tar handler as an anonymous struct member.

return nil, "", err
}

// FIXME: Do we want to suppport manifest lists?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How? AFAICS the tar format can not do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it does support manifest lists. By construction the format is a manifest list, but it only contains one entry (there's some support in the daemon IIRC because people have created multi-arch images).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the other comment on manifest lists; a set of manifests != application/vnd.docker.distribution.manifest.list.v2+json.

// XXX: We aren't caching this because GetManifest() specifically states
// that it can use a slow service. Caching makes this uglier (and also
// makes implementing read-write images harder).
m, err := s.img.GenerateManifest()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caching is useful not just for its performance; it by construction guarantees that the manifest won’t change just because some variable somewhere was mistakenly updated, or because serializing JSON may use a random order depending on heap state, or for any other reason.

(Read-write images? The linear tar format seems especially unsuitable for such functionality.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I'll cache the output of this.

if cleaned != scope {
return fmt.Errorf("Invalid scope %s: Uses non-canonical format, perhaps try %s", scope, cleaned)
}
// XXX: Is this enough? Should we also make sure scope isn't a directory?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A scope which is a directory is very much intended: policy can be either defined for an individual image path (/home/mitr/my-precious.tar), or for a whole directory subtree (/mnt/nfs/our-company/trusted-images).

// WARNING: Do not use the return value in the UI to describe an image, it does not contain the Transport().Name() prefix;
// instead, see transports.ImageName().
func (ref archiveReference) StringWithinTransport() string {
// FIXME: Maybe we should be using explicitfilepath.ResolvePathToFullyExplicit?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That transformation should really happen already in NewReference; types.ImageReference are supposed to be immutable/have immutable semantics, and ResolvePathToFullyExplicit depends on filesystem contents.

// FIXME: This caching is ugly. The only reason we have it here is because
// the Docker image format isn't a CAS (you need to store a lookup
// table for diffID -> layerPath).
cache *tarCache
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… and then tarCache would not be a separate object but just some members of the TarSource.


// Reset everything.
tc.offsets = map[string]int64{}
if _, err := tc.file.Seek(0, os.SEEK_SET); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering that prime() is called exactly once, immediately after creating the object, it would be simpler to call prime() first, and only then return a new object with the results of prime(); then the questions of “resetting” and “prior state” disappear.

}

// prime regenerates all of the offset information in the tarCache.
func (tc *tarCache) prime() error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, we can afford the electrons for descriptive names. collectFileOffsets, createFileOffsetMap or anything else would work just as well and be more explicit at the call site.

if (cur+hdr.Size)%512 > 0 {
lastEntry += tarBlockSize - (lastEntry % tarBlockSize)
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, no.

Overall, it is a nice idea to do one scan and then only seek, and of course it is asymptotically correct and maximally efficient, but when the archive/tar.Reader does not provide a suitable interface, reaching behind its back to see the offset of the underlying file and then adjusting it due to a knowledge of its internal implementation is way too fragile.

Meanwhile, the OS is perfectly capable of keeping a few kilobytes of data in memory, so the real cost of rescanning is primarily the syscall overhead, and we expect, well, half a dozen files for manifests or so, and one file per layer, about a dozen in total.

Without a stable maintainable interface to rely on, and with the risk of having to tinker with this every time archive/tar changes something about its buffering, this is just not worth the headache.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's disappointing. I'll switch it instead to doing a seek to the start then. 😢

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could actually implement this without "reaching behind tar.Reader's back". Rather than re-parsing the metadata you can just store the offset when the data portion starts (which is cur in the code you've quoted) and then also store the header information in the map (so you can just return the header and tar.Reader won't be reparsing it).

I mean, you'd have to implement tar.Reader's ReadCloser functionality but it shouldn't be that bad... I'll float this in a separate patch on top of your PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That’s still relying on undocumented implementation properties: There is no API promise that after tar.Reader.Next() returns, the underlying stream is positioned at the start of the content of the file; e.g. nothing prohibits tar.Reader from buffering input (e.g. reading it one st_blksize at a time instead of one tar block at a time).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sigh, yeah you're right. Okay, I'll just stick with using Seek(0, SEEK_SET).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(It does seem to me that there could be other users who would benefit from archive/tar.Reader adding support for this, either an explicit one like SaveStreamPosition/SeekToSavedPosition or a documented API promise that the underlying file position matches at specific points in time. But that would need to happen in golang upstream.)

}
return nil, nil, err
}
return hdr, tr, nil
Copy link
Collaborator

@mtrmac mtrmac Nov 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(At least validate that hdr.Name matches path? Anyway, moot.)

@mtrmac
Copy link
Collaborator

mtrmac commented Nov 3, 2016

@cyphar How about https://github.com/mtrmac/image/tree/split-tarfile for a starting point? Yes, it is also a big diff, but it obviously makes only trivial changes to the docker-daemon: implementation, other than removing most of it, and

git diff master:docker/daemon/daemon_src.go docker/tarfile/src.go
git diff master:docker/daemon/daemon_dest.go docker/tarfile/dest.go

are also very readable to show that this is only moving code around without semantic changes.

Any refactorings and improvements can then be added as separate commits, with confidence.

And hooking up the docker-archive implementation, with your existing work on the ImageTransport and ImageReference specifics, should be quite small.

(Disclaimer: confidence or not, completely untested and for all I know may be completely broken. Only to show a possible path forward.)

@cyphar
Copy link
Contributor Author

cyphar commented Nov 3, 2016

@mtrmac

How about https://github.com/mtrmac/image/tree/split-tarfile for a starting point? Yes, it is also a big diff, but it obviously makes only trivial changes to the docker-daemon: implementation, other than removing most of it, and

Yeah, that could work. The only issue with that patchset is that it mixes some policy decisions that are specific to docker-daemon into tarfile. But I can fix that if you like. The only question is how was I meant to know that that patchset existed 😉?

I can also switch this back to being a purely "implementation" patchset and then we can do all of my fixing / improvements later.

@mtrmac
Copy link
Collaborator

mtrmac commented Nov 3, 2016

Yeah, that could work. The only issue with that patchset is that it mixes some policy decisions that are specific to docker-daemon into tarfile.

I’m sorry, I haven’t noticed that docker-archive is making different policy decisions; or perhaps I don’t understand what you mean by policy in this context.

The only question is how was I meant to know that that patchset existed 😉?

It hadn’t existed; I created that as a proof-of-concept after writing the review, “only to show a possible path forward”.

I can also switch this back to being a purely "implementation" patchset and then we can do all of my fixing / improvements later.

Both a single PR and separate implementation/improvements PRs are perfectly fine with me; but, please, as much as possible, on idea per commit. A 10-commit PR is perfectly fine, one-liner commits are perfectly fine, a sequence of 3 commits doing 3 improvements on the same 10 lines is also fine. f4529ce , as a single idea, also works (though separating the interface changes and the implementation improvement would be even better). But comparing the removed code in 9404cfa with the fairly different implementation in `docker/archive/tar.go is difficult, and as you see I have missed some of the intentional changes you have made in that.

@cyphar
Copy link
Contributor Author

cyphar commented Nov 17, 2016

@mtrmac Sorry for not working on this recently (lots of fun exams). But I'm going to work on this next week (and over the weekend), basing it off your split-tarfile commit.

@mtrmac
Copy link
Collaborator

mtrmac commented Nov 23, 2016

FWIW I have updated https://github.com/mtrmac/image/tree/split-tarfile to apply on top of #165 ; it is still completely untested.

@cyphar
Copy link
Contributor Author

cyphar commented Nov 28, 2016

@mtrmac I've rebased on top of your changes. It does work but there's probably more stuff you want me to implement (especially with regards to the scoping code that I'm not sure I understand the interface for).

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty close.

The policy scoping based on otherwise completely ignored Docker reference, and making that mandatory, is probably the biggest issue. One possible way to solve this would be to make the reference optional: Then, tarfileImageSource could just be used by specifying a path; and tarfileImageDestination would either fail when a Docker reference is missing, or it could possibly create a tarfile which specifies no RepoTags. (This would be a bit difficult to use with Docker, but it would be perfectly fine as an input to docker-archive: again.)

WRT policy scoping, I’d suggest basing it on paths and subdirectories, exactly the way dirTransport does it (in ValidatePCS, accept absolute no-symlink paths; in NewReference, resolve to an absolute no-symlink path, etc.)

Or, if you really don’t want to care, I it would be OK to punt and do what daemonTransport does: the identity is "", no other scopes are permitted. This would still make it implement a more expressive scope design in the future, without limiting it in any way, so I guess there’s little point in being a pain about it.

Ultimately, due to the impact on signatures, the new archive+transport.go should have tests with as close to 100% coverage as possible; that can of course wait after the design has been fleshed out, and can probably be cobbled together by copy&pasting tests from the most similar existing transport.

"github.com/containers/image/types"
)

// Transport is an ImageTransport for Docker registry-hosted images.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is no longer precise,

type tarfileReference struct {
ref reference.NamedTagged
path string
tag string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If tag is always .ref.String(), it would probably be simpler to drop the duplicate tag member.

return nil, err
}
ref = reference.WithDefaultTag(ref)
refTagged := ref.(reference.NamedTagged)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ref can also be reference.Canonical, in which case this panics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't .WithDefaultTag force the type to be .NamedTagged?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at https://github.com/containers/image/blob/master/docker/reference/reference.go#L141 , no: a reference.Canonical is IsNameOnly and is returned unchanged.

// a hostname-qualified reference.
// See https://github.com/containers/image/issues/72 for a more detailed
// analysis and explanation.
refString := fmt.Sprintf("%s:%s", namedTaggedRef.FullName(), namedTaggedRef.Tag())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this moved into the docker-daemon:-specific part? AFAICT exactly the same compatibility concerns apply to docker-archive: as well.

The commit message says “make tagging work properly”, so presumably the original is broken in some way, but I am not sure how.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is wrong for docker-archive: transports, because the filename of a .tar archive is not related to what RepoTags you're going to export. There might be a way around it, but for now this code doesn't really belong in tarfile.Transport.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but refString is not the file name. That is set by giving a tarfile.Destination an io.Writer, in our case from fh, err := os.OpenFile(ref.path, ….

tarfile.Destination.repoTag is only used to set the RepoTags field inside the manifest in the exported tar file (which Docker uses to decide what tags, if any, to use for the docker load-ed tarball). This has, indeed, no relationship with the archive filename, so the filename:_referenceformat makes sense for specifyingRepoTag; but, _independently of that_, the formatting using FullNameis useful to make the projectatomic/docker (i.e. Fedora, CentOS, RHEL) versions update tag consistently withdocker pull`.

Now, in the case of docker-archive: it is not as obvious that the tag behavior should be consistent with docker pull, the way it is fairly clear for docker-daemon:, but, on balance, I still think it might be more useful to create tarballs which use hostname-qualified names (using the projectatomic/docker terminology). OTOH there is also a fairly strong argument that when the qualified/unqualified tags have different semantics, docker-archive: should just let the user use either of them without interfering.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now in docker/tarfile.

return tarfileReference{
ref: refTagged,
path: path,
tag: fmt.Sprintf("%s:%s", refTagged.Name(), refTagged.Tag()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to require a tag/Docker reference even when specifying a source, which completely ignores it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… and even more, does it make sense to base a PolicyConfigurationIdentity on this ignored value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we only support a single image inside a docker-archive, but there's no reason this can't be expanded upon in later versions. The tag references what image from that .tar you want to stream from.

If you still think it doesn't make sense we can drop it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, would it make sense to enforce that the tag must match? (I realize that it would be an extra code which is not adding new functionality right now, but it would also make sure that nobody writes incorrect shell scripts which would later be broken by adding enforcement.)

[Returning to this too late, I apologize if this no longer makes sense given other context.]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made it so that only destinations have tags. We can expand this later.

}

// tagOrDigest returns a tag or digest from the reference.
func (ref tarfileReference) tagOrDigest() (string, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unused.

@@ -0,0 +1,3 @@
// Package tarfile is an internal implementation detail of some transports.
// Do not use outside of the github.com/containers/image repo!
package tarfile
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

… so, I'd suggest moving the docker-archive: transport into a separate containers/image/docker/archive package.

(docker:, i.e. docker/docker_image_*, should also have been a separate subpackage, and eventually we’ll fix that).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough.

@@ -22,6 +23,7 @@ func init() {
for _, t := range []types.ImageTransport{
directory.Transport,
docker.Transport,
tarfile.Transport,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a smoke test for the naming to transports_test.go. Not strictly necessary.

@runcom
Copy link
Member

runcom commented Dec 1, 2016

@cyphar how is this working with Docker archives that contains multiple images? I can't find anything, could you point me to that? Cause Docker archives can contain multiple images specified in the repositories file.

needs a rebase though

@mtrmac
Copy link
Collaborator

mtrmac commented Dec 1, 2016

how is this working with Docker archives that contains multiple images?

It has inherited https://github.com/containers/image/pull/148/files#diff-e5badaf0bc3c9d10e866102bfd533457L236 / https://github.com/containers/image/pull/148/files#diff-7af67cd96906421aeee11932f166d1f2R180 from the docker-daemon: code, rejecting any multi-image tarballs.

(Because the reference format supports a ref:tag for destinations, sources could use this to choose a single image from a multi-image tarball. But that can definitely be a separate PR, if anyone needs that functionality at all; just insisting on a single-image tarball works perfectly fine.)

@runcom
Copy link
Member

runcom commented Dec 1, 2016

(Because the reference format supports a ref:tag for destinations, sources could use this to choose a single image from a multi-image tarball. But that can definitely be a separate PR, if anyone needs that functionality at all; just insisting on a single-image tarball works perfectly fine.)

fair enoiugh, worth creating an issue for multi images tarballs though as I think people will come to us asking for that

@cyphar
Copy link
Contributor Author

cyphar commented Dec 20, 2016

/me is rebasing. Lots of changes to docker-daemon so it's taking me a while. Sorry about the delay. I'll get to your points once I can actually build my branch on master again.

@cyphar
Copy link
Contributor Author

cyphar commented Dec 21, 2016

Alright. Tests aren't failing anymore so I can now go about implementing @mtrmac's requests.

@cyphar
Copy link
Contributor Author

cyphar commented Dec 24, 2016

@mtrmac I've covered all of your comments. The tagging has been fixed, so now docker-archive and docker-daemon produce identical repoTags. In fact, the only difference is that docker-archive requires a prefix of the path:. I also made the reference optional for sources (and will emit a warning because the code is not used at the moment).

PTAL. I will squash all of the "squashme" commits as soon as you're happy with the state.

@cyphar
Copy link
Contributor Author

cyphar commented Dec 27, 2016

Issues with OCI -> docker-archive conversion are fixed by #193.

giuseppe pushed a commit to giuseppe/image that referenced this pull request Jan 24, 2017
@cyphar
Copy link
Contributor Author

cyphar commented Feb 1, 2017

This is currently blocked on #193.

@cyphar
Copy link
Contributor Author

cyphar commented Feb 7, 2017

Actually this isn't necessarily blocked on #193. This can be merged now and then the #193 fixes can be applied later.

@mtrmac Can we please merge this. I'm getting very tired of having to rebase this on top of minor changes in docker/daemon (which then break rebasing and I have to manually review a bunch of changes).

@mtrmac
Copy link
Collaborator

mtrmac commented Feb 7, 2017

Actually this isn't necessarily blocked on #193. This can be merged now and then the #193 fixes can be applied later.

@mtrmac Can we please merge this. I'm getting very tired of having to rebase this on top of minor changes in docker/daemon (which then break rebasing and I have to manually review a bunch of changes).

ACK, #193 can be done separately. And yeah, I’ve been maintaining the split-tarfile branch as well and I’d like to stop :) Taking a look…

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, one more rebase, please, due to #221 . You might be able to use the rebased https://github.com/mtrmac/image/tree/split-tarfile .

I will write the necessary _transport.go tests now…


// A :tag was specified, which is only necessary for destinations.
if len(parts) == 2 {
ref, err := reference.ParseNamed(parts[1])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will as of #221 need to be reference.ParseNormalizedNamed.

if !isTagged {
// Really shouldn't be hit...
return nil, errors.Errorf("docker-archive reference %s has ':' but reference isn't tagged", refString)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this really shouldn’t be hit, because of the WithDefaultTag and because we now know !isDigest. The : in refString does not matter because we were parsing only parts[1].

It’s fine to say internal error: reference is not tagged even after reference.WithDefaultTag or something like that.

// It is STRONGLY recommended for the first element, if any, to be a prefix of PolicyConfigurationIdentity(),
// and each following element to be a prefix of the element preceding it.
func (ref archiveReference) PolicyConfigurationNamespaces() []string {
return policyconfiguration.DockerReferenceNamespaces(ref.destinationRef)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also needs a check for ref.destinationRef == nil.

(Alternatively, perhaps ignore destinationRef in here as well, and return "" in …Identity and []string{} in …Namespaces, at least for now; the policy is, at least currently, only used on sources, not destinations.)

func (ref archiveReference) NewImageDestination(ctx *types.SystemContext) (types.ImageDestination, error) {
if ref.destinationRef == nil {
return nil, errors.Errorf("docker-archive: destination reference not supplied (must be of form <path>:<reference:tag>)")
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check could be moved into newImageDestination, so that the check and the code which relies on that check is in the same file.

@@ -34,6 +34,8 @@ func TestImageNameHandling(t *testing.T) {
{"docker", "//busybox:notlatest", "//busybox:notlatest"}, // This also tests handling of multiple ":" characters
{"docker-daemon", "sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef", "sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"},
{"docker-daemon", "busybox:latest", "busybox:latest"},
{"docker-archive", "/var/lib/oci/busybox.tar:busybox:latest", "/var/lib/oci/busybox.tar:busybox:latest"},
{"docker-archive", "busybox.tar:busybox:latest", "busybox.tar:busybox:latest"},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I expect these test cases to fail now; either update them, or perhaps use FamiliarString in the StringWithinTransport method.)

if err := d.writer.Close(); err != nil {
return err
}
return nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The above 4 lines can be simplified to return d.writer.Close() but this is perfectly fine as well.)

func newImageDestination(ctx *types.SystemContext, ref archiveReference) (types.ImageDestination, error) {
fh, err := os.OpenFile(ref.path, os.O_WRONLY|os.O_EXCL|os.O_CREATE, 0644)
if err != nil {
// FIXME: It should be possible to modify archiges, but the only really
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/archiges/archives/

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As promised, tests for transport.go are available as (the last commit in) https://github.com/mtrmac/image/tree/docker-archive-tests . Do note the FIXMEs.

Below pointing out a few things the tests have uncovered.

src, err := newImageSource(ctx, ref)
if err != nil {
return nil, err
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error value returned by newImageSource is always nil; please drop that return value, and then we can drop these three untestable lines. (We can always re-add them if it became necessary in the future.)


// DeleteImage deletes the named image from the registry, if supported.
func (ref archiveReference) DeleteImage(ctx *types.SystemContext) error {
return os.Remove(ref.path)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this supposed to interact with the destinationRef? It seems that it would make sense to remove only the image indicated by destinationRef, not the whole multi-image tarball. I suppose we can’t easily do that now, which is fine—but then it might be better to just return an error in here instead of doing something which will become dangerous in the future.

_, isTagged := ref.(reference.NamedTagged)
_, isDigest := ref.(reference.Canonical)

if isDigest {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(These tests can be simplified to if _, ok := ref.(reference.Canonical); ok {… and the like, eliminating the two variables. But this works as well.)

// ParseReference converts a string, which should not start with the ImageTransport.Name prefix, into an Docker ImageReference.
func ParseReference(refString string) (types.ImageReference, error) {
parts := strings.SplitN(refString, ":", 2)
if len(parts) == 0 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is never true: parts will always contain at least an empty string.

@cyphar
Copy link
Contributor Author

cyphar commented Feb 17, 2017

I've rebased and fixed your comments @mtrmac. The build failures are for the same reasons discussed in #193.

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only minor comments now (most importantly it should be buildable :) )

Updated https://github.com/mtrmac/image/tree/docker-archive-tests , please add the last commit to this PR.

// SupportsSignatures returns an error (to be displayed to the user) if the destination certainly can't store signatures.
// Note: It is still possible for PutSignatures to fail if SupportsSignatures returns nil.
func (d *Destination) SupportsSignatures() error {
return errors.Errorf("Storing signatures for docker tarfile: destinations is not supported")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The transport: form in error messages usually refers to the transports.ParseImageName syntax. Please s/tarfile: destinations/tar files/, and similarly update the other tarfile: error m essages.

// DeleteImage deletes the named image from the registry, if supported.
func (ref archiveReference) DeleteImage(ctx *types.SystemContext) error {
// Not really supported, for safety reasons.
return nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then this should fail instead of pretending to succeed. Something like return errors.New("Deleting images not implemented for docker-archive: images")

if err := d.Destination.Commit(); err != nil {
return err
}
return d.writer.Close()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling io.Closer.Close() twice (once in Commit and once in Close) is undefined in general, and for an os.File it seems the second call would reliably fail. That’s fine for now, but if archiveImageDestination starts returning an error, this will need more logic to avoid a double close (~ the daemon_dest.go committed flag).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want, I can remove .Close from .Commit. IMO it's a bit silly for us to protect against a user of this library from doing both .Close and .Commit -- if they aren't meant to be mutually exclusive operations then they shouldn't both try to do the same thing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The .Commit() and .Close() aren’t meant to be mutually exclusive. Close() destroys temporary state like file handles and temporary files; Commit() tells the destination “now you have everything, you can mark the image as finished / start transmitting the result as a single unit”.

See how daemonImageDestination terminates its helper goroutine in Close() (to ensure that it never lingers around), but correctly closes the generated tar file (making it acceptable to dockerd so that the input isn’t all thrown away) in Commit().

The expected pattern for a writer is to do

dest, … := someRef.NewImageDestination()
defer dest.Close() // Or perhaps a more thorough variant with error propagation
// steps which may fail, e.g. reading input
… dest.PutBlob(…)
// other steps which may fail, including other writes to dest
// we get here only if everything has succeeded so far
err = dest.Commit()

i.e. the caller can do a blind defer dest.Close(). If the two were expected to be exclusive, the calller would have to do something like

succeeded = false
dest, … : someRef.NewImageDestination()
defer func() {
  if succeeded { dest.Commit() } else { dest.Close() }
}
// steps which may fail, e.g. reading input
… dest.PutBlob(…)
// other steps which may fail, including other writes to dest
// we get here only if everything has succeeded so far
succeeded = true

which could make the ImageDestination implementations simpler, but it would be notably more cumbersome for the caller. Especially because so many ImageDestination have no state to speak of and their .Close() is empty, it seems overall simpler to force the ImageDestination implementations—only those who care—to track success/failure, like daemonImageDestination.committed does, than to force all writers to track this.

Perhaps I should make the comments in types.go documenting ImageDestination and ImageDestination.Close more explicit about this?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed the d.writer.Close from Commit in docker/archive/dest because d.Destination.Commit will "save" the contents of the tar-file. But the actual closing of the file will happen in d.writer.Close.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need to change this later, I can make a PR for it. But I don't think the way it's done right now is going to output truncated tar files.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT the current way (tar.Close() in Commit(), writer.Close() in Close()) is fine.

// FIXME? We could be verifying the various character set and length restrictions
// from docker/distribution/reference.regexp.go, but other than that there
// are few semantically invalid strings.
return nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there are no namespaces defined, this should fail the way daemonTransport.ValidatePolicyConfigurationScope does.

// ParseReference converts a string, which should not start with the ImageTransport.Name prefix, into an Docker ImageReference.
func ParseReference(refString string) (types.ImageReference, error) {
parts := strings.SplitN(refString, ":", 2)
if refString == "" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(It’s not obvious to me that we need this check, an empty path will fail sooner or later, but if it exists, it seems cleaner to place it before the computation of parts. This does work though.)

return nil, errors.Errorf("internal error: reference is not tagged even after reference.TagNameOnly: %s", refString)
}

destinationRef = ref.(reference.NamedTagged)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for not noticing ealier, please do

refTagged, isTagged := ref.(reference.NamedTagged)
if !isTagged { /* fail */ }
destinationRef = refTagged

to make it absolutely that the conversion is valid; this way has the cast done two times and it may get out of sync.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand how a cast can get out of sync (variable types don't change randomly). But sure, I'll change this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This could get out of sync if the type check were modified for some reason without noticing that the destinationRef = ref.(reference.NamedTagged) cast depends on it. I agree the function is small enough that it was not terribly likely that the two pieces of code would get that far apart.

But when we can have the compiler do a 100%-reliable verification cheaply enough, it seems a waste not to take advantage of it.)


import (
"fmt"
"os"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docker/archive/transport.go:5: imported and not used: "os"

@runcom
Copy link
Member

runcom commented Apr 3, 2017

Ping @cyphar

@cyphar
Copy link
Contributor Author

cyphar commented Apr 4, 2017

Rebased and fixed most of @mtrmac's comments. I'll squash it all once it gets LGTM'd.

@runcom
Copy link
Member

runcom commented Apr 4, 2017

@mtrmac could you take another look at this?

tarfile.Source partially implements ImageSource reading from any regular
file, tarfile.Destination partially implements ImageDestination writing
to any stream.

The interactions with the Docker engine API remain in the docker-daemon:
transport.

Does not change behavior, apart from a few error messages now
generically talking about “docker tar files” instead of docker-daemon: .

[@cyphar: Rebased and re-did the splitting over several rebases.]

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
@cyphar
Copy link
Contributor Author

cyphar commented Apr 4, 2017

Rebased (again).

}

if inputInfo.Size == -1 { // Ouch, we need to stream the blob into a temporary file just to determine the size.
logrus.Debugf("docker tarfile: input with unknown size, streaming to disk first ...")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/tarfile:/tar file/, but in a debug message is absolutely NOT blocking.

return nil, errors.Errorf("internal error: reference is not tagged even after reference.TagNameOnly: %s", refString)
}

destinationRef = ref.(reference.NamedTagged)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This could get out of sync if the type check were modified for some reason without noticing that the destinationRef = ref.(reference.NamedTagged) cast depends on it. I agree the function is small enough that it was not terribly likely that the two pieces of code would get that far apart.

But when we can have the compiler do a 100%-reliable verification cheaply enough, it seems a waste not to take advantage of it.)

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK, thanks!

Please cherry-pick the last commit from (now updated) https://github.com/mtrmac/image/tree/docker-archive-tests , squash as needed (but please keep the “split tarfile code from docker-daemon code” part a separate commit) and I’ll merge.

// not required/guaranteed that it will be a valid input to Transport().ParseReference().
// Returns "" if configuration identities for these references are not supported.
func (ref archiveReference) PolicyConfigurationIdentity() string {
// Punt, the justification is similar to dockerReference.PolicyConfigurationIdentity.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/dockerReference/daemonReference/ I think.

cyphar and others added 2 commits April 8, 2017 18:51
This archive allows you to export a docker-load-friendly archive without
needing a Docker daemon to facilitate the exporting. Effectively this is
just an on-disk version of the docker-daemon transport (they both use
docker/tarfile).

Signed-off-by: Aleksa Sarai <asarai@suse.de>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Aleksa Sarai <asarai@suse.de>
@cyphar
Copy link
Contributor Author

cyphar commented Apr 8, 2017

Squashed + rebased. I can make a follow-up PR to clean up the non-blocking concerns you had.

@runcom
Copy link
Member

runcom commented Apr 8, 2017

LGTM assuming tests pass

Approved with PullApprove

@mtrmac
Copy link
Collaborator

mtrmac commented Apr 10, 2017

👍

Thanks!

Approved with PullApprove

@mtrmac mtrmac merged commit 0d8bb75 into containers:master Apr 10, 2017
@cyphar cyphar deleted the implement-docker-imagefile-transport branch April 10, 2017 14:43
@cyphar
Copy link
Contributor Author

cyphar commented Apr 10, 2017

🎉 Now to add the patches to skopeo. 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants