New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce the sync command #524

Open
wants to merge 3 commits into
base: master
from

Conversation

Projects
None yet
@flavio
Contributor

flavio commented Jul 13, 2018

The skopeo sync command can sync images between a SOURCE and a destination.

The purpose of this command is to assist with the mirroring of container images from different docker registries to a single docker registry.

Right now the following transport matrix is implemented:

  • docker:// -> docker://
  • docker:// -> dir:
  • dir: -> docker://

The dir: transport is supported to handle the use case of air-gapped environments.
In this context users can perform an initial sync on a trusted machine connected to the internet; that would be a docker:// -> dir: sync.
The target directory can be copied to a removable drive that can then be plugged into a node of the air-gapped environment. From there a dir: -> docker:// sync will import all the images into the registry serving the air-gapped environment.

The image namespace is changed during the docker:// to docker:// or dir: copy. The FQDN of the registry hosting the image will be added as new root namespace of the image. For example, the image registry.example.com/busybox:latest will be copied to registry.local.lan/registry.example.com/busybox:latest.

The image namespace is not changed when doing a
dir: -> docker:// sync operation.

The alteration of the image namespace is used to nicely scope images coming from different registries (the Docker Hub, quay.io, gcr, other registries). That allows all of them to be hosted on the same registry without incurring in clashes and making their origin explicit.

TODO

I hope you like this feature and the direction it's going. Once we agree on its final design we will update this PR to extend the current test suites.

Future work

Currently sync will keep adding missing content from SOURCE to DESTINATION. It would be nice to add a --delete flag to remove from DESTINATION contents that are no longer available inside of SOURCE. That would be a bit like rsync's --delete` flag.

If wanted, that should be addressed with a separate PR.

Signed-off-by: Flavio Castelli fcastelli@suse.com
Co-authored-by: Marco Vedovati mvedovati@suse.com

@runcom

This comment has been minimized.

Member

runcom commented Jul 13, 2018

@mtrmac PTAL as well

@flavio flavio force-pushed the SUSE:sync branch from 3ae6c66 to 65bf6bc Jul 13, 2018

@marcov marcov force-pushed the SUSE:sync branch from 8f4777c to 4133e72 Jul 20, 2018

@vrothberg

This comment has been minimized.

Member

vrothberg commented Aug 8, 2018

@runcom @rhatdan @mtrmac friendly ping. I'd love to see this feature in an upcoming Skopeo release.

@giuseppe

I've added some comments. @mtrmac PTAL

type sourceCfg map[string]registrySyncCfg
func newSourceConfig(yamlFile string) (cfg sourceCfg, err error) {

This comment has been minimized.

@giuseppe

giuseppe Aug 16, 2018

Member

extra empty line

return false, fmt.Errorf("Cannot find 'docker' transport type")
}
dirTransport := transports.Get("dir")

This comment has been minimized.

@giuseppe

giuseppe Aug 16, 2018

Member

could we add to validTransports only the ones that are present? i.e. if dirTransport is missing we just skip it and not add it to validTransports as anyway the docker -> docker case doesn't need it.

err)
}
} else {
return nil, fmt.Errorf("Error while checking existance of directory %s: %v",

This comment has been minimized.

@giuseppe

giuseppe Aug 16, 2018

Member

typo here: existance -> existence

When DESTINATION is a local directory one directory per 'image:tag' is going
to be created.
`),
ArgsUsage: "[--source SOURCE] [--source-file SOURCE-FILE] DESTINATION",

This comment has been minimized.

@giuseppe

giuseppe Aug 16, 2018

Member

I see why we have two different options, but it looks a bit unnatural to add a mandatory argument in this way.

Could it be like "[--source-yaml] SOURCE DESTINATION"

if --source-yaml is specified then SOURCE points to a YAML file.

Not ideal either, but in the simplest case you have skopeo sync docker://... docker://....

}
logrus.Warn("Registry disallows tag list retrieval; skipping")
}
}

This comment has been minimized.

@TomSweeneyRedHat

TomSweeneyRedHat Aug 16, 2018

Contributor

Maybe I need more tea, but should you return 'tags, nil' here and/or at line 152?

This comment has been minimized.

@TomSweeneyRedHat

TomSweeneyRedHat Aug 16, 2018

Contributor

Oh shoot, forgot golang deals with bare returns. This code is OK as is, but I'd prefer specific values being returned. I find it more readable. My $.02, your mileage may vary.

removeSignatures := c.Bool("remove-signatures")
//TODO: should we assume that's our default manifest type?
manifestType := manifest.DockerV2Schema2MediaType

This comment has been minimized.

@TomSweeneyRedHat

TomSweeneyRedHat Aug 16, 2018

Contributor

I think so, but I'd prefer a way to pass in a different variant. But I'd lean on @mtrmac or @runcom for the final word.

This comment has been minimized.

@mtrmac

mtrmac Aug 16, 2018

Collaborator

Is there a reason for “sync” to prescribe manifest types at all? Just preserve the original form by default, and/or let the existing autodetection of what a destination docker:// registry supports, and automatic conversions, work.

At least in the default configuration. I wouldn’t even add an option to set ForceManifestMIMEType at all unless there were a known an unavoidable need for it.

This comment has been minimized.

@TomSweeneyRedHat

TomSweeneyRedHat Aug 16, 2018

Contributor

My thinking was just to make the code more flexible if ever needed. @mtrmac if you think that's not necessary or the cost isn't worth it, I'm fine with that. Just preserving the manifest as you suggested makes sense to me too.

This comment has been minimized.

@mtrmac

mtrmac Aug 16, 2018

Collaborator

I’m weakly against exposing options just because the underlying code implements them: it takes a possibly unnecessary effort, and after the options are introduced, it’s difficult to both remove them and to gauge whether they have been ever necessary in the first place.

But in this case, I’m primarily saying that the default behavior should, unless there is a specific reason, not be hard-coding manifest types at all. The existing autodetection is expected to just work, and it is future-proof, unlike hard-coding a specific default which is certain to become obsolete at some point.

(eg: dir:///media/usb/).
SOURCE-FILE is a YAML file with a set of source images from different
docker registry. Local directory are not supported.

This comment has been minimized.

@TomSweeneyRedHat

TomSweeneyRedHat Aug 16, 2018

Contributor

Suggest:
SOURCE-FILE is a YAML file with a set of source images from a different

Then either:
docker registry. Local directory is not supported.

or
docker registry. Local directories are not supported.

I prefer the second.

Copy all the images from _source_ (or _source-file_) to _destination_.
Useful to keep in sync a local docker registry mirror. It can be used to populate also registries running inside of air-gapped environments.

This comment has been minimized.

@TomSweeneyRedHat

TomSweeneyRedHat Aug 16, 2018

Contributor

Suggest:
Useful to keep in sync with a local docker registry mirror. It can also be used to populate registries running inside of air-gapped environments.

_source_ can be either a repository hosted on a docker registry (eg: docker://docker.io/busybox) or a local directory (eg: dir:///media/usb/).
_source-file_ is a YAML file with a set of source images from different docker registry. Local directory are not supported.

This comment has been minimized.

@TomSweeneyRedHat

TomSweeneyRedHat Aug 16, 2018

Contributor

Suggest: "different docker" to "a different docker"

_destination_ can be either a docker registry (eg: docker://my-registry.local.lan) or a local directory (eg: dir:///media/usb).
When _destination_ is a local directory one directory per 'image:tag' is going to be created.

This comment has been minimized.

@TomSweeneyRedHat

TomSweeneyRedHat Aug 16, 2018

Contributor

Suggest:
"is going to be" to "will be"

**--src-cert-dir** _path_ Use certificates at _path_ (*.crt, *.cert, *.key) to connect to the source registry or daemon

This comment has been minimized.

@TomSweeneyRedHat

TomSweeneyRedHat Aug 16, 2018

Contributor

rm extra line.

images:
coreos/etcd:
- latest
```
# SEE ALSO
kpod-login(1), docker-login(1)

This comment has been minimized.

@TomSweeneyRedHat

TomSweeneyRedHat Aug 16, 2018

Contributor

I know this isn't part of this change, but this should be changed to :
"podman-login(1), docker-login(1)

Copy all the images from SOURCE to DESTINATION.
Useful to keep in sync a local docker registry mirror. Can be used

This comment has been minimized.

@rhatdan

rhatdan Aug 16, 2018

Member

s'/docker/container/g'

Useful to keep in sync a local docker registry mirror. Can be used
to populate also registries running inside of air-gapped environments.
SOURCE can be either a repository hosted on a docker registry

This comment has been minimized.

@rhatdan

rhatdan Aug 16, 2018

Member

s'/docker/container/g'

Skopeo sync will copy all the tags of an image when SOURCE uses the
'docker://' transport and no tag is specified.
DESTINATION can be either a docker registry

This comment has been minimized.

@rhatdan

rhatdan Aug 16, 2018

Member

s'/docker/container/g'

@rhatdan

This comment has been minimized.

Member

rhatdan commented Aug 16, 2018

I would like to see this a little less "docker' centric. Perhaps use some examples other then docker.io. registry.example.com or some such.

**skopeo sync** will copy all the tags of an image when _source_ uses the docker://' transport and no tag is specified.
_destination_ can be either a docker registry (eg: docker://my-registry.local.lan) or a local directory (eg: dir:///media/usb).

This comment has been minimized.

@mtrmac

mtrmac Aug 16, 2018

Collaborator

The PR introduction says

The image namespace is changed during the docker:// to docker:// or dir: copy. The FQDN of the registry hosting the image will be added as new root namespace of the image. For example, the image registry.example.com/busybox:latest will be copied to registry.local.lan/registry.example.com/busybox:latest

This inflexibly hard-codes policy, and seems entirely unnecessary: whatever script is used for the regular sync can just specify registry.local.lan/registry.example.com/busybox as the destination.

Also, this policy forces sync to invent an entirely new docker://hostname syntax, which is ambiguous with the image name syntax used everywhere else, and has a different meaning (the image docker://hostname == docker://docker.io/library/hostname:latest).

… and AFAICT it s not documented in this man page.

This comment has been minimized.

@marcov

marcov Aug 24, 2018

Contributor

Some kind of scoping is necessary when running sync from a YAML file, as you may be pulling the same image name from different namespaces or domains.

The reason to have this kind of scoping is that the original image location can be easily extrapolated, and this information is useful when running a server that is configured to mirror multiple registries.

Maybe the requirement can be dropped when source references to a docker:// location (and a flag can be added to make the current behaviour optional)

// if the tag is `latest` -> the tag has been automatically added -> no tag
// was specified by the user
return !strings.HasSuffix(reference.TagNameOnly(normName).String(), ":latest"), nil

This comment has been minimized.

@mtrmac

mtrmac Aug 16, 2018

Collaborator

(I’m sorry about not looking at this for so long. I do hope to make a proper review soonish.)

One general comment: podman and buildah have similarly built fair amount of code around manipulating the string forms of image names, and have been rather struggling to correctly handle the various corner cases and ambiguous forms of input.

I very strongly want to avoid introducing such fragility to skopeo.

As a very local case, this should not be using .String() and then HasSuffix to parse it, but an interface conformance test against reference.NamedTagged. [Or maybe something else entirely, to also handle digested references.]

Most of the string parsing and formatting in here looks, at a first glance, like code that should not be using strings (though I haven’t read enough of the PR to be know that it is actually avoidable, to be fair.)

This comment has been minimized.

@marcov

marcov Aug 24, 2018

Contributor

Thanks, it turned out that most the raw strings manipulation could be avoided by rewriting some parts of the code and making use of containers/image libraries.

@marcov marcov force-pushed the SUSE:sync branch 4 times, most recently from 30faa1e to 703b644 Aug 24, 2018

@rhatdan

This comment has been minimized.

Member

rhatdan commented Sep 21, 2018

@mtrmac @runcom Can we merge this?

@mtrmac

This comment has been minimized.

Collaborator

mtrmac commented Sep 26, 2018

I’m afraid I still haven’t actually reviewed this. My fault entirely.

@caiobegotti

This comment has been minimized.

caiobegotti commented Sep 27, 2018

Hi guys, I am currently writing a bunch of scripts to do exactly that for our Kubernetes cluster running on an air-gapped place, then I discovered this PR 👍 ...any chance of getting this reviewed or even merged soon?

@chilicat

This comment has been minimized.

chilicat commented Oct 3, 2018

The image namespace is changed during the docker:// to docker:// or dir: copy. The FQDN of the registry hosting the image will be added as new root namespace of the image. For example, the image
registry.example.com/busybox:latest will be copied to
registry.local.lan/registry.example.com/busybox:latest.

I would prefer to not have the original registry name as namespace in the destination registry. The source registry might be a internal registry where the name changes over time or where we want to hide the actual name on the customer side.

Maybe that is something which could be configurable?

@flavio

This comment has been minimized.

Contributor

flavio commented Oct 4, 2018

I would prefer to not have the original registry name as namespace in the destination registry. The
source registry might be a internal registry where the name changes over time or where we want to
hide the actual name on the customer side.

Maybe that is something which could be configurable?

We could add a --dest-prefix flag to allow something like that:

skopeo sync --dest-prefix acme-inc docker://wip-registry.lan/busybox docker://example.com 

That would copy all the busybox images from wip-registry.lan/busybox to example.com/acme-inc/busybox.

We should also cover this option inside of the source yaml file. Let's take this source.yaml as an example:

docker.io:
    images:
        busybox: []
wip-registry:
    dest-prefix: acme-inc
    images:
        coreos/etcd:
            - latest

Let's assume we run this command:

skopeo sync --source-yaml source.yaml docker://target.com

By this point, inside of the target.com registry, we would have the following images:

  • docker.io/busybox -> docker pull target.com/docker.io/busybox
  • acme-inc/coreos/etcd -> docker pull target.com/acme-inc/coreos/etcd

Would that work with you?

How should we proceed if everybody agrees with this design? Should we get this PR merged and then implement this new flag via a new PR (to limit the amount of code to be reviewed)?

@marcov marcov force-pushed the SUSE:sync branch from a89d5fc to 5b89bc0 Oct 4, 2018

@marcov

This comment has been minimized.

Contributor

marcov commented Oct 4, 2018

Rebased and added a few integration tests for sync. Run with: make test-integration TESTFLAGS="-check.f Sync"

@vrothberg

This comment has been minimized.

Member

vrothberg commented Nov 5, 2018

@marcov, could you rebase this change on master?

@containers/image-maintainers, can we revive this PR? We're forced to run this downstream but would prefer to have a non-patched skopeo :)

If **--source-yaml** is specified, then _source_ points to a YAML file with a list of source images from different container registries (local directories are not supported).
When the source location is a container registry and no tags are specifieds, **skopeo sync** will copy all the tags associated to the source image.

This comment has been minimized.

@muff1nman

muff1nman Nov 5, 2018

Small typo here specifieds

This comment has been minimized.

@marcov

marcov Nov 5, 2018

Contributor

@muff1nman thanks for the review!

@marcov marcov force-pushed the SUSE:sync branch 3 times, most recently from 4ccf14d to 560d4ef Nov 5, 2018

@marcov

This comment has been minimized.

Contributor

marcov commented Nov 5, 2018

@vrothberg Rebased, and also (hopefully) fixed a problem with integration tests failing when repeating the setup of a openshift cluster.

},
cli.BoolFlag{
Name: "source-yaml",
Usage: "Interpret SOURCE as a YAML file with a list of images from different container regitries",

This comment has been minimized.

@muff1nman

muff1nman Nov 5, 2018

Typo with regitries

@marcov marcov force-pushed the SUSE:sync branch 4 times, most recently from 498ed73 to 4bf0ddb Nov 5, 2018

@marcov marcov force-pushed the SUSE:sync branch 6 times, most recently from 63efc4c to 0e445e8 Nov 13, 2018

flavio and others added some commits Jul 9, 2018

Introduce the sync command
The skopeo sync command can sync images between a SOURCE and a
destination.

The purpose of this command is to assist with the mirroring of
container images from different docker registries to a single
docker registry.

Right now the following transport matrix is implemented:

  * `docker://` -> `docker://`
  * `docker://` -> `dir:`
  * `dir:` -> `docker://`

The `dir:` transport is supported to handle the use case
of air-gapped environments.
In this context users can perform an initial sync on a trusted machine connected
to the internet; that would be a `docker://` -> `dir:` sync.
The target directory can be copied to a removable drive that can then be plugged
into a node of the air-gapped environment. From there a `dir:` -> `docker://`
sync will import all the images into the registry serving
the air-gapped environment.

The image namespace is changed during the  `docker://`
to `docker://` or `dir:` copy. The FQDN of the registry
hosting the image will be added as new root namespace of
the image. For example, the image `registry.example.com/busybox:latest`
will be copied to
`registry.local.lan/registry.example.com/busybox:latest`.

The image namespace is not changed when doing a
`dir:` -> `docker://` sync operation.

The alteration of the image namespace is used to nicely scope images
coming from different registries (the Docker Hub, quay.io, gcr,
other registries). That allows all of them to be hosted on the same
registry without incurring in clashes and making their origin explicit.

Signed-off-by: Flavio Castelli <fcastelli@suse.com>
Co-authored-by: Marco Vedovati <mvedovati@suse.com>
openshift cluster: remove .docker directory on teardown
Remove the $HOME/.docker directory when tearing down a cluster,
so that subsequent cluster creations can be carried out successfully.

Signed-off-by: Marco Vedovati <mvedovati@suse.com>
Add sync integration tests
Add sync integration tests, covering all the possible combinations of
SOURCE,DEST.

Signed-off-by: Marco Vedovati <mvedovati@suse.com>

@marcov marcov force-pushed the SUSE:sync branch from 0e445e8 to 2922575 Nov 15, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment