Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support layer deltas #902

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

alexlarsson
Copy link
Contributor

Deltas are a way to avoid downloading a full copy if a layer tar file
if you have a previous version of the layer available locally. In
testing these deltas have been shown to be around 10x to 100x smaller
than the .tar.gz files for typical Linux base images.

In the typical client-side case we have some previous version of the
image stored in container-storage somewhere, which means that we have
an uncompressed files available, but not the actual tarball
(compressed or not).

This means we can use github.com/alexlarsson/tar-diff which takes
two tar files and produces a delta file which when applied on the
untar:ed content of the first tarfile produces the (bitwise identical)
content of the uncompressed second tarfile. It just happens that the
uncompressed tarfile is exactly what we need to reproduce, because that
is how the layers are refered to in the image config (the DiffIDs).

How this works is that we use OCI artifacts to store, for each regular
image a manifest with information about the available deltas for the
image. This image looks like a regular manifest, except each layer
contains a tar-diff (as a blob) an uses the existing annotations key
to record which DiffIDs the layer applies to.

For example, a manifest would look like this:

{
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:ca3d163bab055381827226140568f3bef7eaac187cebd76878e0b63e9e442356",
    "size": 3
  },
  "layers": [
    {
      "mediaType": "application/vnd.redhat.tardiff",
      "digest":
"sha256:49402288de20a465616174a38aca4746f46be2c3f9519fe4d14fc7f83f44a32a",
      "size": 7059734,
      "annotations": {
          "com.redhat.deltaFrom":
"sha256:b9137868142acd7ce4d62216e2b03e63e9800e2b647bf682492d3e9c5e66277c",
          "com.redhat.deltaTo":
"sha256:c88d2d437799c2879fded33ee358429e1eb954968a25f3153e2e0e26fef7ef28"
      }
    }
  ]
}

The config blob is just an json file containing "{}". Ideally it
should not be of type application/vnd.oci.image.config.v1+json,
because that is reserved for docker-style images. However, as
explained in oras-project/oras#129, docker hub
doesn't currently support any other type. For registries that support
OCI artifacts we should instead use some other type so that tooling
can know that this is not a regular image.

The way we attach the delta manifest to the image is that we store it
in the same repo and a tag name based on the manifest digest like
"delta-${shortid}".

The delta layers record which DiffID they apply to, which is what we
want to use to look up the pre-existing layers to use as delta source
material, and it is what the delta apply will generate. This means
however that using the deltas only works if we're allowed to
substitute blobs, but this doesn't seem to be an issue in the typical
case.

@alexlarsson
Copy link
Contributor Author

To test this I've been using "skopeo copy", it works fine if you just
use the new version of containers/image. However, to generate deltas I
have a branch that supports the generate-delta command here:

https://github.com/alexlarsson/skopeo/tree/wip/deltas

I also create a set of test images on docker hub:
https://hub.docker.com/repository/registry-1.docker.io/alexl/fedoratest

This has a few different versions of fedora 32, including deltas between then, which were created like this:

$ skopeo generate-delta docker://alexl/fedoratest:32-2019-09-27 docker://alexl/fedoratest:32-2019-08-26
Generating delta for layer sha256:c88d2d437799c2879fded33ee358429e1eb954968a25f3153e2e0e26fef7ef28 from sha256:b9137868142acd7ce4d62216e2b03e63e9800e2b647bf682492d3e9c5e66277c
Downloading 'from' blob sha256:ab8d40319bf2ab4567c7e7cfa69aca0983e443e96ee67afc4e45f19aba0d07d1
Downloading 'to' blob sha256:a39edc9e7bc3a586926c94144a8c7ebc83dbfaa17c2a60f4ad56df7066cba285
Generating delta
Generated delta size 6.7 MB, original layer 66.5 MB
Uploading delta layer sha256:49402288de20a465616174a38aca4746f46be2c3f9519fe4d14fc7f83f44a32a
Uploading delta manifest

As an example of using this is:

# First a full image
$ skopeo copy docker://alexl/fedoratest:32-2019-08-26 containers-storage:fedoratest:32-2019-08-26
Getting image source signatures
Copying blob ab8d40319bf2 [============>-------------------------] 23.2MiB / 66.3MiB
Copying blob ab8d40319bf2 [====================================>-] 64.3MiB / 66.3MiB
Copying blob ab8d40319bf2 done
Copying config 5895d009c5 done
Writing manifest to image destination
Storing signatures
# Now one with delta
$ skopeo copy docker://alexl/fedoratest:32-2019-09-27 containers-storage:fedoratest:32-2019-09-27
Getting image source signatures
Copying delta 49402288de2 [--------------------------------------] 0.0b / 6.7MiB
Copying delta 49402288de2 done
Copying config e13031c001 done
Writing manifest to image destination
Storing signatures

There are some outstanding questions.

I put tar-diff on my github. Should we move it to the containers
organization?

The names of the media type and annotations now contain "redhat",
maybe we should try to register some more independent names for these.

The use of tags for the deltas makes it necessary to manually garbage
collect deltas. Is there some other way we could refer to them from an
image? I would like to avoid having to modify the image itself when
adding the deltas, because doing so changes the digest of it and may
invalidate things refering to it. Maybe we could use image lists? I'm
not sure how they are used in practice, should they also avoid
changing in the same way as the actual image manifests?

Also, currently the code is always using a regular OCI image config
media type for the config blob in delta manifest. It is more typical
in OCI artifact use to have a custom media type for these so that
regular container tools can filter out these images. However, docker
hub doesn't currently support such media types so for now I'm using
the old one so it can be demoed easily. We can easily try a custom
media type and use that if it works and fall back otherwise.

@alexlarsson
Copy link
Contributor Author

Force-pushed with sign-off and adding the test scaffolding for the new interface methods.

@vrothberg
Copy link
Member

Hi @alexlarsson, thanks! I am quite excited about your work.

At the moment, I am deeply buried in Podman work, so I'll have a hard time to find enough time to review such a fairly complex PR. I'll try to allocate a couple of hours next week.

@giuseppe @mtrmac @nalind PTAL

CC @fatherlinux @mrunalp @rhatdan as it's super rad.

@alexlarsson
Copy link
Contributor Author

Also, @josephschorr, you might want to take a look at this as you're involved with both quay and OCI artifacts which are related to this.

@alexlarsson
Copy link
Contributor Author

The use of tags for the deltas makes it necessary to manually garbage
collect deltas. Is there some other way we could refer to them from an
image? I would like to avoid having to modify the image itself when
adding the deltas, because doing so changes the digest of it and may
invalidate things refering to it. Maybe we could use image lists? I'm
not sure how they are used in practice, should they also avoid
changing in the same way as the actual image manifests?

Maybe we can use a single tag deltas which maps to a image index, where each image in the index refers to a delta manifest, specifying which using the annotation fields of the index.manifests array elements. That way we only have one tag for deltas, simplifying delta garbage collection. We would still need to remove unnecessary deltas from the index, but it seems more localized.

@alexlarsson
Copy link
Contributor Author

Also, CC @owtaylor

@giuseppe
Copy link
Member

CC @vbatts

@giuseppe
Copy link
Member

ideally it should be part of the OCI specs, but since it is not a breaking change and if there are no deltas we just fallback to the existing code path, we could probably carry it without blocking on OCI.

@vbatts is https://github.com/opencontainers/artifacts the proper place to discuss and develop such stuff?

@alexlarsson
Copy link
Contributor Author

The artifacts spec is more about detailing how in general you would store something other than pure docker images on a registry, they are not involved in standardizing the format of individual artifacts (other than possibly to ratify/list known media types).

If anything this should be related to the image spec itself, but as you say it is quite independent of it and it doesn't necessarily have to be affected at all by this, which is nice as getting such changes in would be a very slow work...

@alexlarsson
Copy link
Contributor Author

I pushed a new commit (which hasn't shown up here yet, whats up with github?) that changes it to use a single branch "deltas" which has an OCI index and then for each image in that a com.redhat.deltaTarget annotation that specifies the manifest id it is for.

The one issue with this is that updating the deltas tag seems a bit racy. We download the old one, update it and put it back, and if someone else does this at the same time we would lose one of the updates. Is there any way to do atomic updates of oci images? Like, set the tag to this new blob, but fail if the current version is not $digest. Then we could loop it until we successfully updated it.

@alexlarsson alexlarsson force-pushed the wip/deltas branch 2 times, most recently from 2d26699 to 1f8944d Compare April 23, 2020 14:39
Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

I need to handle a few bugs in the coming days, so for now here just a few quick thoughts on things that caught my eye.

// It may use a remote (= slow) service.
// If instanceDigest is not nil, it contains a digest of the specific manifest instance to retrieve deltas for (when the primary manifest is a manifest list);
// this never happens if the primary manifest is not a manifest list (e.g. if the source never returns manifest lists).
GetDeltaManifest(ctx context.Context, instanceDigest *digest.Digest) ([]byte, string, error)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, an API break.

I suppose we’ll now have to figure out how to extend these interfaces without breaking API, it had to come one day.

(Let’s figure out the right approach first and deal with the API later, just highlighting that this will have to happen.)

types/types.go Outdated Show resolved Hide resolved
types/types.go Outdated Show resolved Hide resolved
copy/copy.go Outdated
var matchingLayers []*types.BlobInfo
for i := range deltaLayers {
deltaLayer := &deltaLayers[i]
to := deltaLayer.Annotations["com.redhat.deltaTo"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The annotation name, and the concept of “RootFS”, feels like something that should be a part of image.genericManifest/types.Image instead of assumed to be generally applicable. I’m not immediately sure how practical that is.

@josephschorr
Copy link

Also, @josephschorr, you might want to take a look at this as you're involved with both quay and OCI artifacts which are related to this.

@alexlarsson The referenced blobs in the deltaFrom... from where are they being served? Are these annotations the only reference to them?

@josephschorr
Copy link

Then we could loop it until we successfully updated it.

Gah! Please don't design calls that need to loop against registries!

If you need atomic updates, then it likely needs to be done as an extension to the registry protocol.

@alexlarsson
Copy link
Contributor Author

Also, @josephschorr, you might want to take a look at this as you're involved with both quay and OCI artifacts which are related to this.

@alexlarsson The referenced blobs in the deltaFrom... from where are they being served? Are these annotations the only reference to them?

The "from" blobs need not be served at all. The setup is like this:

  • Tag latest points to sha256:AAA, which has an uncompressed id sha256:aaa
  • Tag deltas points to an image list, which points to the delta manifest
  • The delta mainfest points to the actual blobs that are tardiffs
  • The delta manifest has a key from: sha256:bbb, and to: sha256:aaa

Here, sha256:bbb need not be served from the registry at all, instead what is important is that the client may have it available locally (for example if he pulled a previous version of the latest tag. If so, he can use this data with the tardiff to reprocude the sha256:aaa layer

@alexlarsson
Copy link
Contributor Author

Then we could loop it until we successfully updated it.

Gah! Please don't design calls that need to loop against registries!

If you need atomic updates, then it likely needs to be done as an extension to the registry protocol.

It seems like a standard way to do an atomic update though. You get the current version of the index, update it locally, and the PUT it with an If-Match header containing the current ETAG. The PUT will then succeed only if (atomically) the current version of the tag is the same as the one in the header. If it is not it will fail, and we loop.

Its not like we're busy looping, we will only loop in the case where there is an actual conflict.

@josephschorr
Copy link

Its not like we're busy looping, we will only loop in the case where there is an actual conflict.

It could, however, result in a lot of clients looping because they are all hitting against one another. This is a real concern with any design consideration for an API that will likely be used by millions of clients.

The config blob is just an json file containing "{}"

Yeah, be aware Quay will (correctly) fail to validate this.

Also, it feels like if this "delta" manifest is a manifest representing the delta, then the information about from and to should, in theory, be contained in the config (it could be duplicated in the annotations, I guess)

We can easily try a custom media type and use that if it works and fall back otherwise.

Please don't fallback. If you do, someone is invariably going to push the content with the "wrong" media type to a registry that does not do validation, then they'll try to mirror that content into a registry that does do validation, it will fail, and the mirroring registry will get blamed. We should not be abusing the content types simply because registries do not yet support artifacts.

The use of tags for the deltas makes it necessary to manually garbage
collect deltas. Is there some other way we could refer to them from an
image? I would like to avoid having to modify the image itself when
adding the deltas, because doing so changes the digest of it and may
invalidate things refering to it. Maybe we could use image lists? I'm
not sure how they are used in practice, should they also avoid
changing in the same way as the actual image manifests?

I think using an OCI Index here is the way to go: The index could contain the "normal" manifest as an entry, as well as the delta manifest as a secondary entry. That way, the delta+normal manifest can be tagged as a unit, and it'll also hold the references to things as a single GCable unit. If a new delta needs to be generated, the previous should be contained in the new index as well (up to some limit, after which point you could compact the deltas or somesuch).

@mtrmac
Copy link
Collaborator

mtrmac commented Apr 23, 2020

The use of tags for the deltas makes it necessary to manually garbage
collect deltas. Is there some other way we could refer to them from an
image? I would like to avoid having to modify the image itself when
adding the deltas, because doing so changes the digest of it and may
invalidate things refering to it. Maybe we could use image lists? I'm
not sure how they are used in practice, should they also avoid
changing in the same way as the actual image manifests?

I think using an OCI Index here is the way to go: The index could contain the "normal" manifest as an entry, as well as the delta manifest as a secondary entry. That way, the delta+normal manifest can be tagged as a unit, and it'll also hold the references to things as a single GCable unit.

(That does not quite work in full generality; OCI in theory supports many possible shapes of index trees, but in practice clients are not going to be interperable by doing a recursive search and combining the metadata from various levels of the tree using completely undefined criteria. There are going to be more restrictive “profiles” of the standard.

As far as c/image is concerned, OCI use mirrors the Docker schema2 use: index ~ manifest list, a single-level lookup for multi-arch images. Sure, adding delta pseudo-architectures or something like that to the existing single level of an OCI index would be possible, and it would contain registry-recognizable references to deltas.)

But…


Ultimately “garbage collection” is going to be problematic either way. If the registry supports deletion at all (many don’t), it won’t by default delete untagged manifests (because digest references exist, and if nothing else there is a race between looking up an index by tag and reading the component manifest by digest vs. deleting the index tag). So, manifests that reference deltas are going to keep piling up and prevent deletion of the layer deltas, unless maybe the user explicitly opts in into deleting untagged manifests, and we shouldn’t assume such an opt in.

So:

  • Never refer to any component of deltas by digest from the primary images (because primary images can be referred to by digest, and are ~never deleted). So, no reference to delta blobs in primary manifests, no reference to delta manifests in primary indices.
  • By implication, create some other way to look up the current set of deltas. Because “the current set of deltas” for a static primary image can change over time, that probably implies a tag, separate from the tag of the primary image.
  • Make delta manifests/indices recognizable to the registry, so that it can treat older versions of the manifests, if untagged, as garbage to be collected.

@owtaylor
Copy link
Contributor

owtaylor commented Apr 23, 2020

It could, however, result in a lot of clients looping because they are all hitting against one another. This is a real concern with any design consideration for an API that will likely be used by millions of clients.

It doesn't seem likely to me to have millions, or a handful constantly updating a single repository. Two clients trying to update a single repository at once is a rare situation - and the registry load of an extra rejected HTTP request - trivial.

I think using an OCI Index here is the way to go: The index could contain the "normal" manifest as an entry, as well as the delta manifest as a secondary entry. That way, the delta+normal manifest can be tagged as a unit, and it'll also hold the references to things as a single GCable unit. If a new delta needs to be generated, the previous should be contained in the new index as well (up to some limit, after which point you could compact the deltas or somesuch).

If I understand you, you are suggestion that the deltas appear in the image index that tags point to - latest points to an OCI index, that has (presumably) both the different arch images, and also the delta manifests for those images. This would mean that if you were telling someone to pull a version by digest instead of tag (install foo@sha256:<....>) the digest is the image index digest and dependent on the set of deltas. Deltas basically become part of the bundle you are pushing to the registry.

This isn't ideal for a couple of reasons:

  • the set of deltas that you want to include depends on context where you are publishing - you don't care what versions you have internally, you want deltas from the versions that have appeared in your public repository.
  • you might want to remove deltas archived image versions - there's no reason to keep deltas that say how to go to an old version from a yet-older version.

Maintenance of deltas might also be also be something that is offered as a service on some registries - a user could define a policy (deltas from the last 5 versions of the 'latest' tag) and let the registry or an external service handle creating the deltas. In this case, you wouldn't want the registry creating a new index and changing where the tag points.

Those are reasons why we have generally thinking of deltas as being side-car information to an image rather than being part of the tagged image/image-manifest.

@alexlarsson
Copy link
Contributor Author

Its not like we're busy looping, we will only loop in the case where there is an actual conflict.

It could, however, result in a lot of clients looping because they are all hitting against one another. This is a real concern with any design consideration for an API that will likely be used by millions of clients.

The way I see it there are three ways to do an atomic update API

  • Atomic compare and swap (what i proposed)
  • Locking
  • Adding server side support for updating the index

The last one seems vastly more complex to even specify. I think CAS would perform better in the non-contended case, and locking if you have contention. However, I doubt you'll actually see contention in updating a specific tag, so imho CAS is the better choice. But anyway that is beyond this discussion.

The config blob is just an json file containing "{}"

Yeah, be aware Quay will (correctly) fail to validate this.

If we claim it is of type application/vnd.oci.image.config.v1+json, yes. But what if we give it a different type? Does quay support that?

Also, it feels like if this "delta" manifest is a manifest representing the delta, then the information about from and to should, in theory, be contained in the config (it could be duplicated in the annotations, I guess)

It could easily be in the config (only, or in both), and maybe that is conceptually more correct. However, in practice all that means is that we have to load and update two files (manifest and config) instead of just one.

We can easily try a custom media type and use that if it works and fall back otherwise.

Please don't fallback. If you do, someone is invariably going to push the content with the "wrong" media type to a registry that does not do validation, then they'll try to mirror that content into a registry that does do validation, it will fail, and the mirroring registry will get blamed. We should not be abusing the content types simply because registries do not yet support artifacts.

What is the current status of artifacts in quay? I'm working on this because redhat uses OCI instead of ofstree for the fedora flatpak runtime and the performance (vs ostree) is horrible due to the lack of deltas. Lots of people are complaining about this, and if the solution is going to have to wait until everyone implements oci artifacts that may be a problem.

I think using an OCI Index here is the way to go: The index could contain the "normal" manifest as an entry, as well as the delta manifest as a secondary entry. That way, the delta+normal manifest can be tagged as a unit, and it'll also hold the references to things as a single GCable unit. If a new delta needs to be generated, the previous should be contained in the new index as well (up to some limit, after which point you could compact the deltas or somesuch).

Yeah, this would definately work. I'm not really aware of how oci indexes are used in the wild. In ostree, a delta is a completely independent optimization and doesn't affect the digest of the content (which in ostree parlance would be the commit id).

So, If at some point we upload a version of an image the tag will point to some digest, and then later we add a delta, this will change the digest of the index that the tag points to (even thought the final image digest is the same). Is this a problem to typical OCI use? (I honestly don't know)

@mtrmac
Copy link
Collaborator

mtrmac commented Apr 23, 2020

So, If at some point we upload a version of an image the tag will point to some digest, and then later we add a delta, this will change the digest of the index that the tag points to (even thought the final image digest is the same). Is this a problem to typical OCI use? (I honestly don't know)

Not a breaking problem, but something to consider: Cluster orchestrators like OpenShift may well resolve tags to digests “early”, so that multi-node software deployments all use exactly the same image even if the tag moves. In that case, deltas added “later” would not be found by nodes that are pinned to a specific digest.

@alexlarsson
Copy link
Contributor Author

So, If at some point we upload a version of an image the tag will point to some digest, and then later we add a delta, this will change the digest of the index that the tag points to (even thought the final image digest is the same). Is this a problem to typical OCI use? (I honestly don't know)

Not a breaking problem, but something to consider: Cluster orchestrators like OpenShift may well resolve tags to digests “early”, so that multi-node software deployments all use exactly the same image even if the tag moves. In that case, deltas added “later” would not be found by nodes that are pinned to a specific digest.

Yeah, this seems bad to me, deltas are an independent transport thing an not part of the image itself. The original images are reconstructed perfectly, so nobody need to even be aware of the deltas. So, I think considering them a sidecar thing is the correct thing to do.

Additionally, as mentioned before, referencing the deltas from the main manifests makes it very hard to later remove deltas, and deltas really only are useful for the latest versions of things. You rarely pull older versions, so its not worth wasting space on them.

@alexlarsson
Copy link
Contributor Author

Ultimately “garbage collection” is going to be problematic either way. If the registry supports deletion at all (many don’t), it won’t by default delete untagged manifests (because digest references exist, and if nothing else there is a race between looking up an index by tag and reading the component manifest by digest vs. deleting the index tag). So, manifests that reference deltas are going to keep piling up and prevent deletion of the layer deltas, unless maybe the user explicitly opts in into deleting untagged manifests, and we shouldn’t assume such an opt in.

What we could do is have the deltas tag (which points to an index of all current delta manifests) be the only thing that refers to delta manifests (and indirectly delta blobs). Then when we update this tag we could explicitly delete the old manifest version (and free the blobs for GC). Is that possible?

@josephschorr
Copy link

josephschorr commented Apr 23, 2020

it won’t by default delete untagged manifests

@mtrmac We absolutely do delete untagged manifests. In fact, in Quay, if you have a manifest that is not referenced by a tag (either directly or indirectly via a manifest list), the manifest immediately becomes unpullable and will eventually be GCed.

OCI use mirrors the Docker schema2 use

@mtrmac Based on numerous discussions, it seems that this will not last once we have artifacts supported. Many individuals I've spoke to have stated they intend to place, for example, signatures at the top level along images, or source references, or other even Helm charts, etc, all found in the same index, at the same level; client tooling would then choose to make use of whatever it supports.

It doesn't seem likely to me to have millions, or a handful constantly updating a single repository. Two clients trying to update a single repository at once is a rare situation - and the registry load of an extra rejected HTTP request - trivial.

@owtaylor Sure... until you have automated tooling hitting 10,000 repositories on the same multi-tenant registry service, backed by the same database: then locking across indexes can become a major concern. We know this, because this has already been a concern for Quay, and I've argued strenuously against polling-based APIs as a result.

Deltas basically become part of the bundle you are pushing to the registry.

@owtaylor Yes, since they represent an alternative "view" of the same manifest layers. Otherwise, if we just use another specially-named tag, there is the major concern about overwriting, consistency and scaling.

Basically, if I were generating deltas for a manifest, it would make sense to pull that manifest, compute deltas, then push the manifest back with the deltas "added", for lack of a better term.

But what if we give it a different type? Does quay support that?
What is the current status of artifacts in quay?

@alexlarsson Experimentally, but not in production today. Artifacts spec is, after all, not yet standardized (heck, neither is OCI distribution, for that matter)

However, in practice all that means is that we have to load and update two files (manifest and config) instead of just one.

@alexlarsson Yeah, hence why I said its better spec-wise, but not a requirement in my mind.

deltas are an independent transport thing an not part of the image itself

So, ultimately, I think this discussion is showing why storing deltas-as-artifacts is conceptually not appealing. If the intention is to have this delta support be generic, then we should not be using deltas-in-manifests at all, but rather creating an extension to the OCI image distribution protocol that provides delta computation automatically based on the API call's parameters. That would have the benefit of working for all existing manifests, it would allow for far less storage of duplicative information, and it could be added to existing registries without too much overhead; any registry that didn't want to support it, simply wouldn't.

@mtrmac
Copy link
Collaborator

mtrmac commented Apr 23, 2020

it won’t by default delete untagged manifests

@mtrmac We absolutely do delete untagged manifests. In fact, in Quay, if you have a manifest that is not referenced by a tag (either directly or indirectly via a manifest list), the manifest immediately becomes unpullable and will eventually be GCed.

That’s good to know; still, docker/distribution doesn’t by default (although there is an option), so the design has to account for that.

OCI use mirrors the Docker schema2 use

@mtrmac Based on numerous discussions, it seems that this will not last once we have artifacts supported. Many individuals I've spoke to have stated they intend to place, for example, signatures at the top level along images, or source references, or other even Helm charts, etc, all found in the same index, at the same level; client tooling would then choose to make use of whatever it supports

Sure; c/image is a library for handling container images, not arbitrary payloads (and deltas of this kind are clearly not applicable to arbitrary payloads). It may well not make any sense to support Helm charts in that codebase. It all depends on how much they end up having in common and what contributors are interested in.

If the intention is to have this delta support be generic, then we should not be using deltas-in-manifests at all, but rather creating an extension to the OCI image distribution protocol that provides delta computation automatically based on the API call's parameters. That would have the benefit of working for all existing manifests, it would allow for far less storage of duplicative information, and it could be added to existing registries without too much overhead; any registry that didn't want to support it, simply wouldn't.

It doesn’t make much sense to me to make delta computation a registry responsibility: the publisher of the image is at least as much invested in making images quick to pull, and can invest the resources (delta creation is fairly resource-intensive) instead of the registry. Registries computing deltas completely on-demand seems fairly unlikely given the resource requirements anyway. (Registries may want to publish deltas even if the image publisher didn’t, because they would benefit from the deltas — but that would be a better fit as a data-based targeted process.)

It is also definitely easier and simpler to deploy deltas to and make them easy to mirror if that doesn’t require server-side support. Sure, that argument is treacherous because it can motivate all kinds of nasty hacks, but in this case I’m not sure that integrated server-side support would give us benefits that are not easily obtainable otherwise, and outweigh the limited/delayed support.

@mtrmac
Copy link
Collaborator

mtrmac commented Apr 23, 2020

What we could do is have the deltas tag (which points to an index of all current delta manifests) be the only thing that refers to delta manifests (and indirectly delta blobs). Then when we update this tag we could explicitly delete the old manifest version (and free the blobs for GC). Is that possible?

That does seem possible in the API.

@alexlarsson
Copy link
Contributor Author

What we could do is have the deltas tag (which points to an index of all current delta manifests) be the only thing that refers to delta manifests (and indirectly delta blobs). Then when we update this tag we could explicitly delete the old manifest version (and free the blobs for GC). Is that possible?

That does seem possible in the API.

How would i do this in containers/image? There is a ImageReference.DeleteImage() operation, but it works on named references, and that would in this case be the deltas tag of the repository, but I don't want to delete that, i want to delete the previous manifests it pointed to (which i know the digest of since I replaced them).

alexlarsson added a commit to flatpak/flatpak that referenced this pull request Jun 5, 2020
@alexlarsson
Copy link
Contributor Author

I merged the flatpak support for this, hope to get it into an unstable release so we can start using it.

@rhatdan
Copy link
Member

rhatdan commented Jun 5, 2020

@alexlarsson Needs a rebase.
@mtrmac PTAL

@rhatdan
Copy link
Member

rhatdan commented Jun 5, 2020

@lumjjb PTAL

Deltas are a way to avoid downloading a full copy if a layer tar file
if you have a previous version of the layer available locally. In
testing these deltas have been shown to be around 10x to 100x smaller
than the .tar.gz files for typical Linux base images.

In the typical client-side case we have some previous version of the
image stored in container-storage somewhere, which means that we have
an uncompressed files available, but not the actual tarball
(compressed or not).

This means we can use github.com/containers/tar-diff which takes
two tar files and produces a delta file which when applied on the
untar:ed content of the first tarfile produces the (bitwise identical)
content of the uncompressed second tarfile. It just happens that the
uncompressed tarfile is exactly what we need to reproduce, because that
is how the layers are refered to in the image config (the DiffIDs).

How this works is that we use OCI artifacts to store, for each regular
image a manifest with information about the available deltas for the
image.  This image looks like a regular manifest, except each layer
contains a tar-diff (as a blob) an uses the existing annotations key
to record which DiffIDs the layer applies to.

For example, a manifest would look like this:

```
{
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:<config file hash>",
    "size": 3
  },
  "annotations": {
        "io.github.containers.delta.target": "sha256:<image_manifest_hash>",
  },
  "layers": [
    {
      "mediaType": "application/vnd.tar-diff",
      "digest": "sha256:<tar-diff delta file hash>",
      "size": 7059734,
      "annotations": {
          "io.github.containers.delta.from": "sha256:<old layer diffid>",
          "io.github.containers.delta.to": "sha256:<new layer diffid>"
      }
    }
  ]
}
```

The config blob is just empty. Ideally it should not be of type
application/vnd.oci.image.config.v1+json, because that is reserved for
docker-style images. However, docker hub (and other registries)
currently don't support any other type. For registries that support
OCI artifacts we should instead use some other type so that tooling
can know that this is not a regular image.

The way we attach the delta manifest to the image is that we store it
in the same repo and the we use a single tag named `_deltaindex`
pointing to an index with all the delta manifest in the repository,
with the digest of each target image in the
`io.github.containers.delta.target` annotation key.

The delta layers record which DiffID they apply to, which is what we
want to use to look up the pre-existing layers to use as delta source
material, and it is what the delta apply will generate. This means
however that using the deltas only works if we're allowed to
substitute blobs, but this doesn't seem to be an issue in the typical
case.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
Copy link
Contributor

@lumjjb lumjjb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alexlarsson This is a nice feature to have! I'm trying to think about how this would play with the OCI layer suffixes. My understanding that it is a diff on the tar level, so it will work on compressed/encrypted layers and that will be handled on the client side from the build perspective.

I would imagine that this would work very well with encryption as well... that the deltas themselves could be encrypted as well... So from a user perspective they could use deltas even on encrypted images.

@rhatdan @mtrmac I think there may be some interactions between the two flows that need to be worked out (possibly a separate PR in the future?). I am wondering if for now we add a couple comments on encryption interactions, and maybe add a few checks to make sure the features are not used together yet.

I've put in a few comments where I think we could check on some of these interactions.

// Convert deltaStream to uncompressed tar layer stream
pr, pw := io.Pipe()
go func() {
if err := tar_patch.Apply(wrappedDeltaStream, deltaDataSource, pw); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the deltas always uncompressed/unencrypted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deltas are inherently compressed. I.e. they are not wrapped in a compression layer but instead compressed on the parts inside that need it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, they are always unencrypted, so layering wise we might want to wrap some encryption around them.

switch srcInfo.MediaType {
case manifest.DockerV2Schema2LayerMediaType, manifest.DockerV2SchemaLayerMediaTypeUncompressed:
return true, manifest.DockerV2SchemaLayerMediaTypeUncompressed
case imgspecv1.MediaTypeImageLayer, imgspecv1.MediaTypeImageLayerGzip, imgspecv1.MediaTypeImageLayerZstd:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interaction for encrypting path with decryption routine

Encryption mediatypes https://github.com/containers/ocicrypt/blob/master/spec/spec.go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

canUseDeltas() currently returns false (due to the above media checks) for encrypted manifests, because for now we don't support encrypted deltas. However, once we do these checks need to be widened, yes.

@@ -1064,6 +1145,52 @@ func (ic *imageCopier) copyLayer(ctx context.Context, srcInfo types.BlobInfo, to
}
}

// First look for a delta that matches this layer and substitute the result of that
if ok, deltaResultMediaType := ic.canUseDeltas(srcInfo); ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interaction for encrypting path with toEncrypt flag

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean here, it seem alright to me. Applying deltas naturally create the uncompressed layers, so we get the diffid etc. That said, I'm not well versed in how crypto works here, so I might be missing something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... so the toEncrypt flag tells the copy process to encrypt the blob when passed into copyBlobFromStream. My concern was if any deltas are created by this process, that they would need to support encryption. If they are not, then it should probably be fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copy never generates deltas, it just applies them to regenerate the tar. Creating deltas is currently done by a separate patch to skopeo.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of encrypted images, skopeo generate-delta will have to take encryption key as an additional argument to encrypt the deltas. skopeo copy already supports taking decryption key as an argument to decrypt the image layers, so that can be re-used to decrypt deltas too.

@lumjjb
Copy link
Contributor

lumjjb commented Jun 9, 2020

@alexlarsson Thanks for the responses!

Agreed on the points and design, just kind of going through the thought exercise.. In general, it looks like there are no design conflicts. I am fine with maybe having a comment or line in a docs to make this more apparent that encrypted layers are not currently supported with deltas. And once this is in, I can open up an issue to keep track of some of the TODOs for support once both technologies become mature and there is an ask.

@harche
Copy link

harche commented Jun 9, 2020

@alexlarsson I wasn't able to test https://github.com/alexlarsson/skopeo/tree/wip/deltas using local registry.

 ./skopeo generate-delta  docker://localhost:5000/nginx:1.17.7 docker://localhost:5000/nginx:1.17.8
FATA[0000] Error creating image: error pinging docker registry localhost:5000: Get "https://localhost:5000/v2/": http: server gave HTTP response to HTTPS client 

Just like skopeo copy command maybe generate-delta should also support --dest-tls-verify and --src-tls-verify flags. Right now it fails with following error,

$ ./skopeo generate-delta  --src-tls-verify=false docker://localhost:5000/nginx:1.17.7 docker://localhost:5000/nginx:1.17.8
FATA[0000] unknown flag: --src-tls-verify               
[harshal@localhost skopeo]$ ./skopeo generate-delta  --dest-tls-verify=false docker://localhost:5000/nginx:1.17.7 docker://localhost:5000/nginx:1.17.8
FATA[0000] unknown flag: --dest-tls-verify    

@alexlarsson
Copy link
Contributor Author

@harche Added a commit to the skopeo branch that does this.

@alexlarsson
Copy link
Contributor Author

I'm off on PTO from some time next week, so might not see comments here.

bogdando added a commit to bogdando/tripleo-common that referenced this pull request Jun 30, 2020
Implements the logic in
containers/image#902 by
adapting it for Image-Serve local registry.

Use custom mediaTypes for TripleO tar-diff blob and media/index
OCI artifacts. This indicates that layer deltas are only for local
UC-to-OC containers images distribution purposes and have no plans
to support fetching deltas from external registries, like dockerhub.

Change-Id: Id50bb4fc91e1bcb8f3eb53c3601c358b51b1a468
Signed-off-by: Bogdan Dobrelya <bdobreli@redhat.com>
@alexlarsson
Copy link
Contributor Author

alexlarsson commented Aug 19, 2020

I noticed that containerd 1.4 is out with stargz support. Would have been cool to get delta support in before that...

@tpopela
Copy link

tpopela commented Nov 4, 2020

Could this have some attention please? @mtrmac

@cgwalters
Copy link
Contributor

cgwalters commented Dec 6, 2021

This came up again when discussing ostreedev/ostree-rs-ext#69

I'll just write this here briefly; there are 3 distinct things which make sense separately, and can be combined as they're conceptually orthogonal:

  • Generating images with "split reproducible blobs"; the advantage of this is it works with the existing ecosystem today. The main disadvantage is it's hard to impossible to do this from Dockerfile, but that can be fixed.
  • Dynamic/on-demand fetching: This is what the zstd:chunked work is about. The advantage of this is that the delta computation is dynamic, and the zstd issue aside, it will be easy for registries and operators to use this. (Though, the number of HTTP requests issue is outstanding)
  • Pre-computed deltas (bsdiff/rsync/etc). That's what this PR is about. Alex's work here is basically hardcoded to bsdiff, but IMO an advantage of the ostree design is it supports "rsync style" rollsum deltas which cover a fair number of binary cases too, and are much less CPU intensive. But in constrast to "zstd dynamic", pre-computed deltas require distinct maintenance, storage space and garbage collection.

The way I'd say this is that "split reproducible blobs" aids both "zstd dynamic" and "pre-computed deltas" - it means less work for deltas. "zstd dynamic" and "pre-computed deltas" more heavily overlap, but they have somewhat opposite advantages/disadvantages, so it depends on the situation.

@hdonnay
Copy link

hdonnay commented Nov 7, 2023

Stumbled on this, and it occurs to me that this may be easier to do using the diff blobs as artifacts, now that OCI referrer/"artifact" support exists (or is very close to existing).

@cgwalters
Copy link
Contributor

@hdonnay
Copy link

hdonnay commented Nov 7, 2023

Ah, rad. I was briefly concerned for how people would expect this to interact with Clair, but if the invariant that "a client can always pull the original blob" is maintained, it shouldn't be an issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature A request for, or a PR adding, new functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.