-
Notifications
You must be signed in to change notification settings - Fork 379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support additional layer store (patch for containers/image) #1109
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a very quick initial skim to highlight concerns. I have literally not read the PR description in full yet.
How does this relate to #1084 ?
If this extends the image format, there should eventually be formal documentation for interoperability — maybe after the code settles, sure.
copy/copy.go
Outdated
// Add default labels containing information of the target image and layers. | ||
srcLayer.Annotations[layerTargetDigest] = srcLayer.Digest.String() | ||
if dref := ic.c.rawSource.Reference().DockerReference(); dref != nil { | ||
srcLayer.Annotations[layerTargetReference] = dref.String() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This exposes private information; it’s not expected that the destination of a copy is told about the name used at the source.
- Why the full Docker reference, anyway? c/storage keys things off an image ID, and a Docker reference can refer to many different images over time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Registry-backed additional layer store filesystem (e.g. eStargz, zstd:chunked) needs to query layer data to the registry. The registry API requires image reference so this needs to be passed to the c/storage's Store
and the underlying filesystem somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having now read a bit more, containers/storage#795 (comment) : this is just data that should be passed as parameters, not by modifying the manifest, which affects every single skopeo copy
including things like inter-registry copies exposing locations of private staging registries when making a product release.
(Even given that, it’s not at all clear to to me how this is going to work with authenticated registries; the credentials used by copy.Image
are not made available to the on-demand additional store implementations by this. But that’s a secondary concern and probably solvable at least for the simple cases.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we use a lookaside cache for storing such information (something implemented with: containers/storage#787)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if 3 different images that share a layer are pulled, one after another; 2 from a single registry, the third from a completely different registry; the three pulls are using different credentials, and later some of the credentials are invalidated? The credentials are conceptually a property of (end user, repository), not a layer. (And the concept of “end user” gets rather tricky once CRI-O is involved.)
I suppose just starting with unauthenticated repositories would still be useful. Getting the complex cases right might be more interesting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is just data that should be passed as parameters, not by modifying the manifest, which affects every single skopeo copy including things like inter-registry copies exposing locations of private staging registries when making a product release.
Reference is now pased as parameters to c/storage.Store
, through a new API *c/image.storageImageDestination.TryReusingBlobWithRef()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even given that, it’s not at all clear to to me how this is going to work with authenticated registries
As initial implementation, the underlying filesystem's authentication logic (creds management) is separated from copy.Image
.
We can come up with the way to sync creds between them but this can be following-up PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can come up with the way to sync creds between them but this can be following-up PRs.
I can’t much imagine how that would look like; so is this “will never happen”? The tradeoffs are relevant for choosing an an approach/implementation.
The underlying filesystem (additional layer store) accessed from c/storage can support mounting layers from the registry leveraging eStargz/zstd:chunked. |
storage/storage_image.go
Outdated
@@ -486,6 +489,16 @@ func (s *storageImageDestination) TryReusingBlob(ctx context.Context, blobinfo t | |||
}, nil | |||
} | |||
|
|||
// Check if the layer can be used from the additional layer store with fresh ID. | |||
if l, err := s.imageRef.transport.store.Layer(stringid.GenerateRandomID(), storage.WithAnnotations(blobinfo.Annotations)); err == nil { | |||
s.blobLayerIDs[blobinfo.Digest] = l.ID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With blobDiffIDs
not set, this will cause computeID
to fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With blobDiffIDs not set, this will cause computeID to fail.
Fixed this. Now filesystem (ALS) provides diffID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the filesystem obtain the diffID? By reading all of the layer? If so, wouldn’t it be more efficient to avoid it and just read all of the layer ourselves? (Or is that somehow cached across machines?)
So should we have only one of the two features? If so, which one? Or if both, why have two so different implementations working with the same format? (I don’t have any opinion yet, I assume you have thought about these things much more than me.) |
We should have both of eStargz (fully compatible with legacy gzip-based images) for backward-compatibility to the current ecosystem and zstd:chunked (zstd-based) for "skippable frame" support & possibly better performance. |
I was not asking about the formats, but about the two structurally different implementations of zstd:chunked . |
5efe653
to
97321ea
Compare
97321ea
to
6ed7eb3
Compare
cb69bce
to
837e7e4
Compare
copy/copy.go
Outdated
// Add default labels containing information of the target image and layers. | ||
srcLayer.Annotations[layerTargetDigest] = srcLayer.Digest.String() | ||
if dref := ic.c.rawSource.Reference().DockerReference(); dref != nil { | ||
srcLayer.Annotations[layerTargetReference] = dref.String() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can come up with the way to sync creds between them but this can be following-up PRs.
I can’t much imagine how that would look like; so is this “will never happen”? The tradeoffs are relevant for choosing an an approach/implementation.
837e7e4
to
f6c18b7
Compare
I think the file system API isn't enough for achieving this. Another dedicated communication channel (e.g. unix socket) will need to be exposed by the filesystem (FUSE daemon) for communicating creds with the runtime (Podman, CRI-O, ...). |
needs rebase |
f6c18b7
to
37229aa
Compare
37229aa
to
1d45144
Compare
@ktock Could you rebase this PR>]? |
1d45144
to
013f2c5
Compare
Rebased. |
And sadly it needs to be rebased again. |
013f2c5
to
3378aec
Compare
@giuseppe Do we need a new tag for containers/storage#795 to make CI of this PR pass? |
@vrothberg Thanks for the review. |
CI failure (Skopeo) seems not related to this PR
|
I concur and restarted the CI job. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
@mtrmac, if you have time to skim the changes. I'd feel more comfortable with your blessings.
Can we move this forward? |
Thanks for the ping. Yes, let's get it merged. Can you rebase? |
This commit adds support for "Additional Layer Store". Pull is one of the time-consuming steps in the container lifecycle. Additional Layer Store enables runtimes (e.g. Podman, CRI-O, etc) to startup containers using layers stored in a specified directory, instead of pulling them from the registry. One of the expected use cases of this feature is "lazy pulling". This enables the runtimes to startup containers without waiting for the entire image contents to be locally available, but necessary chunks of the contents are fetched on-demand (lazily). There are several image formats and filesystems to enable lazy pulling in the community. They includes stargz/eStargz, zstd:chunked, CVMFS, etc. Additional Layer Store makes it easy to integrate with these filesystems for performing lazy pulling. Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
3449032
to
341204f
Compare
Rebased :) |
@ktock do you have some instructions how I can play with the stargz-snapshotter and additional layer stores? How do I run it and what setup to use for |
thanks that is helpful! I've followed the installation tutorial and I think it is running now. Do you have any image ready I can use to lazy pull? |
@giuseppe There are pre-converted estargz images on ghcr.io/stargz-containers : https://github.com/containerd/stargz-snapshotter/blob/main/docs/pre-converted-images.md Other methods to get estargz images: https://github.com/containerd/stargz-snapshotter#getting-estargz-images |
Thanks! That worked well |
is there also support for zstd:chunked? I've tried with |
That is on containerd/stargz-snapshotter#293 but hasn't merged yet. I'll work on it soon. |
@ktock We are looking to packge stargz-snapshotter as a separate package in Fedora. @lsm5 is taking the lead on this, are you interested in working on it? If we are going to truly allow users to use this method of pulling containers, we need to make it as easy as possible for them to do it. This includes building and pushing images to registries, and then configuring container/storage to use the fuse snapshotter file system out of the box. |
I am adding support for estargz to the chunked package in containers/storage. @ktock what is the easiest way to create estargz images? Is it worth to support the original stargz format as well or we can safely assume estargz is the format used by containerd? |
Thank you!
On command line:You can use
On golang code:You can use Stargz Snapshotter requires the following OCI annotations (source) in the OCI descriptor of the layer.
Yes, we can. |
@lsm5 Thank you. I sent an email. |
@ktock thanks for the update. That is a great news :) |
containers/podman#4739
Reconsidered the design on 2021/1/12
This enables podman to create containers using layers stored in a specified directory instead of pulling them from the registry. Leveraging this feature with remotely-mountable layers provided by stargz/zstd:chunked or CVMFS, podman can achieve lazy pulling.
Changes in contianers/storage: containers/storage#795
That directory is named "additional layer store" (call ALS in this doc) and has the following structure.
diff
directory contains the extracted layer diff contents specified by the key-value pairs.info
file contains*c/storage.Layer
struct that indicates the information of this layer contents (*c/storage.Layer.ID
,*c/storage.Layer.Parent
and*c/storage.Layer.Created
can be empty as it'll be filled byc/storage.Store
).On each pull,
c/storage.Store
searches the layer diff contents from ALS using pre-configured key-value pairs.Each key-value pair is base64 encoded.
By default, the following
key=value
pairs can be used as elements of the path in ALS,reference=<image reference>
layerdigest=<digest of the compressed layer contents>
Additionally, layer annotations (defined in the image manifest) prefixed by
containers/image/target.
can be used as well.The prefix
containers/image/target.
will be trimmed from the key when it's used in the path on ALS.Overlay driver supports an option to specify which key-value pair to be used and how the order should they be when
c/storage.Store
searches layers in the ALS.In the above case, on each pull,
c/storage.Store
searches the following path in ALS,The underlying filesystem (e.g. stargz/zstd:chunked-based filesystem or CVMFS) should show the exploded view of the target layer diff and its information at these locations.
Example filesystem implementation (currently stargz-based) is https://github.com/ktock/stargz-snapshotter/tree/als-pool-example (this must be mounted on )
If the layer content is found in ALS,
c/storage.Store
creates layer using<ALS root>/.../info
as*c/storage.Layer
and using<ALS root>/.../diff
as its diff directory.So
c/image
's copier doesn't need to pull this layer from the registry.Changes in containers/image: #1109
Now
c/image
's copier leverages this store.Every time this pulls an image, it first tries to reuse blobs from ALS.
That copier passes each layer's OCI annotations (key-value pairs) + the following key-value to
c/storage.Store
.*c/image.storageImageDestination.TryReusingBlob()
cannot pass image reference toc/storage.Store
so this commit adds a new API `*c/image.storageImageDestination.TryReusingBlobWithRef() for achieving this.When this copier successfully acquires that layer, this reuses this layer without pulling.
Changes in containers/podman: none
Command exapmle
In the above cases,
c/storage.Store
looks up/tmp/storage/base64("reference=ghcr.io/stargz-containers/rethinkdb:2.3.6-esgz")/base64(<layer digests>)/{diff, info}
in ALS.The example filesystem implementation (https://github.com/ktock/stargz-snapshotter/tree/als-pool-example) is mounted at
/tmp/storage
shows the extracted layer at that location.Then
rethinkdb:2.3.6-esgz
can run without pulling it from registry.Known limitation and request for comments
Some operations (e.g. save) requires correct value to be set to
c/storage.Layer.UncompressedSize
. This field seems to be the size of the layer but without compression (i.e. the size of the tar-archived format of that layer). For registry-backed ALS, getting this information is difficult because neither of OCI/Docker image nor registry API provides the way to get the uncompressed size of layers. We cannot get this information without actually pull and decompress the layer, which is not lazy pulling that this PR aims to.I'll check the codebase deeper to come up with the way to get this information from somewhere or the way to safely allow
c/storage.Layer.UncompressedSize
to be unknown during operations. But if someone has good idea for managing this, please let me know.cc: @siscia @giuseppe