Support package base layer descriptor in package images #2932
Conversation
Refactors the image cache to accept and return an io.ReadCloser, which it expects to deliver a valid package YAML stream. The stream is stored as a gzipped file. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
Updates the package cache tests to reflect the use of gzipped YAML streams rather than full OCI images. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
Updates the mock package cache to implement the required Has method. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
Updates the path name builder to allow passing file extension. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
Updates the crank build command to explicitly pass the .xpkg extension to the path builder. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
Refactors the image parser backend to not depend on cache and instead only deal with pulling and extracting YAML streams from images. The cache will now be managed separately from the image backend, and the image backend will only be called in the event we don't have the data in the cache. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
Updates image backend tests to remove caching functionality and exercise the layer annotation checks. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
Updates the revision reconciler to directly manage the cache instead of deferring the responsibility to the image backend. The uncompressed file contents are read by the package parser and forwarded to the cache concurrently in the event that the contents are not already cached. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
Updates revision reconciler tests with direct caching such that image backend is no longer responsible for writing new packages to the cache. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
Updates the crossplane start entrypoint to use PackageCache instead of ImageCache. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
gzipFileReader returns an io.ReadCloser while still ensuring that the underlying file can be read, while the TeeReadCloser wraps TeeReader such that the Writer is closed when Close() is called. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
JoinedReadCloser joins an io.Reader and an io.Closer. It is typically used in the case where a ReadCloser is wrapped by another Reader, but the caller needs to be able close the underlying ReadCloser when finished. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
Updates the image backend for the package parser to only read the package file contents from an image or image layer, rather than reading the entire filesystem contents into memory just to be able to extract the package file contents. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
@muvaf @negz I updated to eliminate the tarfs usage as mentioned previously, and when using a non-annotated fat package (using hasheddan/provider-aws-fat:v0.0.2 for testing), the reduction in allocations is significant as we are no longer reading the whole filesystem into memory just to get the package.yaml.
Note: these benchmarks are run only on ImageBackend.Init, not on the subsequent parsing, which is still quite inefficient due to how we are consuming the upstream k8s tooling.
With tarfs:
(revision) go test -bench=. -benchmem -memprofile memprofiletarfs.out -benchtime=20s
goos: linux
goarch: amd64
pkg: github.com/crossplane/crossplane/internal/controller/pkg/revision
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkInit-12 15 1536714280 ns/op 291899693 B/op 237475 allocs/op
PASS
ok github.com/crossplane/crossplane/internal/controller/pkg/revision 55.692s
Without:
(revision) go test -bench=. -benchmem -memprofile memprofile.out -benchtime=20s
goos: linux
goarch: amd64
pkg: github.com/crossplane/crossplane/internal/controller/pkg/revision
cpu: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
BenchmarkInit-12 69 336935786 ns/op 4640937 B/op 5012 allocs/op
PASS
ok github.com/crossplane/crossplane/internal/controller/pkg/revision 29.939s
A comparison of some of the cached content sizes as well:
A few thoughts, but nothing blocking. I like the refactoring to pull the cache out of the image backend. Thanks @hasheddan!
// verify that we aren't just using the first layer annotated as xpkg
// base.
if foundAnnotated {
	return nil, errors.New(errMultipleAnnotatedLayers)
Would it make sense to fall back to using the artifacts from the 'flattened' image in this case? I'm thinking no; it would perhaps be more compatible, but also less predictable and in violation of the spec.
It would be a violation of the spec, so I would concur that we should not fall back if there are multiple annotated layers 👍🏻
// thread-safe manner.
type ImageCache struct {
// PackageCache stores and retrieves package content in a filesystem-backed
// cache in a thread-safe manner.
I imagine changing from a cache of packages to a cache of YAML streams has some implications:
- We're slightly more committed to "the content of a Crossplane package" as far as the things the package manager needs to operate on a package being a single YAML stream. Probably not a problem?
- Folks updating to this implementation will presumably have their package caches invalidated. Is it worth adding logic to clean up the old image-based cache? IMO probably not.
- I don't think moving to caching just the YAML stream necessarily prohibits us from extending the package format in the future -- all packages would just need to be pulled again, which shouldn't be a huge issue.
- I thought about this, but given that there is no possibility of conflict with content previously written by the cache, I agree that the added complexity of trying to clean up is not worth it. One thing I do want to note in the release notes is that if you had manually populated the cache, you'll need to repopulate it with a different process. I don't think this is a common use case today though.
internal/xpkg/cache.go
Outdated
type ImageCache struct {
// PackageCache stores and retrieves package content in a filesystem-backed
// cache in a thread-safe manner.
type PackageCache struct {
Naming nit: Aren't all hypothetical instances of Cache (the interface) "a package cache"? What do you think about:
type Cache interface -> type PackageCache interface
type PackageCache struct -> type FilesystemPackageCache struct
(or similar)
Great call out -- I think I'll opt for PackageCache and FsPackageCache 👍🏻
Get(id string) (io.ReadCloser, error)
Store(id string, content io.ReadCloser) error
Should we clarify what format content io.ReadCloser is expected to contain - i.e. a gzipped xpkg-compliant YAML stream? In particular, the fact that it's gzipped wasn't obvious to me until I read the implementation.
The content here is actually not gzipped. Content comes into the cache uncompressed, Store wraps the file being written to in a gzip writer, and Get wraps the file being read in a gzip reader.
internal/xpkg/reader.go
Outdated
}

// GzipFileReader constructs a new gzipFileReader from the passed file.
func GzipFileReader(f afero.File) (io.ReadCloser, error) {
Does this need to take afero.File as an argument? At a glance it seems like an io.ReadCloser would work.
Nope -- good catch! This can just be a GzipReadCloser 👍🏻
}

var rc io.ReadCloser
var err error
Could this err be more tightly scoped? I think it's only used in the if r.cache.Has block, right?
Absolutely, I think this is a relic of a previous iteration, thanks for pointing out!
Changes gzipFileReader to gzipReadCloser and accepts a ReadCloser rather than an afero File. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
Uses more specific PackageCache for Cache and FsPackageCache implementation for filesystem backed package cache. Signed-off-by: hasheddan <georgedanielmangum@gmail.com>
Description of your changes
Updates the package manager (primarily the package revision controller) to honor io.crossplane.xpkg: base layers in Crossplane packages, as described in the xpkg specification. A few notes on the implementation:
- The ImageBackend used for the package parser, which IMO was pretty bloated (and still is to some extent, but further improvements require restructuring the parser interface in crossplane-runtime), is refactored to only deal with fetching package images and extracting the YAML stream. It does not handle caching.
- If a layer is annotated with io.crossplane.xpkg: base, the ImageBackend will only download it for extracting the YAML stream. If no layer descriptor is annotated, the whole image is downloaded and flattened to extract the YAML stream, but only the stream contents are cached.
- The cache is now managed directly by the revision reconciler rather than by the ImageBackend, which created a leaky separation of concern.

Fixes #2888
I have:
- Run make reviewable to ensure this PR is ready for review.
- Added backport release-x.y labels to auto-backport this PR if necessary.

How has this code been tested
Tested with both existing packages (registry.upbound.io/upbound/platform-ref-azure:v0.1.0) and one with annotated layer descriptors (hasheddan/cool-layers:v0.0.3). Cache used a local directory mount. Also verified that the unzipped contents of items in the package cache matched the expected YAML stream.