Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg/blobinfocache: Split implementations into subpackages #602

Merged
merged 1 commit into from
Mar 13, 2019

Conversation

wking
Copy link
Contributor

@wking wking commented Mar 10, 2019

This allows consumers who don't need the defaulting logic to pick their desired implementation without pulling in unrelated dependencies. For example, it allows you to consume the memory or NoCache implementations without pulling in github.com/boltdb.

The blobinfocache API remains unchanged, but its NoCache variable and New* getters have been deprecated in favor of the new subpackages. Once consumers have migrated, we can drop the deprecated API.

Having single-implementation subpackages also allows for more generic names (e.g. NewMemoryCache is now just New), because memory.New is clear enough while memory.NewMemoryCache stutters.

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, @wking! Just a small nit/question. Besides that, LGTM!

@@ -1,4 +1,5 @@
package blobinfocache
// Package none implements a dummy BlobInfoCache which records no data.
package none
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

none.NoCache seems redundant. Could we leave this in the blobinfocache package? It also seems convenient to have a dummy implementation in the same package as the interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

none.NoCache seems redundant.

I'm fine renaming NoCache, but couldn't think of anything I liked better.

Could we leave this in the blobinfocache package?

No, because that package has DefaultCache, which pulls in all the implementations. So NoCache there would require NoCache consumers vendor (or have non-vendor access to) the BoltDB libraties.

It also seems convenient to have a dummy implementation in the same package as the interface.

The interface is over in types ;).

@mtrmac
Copy link
Collaborator

mtrmac commented Mar 11, 2019

This allows consumers who don't need the defaulting logic to pick their desired implementation without pulling in unrelated dependencies. For example, it allows you to consume the memory or NoCache implementations without pulling in github.com/boltdb.

Sure, this is conceptually cleaner, but what are you ultimately trying to do?

The cache works much better when everyone updates the same one, so almost everyone should be using DefaultCache (that the in-memory cache is currently used only as a fallback, and in tests). Providing the individual cache implementations as user-selectable components is a bit counter-productive in that sense.

In particular, I really don’t want users of this library choosing the memory cache just because they can avoid a dependency on BoltDB.

Do you have a special use case that really shouldn’t be using the persistent cache? (Sure, there could well be one; I just want to make sure that the dependency is not the only motivation.)

The blobinfocache API remains unchanged, but its NoCache variable and New* getters have been deprecated in favor of the new subpackages. Once consumers have migrated, we can drop the deprecated API.

(FWIW so far there hasn’t been any strong demand for not breaking the API (I did ask; and sure, a string demand can come at any time, including from you), so we’ve historically been preferring a clean API to compatibility wrappers.)

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the earlier comment, most of all I’m unsure whether there is a legitimate user for the split.

@@ -58,7 +58,7 @@ func TestGetPutManifest(t *testing.T) {
func TestGetPutBlob(t *testing.T) {
ref, tmpDir := refToTempDir(t)
defer os.RemoveAll(tmpDir)
cache := blobinfocache.NewMemoryCache()
cache := memory.New()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Absolutely non-blocking: memory.New() can be anything, and boltdb.New() is even a bit confusing. If we are retaining NoCache, maybe NewCache would still be worth it? )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely non-blocking: memory.New() can be anything...

But the import path is github.com/containers/image/pkg/blobinfocache/memory, which includes "cache". And the receiving cache here also reminds readers about what this is. Folks who want to include more information are free to name their import:

import (
  blobinfomemorycache "github.com/containers/image/pkg/blobinfocache/memory"
)
...
cache := blobinfomemorycache.New()

So I like New. But I'm not a maintainer; if you want me to make it NewCache or whatever, just tell me to change it ;).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did say that it is absolutely non-blocking :)

Just something to consider, and somewhat a counterweight to Valentin’s naming concerns.

pkg/blobinfocache/deprecated.go Outdated Show resolved Hide resolved
pkg/blobinfocache/prioritize/prioritize.go Outdated Show resolved Hide resolved
pkg/blobinfocache/test/test.go Outdated Show resolved Hide resolved
@wking
Copy link
Contributor Author

wking commented Mar 11, 2019

In particular, I really don’t want users of this library choosing the memory cache just because they can avoid a dependency on BoltDB.

Nope, that's exactly why I worked this up ;).

Do you have a special use case that really shouldn’t be using the persistent cache? (Sure, there could well be one; I just want to make sure that the dependency is not the only motivation.)

openshift/installer#1286 is fetching a release image, looking in that filesystem for a release-manifests/image-references file, fetching the referenced machine-os-content image, and looking at its annotations. This is effectively a handful of HTTP requests, unmarshalling into manifests, and untar'ing some layers. The associated vendor bump is crazy though, pulling in BoltDB and even Prometheus (openshift/installer@e189de133ffc). Caching is nice, but in this case we're pulling some small manifests and one single, smallish layer:

$ podman pull quay.io/openshift-release-dev/ocp-release:4.0.0-0.7
$ podman save -o release.tar quay.io/openshift-release-dev/ocp-release:4.0.0-0.7
$ tar xvf release.tar
$ jq -r '.rootfs.diff_ids[]' 2f361c5c569b16577e706fa542ce415604d48c7fee50696f479eb02647e72d72.json 
sha256:20b48f7ee69e4b7f0633e7c6201b9dc54373747e8875d75cd99851e1f8186bc2
sha256:5975baa1f989ae582ded9d754602ed73e606cc2243078d31f89657b70fba4168
sha256:8e52d812c30eb9a92d03b931d769d62f24f26bc5362d31e0291d4131a49c1d31
sha256:2c53f1a826d3020c53d2eddf9a34f4c5e6e6200a2ebd3ef747b0cdd78fd4972f
sha256:7900974d6b5b9209c525f97f0e68d214168004d7082e9d3d45a0c9d1da473fbd
$ ls -lh 7900974d6b5b9209c525f97f0e68d214168004d7082e9d3d45a0c9d1da473fbd.tar 
-r--r--r--. 1 trking trking 471K Dec 31  1969 7900974d6b5b9209c525f97f0e68d214168004d7082e9d3d45a0c9d1da473fbd.tar

Caching seems like an excessive complication when we're that lightweight.

(FWIW so far there hasn’t been any strong demand for not breaking the API (I did ask; and sure, a string demand can come at any time, including from you), so we’ve historically been preferring a clean API to compatibility wrappers.)

I'm fine dropping the deprecated endpoints now if we have no compatibility requirements. Do we publish a change-log or something to help consumers migrate over breaking changes?

@wking wking force-pushed the blob-info-boltdb-separation branch 3 times, most recently from b6e6d91 to 4cd979b Compare March 11, 2019 20:32
wking added a commit to wking/containers-image that referenced this pull request Mar 11, 2019
This allows consumers who don't need the defaulting logic to pick
their desired implementation without pulling in unrelated
dependencies.  For example, it allows you to consume the memory or
NoCache implementations without pulling in github.com/boltdb.

I'd initially included blobinfocache wrappers to preserve backwards
compatibility with the old API while consumers updated to the new
APIs, but Miloslav said we don't have any backwards-compatibility
commitments at the moment [1].

Having single-implementation subpackages also allows for more generic
names (e.g. NewMemoryCache is now just New), because memory.New is
clear enough while memory.NewMemoryCache stutters.

[1]: containers#602 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>
@wking wking force-pushed the blob-info-boltdb-separation branch from 4cd979b to b2f8be9 Compare March 11, 2019 20:34
wking added a commit to wking/containers-image that referenced this pull request Mar 11, 2019
This allows consumers who don't need the defaulting logic to pick
their desired implementation without pulling in unrelated
dependencies.  For example, it allows you to consume the memory or
NoCache implementations without pulling in github.com/boltdb.

I'd initially included blobinfocache wrappers to preserve backwards
compatibility with the old API while consumers updated to the new
APIs, but Miloslav said we don't have any backwards-compatibility
commitments at the moment [1].

Having single-implementation subpackages also allows for more generic
names (e.g. NewMemoryCache is now just New), because memory.New is
clear enough while memory.NewMemoryCache stutters.

[1]: containers#602 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>
@mtrmac
Copy link
Collaborator

mtrmac commented Mar 12, 2019

In particular, I really don’t want users of this library choosing the memory cache just because they can avoid a dependency on BoltDB.

Nope, that's exactly why I worked this up ;).

I knew it! ;)

Do you have a special use case that really shouldn’t be using the persistent cache? (Sure, there could well be one; I just want to make sure that the dependency is not the only motivation.)

openshift/installer#1286 is fetching a release image, looking in that filesystem for a release-manifests/image-references file, fetching the referenced machine-os-content image, and looking at its annotations.

Caching is nice, but in this case we're pulling some small manifests and one single, smallish layer:

$ ls -lh 7900974d6b5b9209c525f97f0e68d214168004d7082e9d3d45a0c9d1da473fbd.tar 
-r--r--r--. 1 trking trking 471K Dec 31  1969 7900974d6b5b9209c525f97f0e68d214168004d7082e9d3d45a0c9d1da473fbd.tar

Caching seems like an excessive complication when we're that lightweight.

In principle, the cache could still help avoid re-downloading a differently-compressed version of that blob, but that could happen only if the consumer decompressing that layer explicitly used RecordDigestUncompressedPair, and besides, a single release process for a single coherent product is not all that likely to compress that blob in different ways.

Fair enough; please make sure all the individual constructors include

// Most users should call DefaultCache instead.

(or something even more strongly worded) and I’ll merge this.


I'm fine dropping the deprecated endpoints now if we have no compatibility requirements. Do we publish a change-log or something to help consumers migrate over breaking changes?

We haven’t been maintaining sufficiently detailed release notes so far.

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few nits.

pkg/blobinfocache/boltdb/boltdb_test.go Outdated Show resolved Hide resolved
pkg/blobinfocache/memory/memory.go Outdated Show resolved Hide resolved
pkg/blobinfocache/memory/memory_test.go Outdated Show resolved Hide resolved
This allows consumers who don't need the defaulting logic to pick
their desired implementation without pulling in unrelated
dependencies.  For example, it allows you to consume the memory or
NoCache implementations without pulling in github.com/boltdb.

I'd initially included blobinfocache wrappers to preserve backwards
compatibility with the old API while consumers updated to the new
APIs, but Miloslav said we don't have any backwards-compatibility
commitments at the moment [1].

Having single-implementation subpackages also allows for more generic
names (e.g. NewMemoryCache is now just New), because memory.New is
clear enough while memory.NewMemoryCache stutters.

[1]: containers#602 (comment)

Signed-off-by: W. Trevor King <wking@tremily.us>
@wking wking force-pushed the blob-info-boltdb-separation branch from b2f8be9 to d545a19 Compare March 12, 2019 20:24
@wking
Copy link
Contributor Author

wking commented Mar 12, 2019

Fair enough; please make sure all the individual constructors include

// Most users should call DefaultCache instead.

Done with b2f8be9 -> d545a19.

@mtrmac
Copy link
Collaborator

mtrmac commented Mar 13, 2019

Thanks!

@mtrmac mtrmac merged commit a911b20 into containers:master Mar 13, 2019
@wking wking deleted the blob-info-boltdb-separation branch April 25, 2019 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants