Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE] Use OCI store as blob cache #27

Draft
wants to merge 13 commits into
base: main
Choose a base branch
from

Conversation

qweeah
Copy link

@qweeah qweeah commented May 17, 2023

Instead of caching based on the fully qualified reference of an artifact, this PR provides another implementation of caching:

  • Remote registries are cached behind a local OCI layout proxy. Anything pulled from the registry will be cached into the proxy.
  • Manifest and layer blobs are persistently cached in the OCI layout proxy.
  • Tag resolutions are cached in-memory (TODO: add expiration).
  • Blob can be mounted across repositories.

To make the exmaples runnable, this PR renames some of the dependent charts so it cannot be merged.

qweeah added 13 commits May 15, 2023 11:42
Signed-off-by: Billy Zha <jinzha1@microsoft.com>
Signed-off-by: Billy Zha <jinzha1@microsoft.com>
Signed-off-by: Billy Zha <jinzha1@microsoft.com>
Signed-off-by: Billy Zha <jinzha1@microsoft.com>
Signed-off-by: Billy Zha <jinzha1@microsoft.com>
Signed-off-by: Billy Zha <jinzha1@microsoft.com>
Signed-off-by: Billy Zha <jinzha1@microsoft.com>
Signed-off-by: Billy Zha <jinzha1@microsoft.com>
Signed-off-by: Billy Zha <jinzha1@microsoft.com>
Signed-off-by: Billy Zha <jinzha1@microsoft.com>
Signed-off-by: Billy Zha <jinzha1@microsoft.com>
Signed-off-by: Billy Zha <jinzha1@microsoft.com>
Signed-off-by: Billy Zha <jinzha1@microsoft.com>
@vsoch
Copy link
Member

vsoch commented May 17, 2023

@qweeah this looks really interesting! I'm not familiar with stores - can you walk me through the differences, pros and cons for each approach?

@qweeah
Copy link
Author

qweeah commented May 18, 2023

@qweeah this looks really interesting! I'm not familiar with stores - can you walk me through the differences, pros and cons for each approach?

Sure. This PR utilizes the oci.Store provided via oras-go, which is an implementation of OCI layout spec.

The main improvement is on network traffic: In your implementation, files (layer blob with org.opencontainers.image.title) are cached based on the reference, which is a combination of names for registry, repository and tag. If two files in different artifacts have same content, they will still be fetched from remote registry twice, causing extra network overhead. Implementation in this PR utilize an OCI store to do content deduplication, blobs with same digest are guaranteed to be fetched only once from the remote registry.

Still, there are drawbacks, the files need to be unzipped from blob every time when it's requested.
Still, there are drawbacks, the files need to be copied from the OCI store every time when it's requested.

Will add some examples to make my point clearer.

@vsoch
Copy link
Member

vsoch commented May 18, 2023

If two files in different artifacts have same content, they will still be fetched from remote registry twice, causing extra network overhead.

What is the real chance of that? Also with the store - are we then mounting them the same file? That would have issue if, for example, you wanted to change one (but not the other) or if it's just crossing a user namespace.

Still, there are drawbacks, the files need to be unzipped from blob every time when it's requested.

Hmm and that doesn't seem ideal.

@sajayantony what do you think?

@qweeah
Copy link
Author

qweeah commented May 19, 2023

What is the real chance of that?

Well, considering how container images are built with layer blobs shared, the chances are high (if we don't want to exclude caching image artifacts from the scope this project).

Also with the store - are we then mounting them the same file? That would have issue if, for example, you wanted to change one (but not the other) or if it's just crossing a user namespace.

OCI store acts like a read-only cache, cache user provides digest and get what they want from the cache and don't need to modify files within the OCI store.

Hmm and that doesn't seem ideal.

Yes, but it's just copying the file and I guess it unavoidable if user wante to change their own copy but not effecting others.

@vsoch
Copy link
Member

vsoch commented May 19, 2023

OCI store acts like a read-only cache, cache user provides digest and get what they want from the cache and don't need to modify files within the OCI store.

So this would only support read only?

@qweeah
Copy link
Author

qweeah commented May 19, 2023

OCI store acts like a read-only cache, cache user provides digest and get what they want from the cache and don't need to modify files within the OCI store.

So this would only support read only?

OCI store is readonly and not accessible to user, what is mounted to the consumer pod is copied from the readonly store and safe to be modified.

@vsoch
Copy link
Member

vsoch commented May 19, 2023

Gotcha. I can't say I have a strong opinion about this change. I don't like needing to untar archives every time, and I don't like the added complexity, but I see what you are saying about having a sort of shared cache. That shared cache would still need to be unique unique namespace, so I'm not sure it would give us that much.

My preference tends to be to keep things simple until more complexity is afforded. Let's wait to hear what @sajayantony has to say! I do really appreciate that you opened this and started the discussion - let's make sure we decide on a good path forward!

@qweeah
Copy link
Author

qweeah commented May 19, 2023

@vsoch Just to clarify, by the files need to be unzipped from blob every time when it's requested I mean the work of copying the blob from the OCI store to artifact root (see related code here)

I didn't read through the code of pullBlob and I thought it was doing unzip for the blob.

@vsoch
Copy link
Member

vsoch commented May 19, 2023

I didn't read through the code of pullBlob and I thought it was doing unzip for the blob.

If a blob isn't an archive it shouldn't need to be unzipped? But maybe I'm missing a detail.

@qweeah
Copy link
Author

qweeah commented May 19, 2023

If a blob isn't an archive it shouldn't need to be unzipped? But maybe I'm missing a detail.

I didn't read through the code of pullBlob and I thought it was doing unzip for the blob. I know I was wrong now.

In you implementation:

  1. fetched the manifest;
  2. readout manifest content, for all named blobs:
    2.* fetch and save to artifact root

I used to think 2.* contains unzip, but turn out it's just simple read and save with verification.

Just to clarify that there is no unzip in both implementations. Updated my original comment to fill the gap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants