-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proposal: References support #3716
Comments
@akashsinghal - it might be good to share the implementation design and details which you have been working on |
@Jamstah @sajayantony The proposal in general makes sense to me. In fact it's very similar to the implementation we were playing around with here: https://github.com/oci-playground/distribution.
I think both changes make sense since it'll allow for artifact type filtering without reading each artifact manifest. And, storing the reference link files under the manifest revision make link file add and delete more intuitive. We didn't get to implementing the DELETE path but one pain point is probably going to be the reverse lookup of the referrer existing in any of the manifest revisions if that referrer manifest revision is deleted. Without some optimizations such as ref count maps, it's going to require a full scan every time which isn't really feasible (maybe the compromise here is to accept that there will be dangling referrers links left that will need to be cleaned up periodically). One question: I don't quite understand why OCI-Playground WorkFor context for others, here's a summary of the functionality we built into the oci-playground to support reference link files (definitely rough around the edges): To power the referrers API, the The
The repository store layout is represented as: <root>
└── v2
└── repositories
└── nginx
└── _manifests
└── revisions
└── sha256
└── 111ma2d22ae5ef400769fa51c84717264cd1520ac8d93dc071374c1be49a111m
└── link Push a signature as blob and an OCI Artifact that contains a blobs property referencing the signature, with the
On
Push another OCI Artifact with the following properties:
This results in an addition to the index as shown below.
We implemented the extra referrer operations as extension to the distribution (this was prior to the current proposal). The associated functionality add can be found here Happy to explain the prototype further if you guys have any more questions. cc: @aviral26 |
What makes you say this? If you can create references when a referrer is added to the registry, you can delete those same references when you remove it. |
Lets say you have an existing registry installation and upgrade to a version that supports references. The registry needs to know that on startup it needs to scan all the existing images for any with a refers tag and add them to the filesystem. If it doesn't do this, those references don't work. If that file doesn't exist, the registry knows to do that work, and create it. If the registry is creating a brand new filesystem structure, it knows it can just create that file because there are no manifests to work with. |
That's no coincidence, I had seen and browsed through that code (as mentioned in the issue) and thought it was a good approach, like you say though, I thought some tweaks were worth considering. I'm sure if/when I attempt an implementation, I'll be back to look at that repo again, maybe lift some code, and of course, make sure the authors appear in the changelog somehow :) |
@Jamstah Thanks for the clarification. Makes sense now. Yeah now that I think about it, it's not a full scan because when deleting the referrer we'd know the referee so we'd know where to clean up too. Disregard my comment :) |
@Jamstah Looks a good proposal to me - the one niggling thought I have is if artifactType could have any chars that could cause problems for a particular storage driver and would therefore need coding around. The one springs to mind that may come up in a mediaType is a I did a sanity check for S3 as that's what I use and while it's documented as a char that may need special handling it all seems to works fine with the GO s3 lib 👍 |
The client SHOULD define an artifactType, which means the artifactType MAY be missing (zero length string in Go). So you'll want an extra case to map that to a reserved string (anything that isn't valid as a future IANA media type would work). The artifactType will also include a |
What is the reason for not making the referrer link file in that case? I assume it's to prevent the referrer list API from yielding referrers for a referree which does not exist. If my assumption isn't wrong, would it not be more efficient to create the referrer link file anyway but check the referree is valid when listing referrers? As opposed to having to create all of the referrer links in the manifest PUT handler. |
It would be to stop that manifest from showing up in the list of manifests for that repository, because we don't have the content of the manifest to serve. You could still look up the referrers for the referee, because that query would specify the referee digest to look up. Edit: You specifically said "why not make the referrer link file?" - that's where I hadn't written it clearly, its not the referrer link file we don't create, its the link file for the manifest blob of the subject. |
I've edited the proposal to try and be clearer, it took me a while to remember why I said that :) |
I also just edited to use the new "subject" language instead of "referee" which was hard to read. I think the garbage collection section probably needs thought too, I'm not sure if that is backwards - if a digest is being referred to (is a subject) then it shouldn't be collected. |
New addition based on slack discussions with @brackendawson
@sudo-bmitch |
@Jamstah |
For the garbage collection, its a bit tricky based on what the reference is for. My thoughts have gone like this but I would appreciate input! I think we can classify references in two ways:
If a subject is tagged, it isn't suitable for garbage collection regardless of existence of its references. If a subject is untagged, but it has a tagged referrer that is used for indirection, we WOULD NOT want to garbage collect the subject so that the indirection still worked. If a referrer used for indirection is not tagged but it has a subject that is tagged, we MIGHT want to garbage collect the referrer because it isn't tagged. To me, the sensible approach is:
Then the GC rules around references would be:
Which means:
This is based on the assumption that references for indirection are a valid use case. It might also be sensible to just say "manifests lists are for indirection, so we don't consider the indirection case here". |
Thanks, I was only looking at the distribution spec 🙈 |
The current garbage collection feature doc is out of date because it doesn't mention the optional The default garbage collect only deletes unreferenced blobs and I think that will continue to work fine for blobs referenced by artefact manifests. So what we are considering here is when the administrator of the registry is requesting to delete untagged:
That's entirely fine.
Now it's less clear, consider if a subject is untagged and has one untagged referrer. The first GC run would delete the subject but not the referrer. A second GC run without any changes would then delete the referrer. Should the result of immediately re-running GC not be that there is no change? I think in the context where we've been asked to "delete untagged" we should only delete a subject if we would also delete its referrers, and we should only delete a referrer if we would only delete it's subject, or if the subject does not exist. I think your second clause should say "If a referrer is untagged, only GC if its reference is untagged (because we don't want to remove metadata for active subjects)" because in the context of a GC with delete untagged, an untagged subject is not active, it's is about to be removed. |
I think GC can use a rewrite in general. It's not safe to run on active servers, and the tagged check isn't recursive through manifest list to manifest. In other words, it will delete an untagged manifest that's referenced by a tagged manifest list. If that's rewritten, then the recursion through the manifests can also include a check for all referrers to each manifest. |
@sudo-bmitch that's a good point. But I think we should still define what the behaviour for GCing artifacts is and implement it in the existing offline GC if artifact support is added before a re-written GC. The mis-handling of manifest lists is probably a bug. I guess this also means a subject of a referrer should be considered tagged if it is referenced in a manifest list which is tagged. The fix for manifest lists should apply to the artifact path when it needs to know if a subject is tagged too. |
See #3178 for details on the multi platform GC. The code itself is pretty straightforward and the issues seem baked into the design. |
I was wondering if we should treat manifest lists as though they were referrers, and treat their references the same, even include them in the referrers API calls. That would solve the GC problems and allow clients to see all referrers easily. That might be something for a spec pr one way or the other.
I don't believe that expectation exists. N stage GC is acceptable to me, because we should fix GC to be able to run regularly. Eventually consistent is fine. |
On the topic of upgrading, I don't think distribution should look for a |
@brackendawson Is there any plan for References API support? |
We are waiting for the stable release to be cut. The whole reference effort by OCI has been plagued with broken ideas and endless discussions. There is still one contentious point that needs be resolved. |
Based on the distribution-spec which now contains details on references between digests within a repository from the reference types working group, I thought I'd write down some thoughts on how we could implement that in this project.
I wanted to outline how I thought this could work before starting any work on implementation.
There is already some work on an implementation here that takes an extension approach: https://github.com/oci-playground/distribution/
My proposal:
Handling manifests with refers fields
Manifests with refers fields are processed by the registry when they are accepted. This would mean the registry needs to be aware of the new
application/vnd.oci.artifact.manifest.v1+json
media type and the addition of therefers
field to the image spec.When a manifest referring to subjects is uploaded, we add a link file for every reference. They are stored by type because it is likely that clients will be interested in a specific type of reference and will be filtering the API calls based on type:
The artifact type is prefixed with
_
because the artifact type may be missing, and a consistent prefix stops there being an empty path segment. To cope with backend storage restrictions, the artifact type in the path will be encoded using url encoding.If the subject digest does not exist yet, we still create the folder structure to create the referrer link. We do not create a link file to the blob store where the content of the subject's manifest is held, because we do not have the content. This link file will be created if/when the subject is created later:
When a manifest referring to other digests is deleted (API or garbage collection), we perform the same in reverse and clean up references from the filesystem.
Deletion of subjects
When a manifest revision is deleted (API or garbage collection) that has
_referrers
, those_referrers
could be checked. If they are untagged and all their references do not exist in the cluster they could be deleted too. It may be better to leave this to garbage collection to avoid extended work when deleting manifests from the registry.Consistency and bootstrapping
Add a new command line option
clean-references
, which can go through the registry, delete any dangling_referrers
that point to artifacts that are not in the registry, and create any missing references by processing all manifests forrefers
fields.Upgrade
On upgrade of a registry instance to a version that includes references, we should run the bootstrap. This could be left to the user, or we could have an identifier in the filesystem used to trigger a bootstrap at startup (eg, if
v2/_references_bootstrapped
does not exist, bootstrap at startup and create it).Garbage collection
When garbage collecting, the subjects of an object being garbage collected need to be checked for existence. If at least one of them exists, the manifest should not be garbage collected.
New API
As documented by the reference types spec, we would add the referrers API. Results would be ordered by type and then digest based on the filesystem layout, which would be walked. This provides for consistent ordering and paging.
The text was updated successfully, but these errors were encountered: