Ability to replay deletion with Usage spec #5394
Conversation
// We do the deletion async and after some delay to make sure the usage is deleted before the
// deletion attempt. We remove the finalizer on this Usage right below, so we know it will disappear
// very soon.
time.Sleep(2 * time.Second)
This is the piece I like the least (which is why the PR is a draft).
But without doing it async with some delay, it becomes a chicken-and-egg problem, i.e. as long as the Usage is there, deletions will be blocked. /cc @bobh66
I agree it's not ideal, but this is the same solution I ended up using for the Python implementation, which has been running for over a year in our deployments.
Force-pushed from 29f54e5 to 126da9c
Signed-off-by: Hasan Turken <turkenh@gmail.com> (cherry picked from commit b0dad402cbb5f939e068d3fa49c6a2260c1ed57d)
I tested this implementation: deletion with a Usage is, in my case, 56 minutes faster.
Force-pushed from 126da9c to b3a366f
I fixed the unit tests and marked the PR as ready for review! /cc @bobh66 @negz
I've also built and pushed an image with the latest state, which can be installed with:
helm repo add crossplane-master https://charts.crossplane.io/master --force-update
helm upgrade --install crossplane --namespace crossplane-system crossplane-master/crossplane --devel --create-namespace --set image.repository=index.docker.io/turkenh/crossplane --set image.tag=v1.16.0-rc.0.53.gb3a366f2 --set "args={--debug,--enable-usages}"
I'd appreciate some help with testing various cases, with or without
LGTM - just one question inline.
@@ -68,6 +68,9 @@ type UsageSpec struct {
	// Reason is the reason for blocking deletion of the resource.
	// +optional
	Reason *string `json:"reason,omitempty"`
	// ReplayDeletion will trigger a deletion on the used resource during the deletion of the usage itself, if it was attempted to be deleted at least once.
	// +optional
	ReplayDeletion *bool `json:"replayDeletion,omitempty"`
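For illustration, a sketch of how the new field would sit in a Usage manifest alongside the existing reason field. The apiVersion and the shape of the of/by resource references are assumptions based on the alpha Usage API, not part of this diff:

```yaml
# Hypothetical Usage manifest illustrating the new field; the apiVersion and
# the of/by reference shape are assumptions for illustration only.
apiVersion: apiextensions.crossplane.io/v1alpha1
kind: Usage
metadata:
  name: release-uses-cluster
spec:
  reason: "Release must not outlive the cluster it runs on"
  replayDeletion: true   # replay a blocked delete once this Usage is gone
  of:
    apiVersion: container.gcp.upbound.io/v1beta1
    kind: Cluster
    resourceRef:
      name: my-cluster
  by:
    apiVersion: helm.crossplane.io/v1beta1
    kind: Release
    resourceRef:
      name: my-release
```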
Would it make sense to default this to true? That would provide the best user experience. I don't know why someone wouldn't want this to happen, but it's good to have the option to disable it.
I would prefer to start with false and consider enabling it by default while bumping the API to beta, where we will have more freedom to make behavioral changes.
Added as a todo item to the promote to beta ticket: #4622
// very soon.
time.Sleep(2 * time.Second)
// We cannot use the context from the reconcile function since it will be cancelled after the reconciliation.
_ = r.client.Delete(context.Background(), used)
Does this have to consider passing the right DeleteOptions for a Foreground propagationPolicy?
What if the used resource is from a different composition than the Usage resource and the by/using resource, and happens to be an XR composite from a claim with a Foreground compositeDeletionPolicy? In that case, letting the claim reconciler retry the delete with the appropriate DeleteOptions seems to be the right thing to do.
Does this have to consider passing the right DeleteOptions for foreground propagationPolicy?

That's a good point. It would make sense to use the same DeleteOptions; in my Python implementation of this feature I was using Foreground deletion here by default (buried in a method default parameter, so I wasn't thinking about it). Theoretically we're just trying to wake up the GC process early; since it has already marked things for deletion and set finalizers, I don't think the propagation policy matters here, but it is possible that it could make a difference. Probably better to set it than not. I think we could check for a foregroundDeletion finalizer on the resource and, if it exists, retry with Foreground propagation policy; otherwise let it default to Background.
I don't know if retrying with the Claim reconciler will affect the lower-level resources that GC is trying to delete. The delete call on the Claim would need to trigger GC on the lower-level resources, and I don't know whether that happens or not.
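The heuristic proposed here can be sketched as a small dependency-free helper. Names are hypothetical: real code would use metav1.DeletionPropagation from k8s.io/apimachinery rather than a local string type, and (per the follow-up below about ownerReferences) the finalizer may need to be read from the owner rather than the used resource itself:

```go
package main

import "fmt"

// propagationPolicy is a local stand-in for metav1.DeletionPropagation so
// this sketch stays dependency-free.
type propagationPolicy string

const (
	backgroundPolicy propagationPolicy = "Background"
	foregroundPolicy propagationPolicy = "Foreground"
)

// replayPolicy returns Foreground when the object carries the built-in
// Kubernetes "foregroundDeletion" finalizer, otherwise Background.
func replayPolicy(finalizers []string) propagationPolicy {
	for _, f := range finalizers {
		if f == "foregroundDeletion" {
			return foregroundPolicy
		}
	}
	return backgroundPolicy
}

func main() {
	fmt.Println(replayPolicy([]string{"foregroundDeletion"})) // Foreground
	fmt.Println(replayPolicy(nil))                            // Background
}
```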
Theoretically we're just trying to wake up the GC process early, since it has already marked things for deletion and set finalizers I don't think the propagation policy matters here, but it is possible that it could make a difference. Probably better to set it than not. I think we could check for a foregroundDeletion finalizer on the resource and if it exists then retry with Foreground propagation policy otherwise let it default to Background.
Did you mean checking for the foregroundDeletion finalizer presence on the parent of the used resource (from its ownerReferences)? The used resource itself wouldn't have either metadata.deletionTimestamp or the foregroundDeletion finalizer set on it at this stage, as the presence of the Usage would have rejected the previous deletion attempts.
I don't know if retrying with the Claim reconciler will affect the lower-level resources that GC is trying to delete. The delete call on the Claim would need to trigger GC on the lower-level resources and I don't know if that happens or not.
If the used/of resource referred to in the Usage happens to be a nested XR or MR composed as part of a composite (a child resource with metadata.ownerReferences set), only then is it possible for the K8s GC controller to be the one attempting the delete, right?
If the used/of resource referred to in the Usage happens to be a parent XR (without any metadata.ownerReferences set) from a different composition than the using/Usage itself, the flow of deletion would be:
an external client issues a delete on the Claim with Foreground compositeDeletionPolicy -> the claim reconciler attempts to delete the XR with foreground propagation policy -> the delete fails due to the Usage -> the claim reconciler keeps attempting the delete on the XR with requeues. There are no GC-controller-invoked deletes on such a parent XR in this scenario.
Good call @ravilr 👍
I updated the PR to replay with the same propagation policy.
Force-pushed from 8b8462e to a60f432
Signed-off-by: Hasan Turken <turkenh@gmail.com>
Force-pushed from a60f432 to ef512a0
Backport failed for release-1.14. Please cherry-pick the changes locally and resolve any conflicts:
git fetch origin release-1.14
git worktree add -d .worktree/backport-5394-to-release-1.14 origin/release-1.14
cd .worktree/backport-5394-to-release-1.14
git switch --create backport-5394-to-release-1.14
git cherry-pick -x b3a366f2036abade23e5af30cb2e3e9f4934c3e3 ef512a0950ad0f4adc8ab4b0a5965020ae4385c7
Backport failed for release-1.15. Please cherry-pick the changes locally and resolve any conflicts:
git fetch origin release-1.15
git worktree add -d .worktree/backport-5394-to-release-1.15 origin/release-1.15
cd .worktree/backport-5394-to-release-1.15
git switch --create backport-5394-to-release-1.15
git cherry-pick -x b3a366f2036abade23e5af30cb2e3e9f4934c3e3 ef512a0950ad0f4adc8ab4b0a5965020ae4385c7
Description of your changes
This PR adds a new spec field to the Usage resource, namely replayDeletion, which enables replaying a blocked deletion attempt right after the Usage is gone.
When a deletion is blocked, the webhook adds an annotation to the used resource recording the attempt. When the Usage is being deleted, if replayDeletion is set to true, the controller triggers a delete request on the used object if that annotation is present.
A possible fix for #5193
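The replay decision described above combines two signals: the opt-in flag on the Usage and the webhook's annotation on the used resource. A minimal sketch of that check; the annotation key and function names are hypothetical, the real key used by the PR may differ:

```go
package main

import "fmt"

// deletionAttemptAnnotation is a stand-in for the annotation the webhook
// writes when it blocks a deletion; the real key in the PR may differ.
const deletionAttemptAnnotation = "example.crossplane.io/deletion-attempted"

// shouldReplay returns true only if replayDeletion is explicitly true on the
// Usage AND the webhook recorded at least one blocked deletion attempt on
// the used resource.
func shouldReplay(replayDeletion *bool, usedAnnotations map[string]string) bool {
	if replayDeletion == nil || !*replayDeletion {
		return false
	}
	_, attempted := usedAnnotations[deletionAttemptAnnotation]
	return attempted
}

func main() {
	replay := true
	fmt.Println(shouldReplay(&replay, map[string]string{deletionAttemptAnnotation: "2024-02-26T10:00:00Z"})) // true
	fmt.Println(shouldReplay(&replay, map[string]string{}))                                                  // false: never blocked
	fmt.Println(shouldReplay(nil, map[string]string{deletionAttemptAnnotation: "x"}))                        // false: opt-in unset
}
```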
I have:
- Run make reviewable to ensure this PR is ready for review.
- Added backport release-x.y labels to auto-backport this PR.
Need help with this checklist? See the cheat sheet.