-
Notifications
You must be signed in to change notification settings - Fork 68
stop reconciling workspace when PVC cleanup job fails #851
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stop reconciling workspace when PVC cleanup job fails #851
Conversation
if err != nil && !k8sErrors.IsConflict(err) { | ||
return reconcile.Result{}, err | ||
} | ||
if workspace.Status.Phase != dw.DevWorkspaceStatusError { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are potentially other places where the workspace status phase may be set to Error
that do not concern the common PVC cleanup job.
For example, finalizeServiceAccount()
may set the workspace status phase to Error
. Furthermore, it is possible that in the future, other unrelated conditions may set the workspace's status phase to Error
.
Thus, it might be best to also check the workspaces condition message for "Failed to clean up DevWorkspace storage"
in the finalize function (unless this bug also occurs with finalizeServiceAccount()
, which may be likely? 🤔 )
pkg/provision/storage/cleanup.go
Outdated
Args: []string{ | ||
"-c", | ||
fmt.Sprintf(cleanupCommandFmt, path.Join(pvcClaimMountPath, workspaceId)), | ||
"exit 1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commit/change needs to be removed before the PR can be merged. It's only here to facilitate testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
This PR is pretty far behind main
right now, and does not include the PVC cleanup changes -- I rebased on main while testing to verify how this works with "cleanup PVC on deletion".
Also, don't forget to remove the temporary patch before merging :)
I've tested the PR with the provided instructions, and it is working 👍
is being printed quite often, but that is expected since new cleanup pods will start and will fail continuously for our test case, is that right? |
Thanks for testing @dkwon17 :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, please do not forget to replace exit 1 before merging
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: amisevsk, AObuchow, ibuziuk The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@AObuchow , sorry, I meant to write my comment: #851 (comment) on your other PR: #846 I just tested this PR and it is working for me , as I only see the
log only once 👍 |
Fix devfile#845 Signed-off-by: Andrew Obuchowicz <aobuchow@redhat.com>
5203833
to
28a3599
Compare
New changes are detected. LGTM label has been removed. |
Awesome, thanks for confirmation @dkwon17 :) |
What does this PR do?
Currently, in the reconcile loop, there is a bug with marking a workspace as having an error after the common PVC cleanup job fails.
From my understanding, after detecting that the cleanup job has failed and returning a
ProvisionError
fromstorageProvisioner.CleanupWorkspaceStorage()
, the workspace's status phase is set toError
. However, the reconcile function checks for deleted workspaces before checking for workspaces with errors/failures. The existing logic in thefinalize
function (which is called when reconciling deleted workspaces) then overwrites the workspace's status phase (setting it toTerminating
) and eventually runsstorageProvisioner.CleanupWorkspaceStorage()
again, leading to an endless loop.My current fix simply modifies the
finalize
function to check if the workspace's status phase is set toError
, and if so, it does not overwrite the workspace's status phase and returns early.What issues does this PR fix or reference?
Fix #845
Is it tested? How?
kubectl delete dw theia-next -n $NAMESPACE
Error
phase by doingkubectl get devworkspace -n $NAMESPACE
:PR Checklist
/test v8-devworkspace-operator-e2e, v8-che-happy-path
to trigger)v8-devworkspace-operator-e2e
: DevWorkspace e2e testv8-che-happy-path
: Happy path for verification integration with Che