DWO continues to reconcile workspace when common PVC cleanup job fails #845

@AObuchow

Description

When the common PVC cleanup job runs and the initial cleanup pod fails, up to 3 more cleanup pods are created.
If all 4 cleanup pods fail, DWO keeps reconciling the workspace and continuously logs an error about being unable to clean up the workspace storage.

Restarting DWO does not fix the issue. The only fix seems to be uninstalling and reinstalling DWO on the cluster.

How To Reproduce

Steps to reproduce the behaviour:

  1. Modify the common PVC cleanup job spec (in pkg/provision/storage/cleanup.go) so that the created pods will fail (change the container args):

      	Args: []string{
      		"-c",
     -		fmt.Sprintf(cleanupCommandFmt, path.Join(pvcClaimMountPath, workspaceId)),
     +		"exit 1",
      	},

  2. Start up DWO
  3. Create 2 workspaces that use the common PVC storage-class strategy
  4. Delete one of the workspaces so that the common PVC cleanup job will be run
  5. Wait for all the PVC cleanup job-related pods to fail
  6. DWO will now continuously log an error similar to the following:
{"level":"error","ts":1653690734.265899,"logger":"controllers.DevWorkspace","msg":"Failed to clean up DevWorkspace storage","Request.Namespace":"devworkspace-controller","Request.Name":"theia-next","devworkspace_id":"workspace542919afbaf744fa","error":"DevWorkspace PVC cleanup job failed: see logs for job \"cleanup-workspace542919afbaf744fa\" for details","stacktrace":"github.com/devfile/devworkspace-operator/controllers/workspace.(*DevWorkspaceReconciler).finalize\n\t/home/aobuchow/git/devworkspace-operator/controllers/workspace/finalize.go:63\ngithub.com/devfile/devworkspace-operator/controllers/workspace.(*DevWorkspaceReconciler).Reconcile\n\t/home/aobuchow/git/devworkspace-operator/controllers/workspace/devworkspace_controller.go:130\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/aobuchow/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.5/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/aobuchow/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.5/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/aobuchow/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.5/pkg/internal/controller/controller.go:214"}
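
To confirm that the error really repeats rather than being logged once, the operator logs can be scanned for the message shown above. The following is a minimal sketch using only the Go standard library. The `logEntry` type and `countCleanupFailures` function are hypothetical helpers written for this report, not part of DWO, and the sample lines in `main` are abbreviated, made-up stand-ins for real log output.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// logEntry mirrors only the fields of the JSON log line that are used here.
type logEntry struct {
	Level       string `json:"level"`
	Msg         string `json:"msg"`
	WorkspaceID string `json:"devworkspace_id"`
}

// cleanupFailureMsg is the message text copied from the log line in this report.
const cleanupFailureMsg = "Failed to clean up DevWorkspace storage"

// countCleanupFailures returns, per workspace ID, how many times the storage
// cleanup error appears in the given log text. Lines that are not valid JSON
// are skipped.
func countCleanupFailures(logs string) map[string]int {
	counts := make(map[string]int)
	scanner := bufio.NewScanner(strings.NewReader(logs))
	// Stack traces make these lines long, so raise the default token limit.
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
	for scanner.Scan() {
		var e logEntry
		if err := json.Unmarshal(scanner.Bytes(), &e); err != nil {
			continue
		}
		if e.Level == "error" && e.Msg == cleanupFailureMsg {
			counts[e.WorkspaceID]++
		}
	}
	return counts
}

func main() {
	// Abbreviated sample input; real lines also carry "ts", "logger", "error", and so on.
	logs := `{"level":"error","msg":"Failed to clean up DevWorkspace storage","devworkspace_id":"workspace542919afbaf744fa"}
{"level":"info","msg":"some unrelated message","devworkspace_id":"workspace542919afbaf744fa"}
{"level":"error","msg":"Failed to clean up DevWorkspace storage","devworkspace_id":"workspace542919afbaf744fa"}`

	for id, n := range countCleanupFailures(logs) {
		fmt.Printf("%s: %d cleanup failures\n", id, n)
	}
}
```

A count that keeps growing for the same workspace ID after all cleanup pods have failed indicates the behaviour described in this issue.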

Expected behaviour

DWO should stop trying to reconcile the workspace after a certain number of PVC cleanup job failures (or perhaps after the first failure?). The workspace should be marked as failed, and the above-mentioned error should stop being logged.
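
One possible shape for this behaviour is sketched below. It is not the operator's actual code: `jobStatus`, `action`, and `decide` are hypothetical names, and the real finalizer logic would read the failure state from the Kubernetes Job rather than from a plain struct. The sketch only illustrates the idea of treating an exhausted cleanup job as a terminal state instead of returning an error that triggers another reconcile.

```go
package main

import "fmt"

// jobStatus is a hypothetical, simplified view of the cleanup job's state.
type jobStatus struct {
	Succeeded     bool
	FailedPods    int // cleanup pods that have failed so far
	MaxFailedPods int // pods allowed to fail before the job counts as terminally failed
}

// action is what the finalizer step should do next.
type action int

const (
	requeue    action = iota // job still in progress: check again later
	finish                   // cleanup succeeded: finalization can complete
	markFailed               // terminal failure: mark the workspace failed and stop retrying
)

// decide maps the job state to the next action. Once the failure budget is
// used up, it returns markFailed instead of asking for another reconcile.
func decide(s jobStatus) action {
	switch {
	case s.Succeeded:
		return finish
	case s.FailedPods >= s.MaxFailedPods:
		return markFailed
	default:
		return requeue
	}
}

func main() {
	// 1 initial pod + 3 retries = 4 pods, matching the numbers in this report.
	const maxPods = 4
	names := map[action]string{requeue: "requeue", finish: "finish", markFailed: "markFailed"}

	for failed := 0; failed <= maxPods; failed++ {
		a := decide(jobStatus{FailedPods: failed, MaxFailedPods: maxPods})
		fmt.Printf("failed pods: %d -> %s\n", failed, names[a])
	}
}
```

Setting `MaxFailedPods` to 1 in this sketch would correspond to the "after the first failure" variant mentioned above.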

Metadata

Labels

bug: Something isn't working
