Description
Currently, the backup jobs created by the operator do not specify a backoffLimit, causing them to default to the Kubernetes standard of 6 retries. When a backup fails, this results in the creation of multiple failing pods (e.g., devworkspace-backup-xxxxx), which can clutter the namespace and consume unnecessary resources.
We need the ability to configure .spec.backoffLimit for these backup jobs, ideally through the backupConfig section of the DevWorkspaceOperatorConfig (DWOC), so that users can control the retry behavior.
Current failing backup pod behavior:
NAME                              READY   STATUS   RESTARTS   AGE
devworkspace-backup-wwmkr-2fl56   0/1     Error    0          69s
devworkspace-backup-wwmkr-86g6g   0/1     Error    0          2m32s
devworkspace-backup-wwmkr-v6d4p   0/1     Error    0          3m39s
devworkspace-backup-wwmkr-vqxxh   0/1     Error    0          3m53s
devworkspace-backup-wwmkr-znz7k   0/1     Error    0          3m16s
Acceptance Criteria
- Add a new field to the DevWorkspaceOperatorConfig (DWOC) to define the backoffLimit for backup jobs.
- Update backupcronjob_controller.go to inject this configured value into the Job's .spec.backoffLimit.
- If no value is specified in the DWOC, the system should either use the Kubernetes default (6) or a safe internal default.
- Confirm that setting a lower backoffLimit (e.g., 1 or 2) successfully limits the number of pods created upon backup failure.
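The criteria above could be sketched roughly as follows. This is only an illustration, not the operator's actual code: `BackupConfig`, `JobSpec`, and `applyBackoffLimit` are hypothetical stand-ins (the real controller would work with the DWOC API types and `batch/v1` `JobSpec`, whose `BackoffLimit` field is a `*int32`). The sketch assumes the first of the two fallback options, leaving the field unset so that the Kubernetes default applies.

```go
package main

import (
	"errors"
	"fmt"
)

// BackupConfig is a hypothetical stand-in for the DWOC backupConfig section.
// A nil BackoffLimit means the user did not configure a value.
type BackupConfig struct {
	BackoffLimit *int32
}

// JobSpec is a simplified stand-in for the batch/v1 JobSpec,
// which also models backoffLimit as an optional *int32.
type JobSpec struct {
	BackoffLimit *int32
}

// applyBackoffLimit copies the configured limit into the Job spec.
// When nothing is configured, the spec is left untouched so that the
// Kubernetes default (6) applies. Negative values are rejected.
func applyBackoffLimit(cfg *BackupConfig, spec *JobSpec) error {
	if cfg == nil || cfg.BackoffLimit == nil {
		return nil
	}
	if *cfg.BackoffLimit < 0 {
		return errors.New("backoffLimit must not be negative")
	}
	// Copy the value so the Job spec does not alias the config object.
	limit := *cfg.BackoffLimit
	spec.BackoffLimit = &limit
	return nil
}

func main() {
	limit := int32(1)
	spec := JobSpec{}
	if err := applyBackoffLimit(&BackupConfig{BackoffLimit: &limit}, &spec); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("backoffLimit:", *spec.BackoffLimit)
}
```

Using a pointer for the config field keeps "not set" distinguishable from an explicit 0, which matters because 0 is a valid backoffLimit meaning no retries. Assuming the backup pods use restartPolicy: Never, a limit of 1 should result in at most two pods for a failing backup.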
Additional Context