Provide ability to configure backup job retry limit in DevWorkspaceOperatorConfig #1579

@rohanKanojia

Description

Currently, the backup jobs created by the operator do not specify a backoffLimit, causing them to default to the Kubernetes standard of 6 retries. When a backup fails, this results in the creation of multiple failing pods (e.g., devworkspace-backup-xxxxx), which can clutter the namespace and consume unnecessary resources.

We need the ability to configure the .spec.backoffLimit for these backup jobs, ideally through the DevWorkspaceOperatorConfig (DWOC)'s backupConfig, to allow users to control the retry behavior.

Current failing backup pod behavior:

NAME                                     READY   STATUS    RESTARTS   AGE
devworkspace-backup-wwmkr-2fl56          0/1     Error     0          69s
devworkspace-backup-wwmkr-86g6g          0/1     Error     0          2m32s
devworkspace-backup-wwmkr-v6d4p          0/1     Error     0          3m39s
devworkspace-backup-wwmkr-vqxxh          0/1     Error     0          3m53s
devworkspace-backup-wwmkr-znz7k          0/1     Error     0          3m16s

Acceptance Criteria

  • Add a new field to the DevWorkspaceOperatorConfig (DWOC) to define the backoffLimit for backup jobs.
  • Update backupcronjob_controller.go to set the Job's .spec.backoffLimit from this configured value.
  • If no value is specified in the DWOC, the system should either use the Kubernetes default (6) or a safe internal default.
  • Confirm that setting a lower backoffLimit (e.g., 1 or 2) successfully limits the number of pods created upon backup failure.
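
The criteria above could be approached along these lines. This is only a minimal sketch, not the operator's actual code: `BackupCronJobConfig`, its `BackoffLimit` field, and both helper functions are hypothetical names, and `JobSpec` is a local stand-in for the Kubernetes `batchv1.JobSpec` type (whose `BackoffLimit` field is likewise an `*int32`). Falling back to the default on a negative value is also an assumption; rejecting such values at validation time would be an equally reasonable choice.

```go
package main

// defaultBackoffLimit mirrors the Kubernetes Job default of 6 retries.
const defaultBackoffLimit int32 = 6

// BackupCronJobConfig is a hypothetical stand-in for the backup section
// of the DevWorkspaceOperatorConfig.
type BackupCronJobConfig struct {
	// BackoffLimit is the hypothetical new field. A nil pointer means
	// the user did not set it.
	BackoffLimit *int32
}

// JobSpec is a local stand-in for batchv1.JobSpec so that the sketch
// compiles without Kubernetes dependencies.
type JobSpec struct {
	BackoffLimit *int32
}

// resolveBackoffLimit returns the configured limit, or the default when
// the config or the field is unset. Negative values are treated as
// unset (an assumption made for this sketch).
func resolveBackoffLimit(cfg *BackupCronJobConfig) int32 {
	if cfg == nil || cfg.BackoffLimit == nil || *cfg.BackoffLimit < 0 {
		return defaultBackoffLimit
	}
	return *cfg.BackoffLimit
}

// applyBackoffLimit writes the resolved limit into the Job spec.
func applyBackoffLimit(spec *JobSpec, cfg *BackupCronJobConfig) {
	limit := resolveBackoffLimit(cfg)
	spec.BackoffLimit = &limit
}
```

Using a pointer for the config field keeps "not set" distinct from an explicit 0, which is a valid value meaning no retries. For pods that are not restarted in place, a limit of 1 should leave roughly two failed pods (the initial attempt plus one retry) rather than the larger set shown in the listing above.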
