
fix: disable retries on kubernetes-apply deploy jobs to preserve logs #147

Merged
vigneshrajsb merged 1 commit into main from fix/kubernetes-apply-job-no-retry
Mar 26, 2026


@vigneshrajsb
Contributor

Summary

  • Kubernetes-apply deploy jobs used backoffLimit: 3 + restartPolicy: OnFailure, which caused K8s to delete the pod when BackoffLimitExceeded was reached
  • This made logs unrecoverable: the JobMonitor couldn't fetch them, so the logs came back empty and archival was skipped
  • Failed deploy jobs showed up in the UI with a disabled logs button since there was nothing to display
  • Changed to backoffLimit: 0 + restartPolicy: Never to align with native helm deploy and build jobs, which ensures the pod is preserved for log retrieval and archival
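
As a rough illustration of the settings described above, the sketch below builds a minimal batch/v1 Job manifest with `backoffLimit: 0` and `restartPolicy: Never`. It is not the code from this PR: the helper name `build_apply_job_manifest`, the job name, the image, and the command are all hypothetical placeholders.

```python
from typing import Any


def build_apply_job_manifest(name: str, image: str, command: list[str]) -> dict[str, Any]:
    """Build a batch/v1 Job manifest that never retries (hypothetical helper)."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # No Job-level retries: the first pod failure marks the Job failed.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    # Never restart the container in place, so the failed pod
                    # is left behind and its logs can still be read.
                    "restartPolicy": "Never",
                    "containers": [
                        {"name": "apply", "image": image, "command": command}
                    ],
                }
            },
        },
    }


# Placeholder values for illustration only.
manifest = build_apply_job_manifest(
    name="example-apply",
    image="example.invalid/kubectl:latest",
    command=["kubectl", "apply", "-f", "/manifests"],
)
```

The trade-off is that a transient failure is no longer retried by Kubernetes itself; in exchange, the single failed pod remains available for log retrieval and archival.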

Test plan

  • Deploy a service with a bad manifest to trigger a kubernetes-apply failure
  • Verify the failed pod is preserved (not deleted by K8s)
  • Verify logs are archived to object store after failure
  • Verify the UI shows viewable logs for the failed deploy job
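
The "failed pod is preserved" check could be scripted along the following lines. This is a sketch under assumptions, not part of the PR: it takes a PodList-shaped dictionary (for example, parsed from `kubectl get pods -l job-name=<job> -o json`) and returns the names of pods in the `Failed` phase. The function name and the sample data are hypothetical and hand-written.

```python
from typing import Any


def preserved_failed_pods(pod_list: dict[str, Any]) -> list[str]:
    """Return names of pods in the Failed phase from a PodList-shaped dict."""
    return [
        pod["metadata"]["name"]
        for pod in pod_list.get("items", [])
        if pod.get("status", {}).get("phase") == "Failed"
    ]


# Hand-written sample shaped like a Kubernetes PodList; not real cluster output.
sample_pod_list = {
    "kind": "List",
    "items": [
        {
            "metadata": {"name": "example-apply-abc12"},
            "status": {"phase": "Failed"},
        },
        {
            "metadata": {"name": "other-job-xyz34"},
            "status": {"phase": "Succeeded"},
        },
    ],
}

failed = preserved_failed_pods(sample_pod_list)
```

A non-empty result after the deploy fails suggests the pod was preserved; an empty result would point to the pod having been removed before logs could be fetched.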

OnFailure restart policy with backoffLimit 3 caused K8s to delete the
pod on BackoffLimitExceeded, making logs unrecoverable and preventing
log archival. Align with native helm and build jobs which use
backoffLimit 0 and restartPolicy Never.
vigneshrajsb requested a review from a team as a code owner March 26, 2026 18:18
vigneshrajsb merged commit fcbf6dd into main Mar 26, 2026
1 check passed