feat(kilobase): add backup watchdog CronJob #7766

Merged
h0lybyte merged 1 commit into dev from
trunk/backup-watchdog-1772935900
Mar 8, 2026

@h0lybyte h0lybyte commented Mar 8, 2026

Summary

  • Adds a daily backup watchdog CronJob that monitors CNPG backup health and cleans up stale metadata
  • Health check: Verifies that a backup succeeded within the last 48 hours and exits non-zero if not, so the run shows up as a failed Job in ArgoCD/monitoring. Catches silent failure streaks like the cert expiry incident.
  • Metadata cleanup: Deletes failed Backup custom resources older than 7 days and completed ones older than 60 days
  • Status report: Logs a summary of backup counts and ScheduledBackup status
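
The 48-hour health window and the two retention thresholds above reduce to simple age comparisons. A minimal shell sketch, assuming timestamps have already been converted to Unix epoch seconds and that Backup phases are reported as `failed` and `completed`; the function and variable names are hypothetical and not taken from the merged manifest:

```shell
#!/bin/sh
# Sketch of the watchdog's threshold logic (hypothetical names throughout).
# All timestamps are Unix epoch seconds.

HEALTH_WINDOW_HOURS=48       # from the PR: a success is required within 48 hours
FAILED_RETENTION_DAYS=7      # from the PR: failed Backups pruned after 7 days
COMPLETED_RETENTION_DAYS=60  # from the PR: completed Backups pruned after 60 days

# Returns 0 if the newest successful backup falls inside the health window.
# $1 = epoch of the newest successful backup, $2 = current epoch
backup_is_healthy() {
  [ $(( $2 - $1 )) -le $(( $HEALTH_WINDOW_HOURS * 3600 )) ]
}

# Returns 0 if a Backup object is old enough to delete.
# $1 = phase (assumed to be "failed" or "completed"), $2 = backup epoch, $3 = current epoch
should_prune() {
  age=$(( $3 - $2 ))
  case "$1" in
    failed)    [ "$age" -gt $(( $FAILED_RETENTION_DAYS * 86400 )) ] ;;
    completed) [ "$age" -gt $(( $COMPLETED_RETENTION_DAYS * 86400 )) ] ;;
    *)         return 1 ;;  # never prune running or unknown phases
  esac
}
```

In a job built this way, a false result from `backup_is_healthy` would be turned into `exit 1` so the Job is marked failed, while `should_prune` would gate each delete call.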

Details

  • Runs daily at 3:30 AM UTC (between the 2 AM backup and the 4 AM Sunday fire drill)
  • Full security hardening: pinned bitnami/kubectl:1.31.5, runAsNonRoot, readOnlyRootFilesystem, drop ALL caps, seccomp RuntimeDefault
  • Minimal RBAC: only get/list/delete on backups, get/list on scheduledbackups
  • Single file with SA + Role + RoleBinding + CronJob (follows existing patterns)
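
One way to confirm the minimal RBAC grant described above is to impersonate the ServiceAccount with `kubectl auth can-i`. This is a sketch only: the ServiceAccount name `backup-watchdog` is an assumption (the PR does not state it), and the helper names are hypothetical.

```shell
#!/bin/sh
# RBAC check sketch. KUBECTL can be overridden (e.g. KUBECTL=echo) to
# preview the commands without contacting a cluster.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="kilobase"
# Assumed ServiceAccount name; adjust to match the actual manifest.
SERVICE_ACCOUNT="${SERVICE_ACCOUNT:-backup-watchdog}"

# Ask the API server whether the ServiceAccount may perform a verb on a resource.
# $1 = verb, $2 = resource
can_i() {
  $KUBECTL auth can-i "$1" "$2" \
    --as="system:serviceaccount:$NAMESPACE:$SERVICE_ACCOUNT" -n "$NAMESPACE"
}

# Check every verb the PR says is granted; stop at the first denial.
check_granted() {
  for verb in get list delete; do
    can_i "$verb" backups.postgresql.cnpg.io || return 1
  done
  for verb in get list; do
    can_i "$verb" scheduledbackups.postgresql.cnpg.io || return 1
  done
}
```

Running `can_i create backups.postgresql.cnpg.io` or `can_i delete scheduledbackups.postgresql.cnpg.io` should report `no` if the Role is as narrow as described.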

Test plan

  • Verify ArgoCD syncs the CronJob, SA, Role, and RoleBinding
  • Trigger manual run: kubectl create job --from=cronjob/backup-watchdog backup-watchdog-test -n kilobase
  • Confirm health check passes (recent backups are succeeding)
  • Confirm failed Backup custom resources older than 7 days are deleted
  • Confirm completed Backup custom resources older than 60 days are deleted
  • Verify status report in logs
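
The manual steps in the test plan can be scripted roughly as follows. This is a sketch under assumptions: the 120-second timeout is an arbitrary choice, and the helper names are hypothetical; the CronJob name, test Job name, and namespace come from the test plan.

```shell
#!/bin/sh
# Manual verification sketch. KUBECTL can be overridden (e.g. KUBECTL=echo)
# to preview the commands without contacting a cluster.
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="kilobase"
TEST_JOB="backup-watchdog-test"  # Job name taken from the test plan

# Create a one-off Job from the CronJob template.
trigger_watchdog_run() {
  $KUBECTL create job --from=cronjob/backup-watchdog "$TEST_JOB" -n "$NAMESPACE"
}

# Block until the Job completes; the timeout value is an assumption.
wait_for_watchdog_run() {
  $KUBECTL wait --for=condition=complete "job/$TEST_JOB" -n "$NAMESPACE" --timeout=120s
}

# Print the Job's logs, which should contain the status report.
show_watchdog_logs() {
  $KUBECTL logs "job/$TEST_JOB" -n "$NAMESPACE"
}

# List the remaining Backup objects so the pruning can be checked by eye.
list_backups() {
  $KUBECTL get backups.postgresql.cnpg.io -n "$NAMESPACE"
}
```

Calling the four functions in order covers the trigger, health check, log, and cleanup items. If the health check fails, the Job never reaches the `complete` condition, so `wait_for_watchdog_run` would time out rather than return early.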

… cleanup

Runs daily at 3:30 AM UTC. Detects silent backup failures (exits non-zero
if no successful backup in 48h) and prunes stale Backup CRDs (failed >7d,
completed >60d). Follows existing hardening patterns (pinned image, full
security context, minimal RBAC).

github-actions bot commented Mar 8, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@h0lybyte h0lybyte merged commit ff00777 into dev Mar 8, 2026
5 checks passed
@h0lybyte h0lybyte deleted the trunk/backup-watchdog-1772935900 branch March 8, 2026 03:43