fix: Failed jobs cleanup attempts are recorded as successful and skipped for a full interval #727

Closed
sam-saffron-jarvis wants to merge 1 commit into SamSaffron:main from sam-saffron-jarvis:feat/codereview-5f093a61
Conversation

@sam-saffron-jarvis
Contributor

What changed

  • updated maybeRunCleanup() to record lastCleanupCompleted only after pruneOldData() succeeds
  • preserved the existing cleanup interval semantics by storing the cleanup start time on success
  • added a regression test that forces cleanup to fail and verifies the next attempt is not suppressed

Why this is high-value

A transient cleanup failure was being treated as a successful cleanup run. That caused the scheduler to skip retention cleanup for the full configured interval, which defeats the existing error-backoff retry path and can let job_runs_v2 / job_run_events_v2 grow unchecked for hours or days. This fix ensures failed cleanup attempts keep retrying on the scheduler error delay until pruning succeeds.

Validation

  • gofmt -w cmd/serve_jobs_v2.go cmd/serve_jobs_v2_test.go
  • go build ./...
  • go test ./...
  • git diff --stat
  • git diff --check

@SamSaffron SamSaffron closed this May 28, 2026