fix: add retry logic to updateRecoveryWindow for concurrent status updates#759
Conversation
…dates When backup completion and retention policy enforcement run concurrently, both call updateRecoveryWindow to update the ObjectStore status. This can cause "object has been modified" errors due to Kubernetes optimistic concurrency control. This change wraps the status update in retry.RetryOnConflict, matching the pattern already used in setLastFailedBackupTime in the same file. The retry logic fetches a fresh copy of the ObjectStore before each update attempt, ensuring the resourceVersion is current. Fixes cloudnative-pg#758 Signed-off-by: Gabriel Mouallem <gabriel@latitude.sh>
|
Hi maintainers, Feel free to adjust this PR as needed and merge when ready. I may not have bandwidth to address review feedback promptly, so if there are minor changes required, please feel free to push directly to the branch or take over the PR entirely. The fix follows the existing pattern in Thanks for maintaining this project! |
|
Your approach is correct and fully consistent with project standards. Wrapping updateRecoveryWindow in retry.RetryOnConflict, fetching a fresh ObjectStore before each update, directly mirrors the established and accepted pattern used in setLastFailedBackupTime (introduced in PR #467). This retry logic is the recommended way to handle Kubernetes optimistic concurrency control errors during concurrent status updates, and it will resolve the transient resourceVersion errors you’ve observed when backup and retention enforcement overlap. This solution is confirmed as best practice in both the codebase and recent discussions (issue #758, PR #467). To reply, just mention @dosu. How did I do? Good | Irrelevant | Incorrect | Verbose | Hallucination | Report 🐛 | Other |
Problem
When backup completion and retention policy enforcement run concurrently, both call
updateRecoveryWindowto update the ObjectStore status. This causes transient errors due to Kubernetes optimistic concurrency control:Solution
Wrap the status update in
retry.RetryOnConflict, matching the pattern already used insetLastFailedBackupTimein the same file. The retry logic fetches a fresh copy of the ObjectStore before each update attempt, ensuring theresourceVersionis current.As noted in #758, @dosu confirmed this approach:
Changes
retry.RetryOnConflictwrapper toupdateRecoveryWindowclient.ObjectKeyFromObjectTesting
This change follows the exact same pattern as
setLastFailedBackupTime(lines 66-89 in the same file), which was added in #467.Fixes #758