improvements for status-write-retry #88
Conversation
@@ -57,16 +59,43 @@ func syncCluster(
if !reflect.DeepEqual(cr.Status, oldStatus) {
// Write back the status. Don't exit this handler until we
// succeed (will block other handlers for this resource).
this could conceivably never exit. is that ok?
Yeah, or at least in the sense that I'm not sure there's much else we can reasonably do here... if we don't update the status on the object then there's no way we can correctly manage it in the future.
A couple of comments on that situation:
- This function not-returning will only block future handling of this specific cluster object. Other objects can still be handled. A corollary of this: the number of these tied-up functions we can have is capped by the number of virtual cluster objects. Since they each block a "normal" handler from running, the worst-case simultaneous number of handler functions is unchanged.
- It would be possible (with the change in this PR) to get out of that situation by deleting the object.
There are other ways to handle this situation without tying up a goroutine, and if these seem attractive/necessary we could create an issue to pursue them. For example:
- We could set ourselves a reminder of "don't allow any more changes to this object", save the desired status object (in memory), and schedule a periodic timer to kick off a retry of the status update. But I'm not sure this is notably different from using time.Sleep.
- Or we could (perhaps after some retries) set ourselves a reminder of "don't allow any more changes to this object" and then just mark the object as permanently corrupted.
I'm hoping this is a quite unusual error BTW. I've certainly never seen it without manually forcing the error case.
pkg/reconciler/cluster.go
Outdated
return
}
}
if currentCluster.DeletionTimestamp != nil {
currentCluster could be nil here - if currentClusterErr != nil and the if at line 78 fails
👍
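The nil-pointer hazard flagged above comes from dereferencing `currentCluster.DeletionTimestamp` when the earlier Get failed. A minimal guard might look like the following sketch; `Cluster`, `isBeingDeleted`, and `errNotFound` are hypothetical stand-ins for the real types in `pkg/reconciler/cluster.go`:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Cluster stands in for the CRD object; only the field we need is shown.
type Cluster struct {
	DeletionTimestamp *time.Time
}

var errNotFound = errors.New("not found")

// isBeingDeleted checks the fetch error before touching the object,
// avoiding the nil dereference flagged in review. A not-found error is
// treated the same as a deletion in progress.
func isBeingDeleted(currentCluster *Cluster, currentClusterErr error) (bool, error) {
	if currentClusterErr != nil {
		if errors.Is(currentClusterErr, errNotFound) {
			return true, nil
		}
		return false, currentClusterErr
	}
	return currentCluster.DeletionTimestamp != nil, nil
}

func main() {
	// Safe even though the object pointer is nil, because the error
	// branch runs first.
	gone, err := isBeingDeleted(nil, errNotFound)
	fmt.Println(gone, err)
}
```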
- Add backoff for retries.
- Bail out if object is going away or has gone away.
Left over from when I was multiplying a Duration by a variable.
Force-pushed from 7809db3 to a115489
issue #40
Add backoff for retries.
Bail out if object is going away or has gone away.