
improvements for status-write-retry #88

Merged (5 commits, Oct 16, 2018)

Conversation

joel-bluedata (Member):
issue #40

  • Add backoff for retries.

  • Bail out if object is going away or has gone away.

@@ -57,16 +59,43 @@ func syncCluster(
if !reflect.DeepEqual(cr.Status, oldStatus) {
// Write back the status. Don't exit this handler until we
// succeed (will block other handlers for this resource).
Member (reviewer):
this could conceivably never exit. is that ok?

joel-bluedata (Member, Author), Oct 11, 2018:

Yeah, or at least in the sense that I'm not sure there's much else we can reasonably do here: if we don't update the status on the object, then there's no way we can correctly manage it in the future.

A couple of comments on that situation:

  • This function not-returning will only block future handling of this specific cluster object. Other objects can still be handled. A corollary of this: the number of these tied-up functions we can have is capped by the number of virtual cluster objects. Since they each block a "normal" handler from running, the worst-case simultaneous number of handler functions is unchanged.
  • It would be possible (with the change in this PR) to get out of that situation by deleting the object.

There are other ways to handle this situation without tying up a goroutine, and if these seem attractive/necessary we could create an issue to pursue them. For example:

  • We could set ourselves a reminder of "don't allow any more changes to this object", save the desired status object (in memory), and schedule a periodic timer to kick off a retry of the status update. But I'm not sure this is notably different from using time.Sleep.
  • Or we could (perhaps after some retries) set ourselves a reminder of "don't allow any more changes to this object" and then just mark the object as permanently corrupted.

I'm hoping this is quite an unusual error, BTW. I've certainly never seen it without manually forcing the error case.

return
}
}
if currentCluster.DeletionTimestamp != nil {
Member (reviewer):
currentCluster could be nil here - if currentClusterErr != nil and the if at line 78 fails

joel-bluedata (Member, Author):
👍

Commits:

- Add backoff for retries.
- Bail out if object is going away or has gone away.
- Left over from when I was multiplying a Duration by a variable.
@joel-bluedata joel-bluedata merged commit 4c27809 into bluek8s:master Oct 16, 2018
@joel-bluedata joel-bluedata deleted the joel-status branch October 22, 2018 22:48