Handle Watch expiration by resetting the resourceVersion #223
We've seen an increase in cases where the Watch starts erroring out (for a reason I don't fully understand yet; some of what I've been reading suggests that something in the k8s API caches may be expiring) with exceptions like:
While in the past this has generally fixed itself through the pre-existing restart code, in the aforementioned cases we've seen the loop get stuck restarting: either it never recovers until a human intervenes, or it takes long enough to recover that we start losing events.
This diff ensures that when we restart the watch loop, we restart as if we were starting a fresh Watch, i.e. with the resourceVersion reset. This does mean that we'll re-process existing events, since the current behavior of `Watch().stream()` includes returning the current state. That's fine: while GH issues mention that these events aren't necessarily ordered, our state machine/transition logic should ensure that out-of-order events don't actually hurt us. At worst, we'll simply waste time.
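
For reviewers who want the shape of the change without reading the diff, here is a minimal sketch of the restart behavior described above. It is not the code in this PR: `WatchExpired`, `open_stream`, `watch_with_reset`, the `resource_version` event key, and the `max_restarts` cap are all hypothetical names chosen for illustration, and the real client's exception type and event shape may differ.

```python
from typing import Callable, Iterable, Iterator, Optional


class WatchExpired(Exception):
    """Hypothetical stand-in for the error the real client raises on expiry."""


def watch_with_reset(
    open_stream: Callable[[Optional[str]], Iterable[dict]],
    max_restarts: int = 3,
) -> Iterator[dict]:
    """Yield events, restarting from a fresh watch when the stream expires.

    `open_stream(resource_version)` is an assumed callable that opens a watch;
    passing None means "start fresh and replay the current state".
    """
    resource_version: Optional[str] = None
    restarts = 0
    while True:
        try:
            for event in open_stream(resource_version):
                # Remember where we are so an ordinary reconnect could resume.
                resource_version = event.get("resource_version", resource_version)
                restarts = 0  # we made progress, so clear the failure count
                yield event
            return  # the stream ended cleanly; a real loop might re-open here
        except WatchExpired:
            restarts += 1
            if restarts > max_restarts:
                raise  # assumption: give up rather than spin forever
            # The key step: forget the stale version instead of retrying with it.
            resource_version = None
```

The caller sees the current state replayed after each reset, so the consumer has to tolerate duplicate and out-of-order events, which is the trade-off discussed above.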