[bug]: Too old resource version #724

Closed
lkt82 opened this issue Feb 22, 2024 · 3 comments · Fixed by #735
Labels
bug Something isn't working

Comments

lkt82 commented Feb 22, 2024

Describe the bug

Hi.
After an operator has been running for a few days, I start getting these errors and the operator stops working.

fail: KubeOps.Operator.Watcher.ResourceWatcher[0]
There was an error while watching the resource "XXX".
k8s.KubernetesException: too old resource version: XXXXXX (XXXXXX)

To reproduce

Watch a custom resource for a few days

Expected behavior

The operator continues to work all the time

Screenshots

No response

Additional Context

Version 8.0.1

lkt82 added the bug label Feb 22, 2024
@gabrieledemaglie

Hi,

I'm facing the same problem. If I leave the operator idle for a while (the time it takes for the error to appear varies), I get the same error.

I looked into the situation a bit; this is my understanding:

When the operator wants to watch a list of resources, it has to perform a LIST operation, which returns the current state of the requested resources. The resourceVersion obtained is used to start a WATCH (and is saved in the "_lastResourceVersion" property of the ResourceWatcher class).

Every time an event related to the watched resources occurs, the "_lastResourceVersion" property is updated.

When the current WATCH expires, a new WATCH must be started using the saved resourceVersion:

  • If the operator has recently refreshed the "_lastResourceVersion" property, the WATCH is started again with no issues.
  • If the operator did nothing for a while, the "_lastResourceVersion" property contains a version that is no longer accepted by the API server.
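
To make the two cases concrete, here is a toy model in Python. It is only a sketch: `FakeApiServer`, `Gone` and the fixed-size history are hypothetical stand-ins, and the real API server decides what to retain by its own rules, not by a simple event count.

```python
from collections import deque


class Gone(Exception):
    """Stand-in for the HTTP 410 'too old resource version' error."""


class FakeApiServer:
    """Hypothetical server that retains only the newest `history` events."""

    def __init__(self, history=3):
        self.version = 0
        self.events = deque(maxlen=history)  # older events are dropped

    def change(self, name):
        self.version += 1
        self.events.append((self.version, name))

    def watch(self, resource_version):
        """Return events newer than resource_version, or raise Gone if some were dropped."""
        oldest_resumable = self.events[0][0] - 1 if self.events else self.version
        if resource_version < oldest_resumable:
            raise Gone(f"too old resource version: {resource_version} ({oldest_resumable})")
        return [e for e in self.events if e[0] > resource_version]


server = FakeApiServer(history=3)
for name in ["a", "b", "c", "d", "e"]:
    server.change(name)

recent = server.watch(4)  # recently refreshed version: accepted
try:
    server.watch(1)  # stale version from an idle watcher: rejected
    stale_rejected = False
except Gone:
    stale_rejected = True
```

A watcher that kept its saved version close to the server's current one resumes normally; one that fell behind the retained window can never resume from that version, no matter how often it retries.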

Currently this leads to an infinite loop in which the same invalid resourceVersion is used over and over.
There should be something like a try/catch around the watch() call that sets the "_lastResourceVersion" property to null when the "too old resource version" error occurs.
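
Roughly what I have in mind, sketched in Python rather than in the operator's own code. `run_watcher`, `fake_list`, `fake_watch` and `Gone` are hypothetical names made up for this illustration; this is not the KubeOps implementation.

```python
class Gone(Exception):
    """Stand-in for the 'too old resource version' (HTTP 410) error."""


def run_watcher(list_fn, watch_fn, max_cycles, start_version=None):
    """Run up to max_cycles watch attempts, re-listing whenever the version is rejected.

    list_fn() returns the current resource version (hypothetical LIST call).
    watch_fn(version) returns (version, event) pairs or raises Gone.
    """
    last_resource_version = start_version
    seen = []
    for _ in range(max_cycles):
        if last_resource_version is None:
            last_resource_version = list_fn()  # fresh LIST gives a valid starting point
        try:
            for version, event in watch_fn(last_resource_version):
                last_resource_version = version  # advance on every event
                seen.append(event)
        except Gone:
            last_resource_version = None  # drop the stale version instead of retrying it
    return seen


attempts = []


def fake_list():
    return 100


def fake_watch(version):
    attempts.append(version)
    if version < 100:
        raise Gone(f"too old resource version: {version}")
    return [(version + 1, f"event-{version + 1}")]


seen = run_watcher(fake_list, fake_watch, max_cycles=3, start_version=42)
```

Starting from the stale version 42, the first attempt fails, the version is cleared, and the next cycle re-lists and continues from 100. Without the reset in the `except` branch, every cycle would retry 42 and fail the same way.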

@buehler am I correct?

Thanks for your work.
Gabriele

lkt82 commented Feb 28, 2024

@gabrieledemaglie I think you are correct :)

@robertcoltheart
Contributor

I'm facing this issue too, any resolution @buehler?

buehler pushed a commit that referenced this issue Mar 13, 2024
Fixes #724 

Restart the watch loop when we receive HTTP 410 Gone, which seems to
mirror what other k8s frameworks are doing (see: java).