Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add timeout_seconds option to Kubeclient::Client#watch_entities #624

Merged
merged 1 commit into from
Dec 27, 2023

Conversation

jperville
Copy link
Contributor

@jperville jperville commented Nov 21, 2023

This small PR adds a "timeout_seconds" option to the "Kubeclient::Client#watch_entities" method (not set by default).
The objective is to limit the duration of a watch to prevent zombification if the watch lasts for too long (see #512).

We see the zombification often on Azure kubernetes cluster, where our watch stays up but does not receive notifications anymore. By setting a timeout_seconds of X seconds we are sure that we are not zombified for more than X seconds.

Happy to take feedback to improve this PR.

@cben
Copy link
Collaborator

cben commented Nov 22, 2023

Huh. I was expecting this would set some network timeout settings on our side (we have these configurable for non-watch actionsy; watch was originally deliberately excluded to be unlimited) — but instead I see this is sending timeoutSeconds query param to the server.

https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/

timeoutSeconds Timeout for the list/watch call. This limits the duration of the call, regardless of any activity or inactivity.

Do you know how this differs from client-side timeouts? If you want a timeout on a watch, when would you want it server-side / client-side / both?

In any case, if k8s API takes this param, we should support it 👍
Please update README.md to explain this.

@jperville
Copy link
Contributor Author

Hello @cben , actually I am fighting trying to workaround zombification when watching the k8s API on Azure for changes in a CRD.

After a while, watch appears to still be up but we don't receive any more updates.

My first attempt was making the watch http_client persistent (by hacking Kubeclient::Common::WatchStream#build_client method) and it didn't solve my problem. The http gem does not allow to set a keep-alive on its underlying TCP socket that I know.

My second attempt uses this PR: I pass a timeoutSeconds when creating a watch. It limits the maximum duration of a watch, reducing (but not eliminating) the zombification. I think that this PR should be merged since timeoutSeconds is part of the kubernetes watch API.

What I'm going to try next:

  • limit watch duration to N seconds using timeoutSeconds (this PR)
  • add a livenessProbe which will force a restart of the pod if the watch is more than N + X seconds
    That will workaround the problem even if not the prettiest way.

@jperville
Copy link
Contributor Author

jperville commented Nov 22, 2023

@cben I just updated the README to mention how to force watches to have a maximum duration with the timeout_seconds keyword arguments.

If you know a way to make watches timeout on the client side if inactive I am interested.

EDIT : I just found out about Kubeclient::Informer and I managed to code something that works (even if the Azure K8S apiserver is very aggressive with connection resets).

@cben
Copy link
Collaborator

cben commented Dec 27, 2023

Oh wow, I dropped the ball on this one. LGTM, merging. 👍

Yeah, Kubeclient::Informer strives to handle watch restarts in a correct way, though it's new-ish and there might be loose ends (search issues for "informer").

[I have to confess it's been a long time since I've used kubeclient myself; I'm acting as limping-along-barely-maintainer and that's why I'm trying to "crowdsource" things like README etc to contributors but would be good if active user(s) stepped up to replace me...]

@cben cben closed this Dec 27, 2023
@cben cben reopened this Dec 27, 2023
@cben
Copy link
Collaborator

cben commented Dec 27, 2023

Close-cycled to re-run CI. It now fails to run on ruby 2.7:

ERROR:  Error installing bundler:
	The last version of bundler (>= 0) to support your Ruby & RubyGems was 2.4.22. Try installing it with `gem install bundler -v 2.4.22`
	bundler requires Ruby version >= 3.0.0. The current ruby version is 2.7.8.225.

but this passes tests on linux 2.7 locally.

@cben cben merged commit e6359eb into ManageIQ:master Dec 27, 2023
46 of 51 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants