
Make it easy to debug kube-apiserver client query behaviour #26673

Closed
joestringer opened this issue Jul 6, 2023 · 4 comments
Labels
area/metrics: Impacts statistics / metrics gathering, e.g. via Prometheus.
kind/question: Frequently asked questions & answers. This issue will be linked from the documentation's FAQ.
sig/agent: Cilium agent related.
sig/k8s: Impacts the kubernetes API, or kubernetes -> cilium internals translation layers.

Comments

@joestringer
Member

The kube-apiserver client in the cilium-agent has a default rate limiter that ensures the node does not issue too many queries per second.

If Cilium starts to exceed this QPS limit, how does a user debug it? Are there metrics the user can observe to see the current query rate? Can these be broken down by resource type so that the cause of the rate-limit trigger can be identified?

@joestringer joestringer added kind/question Frequently asked questions & answers. This issue will be linked from the documentation's FAQ. area/metrics Impacts statistics / metrics gathering, eg via Prometheus. sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers. sig/agent Cilium agent related. labels Jul 6, 2023
@joestringer
Member Author

In #26586, we see log messages like this in the cilium-agent logs:

2023-07-04T14:59:51.959400310Z level=info msg="Waited for 1.395439596s due to client-side throttling, not priority and fairness, request: GET:https://127.0.0.1:6443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cilium-l2announce-kube-system-hubble-peer" subsys=klog

@ysksuzuki
Member

> If Cilium starts to exceed this QPS limit, how does a user debug it? Are there metrics the user can observe to see the current query rate? Can these be broken down by resource type so that the cause of the rate-limit trigger can be identified?

We can visualize the extent of the delays caused by the k8s client-side rate limiter: see #25555 (comment).

@ysksuzuki
Member

FYI, the default config in client-go is a bit low. The cilium-agent's QPS default is 5 (from client-go), and the operator's default is 20 (from controller-runtime).

client-go QPS default is 5
https://github.com/kubernetes/client-go/blob/b46677097d03b964eab2d67ffbb022403996f4d4/rest/config.go#L44

controller-runtime QPS default is 20
https://github.com/kubernetes-sigs/controller-runtime/blob/f6f37e6cc1ec7b7d18a266a6614f86df211b1a0a/pkg/client/config/config.go#L102

@joestringer
Member Author

Awesome, thanks @ysksuzuki! Sounds like there's nothing left to do for this, then.
