Dynamic Per Resource Timeouts #19991
1b04d6e to a33f4b8
/test
b2e9b54 to 3ecc331
Never mind about the GKE tests; I missed the memo about those not working. Everything else seems to be OK test-wise.
706e7fd to 4234e7e
/test
Thanks, yeah, I've been doing a lot of interactive rebasing. I ended up mixing changes into commits, which made reintegrating them back into the original very tricky, as they would depend on changes made later in somewhat unrelated commits.
@christarazi I'll squash down the last commit once you're done reviewing.
1e1aa6f to 5d25fcf
K8s API changes look good.
@tklauser can I get another review when you have a moment?
/test
Job 'Cilium-PR-K8s-1.16-kernel-4.9' failed.
Job 'Cilium-PR-K8s-1.23-kernel-net-next' failed.
/test
Job 'Cilium-PR-K8s-1.16-kernel-4.9' hit: #20217 (91.88% similarity)
While waiting for init of the k8s subsystem, timeouts will be calculated from either the start time or the time of the last received event. Resources that take longer to sync but do make progress by receiving events will be less likely to crash the Pod.

Fixes: cilium#18776
Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
To allow tracking k8s events per resource/action type, add event scope and action labels to the events_ts metrics. Refactor all events_ts metrics into a single gauge vector. API metrics will be labelled with the URL resource path and request type. Containerd metrics are removed because they were not being used.

Also add a test to ensure that a cache with no controller doesn't wait.

Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
5d25fcf to 559ee6f
/test
The ICMP test keeps failing; rebasing to see if recent datapath changes fix that.
The 1.23-kernel-net-next failure seems unrelated to this branch.
/test
@tommyp1ckles FYI, you can re-trigger individual tests using the trigger phrases mentioned in brackets next to their name. For example, to re-trigger just k8s-1.23-kernel-net-next you can use:
/test-1.23-net-next
To prevent K8s resource types that take a long time to sync from prematurely crashing agent Pods on init, this adds per-resource timeouts. A resource's sync times out only if the watcher has gone a full timeout period since the last informer event of that type was received.
That is, each watcher records the time of the last event per API resource type. When the K8s-Synced-Timeout is reached, the sync times out only if no event of that type was received during the timeout period.
In addition, to make such issues easier to diagnose, this adds more labels to the EventsTS metrics and refactors all uses of this metric into a single GaugeVec.
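To illustrate the idea of one labelled gauge replacing several separate ones, here is a stdlib-only stand-in. It is not the Prometheus client's GaugeVec and not the code in this PR: the `gaugeVec` type, the `labels` struct, and the label values are hypothetical, with only the scope and action label names taken from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// labels identifies one series. The scope and action names follow the commit
// message; the values used below are made up for the example.
type labels struct {
	scope  string
	action string
}

// gaugeVec is a hypothetical, stdlib-only stand-in for a labelled gauge
// vector: one metric holding one value per distinct label combination.
type gaugeVec struct {
	mu     sync.Mutex
	values map[labels]float64
}

func newGaugeVec() *gaugeVec {
	return &gaugeVec{values: make(map[labels]float64)}
}

// set stores v for the series identified by l, replacing any earlier value.
func (g *gaugeVec) set(l labels, v float64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.values[l] = v
}

// get returns the value for l and whether that series exists.
func (g *gaugeVec) get(l labels) (float64, bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	v, ok := g.values[l]
	return v, ok
}

func main() {
	eventsTS := newGaugeVec()

	// Record the (example) Unix timestamp of the last event per scope/action.
	eventsTS.set(labels{scope: "Pod", action: "update"}, 1650000000)
	eventsTS.set(labels{scope: "Service", action: "create"}, 1650000042)

	v, ok := eventsTS.get(labels{scope: "Pod", action: "update"})
	fmt.Println(v, ok)

	_, ok = eventsTS.get(labels{scope: "Node", action: "delete"})
	fmt.Println(ok) // false: no event of that scope/action was recorded
}
```

Keying a single metric by label values means a stalled resource type shows up as one series whose timestamp stops advancing, rather than requiring a separate gauge per event source.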
Fixes: #18776