Add go runtime scheduler latency metrics #24745
Thanks for the PR!
It looks good to me. There are a couple of things that need to be addressed:
- The checkpatch job in CI is failing on a "bad sign-off". I think this can be fixed by amending your commit with `git commit --amend --signoff`.
- We should add a note in the upgrade guide for v1.14 that a new metric was added. You can add this inside `Documentation/operations/upgrade.rst` under the "Added Metrics" section.
FYI @derailed, in case you didn't notice: the sign-off is from your personal email.
LGTM from a (limited) docs perspective.
/test
Job 'Cilium-PR-K8s-1.25-kernel-4.19' failed.
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.25-kernel-4.19/1669/
If it is a flake and a GitHub issue doesn't already exist to track it, comment
/test-1.25-4.19
@derailed Some possible flakes on the tests, rerunning failed suites.
/test
Job 'Cilium-PR-K8s-1.16-kernel-4.19' hit: #24697 (88.40% similarity)
The Agent/Controller largely depends on various goroutines being scheduled on time to perform critical control plane tasks (e.g. controllers).
We want to be able to detect and alert on goroutine scheduling latency, specifically when CPU contention is so great that things may not be running on time.
- Added the Go runtime scheduler latency metric to the default Go metrics collector
Signed-off-by: derailed <fernand.galiana@isovalent.com>
/test
Tests look good 🙏
The Agent/Controller largely depends on various goroutines being scheduled on time to perform critical control plane tasks (e.g. controllers).
We want to be able to detect and alert on goroutine scheduling latency, specifically when CPU contention is so great that things may not be running on time.
Signed-off-by: Fernand Galiana <fernand.galiana@isovalent.com>
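For readers unfamiliar with the underlying signal, here is a minimal sketch of how scheduler latency can be read directly from the Go runtime. It is not the code in this PR: it assumes the standard library `runtime/metrics` package (Go 1.17 or later) and its `/sched/latencies:seconds` histogram, and the helper names `readSchedLatency` and `totalCount` are hypothetical.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// schedLatencyMetric is the runtime/metrics name for the distribution of time
// goroutines spend runnable before they actually start running.
const schedLatencyMetric = "/sched/latencies:seconds"

// readSchedLatency (hypothetical helper) samples the scheduler latency
// histogram from the Go runtime.
func readSchedLatency() (*metrics.Float64Histogram, error) {
	samples := []metrics.Sample{{Name: schedLatencyMetric}}
	metrics.Read(samples)
	// An unsupported metric name yields KindBad rather than a panic.
	if samples[0].Value.Kind() != metrics.KindFloat64Histogram {
		return nil, fmt.Errorf("metric %q unavailable or not a histogram", schedLatencyMetric)
	}
	return samples[0].Value.Float64Histogram(), nil
}

// totalCount (hypothetical helper) sums the observations across all buckets.
func totalCount(h *metrics.Float64Histogram) uint64 {
	var n uint64
	for _, c := range h.Counts {
		n += c
	}
	return n
}

func main() {
	h, err := readSchedLatency()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("observed %d scheduling events across %d buckets\n", totalCount(h), len(h.Counts))
}
```

Under CPU contention, more observations should land in the higher-latency buckets, which is the condition the description above wants to detect and alert on.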