pkg/ipam: Update histogram buckets for trigger metrics #25600
The change overall looks fine to me, but I don't know much about metric best practices. Is there any overlap with the ongoing work in #25256?
I don't think it's related, since we're only updating the histogram buckets. #25256 doesn't seem to touch bucket values.
/test
Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed.
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/95/
If it is a flake and a GitHub issue doesn't already exist to track it, comment Then please upload the Jenkins artifacts to that issue.
@gandro net-next and test-runtime seem to be flaky now? I don't see the corresponding Jenkins jobs for the failures. Are we cleaning up Jenkins jobs in less than a day now?
We had a Jenkins outage last week and had to re-provision all Jenkins instances. During that process, we accidentally fell back to a Jenkins config that only retained the last 30 jobs. That has now been fixed to retain job logs for up to 15 days. If you rerun those pipelines, you should be able to access the failures now.
Currently, trigger-related histogram metrics in pkg/ipam use the default Prometheus histogram buckets. Resync operations in cloud providers like Azure tend to take a long time, and the current buckets are inadequate to track changes in behavior. This commit extends the buckets to allow for measuring longer durations.
Signed-off-by: Hemanth Malla <hemanth.malla@datadoghq.com>
12f34e3 to 3410629
/test
[CMPT-1682] Backport cilium#25600 to 1.11
Currently, trigger-related histogram metrics in pkg/ipam use the default Prometheus histogram buckets. Resync operations in cloud providers like Azure tend to take a long time, and the current buckets are inadequate to track changes in behavior. This commit extends the buckets to allow for measuring longer durations.
Currently, some metrics plateau at 10 seconds.
This reuses the buckets that the Kubernetes API server defines for measuring request duration.