
hubble: Add a metric for lost events #22865

Merged
merged 3 commits into cilium:master on Mar 30, 2023

@lambdanis (Contributor) commented Dec 23, 2022

Currently, the number of lost events is visible in logs and in the observer, but in that form it tends to be noisy and difficult to consume. A Prometheus counter is much better suited to this kind of information, so this patch adds the hubble_lost_events_total metric.
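A minimal sketch of what a counter like this looks like, using only the Go standard library rather than a Prometheus client library: the value only ever increases, and it is rendered in the Prometheus text exposition format. The lostEventsCounter type, its methods, and the HELP string are hypothetical; only the metric name hubble_lost_events_total comes from this PR, and Cilium's actual implementation may differ.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// lostEventsCounter is a hypothetical, minimal stand-in for a Prometheus
// counter: it only ever goes up, and it is safe for concurrent use.
type lostEventsCounter struct {
	value atomic.Uint64
}

// Add records n lost events. Counters never decrease, so n is unsigned.
func (c *lostEventsCounter) Add(n uint64) {
	c.value.Add(n)
}

// Expose renders the counter in the Prometheus text exposition format.
// The "_total" suffix follows the Prometheus naming convention for counters.
// The HELP text is a placeholder, not necessarily Cilium's wording.
func (c *lostEventsCounter) Expose() string {
	return fmt.Sprintf(
		"# HELP hubble_lost_events_total Number of lost events\n"+
			"# TYPE hubble_lost_events_total counter\n"+
			"hubble_lost_events_total %d\n",
		c.value.Load(),
	)
}

func main() {
	var c lostEventsCounter
	c.Add(3) // e.g. one notification reporting 3 lost events
	c.Add(5) // a later notification reporting 5 more
	fmt.Print(c.Expose())
}
```

Unlike a log line per loss, a scraper only needs the current cumulative value; rates and alerts can then be derived on the Prometheus side.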

The metric is always exposed when any of the "regular" Hubble metrics is enabled. It has low cardinality, and most users will want some of the flow metrics in addition to lost events, so this shouldn't be a problem.
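The exposure rule can be sketched as a small registration helper. metricsToRegister is hypothetical, not Cilium code, and the identifiers passed to it ("flow", "drop") are only examples; the point is that the lost-events metric is appended whenever at least one regular metric is enabled, and nothing is registered otherwise.

```go
package main

import "fmt"

// metricsToRegister is a hypothetical helper: given the list of enabled
// "regular" Hubble metrics, it returns everything that should be registered.
// The lost-events metric is added automatically whenever at least one
// regular metric is enabled; users do not opt into it separately.
func metricsToRegister(enabled []string) []string {
	if len(enabled) == 0 {
		return nil // Hubble metrics disabled: nothing is registered
	}
	out := make([]string, 0, len(enabled)+1)
	out = append(out, enabled...)
	// Always-on metric, independent of which regular metrics were chosen.
	out = append(out, "hubble_lost_events_total")
	return out
}

func main() {
	fmt.Println(metricsToRegister(nil))
	fmt.Println(metricsToRegister([]string{"flow", "drop"}))
}
```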

I kept the log line with the lost events count, but to reduce log noise (it was reported as noisy in the past) I lowered its level to Debug.

Fixes #15112

Add hubble_lost_events_total metric for the number of events lost by Hubble.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Dec 23, 2022
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label Dec 23, 2022
@rolinh rolinh added the release-note/minor This PR changes functionality that users may find relevant to operating Cilium. label Jan 10, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jan 10, 2023
@aanm aanm removed the kind/community-contribution This was a contribution made by a community member. label Jan 27, 2023
@github-actions

This pull request has been automatically marked as stale because it
has not had recent activity. It will be closed if no further activity
occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Feb 27, 2023
@github-actions github-actions bot removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Mar 2, 2023
@lambdanis lambdanis force-pushed the pr/lost-events-metric branch 3 times, most recently from 28543ba to 078b60a on March 22, 2023 15:23
@lambdanis lambdanis marked this pull request as ready for review March 22, 2023 15:34
@lambdanis lambdanis requested review from a team as code owners March 22, 2023 15:34
@rolinh rolinh added area/metrics Impacts statistics / metrics gathering, eg via Prometheus. sig/hubble Impacts hubble server or relay labels Mar 22, 2023
@rolinh (Member) left a comment

LGTM

@michi-covalent (Contributor)

/test

@qmonnet (Member) commented Mar 24, 2023

Should there be a note in Documentation/operations/upgrade.rst about the new metric?

@lambdanis (Contributor, Author)

/test-1.26-net-next

Currently, the number of lost events is visible in logs and in the observer, but
in that form it tends to be noisy and difficult to consume. A Prometheus counter
is much better suited to this kind of information, so this patch adds the
hubble_lost_events_total metric.

Fixes cilium#15112

Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
The number of lost messages is now exposed as a metric, which is more practical
than a log, so the log level can be reduced. Also rephrased the log message to
indicate only that a LostEvent message was received: quite often the queue will
be full immediately after that, so saying that Hubble is "back to normal" can be
misleading.

Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
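The queue behaviour behind the rephrased log message can be modelled with a simplified, hypothetical bounded queue: events that arrive while it is full are dropped and counted, and once a slot frees up, a single LostEvent carrying the drop count is enqueued first. The lossyQueue and event types are assumptions for illustration, not Hubble's observer code, and the sketch is single-producer only. It shows why receiving a LostEvent does not mean the queue is healthy again: the very next event can be dropped.

```go
package main

import "fmt"

// event is a hypothetical queue item: either a regular event or a LostEvent
// summary carrying how many events were dropped before it.
type event struct {
	lost    bool
	numLost uint64
	payload string
}

// lossyQueue is a simplified model of a bounded events queue (single producer).
type lossyQueue struct {
	ch        chan event
	pending   uint64 // drops not yet reported by a LostEvent
	lostTotal uint64 // cumulative drops, like a hubble_lost_events_total counter
}

func newLossyQueue(capacity int) *lossyQueue {
	return &lossyQueue{ch: make(chan event, capacity)}
}

// drop counts ev as lost.
func (q *lossyQueue) drop() {
	q.pending++
	q.lostTotal++
}

// push enqueues ev without ever blocking the producer.
func (q *lossyQueue) push(ev event) {
	if q.pending > 0 {
		select {
		case q.ch <- event{lost: true, numLost: q.pending}:
			// A LostEvent went out, but the queue may already be full again,
			// so this alone does not mean things are "back to normal".
			q.pending = 0
		default:
			q.drop() // still full: ev is lost too
			return
		}
	}
	select {
	case q.ch <- ev:
	default:
		q.drop()
	}
}

func main() {
	q := newLossyQueue(2)
	for _, p := range []string{"a", "b", "c", "d"} {
		q.push(event{payload: p}) // c and d are dropped
	}
	<-q.ch                     // a consumer frees one slot
	q.push(event{payload: "e"}) // LostEvent fits, then e is dropped
	fmt.Println("lost total:", q.lostTotal, "pending:", q.pending)
}
```

With a capacity of 2, the freed slot is taken by the LostEvent, so event e is dropped immediately afterwards and a new loss is already pending.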
@lambdanis (Contributor, Author) commented Mar 27, 2023

> Should there be a note in Documentation/operations/upgrade.rst about the new metric?

Hmm, should there? The new metric has low cardinality, so it shouldn't require any extra capacity planning from users. I think a release note is enough, but @qmonnet, let me know if I should add something extra.
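One rough way to reason about the "low cardinality" claim: the number of time series a metric can produce per scrape target (for example, per agent) is at most the product of the number of distinct values of each of its labels. The sketch below computes that upper bound; seriesCount is hypothetical, and the label names and values are illustrative, not taken from the actual definition of hubble_lost_events_total.

```go
package main

import "fmt"

// seriesCount returns an upper bound on the number of time series a metric
// can produce per scrape target: the product of the number of distinct
// values of each label. A metric with no labels produces exactly one series.
func seriesCount(labelValues map[string][]string) int {
	n := 1
	for _, values := range labelValues {
		n *= len(values)
	}
	return n
}

func main() {
	// Hypothetical label sets, for illustration only.
	noLabels := map[string][]string{}
	bySource := map[string][]string{
		"source": {"source-a", "source-b", "source-c"},
	}
	fmt.Println(seriesCount(noLabels)) // 1
	fmt.Println(seriesCount(bySource)) // 3
}
```

A small, fixed label set keeps the bound constant per agent, whereas a label keyed by something like pod names would multiply it by the number of pods.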

@lambdanis (Contributor, Author)

/test

@lambdanis (Contributor, Author)

/ci-gke

@lambdanis (Contributor, Author)

/ci-multicluster

@lambdanis (Contributor, Author)

/ci-external-workloads

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 30, 2023
@michi-covalent michi-covalent merged commit e0de01c into cilium:master Mar 30, 2023
42 checks passed
Labels
area/metrics Impacts statistics / metrics gathering, eg via Prometheus. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. sig/hubble Impacts hubble server or relay
Development

Successfully merging this pull request may close these issues.

hubble: Introduce metric for lost events
7 participants