hubble: Add a metric for lost events #22865
This pull request has been automatically marked as stale because it |
LGTM
/test
Should there be a note in Documentation/operations/upgrade.rst about the new metric?
/test-1.26-net-next
Currently, the number of lost events is visible in logs and in the observer, but in that form it tends to be noisy and difficult to consume. A Prometheus counter is much better suited to this kind of information, so this patch adds the hubble_lost_events_total metric.

Fixes cilium#15112

Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
The number of lost messages is now exposed as a metric, which is more practical than a log, so the log level can be reduced. The log message is also rephrased to state only that a LostEvent message was received: quite often the queue is full again immediately afterwards, so saying that Hubble is "back to normal" can be misleading.

Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
Signed-off-by: Anna Kapuscinska <anna@isovalent.com>
Hmm, should there? The new metric has low cardinality, so it shouldn't require any extra capacity planning from users. I think a release note is enough, but let me know @qmonnet if I should add something extra.
/test
/ci-gke
/ci-multicluster
/ci-external-workloads
Currently, the number of lost events is visible in logs and in the observer, but in that form it tends to be noisy and difficult to consume. A Prometheus counter is much better suited to this kind of information, so this patch adds the hubble_lost_events_total metric.
The metric is always exposed when any of the "regular" Hubble metrics is enabled. It has low cardinality, and most users will want some of the flow metrics in addition to lost events, so this shouldn't be a problem.
I kept the log line with the lost events count, but to reduce log noise (it was reported to be noisy in the past) I lowered its level to Debug.
Fixes #15112