Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hubble: improve hubble lost event log rate limit #24720

Merged

Conversation

kaworu
Copy link
Member

@kaworu kaworu commented Apr 4, 2023

Once we start logging the "hubble events queue is full" message, it means the buffered channel is full and the consumer can't keep up. From there either:

  • the consumer catches up and we stop logging the message
  • the consumer stay behind and we (rate-limited) log the message every time an event is read

The latter case is not ideal because we think the consumer has caught up every time it reads a message while it stays behind. Since we now have metrics for Hubble lost events, a single message every 30s should be a useful enough signal from the logs.

@kaworu kaworu added kind/enhancement This would improve or streamline existing functionality. release-note/misc This PR makes changes that have no direct user impact. sig/hubble Impacts hubble server or relay labels Apr 4, 2023
@kaworu kaworu requested a review from a team as a code owner April 4, 2023 06:40
@kaworu kaworu requested a review from chancez April 4, 2023 06:40
Copy link
Contributor

@lambdanis lambdanis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙇‍♀️

Once we start logging the "hubble events queue is full" message, it
means the buffered channel is full and the consumer can't keep up. From
there either:

* the consumer catches up and we stop logging the message
* the consumer stay behind and we (rate-limited) log the message every
  time an event is read

The latter case is not ideal because we think the consumer has caught up
every time it reads a message while it stays behind. Since we now have
metrics for Hubble lost events, a single message every 30s should be a
useful enough signal from the logs.

Signed-off-by: Alexandre Perrin <alex@isovalent.com>
Suggested-by: Paul Chaignon <paul@cilium.io>
@kaworu kaworu force-pushed the pr/kaworu/hubble/rate-limit-lost-events branch from c25ab08 to 8364e95 Compare April 4, 2023 09:48
@pchaigno
Copy link
Member

pchaigno commented Apr 9, 2023

I don't think we need to run the full end-to-end tests for such minor changes. Marking ready to merge.

@pchaigno pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Apr 9, 2023
@dylandreimerink dylandreimerink merged commit 324e62c into cilium:master Apr 10, 2023
43 checks passed
@kaworu kaworu deleted the pr/kaworu/hubble/rate-limit-lost-events branch April 14, 2023 07:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement This would improve or streamline existing functionality. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact. sig/hubble Impacts hubble server or relay
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants