Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hubble: Fix Race in Hubble Consumer #15967

Merged
merged 1 commit into from May 20, 2021

Conversation

nathanjsweet
Copy link
Member

@nathanjsweet nathanjsweet commented Apr 30, 2021

When the hubble event consumer channel is full there is
a possible race condition. The code was structured such that
the consumer could end up losing the count of lost events, because
it was updating the count without a lock. Additionally, it could
reach a condition wherein it would reset the count and send a
notification of recovery more than once.

Finally, there was a log message for dropped events, but it was
"debug" level event rather than the warning that it should be.

Signed-off-by: Nate Sweet nathanjsweet@pm.me

Fixed a minor race condition on drop counts when hubble starts drops flows/events, because of a full channel. This change also will log the fact that drops are happening once, rather than a log message for every drop, and will log an additional comment after drops are no longer happening with the number of events/flows that were dropped.

@nathanjsweet nathanjsweet added area/monitor Impacts monitoring, access logging, flow logging, visibility of datapath traffic. release-note/misc This PR makes changes that have no direct user impact. labels Apr 30, 2021
@nathanjsweet nathanjsweet requested review from gandro and a team April 30, 2021 20:42
@nathanjsweet
Copy link
Member Author

I was looking through the hubble code today to see who the BPF Perf map got transmuted into the bubble perf ring buffer, to see if there was a possibility of a direct read situation (seems like it's not possible). In the course of doing that though I ran into this, which I think is a legitimate bug.

@nathanjsweet nathanjsweet force-pushed the pr/nathanjsweet/fix-race-in-hubble-consumer branch from 5d32427 to 806fc4f Compare April 30, 2021 21:02
Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Thanks for fixing it!

When the hubble event consumer channel is full there is
a possible race condition. The code was structured such that
the consumer could end up losing the count of lost events, because
it was updating the count without a lock. Additionally, it could
reach a condition wherein it would reset the count and send a
notification of recovery more than once.

Finally, there was a log message for dropped events, but it was
"debug" level event rather than the warning that it should be.

Signed-off-by: Nate Sweet <nathanjsweet@pm.me>
@nathanjsweet nathanjsweet force-pushed the pr/nathanjsweet/fix-race-in-hubble-consumer branch from 806fc4f to c11c010 Compare May 3, 2021 15:46
@gandro gandro removed their assignment May 4, 2021
@nathanjsweet nathanjsweet added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 12, 2021
@nathanjsweet nathanjsweet requested a review from glibsm May 12, 2021 16:41
@tklauser
Copy link
Member

test-me-please

@joestringer
Copy link
Member

Please triage the 5 failing checks and if they're all truly unrelated, briefly describe why we should be able to ignore them.

@joestringer joestringer removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 17, 2021
@nathanjsweet
Copy link
Member Author

All test failures were either network or DNS failures, and in one weird case a success that was supposed to be a failure. The likelihood that any of these failures was caused by this PR is as close to nil as I can imagine.

@nathanjsweet nathanjsweet added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 19, 2021
@twpayne twpayne merged commit 9824484 into master May 20, 2021
@twpayne twpayne deleted the pr/nathanjsweet/fix-race-in-hubble-consumer branch May 20, 2021 09:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/monitor Impacts monitoring, access logging, flow logging, visibility of datapath traffic. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants