hubble: Add --hubble-monitor-events flag
#24828
Force-pushed from cce7da8 to d85ad93.
Commit 3d92136363dac97bac8e4a67607185225ccf699f does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin
Force-pushed from 3d92136 to d85ad93.
Force-pushed from d85ad93 to 2497df3.
Force-pushed from 2497df3 to 9d39387.
Commit e4632624f7a096c7fbd1792e5bc7c5a81f8bbee4 does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin
Force-pushed from dcfbb3c to 236c8e8.
Force-pushed from 236c8e8 to c5a41c8.
Just a minor usability/UX suggestion below.
LGTM 🎉
It looks like the two test failures are pointing at the same thing: the cmdref docs need to be updated. See https://github.com/cilium/cilium/actions/runs/4757401849/jobs/8454194805?pr=24828#step:8:150
Force-pushed from 746a246 to 38484ac.
/test
Thanks!
Signed-off-by: Michi Mutsuzaki <michi@isovalent.com>
Signed-off-by: Aditya Sharma <aditya.sharma@shopify.com>
Co-authored-by: Michi Mutsuzaki <michi@isovalent.com>
Co-authored-by: Aditya Sharma <aditya.sharma@shopify.com>
Force-pushed from 38484ac to f764b10.
LGTM for api changes.
/test
test-runtime: hit #22373
/test-runtime
ci-multicluster: filed #25064
/ci-multicluster
I believe you misinterpreted the meaning of the return value of the OnMonitorEvent handler. Please see my inline comments on this PR.
	return &monitorFilter, nil
}

func (m *monitorFilter) OnMonitorEvent(ctx context.Context, event *observerTypes.MonitorEvent) (bool, error) {
The returned bool is used as a stopPropagation indicator: https://github.com/cilium/cilium/blob/main/pkg/hubble/observer/local_observer.go#L119
So if this function returns true, the event won't be propagated to Hubble observers. If it returns false, the event will be propagated to Hubble observers.
Either the return value of this function needs to be inverted, or the configuration flag should be changed to clearly indicate that this setting excludes the given event types from being observed by Hubble.
Currently, when I use --hubble-monitor-events="drop debug capture trace policy-verdict recorder l7 agent", I expect to see everything, but nothing actually reaches the Hubble subsystem.
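To make the expected semantics concrete, here is a minimal, self-contained sketch. It is not the Cilium code: `event`, `handler`, `allowList`, and `dispatch` are hypothetical stand-ins, and the only assumption taken from the discussion is that the returned bool means "stop propagation". Under that assumption, an allow-list filter has to return true for the event types that are not listed:

```go
package main

import "fmt"

// event is a hypothetical stand-in for a monitor event; only its type name matters here.
type event struct {
	typeName string
}

// handler mirrors the shape discussed above: the returned bool means
// "stop propagation", not "allowed".
type handler func(e event) (stop bool, err error)

// allowList returns a handler that stops every event whose type is NOT listed.
// An empty list keeps the default of observing everything.
func allowList(types ...string) handler {
	allowed := make(map[string]struct{}, len(types))
	for _, t := range types {
		allowed[t] = struct{}{}
	}
	return func(e event) (bool, error) {
		if len(allowed) == 0 {
			return false, nil
		}
		_, ok := allowed[e.typeName]
		return !ok, nil // stop when the type is not in the allow list
	}
}

// dispatch runs the handlers in order and reports whether the event
// reached the end of the chain, i.e. whether it would be observed.
func dispatch(e event, handlers ...handler) bool {
	for _, h := range handlers {
		stop, err := h(e)
		if err != nil {
			fmt.Println("handler error:", err)
		}
		if stop {
			return false
		}
	}
	return true
}

func main() {
	filter := allowList("drop", "policy-verdict")
	for _, name := range []string{"drop", "trace", "policy-verdict"} {
		fmt.Printf("%s observed: %v\n", name, dispatch(event{name}, filter))
	}
}
```

With this shape, listing every event type in the allow list lets everything through, which matches what the flag description promises.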
flags.StringSlice(option.HubbleMonitorEvents, []string{},
	fmt.Sprintf(
		"Cilium monitor events for Hubble to observe: [%s]. By default, Hubble observes all monitor events.",
		strings.Join(monitorAPI.AllMessageTypeNames(), " "),
The current implementation "excludes" the listed event types; it does not "observe" them.
switch payload := event.Payload.(type) {
case *observerTypes.PerfEvent:
	if len(payload.Data) == 0 {
		return false, errors.ErrEmptyData
Related to my comment above: return false actually passes this event to the Hubble subsystem. If you don't want to pass empty events (which makes a lot of sense), this should return true.
type testEvent struct {
	event   *observerTypes.MonitorEvent
	allowed bool
To be consistent with https://github.com/cilium/cilium/blob/main/pkg/hubble/observer/local_observer.go#L119, please rename allowed to stop or stopPropagation.
assert.NoError(t, err)

for _, event := range tc.events {
	allowed, err := mf.OnMonitorEvent(context.Background(), event.event)
Rename allowed to stop or stopPropagation.
Can we backport this to 1.12 / 1.13? That would really help with managing Hubble CPU usage. Cheers.
In general, only bugfixes are backported, see the backporting policy here. This is considered a new feature. This feature will be made available in an upcoming v1.14 monthly snapshot release which you're welcome to try out and provide feedback. Furthermore, as we can see from recent posts on this PR, there's still active development on it, so it would be premature to release at this stage without further testing and feedback.
@joestringer that makes sense, thank you for the details.
Related #19929
Adds a --hubble-monitor-events flag that lets a user control which event types reach the Hubble subsystem.
Why
Nodes with traffic-heavy workloads are doing 30k+ flows per second. This causes very high CPU utilization for the cilium-agent container. We'd like to be able to monitor only drop and other events of interest and ignore the normal flows.
Flame graph showing that most of the CPU time is spent in the hubble subsystem decoding the events:
cc @michi-covalent