agent: add several new flags to control Cilium's datapath events notifications #30063
One comment on the semantics, otherwise LGTM.
Others may have more thoughts, as I'm not very familiar with the runtime behavior of the Envoy config.
04d8c57 to 6a0156f
Looks great. Thanks for the updates.
LGTM
You might need to run |
6a0156f to 673ad6d
indeed, thanks! 🙈
Just realizing that I missed out on @EricMountain's draft PR #30031. @joestringer, are you happy with the way it looks, or would you prefer that we pursue a more generic command-line option?
Thanks for the PR. My gut feeling is it'd be simpler to have a single notify flag that allows defining which notifications to enable (with a default that enables all notification types). I see that @kaworu has expressed interest in this PR and I think he could have some useful input here as well.
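To make the single-flag alternative concrete, here is a minimal Go sketch of how one list-valued option could select event types, with an empty value enabling all of them. The event type names and the `parseEventTypes` helper are hypothetical and only illustrate the idea; they are not taken from Cilium's code base.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// allEventTypes lists the three event types discussed in this PR.
// The spelling of each name is an assumption made for this sketch.
var allEventTypes = []string{"drop", "policy-verdict", "trace"}

// parseEventTypes turns a comma-separated list into a set of enabled
// event types. An empty value enables every type, mirroring the
// "default that enables all notification types" idea.
func parseEventTypes(value string) (map[string]bool, error) {
	enabled := make(map[string]bool, len(allEventTypes))
	if strings.TrimSpace(value) == "" {
		for _, t := range allEventTypes {
			enabled[t] = true
		}
		return enabled, nil
	}
	known := make(map[string]bool, len(allEventTypes))
	for _, t := range allEventTypes {
		known[t] = true
	}
	for _, raw := range strings.Split(value, ",") {
		t := strings.TrimSpace(raw)
		if !known[t] {
			return nil, fmt.Errorf("unknown event type %q", t)
		}
		enabled[t] = true
	}
	return enabled, nil
}

func main() {
	enabled, err := parseEventTypes("drop,policy-verdict")
	if err != nil {
		panic(err)
	}
	names := make([]string, 0, len(enabled))
	for t := range enabled {
		names = append(names, t)
	}
	sort.Strings(names)
	fmt.Println(names) // prints [drop policy-verdict]
}
```

The trade-off against per-type boolean flags is that a list keeps the CLI surface small, while separate booleans map more directly onto per-type configuration sections.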
Thanks for the PR @mvisonneau!
For context, a separate PR will introduce rate-limiting for events, and one of the potential follow-ups is per-event-type rate-limiting.
This PR is currently missing the configMap / Helm values corresponding to the new flags, and I'd like to find a way to cleanly expose both the enable / disable toggles (introduced by this PR) and the rate-limiting configuration. How about something like this:
# -- control events generated by the Cilium datapath exposed to cilium monitor and hubble.
events:
  drop:
    enabled: true
  policyVerdict:
    enabled: true
  trace:
    enabled: true
That way we have a per event-type section. The global rate-limiting PR can introduce values under the events section, and the per event-type rate-limiting follow-up under the drop / policyVerdict / trace sections. What do you think @joestringer?
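As a rough illustration of how the proposed per event-type values could line up with the three agent flags from this PR, the Go sketch below models the values as structs and renders them as string key/value pairs. The struct names and the `configMapEntries` helper are hypothetical, and the assumption that the keys reuse the flag names without the leading dashes is mine, not something confirmed in this thread.

```go
package main

import (
	"fmt"
	"strconv"
)

// eventToggle mirrors one per-event-type section of the proposed values.
type eventToggle struct {
	Enabled bool
}

// eventsValues mirrors the proposed events section (drop, policyVerdict, trace).
type eventsValues struct {
	Drop          eventToggle
	PolicyVerdict eventToggle
	Trace         eventToggle
}

// configMapEntries renders the toggles as string key/value pairs.
// Assumption: keys reuse the agent flag names without the leading dashes.
func configMapEntries(v eventsValues) map[string]string {
	return map[string]string{
		"bpf-events-drop-enabled":           strconv.FormatBool(v.Drop.Enabled),
		"bpf-events-policy-verdict-enabled": strconv.FormatBool(v.PolicyVerdict.Enabled),
		"bpf-events-trace-enabled":          strconv.FormatBool(v.Trace.Enabled),
	}
}

func main() {
	values := eventsValues{
		Drop:          eventToggle{Enabled: true},
		PolicyVerdict: eventToggle{Enabled: true},
		Trace:         eventToggle{Enabled: false},
	}
	entries := configMapEntries(values)
	fmt.Println(entries["bpf-events-trace-enabled"]) // prints false
}
```

Keeping one struct per event type leaves room for later fields, such as the per-type rate-limiting settings mentioned as a follow-up.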
@kaworu I like that proposal. This should enable us to subsequently add more specific configurations for each event type. @mvisonneau What do you think about the Helm configuration proposed by @kaworu above? One more thing, this PR describes the change as modifying the behaviour for endpoint event notifications, but I think that's incorrect. I believe this impacts all datapath event notifications (via |
673ad6d to a40e6ac
The PR already picked up conflicts that need to be resolved :/
1d76cd2 to cd98195
argh indeed, I just rebased, hope CI will still pass! 🤞
/test
This needs another rebase, sorry.
cd98195 to 430ccca
apologies for the delay, it should now be rebased! :)
/test
should I assume that the failing tests are flakes or is there something else required on my end? 🤔
@mvisonneau ipsec upgrade can be flaky, and seems unlikely to be related to these changes. For the Travis failure, it looks like a similar flake issue/fix was closed recently. Perhaps try rebasing to ensure your branch has those changes?
430ccca to b84aa6b
thanks for your guidance @tommyp1ckles, I've once again rebased! 🤞
/test
Heads up that the main blocker on this PR is getting the required jobs to pass. The L4LB test has been retriggered five times over the past couple of weeks and seems to be consistently failing in this way:
It's not clear to me why this PR would be triggering such a failure, but it's also hard for us to ignore a consistent signal from CI that the tests are failing and then merge the PR. It may be worth looking around in the issues to see if there are similar failures reported by others, and worst case rebase again to see if it was a problem already fixed by another PR in the main branch. And yes, I recognize that asking to rebase again is a bit tedious. Unfortunately, it looks like this PR may have been adversely impacted by the elevated background failure rate in the tree since late January / early February (via the CI dashboard). We're gradually improving some of the individual workflows that have been failing more often, but a couple of the workflows still seem to have elevated failure rates:
👋 hey @joestringer, thanks for the feedback! I've just stumbled upon #31167, going to attempt rebasing as well as I am also unsure where to start looking otherwise 🙈 |
…ions

This commit introduces three new configuration flags for the Cilium agent, allowing users to choose the bpf event types they want to expose to Cilium monitor and Hubble.

- `--bpf-events-drop-enabled` Expose 'drop' events for Cilium monitor and/or Hubble (default true)
- `--bpf-events-policy-verdict-enabled` Expose 'policy verdict' events for Cilium monitor and/or Hubble (default true)
- `--bpf-events-trace-enabled` Expose 'trace' events for Cilium monitor and/or Hubble (default true)

The default values for these flags remain set to `true`, not changing the current behaviour.

In our case, we found particularly useful to disable the TraceNotification in order to reduce the CPU overhead on some of our nodes when Hubble is enabled as we were mostly interested into dropped packets.

Signed-off-by: Maxime Visonneau <maxime.visonneau@gmail.com>
b84aa6b to b8830d8
/test
seems like it did the trick! 🙌
@mvisonneau thanks for the contribution!
This commit introduces three new configuration flags for the Cilium agent, allowing users to choose the bpf event types they want to expose to Cilium monitor and Hubble.

- `--bpf-events-drop-enabled` Expose 'drop' events for Cilium monitor and/or Hubble (default true)
- `--bpf-events-policy-verdict-enabled` Expose 'policy verdict' events for Cilium monitor and/or Hubble (default true)
- `--bpf-events-trace-enabled` Expose 'trace' events for Cilium monitor and/or Hubble (default true)

The default values for these flags remain set to `true`, not changing the current behaviour.

In our case, we found it particularly useful to disable the TraceNotification in order to reduce the CPU overhead on some of our nodes when Hubble is enabled, as we were mostly interested in dropped packets.
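For readers who want to see the shape of three default-true boolean flags in code, the following is a minimal sketch using Go's standard `flag` package as a stand-in. It is not the agent's actual option handling; the `eventConfig` type and `parseEventFlags` helper are hypothetical, and only the flag names and defaults come from the description above.

```go
package main

import (
	"flag"
	"fmt"
)

// eventConfig holds one toggle per event type (hypothetical type for this sketch).
type eventConfig struct {
	Drop          bool
	PolicyVerdict bool
	Trace         bool
}

// parseEventFlags registers the three flags with a default of true and
// parses the given arguments into an eventConfig.
func parseEventFlags(args []string) (eventConfig, error) {
	var cfg eventConfig
	fs := flag.NewFlagSet("agent", flag.ContinueOnError)
	fs.BoolVar(&cfg.Drop, "bpf-events-drop-enabled", true,
		"Expose 'drop' events for Cilium monitor and/or Hubble")
	fs.BoolVar(&cfg.PolicyVerdict, "bpf-events-policy-verdict-enabled", true,
		"Expose 'policy verdict' events for Cilium monitor and/or Hubble")
	fs.BoolVar(&cfg.Trace, "bpf-events-trace-enabled", true,
		"Expose 'trace' events for Cilium monitor and/or Hubble")
	err := fs.Parse(args)
	return cfg, err
}

func main() {
	// Disable only trace events, as in the use case described above.
	cfg, err := parseEventFlags([]string{"--bpf-events-trace-enabled=false"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", cfg) // prints {Drop:true PolicyVerdict:true Trace:false}
}
```

Because every flag defaults to true, running with no arguments keeps all three event types enabled, which matches the stated goal of not changing the current behaviour.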