bpf: Enable monitor aggregation for all events in bpf_network.c #31015

Merged

learnitall
Contributor

This commit adjusts the usage of send_trace_notify in bpf_network.c to enable monitor aggregation for all events emitted at this observation point in the datapath. This change helps improve resource usage by reducing the overall number of events that the datapath emits, while still enabling packet observability with Hubble.

The events in bpf_network.c enable observability into the IPSec processing of the datapath. Before this commit, several other efforts were made to increase the aggregation of IPSec-related events and reduce resource usage; see #29616 and #27168. Those efforts covered packets that were specifically marked as encrypted or decrypted by IPSec. They did not cover the events in bpf_network.c that are emitted when either (a) a plaintext packet has been received from the network, or (b) a packet has been decrypted and reinserted into the stack by XFRM. Both of these events are candidates for aggregation because similar to-stack events will be emitted later in the datapath anyway. Additionally, these events are mainly useful for root-cause analysis or debugging and are not necessarily helpful from an overall observability standpoint.

Add monitor aggregation for all events related to packets ingressing to the network-facing device.
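
To make the idea of aggregating trace events concrete, here is a minimal sketch of an emit-or-suppress decision. It is a simplified model under stated assumptions, not Cilium's code: the names `agg_level`, `MONITOR_ALWAYS`, `should_emit`, `trace_notify`, and `emitted_events` are hypothetical, and the real `send_trace_notify` takes more arguments and applies more rules than shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical aggregation levels; names and values are illustrative only
 * and are not taken from the Cilium source. */
enum agg_level {
	AGG_NONE = 0,   /* aggregation off: emit every trace event */
	AGG_ACTIVE = 1, /* aggregation on: emit only when monitoring is requested */
};

/* Hypothetical sentinel a caller passes to force an event to be emitted. */
#define MONITOR_ALWAYS UINT32_MAX

/* Stand-in for the event buffer: counts the events that were emitted. */
static unsigned int emitted_events;

/* Decide whether a trace event should be emitted.
 * With aggregation off, everything is emitted. With aggregation on,
 * an event is emitted only if the caller passed a non-zero monitor value. */
static bool should_emit(enum agg_level level, uint32_t monitor)
{
	if (level == AGG_NONE)
		return true;
	return monitor != 0;
}

/* Emit a trace event if allowed; returns true when an event was emitted. */
static bool trace_notify(enum agg_level level, uint32_t monitor)
{
	if (!should_emit(level, monitor))
		return false;
	emitted_events++;
	return true;
}
```

In this model, a call site that always passes `MONITOR_ALWAYS` is never aggregated, while a call site that passes `0` is suppressed whenever aggregation is on. Moving a call site from the first kind to the second is the sort of change that reduces the number of emitted events.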

@learnitall learnitall added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. release-note/misc This PR makes changes that have no direct user impact. needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch affects/v1.13 This issue affects v1.13 branch affects/v1.14 This issue affects v1.14 branch needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch affects/v1.15 This issue affects v1.15 branch needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels Feb 27, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.13.13 Feb 27, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.15.2 Feb 27, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from main in 1.14.8 Feb 27, 2024
@learnitall learnitall marked this pull request as ready for review February 27, 2024 18:49
@learnitall learnitall requested a review from a team as a code owner February 27, 2024 18:49
Contributor

@gentoo-root gentoo-root left a comment

LGTM.

> This change helps improve resource usage by reducing the overall number of events that the datapath emits

Do you have numbers? I.e., how much CPU usage was saved in some test, and how did the packet rate improve?

> IPSec

Nit: the correct spelling is IPsec. (Although it looks weird, and the spelling varies even in the related RFCs.)

@learnitall
Contributor Author

> Do you have numbers? I.e., how much CPU usage was saved in some test, and how did the packet rate improve?

I didn't pull numbers for this change, to save some time. I figured that would be OK since it's fairly similar to PRs that were made before.

> Nit: the correct spelling is IPsec. (Although it looks weird, and the spelling varies even in the related RFCs.)

Oh thank you for letting me know! I had no idea 😄.

@learnitall learnitall force-pushed the pr/learnitall/add-monitor-agg-network branch from 5fc9dd7 to 5bce5f2 Compare March 6, 2024 20:46
@learnitall
Contributor Author

/test

This commit adjusts the usage of send_trace_notify in bpf_network.c to
enable monitor aggregation for all events emitted at this observation
point in the datapath. This change helps improve resource usage by
reducing the overall number of events that the datapath emits, while
still enabling packet observability with Hubble.

The events in bpf_network.c enable observability into the IPSec
processing of the datapath. Before this commit, multiple other efforts
have been made to increase the aggregation of events related to IPSec to
reduce resource usage, see cilium#29616 and cilium#27168. These efforts were related
to packets that were specifically marked as encrypted or decrypted by
IPSec and did not include events in bpf_network.c that were emitted when
either: (a) a plaintext packet has been received from the network, or
(b) a packet was decrypted and reinserted into the stack by XFRM. Both
of these events are candidates for aggregation because similar to-stack
events will be emitted down the line in the datapath anyways.
Additionally, these events are mainly useful for root-cause
analysis or debugging and are not necessarily helpful from an overall
observability standpoint.

Signed-off-by: Ryan Drew <ryan.drew@isovalent.com>
@learnitall learnitall force-pushed the pr/learnitall/add-monitor-agg-network branch from 5bce5f2 to 83210bd Compare March 7, 2024 18:33
@learnitall
Contributor Author

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 7, 2024
@joestringer joestringer added this pull request to the merge queue Mar 8, 2024
Merged via the queue into cilium:main with commit 81f14bb Mar 8, 2024
62 checks passed
@jibi jibi mentioned this pull request Mar 11, 2024
5 tasks
@jibi jibi added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch labels Mar 11, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.14 to Needs backport from main in 1.14.8 Mar 13, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.13 to Needs backport from main in 1.13.13 Mar 13, 2024
@jrajahalme jrajahalme added this to Needs backport from main in 1.15.3 Mar 13, 2024
@jrajahalme jrajahalme removed this from Needs backport from main in 1.15.2 Mar 13, 2024
@thorn3r thorn3r removed this from Needs backport from main in 1.13.13 Mar 13, 2024
@thorn3r thorn3r added this to Needs backport from main in 1.13.13 Mar 13, 2024
@thorn3r thorn3r added this to Needs backport from main in 1.13.14 Mar 13, 2024
@thorn3r thorn3r removed this from Needs backport from main in 1.13.13 Mar 13, 2024
@jibi jibi added backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed needs-backport/1.13 This PR / issue needs backporting to the v1.13 branch needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Mar 13, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from main to Backport pending to v1.14 in 1.14.8 Mar 13, 2024
@jibi jibi added backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. and removed needs-backport/1.15 This PR / issue needs backporting to the v1.15 branch labels Mar 13, 2024
@thorn3r thorn3r added this to Backport pending to v1.14 in 1.14.9 Mar 13, 2024
@thorn3r thorn3r removed this from Backport pending to v1.14 in 1.14.8 Mar 13, 2024
@github-actions github-actions bot added backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. and removed backport-pending/1.13 The backport for Cilium 1.13.x for this PR is in progress. backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. backport-pending/1.15 The backport for Cilium 1.15.x for this PR is in progress. labels Mar 16, 2024
@jrajahalme jrajahalme moved this from Needs backport from main to Backport done to v1.15 in 1.15.3 Mar 26, 2024
@thorn3r thorn3r moved this from Needs backport from main to Backport done to v1.13 in 1.13.14 Mar 26, 2024
@jrajahalme jrajahalme moved this from Backport pending to v1.14 to Backport done to v1.14 in 1.14.9 Mar 26, 2024
Labels
affects/v1.13 This issue affects v1.13 branch affects/v1.14 This issue affects v1.14 branch affects/v1.15 This issue affects v1.15 branch backport-done/1.13 The backport for Cilium 1.13.x for this PR is done. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. backport-done/1.15 The backport for Cilium 1.15.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
Projects
No open projects
1.13.14
Backport done to v1.13
1.14.9
Backport done to v1.14
1.15.3
Backport done to v1.15

4 participants