Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Hubble decoding performance for drop, debug, policy and tracesock events #25751

Merged
merged 1 commit into from Jul 4, 2023

Conversation

Jack-R-lantern
Copy link
Contributor

@Jack-R-lantern Jack-R-lantern commented May 30, 2023

Please ensure your pull request adheres to the following guidelines:

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • If your commit description contains a Fixes: <commit-id> tag, then
    please add the commit author[s] as reviewer[s] to this issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Are you a user of Cilium? Please add yourself to the Users doc
  • Thanks for contributing!

I implemented the function by referencing #24162.
reduced CPU usage.

DecodeDropNotify Benchmark Result

goos: linux
goarch: amd64
pkg: github.com/cilium/cilium/pkg/monitor
cpu: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
=== RUN   BenchmarkNewDecodeDropNotify
BenchmarkNewDecodeDropNotify
BenchmarkNewDecodeDropNotify-4   	45821463	        27.00 ns/op	       0 B/op	       0 allocs/op
=== RUN   BenchmarkOldDecodeDropNotify
BenchmarkOldDecodeDropNotify
BenchmarkOldDecodeDropNotify-4   	 3343834	       368.0 ns/op	     144 B/op	       3 allocs/op
PASS
ok  	github.com/cilium/cilium/pkg/monitor	2.925s

DecodePolicyVerdictNotify

goos: linux
goarch: amd64
pkg: github.com/cilium/cilium/pkg/monitor
cpu: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
=== RUN   BenchmarkNewDecodePolicyVerdictNotify
BenchmarkNewDecodePolicyVerdictNotify
BenchmarkNewDecodePolicyVerdictNotify-4         42731541                26.72 ns/op            0 B/op          0 allocs/op
=== RUN   BenchmarkOldDecodePolicyVerdictNotify
BenchmarkOldDecodePolicyVerdictNotify
BenchmarkOldDecodePolicyVerdictNotify-4          2977009               405.1 ns/op           112 B/op          3 allocs/op
PASS
ok      github.com/cilium/cilium/pkg/monitor    2.863s

DecodeDebugCapture/DecodeDebugMsg

goos: linux
goarch: amd64
pkg: github.com/cilium/cilium/pkg/monitor
cpu: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
=== RUN   BenchmarkNewDecodeDebugCapture
BenchmarkNewDecodeDebugCapture
BenchmarkNewDecodeDebugCapture-4        64555694                18.92 ns/op            0 B/op          0 allocs/op
=== RUN   BenchmarkOldDecodeDebugCapture
BenchmarkOldDecodeDebugCapture
BenchmarkOldDecodeDebugCapture-4         4329416               272.1 ns/op            96 B/op          3 allocs/op
=== RUN   BenchmarkNewDecodeDebugMsg
BenchmarkNewDecodeDebugMsg
BenchmarkNewDecodeDebugMsg-4            74855434                16.42 ns/op            0 B/op          0 allocs/op
=== RUN   BenchmarkOldDecodeDebugMsg
BenchmarkOldDecodeDebugMsg
BenchmarkOldDecodeDebugMsg-4             4504531               264.1 ns/op            96 B/op          3 allocs/op
PASS
ok      github.com/cilium/cilium/pkg/monitor    5.501s

DecodeTraceSockNotify

goos: linux
goarch: amd64
pkg: github.com/cilium/cilium/pkg/monitor
cpu: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
=== RUN   BenchmarkNewDecodeTraceSockNotify
BenchmarkNewDecodeTraceSockNotify
BenchmarkNewDecodeTraceSockNotify-4     92002668                12.99 ns/op            0 B/op          0 allocs/op
=== RUN   BenchmarkOldDecodeTraceSockNotify
BenchmarkOldDecodeTraceSockNotify
BenchmarkOldDecodeTraceSockNotify-4      2426386               439.2 ns/op           144 B/op          3 allocs/op
PASS
ok      github.com/cilium/cilium/pkg/monitor    2.862s

Fixes: #25657

@maintainer-s-little-helper
Copy link

Commit a67f17db1fe1b26b9a17a2bb29ce25da5d86f9f3 does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels May 30, 2023
@github-actions github-actions bot added the kind/community-contribution This was a contribution made by a community member. label May 30, 2023
@Jack-R-lantern Jack-R-lantern temporarily deployed to release-base-images May 30, 2023 15:59 — with GitHub Actions Inactive
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label May 30, 2023
@Jack-R-lantern Jack-R-lantern changed the base branch from master to main May 30, 2023 16:00
@Jack-R-lantern Jack-R-lantern marked this pull request as ready for review May 31, 2023 03:56
@Jack-R-lantern Jack-R-lantern requested review from a team as code owners May 31, 2023 03:56
@maintainer-s-little-helper
Copy link

Commit 769eb64476fd5ec1cbb1b464e247d1b357460ebe does not contain "Signed-off-by".

Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label May 31, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label May 31, 2023
@Jack-R-lantern Jack-R-lantern force-pushed the ISSUE-25657 branch 2 times, most recently from 55e3d12 to 19ce6d2 Compare June 3, 2023 04:06
@kaworu kaworu added kind/enhancement This would improve or streamline existing functionality. kind/performance There is a performance impact of this. release-note/misc This PR makes changes that have no direct user impact. sig/hubble Impacts hubble server or relay labels Jun 5, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jun 5, 2023
Copy link
Member

@kaworu kaworu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch @Jack-R-lantern! The commit split is great, as are the added benchmark functions 🙏

This Travis-CI failure seems related to the patch (TestDecodeDebugEvent). Also couple of requests:

  1. Please do the length check in the "private" functions, i.e. right before indexing data
  2. I think in addition the the benchmark we need to have some safeguard unit test around decoding, in case any of the message struct change. See the unit tests added by this PR if you need inspiration.

@Jack-R-lantern
Copy link
Contributor Author

Thanks for the feedback, I'll make changes based on your feedback and add a new commit

@ti-mo
Copy link
Contributor

ti-mo commented Jun 20, 2023

Nit: 11 commits is a little on the high side for these changes. Would you be able to squash them down to one or two logical changes? Otherwise, this will inflate the commit history for no benefit.

@ti-mo ti-mo added the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Jun 20, 2023
@Jack-R-lantern
Copy link
Contributor Author

@ti-mo I did a rebase and pushed it to a PR.

@ti-mo
Copy link
Contributor

ti-mo commented Jun 21, 2023

/test

@ti-mo
Copy link
Contributor

ti-mo commented Jun 21, 2023

@Jack-R-lantern Thanks! Running tests.

Copy link
Member

@kaworu kaworu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch @Jack-R-lantern, couple of nitpicking on the way but overall LGTM!

pkg/monitor/datapath_debug.go Outdated Show resolved Hide resolved
pkg/monitor/datapath_debug.go Outdated Show resolved Hide resolved
pkg/monitor/datapath_debug.go Outdated Show resolved Hide resolved
pkg/monitor/datapath_drop.go Outdated Show resolved Hide resolved
pkg/monitor/datapath_policy.go Outdated Show resolved Hide resolved
pkg/monitor/datapath_policy.go Show resolved Hide resolved
pkg/monitor/datapath_sock_trace.go Outdated Show resolved Hide resolved
@kaworu kaworu requested a review from gandro June 27, 2023 11:33
Copy link
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks good to me (besides the things Alex already pointed out)

pkg/monitor/datapath_policy.go Show resolved Hide resolved
@Jack-R-lantern Jack-R-lantern force-pushed the ISSUE-25657 branch 2 times, most recently from e9dc98d to edca6b5 Compare June 27, 2023 13:17
@ti-mo ti-mo added dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs and removed dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs labels Jun 27, 2023
Copy link
Member

@kaworu kaworu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update @Jack-R-lantern, the patch LGTM.

This commit is to improve decode performance.
Due to the performance degradation caused by the reflection of 'binary.Read',
added functions for each structure.

Signed-off-by: Jack-R-lantern <tjdfkr2421@gmail.com>
Co-authored-by: Alexandre Perrin <alex@kaworu.ch>
@ldelossa
Copy link
Contributor

/test

@ldelossa ldelossa removed the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Jun 29, 2023
@ldelossa
Copy link
Contributor

ldelossa commented Jun 29, 2023

Legit failure here in [Setup & Test (1, 4.19-20230420.212204, iptables, false, vxlan)] (https://github.com/cilium/cilium/actions/runs/5401014338/jobs/9810263072#logs)

Found "2023-06-28T12:27:57.917020273Z level=error msg=k8sError error=\"github.com/cilium/cilium/pkg/k8s/resource/resource.go:304: Failed to watch *v2.CiliumClusterwideNetworkPolicy: failed to list *v2.CiliumClusterwideNetworkPolicy: the server could not find the requested resource (get ciliumclusterwidenetworkpolicies.cilium.io)\" subsys=k8s" in logs 1 times

  ❌ Found 1 logs matching list of errors that must be investigated:
2023-06-28T12:27:57.917020273Z level=error msg=k8sError error="github.com/cilium/cilium/pkg/k8s/resource/resource.go:304: Failed to watch *v2.CiliumClusterwideNetworkPolicy: failed to list *v2.CiliumClusterwideNetworkPolicy: the server could not find the requested resource (get ciliumclusterwidenetworkpolicies.cilium.io)" subsys=k8s

📋 Test Report
❌ 1/51 tests failed (0/549 actions), 10 tests skipped, 1 scenarios skipped:
Test [check-log-errors]:
connectivity test failed: 1 tests failed
Error: Process completed with exit code 1.

@Jack-R-lantern
Copy link
Contributor Author

@ldelossa Is there any action I should take?

@gandro
Copy link
Member

gandro commented Jul 3, 2023

I don't think that failure is legitimate. This change here purely affects observability internals and does not impact the datapath, policy engine or K8s connectivity in any way.

I think we have discovered a new flake.

Edit: I have created #26591 to track it.

Edit 2: It is suspicious that the error occurred two times in a row, but I really don't see how it could be related. Restarted again. I've also checked the sysdump (you need to go to the latest attempt for that) and don't see anything suspicious in the affected agent logs either. I think we have a race in the agent where it tries to fetch CiliumNetworkPolicies before they are ready, but again, that is unrelated to the changes in this PR.

Edit 3: Third time's the charm. That CI pipeline is green now ✔️

@gandro gandro changed the title Improve Hubble decoding performance Improve Hubble decoding performance for drop, debug, policy and tracesock events Jul 3, 2023
Copy link
Member

@julianwiedmann julianwiedmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 4, 2023
@kaworu kaworu merged commit a8ecaa9 into cilium:main Jul 4, 2023
64 of 65 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/community-contribution This was a contribution made by a community member. kind/enhancement This would improve or streamline existing functionality. kind/performance There is a performance impact of this. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact. sig/hubble Impacts hubble server or relay
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve Hubble decoding performance
6 participants