init.sh: move socketlb creation into own pkg #23557
The logic looks much simpler and more reliable now 👍 (I didn't look super close at the details)
Thanks for the revisions and detailed comments: the code logic is easier to follow now, and adopting the bpf_link approach only on fresh kernels looks sound.
I was looking at the ebpf library code, and I found a couple of instances of panic. They may or may not be directly in the link or PROG_ATTACH logic, but this could change. Can we recover from panic so that we don't end up crashing the agent?
While the code migration to Go code provides visibility and better error reporting, I've had to resort to cleaning things up (re: bpf_clear_cgroups) out of band a few times in the past. Any thoughts on providing back-up code that could be easily run as an independent binary? Could it be as simple as removing the pinned link for the new approach? This isn't necessarily a blocking comment for this PR, but it would be worthwhile to be considered as a follow-up.
I know you're just being cautious, but this feels nitpicky. panic is used very sparingly outside of tests and The only code path that could potentially panic in a real-life scenario is the one in
I think that's just part of development, but that doesn't mean we can't maintain tooling to help. With the various ways of attaching programs, this may come in handy in the future.
Yep! Simply deleting the bpffs pin does the trick, as long as nothing else holds an open fd to the link.
Yes, let's not increase the scope of this PR and delay it further. Would this fit in cilium cleanup?
Agreed, users of the lib are never meant to need to recover.
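For reference, the kind of guard asked about above could look like the following. This is only a minimal sketch of the general Go recover pattern, not code from this PR; `safely` and the `fn` callback are hypothetical names standing in for any attach or detach step.

```go
package main

import "fmt"

// safely runs fn and converts a panic into an ordinary error, so that a panic
// raised inside a dependency does not crash the calling process.
// fn is a placeholder for an attach or detach step; it is not a function
// from this PR or from the ebpf library.
func safely(fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	return fn()
}
```

Note that recover only catches panics raised on the same goroutine, which is one reason wrapping library calls this way offers limited protection.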
Barring a couple of follow-ups, LGTM. Nice work!
While the code migration to Go code provides visibility and better error reporting, I've had to resort to cleaning things up (re: bpf_clear_cgroups) out of band a few times in the past. Any thoughts on providing back-up code that could be easily run as an independent binary?
I think that's just part of development, but that doesn't mean we can't maintain tooling to help. With the various ways of attaching programs, this may come in handy in the future.
Could it be as simple as removing the pinned link for the new approach?
Yep! Simply deleting the bpffs pin does the trick, as long as nothing else holds an open fd to the link.
It'll be good to document this.
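As a starting point for such documentation, the out-of-band cleanup described above amounts to removing the pin file. The sketch below assumes nothing about where the agent pins its links; `removePin` and its `pinPath` argument are hypothetical.

```go
package main

import (
	"errors"
	"io/fs"
	"os"
)

// removePin deletes a pinned object from bpffs.
// pinPath is a hypothetical location; the real path depends on how the
// agent lays out its bpffs directory.
// A pin that is already gone is treated as success, so the cleanup can be
// run more than once.
func removePin(pinPath string) error {
	err := os.Remove(pinPath)
	if err != nil && !errors.Is(err, fs.ErrNotExist) {
		return err
	}
	return nil
}
```

The kernel releases a link only once its last reference is gone, so the program stays attached for as long as any process still holds an open fd to the link.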
This isn't necessarily a blocking comment for this PR, but it would be worthwhile to be considered as a follow-up.
Yes, let's not increase the scope of this PR and delay it further. Would this fit in cilium cleanup? I think this is part of a wider topic, but please drop an issue with the behaviour you'd like to see so the foundations team can potentially pick this up.
Here it is - #24585.
Vendor changes lgtm
Had to add some extra code that removes defunct links.
test-runtime hit #24653, all other required tests are green.
Signed-off-by: Robin Gögge <r.goegge@isovalent.com>
bpf.LoadCollection() would fail on kernels not supporting getpeername since it tries loading all programs in the ELF.
Signed-off-by: Timo Beckers <timo@isovalent.com>
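One general way to avoid this class of failure is to drop unsupported programs from the parsed spec before loading it. The sketch below illustrates the idea with simplified stand-in types; `progSpec`, `collectionSpec`, and `dropUnsupported` are hypothetical and are not the types or the fix used in this commit.

```go
package main

import "sort"

// progSpec and collectionSpec are simplified stand-ins for a parsed ELF.
// They are not the ebpf library's types.
type progSpec struct{ section string }

type collectionSpec struct {
	programs map[string]*progSpec
}

// dropUnsupported removes every program whose name is rejected by supported,
// so that a later load only touches programs the running kernel can accept.
// It returns the removed names in sorted order for logging.
func dropUnsupported(spec *collectionSpec, supported func(name string) bool) []string {
	var dropped []string
	for name := range spec.programs {
		if !supported(name) {
			dropped = append(dropped, name)
			delete(spec.programs, name) // deleting while ranging is safe in Go
		}
	}
	sort.Strings(dropped)
	return dropped
}
```

The `supported` callback would typically be backed by a kernel feature probe; how that probe is implemented is left out here.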
With subsequent patches making use of bpffs directories other than tc/globals, clear up the terminology and reserve the term 'root' for the equivalent of /sys/fs/bpf. Use 'TCGlobals' for tc/globals and 'Cilium' for cilium/.
Signed-off-by: Timo Beckers <timo@isovalent.com>
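The terminology can be pictured as three path helpers. This is a sketch under the assumption that bpffs is mounted at /sys/fs/bpf; the function names are illustrative and are not claimed to match the ones introduced by the commit.

```go
package main

import "path/filepath"

// bpffsRoot is the assumed bpffs mount point ("root" in the new terminology).
const bpffsRoot = "/sys/fs/bpf"

// tcGlobalsDir returns the tc/globals directory below the given root.
func tcGlobalsDir(root string) string {
	return filepath.Join(root, "tc", "globals")
}

// ciliumDir returns the cilium/ directory below the given root.
func ciliumDir(root string) string {
	return filepath.Join(root, "cilium")
}
```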
Signed-off-by: Robin Gögge <r.goegge@isovalent.com>
With this commit we translate the code in bpf/init.sh that compiled and loaded the BPF code for the socketlb feature into a pure Go implementation. The main difference from the init.sh code is that on a fresh install of Cilium, the agent now uses bpf links to attach programs to cgroups if bpf_link is available in the kernel. The test cases cover clean agent startup, restart, and upgrade, for both attaching and detaching bpf programs to/from cgroups.
Co-authored-by: Timo Beckers <timo@isovalent.com>
Signed-off-by: Robin Gögge <r.goegge@isovalent.com>
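The attach strategy described in this commit message can be sketched as "try bpf_link first, fall back to PROG_ATTACH when the kernel lacks support". The code below is a simplified illustration only: `errNotSupported`, `attachFunc`, and `attach` are hypothetical, the two callbacks stand in for the real attach calls, and the handling of restarts and upgrades is not modeled.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotSupported is a hypothetical sentinel meaning the kernel lacks bpf_link
// support for this attach type.
var errNotSupported = errors.New("not supported")

// attachFunc stands in for one concrete way of attaching a program to a cgroup.
type attachFunc func() error

// attach prefers viaLink and falls back to viaProgAttach only when the link
// attempt reports missing kernel support. Any other link error is returned
// as-is, so real failures are not masked by the fallback.
// The returned string names the method that succeeded.
func attach(viaLink, viaProgAttach attachFunc) (string, error) {
	err := viaLink()
	if err == nil {
		return "bpf_link", nil
	}
	if !errors.Is(err, errNotSupported) {
		return "", fmt.Errorf("attaching via bpf_link: %w", err)
	}
	if err := viaProgAttach(); err != nil {
		return "", fmt.Errorf("attaching via PROG_ATTACH: %w", err)
	}
	return "PROG_ATTACH", nil
}
```

Falling back only on the "not supported" case keeps genuine attach errors visible instead of silently switching methods.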
Job 'Cilium-PR-K8s-1.24-kernel-5.4' failed.
Description taken from the main commit:
Fixes: #20739