datapath: add tail call hooks for custom metrics, bytecounter example #13191
Thanks for all the comments and explanations! That makes it very easy to review :-)
A few nits below, I haven't commented on the position of hooks within bpf_lxc
since we discussed that during #sig-datapath meeting already.
My main remaining concern is around the API we expose. I'm assuming we won't be able to change what we expose to custom programs after the fact because it may break custom programs in the wild. I've added a couple of comments below in that regard.
Reviewed the agent side changes and they seem pretty small to me
This looks pretty clean to me, nice work.
The main questions that come to mind reviewing this PR are:
- Custom metrics on packets that are subject to L7 forwarding seem a little inconsistent.
- Seems like there are two approaches for handling ingress/egress hinted at in the PR: either having two programs (one ingress / one egress) or passing the direction bit in the `CB`. Is this just left over from a previous implementation or did you intend to support more than one custom program?
- Some helpers could be called by a custom program and influence the handling of the packet, such as `bpf_redirect()`. What kind of guard rails do we want for cases like this? (One simple answer can be just leave it up to the implementer not to mess up; or if we're more concerned we could put compile-time warnings in place to discourage such behaviour.)
- Is IPv6 support intended? How will we handle this?
I've added comments on each of these topics below so we can have threaded discussions.
Beyond that, there's just a scattering of minor nits, none of which are overly consequential.
Add a new CI test to check that custom programs loaded and run via the datapath tail call hooks work as expected. In particular, we use the byte counter example provided in a previous commit.

The test is the following: we deploy Cilium and some pods, load the byte counter program and attach it to a hook (ingress or egress) for a given endpoint, send some traffic (ping requests and replies), and then check the values in the eBPF map where the count is stored. We then try again with per-endpoint routes, to check the remaining ingress hook.

A new manifest is added, to make sure that we have two simple application endpoints, located on different nodes. This is to check that the egress hook is working (there is no egress hook for socket-based load-balancing yet). This manifest also pulls an image which is used to compile the byte counter program before we can load it. This image should be updated in the future to embed the sample byte counter and all related headers, so we wouldn't have to mount a volume against the source repository on the node. This would allow removing the byte counter from Cilium's repository and keeping it in that dedicated image instead.

Skip the test on 4.9 kernels because bpftool map updates do not work on such kernels. Skip on GKE because we do not have Cilium's source to compile the custom program from. Both issues should be addressed in the future, by moving the program sources to a dedicated image which should also embed its own loader. But actually, just run on net-next, since the coverage for other kernel versions would be the same.

Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Rebased, following merge conflicts
test-gke
All required CI jobs are green. A review from @cilium/kubernetes is missing but I think we can skip it as it was only requested for |
The AfterAll() and AfterEach() blocks in the test file for custom calls run every time, even if the Context block for the actual tests is skipped. In that case, running the final blocks results in an attempt to remove deployments that have never been set up in the first place. This may lead to the blocks failing when the tests were in fact skipped, and may produce test artifacts even though Jenkins does not consider the test failed.

Let's reorganise those blocks, to make sure they are called only when necessary. Note that we do need to keep both DeleteCilium() and DeleteAll(), even if they are now in the same block, as calling only DeleteAll() would not remove the Cilium ConfigMap.

Fixes: 37f6192 ("test: add CI test for tail calls hooks for custom programs")
Fixes: cilium#13191
Fixes: cilium#16633
Reported-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
[ upstream commit 9d4e99d ]

The AfterAll() and AfterEach() blocks in the test file for custom calls run every time, even if the Context block for the actual tests is skipped. In that case, running the final blocks results in an attempt to remove deployments that have never been set up in the first place. This may lead to the blocks failing when the tests were in fact skipped, and may produce test artifacts even though Jenkins does not consider the test failed.

Let's reorganise those blocks, to make sure they are called only when necessary. Note that we do need to keep both DeleteCilium() and DeleteAll(), even if they are now in the same block, as calling only DeleteAll() would not remove the Cilium ConfigMap.

Fixes: 37f6192 ("test: add CI test for tail calls hooks for custom programs")
Fixes: cilium#13191
Fixes: cilium#16633
Reported-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Quentin Monnet <quentin@isovalent.com>
Signed-off-by: Tobias Klauser <tobias@cilium.io>
Hi @qmonnet, great job! I'm just looking for a traffic byte-counter, and I have a few questions:
Looking forward to your reply, much appreciated!
Hi, thanks for using the feature!
I'm not sure what gain you expect here. The updates on the counter value are atomic operations, so it shouldn't be an issue if multiple CPUs work with the same map. This being said, the point of these hooks is to have custom programs, so of course you're free to experiment :)
Please have a look at the related README.rst, and in particular at step 3 in the instructions for the byte-counter program. There are several hooks where you can attach the program, some on ingress, some on egress. The current program does not make the distinction between ingress and egress. If you want it, you can edit the program to e.g. use another map for egress (distinct from the one for ingress), and attach this new program to the hooks on egress.
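To illustrate the two points above (atomic counter updates, and keeping ingress and egress apart), here is a minimal user-space sketch in C. It is not the byte-counter program from the PR: the `byte_counters` struct stands in for two separate eBPF maps, and the names `DIR_INGRESS`, `DIR_EGRESS`, `count_bytes()` and `read_bytes()` are hypothetical. The only thing it borrows from real eBPF programs is the `__sync_fetch_and_add` compiler builtin, which performs an atomic add.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical direction tags, for illustration only. In the datapath the
 * direction follows from the hook the program is attached to. */
enum direction { DIR_INGRESS = 0, DIR_EGRESS = 1 };

/* One counter per direction, standing in for two distinct eBPF maps. */
struct byte_counters {
	uint64_t bytes[2];
};

/* Add a packet length to the counter for one direction.
 * __sync_fetch_and_add is a GCC/Clang builtin performing an atomic
 * read-modify-write, so concurrent callers do not lose updates. */
static inline void count_bytes(struct byte_counters *c, enum direction dir,
			       uint32_t len)
{
	assert(dir == DIR_INGRESS || dir == DIR_EGRESS);
	__sync_fetch_and_add(&c->bytes[dir], (uint64_t)len);
}

/* Read a counter atomically (adding zero returns the current value). */
static inline uint64_t read_bytes(struct byte_counters *c, enum direction dir)
{
	assert(dir == DIR_INGRESS || dir == DIR_EGRESS);
	return __sync_fetch_and_add(&c->bytes[dir], 0);
}
```

With this layout, a program attached to an ingress hook would only ever touch the ingress counter, and a second program attached to an egress hook would touch the other one.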
Thanks!
There's nothing of the sort implemented in the agent currently, so I suppose you'd have to create some watcher to check for the presence of the maps and attach the programs when necessary.
Custom Metrics
Byte counters! Let's count how much data is received and sent for each endpoint. Related PR: #13173
But more than byte counters, this PR introduces hooks in the datapath so that users can tail call into custom programs and collect all kinds of metrics they want, provided they implement them themselves in eBPF. Here is an overview:
- First commit introduces the tail call hooks in the datapath (IPv4 only), plus management for the per-endpoint prog_array maps used to reference the custom programs. Those maps have two entries, one for ingress and one for egress.
- Second commit makes these tail calls optional, and opt-in, through the introduction of a dedicated Cilium option.
- Third commit adds a sample application: a simple byte counter which can be attached to keep track of the amount of data received or sent for a given endpoint.
- Fourth commit adds the tail call hooks to the IPv6 datapath, on the same model as for IPv4.
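As a rough mental model of the per-endpoint map with its two entries, here is a user-space sketch in C. It only mimics the behaviour of a prog_array: an empty slot means the tail call does not happen and the caller carries on, while a populated slot hands control (and the return value) to the custom program. All names here (`custom_calls_map`, `run_custom_hook()`, `sample_count_and_pass()`, the slot indices) are hypothetical and are not taken from the PR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slot indices: one entry for ingress, one for egress. */
enum hook_slot { SLOT_INGRESS = 0, SLOT_EGRESS = 1, SLOT_MAX = 2 };

/* Minimal stand-in for the packet context seen by a program. */
struct pkt_ctx {
	unsigned int len; /* packet length in bytes */
	int verdict;      /* value the original program would return */
};

typedef int (*custom_prog_fn)(struct pkt_ctx *ctx);

/* Stand-in for a per-endpoint prog_array map with two entries. */
struct custom_calls_map {
	custom_prog_fn slots[SLOT_MAX];
};

/* Mimic a tail call: if the slot is populated, the custom program takes
 * over and its return value is final; if it is empty, nothing happens and
 * the original verdict is returned. */
static int run_custom_hook(const struct custom_calls_map *map,
			   enum hook_slot slot, struct pkt_ctx *ctx)
{
	assert(slot < SLOT_MAX);
	if (map->slots[slot] != NULL)
		return map->slots[slot](ctx);
	return ctx->verdict;
}

/* Sample custom program: count bytes, then return the original verdict
 * so that packet handling is unchanged. */
static unsigned long sample_bytes_seen;

static int sample_count_and_pass(struct pkt_ctx *ctx)
{
	sample_bytes_seen += ctx->len;
	return ctx->verdict;
}
```

Leaving a slot empty is what makes the hooks cheap when no custom program is attached: the lookup misses and the datapath continues as before.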
To use it, once Cilium is deployed:
    bpftool prog load bpf/custom/bpf_custom.o /sys/fs/bpf/tc/globals/bytecounter type classifier \
        pinmaps /sys/fs/bpf/tc/globals/bytecounter_maps
PR Status
Here are some points of interest:
- I am still trying to finalise the location for the ingress hook. In the current version, the tail call added to `tail_ipv4_policy()` is not always executed, as `handle_policy()` (which calls it) is skipped in `ipv4_local_delivery()` if `USE_BPF_PROG_FOR_INGRESS_POLICY` is defined, i.e. if `enable-endpoint-routes` is set to `true`. This is the case for GKE, for example. So my current plan is to add the tail call to the different branches in `ipv4_local_delivery()`, possibly by replacing `return`s with `goto` in that function to write the tail call just once, and also possibly with preprocessor guards to make sure it is only called from bpf_lxc.
- The custom program needs to return with the same value the original program would have used, to preserve the datapath logic. This means that we must pass the desired value from bpf_lxc to the custom program we hook into. I'm using `cb[CB_CT_STATE]` for that because it did not seem to be used at that location, but I might be wrong. Feedback on this approach (or on alternatives) would be welcome.
- I also pass the current direction (ingress or egress) and the source identity to the custom program. Given that direction needs 1 bit, identity needs 24, and existing return values are low and need fewer than 7 bits, I make it all fit in the same `cb` cell.
- Once we have the counters in an eBPF map, we would need to extract and process them. It could be a Cilium metric, but I'm not sure how to implement custom, user-defined metrics. It could be just from the command line, but similarly this requires giving users a way to specify what kind of metrics they want. At the moment, the values from the counters can be retrieved with bpftool.
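The bit budget described above (1 bit for direction, 24 for identity, the remaining 7 for the return value) adds up to exactly one 32-bit cell. Here is a minimal sketch of such a packing in C. The field order, the macro names and the `cb_pack()` / `cb_*()` helpers are assumptions made for illustration; the PR text does not say which bits hold which field, and the sketch treats the return value as a small non-negative number.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the 32-bit cell (hypothetical, not from the PR):
 *   bits  0..6  : return value to hand back (7 bits)
 *   bit   7     : direction, 0 = ingress, 1 = egress
 *   bits  8..31 : source identity (24 bits)
 */
#define CB_RET_MASK  0x7fu
#define CB_DIR_SHIFT 7u
#define CB_ID_SHIFT  8u
#define CB_ID_MASK   0xffffffu

/* Pack the three fields into one cell. A return value that does not fit
 * in 7 bits is treated as a caller bug rather than silently truncated,
 * since a truncated value would change how the packet is handled. */
static inline uint32_t cb_pack(uint32_t identity, uint32_t egress, uint32_t ret)
{
	assert(ret <= CB_RET_MASK);
	return ((identity & CB_ID_MASK) << CB_ID_SHIFT) |
	       ((egress & 1u) << CB_DIR_SHIFT) |
	       ret;
}

/* Accessors a custom program could use to unpack the cell. */
static inline uint32_t cb_identity(uint32_t cb)  { return cb >> CB_ID_SHIFT; }
static inline uint32_t cb_is_egress(uint32_t cb) { return (cb >> CB_DIR_SHIFT) & 1u; }
static inline uint32_t cb_ret(uint32_t cb)       { return cb & CB_RET_MASK; }
```

For example, identity `0xABCDEF`, egress, and return value 42 pack to `0xABCDEFAA` under this layout. Whatever layout is finally chosen becomes part of the interface exposed to custom programs, which is why it is worth settling before anyone depends on it.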