datapath: add tail call hooks for custom metrics, bytecounter example #13191

qmonnet · 2020-09-16T15:00:52Z

Custom Metrics

Byte counters! Let's count how much data is received and sent for each endpoint. Related PR: #13173

But more than byte counters, this PR introduces hooks in the datapath so that users can tail call into custom programs, and collect all kind of metrics they want - Provided they implement it themselves in eBPF. Here is an overview:

First commit introduces the tail call hooks in the datapath (IPv4 only), plus management for the per-endpoint prog_array maps used to reference the custom programs. Those maps have two entries, one for ingress and one for egress.
Second commit makes these tail calls optional, and opt-in, through the introduction of a dedicated Cilium option.
Third commit adds a sample application: a simple byte counter which can be attached to keep track of the amount of data received or sent for a given endpoint.
Fourth commit adds the tail call hooks to the IPv6 datapath, on the same model as for IPv4.

To use it, once Cilium is deployed:

Compile the custom program
```
make -C bpf/custom
```

Load it

bpftool prog load bpf/custom/bpf_custom.o /sys/fs/bpf/tc/globals/bytecounter type classifier \
        pinmaps /sys/fs/bpf/tc/globals/bytecounter_maps

Reference it from the prog_array map

EP=<endpoint_id>
bpftool map update pinned /sys/fs/bpf/tc/globals/cilium_calls_custom_$(printf '%05d' ${EP}) \
        key 0 0 0 0 value pinned /sys/fs/bpf/tc/globals/bytecounter              # or key `1 0 0 0` for egress

For now, dump counters' value manually

bpftool map dump pinned /sys/fs/bpf/tc/globals/bytecounter_maps/bytecount_map

PR Status

Here are some points of interest:

I am still trying to finalise the location for the ingress hook. In the current version, the tail call added to tail_ipv4_policy() is not always executed, as handle_policy() (which calls it) is skipped in ipv4_local_delivery() if USE_BPF_PROG_FOR_INGRESS_POLICY is defined i.e. if enable-endpoint-routes is set to true. This is the case for GKE, for example. So my current plan is to add the tail call to the different branches in ipv4_local_delivery(), possibly by replacing returns with goto in that function to write the tail call just once, and also possibly with preprocessor guards to make sure it is only called from bpf_lxc.
The custom program needs to return with the same value the original program would have used, to preserve the datapath logics. This means that we must pass the desired value from bpf_lxc to the custom program we hook into. I'm using cb[CB_CT_STATE] for that because it did not seem to be used at that location, but I might be wrong. Feedback on this approach (or an alternatives) would be welcome.
I also pass the current direction (ingress or egress) and the source identity to the custom program. Given that direction needs 1 bit, identity needs 24, and existing return values are low and needs less than 7 bits, I make it all fit on the same cb cell.
Once we have the counters in an eBPF map, we would need to extract and process it. It could be a Cilium metric, but I'm not sure how to implement custom, user-defined metrics. It could be just from the command line, but similarly this requires to give a way to users to specify what kind of metrics they want. At the moment, the values from the counters can be retrieved from bpftool.

pchaigno

Thanks for all the comments and explanations! That makes it very easy to review :-)

A few nits below, I haven't commented on the position of hooks within bpf_lxc since we discussed that during #sig-datapath meeting already.

My main remaining concern is around the API we expose. I'm assuming we won't be able to change what we expose to custom program after the fact because it may break custom programs in the wild. I've added a couple comments below in that regard.

bpf/lib/maps.h

pkg/datapath/linux/config/config.go

bpf/custom/bpf_custom.c

bpf/custom/bytecount.h

bpf/custom/bpf_custom.c

pkg/datapath/linux/config/config.go

bpf/bpf_lxc.c

aanm

Reviewed the agent side changes and they seem pretty small to me

joestringer

This looks pretty clean to me, nice work.

The main questions that come to mind reviewing this PR are:

Custom metrics on packets that are subject to L7 forwarding seems a little inconsistent
Seems like there are two approaches for handling ingress/egress hinted in the PR, either having two programs (one ingress / one egress) or passing the direction bit in the CB. Is this just left over from a previous implementation or did you intend to support more than one custom program?
Some helpers could be called by a custom program and influence the handling of the packet, such as bpf_redirect(). What kind of guard rails do we want for cases like this? (One simple answer can be just leave it up to the implementer not to mess up; or if we're more concerned we could put compile-time warnings in place to discourage such behaviour)
Is IPv6 support intended? How will we handle this?

I've added comments on each of these topics below so we can have threaded discussions.

Beyond that, there's just a scattering of minor nits, none of which are overly consequential.

pkg/datapath/linux/config/config.go

bpf/lib/maps.h

bpf/custom/bpf_custom.c

bpf/bpf_lxc.c

bpf/custom/bpf_custom.c

bpf/bpf_lxc.c

Add a new CI test to check that custom programs loaded and run via the datapath tail call hooks work as expected. In particular, we use the byte counter example provided in a previous commit. The test is the following: We deploy Cilium and some pods, we load the byte counter program and attach it to a hook (ingress or egress) for a given endpoint, we send some traffic (ping requests and replies), and then check the values in the eBPF map where the count is stored. Try again with per-endpoint routes, to check the remaining ingress hook. A new manifest is added, to make sure that we have two simple application endpoints, located on different nodes. This is to check that the egress hook is working (there is no egress hook for socket-based load-balancing yet). This manifest also pulls an image which is used to compile the byte counter program before we can load it. This image should be updated in the future to embed the sample byte counter and all related headers, so we wouldn't have to mount a volume against the source repository on the node. This would permit to remove the byte counter from Cilium's repository, and to keep it in that dedicated image instead. Skip the test on 4.9 kernels because bpftool map updates do not work on such kernels. Skip on GKE because we do not have Cilium's source to compile the custom program from. Both issues should be addressed in the future, by moving the program sources to a dedicated image which should also embed its own loader. But actually, just run on net-next, since the coverage for other kernel versions would be the same. Signed-off-by: Quentin Monnet <quentin@isovalent.com>

qmonnet · 2021-03-24T22:26:38Z

Rebased, following merge conflicts
test-me-please

qmonnet · 2021-03-25T07:20:31Z

test-gke

pchaigno · 2021-03-25T11:30:22Z

All required CI jobs are green. A review from @cilium/kubernetes is missing but I think we can skip it as it was only requested for install/kubernetes/ and several person familiar with the agent and Helm reviewed. Marking as ready to merge.

The AfterAll() and AfterEach() blocks in the test file for custom calls run everytime, even if the Context block for the actual tests is skipped. In that case, running the final blocks results in an attempt to remove deployments that have never been set up in the first place. This may lead to the blocks failing when the tests were in fact skipped, and may produce test artifacts even though Jenkins does not considered the test failed. Let's reorganise those blocks, to make sure they are called only when necessary. Note that we do need to keep both DeleteCilium() and DeleteAll(), even if they are now in the same block, as calling only DeleteAll() would not remove the Cilium ConfigMap. Fixes: 37f6192 ("test: add CI test for tail calls hooks for custom programs") Fixes: cilium#13191 Fixes: cilium#16633 Reported-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>

The AfterAll() and AfterEach() blocks in the test file for custom calls run everytime, even if the Context block for the actual tests is skipped. In that case, running the final blocks results in an attempt to remove deployments that have never been set up in the first place. This may lead to the blocks failing when the tests were in fact skipped, and may produce test artifacts even though Jenkins does not considered the test failed. Let's reorganise those blocks, to make sure they are called only when necessary. Note that we do need to keep both DeleteCilium() and DeleteAll(), even if they are now in the same block, as calling only DeleteAll() would not remove the Cilium ConfigMap. Fixes: 37f6192 ("test: add CI test for tail calls hooks for custom programs") Fixes: #13191 Fixes: #16633 Reported-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com>

[ upstream commit 9d4e99d ] The AfterAll() and AfterEach() blocks in the test file for custom calls run everytime, even if the Context block for the actual tests is skipped. In that case, running the final blocks results in an attempt to remove deployments that have never been set up in the first place. This may lead to the blocks failing when the tests were in fact skipped, and may produce test artifacts even though Jenkins does not considered the test failed. Let's reorganise those blocks, to make sure they are called only when necessary. Note that we do need to keep both DeleteCilium() and DeleteAll(), even if they are now in the same block, as calling only DeleteAll() would not remove the Cilium ConfigMap. Fixes: 37f6192 ("test: add CI test for tail calls hooks for custom programs") Fixes: cilium#13191 Fixes: cilium#16633 Reported-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Tobias Klauser <tobias@cilium.io>

[ upstream commit 9d4e99d ] The AfterAll() and AfterEach() blocks in the test file for custom calls run everytime, even if the Context block for the actual tests is skipped. In that case, running the final blocks results in an attempt to remove deployments that have never been set up in the first place. This may lead to the blocks failing when the tests were in fact skipped, and may produce test artifacts even though Jenkins does not considered the test failed. Let's reorganise those blocks, to make sure they are called only when necessary. Note that we do need to keep both DeleteCilium() and DeleteAll(), even if they are now in the same block, as calling only DeleteAll() would not remove the Cilium ConfigMap. Fixes: 37f6192 ("test: add CI test for tail calls hooks for custom programs") Fixes: #13191 Fixes: #16633 Reported-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Tobias Klauser <tobias@cilium.io>

Thearas · 2022-06-10T21:48:47Z

Hi @qmonnet, great job! I'm just looking for a traffic byte-counter, and I have a few question:

The byte-counter is implemented using BPF_MAP_TYPE_HASH, would it be more efficient to implement it using BPF_MAP_TYPE_PERCPU_HASH? Just like cilium_metrics.
How to determine if the __ctx_buff is on ingress path or on egress path?

Looking forward to your reply, much appreciated!

qmonnet · 2022-06-13T10:53:17Z

Hi, thanks for using the feature!

The byte-counter is implemented using BPF_MAP_TYPE_HASH, would it be more efficient to implement it using BPF_MAP_TYPE_PERCPU_HASH? Just like cilium_metrics.

I'm not sure what gain you expect here. The updates on the counter value are atomic operations, so it shouldn't be an issue if multiple CPUs work with the same map. This being said, the point of these hooks is to have custom programs, so of course you're free to experiment :)

How to determine if the __ctx_buff is on ingress path or on egress path?

Please have a look at the related README.rst, and in particular at step 3 in the instructions for the byte-counter program. There are several hooks where you can attach the program, some on ingress, some on egress. The current program does not make the distinction between ingress and egress. If you want it, you can edit the program to e.g. use another map for egress (distinct from the one for ingress), and attach this new program to the hooks on egress.

Thearas · 2022-06-14T21:02:09Z

Thanks!
One more question, how to get byte-counter program to be attached automatically when the cilium_calls_custom_* map is created?

qmonnet · 2022-06-21T20:46:04Z

There's nothing of the sort implemented in the agent currently, so I suppose you'd have to create some watcher to check for the presence of the maps and attach the programs when necessary.

qmonnet requested review from a team September 16, 2020 15:00

qmonnet requested review from a team as code owners September 16, 2020 15:00

qmonnet added the sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. label Sep 16, 2020

maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Sep 16, 2020

qmonnet marked this pull request as draft September 16, 2020 15:01

qmonnet added release-note/major This PR introduces major new functionality to Cilium. release-note/minor This PR changes functionality that users may find relevant to operating Cilium. labels Sep 16, 2020

maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Sep 16, 2020

qmonnet removed the release-note/major This PR introduces major new functionality to Cilium. label Sep 16, 2020

qmonnet force-pushed the pr/qmonnet/custom_metrics branch from adab887 to ff39c82 Compare September 16, 2020 15:30

pchaigno reviewed Sep 21, 2020

View reviewed changes

qmonnet force-pushed the pr/qmonnet/custom_metrics branch from ff39c82 to 7708a75 Compare September 28, 2020 17:20

qmonnet marked this pull request as ready for review September 28, 2020 17:29

pchaigno reviewed Sep 28, 2020

View reviewed changes

pkg/datapath/linux/config/config.go Show resolved Hide resolved

pchaigno reviewed Sep 28, 2020

View reviewed changes

bpf/bpf_lxc.c Outdated Show resolved Hide resolved

aanm approved these changes Sep 28, 2020

View reviewed changes

joestringer requested changes Sep 28, 2020

View reviewed changes

qmonnet added the dont-merge/needs-rebase This PR needs to be rebased because it has merge conflicts. label Oct 12, 2020

aanm added the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Oct 13, 2020

aanm removed the dont-merge/wait-until-release Freeze window for current release is blocking non-bugfix PRs label Nov 10, 2020

This comment has been minimized.

Sign in to view

stale bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Dec 11, 2020

pchaigno marked this pull request as draft December 11, 2020 09:04

stale bot removed the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Dec 11, 2020

This comment has been minimized.

Sign in to view

stale bot added the stale The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale. label Jan 10, 2021

qmonnet force-pushed the pr/qmonnet/custom_metrics branch 2 times, most recently from 83916fd to 4e124dd Compare March 24, 2021 21:13

qmonnet force-pushed the pr/qmonnet/custom_metrics branch from 4e124dd to 83e0f7c Compare March 24, 2021 21:38

nbusseneau approved these changes Mar 25, 2021

View reviewed changes

pchaigno approved these changes Mar 25, 2021

View reviewed changes

pchaigno unassigned joestringer Mar 25, 2021

pchaigno added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 25, 2021

joestringer merged commit 37f6192 into master Mar 25, 2021

1.10.0 automation moved this from Must have to Done Mar 25, 2021

joestringer deleted the pr/qmonnet/custom_metrics branch March 25, 2021 16:42

joestringer moved this from Done to Merged Features in 1.10.0 Mar 29, 2021

This was referenced Apr 28, 2021

Prepare for release v1.10.0-rc1 #15896

Closed

Prepare for release v1.10.0-rc1 #15897

Merged

qmonnet mentioned this pull request Jun 25, 2021

tests: rework custom calls's AfterEach/AfterAll blocks to skip if needed #16651

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

datapath: add tail call hooks for custom metrics, bytecounter example #13191

datapath: add tail call hooks for custom metrics, bytecounter example #13191

qmonnet commented Sep 16, 2020 •

edited

pchaigno left a comment

aanm left a comment

joestringer left a comment

This comment has been minimized.

This comment has been minimized.

qmonnet commented Mar 24, 2021 •

edited

qmonnet commented Mar 25, 2021

pchaigno commented Mar 25, 2021 •

edited

Thearas commented Jun 10, 2022 •

edited

qmonnet commented Jun 13, 2022

Thearas commented Jun 14, 2022

qmonnet commented Jun 21, 2022

datapath: add tail call hooks for custom metrics, bytecounter example #13191

datapath: add tail call hooks for custom metrics, bytecounter example #13191

Conversation

qmonnet commented Sep 16, 2020 • edited

Custom Metrics

PR Status

pchaigno left a comment

Choose a reason for hiding this comment

aanm left a comment

Choose a reason for hiding this comment

joestringer left a comment

Choose a reason for hiding this comment

This comment has been minimized.

This comment has been minimized.

qmonnet commented Mar 24, 2021 • edited

qmonnet commented Mar 25, 2021

pchaigno commented Mar 25, 2021 • edited

Thearas commented Jun 10, 2022 • edited

qmonnet commented Jun 13, 2022

Thearas commented Jun 14, 2022

qmonnet commented Jun 21, 2022

qmonnet commented Sep 16, 2020 •

edited

qmonnet commented Mar 24, 2021 •

edited

pchaigno commented Mar 25, 2021 •

edited

Thearas commented Jun 10, 2022 •

edited