cilium: optimize bpf to use jiffies for ct maps #11434
oO That's neat! And I didn't think it would work so well. warp is almost always <=2 on my system.
I also checked that the number of retries is high enough (in case of interrupts). I couldn't make ./cilium-probe-kernel-hz fail even while creating artificial load (though not RT).
Nice! A few small nits inline, address as you see fit.
In order to use and work with the jiffies64() BPF helper, we need to know the HZ value that the kernel is operating in. Doing something like a sysconf(_SC_CLK_TCK) from user space to get the tick frequency won't work since this is the user space HZ value, whereas BPF programs run in kernel space, so we need the kernel HZ. It is not exposed directly and we cannot assume that the kernel config is always exposed by distros, so as an alternative /proc/timer_list can be used. This work adds a small tool, cilium-probe-kernel-hz, which probes and sanity checks the HZ value and emits a define KERNEL_HZ which can be used by the BPF prog. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
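To make the probing idea concrete, here is a minimal sketch of how a kernel HZ estimate could be derived from two samples of the `jiffies:` line in /proc/timer_list. It is not the code of cilium-probe-kernel-hz; the function names, the list of candidate HZ values, and the snapping step are assumptions chosen for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Parse one line of /proc/timer_list. Returns 1 and stores the value if the
 * line has the form "jiffies: <number>", otherwise returns 0. */
static int parse_jiffies_line(const char *line, uint64_t *out)
{
	return sscanf(line, " jiffies: %" SCNu64, out) == 1;
}

/* Estimate ticks per second from two (jiffies, monotonic ns) samples. */
static double estimate_hz(uint64_t j0, uint64_t j1, uint64_t ns0, uint64_t ns1)
{
	assert(j1 >= j0 && ns1 > ns0);
	return (double)(j1 - j0) * 1e9 / (double)(ns1 - ns0);
}

/* Snap a measured rate to the closest commonly configured HZ value.
 * The candidate list is an assumption, not taken from the PR. */
static int nearest_known_hz(double measured)
{
	static const int known[] = { 100, 250, 300, 1000 };
	int best = known[0];
	size_t i;

	assert(measured > 0);
	for (i = 1; i < sizeof(known) / sizeof(known[0]); i++) {
		double d_new = measured - known[i];
		double d_best = measured - best;

		if (d_new < 0)
			d_new = -d_new;
		if (d_best < 0)
			d_best = -d_best;
		if (d_new < d_best)
			best = known[i];
	}
	return best;
}
```

A caller would read /proc/timer_list twice with a short sleep in between, feed both samples to `estimate_hz()`, and retry if the result lands too far from any candidate, for example because the process was interrupted between the two reads.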
Rework our time-keeping helpers a bit and add new jiffies helpers for converting seconds back and forth based on the KERNEL_HZ. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
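As a rough illustration of what such conversion helpers amount to, the sketch below converts between seconds and jiffies using a compile-time KERNEL_HZ. The helper names and the fallback value of 250 are assumptions; in the real build the define would come from the probe tool.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: normally emitted by the probe tool, 250 is only a placeholder. */
#ifndef KERNEL_HZ
#define KERNEL_HZ 250
#endif

/* Seconds to jiffies: one second is KERNEL_HZ ticks. */
static inline uint64_t sec_to_jiffies(uint64_t sec)
{
	return sec * KERNEL_HZ;
}

/* Jiffies to seconds: integer division, so partial seconds are truncated. */
static inline uint64_t jiffies_to_sec(uint64_t jiffies)
{
	assert(KERNEL_HZ > 0);
	return jiffies / KERNEL_HZ;
}
```

Because the conversion back to seconds truncates, a round trip through jiffies is exact only for whole seconds, which is acceptable for a low-resolution time source.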
Switch CT timeout and report intervals to more lightweight jiffies instead. Our bpf_ktime_get_sec() can become very expensive and we only need a low resolution time source anyway in all our code. For now the most expensive operation is the constant update of the CT entry timeout which is needed for every packet. Switch it to jiffies for newer kernels. Precision is scaled down in order to better fit entry->lifetime. Under XDP stress testing this gave a performance improvement of approx +1.1Mpps. More candidates are on todo to convert later as well (services w/ session affinity, SNAT create timeout). Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
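The remark about scaling down precision can be pictured as follows: a 64-bit jiffies counter is shifted right so that the resulting timestamp fits a 32-bit lifetime field. This is a sketch under assumptions, not the datapath code; the struct, the function names, the shift of 8 bits, and the HZ value of 250 are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define KERNEL_HZ   250 /* assumed, normally emitted by the probe tool */
#define MONO_SCALER 8   /* hypothetical: drop the low 8 bits of jiffies */

/* Stand-in for a CT entry, only the field relevant here. */
struct ct_entry_sketch {
	uint32_t lifetime; /* expiry time in scaled jiffies */
};

/* Current time in scaled jiffies, narrowed to 32 bits. */
static inline uint32_t scaled_now(uint64_t jiffies64)
{
	return (uint32_t)(jiffies64 >> MONO_SCALER);
}

/* Convert a timeout in seconds to the same scaled unit. */
static inline uint32_t sec_to_scaled(uint32_t sec)
{
	return (uint32_t)(((uint64_t)sec * KERNEL_HZ) >> MONO_SCALER);
}

/* Per-packet refresh: push the expiry out by timeout_sec from now. */
static inline void ct_refresh(struct ct_entry_sketch *e, uint64_t jiffies64,
			      uint32_t timeout_sec)
{
	assert(e != 0);
	e->lifetime = scaled_now(jiffies64) + sec_to_scaled(timeout_sec);
}
```

With these assumed values one scaled unit is 256 / 250, roughly one second, so the stored lifetime keeps about second-level resolution while the per-packet update avoids the more expensive clock read.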
Adapt the CT GC accordingly to work on either the monotonic clock or jiffies. Given the timeout is low-res anyway, we can simply read out the current kernel value of jiffies via /proc/timer_list and use that for comparing timeouts. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
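The GC side then only needs a "now" value in the same unit the datapath used when it wrote the entry. The sketch below shows that selection and a simple expiry count. It is illustrative only: the enum, the function names, and the scaler value are hypothetical and assume the jiffies value was already read from /proc/timer_list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MONO_SCALER 8 /* hypothetical, must match the datapath side */

enum time_source { SRC_KTIME, SRC_JIFFIES };

/* Pick the GC's notion of "now" to match how lifetimes were written:
 * seconds of monotonic time, or scaled-down jiffies. */
static uint32_t gc_now(enum time_source src, uint64_t ktime_ns,
		       uint64_t jiffies64)
{
	if (src == SRC_JIFFIES)
		return (uint32_t)(jiffies64 >> MONO_SCALER);
	return (uint32_t)(ktime_ns / 1000000000ULL);
}

/* Count entries whose stored lifetime lies before now. */
static size_t count_expired(const uint32_t *lifetimes, size_t n, uint32_t now)
{
	size_t i, expired = 0;

	assert(lifetimes != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (lifetimes[i] < now)
			expired++;
	}
	return expired;
}
```

The important property is that both sides agree on the source: comparing a jiffies-based lifetime against a seconds-based "now", or the reverse, would expire entries at the wrong time.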
For existing deployments, we need to continue to use ktime as the source, but for newly deployed ones via helm, we can opt into probing for jiffies as the source instead. Therefore implement an agent switch --enable-bpf-clock-probe to configure it as well as helm support. Also update cilium-agent.md doc. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
See commit msgs, gives ~ +1.1Mpps on my test machine under xdp benchmark.