Fix 5.10+ complexity issue with kubeProxyReplacement=disabled
#16084
mattr=+alu32, supported since LLVM 7.0 and implied by mcpu=v3, enables the use of 32-bit registers in BPF bytecode. Enabling this compiler option can however result in loading issues as illustrated below.

    12: (61) r1 = *(u32 *)(r0 +80)  // ctx->data_end
    13: (61) r6 = *(u32 *)(r0 +76)  // ctx->data
    14: (bc) w7 = w6                // <- verifier loses track of inferred pkt type here.
    [...]
    38: (71) r1 = *(u8 *)(r7 +20)
    R7 invalid mem access 'inv'

These errors typically happen because the data and data_end pointers are actually 32-bit registers. Depending on how these pointers are used, LLVM sometimes makes use of that assumption (e.g., the 32-bit assignment on instruction 14 above). The verifier is however not able to follow and rejects such programs.

We can usually work around those errors by ensuring these pointers are only used via 64-bit types. This commit implements this wherever needed to pass the verifier.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
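The workaround described in this commit message (only touching data and data_end through 64-bit types) can be illustrated with a minimal userspace sketch. Everything in it is an assumption made for illustration: `struct mock_ctx`, `mock_ctx_data()`, `mock_ctx_data_end()` and `mock_pkt_has()` are hypothetical names, not the helpers changed by this PR, and the code runs in userspace rather than through the BPF verifier. It only shows the shape of the idea: widen the 32-bit fields once, then do all bounds arithmetic in 64 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the two 32-bit context fields.
 * This is NOT the kernel's struct __sk_buff. */
struct mock_ctx {
	uint32_t data;
	uint32_t data_end;
};

/* Widen each 32-bit field exactly once so that every later use of the
 * value goes through a 64-bit type. */
static inline uint64_t mock_ctx_data(const struct mock_ctx *ctx)
{
	return (uint64_t)ctx->data;
}

static inline uint64_t mock_ctx_data_end(const struct mock_ctx *ctx)
{
	return (uint64_t)ctx->data_end;
}

/* Return true if len bytes starting at offset off lie within
 * [data, data_end). All arithmetic is done on 64-bit values. */
static inline bool mock_pkt_has(const struct mock_ctx *ctx,
				uint64_t off, uint64_t len)
{
	uint64_t data = mock_ctx_data(ctx);
	uint64_t data_end = mock_ctx_data_end(ctx);
	uint64_t room;

	if (data > data_end)
		return false;

	room = data_end - data;
	return off <= room && len <= room - off;
}
```

In a real BPF program the equivalent would be casting ctx->data and ctx->data_end to 64-bit pointer types before any comparison or dereference; whether that alone is enough to keep LLVM from emitting a 32-bit move depends on the surrounding code.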
Set mcpu=v3 in the compiler on kernels 5.10+ to use all available eBPF instructions and 32-bit registers. This change fixes the complexity issue we're hitting on v5.10+ when socket-level load balancing is disabled (via enable-host-services=false or kube-proxy-replacement=disabled).

Using the third eBPF instruction set doesn't reduce complexity for all BPF programs, but it leads to more consistent numbers, with less variation in complexity. A big part of this improvement is due to the implicit use of mattr=+alu32 to enable 32-bit eBPF registers.

In addition to the end-to-end test on bpf-next, this change was tested on kernels 5.10 and 5.11 with the existing verifier-test.sh, compiling the datapath with both KERNEL=netnext and KERNEL=419.

Signed-off-by: Paul Chaignon <paul@cilium.io>
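As a rough illustration of the selection rule stated in this commit message (mcpu=v3 on kernels 5.10+), the sketch below picks a value for the compiler's mcpu option from a kernel version. It is an assumption-laden sketch: `pick_mcpu()` is a hypothetical helper, the commit message does not say how the kernel version or feature is actually detected, and the fallback value for older kernels is left to the caller because the source does not state it.

```c
#include <stddef.h>

/* Hypothetical helper: return "v3" for kernels 5.10 and newer, and the
 * caller-supplied fallback otherwise. The fallback is a parameter
 * because the value used on older kernels is not given in the source. */
static const char *pick_mcpu(unsigned int major, unsigned int minor,
			     const char *fallback)
{
	if (major > 5 || (major == 5 && minor >= 10))
		return "v3";
	return fallback;
}
```

A caller would pass the result to the compiler as the mcpu value; comparing version numbers is only one possible approach, and probing the running kernel for the needed features would be an alternative.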
On master and with kernels 5.10+, we have a complexity issue when ENABLE_HOST_SERVICES_FULL is undefined (i.e., socket-level load balancing is disabled and additional code is compiled into bpf_lxc as a replacement). Our verifier test included a workaround for that issue: it always defined ENABLE_HOST_SERVICES_FULL on newer kernels. This commit removes that workaround since the previous commit fixed the complexity issue.

Signed-off-by: Paul Chaignon <paul@cilium.io>
5884637 to 552ec87
test-me-please
lgtm!
@brb It's on my radar to backport to both v1.9 and v1.8. I want to finish testing with the latest BPF security fixes first as those will be backported to all LTS kernels. Looks good so far, but I didn't run the full e2e tests yet.
Marking for backport to v1.9.
Is this not going to be backported to 1.8 after all?
Probably not. It's a fair bit of effort to backport to stable branches because several other PRs need to be backported as well. v1.8 is also not supported since the release of v1.11.
This pull request fixes #14726, our complexity issue with kubeProxyReplacement=disabled when running Linux 5.10+. The fix is less general than originally hoped and only applies to 5.10+; other complexity issues are therefore not fixed by this change. With a bit more work, we may be able to extend it to 5.1+ in the future.

See commits for details. As a summary:
- Fix verifier errors with mattr=+alu32.
- Set mcpu=v3 (implies mattr=+alu32) on kernels 5.10+.

In addition to the end-to-end tests, these changes were also tested on kernels 5.10 and 5.11 with datapath configurations KERNEL=419 and KERNEL=netnext.