New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
endpoint: Write headerfile under full regeneration #12524
endpoint: Write headerfile under full regeneration #12524
Conversation
I ran into this bug while debugging #12482 and looked into it. I'm not 100% sure on the implementation of this, but it does seem reasonable to me. I'd like to get feedback from those who have more context on the regeneration of endpoints. @joestringer The code touches recently changed parts by you. It'd be great to have your feedback. |
test-me-please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit, I think we can have this be a little more concise but the change LGTM.
pkg/endpoint/bpf.go
Outdated
@@ -870,6 +870,11 @@ func (e *Endpoint) runPreCompilationSteps(regenContext *regenerationContext) (he | |||
if err = e.writeHeaderfile(nextDir); err != nil { | |||
return false, fmt.Errorf("unable to write header file: %s", err) | |||
} | |||
} else if datapathRegenCtxt.regenerationLevel == regeneration.RegenerateWithDatapathRebuild { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could probably tweak this to avoid duplicating the writeHeaderfile()
call by instead having the headerfileChanged
check above only set the regenerationLevel
then have this check do the following instead:
} else if datapathRegenCtxt.regenerationLevel == regeneration.RegenerateWithDatapathRebuild { | |
if datapathRegenCtxt.regenerationLevel >= regeneration.RegenerateWithDatapathRewrite { |
Then I'm not sure whether the comment below provides any additional context; basically if either a rewrite or rebuild is required, we'll be sure to write the headerfile again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
May be worth considering whether we should backport this. In an ideal circumstance we should never need this, but even for development purposes for backports and so on it may turn out to be useful. I note that the referred PR that this marks as "Fixes" was also attempting to fix a broader version of this bug and was backported to v1.6; hopefully this will bring the saga to a close :-) |
I was thinking that as well. It's not a crucial bug, but the backports should be trivial as the change is small and code paths are consistent in all relevant trees. Marking for 1.{6,7,8}. |
Previously, regenerating an endpoint manually via CLI resulted in compilation errors as seen below. These errors occurred because the endpoint headerfile (ep_config.h or lxc_config.h) was not written to the `<endpoint ID>_next` directory (will be reffered to as "next"), where the BPF compilation takes place. The reason the headerfile was not written to the "next" directory was because Cilium only wrote the headerfile if it was changed (via hash). However, a manual regeneration triggered through the API, sets the regeneration level to "compile+load" (a full regeneration), where Cilium expects all relevant files, including the headerfile, to be present in the "next" directory (`<endpoint ID>_next`). Hence, the compilation errors. This commit fixes this issue by checking whether there has been a request for a full regeneration, in which case, we write the headerfile to the "next" directory. ``` root@k8s2:/var/run/cilium/state# clang -emit-llvm -O2 -target bpf -std=gnu89 -nostdinc -D__NR_CPUS__=2 -Wall -Wextra -Werror -Wshadow -Wno-address-of-packed-member -Wno-unknown-warning-option -Wno-gnu-variable-s ized-type-not-at-end -Wdeclaration-after-statement -I/var/run/cilium/state/globals -I1285_next -I/var/lib/cilium/bpf -I/var/lib/cilium/bpf/include -c /var/lib/cilium/bpf/bpf_lxc.c -o /tmp/c In file included from /var/lib/cilium/bpf/bpf_lxc.c:22: /var/lib/cilium/bpf/lib/icmp6.h:50:29: error: use of undeclared identifier 'NODE_MAC' union macaddr smac, dmac = NODE_MAC; ^ /var/lib/cilium/bpf/lib/icmp6.h:359:30: error: use of undeclared identifier 'NODE_MAC' union macaddr router_mac = NODE_MAC; ^ In file included from /var/lib/cilium/bpf/bpf_lxc.c:26: /var/lib/cilium/bpf/lib/lxc.h:67:29: error: use of undeclared identifier 'NODE_MAC' union macaddr router_mac = NODE_MAC; ^ /var/lib/cilium/bpf/bpf_lxc.c:159:10: error: implicit declaration of function 'lookup_ip6_remote_endpoint' [-Werror,-Wimplicit-function-declaration] info = lookup_ip6_remote_endpoint(&orig_dip); ^ /var/lib/cilium/bpf/bpf_lxc.c:159:10: note: did you mean 'lookup_ip6_endpoint'? /var/lib/cilium/bpf/lib/eps.h:13:1: note: 'lookup_ip6_endpoint' declared here lookup_ip6_endpoint(struct ipv6hdr *ip6) ^ /var/lib/cilium/bpf/bpf_lxc.c:159:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion] info = lookup_ip6_remote_endpoint(&orig_dip); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /var/lib/cilium/bpf/bpf_lxc.c:529:10: error: implicit declaration of function 'lookup_ip4_remote_endpoint' [-Werror,-Wimplicit-function-declaration] info = lookup_ip4_remote_endpoint(orig_dip); ^ /var/lib/cilium/bpf/bpf_lxc.c:529:10: note: did you mean 'lookup_ip4_endpoint'? /var/lib/cilium/bpf/lib/eps.h:35:1: note: 'lookup_ip4_endpoint' declared here lookup_ip4_endpoint(const struct iphdr *ip4) ^ /var/lib/cilium/bpf/bpf_lxc.c:529:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion] info = lookup_ip4_remote_endpoint(orig_dip); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /var/lib/cilium/bpf/bpf_lxc.c:1027:10: error: implicit declaration of function 'lookup_ip6_remote_endpoint' [-Werror,-Wimplicit-function-declaration] info = lookup_ip6_remote_endpoint(src); ^ /var/lib/cilium/bpf/bpf_lxc.c:1027:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion] info = lookup_ip6_remote_endpoint(src); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /var/lib/cilium/bpf/bpf_lxc.c:1256:10: error: implicit declaration of function 'lookup_ip4_remote_endpoint' [-Werror,-Wimplicit-function-declaration] info = lookup_ip4_remote_endpoint(ip4->saddr); ^ /var/lib/cilium/bpf/bpf_lxc.c:1256:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion] info = lookup_ip4_remote_endpoint(ip4->saddr); ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 11 errors generated. ``` Fixes cilium#10630 Fixes cilium#12005 Signed-off-by: Chris Tarazi <chris@isovalent.com>
d5d54e3
to
d138143
Compare
@christarazi Did you check if this fix also works for the host endpoint? I can have a look as a follow up if not. |
test-me-please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm as far as I can see
Previously, regenerating an endpoint manually via CLI resulted in
compilation errors as seen below. These errors occurred because the
endpoint headerfile (ep_config.h or lxc_config.h) was not written to the
<endpoint ID>_next
directory (will be reffered to as "next"), wherethe BPF compilation takes place.
The reason the headerfile was not written to the "next" directory was
because Cilium only wrote the headerfile if it was changed (via hash).
However, a manual regeneration triggered through the API, sets the
regeneration level to "compile+load" (a full regeneration), where Cilium
expects all relevant files, including the headerfile, to be present in
the "next" directory (
<endpoint ID>_next
). Hence, the compilationerrors.
This commit fixes this issue by checking whether there has been a
request for a full regeneration, in which case, we write the headerfile
to the "next" directory.
Fixes #10630
Fixes #12005