Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

endpoint: Write headerfile under full regeneration #12524

Conversation

christarazi
Copy link
Member

Previously, regenerating an endpoint manually via CLI resulted in
compilation errors as seen below. These errors occurred because the
endpoint headerfile (ep_config.h or lxc_config.h) was not written to the
<endpoint ID>_next directory (will be reffered to as "next"), where
the BPF compilation takes place.

The reason the headerfile was not written to the "next" directory was
because Cilium only wrote the headerfile if it was changed (via hash).
However, a manual regeneration triggered through the API, sets the
regeneration level to "compile+load" (a full regeneration), where Cilium
expects all relevant files, including the headerfile, to be present in
the "next" directory (<endpoint ID>_next). Hence, the compilation
errors.

This commit fixes this issue by checking whether there has been a
request for a full regeneration, in which case, we write the headerfile
to the "next" directory.

root@k8s2:/var/run/cilium/state# clang -emit-llvm -O2 -target bpf -std=gnu89 -nostdinc -D__NR_CPUS__=2 -Wall -Wextra -Werror -Wshadow -Wno-address-of-packed-member -Wno-unknown-warning-option -Wno-gnu-variable-s
ized-type-not-at-end -Wdeclaration-after-statement -I/var/run/cilium/state/globals -I1285_next -I/var/lib/cilium/bpf -I/var/lib/cilium/bpf/include -c /var/lib/cilium/bpf/bpf_lxc.c -o /tmp/c
In file included from /var/lib/cilium/bpf/bpf_lxc.c:22:
/var/lib/cilium/bpf/lib/icmp6.h:50:29: error: use of undeclared identifier 'NODE_MAC'
        union macaddr smac, dmac = NODE_MAC;
                                   ^
/var/lib/cilium/bpf/lib/icmp6.h:359:30: error: use of undeclared identifier 'NODE_MAC'
                union macaddr router_mac = NODE_MAC;
                                           ^
In file included from /var/lib/cilium/bpf/bpf_lxc.c:26:
/var/lib/cilium/bpf/lib/lxc.h:67:29: error: use of undeclared identifier 'NODE_MAC'
        union macaddr router_mac = NODE_MAC;
                                   ^
/var/lib/cilium/bpf/bpf_lxc.c:159:10: error: implicit declaration of function 'lookup_ip6_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
                info = lookup_ip6_remote_endpoint(&orig_dip);
                       ^
/var/lib/cilium/bpf/bpf_lxc.c:159:10: note: did you mean 'lookup_ip6_endpoint'?
/var/lib/cilium/bpf/lib/eps.h:13:1: note: 'lookup_ip6_endpoint' declared here
lookup_ip6_endpoint(struct ipv6hdr *ip6)
^
/var/lib/cilium/bpf/bpf_lxc.c:159:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
                info = lookup_ip6_remote_endpoint(&orig_dip);
                     ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/var/lib/cilium/bpf/bpf_lxc.c:529:10: error: implicit declaration of function 'lookup_ip4_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
                info = lookup_ip4_remote_endpoint(orig_dip);
                       ^
/var/lib/cilium/bpf/bpf_lxc.c:529:10: note: did you mean 'lookup_ip4_endpoint'?
/var/lib/cilium/bpf/lib/eps.h:35:1: note: 'lookup_ip4_endpoint' declared here
lookup_ip4_endpoint(const struct iphdr *ip4)
^
/var/lib/cilium/bpf/bpf_lxc.c:529:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
                info = lookup_ip4_remote_endpoint(orig_dip);
                     ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/var/lib/cilium/bpf/bpf_lxc.c:1027:10: error: implicit declaration of function 'lookup_ip6_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
                info = lookup_ip6_remote_endpoint(src);
                       ^
/var/lib/cilium/bpf/bpf_lxc.c:1027:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
                info = lookup_ip6_remote_endpoint(src);
                     ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/var/lib/cilium/bpf/bpf_lxc.c:1256:10: error: implicit declaration of function 'lookup_ip4_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
                info = lookup_ip4_remote_endpoint(ip4->saddr);
                       ^
/var/lib/cilium/bpf/bpf_lxc.c:1256:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
                info = lookup_ip4_remote_endpoint(ip4->saddr);
                     ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
11 errors generated.

Fixes #10630
Fixes #12005

Fix manual endpoint regeneration via command line

@christarazi christarazi requested a review from a team as a code owner July 14, 2020 20:59
@christarazi christarazi added sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. kind/bug This is a bug in the Cilium logic. release-note/bug This PR fixes an issue in a previous release of Cilium. area/endpoint labels Jul 14, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. and removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Jul 14, 2020
@christarazi christarazi added sig/loader Impacts the loading of BPF programs into the kernel. kind/regression This functionality worked fine before, but was broken in a newer release of Cilium. and removed area/endpoint labels Jul 14, 2020
@christarazi
Copy link
Member Author

christarazi commented Jul 14, 2020

I ran into this bug while debugging #12482 and looked into it. I'm not 100% sure on the implementation of this, but it does seem reasonable to me. I'd like to get feedback from those who have more context on the regeneration of endpoints.

@joestringer The code touches recently changed parts by you. It'd be great to have your feedback.

@christarazi
Copy link
Member Author

test-me-please

@coveralls
Copy link

coveralls commented Jul 14, 2020

Coverage Status

Coverage decreased (-0.05%) to 36.954% when pulling d138143 on christarazi:pr/christarazi/fix-endpoint-manual-regenerate into b0610a4 on cilium:master.

Copy link
Member

@joestringer joestringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit, I think we can have this be a little more concise but the change LGTM.

@@ -870,6 +870,11 @@ func (e *Endpoint) runPreCompilationSteps(regenContext *regenerationContext) (he
if err = e.writeHeaderfile(nextDir); err != nil {
return false, fmt.Errorf("unable to write header file: %s", err)
}
} else if datapathRegenCtxt.regenerationLevel == regeneration.RegenerateWithDatapathRebuild {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably tweak this to avoid duplicating the writeHeaderfile() call by instead having the headerfileChanged check above only set the regenerationLevel then have this check do the following instead:

Suggested change
} else if datapathRegenCtxt.regenerationLevel == regeneration.RegenerateWithDatapathRebuild {
if datapathRegenCtxt.regenerationLevel >= regeneration.RegenerateWithDatapathRewrite {

Then I'm not sure whether the comment below provides any additional context; basically if either a rewrite or rebuild is required, we'll be sure to write the headerfile again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@joestringer
Copy link
Member

May be worth considering whether we should backport this. In an ideal circumstance we should never need this, but even for development purposes for backports and so on it may turn out to be useful. I note that the referred PR that this marks as "Fixes" was also attempting to fix a broader version of this bug and was backported to v1.6; hopefully this will bring the saga to a close :-)

@christarazi
Copy link
Member Author

christarazi commented Jul 14, 2020

May be worth considering whether we should backport this. In an ideal circumstance we should never need this, but even for development purposes for backports and so on it may turn out to be useful. I note that the referred PR that this marks as "Fixes" was also attempting to fix a broader version of this bug and was backported to v1.6; hopefully this will bring the saga to a close :-)

I was thinking that as well. It's not a crucial bug, but the backports should be trivial as the change is small and code paths are consistent in all relevant trees. Marking for 1.{6,7,8}.

@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.8.2 Jul 14, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.7.7 Jul 14, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot added this to Needs backport from master in 1.6.11 Jul 14, 2020
Previously, regenerating an endpoint manually via CLI resulted in
compilation errors as seen below. These errors occurred because the
endpoint headerfile (ep_config.h or lxc_config.h) was not written to the
`<endpoint ID>_next` directory (will be reffered to as "next"), where
the BPF compilation takes place.

The reason the headerfile was not written to the "next" directory was
because Cilium only wrote the headerfile if it was changed (via hash).
However, a manual regeneration triggered through the API, sets the
regeneration level to "compile+load" (a full regeneration), where Cilium
expects all relevant files, including the headerfile, to be present in
the "next" directory (`<endpoint ID>_next`). Hence, the compilation
errors.

This commit fixes this issue by checking whether there has been a
request for a full regeneration, in which case, we write the headerfile
to the "next" directory.

```
root@k8s2:/var/run/cilium/state# clang -emit-llvm -O2 -target bpf -std=gnu89 -nostdinc -D__NR_CPUS__=2 -Wall -Wextra -Werror -Wshadow -Wno-address-of-packed-member -Wno-unknown-warning-option -Wno-gnu-variable-s
ized-type-not-at-end -Wdeclaration-after-statement -I/var/run/cilium/state/globals -I1285_next -I/var/lib/cilium/bpf -I/var/lib/cilium/bpf/include -c /var/lib/cilium/bpf/bpf_lxc.c -o /tmp/c
In file included from /var/lib/cilium/bpf/bpf_lxc.c:22:
/var/lib/cilium/bpf/lib/icmp6.h:50:29: error: use of undeclared identifier 'NODE_MAC'
        union macaddr smac, dmac = NODE_MAC;
                                   ^
/var/lib/cilium/bpf/lib/icmp6.h:359:30: error: use of undeclared identifier 'NODE_MAC'
                union macaddr router_mac = NODE_MAC;
                                           ^
In file included from /var/lib/cilium/bpf/bpf_lxc.c:26:
/var/lib/cilium/bpf/lib/lxc.h:67:29: error: use of undeclared identifier 'NODE_MAC'
        union macaddr router_mac = NODE_MAC;
                                   ^
/var/lib/cilium/bpf/bpf_lxc.c:159:10: error: implicit declaration of function 'lookup_ip6_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
                info = lookup_ip6_remote_endpoint(&orig_dip);
                       ^
/var/lib/cilium/bpf/bpf_lxc.c:159:10: note: did you mean 'lookup_ip6_endpoint'?
/var/lib/cilium/bpf/lib/eps.h:13:1: note: 'lookup_ip6_endpoint' declared here
lookup_ip6_endpoint(struct ipv6hdr *ip6)
^
/var/lib/cilium/bpf/bpf_lxc.c:159:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
                info = lookup_ip6_remote_endpoint(&orig_dip);
                     ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/var/lib/cilium/bpf/bpf_lxc.c:529:10: error: implicit declaration of function 'lookup_ip4_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
                info = lookup_ip4_remote_endpoint(orig_dip);
                       ^
/var/lib/cilium/bpf/bpf_lxc.c:529:10: note: did you mean 'lookup_ip4_endpoint'?
/var/lib/cilium/bpf/lib/eps.h:35:1: note: 'lookup_ip4_endpoint' declared here
lookup_ip4_endpoint(const struct iphdr *ip4)
^
/var/lib/cilium/bpf/bpf_lxc.c:529:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
                info = lookup_ip4_remote_endpoint(orig_dip);
                     ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/var/lib/cilium/bpf/bpf_lxc.c:1027:10: error: implicit declaration of function 'lookup_ip6_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
                info = lookup_ip6_remote_endpoint(src);
                       ^
/var/lib/cilium/bpf/bpf_lxc.c:1027:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
                info = lookup_ip6_remote_endpoint(src);
                     ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/var/lib/cilium/bpf/bpf_lxc.c:1256:10: error: implicit declaration of function 'lookup_ip4_remote_endpoint' [-Werror,-Wimplicit-function-declaration]
                info = lookup_ip4_remote_endpoint(ip4->saddr);
                       ^
/var/lib/cilium/bpf/bpf_lxc.c:1256:8: error: incompatible integer to pointer conversion assigning to 'struct remote_endpoint_info *' from 'int' [-Werror,-Wint-conversion]
                info = lookup_ip4_remote_endpoint(ip4->saddr);
                     ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
11 errors generated.
```

Fixes cilium#10630
Fixes cilium#12005

Signed-off-by: Chris Tarazi <chris@isovalent.com>
@christarazi christarazi force-pushed the pr/christarazi/fix-endpoint-manual-regenerate branch from d5d54e3 to d138143 Compare July 15, 2020 00:25
@pchaigno
Copy link
Member

@christarazi Did you check if this fix also works for the host endpoint? I can have a look as a follow up if not.

@pchaigno
Copy link
Member

test-me-please

Copy link
Member

@borkmann borkmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm as far as I can see

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jul 15, 2020
@qmonnet qmonnet merged commit 21783d5 into cilium:master Jul 15, 2020
@brb brb mentioned this pull request Jul 15, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.8 in 1.8.2 Jul 15, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.8 in 1.8.2 Jul 15, 2020
@christarazi christarazi deleted the pr/christarazi/fix-endpoint-manual-regenerate branch July 15, 2020 16:55
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.8 to Backport done to v1.8 in 1.8.2 Jul 20, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.7 in 1.7.7 Jul 21, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.7 to Backport done to v1.7 in 1.7.7 Jul 22, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Needs backport from master to Backport pending to v1.6 in 1.6.11 Jul 29, 2020
@maintainer-s-little-helper maintainer-s-little-helper bot moved this from Backport pending to v1.6 to Backport done to v1.6 in 1.6.11 Jul 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug This is a bug in the Cilium logic. kind/regression This functionality worked fine before, but was broken in a newer release of Cilium. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium. sig/datapath Impacts bpf/ or low-level forwarding details, including map management and monitor messages. sig/loader Impacts the loading of BPF programs into the kernel.
Projects
No open projects
1.6.11
Backport done to v1.6
1.7.7
Backport done to v1.7
1.8.2
Backport done to v1.8
Development

Successfully merging this pull request may close these issues.

cilium endpoint regenerate broken
9 participants