Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix implicit conversion warning in DSR with GENEVE #25299

Merged
merged 1 commit into from May 15, 2023

Conversation

ysksuzuki
Copy link
Member

@ysksuzuki ysksuzuki commented May 6, 2023

Currently, ETH_HLEN is explicitly defined as __u32 if there's any L2-less device, and it causes implicit conversion loses integer precision warning when we calculate encap_len using ETH_HLEN. This commit fixes the warning by defining ETH_HELN as __u16.

Please ensure your pull request adheres to the following guidelines:

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • If your commit description contains a Fixes: <commit-id> tag, then
    please add the commit author[s] as reviewer[s] to this issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Are you a user of Cilium? Please add yourself to the Users doc
  • Thanks for contributing!

Fixes: #25238

@ysksuzuki ysksuzuki requested a review from a team as a code owner May 6, 2023 13:22
@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 6, 2023
@ysksuzuki
Copy link
Member Author

/test

@ysksuzuki
Copy link
Member Author

/test-runtime

@julianwiedmann
Copy link
Member

I suppose the alternative would be to have the agent emit ETH_HLEN using defineUint16() instead?

@ysksuzuki
Copy link
Member Author

I suppose the alternative would be to have the agent emit ETH_HLEN using defineUint16() instead?

I thought it needed to be uint32 for the ELF templating, but it isn't the case anymore?

@julianwiedmann
Copy link
Member

I suppose the alternative would be to have the agent emit ETH_HLEN using defineUint16() instead?

I thought it needed to be uint32 for the ELF templating, but it isn't the case anymore?

It's all a bit magic to me 😄. Looks like #25195 just improved this area. But don't ask me why it required the opts["ETH_HLEN"] = uint64(0) change...

Either way - it would be cool if we could handle this sort of value clamping in one location (ie. the agent), instead of potentially duplicating it across multiple locations in the datapath.

@dylandreimerink
Copy link
Member

Hmm, this is, lets call it interesting. So for devices with L2, ETH_HLEN translates to just 14 without type info, that is the default value, so it can be used with any combination of types since its a pre-processor macro. But if at least 1 L2-less devices is detected then we define ETH_HLEN as a u32 and make it ELF re-writable so we can zero it out when loading the program for the L2-less device.

There are also plenty of location where it is used as in int in the datapath, so making it a u16 by default might cause more trouble then its worth. However, it might make sense to just always have ETH_HLEN be templated at u32 so we do not have conditional warnings/errors that only show up in very specific situations.

@ysksuzuki
Copy link
Member Author

Tested defineUint16() locally, and it seems that ETH_HLEN has been replaced with 0 on the l2-less device (cilium_wg0).
(Reproduced with encrypt-node=true and enable-wireguar=true)

diff --git a/bpf/bpf_host.c b/bpf/bpf_host.c
index 06c1614807..375d233e5d 100644
--- a/bpf/bpf_host.c
+++ b/bpf/bpf_host.c
@@ -1234,6 +1234,9 @@ int cil_from_netdev(struct __ctx_buff *ctx)
 #endif
 #endif

+#ifdef DEBUG
+    printk("ETH_HLEN = %u\n", ETH_HLEN);
+#endif
        /* Filter allowed vlan id's and pass them back to kernel.
         * We will see the packet again in from-netdev@eth0.vlanXXX.
         */
diff --git a/pkg/datapath/linux/config/config.go b/pkg/datapath/linux/config/config.go
index 8aed3d379b..c199cf9eaf 100644
--- a/pkg/datapath/linux/config/config.go
+++ b/pkg/datapath/linux/config/config.go
@@ -816,7 +816,7 @@ func (h *HeaderfileWriter) writeStaticData(fw io.Writer, e datapath.EndpointConf
                // Use templating for ETH_HLEN only if there is any L2-less device
                if !mac.HaveMACAddrs(option.Config.GetDevices()) {
                        // L2 hdr len (for L2-less devices it will be replaced with "0")
-                       fmt.Fprint(fw, defineUint32("ETH_HLEN", mac.EthHdrLen))
+                       fmt.Fprint(fw, defineUint16("ETH_HLEN", mac.EthHdrLen))
                }
        } else {
                // We want to ensure that the template BPF program always has "LXC_IP"

bpftool prog tracelog

    kworker/10:1-3065749 [010] d.s1. 521767.985961: bpf_trace_printk: ETH_HLEN = 0
           <...>-3068965 [011] d.s1. 521778.140069: bpf_trace_printk: ETH_HLEN = 0
     kworker/1:2-3069620 [001] d.s1. 521808.141643: bpf_trace_printk: ETH_HLEN = 0
           <...>-3067659 [003] d.s1. 521837.875402: bpf_trace_printk: ETH_HLEN = 0
           <...>-3060717 [005] d.s1. 521867.876827: bpf_trace_printk: ETH_HLEN = 0
           <...>-3070728 [006] d.s1. 521897.878277: bpf_trace_printk: ETH_HLEN = 0
           <...>-3072570 [007] d.s1. 521898.720166: bpf_trace_printk: ETH_HLEN = 0

@dylandreimerink
Copy link
Member

Tested defineUint16() locally, and it seems that ETH_HLEN has been replaced with 0 on the l2-less device (cilium_wg0).

And this did not cause type comparison errors/warnings in bpf_lxc.c and bpf_host.c?

@ysksuzuki
Copy link
Member Author

ysksuzuki commented May 8, 2023

And this did not cause type comparison errors/warnings in bpf_lxc.c and bpf_host.c?

ETH_HLEN using defineUint16() didn't cause the type comparison errors/warnings and it's working normally, as far as I checked it in my env.

# All agents are healthy
$ kubectl -n kube-system get po
NAME                                     READY   STATUS      RESTARTS       AGE
cilium-8vv2g                             1/1     Running     0              9m2s
cilium-kb5r5                             1/1     Running     0              9m
cilium-operator-86d47cfb64-5x2hf         1/1     Running     0              3d21h
cilium-operator-86d47cfb64-zxzcw         1/1     Running     0              3d21h
cilium-rphcm                             1/1     Running     0              9m3s
cilium-x9pk7                             1/1     Running     0              9m3s
cilium-zf8vz                             1/1     Running     0              9m3s
cluster-dns-77bdb56774-lj87g             1/1     Running     0              6d3h
cluster-dns-77bdb56774-zxc8r             1/1     Running     0              6d3h
hubble-generate-certs-5d5d11e923-q9hn8   0/1     Completed   0              2m32s
hubble-relay-586c7cd94f-7cv7m            1/1     Running     0              3d21h
node-dns-btjst                           3/3     Running     0              6d3h
node-dns-f7tp5                           3/3     Running     0              6d3h
node-dns-fwhd6                           3/3     Running     0              6d3h
node-dns-t52bs                           3/3     Running     0              6d3h
node-dns-wcpjt                           3/3     Running     0              6d3h

# There's no implicit conversion warning in the agent logs
$ stern -n kube-system cilium | grep "implicit conversion"
+ cilium-x9pk7 › cilium-agent
+ cilium-8vv2g › cilium-agent
+ cilium-rphcm › cilium-agent
+ cilium-operator-86d47cfb64-zxzcw › cilium-operator
+ cilium-kb5r5 › cilium-agent
+ cilium-zf8vz › cilium-agent
+ cilium-operator-86d47cfb64-5x2hf › cilium-operator

# All BPF programs are properly loaded
$ kubectl -n kube-system exec cilium-x9pk7 -- bpftool net list
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
xdp:

tc:
eth0(2) clsact/ingress cil_from_netdev-eth0 id 80535
eth0(2) clsact/egress cil_to_netdev-eth0 id 80558
eth1(3) clsact/ingress cil_from_netdev-eth1 id 80561
eth1(3) clsact/egress cil_to_netdev-eth1 id 80583
cilium_net(5) clsact/ingress cil_to_host-cilium_net id 80526
cilium_host(6) clsact/ingress cil_to_host-cilium_host id 80506
cilium_host(6) clsact/egress cil_from_host-cilium_host id 80519
veth949416046cb(7) clsact/ingress cil_from_container-veth949416046cb id 80468
veth949416046cb(7) clsact/egress cil_to_container-veth949416046cb id 80461
vetha042c2e9a30(11) clsact/ingress cil_from_container-vetha042c2e9a30 id 80487
vetha042c2e9a30(11) clsact/egress cil_to_container-vetha042c2e9a30 id 80476
veth1d07c434499(14) clsact/ingress cil_from_container-veth1d07c434499 id 80458
veth1d07c434499(14) clsact/egress cil_to_container-veth1d07c434499 id 80474
cilium_wg0(17) clsact/ingress cil_from_netdev-cilium_wg0 id 80596
cilium_geneve(24) clsact/ingress cil_from_overlay-cilium_geneve id 80452
cilium_geneve(24) clsact/egress cil_to_overlay-cilium_geneve id 80449

flow_dissector:

# All endpoints are ready
$ kubectl -n kube-system exec cilium-x9pk7 -- cilium endpoint list
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
ENDPOINT   POLICY (ingress)   POLICY (egress)   IDENTITY   LABELS (source:key[=value])                                                  IPv6                        IPv4         STATUS
           ENFORCEMENT        ENFORCEMENT
542        Disabled           Disabled          16100      k8s:cke.cybozu.com/appname=cluster-dns                                       fe80::9c02:86ff:fe86:377b   10.64.0.32   ready
                                                           k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=kube-system
                                                           k8s:io.cilium.k8s.policy.cluster=default
                                                           k8s:io.cilium.k8s.policy.serviceaccount=cke-cluster-dns
                                                           k8s:io.kubernetes.pod.namespace=kube-system
                                                           k8s:k8s-app=coredns
1272       Disabled           Disabled          17068      k8s:app.kubernetes.io/name=hubble-relay                                      fe80::1c6d:fbff:fe76:2798   10.64.0.37   ready
                                                           k8s:app.kubernetes.io/part-of=cilium
                                                           k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=kube-system
                                                           k8s:io.cilium.k8s.policy.cluster=default
                                                           k8s:io.cilium.k8s.policy.serviceaccount=hubble-relay
                                                           k8s:io.kubernetes.pod.namespace=kube-system
                                                           k8s:k8s-app=hubble-relay
2328       Disabled           Disabled          1          k8s:cke.cybozu.com/index-in-rack=5                                                                                    ready
                                                           k8s:cke.cybozu.com/master=true
                                                           k8s:cke.cybozu.com/rack=2
                                                           k8s:cke.cybozu.com/role=cs
                                                           k8s:sabakan.cke.cybozu.com/datacenter=dc1
                                                           k8s:sabakan.cke.cybozu.com/machine-type=qemu
                                                           k8s:sabakan.cke.cybozu.com/product=vm
                                                           k8s:topology.kubernetes.io/zone=rack2
                                                           reserved:host
4072       Disabled           Disabled          32616      k8s:app.kubernetes.io/name=suzuki-test                                       fe80::543e:1eff:fe8e:1f9c   10.64.0.36   ready
                                                           k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=dctest
                                                           k8s:io.cilium.k8s.policy.cluster=default
                                                           k8s:io.cilium.k8s.policy.serviceaccount=default
                                                           k8s:io.kubernetes.pod.namespace=dctest

# DSR with GENEVE enabled with Wireguard (with l2-less cilium_wg0 device)
$ cilium config view
bpf-lb-acceleration                            disabled
bpf-lb-algorithm                               maglev
bpf-lb-dsr-dispatch                            geneve
bpf-lb-dsr-l4-xlate                            backend
bpf-lb-external-clusterip                      false
bpf-lb-map-max                                 65536
bpf-lb-mode                                    dsr
bpf-lb-sock                                    true
bpf-lb-sock-hostns-only                        true
bpf-map-dynamic-size-ratio                     0.0025
bpf-policy-map-max                             65536
bpf-root                                       /sys/fs/bpf
cgroup-root                                    /sys/fs/cgroup
enable-wireguard                               true
encrypt-node                                   true

@dylandreimerink
Copy link
Member

I suppose the alternative would be to have the agent emit ETH_HLEN using defineUint16() instead?

ETH_HLEN using defineUint16() didn't cause the type comparison errors/warnings and it's working normally

@julianwiedmann so I guess in that case defineUint16() is preferred so we define datatype once and don't have to repeat it everywhere. I think clang does not mind assigning smaller types to bigger types only visa-versa.

I had a quick look, but always having ETH_HLEN in global data would require way more changes, so I will drop that suggestion.

@julianwiedmann if you agree, I think that should be the path forward for this PR

@julianwiedmann
Copy link
Member

I suppose the alternative would be to have the agent emit ETH_HLEN using defineUint16() instead?

ETH_HLEN using defineUint16() didn't cause the type comparison errors/warnings and it's working normally

@julianwiedmann so I guess in that case defineUint16() is preferred so we define datatype once and don't have to repeat it everywhere. I think clang does not mind assigning smaller types to bigger types only visa-versa.

I had a quick look, but always having ETH_HLEN in global data would require way more changes, so I will drop that suggestion.

@julianwiedmann if you agree, I think that should be the path forward for this PR

Yep feels like the smoothest approach. It's not like we would actually support ETH_HLEN > UINT16_MAX in the datapath 😁.

@ysksuzuki ysksuzuki requested a review from a team as a code owner May 13, 2023 12:53
@ysksuzuki
Copy link
Member Author

/test

1 similar comment
@ysksuzuki
Copy link
Member Author

/test

@ysksuzuki
Copy link
Member Author

The Jenkins job status isn't reported

Currently, ETH_HLEN is explicitly defined as __u32 if there's any L2-less
device, and it causes implicit conversion loses integer precision warning
when we calculate encap_len using ETH_HLEN. This commit fixes the warning
by defining ETH_HELN as __u16.

Signed-off-by: Yusuke Suzuki <yusuke-suzuki@cybozu.co.jp>
@ysksuzuki
Copy link
Member Author

/test

@dylandreimerink dylandreimerink added the release-note/misc This PR makes changes that have no direct user impact. label May 15, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label May 15, 2023
@dylandreimerink
Copy link
Member

/test

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label May 15, 2023
@aditighag aditighag merged commit 830d0fb into cilium:main May 15, 2023
57 of 58 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to use DSR mode with Geneve
4 participants