v1.11 backports 2022-02-14 #18800
Conversation
[ upstream commit 4f25254 ]

These new functions will be used in subsequent commits to check if iptables-based masquerading is enabled.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
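For reference, a minimal sketch of what such helper functions could look like. The method names IptablesMasqueradingIPv4Enabled() and IptablesMasqueradingEnabled() on pkg/option's DaemonConfig appear later in this PR; the field names and the exact logic below are assumptions, not the actual implementation:

```go
package option

// DaemonConfig is a stripped-down stand-in for the agent configuration;
// only the fields relevant to this sketch are shown (names are assumed).
type DaemonConfig struct {
	EnableIPv4Masquerade bool
	EnableIPv6Masquerade bool
	EnableBPFMasquerade  bool
}

// IptablesMasqueradingIPv4Enabled reports whether IPv4 masquerading is
// performed by iptables rather than by BPF.
func (c *DaemonConfig) IptablesMasqueradingIPv4Enabled() bool {
	return c.EnableIPv4Masquerade && !c.EnableBPFMasquerade
}

// IptablesMasqueradingIPv6Enabled reports the same for IPv6.
func (c *DaemonConfig) IptablesMasqueradingIPv6Enabled() bool {
	return c.EnableIPv6Masquerade && !c.EnableBPFMasquerade
}

// IptablesMasqueradingEnabled reports whether any iptables-based
// masquerading is in use for either protocol family.
func (c *DaemonConfig) IptablesMasqueradingEnabled() bool {
	return c.IptablesMasqueradingIPv4Enabled() || c.IptablesMasqueradingIPv6Enabled()
}
```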
[ upstream commit 60245fe ]

In commit 49cb220 ("iptables: Don't masquerade traffic to cluster nodes"), we introduced an ipset to track the set of node IP addresses and avoid masquerading traffic to those IPs in native routing mode. This is however only needed if we're using iptables-based masquerading (vs. BPF-based). When that's not the case, there's no reason to track node IP addresses in this way, or even to require the ipset binary as a dependency.

Fixes: 49cb220 ("iptables: Don't masquerade traffic to cluster nodes")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
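A hedged sketch of the resulting gate, assuming a helper like the one above; the ipset name, set type, and the use of the ipset CLI here are illustrative, not the exact Cilium code:

```go
package iptables

import (
	"fmt"
	"os/exec"
)

// ciliumNodeIpset is an illustrative name for the set of node IPs.
const ciliumNodeIpset = "cilium_node_set_v4"

// ensureNodeIpset only creates the node ipset when masquerading is done
// by iptables; with BPF-based masquerading the set is never created and
// the ipset binary does not need to be installed on the node.
func ensureNodeIpset(iptablesMasq bool) error {
	if !iptablesMasq {
		return nil
	}
	out, err := exec.Command("ipset", "create", ciliumNodeIpset, "hash:ip", "-exist").CombinedOutput()
	if err != nil {
		return fmt.Errorf("creating ipset %s: %s: %w", ciliumNodeIpset, out, err)
	}
	return nil
}
```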
[ upstream commit 99c4540 ]

In several places in the iptables code, we use an OLD_ string as a prefix for iptables rules. This commit refactors the code a little to ensure we use the same constant everywhere.

Suggested-by: Gilberto Bertin <gilberto@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
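The refactor amounts to replacing repeated string literals with one shared constant, roughly along these lines (names are illustrative):

```go
package iptables

// oldPrefix marks chains and rules left over from a previous agent run
// that are pending removal; keeping it in one constant avoids the prefix
// drifting between call sites.
const oldPrefix = "OLD_"

// oldChainName returns the temporary name used while a chain from the
// previous run is being phased out.
func oldChainName(chain string) string {
	return oldPrefix + chain
}
```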
[ upstream commit 76551df ]

Remove old ipsets before installing new iptables rules, similarly to how we remove old iptables rules and chains. The ipsets will then only be created if we're using iptables-based masquerading. This ensures we don't keep the ipsets when disabling iptables-based masquerading (e.g., when switching to BPF-based masquerading).

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
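A sketch of the resulting install flow under the same assumptions as the snippets above (helper and set names are illustrative): stale ipsets are removed up front, mirroring the removal of old rules and chains, and new sets are only created when iptables-based masquerading is in use:

```go
package iptables

import "os/exec"

// removeOldIpsets destroys ipsets left over from a previous agent run.
// Errors are ignored so that a missing set is not treated as a failure.
func removeOldIpsets(names ...string) {
	for _, name := range names {
		_ = exec.Command("ipset", "destroy", name).Run()
	}
}

// installRules sketches the ordering: remove stale state first, then
// only (re)create the ipsets and masquerade rules if iptables-based
// masquerading is enabled, so switching to BPF masquerading leaves no
// stale ipsets behind.
func installRules(iptablesMasq bool) error {
	removeOldIpsets(ciliumNodeIpset) // illustrative set name from the previous sketch
	return ensureNodeIpset(iptablesMasq)
}
```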
[ upstream commit bd2ac6c ]

The way our agent startup is structured can lead to a data race on the EnableBPFMasquerade flag. In the main thread, we initialize k8s watchers via goroutines which may end up using the EnableBPFMasquerade flag. We then update the value of that flag. So we can have a race between the main thread and a k8s watcher goroutine. The full stack traces for the data race are shown below. Moving the k8s watcher initialization after the flags are fully updated removes this data race.

```
WARNING: DATA RACE
Write at 0x000005803072 by main goroutine:
  github.com/cilium/cilium/daemon/cmd.NewDaemon() /go/src/github.com/cilium/cilium/daemon/cmd/daemon.go:710 +0x5587
  github.com/cilium/cilium/daemon/cmd.runDaemon() /go/src/github.com/cilium/cilium/daemon/cmd/daemon_main.go:1652 +0xa98
  github.com/cilium/cilium/daemon/cmd.glob..func1() /go/src/github.com/cilium/cilium/daemon/cmd/daemon_main.go:126 +0x494
  github.com/spf13/cobra.(*Command).execute() /go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:860 +0xaf3
  github.com/spf13/cobra.(*Command).ExecuteC() /go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:974 +0x5da
  github.com/spf13/cobra.(*Command).Execute() /go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:902 +0x65
  github.com/cilium/cilium/daemon/cmd.Execute() /go/src/github.com/cilium/cilium/daemon/cmd/daemon_main.go:140 +0x4e
  main.main() /go/src/github.com/cilium/cilium/daemon/main.go:16 +0x24
  runtime.main() /usr/local/go/src/runtime/proc.go:255 +0x226
  github.com/cilium/ebpf.init() /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:220 +0x868
  github.com/cilium/ebpf.init() /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:188 +0x808
  github.com/cilium/ebpf.init() /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:166 +0x7a8
  github.com/cilium/ebpf.init() /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:148 +0x748
  github.com/cilium/ebpf.init() /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:108 +0x6e8
  github.com/cilium/ebpf.init() /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:92 +0x688
  github.com/cilium/ebpf.init() /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:76 +0x628
  github.com/cilium/ebpf.init() /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:59 +0x5c8
  github.com/cilium/ebpf.init() /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:41 +0x568
  runtime.doInit() /usr/local/go/src/runtime/proc.go:6498 +0x122
  github.com/cilium/ebpf/internal/btf.init() /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/internal/btf/btf.go:820 +0x1bc

Previous read at 0x000005803072 by goroutine 179:
  github.com/cilium/cilium/pkg/option.(*DaemonConfig).IptablesMasqueradingIPv4Enabled() /go/src/github.com/cilium/cilium/pkg/option/config.go:2182 +0x17a9
  github.com/cilium/cilium/pkg/option.(*DaemonConfig).IptablesMasqueradingEnabled() /go/src/github.com/cilium/cilium/pkg/option/config.go:2194 +0x17dc
  github.com/cilium/cilium/pkg/node/manager.(*Manager).NodeUpdated() /go/src/github.com/cilium/cilium/pkg/node/manager/manager.go:414 +0x1778
  github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).ciliumNodeInit.func1() /go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_node.go:43 +0x374
  k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd() /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:231 +0x79
  k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnAdd() <autogenerated>:1 +0x33
  github.com/cilium/cilium/pkg/k8s/informer.NewInformerWithStore.func1() /go/src/github.com/cilium/cilium/pkg/k8s/informer/informer.go:108 +0x2d8
  k8s.io/client-go/tools/cache.(*DeltaFIFO).Pop() /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/delta_fifo.go:554 +0x8c9
  k8s.io/client-go/tools/cache.(*controller).processLoop() /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:183 +0x61
  k8s.io/client-go/tools/cache.(*controller).processLoop-fm() /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:181 +0x39
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1() /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x81
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil() /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xce
  k8s.io/apimachinery/pkg/util/wait.JitterUntil() /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x104
  k8s.io/apimachinery/pkg/util/wait.Until() /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x616
  k8s.io/client-go/tools/cache.(*controller).Run() /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:154 +0x5cd
  github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).ciliumNodeInit·dwrap·10() /go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_node.go:96 +0x59

Goroutine 179 (running) created at:
  github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).ciliumNodeInit() /go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_node.go:96 +0x8d
  github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).EnableK8sWatcher·dwrap·21() /go/src/github.com/cilium/cilium/pkg/k8s/watchers/watcher.go:438 +0x58
```

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
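The fix itself is just an ordering change during daemon bring-up. A minimal, self-contained illustration of the pattern (none of these names are the actual Cilium call sites): finish every write to the shared config before starting the goroutines that read it:

```go
package main

// DaemonConfig stands in for the shared agent configuration.
type DaemonConfig struct {
	EnableBPFMasquerade bool
}

// finalizeConfig is the last writer of cfg, standing in for the flag
// post-processing the agent does (e.g. after probing kernel support).
func finalizeConfig(cfg *DaemonConfig) {
	cfg.EnableBPFMasquerade = true // placeholder for the probed value
}

// startWatchers stands in for the k8s watcher initialization: the
// goroutines it spawns only ever read cfg.
func startWatchers(cfg *DaemonConfig, done chan struct{}) {
	go func() {
		_ = cfg.EnableBPFMasquerade // read-only access, no race
		close(done)
	}()
}

func main() {
	cfg := &DaemonConfig{}
	done := make(chan struct{})
	finalizeConfig(cfg)      // all writes happen first, on one goroutine...
	startWatchers(cfg, done) // ...then the readers are started
	<-done
}
```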
[ upstream commit 1e3174b ]

This patch fixes bpffs migrations meant to occur at runtime. The tc/globals/ part of the path was missing, and the bpffs root was being scanned instead, causing migrations not to run. This would lead to endpoints continuously failing to regenerate at runtime when map definitions in the ELF differ from their pinned counterparts, e.g. during an agent upgrade. For example, when resizing CALLS_MAP and trying to replace a program, the old map will be loaded from bpffs, while the program expects to see a map of a different capacity:

```
~ tc filter replace dev cilium-probe ingress bpf da obj bpf/bpf_lxc.o sec from-container
libbpf: couldn't reuse pinned map at '/sys/fs/bpf/tc//globals/test_cilium_calls_65535': parameter mismatch
libbpf: map 'test_cilium_calls_65535': error reusing pinned map
libbpf: map 'test_cilium_calls_65535': failed to create: Invalid argument(-22)
libbpf: failed to load object 'bpf/bpf_lxc.o'
Unable to load program
```

The CLI equivalent executed as part of init.sh does work as expected, as the full path including tc/globals/ is specified there.

Signed-off-by: Timo Beckers <timo@isovalent.com>
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
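For illustration, a hedged sketch of the path handling this restores. The constants and function below are not the actual Cilium API; the point is that the runtime migration has to scan `<bpffs>/tc/globals/`, not the bpffs root:

```go
package bpffs

import "path/filepath"

const (
	// bpffsRoot is the default bpffs mount point.
	bpffsRoot = "/sys/fs/bpf"
	// tcGlobalsDir is where Cilium pins the maps shared between programs.
	tcGlobalsDir = "tc/globals"
)

// mapPinDir returns the directory the runtime migration should scan for
// pinned maps whose definitions changed. The bug was effectively scanning
// bpffsRoot itself, so no pinned maps were ever found and the migration
// never ran.
func mapPinDir() string {
	return filepath.Join(bpffsRoot, tcGlobalsDir)
}
```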
[ upstream commit 9fb1668 ]

To prevent unpinning tail call maps after replacing only one half of an interface's programs, e.g. from-netdev and to-netdev, allow the caller to defer finalizer operations until a batch of interfaces has been replaced. This avoids short windows of packet drops due to missed tail calls.

Fixes #16561

Signed-off-by: Timo Beckers <timo@isovalent.com>
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
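A sketch of the general pattern, with hypothetical names: each per-program replacement returns a finalizer instead of unpinning the old tail-call maps immediately, and the caller only runs the finalizers once every program in the batch has been swapped:

```go
package loader

// replaceProgram swaps one program (e.g. from-netdev or to-netdev) and
// returns a finalizer that removes the superseded pinned tail-call maps.
// The caller decides when it is safe to run it.
func replaceProgram(iface, section string) (finalize func(), err error) {
	// ... attach the new program for this section ...
	return func() {
		// ... unpin the tail-call maps that are no longer referenced ...
	}, nil
}

// replaceDatapath batches both directions for an interface and defers all
// finalizers until the end, so there is no window in which one program is
// still attached but its tail-call maps have already been unpinned.
func replaceDatapath(iface string) error {
	var finalizers []func()
	for _, section := range []string{"from-netdev", "to-netdev"} {
		fin, err := replaceProgram(iface, section)
		if err != nil {
			return err
		}
		finalizers = append(finalizers, fin)
	}
	for _, fin := range finalizers {
		fin()
	}
	return nil
}
```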
[ upstream commit f6a4104 ]

This patch updates the Cilium logic for handling remote node identity updates to ensure that when Cilium's '--enable-remote-node-identity' flag is configured, each Cilium node will consistently consider all other nodes as having the "remote-node" identity. This fixes an issue where users reported policy drops from remote nodes -> pods, even though the policy appeared to allow this. The issue was limited to kvstore configurations of Cilium, and does not affect configurations where CRDs are used for sharing information within the cluster.

For background: when Cilium starts up, it locally scans for IP addresses associated with the node and updates its own ipcache to associate those IPs with the "host" identity. Additionally, it publishes this information to other nodes so that they can make policy decisions regarding traffic coming from IPs attached to nodes in the cluster. Before commit 7bf60a5 ("nodediscovery: Fix local host identity propagation"), Cilium would propagate the identity "remote-node" as part of these updates to other nodes. After that commit, it would propagate the identity "host" instead. When receiving these updates, Cilium would trust the identity directly and push IP->identity mappings like this into the datapath, regardless of whether the '--enable-remote-node-identity' setting was configured or not. As such, when the above commit changed the behaviour, it triggered a change in policy handling behaviour.

The '--enable-remote-node-identity' flag was initially introduced to allow the security domain of remote nodes in the cluster to be considered differently from the local host. This can be important because Kubernetes defines that the host should always have access to pods on the node, so if all nodes are considered the same as the "host", this can represent a larger open policy surface for pods than necessary in a zero-trust environment. Given the potential security implications of this setting, at the time it was introduced, we added mitigations both in the control plane and in the data plane. Whenever the datapath is configured with --enable-remote-node-identity=true, it will also distrust any reports that peer node identities are "host", even if the ipcache itself reports this. In this situation, the datapath does not accept that the traffic is from the "host". Rather, it demotes the identity of the traffic, considering it as part of the "world". The motivation is that allowing "world" is a very permissive policy, so if the user is OK with allowing "world" traffic then it is likely that they will be OK with accepting any traffic like this which purports to be coming from a "host" in the cluster.

As a result of the above conditions, users running in kvstore mode who upgraded from earlier Cilium versions to 1.9.12, 1.10.6, or 1.11.0 (and other releases up until this patch is released as part of an official version) could observe traffic drops for traffic from nodes in the cluster towards pods on other nodes in the cluster. Hubble would report that the traffic is coming "from the world" (identity=2), despite having a source address of another node in the cluster.

We considered multiple approaches to solving this issue:

A) Revert the commit that introduced the issue (see GH-18763).
   * Evidently, by this point there are multiple other codepaths relying on the internal storage of the local node's identity as Host, which would make this more difficult.
B) Ensure that the kvstore propagation code propagates the current node's identity as "remote-node", as other nodes may expect.
   * In clusters with mixed knowledge of remote-node-identity (for instance during upgrade), newer nodes could end up propagating the new identity, but old nodes would not understand how to take this identity into account when calculating policy, so this could result in similar policy drops during upgrade.
C) In the case when --enable-remote-node-identity=true, ensure that when Cilium receives updates from peer nodes, it demotes the "host" identity reported by peer nodes down to "remote-node" for the associated IP addresses. This way, the impact of the flag is limited to the way the current node configures itself. If the datapath is then informed (via ipcache) that these IPs correspond to "remote-node", the policy will be correctly assessed.

This commit takes approach (C).

Fixes: 7bf60a5 ("nodediscovery: Fix local host identity propagation")
Co-authored-by: André Martins <andre@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
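A minimal sketch of approach (C), assuming illustrative types and names rather than the real Cilium node-manager code: when a peer node's update is ingested, a reported "host" identity is demoted to "remote-node" before the IPs are pushed into the local ipcache, as long as remote-node identities are enabled and the update is not for the local node:

```go
package nodemanager

// Identity is a numeric security identity; the reserved values below
// mirror Cilium's well-known identities (1 = host, 6 = remote-node),
// used here purely for illustration.
type Identity uint32

const (
	IdentityHost       Identity = 1
	IdentityRemoteNode Identity = 6
)

// Node is a stripped-down stand-in for a node update received from a peer.
type Node struct {
	Name     string
	Identity Identity
}

// demoteHostIdentity rewrites a peer-reported "host" identity to
// "remote-node" so that, with --enable-remote-node-identity=true, the
// local ipcache and datapath see the peer's IPs as remote-node rather
// than host (or, after the datapath mitigation, world).
func demoteHostIdentity(n *Node, localNodeName string, remoteNodeIdentityEnabled bool) {
	if !remoteNodeIdentityEnabled || n.Name == localNodeName {
		return // the local node's own IPs keep the "host" identity
	}
	if n.Identity == IdentityHost {
		n.Identity = IdentityRemoteNode
	}
}
```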
My changes look good. Thanks!
Backport of #18777 LGTM, thanks.
My changes look good, thanks!
d668a4a to 3a236f8
/test-backport-1.11

Job 'Cilium-PR-K8s-GKE' failed and has not been observed before, so may be related to your PR:

Test Name
Failure Output

If it is a flake, comment /test-gke
IptablesMasqueradingIPv4Enabled() and in pkg/node/manager/manager.go), please double check.

log.WithField(logfields.Interface, iface).WithError(err).Fatal("Load encryption network failed")
-> log.WithField(logfields.Interface, iface).Fatal("Load encryption network failed")
), please double check.

PRs skipped due to conflicts:

edit:
I also dropped #18151 as that PR is being reverted.
Once this PR is merged, you can update the PR labels via:
or with