
v1.11 backports 2022-02-14 #18800

Merged
merged 9 commits into v1.11 from pr/v1.11-backport-2022-02-14
Feb 16, 2022

Conversation

@jibi jibi (Member) commented Feb 14, 2022

PRs skipped due to conflicts:

edit:
I also dropped #18151, as that PR is being reverted.

Once this PR is merged, you can update the PR labels via:

$ for pr in 17871 18770 17744 18777 18661; do contrib/backporting/set-labels.py $pr done 1.11; done

or with

$ make add-label branch=v1.11 issues=17871,18770,17744,18777,18661

pchaigno and others added 8 commits February 14, 2022 16:32
[ upstream commit 4f25254 ]

These new functions will be used in subsequent commits to check if
iptables-based masquerading is enabled.

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
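In sketch form, the check these helpers implement amounts to "masquerading is requested but not BPF-based". Field and method names below are assumptions modeled on the flags discussed in this series (the race trace further down places the real helpers in pkg/option/config.go), not the exact Cilium code:

```
package main

import "fmt"

// DaemonConfig is a pared-down stand-in for Cilium's config struct.
type DaemonConfig struct {
	EnableIPv4Masquerade bool // masquerading requested for IPv4
	EnableBPFMasquerade  bool // BPF-based implementation selected
}

// IptablesMasqueradingIPv4Enabled: masquerading is on but not BPF-based,
// so the iptables rules (and the ipset dependency) are actually needed.
func (c *DaemonConfig) IptablesMasqueradingIPv4Enabled() bool {
	return c.EnableIPv4Masquerade && !c.EnableBPFMasquerade
}

func main() {
	c := &DaemonConfig{EnableIPv4Masquerade: true, EnableBPFMasquerade: false}
	fmt.Println(c.IptablesMasqueradingIPv4Enabled()) // true
}
```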
[ upstream commit 60245fe ]

In commit 49cb220 ("iptables: Don't masquerade traffic to cluster
nodes"), we introduced an ipset to track the set of node IP addresses
and avoid masquerading traffic to those IPs in native routing mode.

This is however only needed if we're using iptables-based masquerading
(vs. BPF-based). When that's not the case, there's no reason to track
node IP addresses in this way, or even to require the ipset binary as a
dependency.

Fixes: 49cb220 ("iptables: Don't masquerade traffic to cluster nodes")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
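A minimal sketch of the resulting gating; the set name and the helper wrapping the ipset binary are hypothetical, only the condition mirrors the commit:

```
package main

import (
	"fmt"
	"os/exec"
)

// createNodeIPSet is a hypothetical wrapper around the ipset binary;
// Cilium's real code differs, but the gating condition is the point.
func createNodeIPSet() error {
	return exec.Command("ipset", "create", "cilium_node_set_v4", "hash:ip").Run()
}

func main() {
	iptablesMasquerading := false // stand-in for the config check
	if !iptablesMasquerading {
		fmt.Println("BPF masquerading: no ipset created, ipset binary not required")
		return
	}
	if err := createNodeIPSet(); err != nil {
		fmt.Println("ipset create failed:", err)
	}
}
```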
[ upstream commit 99c4540 ]

In several places in the iptables code, we use an OLD_ string as a
prefix for iptables rules. This commit refactors the code a little to
ensure we use the same constant everywhere.

Suggested-by: Gilberto Bertin <gilberto@isovalent.com>
Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
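Illustratively, the refactor replaces scattered "OLD_" string literals with one shared constant (the constant name is an assumption, and the chain name is just an example):

```
package main

import "fmt"

// oldPrefix marks rules/chains that are about to be replaced; a single
// constant keeps the prefix consistent across the iptables code.
const oldPrefix = "OLD_"

func renameToOld(chain string) string {
	return oldPrefix + chain
}

func main() {
	fmt.Println(renameToOld("CILIUM_FORWARD")) // OLD_CILIUM_FORWARD
}
```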
[ upstream commit 76551df ]

Remove old ipsets before installing new iptables rules, similarly to how
we remove old iptables rules and chains. The ipsets will then only be
created if we're using iptables-based masquerading. This ensures we
don't keep the ipsets when disabling iptables-based masquerading (e.g.,
when switching to BPF-based masquerading).

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
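The resulting ordering, as a hedged sketch with hypothetical helpers:

```
package main

import "fmt"

// Hypothetical helpers sketching the ordering this commit enforces:
// stale ipsets are always removed first, and only recreated when
// iptables-based masquerading is actually in use.
func removeOldIPSets() { fmt.Println("remove stale ipsets") }

func installRules(iptablesMasq bool) {
	if iptablesMasq {
		fmt.Println("create node-IP ipsets + masquerade rules")
	}
	fmt.Println("install remaining iptables rules")
}

func main() {
	removeOldIPSets()   // cleans up after e.g. a switch to BPF masquerading
	installRules(false) // BPF masquerading: ipsets stay gone
}
```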
[ upstream commit bd2ac6c ]

The way our agent startup is structured can lead to a data race on the
EnableBPFMasquerade flag. In the main thread, we initialize k8s watchers
via goroutines which may end up using the EnableBPFMasquerade flag. We then
update the value of that flag. So we can have a race between the main
thread and a k8s watcher goroutine. The full stack traces for the data
race are shown below.

Moving the k8s watcher initialization after the flags are fully updated
removes this data race.

    WARNING: DATA RACE
    Write at 0x000005803072 by main goroutine:
      github.com/cilium/cilium/daemon/cmd.NewDaemon()
          /go/src/github.com/cilium/cilium/daemon/cmd/daemon.go:710 +0x5587
      github.com/cilium/cilium/daemon/cmd.runDaemon()
          /go/src/github.com/cilium/cilium/daemon/cmd/daemon_main.go:1652 +0xa98
      github.com/cilium/cilium/daemon/cmd.glob..func1()
          /go/src/github.com/cilium/cilium/daemon/cmd/daemon_main.go:126 +0x494
      github.com/spf13/cobra.(*Command).execute()
          /go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:860 +0xaf3
      github.com/spf13/cobra.(*Command).ExecuteC()
          /go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:974 +0x5da
      github.com/spf13/cobra.(*Command).Execute()
          /go/src/github.com/cilium/cilium/vendor/github.com/spf13/cobra/command.go:902 +0x65
      github.com/cilium/cilium/daemon/cmd.Execute()
          /go/src/github.com/cilium/cilium/daemon/cmd/daemon_main.go:140 +0x4e
      main.main()
          /go/src/github.com/cilium/cilium/daemon/main.go:16 +0x24
      runtime.main()
          /usr/local/go/src/runtime/proc.go:255 +0x226
      github.com/cilium/ebpf.init()
          /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:220 +0x868
      github.com/cilium/ebpf.init()
          /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:188 +0x808
      github.com/cilium/ebpf.init()
          /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:166 +0x7a8
      github.com/cilium/ebpf.init()
          /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:148 +0x748
      github.com/cilium/ebpf.init()
          /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:108 +0x6e8
      github.com/cilium/ebpf.init()
          /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:92 +0x688
      github.com/cilium/ebpf.init()
          /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:76 +0x628
      github.com/cilium/ebpf.init()
          /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:59 +0x5c8
      github.com/cilium/ebpf.init()
          /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/syscalls.go:41 +0x568
      runtime.doInit()
          /usr/local/go/src/runtime/proc.go:6498 +0x122
      github.com/cilium/ebpf/internal/btf.init()
          /go/src/github.com/cilium/cilium/vendor/github.com/cilium/ebpf/internal/btf/btf.go:820 +0x1bc

    Previous read at 0x000005803072 by goroutine 179:
      github.com/cilium/cilium/pkg/option.(*DaemonConfig).IptablesMasqueradingIPv4Enabled()
          /go/src/github.com/cilium/cilium/pkg/option/config.go:2182 +0x17a9
      github.com/cilium/cilium/pkg/option.(*DaemonConfig).IptablesMasqueradingEnabled()
          /go/src/github.com/cilium/cilium/pkg/option/config.go:2194 +0x17dc
      github.com/cilium/cilium/pkg/node/manager.(*Manager).NodeUpdated()
          /go/src/github.com/cilium/cilium/pkg/node/manager/manager.go:414 +0x1778
      github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).ciliumNodeInit.func1()
          /go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_node.go:43 +0x374
      k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd()
          /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:231 +0x79
      k8s.io/client-go/tools/cache.(*ResourceEventHandlerFuncs).OnAdd()
          <autogenerated>:1 +0x33
      github.com/cilium/cilium/pkg/k8s/informer.NewInformerWithStore.func1()
          /go/src/github.com/cilium/cilium/pkg/k8s/informer/informer.go:108 +0x2d8
      k8s.io/client-go/tools/cache.(*DeltaFIFO).Pop()
          /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/delta_fifo.go:554 +0x8c9
      k8s.io/client-go/tools/cache.(*controller).processLoop()
          /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:183 +0x61
      k8s.io/client-go/tools/cache.(*controller).processLoop-fm()
          /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:181 +0x39
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
          /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x81
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
          /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xce
      k8s.io/apimachinery/pkg/util/wait.JitterUntil()
          /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x104
      k8s.io/apimachinery/pkg/util/wait.Until()
          /go/src/github.com/cilium/cilium/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x616
      k8s.io/client-go/tools/cache.(*controller).Run()
          /go/src/github.com/cilium/cilium/vendor/k8s.io/client-go/tools/cache/controller.go:154 +0x5cd
      github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).ciliumNodeInit·dwrap·10()
          /go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_node.go:96 +0x59

    Goroutine 179 (running) created at:
      github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).ciliumNodeInit()
          /go/src/github.com/cilium/cilium/pkg/k8s/watchers/cilium_node.go:96 +0x8d
      github.com/cilium/cilium/pkg/k8s/watchers.(*K8sWatcher).EnableK8sWatcher·dwrap·21()
          /go/src/github.com/cilium/cilium/pkg/k8s/watchers/watcher.go:438 +0x58

Signed-off-by: Paul Chaignon <paul@cilium.io>
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
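A standalone reproduction of this race class, not Cilium code: `go run -race` flags it, and starting the reader goroutine only after the final write (as this commit does for the watchers) silences the report:

```
package main

import "time"

var enableBPFMasquerade bool // stand-in for the daemon config flag

func main() {
	// Watcher-style goroutine reads the flag...
	go func() { _ = enableBPFMasquerade }()
	// ...while the main goroutine is still updating it: a data race.
	enableBPFMasquerade = true
	time.Sleep(10 * time.Millisecond) // let the goroutine run
}
```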
[ upstream commit 1e3174b ]

This patch fixes bpffs migrations meant to occur at runtime. The tc/globals/
part of the path was missing, and the bpffs root was being scanned instead,
causing migrations not to run. This would lead to endpoints continuously
failing to regenerate at runtime when map definitions in the ELF differ from
their pinned counterparts, e.g. during an agent upgrade.

For example, when resizing CALLS_MAP and trying to replace a program, the old
map will be loaded from bpffs, while the program expects to see a map of a
different capacity:

```
~ tc filter replace dev cilium-probe ingress bpf da obj bpf/bpf_lxc.o sec from-container
libbpf: couldn't reuse pinned map at '/sys/fs/bpf/tc//globals/test_cilium_calls_65535': parameter mismatch
libbpf: map 'test_cilium_calls_65535': error reusing pinned map
libbpf: map 'test_cilium_calls_65535': failed to create: Invalid argument(-22)
libbpf: failed to load object 'bpf/bpf_lxc.o'
Unable to load program
```

The CLI equivalent executed as part of init.sh does work as expected, as the
full path including tc/globals/ is specified there.

Signed-off-by: Timo Beckers <timo@isovalent.com>
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
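Illustratively, the fix boils down to building the scan path with the tc/globals/ component included instead of starting at the bpffs root (the function name below is made up for this sketch):

```
package main

import (
	"fmt"
	"path/filepath"
)

const bpffsRoot = "/sys/fs/bpf"

// pinnedMapDir returns the directory that should be scanned for pinned
// maps to migrate; the bug was effectively scanning bpffsRoot instead.
func pinnedMapDir() string {
	return filepath.Join(bpffsRoot, "tc", "globals")
}

func main() {
	fmt.Println(pinnedMapDir()) // /sys/fs/bpf/tc/globals
}
```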
[ upstream commit 9fb1668 ]

To prevent unpinning tail call maps after replacing only one half of an
interface's programs, e.g. from-netdev and to-netdev, allow the caller to
defer finalizer operations until a whole batch of interfaces has been replaced.

This avoids short windows of packet drops due to missed tail calls.

Fixes #16561

Signed-off-by: Timo Beckers <timo@isovalent.com>
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
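The deferred-finalizer pattern, sketched under assumed names (the real Cilium API differs):

```
package main

import "fmt"

// replaceProgram is a hypothetical stand-in: it swaps one attachment's
// program and returns a finalizer that unpins the old tail call maps.
func replaceProgram(name string) (finalize func(), err error) {
	fmt.Println("replaced", name)
	return func() { fmt.Println("unpinned old maps for", name) }, nil
}

// replaceBatch runs every finalizer only after the whole batch has been
// replaced, so tail calls keep resolving while the update is in flight.
func replaceBatch(progs []string) error {
	var finalizers []func()
	for _, p := range progs {
		fin, err := replaceProgram(p)
		if err != nil {
			return err
		}
		finalizers = append(finalizers, fin)
	}
	for _, fin := range finalizers {
		fin()
	}
	return nil
}

func main() {
	_ = replaceBatch([]string{"from-netdev", "to-netdev"})
}
```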
[ upstream commit f6a4104 ]

This patch updates the Cilium logic for handling remote node identity
updates to ensure that when Cilium's '--enable-remote-node-identity'
flag is configured, each Cilium node will consistently consider all
other nodes as having the "remote-node" identity. This fixes an issue
where users reported policy drops from remote nodes -> pods, even though
the policy appeared to allow this. The issue was limited to kvstore
configurations of Cilium, and does not affect configurations where CRDs
are used for sharing information within the cluster.

For background:

When Cilium starts up, it locally scans for IP addresses associated with
the node, and updates its own IPcache to associate those IPs with the
"host" identity. Additionally, it will also publish this information to
other nodes so that they can make policy decisions regarding traffic
coming from IPs attached to nodes in the cluster.

Before commit 7bf60a5 ("nodediscovery: Fix local host identity
propagation"), Cilium would propagate the identity "remote-node" as part
of these updates to other nodes. After that commit, it would propagate
the identity "host" as part of these updates to other nodes.

When receiving these updates, Cilium would trust the identity directly
and push IP->Identity mappings like this into the datapath, regardless
of whether the '--enable-remote-node-identity' setting was configured or
not. As such, when the above commit changed the behaviour, it triggered
a change in policy handling behaviour.

The '--enable-remote-node-identity' flag was initially introduced to
allow the security domain of remote nodes in the cluster to be
considered differently vs. the local host. This can be important as
Kubernetes defines that the host should always have access to pods on
the node, so if all nodes are considered the same as the "host", this
can represent a larger open policy surface for pods than necessary in a
zero trust environment.

Given the potential security implications of this setting, at the time
that it was introduced, we introduced mitigations both in the control
plane and in the data plane. Whenever the datapath is configured with
--enable-remote-node-identity=true, it will also distrust any reports
that peer node identities are "host", even if the ipcache itself reports
this. In this situation, the datapath does not accept that the traffic
is from the "host". Rather, it demotes the identity of the traffic to
considering it as part of the "world". The motivation behind this is
that allowing "world" is a very permissive policy, so if the user is OK
with allowing "world" traffic then it is likely that they will be OK
with accepting any traffic like this which purports to be coming from a
"host" in the cluster.

As a result of the above conditions, users running in kvstore mode who
upgraded from earlier Cilium versions to 1.9.12, 1.10.6 or 1.11.0 (and
other releases up until this patch is released as part of an official
version) could observe traffic drops for traffic from nodes in the
cluster towards pods on other nodes in the cluster. Hubble would report
that the traffic is coming "from the world" (identity=2), despite having
a source address of another node in the cluster.

We considered multiple approaches to solving this issue:
A) Revert the commit that introduced the issue (see GH-18763).
   * Evidently, by this point there are multiple other codepaths relying
     on the internal storage of the local node's identity as Host, which
     would make this more difficult.
B) Ensure that the kvstore propagation code propagates the current node's
   identity as "remote-node", as other nodes may expect.
   * In cases of versions with mixed knowledge of remote-node-identity
     (for instance during upgrade), newer nodes could end up
     propagating the new identity, but old nodes would not understand
     how to calculate policy with this identity in consideration, so
     this could result in similar sorts of policy drops during upgrade.
C) In the case when --enable-remote-node-identity=true, ensure that when
   Cilium receives updates from peer nodes, it demotes the "host"
   identity reported by peer nodes down to "remote-node" for the
   associated IP addresses. This way, the impact of the flag is limited
   to the way that the current node configures itself only. If the
   datapath is then informed (via ipcache) that these IPs correspond to
   "remote-node", then the policy will be correctly assessed.

This commit takes approach (C).

Fixes: 7bf60a5 ("nodediscovery: Fix local host identity propagation")
Co-authored-by: André Martins <andre@cilium.io>
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
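Approach (C) in sketch form. The reserved identity values (host=1, world=2, remote-node=6) match Cilium's documented reserved identities; the function shape is an illustration, not the exact code:

```
package main

import "fmt"

const (
	identityHost       = 1
	identityWorld      = 2
	identityRemoteNode = 6
)

// demotePeerIdentity never trusts a peer-reported "host" identity when
// remote-node identities are enabled: it is demoted to "remote-node"
// before being pushed into the ipcache.
func demotePeerIdentity(enableRemoteNodeIdentity bool, reported int) int {
	if enableRemoteNodeIdentity && reported == identityHost {
		return identityRemoteNode
	}
	return reported
}

func main() {
	fmt.Println(demotePeerIdentity(true, identityHost)) // 6 (remote-node)
}
```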
@jibi jibi requested a review from a team as a code owner February 14, 2022 15:57
@maintainer-s-little-helper bot added the backport/1.11 and kind/backports labels Feb 14, 2022
@pchaigno pchaigno (Member) left a comment

My changes look good. Thanks!

@joestringer joestringer (Member) left a comment

Backport of #18777 LGTM, thanks.

@ti-mo ti-mo (Contributor) left a comment

My changes look good, thanks!

[ upstream commit 1851275 ]

This PR fixes incorrect values given in the documentation for the
hubble-ui standalone install.

Fixes: #18650

Signed-off-by: Yusuke Suzuki <yusuke-suzuki@cybozu.co.jp>
Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
@jibi jibi force-pushed the pr/v1.11-backport-2022-02-14 branch from d668a4a to 3a236f8 on February 15, 2022 11:04
@jibi jibi (Member, Author) commented Feb 15, 2022

/test-backport-1.11

Job 'Cilium-PR-K8s-GKE' failed and has not been observed before, so may be related to your PR:

Test Name: K8sServicesTest Checks ClusterIP Connectivity Checks service accessing itself (hairpin flow)

Failure Output: FAIL: Request from echo-8fd54d9fd-cg8ww pod to service http://10.55.255.9:80/ failed

If it is a flake, comment /mlh new-flake Cilium-PR-K8s-GKE so I can create a new GitHub issue to track it.

@jibi jibi (Member, Author) commented Feb 15, 2022

/test-gke

@jibi jibi added the ready-to-merge label Feb 16, 2022
@jibi jibi merged commit 2cb4128 into v1.11 Feb 16, 2022
@jibi jibi deleted the pr/v1.11-backport-2022-02-14 branch February 16, 2022 09:51