New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v1.10 backports 2022-02-14 #18801
Merged
Merged
v1.10 backports 2022-02-14 #18801
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
[ upstream commit f6a4104 ] This patch updates the Cilium logic for handling remote node identity updates to ensure that when Cilium's '--enable-remote-node-identity' flag is configured, each Cilium node will consistently consider all other nodes as having the "remote-node" identity. This fixes an issue where users reported policy drops from remote nodes -> pods, even though the policy appeared to allow this. The issue was limited to kvstore configurations of Cilium, and does not affect configurations where CRDs are used for sharing information within the cluster. For background: When Cilium starts up, it locally scans for IP addresses associated with the node, and updates its own IPcache to associate those IPs with the "host" identity. Additionally, it will also publish this information to other nodes so that they can make policy decisions regarding traffic coming from IPs attached to nodes in the cluster. Before commit 7bf60a5 ("nodediscovery: Fix local host identity propagation"), Cilium would propagate the identity "remote-node" as part of these updates to other nodes. After that commit, it would propagate the identity "host" as part of these updates to other nodes. When receiving these updates, Cilium would trust the identity directly and push IP->Identity mappings like this into the datapath, regardless of whether the '--enable-remote-node-identity' setting was configured or not. As such, when the above commit changed the behaviour, it triggered a change in policy handling behaviour. The '--enable-remote-node-identity' flag was initially introduced to allow the security domain of remote nodes in the cluster to be considered differently vs. the local host. This can be important as Kubernetes defines that the host should always have access to pods on the node, so if all nodes are considered the same as the "host", this can represent a larger open policy surface for pods than necessary in a zero trust environment. Given the potential security implications of this setting, at the time that it was introduced, we introduced mitigations both in the control plane and in the data plane. Whenever the datapath is configured with --enable-remote-node-identity=true, it will also distrust any reports that peer node identities are "host", even if the ipcache itself reports this. In this situation, the datapath does not accept that the traffic is from the "host". Rather, it demotes the identity of the traffic to considering it as part of the "world". The motivation behind this is that allowing "world" is a very permissive policy, so if the user is OK with allowing "world" traffic then it is likely that they will be OK with accepting any traffic like this which purports to be coming from a "host" in the cluster. As a result of the above conditions, users running in kvstore mode who upgraded from earlier Cilium versions to 1.9.12, 1.10.6 or 1.11.0 (and other releases up until this patch is released as part of an official version) could observe traffic drops for traffic from nodes in the cluster towards pods on other nodes in the cluster. Hubble would report that the traffic is coming "from the world" (identity=2), despite having a source address of another node in the cluster. We considered multiple approaches to solving this issue: A) Revert the commit that introduced the issue (see GH-18763). * Evidently, by this point there are multiple other codepaths relying on the internal storage of the local node's identity as Host, which would make this more difficult. B) Ensure that the kvstore propagation code propagates the current node's identity as "remote-node", as other nodes may expect. * In cases of versions with mixed knowledge of remote-node-identity (for instance during upgrade), then newer nodes could end up propagating the new identity, but old nodes would not understand how to calculate policy with this identity in consideration, so this could result in similar sorts of policy drops during upgrade. C) In the case when --enable-remote-node-identity=true, ensure that when Cilium receives updates from peer nodes, it demotes the "host" identity reported by peer nodes down to "remote-node" for the associated IP addresses. This way, the impact of the flag is limited to the way that the current node configures itself only. If the datapath is then informed (via ipcache) that thes IPs correspond to "remote-node", then the policy will be correctly assessed. This commit takes approach (C). Fixes: 7bf60a5 ("nodediscovery: Fix local host identity propagation") Co-authored-by: André Martins <andre@cilium.io> Signed-off-by: Joe Stringer <joe@cilium.io> Signed-off-by: Gilberto Bertin <gilberto@isovalent.com>
maintainer-s-little-helper
bot
added
backport/1.10
kind/backports
This PR provides functionality previously merged into master.
labels
Feb 14, 2022
/test-backport-1.10 Job 'Cilium-PR-K8s-1.17-kernel-4.9' failed and has not been observed before, so may be related to your PR: Click to show.Test Name
Failure Output
If it is a flake, comment |
joestringer
approved these changes
Feb 14, 2022
/test-upstream-k8s |
/test-1.17-4.9 |
jibi
added
the
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
label
Feb 16, 2022
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
kind/backports
This PR provides functionality previously merged into master.
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Once this PR is merged, you can update the PR labels via:
or with