Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1.14 Author backport of #30423 #30534

Merged
merged 6 commits into from
Jan 31, 2024

Conversation

gandro
Copy link
Member

@gandro gandro commented Jan 30, 2024

Once this PR is merged, a GitHub action will update the labels of these PRs:

 30423

[ upstream commit 0f01daa ]

This global setter is no longer used since 95029ec
("daemon: Implement LocalNodeInitializer to fill local node info").

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro added kind/backports This PR provides functionality previously merged into master. backport/1.14 This PR represents a backport for Cilium 1.14.x of a PR that was merged to main. labels Jan 30, 2024
@gandro gandro changed the title v1.14 Backports 2024-01-30 v1.14 Author backport of #30423 Jan 30, 2024
[ upstream commit 9cd56a2 ]

WireGuard-based encryption uses different values when it comes to the
EncryptKey field in the IPCache:

 - For endpoints, the EncryptKey should always be non-zero if pod-to-pod
   encryption is enabled.
 - For nodes, the EncryptKey should be non-zero if node-to-node
   encryption is enabled, and if the node has not opted out of
   node-to-node encryption.

Before this commit, we were deriving the EncryptKey of endpoints written
to the kvstore IPCache (see `runIPIdentitySync`) based on the node's
EncryptKey - which is the wrong source of truth for that value.

Luckily, due to a bug, it is currently not possible for Cilium nodes
running in kvstore mode to opt-out of node-to-node encryption, so the
value was always effectively non-zero and the result was accidentally
correct.

However, as we want to fix the node-to-node opt-out mechanism in a
subsequent commit, we ought to fix that first. Therefore, this commit
fixes up all call sites which set an endpoint's EncryptKey to use the
same source of truth.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 42910d4 ]

This removes an unnecessary if condition in
`GetEndpointEncryptKeyIndex`. If WireGuard is enabled, the WireGuard
agent (see `pkg/wireguard/agent.NewAgent`) always sets a public key
for the local node.

Therefore this if condition is always true and we currently have no
plans for nodes to have WireGuard enabled without them also producing a
public key. Therefore this commit removes that superfluous condition.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 230bdd9 ]

This commit removes the localNodeStore field from the WireGuard agent,
as is it not accessed outside of the constructor.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit 930d442 ]

This commit fixes an issue where the node's EncryptKey field in the
kvstore was not set to zero for nodes which opted out of node-to-node
encryption, thereby effectively encrypting traffic for nodes which
should not have been encrypted.

This was due to the fact that the logic zeroing the local node's
EncryptKey based on `localNode.OptOutNodeEncryption` was only performed
when writing to the K8s CiliumNode CRD (in
`NodeDiscovery.mutateNodeResource`), but not when writing to the kvstore
(in the controller started by `NodeDiscovery.updateLocalNode`).

This commit fixes this issue by zeroing `localNode.EncryptKey` directly
at the source of truth in the `LocalNodeStore`.

In addition, we need to be careful when that value is being written:

 - To avoid flapping values, the EncryptKey value needs to be initialized
   before we publish the local node object to the kvstore or k8s.
   Therefore, initialization of the EncryptKey field needs to happen
   before `LocalNodeSynchronizer.InitLocalNode` returns.

 - In order to determine if the local node has to opt out of
   node-to-node encryption, we need to know the local node's labels.
   Those are only available after `localNodeSynchronizer.initFromK8s` has
   been called - therefore we cannot set the field in a hive constructor
   like we do for some other fields that need to be initialized early.

Therefore, this commit moves the initialization of the WireGuard related
fields into `LocalNodeSynchronizer.InitLocalNode`. This satisfies the
the above constraints for the `EncryptKey` and `OptOutNodeEncryption`
values. The initialization of `WireguardPubKey` and the accompanying
annotation can be performed earlier without loss of correctness, but
to keep all local node initialization in the same place, we now also
write those fields during `InitLocalNode`. This is safe, because
`InitLocalNode` happens before the K8s/kvstore node object is published.

This commit incidentally also fixes a bug where the
`OptOutNodeEncryption` field in the WireGuard agent was read out too
early, i.e. before `InitLocalNode` had a chance to initialize it. That
bug had little consequence, as it only materialized in the `cilium
status` not reporting the opt-out status correctly, and caused node IPs
to be unnecessarily added to the WireGuard peer list. Both issues did
not affect the encrypted traffic.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
[ upstream commit d4be9e8 ]

WireGuard-based encryption uses different values when it comes to the
EncryptKey field in the IPCache:

 - For endpoints, the EncryptKey should always be non-zero if pod-to-pod
   encryption is enabled.
 - For nodes, the EncryptKey should be non-zero if node-to-node
   encryption is enabled, and if the node has not opted out of
   node-to-node encryption.

When creating IPCache entries for regular endpoints, the EncryptKey
value is taken from either the CEP/CES Kubernetes custom resource, or
from the IP entry in the kvstore.

However, the IPCache entries for the health and ingress endpoints are
dervied from the node resource. Before this commit, those entries
therefore also used the node's EncryptKey. When a node opted out of
node-to-node encryption however, that meant that we did not encrypt
traffic for health and ingress to those nodes.

Thus, this commit works around that issue by ignoring the node's
EncryptKey for the health and ingress endpoints and instead always
encrypt that traffic (which is what we do for regular endpoints with
WireGuard encryption already).

This commit can be considered a workaround. Ideally, we would have
separate fields in the node resource for the node and its special
endpoints. That however is a larger schema change, thus this commit
focuses on backportable changes.

Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
@gandro gandro force-pushed the pr/v1.14-backport-2024-01-30 branch from 101313f to 8d29f83 Compare January 30, 2024 16:43
@gandro
Copy link
Member Author

gandro commented Jan 30, 2024

/test-backport-1.14

@gandro gandro marked this pull request as ready for review January 31, 2024 09:35
@gandro gandro requested a review from a team as a code owner January 31, 2024 09:35
@gandro gandro requested a review from brb January 31, 2024 09:45
@gandro
Copy link
Member Author

gandro commented Jan 31, 2024

I've manually tested this branch with kvstore encryption locally, it seems to work as advertised.

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jan 31, 2024
@julianwiedmann julianwiedmann merged commit 7dad76c into cilium:v1.14 Jan 31, 2024
59 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.14 This PR represents a backport for Cilium 1.14.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants