-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ipcache: Fix node-to-node encryption with kube-apiserver #19318
Conversation
37cfbbd
to
180ac10
Compare
180ac10
to
f9179b3
Compare
Updated the PR based on a suggestion by Chris. Doing some manual testing as well. |
// Upsert() will return true if the ipcache entry is owned by | ||
// the source of the node update that triggered this node | ||
// update (kvstore, k8s, ...) The datapath is only updated if | ||
// that source of truth is updated. | ||
if err != nil { | ||
dpUpdate = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is somewhat safe to remove, as we also have a source overwrite check here:
https://github.com/cilium/cilium/blob/f9179b3b351d55ee54a5fa9538751dcac44b17ca/pkg/node/manager/manager.go#L470-L477
But given the asynchronous nature of TriggerLabelInjection
, it would become much harder to learn if the upsert succeeded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I presume you're referring to the removal of the dpUpdate
logic?
From what I understand, the idea is that we first try to inject an ipcache entry, which could plausibly fail (for instance if the node update is for the local node, where we already knew about the local node locally via NodeDiscovery
), so then in that case we don't need to go and update the datapath and invoke nh.NodeUpdate(...)
/ nh.NodeAdd(...)
below.
So I think the result is that all of the NodeManager subscribers (hubble and datapath, where datapath does the route updates, tunnel map updates, ipsec/wireguard config etc.) will now receive updates even if the ipcache suggests that nothing needs to change. Hopefully those subscribers are mostly idempotent anyway?
IMHO the previous logic where ipsAdded
was extended or not based on the success of the ipcache modification seemed a bit odd, that doesn't change whether the node's IP is changing or not. 🤷
f9179b3
to
a287086
Compare
7cb562a
to
6fef76a
Compare
6fef76a
to
bd49a90
Compare
pkg/ipcache/metadata_test.go
Outdated
@@ -41,18 +41,20 @@ func TestInjectLabels(t *testing.T) { | |||
// Upsert node labels to the kube-apiserver to validate that the CIDR ID is | |||
// deallocated and the kube-apiserver reserved ID is associated with this | |||
// IP now. | |||
IPIdentityCache.UpsertMetadata("10.0.0.4", labels.LabelRemoteNode) | |||
IPIdentityCache.UpsertMetadata("10.0.0.4", labels.LabelRemoteNode, source.CustomResource) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: I replaced most sources here to source.CustomResource
to mimic how the real invocation will look like. The local node (source.Local
) should never get the kubeapiserver identity.
bd49a90
to
1259275
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't look too closely at the manager_test.go
changes but overall the PR LGTM.
}) | ||
|
||
m.upsertIntoIDMD(ipAddrStr, remoteHostIdentity, n.Source) | ||
m.ipcache.UpsertAuxiliary(ipAddrStr, tunnelIP, key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where's the corresponding removal of Auxiliary data during the update (for the old IP, if any)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I guess we just rely on the current ipcache.Delete()
calls to clean these up. That's fine for now.
// Upsert() will return true if the ipcache entry is owned by | ||
// the source of the node update that triggered this node | ||
// update (kvstore, k8s, ...) The datapath is only updated if | ||
// that source of truth is updated. | ||
if err != nil { | ||
dpUpdate = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I presume you're referring to the removal of the dpUpdate
logic?
From what I understand, the idea is that we first try to inject an ipcache entry, which could plausibly fail (for instance if the node update is for the local node, where we already knew about the local node locally via NodeDiscovery
), so then in that case we don't need to go and update the datapath and invoke nh.NodeUpdate(...)
/ nh.NodeAdd(...)
below.
So I think the result is that all of the NodeManager subscribers (hubble and datapath, where datapath does the route updates, tunnel map updates, ipsec/wireguard config etc.) will now receive updates even if the ipcache suggests that nothing needs to change. Hopefully those subscribers are mostly idempotent anyway?
IMHO the previous logic where ipsAdded
was extended or not based on the success of the ipcache modification seemed a bit odd, that doesn't change whether the node's IP is changing or not. 🤷
pkg/ipcache/metadata.go
Outdated
@@ -146,7 +164,7 @@ func (ipc *IPCache) InjectLabels(src source.Source) error { | |||
// cluster), or CIDR IDs for kube-apiservers deployed outside of the | |||
// cluster. | |||
// Also, any new identity should be upserted, regardless. | |||
if hasKubeAPIServerLabel || isNew { | |||
if id.ID.IsReservedNodeIdentity() || hasKubeAPIServerLabel || isNew { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a vague suspicion that this is the line that's causing the failure:
\n\t ipcache-inject-labels 4m5s ago 7s ago 22 failed to inject labels into ipcache: failed to upsert fd04::11 into ipcache with identity 1: unable to overwrite source \"local\" with source \"kvstore\" \n\t k8s-heartbeat 11s ago never 0 no error
Basically, this function can be called at almost any time, perhaps multiple times, to evaluate the entire metadata map and then attempt to apply the changes described in this code. However, so far the node's entries (except for the node with the kube-apiserver) are primarily managed by directly ipcache.Upsert()
. When the source of the information may be set to kvstore
here (because the current agent learned about its own existence via updates from the kvstore), it will go into this conditional statement (due to the IsReservedNodeIdentity()
change here), then attempt to upsert an entry into the IPCache with something like {IP X has identity 1, and the source of this info is the kvstore}
. However, because NodeDiscovery
module would have already injected this IP/ID into the IPCache before with a higher-priority source (Local, ie directly observed from the kernel), so now when this code runs to update with the kvstore-sourced info, the upsert will fail... and then that failure gets returned back out to the controller. Once the controller fails enough times, the test will check this and fail out purely because of the failing controller.
The whole source
thing is really difficult to work with right now, since it assumes that there is only one source of information for any particular piece of information, which is simply untrue. Chris and I have been working on a rework of the IPCache that we hope to post soon which will help alleviate these sorts of issues, but that doesn't really help you much in the mean time. Hmm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that your goal here is that when an update comes in from the NodeManager, you want the TriggerLabelInjection()
to re-evaluate the Auxiliary data (ie encrypt key) and update any existing entries with that info.
You might be able to get away with this by basically keeping this line as-is, but then having another statement similar but differently below the hasKubeAPIServerLabel
case inside this conditional statement, where you look up the existing entry's source
and use that iff the entry corresponds to a reserved node identity. It's a bit ugly but we're about to rewrite this code anyway, so if it's just enough to solve your bug then it may be a reasonable option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the detailed feedback and review. My original intent was to keep the Upsert
source via UpsertLabels
(a832573), but as you pointed out that doesn't seem to work in all cases.
Reusing the existing entry's source I think sounds indeed like an easy workaround, I'll give that a try.
This adds a new method to update the encryption IP and Key pair without a full upsert. This will be used in a subsequent commit when entries are created via InjectLabels, as it will read the ipToHostIPCache map when upserting entries. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
The kube-apiserver reserved identity requires us to create a IPCache entry based on the endpoints of the `default/kubernetes` K8s service. This kube-apiserver IPCache entry will overwrite any existing remote host entries. The entry also blocks the creation of remote host entries for the same IP (via `source.AllowsOverwrite`). When the kube-apiserver entry is created, we check if there is an existing remote node entry, and if so, copy over the IPSec keypair information into the newly created kube-apiserver entry. If however the order is reversed (we first create the kube-apiserver entry before the remote node is discovered), then the IPSec keypair is never updated. This in turn means that connectivity to any remote nodes which also run the kube-apiserver is broken when node-to-node encryption is enabled. This commit works around the issue by adding the IPKeyPair before the labels are injected. This allows InjectLabels to look up the pair whenever the entry is created or updated. We also update InjectLabels to always upsert entries for reserved node identities, to ensure new IPCache entries are created for any discovered (remote or local) nodes. The user-space IPCache does check if an upsert actually modifies the existing entry and will return early if not, so no unnecessary BPF map updates should be triggered for this. The node manager unit test is also updated to use the real IPCache instead of a mock one to ensure that the interaction between the two packages works as expected. Signed-off-by: Sebastian Wicki <sebastian@isovalent.com>
1259275
to
e77ec4f
Compare
/test |
Seems like this approach breaks host connectivity in tunnel mode. Not really sure yet why, I don't see anything obviously wrong with IPCache in the sysdumps. 🤔 |
This commit moves the node/manager package to use the new asynchronous IPCache API. Instead of directly performing Upserts and Delete on the various node IPs (InternalIP, ExternalIP, HealthIPs etc), we now associate each node IP with the corresponding labels. The CEW identity is now also determined by the node's labels, rather than its numeric identity. This also fixes an issue where concurrent use of the synchronous and asynchronous API would lead to the encryption key for the kube-apiserver node being lost (c.f. #19318). While we are at it, change the test to use netip types instead of net.IP. Co-authored-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com>
This commit moves the node/manager package to use the new asynchronous IPCache API. Instead of directly performing Upserts and Delete on the various node IPs (InternalIP, ExternalIP, HealthIPs etc), we now associate each node IP with the corresponding labels. The CEW identity is now also determined by the node's labels, rather than its numeric identity. This also fixes an issue where concurrent use of the synchronous and asynchronous API would lead to the encryption key for the kube-apiserver node being lost (c.f. #19318). While we are at it, change the test to use netip types instead of net.IP. Co-authored-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com>
This commit moves the node/manager package to use the new asynchronous IPCache API. Instead of directly performing Upserts and Delete on the various node IPs (InternalIP, ExternalIP, HealthIPs etc), we now associate each node IP with the corresponding labels. The CEW identity is now also determined by the node's labels, rather than its numeric identity. This also fixes an issue where concurrent use of the synchronous and asynchronous API would lead to the encryption key for the kube-apiserver node being lost (c.f. #19318). While we are at it, change the test to use netip types instead of net.IP. Co-authored-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com>
This commit moves the node/manager package to use the new asynchronous IPCache API. Instead of directly performing Upserts and Delete on the various node IPs (InternalIP, ExternalIP, HealthIPs etc), we now associate each node IP with the corresponding labels. The CEW identity is now also determined by the node's labels, rather than its numeric identity. This also fixes an issue where concurrent use of the synchronous and asynchronous API would lead to the encryption key for the kube-apiserver node being lost (c.f. #19318). While we are at it, change the test to use netip types instead of net.IP. Co-authored-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com>
This commit moves the node/manager package to use the new asynchronous IPCache API. Instead of directly performing Upserts and Delete on the various node IPs (InternalIP, ExternalIP, HealthIPs etc), we now associate each node IP with the corresponding labels. The CEW identity is now also determined by the node's labels, rather than its numeric identity. This also fixes an issue where concurrent use of the synchronous and asynchronous API would lead to the encryption key for the kube-apiserver node being lost (c.f. #19318). While we are at it, change the test to use netip types instead of net.IP. Co-authored-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com>
This commit moves the node/manager package to use the new asynchronous IPCache API. Instead of directly performing Upserts and Delete on the various node IPs (InternalIP, ExternalIP, HealthIPs etc), we now associate each node IP with the corresponding labels. The CEW identity is now also determined by the node's labels, rather than its numeric identity. This also fixes an issue where concurrent use of the synchronous and asynchronous API would lead to the encryption key for the kube-apiserver node being lost (c.f. #19318). While we are at it, change the test to use netip types instead of net.IP. Co-authored-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com>
This commit moves the node/manager package to use the new asynchronous IPCache API. Instead of directly performing Upserts and Delete on the various node IPs (InternalIP, ExternalIP, HealthIPs etc), we now associate each node IP with the corresponding labels. The CEW identity is now also determined by the node's labels, rather than its numeric identity. This also fixes an issue where concurrent use of the synchronous and asynchronous API would lead to the encryption key for the kube-apiserver node being lost (c.f. #19318). While we are at it, change the test to use netip types instead of net.IP. Co-authored-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com>
This commit moves the node/manager package to use the new asynchronous IPCache API. Instead of directly performing Upserts and Delete on the various node IPs (InternalIP, ExternalIP, HealthIPs etc), we now associate each node IP with the corresponding labels. The CEW identity is now also determined by the node's labels, rather than its numeric identity. This also fixes an issue where concurrent use of the synchronous and asynchronous API would lead to the encryption key for the kube-apiserver node being lost (c.f. #19318). While we are at it, change the test to use netip types instead of net.IP. Co-authored-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com>
This commit moves the node/manager package to use the new asynchronous IPCache API. Instead of directly performing Upserts and Delete on the various node IPs (InternalIP, ExternalIP, HealthIPs etc), we now associate each node IP with the corresponding labels. The CEW identity is now also determined by the node's labels, rather than its numeric identity. This also fixes an issue where concurrent use of the synchronous and asynchronous API would lead to the encryption key for the kube-apiserver node being lost (c.f. #19318). While we are at it, change the test to use netip types instead of net.IP. Co-authored-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com>
This commit moves the node/manager package to use the new asynchronous IPCache API. Instead of directly performing Upserts and Delete on the various node IPs (InternalIP, ExternalIP, HealthIPs etc), we now associate each node IP with the corresponding labels. The CEW identity is now also determined by the node's labels, rather than its numeric identity. This also fixes an issue where concurrent use of the synchronous and asynchronous API would lead to the encryption key for the kube-apiserver node being lost (c.f. #19318). While we are at it, change the test to use netip types instead of net.IP. Co-authored-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com>
This commit moves the node/manager package to use the new asynchronous IPCache API. Instead of directly performing Upserts and Delete on the various node IPs (InternalIP, ExternalIP, HealthIPs etc), we now associate each node IP with the corresponding labels. The CEW identity is now also determined by the node's labels, rather than its numeric identity. This also fixes an issue where concurrent use of the synchronous and asynchronous API would lead to the encryption key for the kube-apiserver node being lost (c.f. #19318). While we are at it, change the test to use netip types instead of net.IP. Co-authored-by: Chris Tarazi <chris@isovalent.com> Signed-off-by: Sebastian Wicki <sebastian@isovalent.com> Signed-off-by: Chris Tarazi <chris@isovalent.com>
The kube-apiserver reserved identity requires us to create a
IPCache entry based on the endpoints of the
default/kubernetes
K8sservice.
This kube-apiserver IPCache entry will overwrite any existing remote
host entries. The entry also blocks the creation of remote host entries
for the same IP (via
source.AllowsOverwrite
).When the kube-apiserver entry is created, we check if there is an
existing remote node entry, and if so, copy over the IPSec keypair
information into the newly created kube-apiserver entry. If however the
order is reversed (we first create the kube-apiserver entry before the
remote node is discovered), then the IPSec keypair is never updated.
This in turn means that connectivity to any remote nodes which also run
the kube-apiserver is broken when node-to-node encryption is enabled.
This commit works around the issue by adding the IPKeyPair before the
labels are injected. This allows InjectLabels to look up the pair
whenever the entry is created or updated.
This work is also needed for the upcoming WireGuard node-to-node
encryption.
--
(Edit: This has been partially addressed) @joestringer @christarazi I would highly value your feedback on this, as this is rather hacky at the moment. I think an alternative (maybe cleaner?) approach would be to bring in the IPSec key pair via
UpsertMetadata
/TriggerLabelInjection
or something, but it would require us to pass in the encrypt key and host IP to those methods.