"kube-apiserver" policy entity is broken on dual-stack clusters #28259

squeed · 2023-09-25T10:51:35Z

What happened?

A bug happened!

The mechanism that applies the kube-apiserver label can cause the label to flap when the IP in question belongs to the local node. This only applies to clusters where nodes have IPv6 addresses. (Whether or not Cilium itself is in dual-stack mode is irrelevant.)

The root cause is somewhat complicated due to the number of special cases:

- The ipcache "back-propagates" labels up to a global value only for the local identity.
- The default/kubernetes service is always single-stack, even for dual-stack clusters.
- The node-manager doesn't understand the kube-apiserver identity; it is applied purely to IPs in the ipcache metadata layer.

So when there are multiple IPs for the reserved:node identity, but their set of labels is not uniform, it is last-write-wins. Whichever IP is most recently triggered will have those labels applied to the reserved:node identity.
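
The last-write-wins behavior can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Cilium's code: `hostIdentity`, `upsertLastWriteWins`, and `replay` are hypothetical names, and the IP addresses are documentation-range placeholders. It shows that the final label set of the host identity depends purely on which local IP was processed last.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// hostIdentity is a hypothetical stand-in for reserved identity 1:
// the numeric ID is fixed, but its label set can be overwritten.
type hostIdentity struct {
	labels []string
}

// upsertLastWriteWins models the buggy behavior described above: the labels
// of the most recently processed local IP replace the identity's labels.
func (h *hostIdentity) upsertLastWriteWins(ipLabels []string) {
	h.labels = append([]string(nil), ipLabels...)
	sort.Strings(h.labels)
}

// replay processes the local IPs in the given order and returns the labels
// that the host identity ends up with, joined by commas.
func replay(order []string, perIP map[string][]string) string {
	h := &hostIdentity{}
	for _, ip := range order {
		h.upsertLastWriteWins(perIP[ip])
	}
	return strings.Join(h.labels, ",")
}

func main() {
	// Assumed example data: the IPv4 address is also the kube-apiserver
	// endpoint, while the IPv6 address only carries the host label.
	perIP := map[string][]string{
		"192.0.2.10":   {"reserved:host", "reserved:kube-apiserver"},
		"2001:db8::10": {"reserved:host"},
	}

	// IPv4 processed last: the kube-apiserver label is present.
	fmt.Println(replay([]string{"2001:db8::10", "192.0.2.10"}, perIP))
	// IPv6 processed last: the kube-apiserver label is lost.
	fmt.Println(replay([]string{"192.0.2.10", "2001:db8::10"}, perIP))
}
```

The same two inputs yield different results depending on ordering, which matches the flapping seen in the `cilium identity list` output in the next comment.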

Cilium Version

v1.14 at least, perhaps older.

Edit: See #28259 (comment).


squeed · 2023-09-25T11:17:51Z

For example, running `cilium identity list` twice in quick succession in the same pod results in different values:

root@kind-control-plane:/home/cilium# cilium identity list
ID      LABELS
1       reserved:host
...
root@kind-control-plane:/home/cilium# cilium identity list
ID      LABELS
1       reserved:host
        reserved:kube-apiserver

(this is a kind cluster, the kube-apiserver is stable)

squeed · 2023-09-25T13:33:48Z

We're going to have to break an abstraction here somewhere. The code that watches the default/kubernetes service only updates the IP->label mappings in the ipcache. The ipcache doesn't really understand identities, but it does update the labels of reserved identity 1. The node manager only knows about node-level labels.

Something is going to have to break the abstraction and pull together any and all IP-level identities that happen to apply to a local-node IP.
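
As a rough illustration of what "pulling together" could look like, the sketch below computes the union of labels over every prefix that carries `reserved:host`. It is a minimal model under assumptions, not the actual fix: `aggregateHostLabels`, `contains`, and the plain `map[string][]string` metadata layout are hypothetical, and the prefixes are placeholders.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const hostLabel = "reserved:host"

// contains reports whether want is present in lbls.
func contains(lbls []string, want string) bool {
	for _, l := range lbls {
		if l == want {
			return true
		}
	}
	return false
}

// aggregateHostLabels returns the sorted union of all labels attached to
// prefixes that carry reserved:host. Prefixes without that label are ignored.
// The metadata map (prefix -> labels) is an assumed, simplified layout.
func aggregateHostLabels(metadata map[string][]string) []string {
	seen := map[string]struct{}{}
	for _, lbls := range metadata {
		if !contains(lbls, hostLabel) {
			continue // not a local-node prefix
		}
		for _, l := range lbls {
			seen[l] = struct{}{}
		}
	}
	out := make([]string, 0, len(seen))
	for l := range seen {
		out = append(out, l)
	}
	sort.Strings(out)
	return out
}

func main() {
	metadata := map[string][]string{
		"192.0.2.10/32":    {"reserved:host", "reserved:kube-apiserver"},
		"2001:db8::10/128": {"reserved:host"},
		"198.51.100.7/32":  {"reserved:world"},
	}
	// Prints the same result regardless of map iteration order.
	fmt.Println(strings.Join(aggregateHostLabels(metadata), ","))
}
```

Because the result is a union over all local prefixes, it no longer depends on which IP was updated most recently.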

The `reserved:host` identity is special: the numeric identity is fixed and the set of labels is mutable. (The datapath requires this.) So, we need to determine all prefixes that have the `reserved:host` label and capture their labels. Then, we must aggregate *all* labels from all IPs and insert them as the `reserved:host` identity labels.

Likewise, we need to always update the SelectorCache; we cannot short-circuit if the ipcache already has that identity. Again, this is needed because the identity is mutable.

Without this, when there are multiple IPs with the host label, the identity may flap and the SelectorCache may be missing updates.

Fixes: cilium#28259
Signed-off-by: Casey Callendrello <cdc@isovalent.com>

squeed · 2023-10-04T09:28:17Z

Update: @christarazi determined that this bug was introduced in #19765. Thus, this is present starting from v1.13.

The fix, #28332, should be a relatively simple backport.

The `reserved:host` identity is special: the numeric identity is fixed and the set of labels is mutable. (The datapath requires this.) So, we need to determine all prefixes that have the `reserved:host` label and capture their labels. Then, we must aggregate *all* labels from all IPs and insert them as the `reserved:host` identity labels.

However, the code as written has a race condition whenever the local node has more than one IP address. This can happen when, for example, VXLAN or IPv6 is enabled. The basic sequence is this:

1. Insert IP A as `reserved:host` into the ipcache. ID 1 now has labels `reserved:host`.
2. Insert IP A as `reserved:kube-apiserver` into the ipcache. ID 1 is updated with labels `reserved:host, reserved:kube-apiserver`.
3. Insert IP B as `reserved:host` into the ipcache. ID 1 is updated with labels `reserved:host`. And now policies that select `reserved:kube-apiserver` are broken.

Likewise, we need to always update the SelectorCache; we cannot short-circuit if the ipcache already has that identity. Again, this is needed because the identity is mutable. So this bug can take another form:

1. Insert IP A as `reserved:host` into the ipcache. Because IP A is not known to the ipcache, treat ID 1 as a new identity and update the selector cache.
2. Insert IP A as `reserved:kube-apiserver`. Mutate the labels of ID 1. But, because IP A already has ID 1, short-circuit the update to the selector cache (if the Source is the same, which it _may_ be).
3. Now the selector cache has incorrect labels for ID 1.

Without this, when there are multiple IPs with the host label, the identity may flap and the SelectorCache may be missing updates.

Fixes: cilium#28259
Fixes: e0d403a
Fixes: 308c142
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
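
The second form of the bug in the commit message above (the skipped selector cache update) can be reproduced in a toy model. This is a sketch under assumptions, not Cilium's implementation: the `agent` type, its fields, and `upsertHostIP` are hypothetical, the "Source" check is omitted, and labels are merged on every upsert so that only the short-circuit is being demonstrated.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const hostID = 1

// agent is a hypothetical model with three pieces of state, loosely named
// after the components mentioned in the commit message.
type agent struct {
	ipToID        map[string]int  // ipcache: IP -> numeric identity
	hostLabels    map[string]bool // current (mutable) labels of ID 1
	selectorCache map[int]string  // labels last pushed to the selector cache
	alwaysUpdate  bool            // true models the fixed behavior
}

func newAgent(alwaysUpdate bool) *agent {
	return &agent{
		ipToID:        map[string]int{},
		hostLabels:    map[string]bool{},
		selectorCache: map[int]string{},
		alwaysUpdate:  alwaysUpdate,
	}
}

// labelString renders the current host labels as a sorted, comma-joined string.
func (a *agent) labelString() string {
	out := make([]string, 0, len(a.hostLabels))
	for l := range a.hostLabels {
		out = append(out, l)
	}
	sort.Strings(out)
	return strings.Join(out, ",")
}

// upsertHostIP mutates the labels of ID 1, then updates the selector cache
// unless the IP already maps to ID 1 and the short-circuit is active.
func (a *agent) upsertHostIP(ip string, labels ...string) {
	for _, l := range labels {
		a.hostLabels[l] = true
	}
	_, known := a.ipToID[ip]
	a.ipToID[ip] = hostID
	if known && !a.alwaysUpdate {
		return // short-circuit: identity "unchanged", selector cache left stale
	}
	a.selectorCache[hostID] = a.labelString()
}

func main() {
	for _, alwaysUpdate := range []bool{false, true} {
		a := newAgent(alwaysUpdate)
		a.upsertHostIP("192.0.2.10", "reserved:host")
		a.upsertHostIP("192.0.2.10", "reserved:host", "reserved:kube-apiserver")
		fmt.Printf("alwaysUpdate=%v selectorCache=%q\n", alwaysUpdate, a.selectorCache[hostID])
	}
}
```

With the short-circuit enabled, the selector cache keeps only `reserved:host` even though the identity's labels have changed. Always pushing the update, as the commit message requires for a mutable identity, keeps the two in sync.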

[ upstream commit d50a525 ]

[ backporter's notes:
  - oldID was renamed to id, adapted to that
  - v1.13 already mocked out the selector cache, updated that ]

The `reserved:host` identity is special: the numeric identity is fixed and the set of labels is mutable. (The datapath requires this.) So, we need to determine all prefixes that have the `reserved:host` label and capture their labels. Then, we must aggregate *all* labels from all IPs and insert them as the `reserved:host` identity labels.

However, the code as written has a race condition whenever the local node has more than one IP address. This can happen when, for example, VXLAN or IPv6 is enabled. The basic sequence is this:

1. Insert IP A as `reserved:host` into the ipcache. ID 1 now has labels `reserved:host`.
2. Insert IP A as `reserved:kube-apiserver` into the ipcache. ID 1 is updated with labels `reserved:host, reserved:kube-apiserver`.
3. Insert IP B as `reserved:host` into the ipcache. ID 1 is updated with labels `reserved:host`. And now policies that select `reserved:kube-apiserver` are broken.

Likewise, we need to always update the SelectorCache; we cannot short-circuit if the ipcache already has that identity. Again, this is needed because the identity is mutable. So this bug can take another form:

1. Insert IP A as `reserved:host` into the ipcache. Because IP A is not known to the ipcache, treat ID 1 as a new identity and update the selector cache.
2. Insert IP A as `reserved:kube-apiserver`. Mutate the labels of ID 1. But, because IP A already has ID 1, short-circuit the update to the selector cache (if the Source is the same, which it _may_ be).
3. Now the selector cache has incorrect labels for ID 1.

Without this, when there are multiple IPs with the host label, the identity may flap and the SelectorCache may be missing updates.

Fixes: cilium#28259
Fixes: e0d403a
Fixes: 308c142
Signed-off-by: Casey Callendrello <cdc@isovalent.com>

[ upstream commit d50a525 ]

[ Backporter's notes: minor renames, otherwise a clean backport ]

The `reserved:host` identity is special: the numeric identity is fixed and the set of labels is mutable. (The datapath requires this.) So, we need to determine all prefixes that have the `reserved:host` label and capture their labels. Then, we must aggregate *all* labels from all IPs and insert them as the `reserved:host` identity labels.

However, the code as written has a race condition whenever the local node has more than one IP address. This can happen when, for example, VXLAN or IPv6 is enabled. The basic sequence is this:

1. Insert IP A as `reserved:host` into the ipcache. ID 1 now has labels `reserved:host`.
2. Insert IP A as `reserved:kube-apiserver` into the ipcache. ID 1 is updated with labels `reserved:host, reserved:kube-apiserver`.
3. Insert IP B as `reserved:host` into the ipcache. ID 1 is updated with labels `reserved:host`. And now policies that select `reserved:kube-apiserver` are broken.

Likewise, we need to always update the SelectorCache; we cannot short-circuit if the ipcache already has that identity. Again, this is needed because the identity is mutable. So this bug can take another form:

1. Insert IP A as `reserved:host` into the ipcache. Because IP A is not known to the ipcache, treat ID 1 as a new identity and update the selector cache.
2. Insert IP A as `reserved:kube-apiserver`. Mutate the labels of ID 1. But, because IP A already has ID 1, short-circuit the update to the selector cache (if the Source is the same, which it _may_ be).
3. Now the selector cache has incorrect labels for ID 1.

Without this, when there are multiple IPs with the host label, the identity may flap and the SelectorCache may be missing updates.

Fixes: cilium#28259
Fixes: e0d403a
Fixes: 308c142
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>

squeed added labels kind/bug (This is a bug in the Cilium logic.), sig/policy (Impacts whether traffic is allowed or denied based on user-defined policies.), and sig/agent (Cilium agent related.) on Sep 25, 2023

aanm added labels needs/triage (This issue requires triaging to establish severity and next steps.) and kind/community-report (This was reported by a user in the Cilium community, e.g. via Slack.) on Sep 26, 2023

squeed mentioned this issue Sep 29, 2023

ipcache: fix flapping labels in SelectorCache when reserved:host identity has multiple IPs #28332

Merged

squeed added labels affects/v1.13 (This issue affects v1.13 branch) and affects/v1.14 (This issue affects v1.14 branch) on Oct 4, 2023

nathanjsweet self-assigned this Oct 4, 2023

squeed closed this as completed in #28332 Oct 5, 2023

squeed mentioned this issue Nov 28, 2023

Better control plane tests for kube-apiserver entities #29432

Closed
