Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"kube-apiserver" policy entity is broken on dual-stack clusters #28259

Closed
squeed opened this issue Sep 25, 2023 · 3 comments · Fixed by #28332
Closed

"kube-apiserver" policy entity is broken on dual-stack clusters #28259

squeed opened this issue Sep 25, 2023 · 3 comments · Fixed by #28332
Assignees
Labels
affects/v1.13 This issue affects v1.13 branch affects/v1.14 This issue affects v1.14 branch kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. needs/triage This issue requires triaging to establish severity and next steps. sig/agent Cilium agent related. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.

Comments

@squeed
Copy link
Contributor

squeed commented Sep 25, 2023

What happened?

A bug happened!

The mechanism that applies the kube-apiserver label can cause flaps when the IP in question belongs to the local node. This only applies to clusters where nodes have ipv6 addresses. (Note that whether or not Cilium is in dual-stack mode is irrelevant).

The root cause is somewhat complicated due to the number of special cases:

  1. the ipcache "back-propagates" labels up to a global value only for the local identity.
  2. the default/kubernetes service is always single-stack, even for dual-stack clusters
  3. the node-manager doesn't understand the kube-apsierver identity; it is applied purely to IPs in the ipcache metadata layer.

So when there are multiple IPs for the reserved:node identity, but their set of labels is not uniform, it is last-write-wins. Whichever IP is most recently triggered will have those labels applied to the reserved:node identity.

Cilium Version

v1.14 at least, perhaps older.

Edit: See #28259 (comment).

@squeed squeed added kind/bug This is a bug in the Cilium logic. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. sig/agent Cilium agent related. labels Sep 25, 2023
@squeed
Copy link
Contributor Author

squeed commented Sep 25, 2023

For example, running cilium identity list in short succession in the same pod results in different values:

root@kind-control-plane:/home/cilium# cilium identity list
ID      LABELS
1       reserved:host
...
root@kind-control-plane:/home/cilium# cilium identity list
ID      LABELS
1       reserved:host
        reserved:kube-apiserver

(this is a kind cluster, the kube-apiserver is stable)

@squeed
Copy link
Contributor Author

squeed commented Sep 25, 2023

We're going to have to break an abstraction here somewhere. The code that watches the kubernetes/default service only updates the ip->label mappings in the ipcache. The ipcache doesn't really understand identities, but it does update the labels of reserved identity 1. The node manager only knows about node-level labels.

Something is going to have to break the abstraction and and pull together any and all ip-level identities that happen to apply to a local-node IP.

@aanm aanm added needs/triage This issue requires triaging to establish severity and next steps. kind/community-report This was reported by a user in the Cilium community, eg via Slack. labels Sep 26, 2023
squeed added a commit to squeed/cilium that referenced this issue Sep 29, 2023
The `reserved:host` identity is special: the numeric identity is fixed
and the set of labels is mutable. (The datapath requires this.) So,
we need to determine all prefixes that have the `reserved:host` label and
capture their labels. Then, we must aggregate *all* labels from all IPs and
insert them as the `reserved:host` identity labels.

Likewise, we need to always update the SelectorCache; we cannot
short-circuit if the ipcache already has that identity. Again, this is
needed because the identity is mutable.

Without this, when there are multiple IPs with the host label, the
identity may flap and the SelectorCache may be missing updates.

Fixes: cilium#28259

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
@squeed squeed added affects/v1.13 This issue affects v1.13 branch affects/v1.14 This issue affects v1.14 branch labels Oct 4, 2023
@squeed
Copy link
Contributor Author

squeed commented Oct 4, 2023

Update: @christarazi determined that this bug was introduced in #19765. Thus, this is present starting from v1.13.

The fix, #28332, should be a relatively simple backport.

@nathanjsweet nathanjsweet self-assigned this Oct 4, 2023
squeed added a commit to squeed/cilium that referenced this issue Oct 5, 2023
The `reserved:host` identity is special: the numeric identity is fixed and the
set of labels is mutable. (The datapath requires this.) So, we need to
determine all prefixes that have the `reserved:host` label and capture their
labels. Then, we must aggregate *all* labels from all IPs and insert them as
the `reserved:host` identity labels.

However, the code as written has a race condition whenever the local node has
more than one IP address. This can happen when, for example vxlan or ipv6 is
enabled. The basic sequence is this:

1. Insert IP A as `reserved:host` in to the ipcache. ID 1 now has labels
  `reserved:host`
2. Insert IP A as `reserved:kube-apiserver` in to the ipcache. ID 1 is updated
  with labels `reserved:host, reserved:kube-apsierver`
3. Insert IP B as `reserved:host` in to the ipcache. ID 1 is updated with
  labels `reserved:host`. And now policies that select
  `reserved:kube-apiserver` are broken

Likewise, we need to always update the SelectorCache; we cannot short-circuit
if the ipcache already has that identity. Again, this is needed because the
identity is mutable. So this bug can take another form:

1. Insert IP A as `reserved:host` in to the ipcache. Because IP A is not known
to the ipcache, treat ID 1 as a new identity and update the selector cache
2. Insert IP A as `reserved:kube-apiserver`. Mutate the labels of ID 1. But,
because IP A already has ID 1, short-circuit the update to the selector cache
(if the Source is the same, which it _may_ be).
3. Now the selector cache has incorrect labels for ID 1.

Without this, when there are multiple IPs with the host label, the identity may
flap and the SelectorCache may be missing updates.

Fixes: cilium#28259
Fixes: e0d403a
Fixes: 308c142

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
squeed added a commit that referenced this issue Oct 5, 2023
The `reserved:host` identity is special: the numeric identity is fixed and the
set of labels is mutable. (The datapath requires this.) So, we need to
determine all prefixes that have the `reserved:host` label and capture their
labels. Then, we must aggregate *all* labels from all IPs and insert them as
the `reserved:host` identity labels.

However, the code as written has a race condition whenever the local node has
more than one IP address. This can happen when, for example vxlan or ipv6 is
enabled. The basic sequence is this:

1. Insert IP A as `reserved:host` in to the ipcache. ID 1 now has labels
  `reserved:host`
2. Insert IP A as `reserved:kube-apiserver` in to the ipcache. ID 1 is updated
  with labels `reserved:host, reserved:kube-apsierver`
3. Insert IP B as `reserved:host` in to the ipcache. ID 1 is updated with
  labels `reserved:host`. And now policies that select
  `reserved:kube-apiserver` are broken

Likewise, we need to always update the SelectorCache; we cannot short-circuit
if the ipcache already has that identity. Again, this is needed because the
identity is mutable. So this bug can take another form:

1. Insert IP A as `reserved:host` in to the ipcache. Because IP A is not known
to the ipcache, treat ID 1 as a new identity and update the selector cache
2. Insert IP A as `reserved:kube-apiserver`. Mutate the labels of ID 1. But,
because IP A already has ID 1, short-circuit the update to the selector cache
(if the Source is the same, which it _may_ be).
3. Now the selector cache has incorrect labels for ID 1.

Without this, when there are multiple IPs with the host label, the identity may
flap and the SelectorCache may be missing updates.

Fixes: #28259
Fixes: e0d403a
Fixes: 308c142

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
squeed added a commit to squeed/cilium that referenced this issue Oct 5, 2023
[ upstream commit d50a525 ]

[ backporter's notes:
    - oldID was renamed to id, adapted to that
    - v1.13 already mocked out the selector cache, updated that
]

The `reserved:host` identity is special: the numeric identity is fixed and the
set of labels is mutable. (The datapath requires this.) So, we need to
determine all prefixes that have the `reserved:host` label and capture their
labels. Then, we must aggregate *all* labels from all IPs and insert them as
the `reserved:host` identity labels.

However, the code as written has a race condition whenever the local node has
more than one IP address. This can happen when, for example vxlan or ipv6 is
enabled. The basic sequence is this:

1. Insert IP A as `reserved:host` in to the ipcache. ID 1 now has labels
  `reserved:host`
2. Insert IP A as `reserved:kube-apiserver` in to the ipcache. ID 1 is updated
  with labels `reserved:host, reserved:kube-apsierver`
3. Insert IP B as `reserved:host` in to the ipcache. ID 1 is updated with
  labels `reserved:host`. And now policies that select
  `reserved:kube-apiserver` are broken

Likewise, we need to always update the SelectorCache; we cannot short-circuit
if the ipcache already has that identity. Again, this is needed because the
identity is mutable. So this bug can take another form:

1. Insert IP A as `reserved:host` in to the ipcache. Because IP A is not known
to the ipcache, treat ID 1 as a new identity and update the selector cache
2. Insert IP A as `reserved:kube-apiserver`. Mutate the labels of ID 1. But,
because IP A already has ID 1, short-circuit the update to the selector cache
(if the Source is the same, which it _may_ be).
3. Now the selector cache has incorrect labels for ID 1.

Without this, when there are multiple IPs with the host label, the identity may
flap and the SelectorCache may be missing updates.

Fixes: cilium#28259
Fixes: e0d403a
Fixes: 308c142

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
squeed added a commit to squeed/cilium that referenced this issue Oct 5, 2023
[ upstream commit d50a525 ]

[ Backporter's notes: minor renames, otherwise a clean backport ]

The `reserved:host` identity is special: the numeric identity is fixed and the
set of labels is mutable. (The datapath requires this.) So, we need to
determine all prefixes that have the `reserved:host` label and capture their
labels. Then, we must aggregate *all* labels from all IPs and insert them as
the `reserved:host` identity labels.

However, the code as written has a race condition whenever the local node has
more than one IP address. This can happen when, for example vxlan or ipv6 is
enabled. The basic sequence is this:

1. Insert IP A as `reserved:host` in to the ipcache. ID 1 now has labels
  `reserved:host`
2. Insert IP A as `reserved:kube-apiserver` in to the ipcache. ID 1 is updated
  with labels `reserved:host, reserved:kube-apsierver`
3. Insert IP B as `reserved:host` in to the ipcache. ID 1 is updated with
  labels `reserved:host`. And now policies that select
  `reserved:kube-apiserver` are broken

Likewise, we need to always update the SelectorCache; we cannot short-circuit
if the ipcache already has that identity. Again, this is needed because the
identity is mutable. So this bug can take another form:

1. Insert IP A as `reserved:host` in to the ipcache. Because IP A is not known
to the ipcache, treat ID 1 as a new identity and update the selector cache
2. Insert IP A as `reserved:kube-apiserver`. Mutate the labels of ID 1. But,
because IP A already has ID 1, short-circuit the update to the selector cache
(if the Source is the same, which it _may_ be).
3. Now the selector cache has incorrect labels for ID 1.

Without this, when there are multiple IPs with the host label, the identity may
flap and the SelectorCache may be missing updates.

Fixes: cilium#28259
Fixes: e0d403a
Fixes: 308c142

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
christarazi pushed a commit that referenced this issue Oct 5, 2023
[ upstream commit d50a525 ]

[ backporter's notes:
    - oldID was renamed to id, adapted to that
    - v1.13 already mocked out the selector cache, updated that
]

The `reserved:host` identity is special: the numeric identity is fixed and the
set of labels is mutable. (The datapath requires this.) So, we need to
determine all prefixes that have the `reserved:host` label and capture their
labels. Then, we must aggregate *all* labels from all IPs and insert them as
the `reserved:host` identity labels.

However, the code as written has a race condition whenever the local node has
more than one IP address. This can happen when, for example vxlan or ipv6 is
enabled. The basic sequence is this:

1. Insert IP A as `reserved:host` in to the ipcache. ID 1 now has labels
  `reserved:host`
2. Insert IP A as `reserved:kube-apiserver` in to the ipcache. ID 1 is updated
  with labels `reserved:host, reserved:kube-apsierver`
3. Insert IP B as `reserved:host` in to the ipcache. ID 1 is updated with
  labels `reserved:host`. And now policies that select
  `reserved:kube-apiserver` are broken

Likewise, we need to always update the SelectorCache; we cannot short-circuit
if the ipcache already has that identity. Again, this is needed because the
identity is mutable. So this bug can take another form:

1. Insert IP A as `reserved:host` in to the ipcache. Because IP A is not known
to the ipcache, treat ID 1 as a new identity and update the selector cache
2. Insert IP A as `reserved:kube-apiserver`. Mutate the labels of ID 1. But,
because IP A already has ID 1, short-circuit the update to the selector cache
(if the Source is the same, which it _may_ be).
3. Now the selector cache has incorrect labels for ID 1.

Without this, when there are multiple IPs with the host label, the identity may
flap and the SelectorCache may be missing updates.

Fixes: #28259
Fixes: e0d403a
Fixes: 308c142

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
ti-mo pushed a commit that referenced this issue Oct 6, 2023
[ upstream commit d50a525 ]

[ Backporter's notes: minor renames, otherwise a clean backport ]

The `reserved:host` identity is special: the numeric identity is fixed and the
set of labels is mutable. (The datapath requires this.) So, we need to
determine all prefixes that have the `reserved:host` label and capture their
labels. Then, we must aggregate *all* labels from all IPs and insert them as
the `reserved:host` identity labels.

However, the code as written has a race condition whenever the local node has
more than one IP address. This can happen when, for example vxlan or ipv6 is
enabled. The basic sequence is this:

1. Insert IP A as `reserved:host` in to the ipcache. ID 1 now has labels
  `reserved:host`
2. Insert IP A as `reserved:kube-apiserver` in to the ipcache. ID 1 is updated
  with labels `reserved:host, reserved:kube-apsierver`
3. Insert IP B as `reserved:host` in to the ipcache. ID 1 is updated with
  labels `reserved:host`. And now policies that select
  `reserved:kube-apiserver` are broken

Likewise, we need to always update the SelectorCache; we cannot short-circuit
if the ipcache already has that identity. Again, this is needed because the
identity is mutable. So this bug can take another form:

1. Insert IP A as `reserved:host` in to the ipcache. Because IP A is not known
to the ipcache, treat ID 1 as a new identity and update the selector cache
2. Insert IP A as `reserved:kube-apiserver`. Mutate the labels of ID 1. But,
because IP A already has ID 1, short-circuit the update to the selector cache
(if the Source is the same, which it _may_ be).
3. Now the selector cache has incorrect labels for ID 1.

Without this, when there are multiple IPs with the host label, the identity may
flap and the SelectorCache may be missing updates.

Fixes: #28259
Fixes: e0d403a
Fixes: 308c142

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
gandro pushed a commit to gandro/cilium that referenced this issue Oct 19, 2023
[ upstream commit d50a525 ]

[ Backporter's notes: minor renames, otherwise a clean backport ]

The `reserved:host` identity is special: the numeric identity is fixed and the
set of labels is mutable. (The datapath requires this.) So, we need to
determine all prefixes that have the `reserved:host` label and capture their
labels. Then, we must aggregate *all* labels from all IPs and insert them as
the `reserved:host` identity labels.

However, the code as written has a race condition whenever the local node has
more than one IP address. This can happen when, for example vxlan or ipv6 is
enabled. The basic sequence is this:

1. Insert IP A as `reserved:host` in to the ipcache. ID 1 now has labels
  `reserved:host`
2. Insert IP A as `reserved:kube-apiserver` in to the ipcache. ID 1 is updated
  with labels `reserved:host, reserved:kube-apsierver`
3. Insert IP B as `reserved:host` in to the ipcache. ID 1 is updated with
  labels `reserved:host`. And now policies that select
  `reserved:kube-apiserver` are broken

Likewise, we need to always update the SelectorCache; we cannot short-circuit
if the ipcache already has that identity. Again, this is needed because the
identity is mutable. So this bug can take another form:

1. Insert IP A as `reserved:host` in to the ipcache. Because IP A is not known
to the ipcache, treat ID 1 as a new identity and update the selector cache
2. Insert IP A as `reserved:kube-apiserver`. Mutate the labels of ID 1. But,
because IP A already has ID 1, short-circuit the update to the selector cache
(if the Source is the same, which it _may_ be).
3. Now the selector cache has incorrect labels for ID 1.

Without this, when there are multiple IPs with the host label, the identity may
flap and the SelectorCache may be missing updates.

Fixes: cilium#28259
Fixes: e0d403a
Fixes: 308c142

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
gandro pushed a commit to gandro/cilium that referenced this issue Dec 7, 2023
The `reserved:host` identity is special: the numeric identity is fixed and the
set of labels is mutable. (The datapath requires this.) So, we need to
determine all prefixes that have the `reserved:host` label and capture their
labels. Then, we must aggregate *all* labels from all IPs and insert them as
the `reserved:host` identity labels.

However, the code as written has a race condition whenever the local node has
more than one IP address. This can happen when, for example vxlan or ipv6 is
enabled. The basic sequence is this:

1. Insert IP A as `reserved:host` in to the ipcache. ID 1 now has labels
  `reserved:host`
2. Insert IP A as `reserved:kube-apiserver` in to the ipcache. ID 1 is updated
  with labels `reserved:host, reserved:kube-apsierver`
3. Insert IP B as `reserved:host` in to the ipcache. ID 1 is updated with
  labels `reserved:host`. And now policies that select
  `reserved:kube-apiserver` are broken

Likewise, we need to always update the SelectorCache; we cannot short-circuit
if the ipcache already has that identity. Again, this is needed because the
identity is mutable. So this bug can take another form:

1. Insert IP A as `reserved:host` in to the ipcache. Because IP A is not known
to the ipcache, treat ID 1 as a new identity and update the selector cache
2. Insert IP A as `reserved:kube-apiserver`. Mutate the labels of ID 1. But,
because IP A already has ID 1, short-circuit the update to the selector cache
(if the Source is the same, which it _may_ be).
3. Now the selector cache has incorrect labels for ID 1.

Without this, when there are multiple IPs with the host label, the identity may
flap and the SelectorCache may be missing updates.

Fixes: cilium#28259
Fixes: e0d403a
Fixes: 308c142

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
pchaigno pushed a commit that referenced this issue Jan 8, 2024
[ upstream commit d50a525 ]

[ backporter's notes:
    - oldID was renamed to id, adapted to that
    - v1.13 already mocked out the selector cache, updated that
]

The `reserved:host` identity is special: the numeric identity is fixed and the
set of labels is mutable. (The datapath requires this.) So, we need to
determine all prefixes that have the `reserved:host` label and capture their
labels. Then, we must aggregate *all* labels from all IPs and insert them as
the `reserved:host` identity labels.

However, the code as written has a race condition whenever the local node has
more than one IP address. This can happen when, for example vxlan or ipv6 is
enabled. The basic sequence is this:

1. Insert IP A as `reserved:host` in to the ipcache. ID 1 now has labels
  `reserved:host`
2. Insert IP A as `reserved:kube-apiserver` in to the ipcache. ID 1 is updated
  with labels `reserved:host, reserved:kube-apsierver`
3. Insert IP B as `reserved:host` in to the ipcache. ID 1 is updated with
  labels `reserved:host`. And now policies that select
  `reserved:kube-apiserver` are broken

Likewise, we need to always update the SelectorCache; we cannot short-circuit
if the ipcache already has that identity. Again, this is needed because the
identity is mutable. So this bug can take another form:

1. Insert IP A as `reserved:host` in to the ipcache. Because IP A is not known
to the ipcache, treat ID 1 as a new identity and update the selector cache
2. Insert IP A as `reserved:kube-apiserver`. Mutate the labels of ID 1. But,
because IP A already has ID 1, short-circuit the update to the selector cache
(if the Source is the same, which it _may_ be).
3. Now the selector cache has incorrect labels for ID 1.

Without this, when there are multiple IPs with the host label, the identity may
flap and the SelectorCache may be missing updates.

Fixes: #28259
Fixes: e0d403a
Fixes: 308c142

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/v1.13 This issue affects v1.13 branch affects/v1.14 This issue affects v1.14 branch kind/bug This is a bug in the Cilium logic. kind/community-report This was reported by a user in the Cilium community, eg via Slack. needs/triage This issue requires triaging to establish severity and next steps. sig/agent Cilium agent related. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.
Projects
None yet
3 participants