"kube-apiserver" policy entity is broken on dual-stack clusters #28259
For example, running
(this is a kind cluster, the kube-apiserver is stable)
We're going to have to break an abstraction here somewhere. The code that watches the Something is going to have to break the abstraction and pull together any and all IP-level identities that happen to apply to a local-node IP.
The `reserved:host` identity is special: the numeric identity is fixed and the set of labels is mutable. (The datapath requires this.)

So, we need to determine all prefixes that have the `reserved:host` label and capture their labels. Then, we must aggregate *all* labels from all IPs and insert them as the `reserved:host` identity labels.

Likewise, we need to always update the SelectorCache; we cannot short-circuit if the ipcache already has that identity. Again, this is needed because the identity is mutable.

Without this, when there are multiple IPs with the host label, the identity may flap and the SelectorCache may be missing updates.

Fixes: cilium#28259
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
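The approach described in this commit message can be modelled in a few lines. This is only a toy sketch in Python, not Cilium's Go code: the `SelectorCache` class, `aggregate_host_labels`, `upsert`, and the example prefixes are all hypothetical names invented for illustration. It shows the two ideas from the message: take the union of labels across every prefix that carries `reserved:host`, and push the result to the selector cache on every change rather than short-circuiting.

```python
from typing import Dict, FrozenSet, Set

HOST = "reserved:host"
HOST_ID = 1  # fixed numeric identity; only its labels change


def aggregate_host_labels(prefix_labels: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Union the labels of every prefix that carries the host label."""
    merged: Set[str] = set()
    for labels in prefix_labels.values():
        if HOST in labels:
            merged |= labels
    return frozenset(merged)


class SelectorCache:
    """Hypothetical stand-in that stores the latest labels per identity."""

    def __init__(self) -> None:
        self.labels_by_id: Dict[int, FrozenSet[str]] = {}

    def update(self, identity: int, labels: FrozenSet[str]) -> None:
        self.labels_by_id[identity] = labels


def upsert(prefix_labels: Dict[str, Set[str]], cache: SelectorCache,
           prefix: str, labels: Set[str]) -> None:
    """Record labels for a prefix, then always refresh the host identity."""
    prefix_labels.setdefault(prefix, set()).update(labels)
    if HOST in prefix_labels[prefix]:
        # No short-circuit: the identity is mutable, so always push the union.
        cache.update(HOST_ID, aggregate_host_labels(prefix_labels))


prefix_labels: Dict[str, Set[str]] = {}
cache = SelectorCache()
upsert(prefix_labels, cache, "10.0.0.1/32", {HOST})
upsert(prefix_labels, cache, "10.0.0.1/32", {"reserved:kube-apiserver"})
upsert(prefix_labels, cache, "fd00::1/128", {HOST})
host_labels = cache.labels_by_id[HOST_ID]
```

In this model the second local address no longer strips `reserved:kube-apiserver` from identity 1, because the labels are recomputed from all host prefixes rather than from the most recently written one.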
Update: @christarazi determined that this bug was introduced in #19765, so it is present starting from v1.13. The fix, #28332, should be a relatively simple backport.
The `reserved:host` identity is special: the numeric identity is fixed and the set of labels is mutable. (The datapath requires this.)

So, we need to determine all prefixes that have the `reserved:host` label and capture their labels. Then, we must aggregate *all* labels from all IPs and insert them as the `reserved:host` identity labels.

However, the code as written has a race condition whenever the local node has more than one IP address. This can happen when, for example, VXLAN or IPv6 is enabled. The basic sequence is this:

1. Insert IP A as `reserved:host` into the ipcache. ID 1 now has labels `reserved:host`.
2. Insert IP A as `reserved:kube-apiserver` into the ipcache. ID 1 is updated with labels `reserved:host, reserved:kube-apiserver`.
3. Insert IP B as `reserved:host` into the ipcache. ID 1 is updated with labels `reserved:host`. Now policies that select `reserved:kube-apiserver` are broken.

Likewise, we need to always update the SelectorCache; we cannot short-circuit if the ipcache already has that identity. Again, this is needed because the identity is mutable. So this bug can take another form:

1. Insert IP A as `reserved:host` into the ipcache. Because IP A is not known to the ipcache, treat ID 1 as a new identity and update the selector cache.
2. Insert IP A as `reserved:kube-apiserver`. Mutate the labels of ID 1. But, because IP A already has ID 1, short-circuit the update to the selector cache (if the Source is the same, which it _may_ be).
3. Now the selector cache has incorrect labels for ID 1.

Without this, when there are multiple IPs with the host label, the identity may flap and the SelectorCache may be missing updates.

Fixes: cilium#28259
Fixes: e0d403a
Fixes: 308c142
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
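The two failure sequences in this commit message can be replayed with a small toy model. This is a sketch in Python under stated assumptions, not the actual ipcache code: `last_write_wins` and `short_circuit` are hypothetical functions, the IPs are just the strings `"A"` and `"B"`, and the "same Source" condition from the second sequence is assumed to always hold.

```python
from typing import Dict, List, Set, Tuple

HOST_ID = 1  # fixed numeric identity for the local host
Event = Tuple[str, str]  # (ip, label being inserted)


def last_write_wins(events: List[Event]) -> Set[str]:
    """First form: each insert overwrites ID 1 with one IP's labels."""
    per_ip: Dict[str, Set[str]] = {}
    id_labels: Set[str] = set()
    for ip, label in events:
        per_ip.setdefault(ip, set()).add(label)
        id_labels = set(per_ip[ip])  # only the most recently written IP counts
    return id_labels


def short_circuit(events: List[Event]) -> Tuple[Set[str], Set[str]]:
    """Second form: skip the selector cache when the IP already has ID 1."""
    per_ip: Dict[str, Set[str]] = {}
    ipcache: Dict[str, int] = {}
    selector_cache: Dict[int, Set[str]] = {}
    id_labels: Set[str] = set()
    for ip, label in events:
        per_ip.setdefault(ip, set()).add(label)
        id_labels = set(per_ip[ip])  # labels of ID 1 are mutated in place
        if ipcache.get(ip) == HOST_ID:
            continue  # IP already maps to ID 1, so the cache update is skipped
        ipcache[ip] = HOST_ID
        selector_cache[HOST_ID] = set(id_labels)
    return id_labels, selector_cache[HOST_ID]


events = [
    ("A", "reserved:host"),
    ("A", "reserved:kube-apiserver"),
    ("B", "reserved:host"),
]
flapped = last_write_wins(events)
actual, cached = short_circuit(events[:2])
```

After the three inserts, `flapped` holds only `reserved:host`, matching step 3 of the first sequence. In the second sequence, `actual` contains both labels while `cached` still holds only `reserved:host`, which is the stale selector cache state described above.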
[ upstream commit d50a525 ]
[ backporter's notes:
  - oldID was renamed to id, adapted to that
  - v1.13 already mocked out the selector cache, updated that ]
Fixes: cilium#28259
Fixes: e0d403a
Fixes: 308c142
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
[ upstream commit d50a525 ]
[ Backporter's notes: minor renames, otherwise a clean backport ]
Fixes: cilium#28259
Fixes: e0d403a
Fixes: 308c142
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
[ upstream commit d50a525 ]
[ Backporter's notes: minor renames, otherwise a clean backport ]
Fixes: cilium#28259
Fixes: e0d403a
Fixes: 308c142
Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Signed-off-by: Jussi Maki <jussi@isovalent.com>
What happened?
A bug happened!
The mechanism that applies the `kube-apiserver` label can cause flaps when the IP in question belongs to the local node. This only applies to clusters where nodes have IPv6 addresses. (Note that whether or not Cilium is in dual-stack mode is irrelevant.)
The root cause is somewhat complicated due to the number of special cases:
- `default/kubernetes` service is always single-stack, even for dual-stack clusters
So when there are multiple IPs for the `reserved:node` identity, but their set of labels is not uniform, it is last-write-wins. Whichever IP is most recently triggered will have those labels applied to the `reserved:node` identity.
Cilium Version
v1.14 at least, perhaps older.
Edit: See #28259 (comment).