Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v1.11] Backport #18150 #18390

Merged
merged 8 commits into from
Jan 7, 2022

Conversation

christarazi
Copy link
Member

Once this PR is merged, you can update the PR labels via:

$ for pr in 18150; do contrib/backporting/set-labels.py $pr done 1.11; done

christarazi and others added 8 commits January 5, 2022 23:13
[ upstream commit 92d0805 ]

Clarify that the label injection is also triggered from the node
discovery package as well.

Fixes: 2b17d4d ("ipcache, policy: Inject labels from identity
metadata")

Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 60d90f7 ]

Previously, the label injection logic was using the source passed into
InjectLabels(), and on top of that, modifying it instead of taking a
copy. This is important because if we modify the source to upgrade it to
KubeAPIServer source (highest priority), then all subsequent entries
into the ipcache though the IDMD (from the same InjectLabels()
invocation) will have KubeAPIServer source! This can cause entries to
never be modifiable again on a subsequent invocation of InjectLabels().

Here's an example error from the ipcache-inject-labels controller:

```
  ipcache-inject-labels                         37m4s ago      14s ago      63      failed to inject labels into ipcache: failed to upsert 10.24.1.96 into ipcache with identity 6: unable to overwrite source "kube-apiserver" with source "custom-resource"
```

In the above error, we are failing to update ID 6! This meant that any
updates to the remote-node-associated IPs would never be updateable
(unless they somehow were made with the KubeAPIServer source).

This commit fixes this by creating a copy of the source for each
iteration through the IDMD instead of sharing the global, as well as
fixing a typo where the source was not used from `toUpsert`.

Fixes: 2b17d4d ("ipcache, policy: Inject labels from identity
metadata")

Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 471e3ef ]

Previously, any IP that mapped to a reserved ID was considered as
to-be-propagated to the ipcache in InjectLabels(). This was meant to
ensure any reserved IDs associated with the kube-apiserver would be
properly accounted for. To be specific with an example, if the
kube-apiserver is running within the cluster and the local host contains
the kube-apiserver, then the local host reserved ID (1) has the
`reserved:host` + `reserved:kube-apiserver` labels. Cilium starts up
with ID 1 only containing the `reserved:host` label, and only adds the
`reserved:kube-apiserver` label once the K8s Endpoint watcher triggers
the InjectLabels() logic. Hence why this condition existed in order to
update reserved ID 1 when the `reserved:kube-apiserver` label was
associated with it.

The problem with the scope of the condition is that it leads to
unnecessary updates to the ipcache. Specifically, any call to
InjectLabels() will cause ipcache update failures to occur for reserved
IDs 1 & 6, after InjectLabels() runs for the very first time. This is
because node discovery will trigger InjectLabels() to insert all known
IPs of the local node with source=Local. After this, the CiliumNode
watcher will trigger InjectLabels() with source=CustomResource which
causes the following failing controller:

```
ipcache-inject-labels    4m15s ago   17s ago 22    failed to inject labels into ipcache: failed to upsert 10.168.15.192 into ipcache with identity 6: unable to overwrite source "local" with source "custom-resource"
```

This commit fixes this by reducing the scope of the condition in which
to propagate updates to the ipcache, to only any IDMD entries which
contain the kube-apiserver label. This will include entries such as the
reserved ID 1.

In addition, ensure that the final set of labels that are attached to
the identity object itself are used when updating the identity selector
cache (UpdateIdentities()), rather than the labels assoicated with the
prefix from the IDMD. If the final set is not used, then a race is
possible where the next call to UpdateIdentities() will revert the
work done in the first call, causing the selectors to lose which
identities they selected.

Fixes: 2b17d4d ("ipcache, policy: Inject labels from identity
metadata")

Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 884990f ]

Previously, there existed logic to handle "transient identities" while
Cilium is bootstrapping and receiving cluster information from EP & CN
watchers, such that it might generate transient identities related to
the kube-apiserver IPs. This logic would release the transient identity
and re-allocate a new one if needed. Given that InjectLabels() already
errors early if the K8s caches are not synced, it is actually not
possible to have transient identities generated from incomplete
information. In reality, by the time InjectLabels() is running, it has
the full set of cluster information needed from EP & CN watchers, so the
identities generated are "stable".

This logic needs to be removed because it was actually causing a bug
reported by a community user on Slack. The symptoms were that on any
update to the CN (CiliumNode) resource, it was triggering the label
injection, which was releasing and re-allocating the CIDR identity for
the kube-apiserver that was running outside of the cluster. This
unnecessary identity churn is problematic as it can cause policy drops
and exhaust all the local identity range.

Additionally, the question of what happens to the CIDR identity
reference counts is actually handled inside InjectLabels() already, when
the identity is neither reserved or new, it releases the identity
(refcount deference). Hence, this commit fixes the identity churn by
removing the extraneous, unneeded logic from injectLabels().

Fixes: 2b17d4d ("ipcache, policy: Inject labels from identity
metadata")

Reported-by: Alexander Block <ablock84@gmail.com>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 6d0d411 ]

The previous variable will always be non-empty as the callers from
RemoveLabelsFromIPs() will provide labels to be removed from IPs,
otherwise the call would be useless. This would cause identities to
always be refreshed in the selector cache for no reason.

Fixes: 2b17d4d ("ipcache, policy: Inject labels from identity
metadata")

Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 785b4e1 ]

This will be useful in the next commit where the unit tests will rely on
the changes in this commit to validate refcounts.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit f1b30c1 ]

This handles the case where there is a kube-apiserver (outside of the
cluster) and CIDR policy selecting the same IP(s) and identity. In this
case, there will be one identity that the policies both select (identity
refcount == 2). When a kube-apiserver changes IP(s), then the previous
IP(s) will no longer be assoicated with the kube-apiserver label inside
the IDMD, which causes the kube-apiserver and ToCIDR policy to now
select two different identities.

One of the identities from above will become a pure CIDR identity
because it will no longer have the kube-apiserver label (due to the IPs
changing) and only have CIDR labels left over. Once that identity is
generated, we need to replace the existing ipcache entry for the
previously-associated IP(s) of the kube-apiserver with the new identity
without the kube-apiserver label (just CIDR labels).

However, in order to "replace" the ipcache entry, we needed to allow
force upserting into the ipcache. This is because the ipcache entry for
the kube-apiserver IPs would be created with src=kube-apiserver (which
is the strongest source). The ipcache entry for the new identity will be
created with src=generated, which is weaker than src=kube-apiserver.

Unfortunately, the alternatives aren't great either. One of them is to
delete the ipcache entry with src=kube-apiserver and then re-add the
CIDR ipcache entry with src=generated. However, this is not atomic and
introduces an undesirable transient state in the datapath. The other
alternative is to address ciliumGH-18301, however that is non-trivial and was
avoided in this commit.

This commit allows for the aforementioned case to be handled and a unit
test was added to validate that the behavior is correct.

Fixes: 2b17d4d ("ipcache, policy: Inject labels from identity
metadata")

Co-authored-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 69e843c ]

Prevously, this code suffered from a TOCTOU bug. The IDMD lock would be
taken by two different calls into the IDMD logic and assumed that the
underlying IDMD would stay in the same state across both calls. However,
that is incorrect because of two separate calls took the IDMD lock, so
it's possible for the IDMD to change in between the two calls, hence
TOCTOU.

This commit fixes this by amending the API of the IDMD logic to expose
one function to perform the work of the previous two functions. Those
two functions are now unexported.

Fixes: 2b17d4d ("ipcache, policy: Inject labels from identity
metadata")

Reported-by: Joe Stringer <joe@cilium.io>
Signed-off-by: Chris Tarazi <chris@isovalent.com>
@christarazi christarazi requested a review from a team as a code owner January 6, 2022 07:14
@christarazi christarazi added backport/1.11 This PR represents a backport for Cilium 1.11.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master. labels Jan 6, 2022
@christarazi christarazi changed the title v1.11 backports 2022-01-05 [v1.11] Backport #18150 Jan 6, 2022
@christarazi
Copy link
Member Author

/test-backport-1.11

@pchaigno
Copy link
Member

pchaigno commented Jan 7, 2022

Seems it was a trivial backport without any conflict, so we could have handled this through the usual process.

@pchaigno
Copy link
Member

pchaigno commented Jan 7, 2022

L4LB XDP test is broken but will be fixed by the backport of #18370. Other tests are passing. Merging.

@pchaigno pchaigno merged commit 911f64c into cilium:v1.11 Jan 7, 2022
@christarazi christarazi deleted the pr/v1.11-backport-2022-01-05 branch January 7, 2022 17:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.11 This PR represents a backport for Cilium 1.11.x of a PR that was merged to main. kind/backports This PR provides functionality previously merged into master.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants