-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[v1.11] Backport #18150 #18390
Merged
pchaigno
merged 8 commits into
cilium:v1.11
from
christarazi:pr/v1.11-backport-2022-01-05
Jan 7, 2022
Merged
[v1.11] Backport #18150 #18390
pchaigno
merged 8 commits into
cilium:v1.11
from
christarazi:pr/v1.11-backport-2022-01-05
Jan 7, 2022
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
[ upstream commit 60d90f7 ] Previously, the label injection logic was using the source passed into InjectLabels(), and on top of that, modifying it instead of taking a copy. This is important because if we modify the source to upgrade it to KubeAPIServer source (highest priority), then all subsequent entries into the ipcache though the IDMD (from the same InjectLabels() invocation) will have KubeAPIServer source! This can cause entries to never be modifiable again on a subsequent invocation of InjectLabels(). Here's an example error from the ipcache-inject-labels controller: ``` ipcache-inject-labels 37m4s ago 14s ago 63 failed to inject labels into ipcache: failed to upsert 10.24.1.96 into ipcache with identity 6: unable to overwrite source "kube-apiserver" with source "custom-resource" ``` In the above error, we are failing to update ID 6! This meant that any updates to the remote-node-associated IPs would never be updateable (unless they somehow were made with the KubeAPIServer source). This commit fixes this by creating a copy of the source for each iteration through the IDMD instead of sharing the global, as well as fixing a typo where the source was not used from `toUpsert`. Fixes: 2b17d4d ("ipcache, policy: Inject labels from identity metadata") Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 471e3ef ] Previously, any IP that mapped to a reserved ID was considered as to-be-propagated to the ipcache in InjectLabels(). This was meant to ensure any reserved IDs associated with the kube-apiserver would be properly accounted for. To be specific with an example, if the kube-apiserver is running within the cluster and the local host contains the kube-apiserver, then the local host reserved ID (1) has the `reserved:host` + `reserved:kube-apiserver` labels. Cilium starts up with ID 1 only containing the `reserved:host` label, and only adds the `reserved:kube-apiserver` label once the K8s Endpoint watcher triggers the InjectLabels() logic. Hence why this condition existed in order to update reserved ID 1 when the `reserved:kube-apiserver` label was associated with it. The problem with the scope of the condition is that it leads to unnecessary updates to the ipcache. Specifically, any call to InjectLabels() will cause ipcache update failures to occur for reserved IDs 1 & 6, after InjectLabels() runs for the very first time. This is because node discovery will trigger InjectLabels() to insert all known IPs of the local node with source=Local. After this, the CiliumNode watcher will trigger InjectLabels() with source=CustomResource which causes the following failing controller: ``` ipcache-inject-labels 4m15s ago 17s ago 22 failed to inject labels into ipcache: failed to upsert 10.168.15.192 into ipcache with identity 6: unable to overwrite source "local" with source "custom-resource" ``` This commit fixes this by reducing the scope of the condition in which to propagate updates to the ipcache, to only any IDMD entries which contain the kube-apiserver label. This will include entries such as the reserved ID 1. In addition, ensure that the final set of labels that are attached to the identity object itself are used when updating the identity selector cache (UpdateIdentities()), rather than the labels assoicated with the prefix from the IDMD. If the final set is not used, then a race is possible where the next call to UpdateIdentities() will revert the work done in the first call, causing the selectors to lose which identities they selected. Fixes: 2b17d4d ("ipcache, policy: Inject labels from identity metadata") Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 884990f ] Previously, there existed logic to handle "transient identities" while Cilium is bootstrapping and receiving cluster information from EP & CN watchers, such that it might generate transient identities related to the kube-apiserver IPs. This logic would release the transient identity and re-allocate a new one if needed. Given that InjectLabels() already errors early if the K8s caches are not synced, it is actually not possible to have transient identities generated from incomplete information. In reality, by the time InjectLabels() is running, it has the full set of cluster information needed from EP & CN watchers, so the identities generated are "stable". This logic needs to be removed because it was actually causing a bug reported by a community user on Slack. The symptoms were that on any update to the CN (CiliumNode) resource, it was triggering the label injection, which was releasing and re-allocating the CIDR identity for the kube-apiserver that was running outside of the cluster. This unnecessary identity churn is problematic as it can cause policy drops and exhaust all the local identity range. Additionally, the question of what happens to the CIDR identity reference counts is actually handled inside InjectLabels() already, when the identity is neither reserved or new, it releases the identity (refcount deference). Hence, this commit fixes the identity churn by removing the extraneous, unneeded logic from injectLabels(). Fixes: 2b17d4d ("ipcache, policy: Inject labels from identity metadata") Reported-by: Alexander Block <ablock84@gmail.com> Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 6d0d411 ] The previous variable will always be non-empty as the callers from RemoveLabelsFromIPs() will provide labels to be removed from IPs, otherwise the call would be useless. This would cause identities to always be refreshed in the selector cache for no reason. Fixes: 2b17d4d ("ipcache, policy: Inject labels from identity metadata") Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 785b4e1 ] This will be useful in the next commit where the unit tests will rely on the changes in this commit to validate refcounts. Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit f1b30c1 ] This handles the case where there is a kube-apiserver (outside of the cluster) and CIDR policy selecting the same IP(s) and identity. In this case, there will be one identity that the policies both select (identity refcount == 2). When a kube-apiserver changes IP(s), then the previous IP(s) will no longer be assoicated with the kube-apiserver label inside the IDMD, which causes the kube-apiserver and ToCIDR policy to now select two different identities. One of the identities from above will become a pure CIDR identity because it will no longer have the kube-apiserver label (due to the IPs changing) and only have CIDR labels left over. Once that identity is generated, we need to replace the existing ipcache entry for the previously-associated IP(s) of the kube-apiserver with the new identity without the kube-apiserver label (just CIDR labels). However, in order to "replace" the ipcache entry, we needed to allow force upserting into the ipcache. This is because the ipcache entry for the kube-apiserver IPs would be created with src=kube-apiserver (which is the strongest source). The ipcache entry for the new identity will be created with src=generated, which is weaker than src=kube-apiserver. Unfortunately, the alternatives aren't great either. One of them is to delete the ipcache entry with src=kube-apiserver and then re-add the CIDR ipcache entry with src=generated. However, this is not atomic and introduces an undesirable transient state in the datapath. The other alternative is to address ciliumGH-18301, however that is non-trivial and was avoided in this commit. This commit allows for the aforementioned case to be handled and a unit test was added to validate that the behavior is correct. Fixes: 2b17d4d ("ipcache, policy: Inject labels from identity metadata") Co-authored-by: Joe Stringer <joe@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com>
[ upstream commit 69e843c ] Prevously, this code suffered from a TOCTOU bug. The IDMD lock would be taken by two different calls into the IDMD logic and assumed that the underlying IDMD would stay in the same state across both calls. However, that is incorrect because of two separate calls took the IDMD lock, so it's possible for the IDMD to change in between the two calls, hence TOCTOU. This commit fixes this by amending the API of the IDMD logic to expose one function to perform the work of the previous two functions. Those two functions are now unexported. Fixes: 2b17d4d ("ipcache, policy: Inject labels from identity metadata") Reported-by: Joe Stringer <joe@cilium.io> Signed-off-by: Chris Tarazi <chris@isovalent.com>
christarazi
added
backport/1.11
This PR represents a backport for Cilium 1.11.x of a PR that was merged to main.
kind/backports
This PR provides functionality previously merged into master.
labels
Jan 6, 2022
/test-backport-1.11 |
2 tasks
pchaigno
approved these changes
Jan 7, 2022
Seems it was a trivial backport without any conflict, so we could have handled this through the usual process. |
L4LB XDP test is broken but will be fixed by the backport of #18370. Other tests are passing. Merging. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
backport/1.11
This PR represents a backport for Cilium 1.11.x of a PR that was merged to main.
kind/backports
This PR provides functionality previously merged into master.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Once this PR is merged, you can update the PR labels via: