New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor LocalNode synchronization logic and remove NodeChain #29319
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
giorio94
added
kind/cleanup
This includes no functional changes.
area/daemon
Impacts operation of the Cilium daemon.
release-note/misc
This PR makes changes that have no direct user impact.
area/modularization
labels
Nov 22, 2023
/test |
giorio94
force-pushed
the
mio/local-node-refactor
branch
from
November 22, 2023 13:30
4c45df1
to
044f849
Compare
/test |
giorio94
force-pushed
the
mio/local-node-refactor
branch
from
November 22, 2023 14:37
044f849
to
89e1ef4
Compare
/test |
giorio94
force-pushed
the
mio/local-node-refactor
branch
from
November 23, 2023 11:15
89e1ef4
to
33e8323
Compare
This comment was marked as resolved.
This comment was marked as resolved.
maintainer-s-little-helper
bot
added
the
dont-merge/needs-sign-off
The author needs to add signoff to their commits before merge.
label
Nov 23, 2023
giorio94
force-pushed
the
mio/local-node-refactor
branch
from
November 23, 2023 11:16
33e8323
to
65d9cda
Compare
maintainer-s-little-helper
bot
removed
the
dont-merge/needs-sign-off
The author needs to add signoff to their commits before merge.
label
Nov 23, 2023
/test |
1 similar comment
/test |
ldelossa
approved these changes
Nov 30, 2023
The service cache needs to be informed about the labels associated with the local node, and react to possible changes, in case service topology aware hints support is enabled. As a preparatory change to subsequently remove the local node chain and uniform the handling of local node updates, let's switch the service cache to directly depend on the LocalNodeStore, and observe it. While being at it, let's also extract the associated configuration option, and register it through hive. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
These fields are used when creating/updating the CiliumNode resource corresponding to the given local node. Hence, let's add them here, so that there's a consistent source of truth, and we can eventually get rid of the separate getter in the node discovery module. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
These fields, and in particular cluster name, cluster ID and node identity are currently hard-coded in the fillLocalNode() node discovery function. As a preparation for the subsequent refactoring of that module, let's configure them also in the local node store. Additionally, let's also configure the fallback node name (based on the configured environment variable and hostname), to be used when not running in a Kubernetes node (most notably for external workloads). Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Currently, the local node representation is initialized at start-up time by a dedicated cell, merging information from the local configuration and the corresponding k8s node object. Then, labels and annotations are indirectly kept in sync through a piece of logic part of the endpoint manager, which observes updates of the kubernetes node. Finally, yet another observer awakes the node discovery module, that propagates them to the CiliumNode resource (and optionally the kvstore representation). As an effort to simplify and centralize the local node synchronization logic, this commit extends the cell previously handling initialization tasks only to be also responsible for keeping the local representation in sync with the corresponding k8s node object for what concerns labels and annotations. Additionally, it also introduces the setup and synchronization of the local node UID and ProviderID, as well as support for incremental updates of labels and annotations. Hence, allowing other components (such as the WireGuard subsystem) to possibly register additional labels and or annotations, which get merged with the ones retrieved from k8s, instead of being blindly overwritten. The behavior in case the same label/annotation which is injected is also present as part of the k8s node is currently undefined (as it depends on events ordering), but it is not expected to happen in any legitimate case. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
The Update method of the LocalNodeStore directly mutates the cached value, and then emits a shallow copy, to be eventually seen, and possibly stored, by all observers. Yet, when setting a new address, we might mutate the existing slice, causing this modification to be incorrectly propagated also to older copies (as a shallow copy of a slice points to the same one). To prevent this issue, let's always create a clone of the addresses slice before mutating it. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Additionally configure the encryption key to the placeholder value, and the annotation containing the public key value at startup time, so that the node discovery synchronization logic doesn't need to mutate the retrieved object specifically for wireguard. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Currently, we filter the annotations when converting the internal local node representation to the corresponding CiliumNode object. Yet, this makes more problematic for additional modules (such as WireGuard) to configure additional annotations, as they require extra ad-hoc logic. Instead, let's filter the annotations when creating the internal representation from the k8s node object, and then propagate all annotations to the CiliumNode object. For the sake of generality, we propagate all cilium specific annotation, i.e., whose prefix matches the '^([A-Za-z0-9]+\.)*cilium.io/' regex. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
As a preparation for a subsequent commit, which will directly propagate the local node object as parameter rather than storing it as a global, let's make sure that at any time there's at most one active controller to synchronize the node object to the kvstore, and that it is updated correctly when necessary. This prevents a race condition that could possibly cause out-of-order upsertions of incorrect versions of the local node. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Currently, the node discovery module, which is responsible for synchronizing the local node object with the corresponding CiliumNode and kvstore representations, maintains its own internal representation of the local node object, makes heavy usage of the global getter functions of the node package, and depends on external triggers to execute a new synchronization run once the local node changes. Overall, this leads to possible subtle discrepancies between the CiliumNode and kvstore contents, as well as makes the logic difficult to follow. This commit refactors the module to directly depend on the LocalNodeStore to pull the appropriate information, as well as to trigger a new synchronization whenever appropriate. This ensures that updates of the local representation are always propagated to the corresponding CRD and kvstore representations. As a side effect, it also drops the ciliumNodeUpdater object, which is now no longer necessary to trigger an indirect synchronization run when the local kubernetes node changes. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
The local node representation already contains all addresses (NodeInternal, NodeExternal, and NodeCiliumInternal) that need to be propagated into the CiliumNode CR. Hence, let's just copy them instead of pulling the addresses again from the k8s Node resource. This also mimics what we do in the kvstore case. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Let's retrieve these fields from the local node internal representation, so that we don't need a separate getter to access them from the underlying k8s node object. This also ensures that we use the local node object as a consistent source of truth during synchronization into the CiliumNode CR and the corresponding kvstore representation. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Use the LocalNodeStore API instead of the legacy global setter functions, for consistency with the rest of the node discovery module. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
The endpoint manager needs to be informed about the labels associated with the local node, and react to possible changes, as they get associated with the host endpoint. To this end, it is currently registered as a subscriber to the local node chain. This commit switches it to directly depend on, and observe the LocalNodeStore, to uniform the handling of local node updates and prepare for the subsequent removal of the local node chain. Additionally, it also drops the extraneous updates of local node labels and annotations, which have been migrated in a previous commit to be handled by the dedicated module. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
After the conversion of all NodeChain consumers over to directly depending on the LocalNodeStore, let's get rid of all the no longer used code implementing the chain logic. As part of it, we also stop explicitly blocking for synchronization of the local node resource from k8s as part of daemon initialization. Yet, the LocalNodeStore start hook already directly depends on it (and blocks until the first event is received), and that is guaranteed to always happen before the creation of the daemon object. Hence, leading to the same result. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
giorio94
force-pushed
the
mio/local-node-refactor
branch
from
December 1, 2023 07:55
12e592c
to
f3d2c21
Compare
Rebased onto main to fix a conflict. |
giorio94
removed
the
dont-merge/discussion
A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status.
label
Dec 1, 2023
/test |
giorio94
removed
the
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
label
Dec 1, 2023
maintainer-s-little-helper
bot
added
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
labels
Dec 1, 2023
Conformance External Workloads hit #29312 |
Conformance AKS failed towards |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
area/daemon
Impacts operation of the Cilium daemon.
area/modularization
kind/cleanup
This includes no functional changes.
ready-to-merge
This PR has passed all tests and received consensus from code owners to merge.
release-note/misc
This PR makes changes that have no direct user impact.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The intent of this PR is to:
Please review commit by commit and refer to the respective descriptions for additional details.
Depends on #29347