Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor LocalNode synchronization logic and remove NodeChain #29319

Merged
merged 14 commits into from Dec 1, 2023

Conversation

giorio94
Copy link
Member

@giorio94 giorio94 commented Nov 22, 2023

The intent of this PR is to:

  • Centralize the logic to synchronize the LocalNode internal representation when the corresponding k8s node object is mutated (supporting labels and annotations);
  • Support incremental updates of LocalNode labels and annotations, so that other modules (such as WireGuard) can register additional ones.
  • Convert the remaining subscribers to the NodeChain over to the newer LocalNodeStore abstraction, and specifically the node discovery logic.
  • Remove the now no longer used NodeChain, and the associated logic.

Please review commit by commit and refer to the respective descriptions for additional details.

Depends on #29347

@giorio94 giorio94 added kind/cleanup This includes no functional changes. area/daemon Impacts operation of the Cilium daemon. release-note/misc This PR makes changes that have no direct user impact. area/modularization labels Nov 22, 2023
@giorio94
Copy link
Member Author

/test

@giorio94
Copy link
Member Author

/test

@giorio94
Copy link
Member Author

/test

@maintainer-s-little-helper

This comment was marked as resolved.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Nov 23, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Nov 23, 2023
@giorio94
Copy link
Member Author

/test

1 similar comment
@giorio94
Copy link
Member Author

/test

@giorio94 giorio94 marked this pull request as ready for review November 23, 2023 15:13
@giorio94 giorio94 requested review from a team as code owners November 23, 2023 15:13
The service cache needs to be informed about the labels associated with
the local node, and react to possible changes, in case service topology
aware hints support is enabled.

As a preparatory change to subsequently remove the local node chain and
uniform the handling of local node updates, let's switch the service
cache to directly depend on the LocalNodeStore, and observe it.

While being at it, let's also extract the associated configuration
option, and register it through hive.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
These fields are used when creating/updating the CiliumNode resource
corresponding to the given local node. Hence, let's add them here, so
that there's a consistent source of truth, and we can eventually get
rid of the separate getter in the node discovery module.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
These fields, and in particular cluster name, cluster ID and node
identity are currently hard-coded in the fillLocalNode() node
discovery function. As a preparation for the subsequent refactoring
of that module, let's configure them also in the local node store.

Additionally, let's also configure the fallback node name (based on the
configured environment variable and hostname), to be used when not
running in a Kubernetes node (most notably for external workloads).

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Currently, the local node representation is initialized at start-up
time by a dedicated cell, merging information from the local configuration
and the corresponding k8s node object. Then, labels and annotations
are indirectly kept in sync through a piece of logic part of the endpoint
manager, which observes updates of the kubernetes node. Finally, yet
another observer awakes the node discovery module, that propagates them
to the CiliumNode resource (and optionally the kvstore representation).

As an effort to simplify and centralize the local node synchronization
logic, this commit extends the cell previously handling initialization
tasks only to be also responsible for keeping the local representation
in sync with the corresponding k8s node object for what concerns labels
and annotations.

Additionally, it also introduces the setup and synchronization of the
local node UID and ProviderID, as well as support for incremental updates
of labels and annotations. Hence, allowing other components (such as
the WireGuard subsystem) to possibly register additional labels and
or annotations, which get merged with the ones retrieved from k8s,
instead of being blindly overwritten. The behavior in case the same
label/annotation which is injected is also present as part of the
k8s node is currently undefined (as it depends on events ordering),
but it is not expected to happen in any legitimate case.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
The Update method of the LocalNodeStore directly mutates the cached
value, and then emits a shallow copy, to be eventually seen, and
possibly stored, by all observers. Yet, when setting a new address,
we might mutate the existing slice, causing this modification to
be incorrectly propagated also to older copies (as a shallow copy
of a slice points to the same one). To prevent this issue, let's
always create a clone of the addresses slice before mutating it.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Additionally configure the encryption key to the placeholder value,
and the annotation containing the public key value at startup time,
so that the node discovery synchronization logic doesn't need to
mutate the retrieved object specifically for wireguard.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Currently, we filter the annotations when converting the internal
local node representation to the corresponding CiliumNode object.
Yet, this makes more problematic for additional modules (such as
WireGuard) to configure additional annotations, as they require
extra ad-hoc logic. Instead, let's filter the annotations when
creating the internal representation from the k8s node object, and
then propagate all annotations to the CiliumNode object. For the
sake of generality, we propagate all cilium specific annotation,
i.e., whose prefix matches the '^([A-Za-z0-9]+\.)*cilium.io/' regex.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
As a preparation for a subsequent commit, which will directly propagate
the local node object as parameter rather than storing it as a global,
let's make sure that at any time there's at most one active controller
to synchronize the node object to the kvstore, and that it is updated
correctly when necessary. This prevents a race condition that could
possibly cause out-of-order upsertions of incorrect versions of the
local node.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Currently, the node discovery module, which is responsible for
synchronizing the local node object with the corresponding CiliumNode
and kvstore representations, maintains its own internal representation
of the local node object, makes heavy usage of the global getter
functions of the node package, and depends on external triggers to
execute a new synchronization run once the local node changes. Overall,
this leads to possible subtle discrepancies between the CiliumNode and
kvstore contents, as well as makes the logic difficult to follow.

This commit refactors the module to directly depend on the
LocalNodeStore to pull the appropriate information, as well as to
trigger a new synchronization whenever appropriate. This ensures
that updates of the local representation are always propagated to
the corresponding CRD and kvstore representations.

As a side effect, it also drops the ciliumNodeUpdater object, which is
now no longer necessary to trigger an indirect synchronization run when
the local kubernetes node changes.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
The local node representation already contains all addresses
(NodeInternal, NodeExternal, and NodeCiliumInternal) that need
to be propagated into the CiliumNode CR. Hence, let's just copy
them instead of pulling the addresses again from the k8s Node
resource. This also mimics what we do in the kvstore case.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Let's retrieve these fields from the local node internal representation,
so that we don't need a separate getter to access them from the underlying
k8s node object. This also ensures that we use the local node object as a
consistent source of truth during synchronization into the CiliumNode CR
and the corresponding kvstore representation.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Use the LocalNodeStore API instead of the legacy global setter
functions, for consistency with the rest of the node discovery
module.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
The endpoint manager needs to be informed about the labels associated
with the local node, and react to possible changes, as they get
associated with the host endpoint. To this end, it is currently
registered as a subscriber to the local node chain.

This commit switches it to directly depend on, and observe the
LocalNodeStore, to uniform the handling of local node updates and
prepare for the subsequent removal of the local node chain.
Additionally, it also drops the extraneous updates of local node
labels and annotations, which have been migrated in a previous
commit to be handled by the dedicated module.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
After the conversion of all NodeChain consumers over to directly
depending on the LocalNodeStore, let's get rid of all the no longer
used code implementing the chain logic. As part of it, we also stop
explicitly blocking for synchronization of the local node resource
from k8s as part of daemon initialization. Yet, the LocalNodeStore
start hook already directly depends on it (and blocks until the first
event is received), and that is guaranteed to always happen before
the creation of the daemon object. Hence, leading to the same result.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94
Copy link
Member Author

giorio94 commented Dec 1, 2023

Rebased onto main to fix a conflict.

@giorio94 giorio94 removed the dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. label Dec 1, 2023
@giorio94
Copy link
Member Author

giorio94 commented Dec 1, 2023

/test

@giorio94 giorio94 removed the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Dec 1, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Dec 1, 2023
@giorio94
Copy link
Member Author

giorio94 commented Dec 1, 2023

Conformance External Workloads hit #29312

@giorio94
Copy link
Member Author

giorio94 commented Dec 1, 2023

Conformance AKS failed towards external-8844 (8.8.4.4:443).
I'm not rerunning it given that it is not required, and known to be flaky.

@aanm aanm added this pull request to the merge queue Dec 1, 2023
Merged via the queue into cilium:main with commit 649f2fe Dec 1, 2023
60 of 61 checks passed
@giorio94 giorio94 deleted the mio/local-node-refactor branch December 4, 2023 07:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/daemon Impacts operation of the Cilium daemon. area/modularization kind/cleanup This includes no functional changes. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet