Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clustermesh-apiserver: rework identities, endpoints and nodes synchronization to improve performance #25049

Merged

Conversation

giorio94
Copy link
Member

@giorio94 giorio94 commented Apr 21, 2023

This PR modifies the clustermesh-apiserver implementation to use the Resource[T] abstraction to retrieve CiliumIdentities, CiliumNodes and CiliumEndpoints from Kubernetes, rather than a vanilla informer, and synchronize them through the WorkqueueSyncStore. This ensures consistent behavior across all resource types (i.e., the retry logic is implemented by the SyncStore, rather than through the Resource[T] underlying workqueue).

This approach has several advantages, including:

  • Enables retrying kvstore operations in case of temporary failures;
  • Performs coalescencing in case of multiple updates for the same key;
  • Simplifies the overall logic, since conversions and sanity checks are already performed by the Resource[T] abstraction.

Please, refer to the commit descriptions for more details about the individual changes.

Related: #25562

clustermesh-apiserver: rework identities, endpoints and nodes synchronization to improve performance

@giorio94 giorio94 added area/clustermesh Relates to multi-cluster routing functionality in Cilium. release-note/misc This PR makes changes that have no direct user impact. labels Apr 21, 2023
@marseel marseel self-requested a review April 26, 2023 11:15
@giorio94 giorio94 force-pushed the mio/clustermesh-apiserver-resources branch from 7c3108f to 82347ca Compare April 27, 2023 13:25
@giorio94 giorio94 force-pushed the mio/clustermesh-apiserver-resources branch from 82347ca to 91e3a44 Compare May 5, 2023 16:48
@giorio94 giorio94 changed the title clustermesh-apiserver: read identities through Resource[T] abstraction clustermesh-apiserver: rework objects synchronization to improve resiliency/performance May 5, 2023
@giorio94 giorio94 changed the title clustermesh-apiserver: rework objects synchronization to improve resiliency/performance clustermesh-apiserver: rework identities, endpoints and nodes synchronization to improve performance May 5, 2023
@giorio94 giorio94 added the kind/performance There is a performance impact of this. label May 5, 2023
@giorio94 giorio94 force-pushed the mio/clustermesh-apiserver-resources branch from 91e3a44 to f8fef13 Compare May 5, 2023 16:55
@giorio94
Copy link
Member Author

giorio94 commented May 8, 2023

/ci-multicluster

@giorio94 giorio94 force-pushed the mio/clustermesh-apiserver-resources branch 5 times, most recently from b2edfa2 to a0428a0 Compare May 11, 2023 12:32
@giorio94
Copy link
Member Author

Force pushed to rebase onto main and resolve conflicts.

@giorio94 giorio94 force-pushed the mio/clustermesh-apiserver-resources branch 3 times, most recently from e121ac1 to 7dd2cd5 Compare May 16, 2023 16:09
@giorio94 giorio94 marked this pull request as ready for review May 16, 2023 16:11
@giorio94 giorio94 requested review from a team as code owners May 16, 2023 16:11
@giorio94 giorio94 force-pushed the mio/clustermesh-apiserver-resources branch from 7611b3c to 466af29 Compare June 1, 2023 10:37
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-sign-off The author needs to add signoff to their commits before merge. label Jun 1, 2023
@giorio94
Copy link
Member Author

giorio94 commented Jun 1, 2023

Rebased onto main to fix conflicts. I've also added one additional commit to drop the temporary backend global variable introduced in #25554, since no longer necessary.

@giorio94
Copy link
Member Author

giorio94 commented Jun 1, 2023

/test

@giorio94 giorio94 force-pushed the mio/clustermesh-apiserver-resources branch from 466af29 to 8bf6550 Compare June 1, 2023 12:03
@giorio94
Copy link
Member Author

giorio94 commented Jun 1, 2023

/test

@giorio94 giorio94 force-pushed the mio/clustermesh-apiserver-resources branch from 8bf6550 to e323efc Compare June 1, 2023 15:31
@giorio94
Copy link
Member Author

giorio94 commented Jun 1, 2023

Rebased onto main to fix conflicts.

@giorio94
Copy link
Member Author

giorio94 commented Jun 1, 2023

/test

Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed:

Click to show.

Test Name

K8sDatapathServicesTest Checks N/S loadbalancing Tests with XDP, direct routing, DSR and Maglev

Failure Output

FAIL: Can not connect to service "tftp://[fd04::11]:32754/hello" from outside cluster (7/10)

Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/384/

If it is a flake and a GitHub issue doesn't already exist to track it, comment /mlh new-flake Cilium-PR-K8s-1.26-kernel-net-next so I can create one.

Then please upload the Jenkins artifacts to that issue.

@giorio94 giorio94 force-pushed the mio/clustermesh-apiserver-resources branch from e323efc to d3c19cb Compare June 5, 2023 06:38
@giorio94
Copy link
Member Author

giorio94 commented Jun 5, 2023

Rebased onto main to fix conflicts.

@giorio94
Copy link
Member Author

giorio94 commented Jun 5, 2023

/test

Let's change the NewWorkqueueSyncStore function to always take the
clusterName as parameter, rather than relying on an option. This
prevents inadvertently forgetting to set it.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This commit extends the IPIdentityPair struct implementing the store.Key
interface, to allow reading/writing it through the store abstractions.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Add the CiliumSlimEndpointResource constructor, to be leveraged by
the clustermesh-apiserver to improve the synchronization logic.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This preparatory commit introduces the `synchronize` helper function to
the clustermesh-apiserver, which will be subsequently used to uniform
the synchronization process of the different resources from kubernetes
to the kvstore. Essentially, the goal is to replace the vanilla
informers with the "Resource[T]" abstraction, which better fits in the
hive/cell paradigm and reduces the amount of boilerplate code. Each
received event is then fed into the specific `synchronizer`
implementation (to be introduced in the subsequent commits). It will
take care of converting the object into the appropriate kvstore
representation, and synchronizing it as required.

Note: the synchronizer interface is not generic, since that approach
turned out to cause issues with the `unused` linter, which failed to
recognize the different type-specific implementations as actually used.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This commit modifies the clustermesh-apiserver to read CiliumNodes
through the Resource[T] abstraction rather than a vanilla informer, and
synchronize them through the WorkqueueSyncStore. This ensures consistent
behavior across all resource types (i.e., the retry logic is implemented
by the SyncStore, rather than through the Resource[T] underlying workqueue).

This approach has several advantages compared to the previous
implementation, including:
* Enables retrying kvstore operations in case of temporary failures;
* Performs coalescencing in case of multiple updates for the same key;
* Simplifies the overall logic, since conversions and sanity checks are
  already performed by the Resource[T] abstraction.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This commit modifies the clustermesh-apiserver to read CiliumIdentities
through the Resource[T] abstraction rather than a vanilla informer, and
synchronize them through the WorkqueueSyncStore. This ensures consistent
behavior across all resource types (i.e., the retry logic is implemented
by the SyncStore, rather than through the Resource[T] underlying workqueue).

This approach has several advantages compared to the previous
implementation, including:
* Enables retrying kvstore operations in case of temporary failures;
* Performs coalescencing in case of multiple updates for the same key;
* Simplifies the overall logic, since conversions and sanity checks are
  already performed by the Resource[T] abstraction.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This commit modifies the clustermesh-apiserver to read CiliumEndpoints
through the Resource[T] abstraction rather than a vanilla informer, and
synchronize them through the WorkqueueSyncStore. This ensures consistent
behavior across all resource types (i.e., the retry logic is implemented
by the SyncStore, rather than through the Resource[T] underlying workqueue).

This approach has several advantages compared to the previous
implementation, including:
* Enables retrying kvstore operations in case of temporary failures;
* Performs coalescencing in case of multiple updates for the same key;
* Simplifies the overall logic, since conversions and sanity checks are
  already performed by the Resource[T] abstraction.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 force-pushed the mio/clustermesh-apiserver-resources branch from d3c19cb to b188c84 Compare June 5, 2023 07:20
@giorio94 giorio94 removed the dont-merge/blocked Another PR must be merged before this one. label Jun 5, 2023
@giorio94
Copy link
Member Author

giorio94 commented Jun 5, 2023

/test

The temporary global backend variable is now only used in a very limited
number of places. Let's propagate that parameter and drop the global variable.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94 giorio94 force-pushed the mio/clustermesh-apiserver-resources branch from b188c84 to a4f3f26 Compare June 5, 2023 07:35
@giorio94
Copy link
Member Author

giorio94 commented Jun 5, 2023

/test

@giorio94
Copy link
Member Author

giorio94 commented Jun 5, 2023

All reviews from codeowners are in, and all tests are green. Marking as ready to merge

@giorio94 giorio94 added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 5, 2023
@dylandreimerink dylandreimerink merged commit e4c082c into cilium:main Jun 5, 2023
61 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. kind/performance There is a performance impact of this. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants