operator/clustermesh: add endpoint slice synchronization #28440
Note that I'm creating this PR early, but there are many things still to iron out, such as Helm chart integration, probably integration tests, and documentation. I have not really tested it so far; I can only say that the reconciler unit tests succeed and the code compiles. I will try to address those in the next week or so.
It seems to work fairly well locally:
I was hoping to add integration tests, but IIUC there is no test for clustermesh, so it would probably be a bit harder; I can check if needed. I could add the mcs-api conformance tests, which apparently cover endpointslice mirroring from remote clusters as well. However, those need Cilium support for ServiceImport/Export, so that should probably be done in a future PR I think 🤔.
Sure, I re-requested a review to myself, I'll take a look ASAP.
/test
This looks cleaner, thanks! 🚀
I like that we are not changing the operator and clustermesh-apiserver resources anymore, just customizing the agent ones as needed. 👍
I've just left a comment regarding a unit test that I'm afraid might become flaky. Let me know what you think.
Approving one more time after latest changes. 🥇
Thanks again!
Looks good!
I left two small non-blocking nits.
install/kubernetes/cilium/templates/cilium-operator/deployment.yaml
This commit adds a new clustermesh module inside the operator which pulls data from Cilium's clustermesh components and, unlike the Cilium agent, acts at the whole-cluster level instead of the node level.

This adds optional endpoint slice synchronization from remote clusters. Having the endpoint slices of other clusters is useful for non-Cilium controllers that need this information (for instance CoreDNS or a non-Cilium ingress controller). This is also a base to support the endpoint slice synchronization from the Multi-Cluster Services API (KEP-1645), and it is actually already using an mcs-api label to identify the source cluster.

The endpoint slice synchronization is based on the existing upstream Kubernetes endpoint slice controller/reconciler, but adapted. Thanks to this we do not have to write ourselves the extensive logic needed to split the EndpointSlices, reuse them, handle dual stack, and so on. As we don't deal with Pods, there is some logic to "mock" the pod informer passed to the controller and, based on the global services info in the cluster mesh, return "fake" pods. There is also some similar mocking done for the service informer and for the client that creates EndpointSlices, as there should be one set of EndpointSlices per cluster. There is also a fake node informer that essentially returns a fake "node" per cluster.

The endpoint slice synchronization is guarded by a new annotation `cilium.io/global-sync-endpoint-slices`, as enabling it on global services with high churn may be a scalability concern for the etcd and/or kube-apiserver of all the Kubernetes clusters which involve the same global services with the endpoint slice sync enabled. Note that the upstream controller is already optimized to make a low number of requests to kube-apiserver, which alleviates some of the scalability concerns.

Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
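To make the gating and labelling described in this commit message concrete, here is a minimal Go sketch; it is not the code from this PR. It assumes the annotation key as spelled in the documentation excerpt later in this thread (`service.cilium.io/global-sync-endpoint-slices`), assumes the value is parsed as a boolean, and uses a hypothetical controller name for the managed-by label. The label keys themselves are the standard Kubernetes service-name and managed-by labels and the mcs-api source-cluster label.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	// Annotation key as spelled in the docs excerpt quoted later in this thread.
	annotationSyncEndpointSlices = "service.cilium.io/global-sync-endpoint-slices"

	// Standard Kubernetes and mcs-api label keys.
	labelServiceName   = "kubernetes.io/service-name"
	labelManagedBy     = "endpointslice.kubernetes.io/managed-by"
	labelSourceCluster = "multicluster.kubernetes.io/source-cluster"

	// Hypothetical value, for illustration only; the real controller name may differ.
	meshControllerName = "example-clustermesh-endpointslice-controller"
)

// syncEnabled reports whether a global service opted in to EndpointSlice
// synchronization. Assumption: a missing or unparsable value means "disabled",
// matching the documented default.
func syncEnabled(annotations map[string]string) bool {
	v, ok := annotations[annotationSyncEndpointSlices]
	if !ok {
		return false
	}
	enabled, err := strconv.ParseBool(v)
	return err == nil && enabled
}

// mirroredSliceLabels builds the labels a mirrored EndpointSlice could carry:
// the owning service, the remote cluster it comes from, and who manages it.
func mirroredSliceLabels(serviceName, sourceCluster string) map[string]string {
	return map[string]string{
		labelServiceName:   serviceName,
		labelSourceCluster: sourceCluster,
		labelManagedBy:     meshControllerName,
	}
}

func main() {
	annotations := map[string]string{annotationSyncEndpointSlices: "true"}
	if syncEnabled(annotations) {
		fmt.Println(mirroredSliceLabels("my-service", "cluster2"))
	}
}
```

Keeping the opt-in check separate from the label construction mirrors the split in the commit message: the annotation decides whether anything is mirrored at all, while the source-cluster label keeps one set of EndpointSlices per remote cluster distinguishable.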
- Allow the operator write access to endpointslices
- Allow the operator to write events, as the Kubernetes controller writes some in case of errors
- Add similar configuration as the Cilium agent regarding clustermesh

Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
This commit allows Cilium watchers to ignore EndpointSlices mirrored by the Cilium clustermesh EndpointSlice controller. We do not need to watch those EndpointSlices, as Cilium already has special datapath logic for them. Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
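As a rough illustration of the filtering idea in this commit, the sketch below drops mirrored EndpointSlices based on the standard `endpointslice.kubernetes.io/managed-by` label. This is an assumption about the mechanism, not the PR's implementation: the real watchers may filter server-side with a label selector instead, and the `sliceMeta` type and controller name here are hypothetical.

```go
package main

import "fmt"

const (
	// Standard Kubernetes label recording which controller manages an EndpointSlice.
	labelManagedBy = "endpointslice.kubernetes.io/managed-by"

	// Hypothetical value, for illustration only; the real controller name may differ.
	meshControllerName = "example-clustermesh-endpointslice-controller"
)

// sliceMeta is a hypothetical stand-in for the metadata of an EndpointSlice.
type sliceMeta struct {
	Name   string
	Labels map[string]string
}

// isMirrored reports whether the labels mark a slice as mirrored by the
// clustermesh EndpointSlice controller.
func isMirrored(labels map[string]string) bool {
	return labels[labelManagedBy] == meshControllerName
}

// localSlices returns only the slices a watcher should keep processing,
// preserving their original order.
func localSlices(all []sliceMeta) []sliceMeta {
	kept := make([]sliceMeta, 0, len(all))
	for _, s := range all {
		if !isMirrored(s.Labels) {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	slices := []sliceMeta{
		{Name: "svc-cluster2-abcde", Labels: map[string]string{labelManagedBy: meshControllerName}},
		{Name: "svc-xyz12", Labels: map[string]string{labelManagedBy: "endpointslice-controller.k8s.io"}},
	}
	for _, s := range localSlices(slices) {
		fmt.Println(s.Name)
	}
}
```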
Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
Now that we synchronize EndpointSlices, we also need to synchronize headless Services so that we can synchronize their EndpointSlices too. We implement this by removing the exclusion of headless Services from the regular Endpoints/Service resources and adding it back inside the daemon. As a result, clustermesh-apiserver and the operator will now watch headless Services/Endpoints as well. Signed-off-by: Arthur Outhenin-Chalandre <arthur@cri.epita.fr>
/test
Looks good for the Helm changes
Great work 🎉
By default Kubernetes EndpointSlice synchronization is disabled on Global services.
To have Cilium discover remote clusters endpoints of a Global Service
from DNS or any third party controllers, enable synchronization by adding
the annotation ``service.cilium.io/global-sync-endpoint-slices: "true"``.
I will not block the PR on this. @MrFreezeex can we have a follow-up PR with the following sentence:
the annotation ``service.cilium.io/global-sync-endpoint-slices: "true"``.
This will allow Cilium to create k8s endpoint slices that belong to a remote cluster for services that have that annotation.
Sure 👍, I will try to include that change in a separate commit in one of my followup PRs that I intend to do to address the hostname sync limitation.
Huge thanks to all the people involved here ❤️. Many of you provided super insightful reviews that made the code a lot better, wrote encouraging messages or even provided direct help! Now I am excited to have other users trying this, and to see some of you around for the next followups/related PRs :D.
This commit adds a new clustermesh module inside the operator which pulls data from Cilium's clustermesh components and synchronizes EndpointSlices from remote clusters. See the commit descriptions for more details.
Related to: #27902