clustermesh-apiserver service sessionAffinity regression #32646

Closed
2 of 3 tasks
thorn3r opened this issue May 21, 2024 · 0 comments · Fixed by #32657
Assignees
Labels
area/clustermesh Relates to multi-cluster routing functionality in Cilium. kind/bug This is a bug in the Cilium logic. kind/regression This functionality worked fine before, but was broken in a newer release of Cilium.

Comments

@thorn3r
Contributor

thorn3r commented May 21, 2024

Is there an existing issue for this?

  • I have searched the existing issues

What happened?

Support for multiple clustermesh-apiserver replicas was recently merged into main. This change configures sessionAffinity: ClientIP for the clustermesh-apiserver service by default. When using a service of type LoadBalancer, the cloud provider deploys the backing LB. If the cloud provider does not support this type of session affinity, the LB will fail to deploy and the service will be unreachable. It seems at least Amazon EKS does not support this, which breaks Cluster Mesh deployments in this environment.

kube-system           clustermesh-apiserver                                LoadBalancer   172.20.33.170    <pending>                                                                   2379:31491/TCP                 8m27s
Warning  SyncLoadBalancerFailed  3m46s (x14 over 44m)  service-controller  Error syncing load balancer: failed to ensure load balancer: unsupported load balancer affinity: ClientIP
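
For illustration, the combination that triggers the error above can be sketched in Go. This is a minimal model, not Cilium's or Kubernetes' own types: `ServiceSpec`, `ServicePort`, and `requestsLBAffinity` are hypothetical names that mirror only the Service fields mentioned in this report (type `LoadBalancer`, `sessionAffinity: ClientIP`, port 2379).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ServicePort and ServiceSpec are hypothetical, trimmed-down stand-ins for the
// Kubernetes Service spec. They carry only the fields relevant to this report.
type ServicePort struct {
	Port int `json:"port"`
}

type ServiceSpec struct {
	Type            string        `json:"type"`
	SessionAffinity string        `json:"sessionAffinity,omitempty"`
	Ports           []ServicePort `json:"ports"`
}

// requestsLBAffinity reports whether the spec asks for ClientIP session
// affinity on a LoadBalancer Service, which is the combination the
// service-controller rejected in the event above.
func requestsLBAffinity(s ServiceSpec) bool {
	return s.Type == "LoadBalancer" && s.SessionAffinity == "ClientIP"
}

func main() {
	spec := ServiceSpec{
		Type:            "LoadBalancer",
		SessionAffinity: "ClientIP",
		Ports:           []ServicePort{{Port: 2379}},
	}

	out, err := json.MarshalIndent(spec, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	fmt.Println("requests LB-level affinity:", requestsLBAffinity(spec))
}
```

Whether a given provider accepts this combination depends on its load balancer integration; the sketch only identifies the spec shape, it does not predict provider behavior.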

Cilium Version

main

Kernel Version

n/a

Kubernetes Version

n/a

Regression

This is a regression. By default, session affinity should only be enabled if the deployment is HA. There should also be an option to never use session affinity, for cases where a user would like to deploy Cluster Mesh HA on a provider that does not support it. Cloud provider defaults for this value should be set appropriately.
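
The expected behavior can be sketched as a small decision function. This is an illustration under assumptions, not the actual Helm template logic: the function and type names are hypothetical, and `HAOnly` is an assumed label for the default mode. `Always` and `Never` are the override names given in the commit message referenced in this issue, and "HA" is taken to mean more than one replica.

```go
package main

import "fmt"

// AffinityMode is a hypothetical type for the user-facing setting.
type AffinityMode string

const (
	ModeHAOnly AffinityMode = "HAOnly" // assumed name for the default behavior
	ModeAlways AffinityMode = "Always" // override named in the referenced commit
	ModeNever  AffinityMode = "Never"  // override named in the referenced commit
)

// sessionAffinity returns the Service sessionAffinity value to render:
// "ClientIP" to enable affinity, or "None" (the Kubernetes default) to
// leave it disabled.
func sessionAffinity(mode AffinityMode, replicas int) (string, error) {
	switch mode {
	case ModeAlways:
		return "ClientIP", nil
	case ModeNever:
		return "None", nil
	case ModeHAOnly:
		// Affinity only helps when several replicas sit behind the Service.
		if replicas > 1 {
			return "ClientIP", nil
		}
		return "None", nil
	default:
		return "", fmt.Errorf("unknown session affinity mode %q", mode)
	}
}

func main() {
	for _, replicas := range []int{1, 3} {
		v, err := sessionAffinity(ModeHAOnly, replicas)
		if err != nil {
			panic(err)
		}
		fmt.Printf("mode=%s replicas=%d sessionAffinity=%s\n", ModeHAOnly, replicas, v)
	}
}
```

With this shape, a user running multiple replicas on a provider that rejects ClientIP affinity would select `Never`, while single-replica deployments never request affinity unless `Always` is set.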

Sysdump

No response

Relevant log output

No response

Anything else?

No response

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct
@thorn3r thorn3r added kind/bug This is a bug in the Cilium logic. needs/triage This issue requires triaging to establish severity and next steps. kind/community-report This was reported by a user in the Cilium community, eg via Slack. labels May 21, 2024
@thorn3r thorn3r self-assigned this May 21, 2024
@giorio94 giorio94 added kind/regression This functionality worked fine before, but was broken in a newer release of Cilium. area/clustermesh Relates to multi-cluster routing functionality in Cilium. and removed needs/triage This issue requires triaging to establish severity and next steps. kind/community-report This was reported by a user in the Cilium community, eg via Slack. labels May 21, 2024
thorn3r added a commit to thorn3r/cilium that referenced this issue May 21, 2024
Some cloud providers may not support Service session affinity. Enabling
it by default can cause the clustermesh-apiserver Service to remain in
'pending' in some environments, as seen in EKS in issue cilium#32646.

Let's instead only enable session affinity if clustermesh-apiserver is
deployed with multiple replicas, as this was originally intended to
reduce reconnections in HA configurations. Optionally, the user can
override this behavior with "Always" or "Never".

fixes: cilium#32646
fixes: df3c02f ("ClusterMesh/helm: support multiple replicas")

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
github-merge-queue bot pushed a commit that referenced this issue May 28, 2024
sayboras pushed a commit that referenced this issue Jun 4, 2024
sayboras pushed a commit that referenced this issue Jun 10, 2024
[ upstream commit 22139b2 ]

[ Backporter's notes: Dropped schema definition from Helm values since
schema generation was not backported to v1.15-ce ]