Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CiliumEndpointSlice Graduation to Stable #31904

Open
5 of 6 tasks
thorn3r opened this issue Apr 11, 2024 · 8 comments
Open
5 of 6 tasks

CiliumEndpointSlice Graduation to Stable #31904

thorn3r opened this issue Apr 11, 2024 · 8 comments
Assignees
Labels
kind/cfp pinned These issues are not marked stale by our issue bot. sig/agent Cilium agent related.
Milestone

Comments

@thorn3r
Copy link
Contributor

thorn3r commented Apr 11, 2024

CiliumEndpointSlice Graduation to Stable

SIG: sig-scalability

Begin Design Discussion: 2024-04-11

Cilium Release: 1.16

Authors: Tim Horner timothy.horner@isovalent.com, Marcel Zięba marcel.zieba@isovalent.com, Marco Iorio marco.iorio@isovalent.com

Summary

CiliumEndpointSlice facilitates the grouping of multiple CiliumEndpoint objects to achieve better scalability. This document describes the remaining work needed to consider CiliumEndpointSlice a Stable feature.

Goals

  • Blanket test CiliumEndpointSlice in all CI workflows.

    • CiliumEndpointSlice will be temporarily enabled for all CI workflows to identify failures or feature incompatibilities. The goal of the blanket test is to provide guidance on areas of improvement to focus on and any gaps in tooling.
  • Enable CiliumEndpointSlice in CI tests.

    • The feature has unit test coverage but there is no end-to-end testing. CiliumEndpointSlice will be enabled in the Conformance E2E and 100 Nodes Scale Test workflows. Additional workflows may be added based on the findings of the blanket test.
  • Identify feature incompatibilites

    • Identify features that rely on the use of CiliumEndpoint and summarize the work needed to support CiliumEndpointSlice. A decision should be made per-feature whether the incompatibility should be fixed or documented. The decision should be made based on the amount of work needed and the value add of making the feature compatible with CiliumEndpointSlice.
  • Simplify CiliumEndpointSlice configuration

    • Cilium Operator accepts the following 8 options for configuring CiliumEndpointSlice:

      • ces-max-ciliumendpoints-per-ces
      • ces-slice-mode
      • ces-write-qps-limit
      • ces-write-qps-burst
      • ces-enable-dynamic-rate-limit
      • ces-dynamic-rate-limit-nodes
      • ces-dynamic-rate-limit-qps-limit
      • ces-dynamic-rate-limit-qps-burst
    • Configuration should be simplified by setting sane defaults and removing any unused or deprecated options. CiliumEndpointSlice should use its own Kubernetes ClientSet to decouple it from the ClientSet used by the CiliumOperator for all other resources.

  • Define a migration path from CiliumEndpoint to CiliumEndpointSlice

Tasks

  1. thorn3r
  2. thorn3r
  3. feature/egress-gateway kind/enhancement pinned sig/datapath
  4. thorn3r
  5. area/CI area/CI-improvement sig/agent
    jshr-w
  6. ready-to-merge release-note/misc
    thorn3r
@antonipp
Copy link
Contributor

Thank you for starting this! 🙌

Just for future reference, my PR #31800 already has the docs for the Define a migration path from CiliumEndpoint to CiliumEndpointSlice item (although there are no tests)

@thorn3r
Copy link
Contributor Author

thorn3r commented Apr 12, 2024

@antonipp I saw that, nice work and thank you for writing that up!

thorn3r added a commit to thorn3r/design-cfps that referenced this issue Apr 15, 2024
Add CFP design proposal for graduation CiliumEndpointSlice to
'stable'.

cilium/cilium#31904

Signed-off-by: Tim Horner <timothy.horner@isovalent.com>
@marseel
Copy link
Contributor

marseel commented Apr 16, 2024

Issue related to CES and ces-slice-mode option: #31564

@lmb lmb added sig/agent Cilium agent related. kind/cfp labels Apr 16, 2024
@dlapcevic
Copy link
Contributor

I think ces-slice-mode should be deprecated together with Identity batching mode, and FCFS batching mode will be the default and only one.

Reason:
FCFS is a simpler algorithm (fewer possible bugs) that has better performance for almost all cases.

@dlapcevic
Copy link
Contributor

dlapcevic commented Apr 17, 2024

Is the purpose of having a separate K8s client for CES to ensure that it doesn't impact or get impacted by other resources -- client side rate limiting for example?

If so, is there a pattern that can be applied across resources? Which resources require a separate client?

@marseel
Copy link
Contributor

marseel commented Apr 18, 2024

If so, is there a pattern that can be applied across resources? Which resources require a separate client?

I don't think there is a pattern in Cilium yet, but definitely, there are other controllers for which we would like to do it too.
The pattern is mostly inspired by kcm in k8s.

@jshr-w
Copy link
Contributor

jshr-w commented Apr 18, 2024

I would like to help with the CI coverage for the migration path, as in the issue above. Would someone be able to help assign the issue to me, please?

@thorn3r
Copy link
Contributor Author

thorn3r commented Apr 19, 2024

Agreed that the identity-based mode should be deprecated. We should also add some validation for the identity slice mode option. Currently, if anything other than cesSliceModeIdentity is supplied (including typos), you get FCFS. The risk is low, but still worth covering the bases imo.

@joestringer joestringer added the pinned These issues are not marked stale by our issue bot. label May 9, 2024
@thorn3r thorn3r added this to the 1.16 milestone May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/cfp pinned These issues are not marked stale by our issue bot. sig/agent Cilium agent related.
Projects
None yet
Development

No branches or pull requests

7 participants