Experimental load-balancing control-plane with StateDB #32185

Open: wants to merge 4 commits into base: main

@joamaki joamaki commented Apr 25, 2024

This is the beginnings of a new load-balancing control-plane implementation that aims to unify service load-balancing information under a single representation and to allow agent components to observe and update service-related information easily.

I'm approaching this first with experimental code rather than with a CFP, as IMO it's too early to bring up a CFP with wider distribution before there are answers to how to approach the many use-cases. Since there are multiple parties involved in this experiment and it will span a long period of time, I'm proposing to merge this skeleton implementation now to avoid the pain of feature branches. This code is in a separate package, is disabled by default and has no production impact other than a slightly larger cilium-agent binary.

The rough structure this experiment is working towards is:
[Figure: highlevel]

Instead of service load-balancing related data being spread over cache.Store, ServiceCache, ServiceManager, etc., we would unify it into 3 sets of tables that would serve the internal information needs and would be used for BPF map reconciliation. This would solve many of the problems we have with merging service-related data coming from multiple data sources, reduce memory usage and speed up reconciliation throughput, as fewer layers and less context switching are involved. See also the current structure for contrast.

Since we want to mature this implementation gradually and to answer the many small questions that remain
(topo-aware services, clustermesh, NodePort frontend IP expansion etc.), this PR
introduces the support behind a feature flag (enable-experimental-services) so that it can be tested and developed
gradually. This hidden feature flag starts a reflector that populates the load-balancing tables and a reconciler
that mocks the reconciliation operations.

Nothing uses the new API yet and the flag is disabled by default, so this code has no impact
on normal operation, but it allows further collaborative development on the API.

The implementation is structured as follows:

pkg/loadbalancer/experimental is chosen as the package for these. This clearly separates
the new code from the production code. Eventually it will make sense to move the code into the
loadbalancer package that already defines the data structures related to load-balancing.

pkg/loadbalancer/experimental/{service,frontend,backend}.go:
These are the load-balancing StateDB tables. To minimize data duplication we're dividing
these into Service (the metadata about the service), Frontend and Backend. To keep things simple
to start with, many of the fields in Service have been omitted. The Frontend is the object we're reconciling,
as it defines a useful unit of work for the BPF map reconciliation: backends need to be updated before frontends, and frontends (the services BPF map) directly reference them.
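
A rough sketch of how the three-way split could look. Every type, field and helper here (ServiceName, Source, Status, backendsFor) is a hypothetical simplification for illustration, not the definitions in pkg/loadbalancer/experimental, and plain slices stand in for StateDB tables:

```go
package main

import "fmt"

// NOTE: all types and fields below are hypothetical, simplified stand-ins.
// They are not the definitions from pkg/loadbalancer/experimental.

// ServiceName identifies a service by namespace and name.
type ServiceName struct {
	Namespace string
	Name      string
}

// Service holds only the metadata shared by all frontends of a service.
type Service struct {
	Name   ServiceName
	Source string // where the data came from, e.g. "k8s" (assumed field)
}

// Backend is one destination address, linked to the services that use it.
type Backend struct {
	Addr     string
	Services []ServiceName
}

// Frontend is the unit of reconciliation. It refers to its service by name
// instead of embedding a copy of the metadata.
type Frontend struct {
	Addr    string
	Service ServiceName
	Status  string // "pending" or "done" (assumed values)
}

// backendsFor returns the addresses of all backends that are associated
// with the frontend's service.
func backendsFor(fe Frontend, backends []Backend) []string {
	var addrs []string
	for _, be := range backends {
		for _, name := range be.Services {
			if name == fe.Service {
				addrs = append(addrs, be.Addr)
				break
			}
		}
	}
	return addrs
}

func main() {
	svc := Service{Name: ServiceName{Namespace: "default", Name: "web"}, Source: "k8s"}
	backends := []Backend{
		{Addr: "10.0.0.1:80", Services: []ServiceName{svc.Name}},
		{Addr: "10.0.0.2:80", Services: []ServiceName{{Namespace: "default", Name: "db"}}},
	}
	fe := Frontend{Addr: "10.96.0.10:80", Service: svc.Name, Status: "pending"}
	fmt.Println(fe.Addr, "->", backendsFor(fe, backends))
}
```

Keeping the service metadata in one place and linking frontends and backends to it by name is what avoids the duplication mentioned above.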

pkg/loadbalancer/experimental/services.go:
Implements the "writer" to the load-balancing tables. We don't want to give direct write access to the tables: changes need to be validated, and the frontend status needs to be marked as pending to ask the reconciler to process it.
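
To illustrate the idea of a guarded writer, here is a minimal sketch. The Writer type, its methods and the Status values are assumptions made up for this example (the real Services API is built on StateDB transactions and will differ); maps stand in for the tables:

```go
package main

import (
	"errors"
	"fmt"
)

// NOTE: hypothetical sketch; none of these names come from the PR's code.

type Status string

const (
	StatusPending Status = "pending"
	StatusDone    Status = "done"
)

type Frontend struct {
	Addr    string
	Service string
	Status  Status
}

// Writer is the only way to modify the tables, so every change passes
// through validation and every upserted frontend is marked pending.
type Writer struct {
	services  map[string]bool
	frontends map[string]Frontend
}

func NewWriter() *Writer {
	return &Writer{
		services:  map[string]bool{},
		frontends: map[string]Frontend{},
	}
}

// UpsertService registers a service after validating its name.
func (w *Writer) UpsertService(name string) error {
	if name == "" {
		return errors.New("empty service name")
	}
	w.services[name] = true
	return nil
}

// UpsertFrontend rejects frontends that refer to an unknown service and
// otherwise stores the frontend with its status set to pending.
func (w *Writer) UpsertFrontend(addr, service string) error {
	if !w.services[service] {
		return fmt.Errorf("service %q not found", service)
	}
	w.frontends[addr] = Frontend{Addr: addr, Service: service, Status: StatusPending}
	return nil
}

// Frontend looks up a frontend by address.
func (w *Writer) Frontend(addr string) (Frontend, bool) {
	fe, ok := w.frontends[addr]
	return fe, ok
}

func main() {
	w := NewWriter()
	fmt.Println(w.UpsertFrontend("10.96.0.1:443", "default/kubernetes")) // rejected: unknown service
	fmt.Println(w.UpsertService("default/kubernetes"))
	fmt.Println(w.UpsertFrontend("10.96.0.1:443", "default/kubernetes"))
	fe, _ := w.Frontend("10.96.0.1:443")
	fmt.Println(fe.Status)
}
```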

pkg/loadbalancer/experimental/reflector.go:
Observes the Kubernetes Service and EndpointSlice objects and writes to the tables via the Services API. Eventually this could be implemented directly with a client-go Reflector, which would avoid having a cache.Store that holds an unnecessary copy.

pkg/loadbalancer/experimental/reconciler.go:
A simulation of what the proper reconciler towards the BPF maps would look like. Just logs the operations we'd need to perform.
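
As an illustration of a log-only reconciler, the sketch below records operations instead of writing to BPF maps and emits the backend updates before the frontend update, following the ordering described for the tables. The mockReconciler type, the Frontend fields and the operation strings are hypothetical and do not reproduce the PR's reconciler or its log format:

```go
package main

import "fmt"

// NOTE: hypothetical sketch; the types and messages are made up for
// illustration only.

type Frontend struct {
	ID       uint16
	Addr     string
	Backends []string
}

// mockReconciler records the operations it would perform against the
// BPF maps instead of performing them.
type mockReconciler struct {
	ops []string
}

// record formats an operation, remembers it and prints it.
func (r *mockReconciler) record(format string, args ...any) {
	op := fmt.Sprintf(format, args...)
	r.ops = append(r.ops, op)
	fmt.Println(op)
}

// Update handles one frontend. Backends are emitted first because the
// frontend entry refers to them.
func (r *mockReconciler) Update(fe Frontend) {
	for _, be := range fe.Backends {
		r.record("Update Backend address=%s", be)
	}
	r.record("Update Frontend id=%d address=%s", fe.ID, fe.Addr)
}

func main() {
	r := &mockReconciler{}
	r.Update(Frontend{
		ID:       5,
		Addr:     "10.96.0.1:443",
		Backends: []string{"10.0.0.1:6443", "10.0.0.2:6443"},
	})
}
```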

The feature flag can be enabled and tested with:

$ cilium config set enable-experimental-services true
# wait for restart

$ kubectl exec -it -n kube-system ds/cilium -- bash
# cilium-dbg statedb experimental frontends
# cilium-dbg statedb experimental backends
# cilium-dbg statedb experimental services
$ kubectl logs -n kube-system ds/cilium | grep loadbalancer-experimental
time=2024-05-28T14:11:12Z level=info msg="Update RevNat" module=agent.controlplane.loadbalancer-experimental.reconciler id=5 address="{AddrCluster:10.96.0.1 L4Addr:{Protocol:TCP Port:443} Scope:0}"
...

The next steps are to start implementing a test suite that validates how this API works with all the different use-cases we have for it, and to implement the proper BPF map reconciler, so that we can use the FakeLBMap implementation for proper integration testing and start doing e2e testing.

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Apr 25, 2024
@joamaki joamaki force-pushed the feature/statedb-services branch 4 times, most recently from 55b99f9 to f2ce936 Compare April 25, 2024 14:37
@joamaki joamaki added the release-note/misc This PR makes changes that have no direct user impact. label Apr 25, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Apr 25, 2024

@robscott robscott left a comment

Thanks for all the work on this @joamaki! This looks really promising, mostly just have a bunch of questions to make sure I'm understanding things correctly.

Outdated, resolved review threads: pkg/loadbalancer/services.go (8), pkg/loadbalancer/loadbalancer.go (1), pkg/service/new_reconciler.go (1)
@DamianSawicki

Service type in comparison to SVC has additional fields related to reconciliation status and backend revision (the link between the two tables).

I believe Service should be Frontend.

Implements the "writer" to Table[Service] and Table[ServiceBackend].

And here probably these should be Table[Frontend] and Table[Backend].

@joamaki

This comment was marked as outdated.

@joamaki joamaki force-pushed the feature/statedb-services branch 2 times, most recently from 2703ef2 to 15b73d7 Compare May 16, 2024 15:01
@joamaki
Contributor Author

joamaki commented May 16, 2024

/test

@joamaki
Contributor Author

joamaki commented May 17, 2024

/test

@DamianSawicki DamianSawicki left a comment

Not very material this time: one question and the rest are just typos or rewording suggestions.

Outdated, resolved review threads: pkg/loadbalancer/experimental/tables.go (7), pkg/loadbalancer/experimental/wrapper.go (1), pkg/loadbalancer/loadbalancer.go (1)
@joamaki joamaki changed the title [DRAFT] Service load-balancing with StateDB [DRAFT] Experimental load-balancing control-plane with StateDB May 28, 2024
@joamaki joamaki force-pushed the feature/statedb-services branch 2 times, most recently from 35b7a08 to 79d2d74 Compare May 28, 2024 14:15
Implement JSON marshalling for AddrCluster type so it can
be used in StateDB objects and dumped with cilium-dbg.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
For dumping StateDB objects with ImmSet[T] fields, add support
for JSON marshalling and unmarshalling.

The default implementation does not work as ImmSet[T] has
private fields only.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki joamaki force-pushed the feature/statedb-services branch 2 times, most recently from 8ad9365 to a25882c Compare May 30, 2024 13:19
Add experimental Services API for managing load-balancing frontends
and backends. This is added as a new experimental package to avoid
confusing it with the production implementation.

When the hidden "--enable-experimental-services" flag is set:
* K8s Service and Endpoints are reflected to service, frontend and backend tables
* A mock reconciler is started that logs mock operations to reconcile the frontends.

The tables can be inspected with "cilium-dbg statedb experimental" commands,
e.g. "cilium-dbg statedb experimental frontends".

Signed-off-by: Jussi Maki <jussi@isovalent.com>
The "cilium-dbg statedb" commands require the objects to be
JSON serializable, which is easy to break accidentally.

Add a fuzz test to generate arbitrary Service, Frontend and
Backend objects and validate that the TableRow() output is the
same across JSON serialization.

Signed-off-by: Jussi Maki <jussi@isovalent.com>
@joamaki
Contributor Author

joamaki commented May 30, 2024

/test

@joamaki joamaki marked this pull request as ready for review May 30, 2024 14:15
@joamaki joamaki requested review from a team as code owners May 30, 2024 14:15
@joamaki joamaki changed the title [DRAFT] Experimental load-balancing control-plane with StateDB Experimental load-balancing control-plane with StateDB May 30, 2024
@joamaki joamaki requested review from brb and joestringer May 31, 2024 06:56
@joamaki
Contributor Author

joamaki commented May 31, 2024

/ci-runtime

Labels
release-note/misc This PR makes changes that have no direct user impact.

4 participants