Introduce kvstoremesh, a clustermesh-apiserver companion component #26083

Merged: 6 commits merged into cilium:main on Jun 15, 2023

Member

@giorio94 giorio94 commented Jun 9, 2023

Please review commit by commit. I'm reporting the description of the main commit here for your convenience:

In the current clustermesh implementation, each agent connects to the
kvstore in all remote clusters, pulling information about nodes,
services, identities and IPs. The number of etcd watchers can grow
very large (~ 5 * total nodes in the clustermesh), as can the number
of events to be relayed by etcd to the agents (especially
if the churn rate is high).
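
As a rough illustration of the estimate above, the following Go sketch computes the approximate watcher count for a mesh. The factor of 5 comes from the description; reading it as a per-kvstore figure, as well as the function name and the example mesh size, are assumptions made for illustration only.

```go
package main

import "fmt"

// watchersPerNode is the approximate number of etcd watchers opened per
// connected node, taken from the "~ 5 * total nodes" estimate in the text.
const watchersPerNode = 5

// EstimateWatchers returns the approximate number of watchers a kvstore has
// to serve, given how many nodes have agents connected to it.
// (Hypothetical helper, not part of Cilium.)
func EstimateWatchers(connectedNodes int) int {
	return watchersPerNode * connectedNodes
}

func main() {
	// Hypothetical mesh: 10 clusters with 500 nodes each, all agents
	// connecting to every kvstore as in the current implementation.
	clusters, nodesPerCluster := 10, 500
	fmt.Println(EstimateWatchers(clusters * nodesPerCluster)) // prints 25000
}
```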

kvstoremesh is a new component responsible for synchronizing the
information from the remote kvstores to the local one. In other words,
the local kvstore caches the state of the entire clustermesh, so that
all agents only need to connect to the local kvstore, and not the remote
ones. Overall, this provides better isolation (because the only component
connecting to remote kvstores is now the kvstoremesh operator) and
reduces the number of agents connecting to a single kvstore.

While the number of events to be processed by a single kvstore is
similar to the vanilla clustermesh case, with kvstoremesh the load
is less spiky (because a specific event is propagated to a much lower
number of agents) and it is also easier to rate limit (configuring the
maximum number of keys that can be written by kvstoremesh per second).
This allows trading latency for reliability in case of high churn in
the clustermesh. Finally, given that the number of watchers is
proportional to the number of nodes in the local cluster only (rather
than the nodes in the entire clustermesh), the dimensioning of the
kvstore is also simplified.
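
The rate limiting mentioned above (capping the number of keys written per second) could look roughly like the Go sketch below. It is not the kvstoremesh implementation: the `Update` type, the `Mirror` function and the ticker-based pacing are assumptions chosen to show how a write cap trades latency for a smoother load on the local kvstore.

```go
package main

import (
	"fmt"
	"time"
)

// Update is a hypothetical key/value pair to be mirrored into the local kvstore.
type Update struct {
	Key, Value string
}

// Mirror drains updates and calls write at most keysPerSecond times per
// second. A non-positive keysPerSecond disables the limit. It returns the
// number of keys written once the input channel is closed.
func Mirror(updates <-chan Update, keysPerSecond int, write func(Update)) int {
	var slots <-chan time.Time // nil means "no rate limit"
	if keysPerSecond > 0 {
		interval := time.Second / time.Duration(keysPerSecond)
		if interval <= 0 {
			interval = time.Nanosecond // guard against absurdly high limits
		}
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		slots = ticker.C
	}

	written := 0
	for u := range updates {
		if slots != nil {
			<-slots // wait for the next write slot
		}
		write(u)
		written++
	}
	return written
}

func main() {
	updates := make(chan Update, 5)
	for i := 0; i < 5; i++ {
		// Illustrative keys only; cluster and node names are made up.
		updates <- Update{Key: fmt.Sprintf("cilium/cache/nodes/v1/cluster2/node-%d", i)}
	}
	close(updates)

	start := time.Now()
	n := Mirror(updates, 100, func(u Update) { fmt.Println("write", u.Key) })
	fmt.Printf("wrote %d keys in %s\n", n, time.Since(start).Round(time.Millisecond))
}
```

With a burst of events, the writer above simply falls behind instead of forwarding the whole burst at once, which is the latency-for-reliability trade described in the text.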

The kvstoremesh concept was initially proposed by the Cloud-Network
team at Trip.com, led by Arthur Chiao:
http://arthurchiao.art/blog/trip-first-step-towards-cloud-native-security/

The Dockerfile and CI modifications required to build the new image, as well as the appropriate extensions to the helm chart, will be introduced in a subsequent PR to limit the size of the current one.

Introduce kvstoremesh, a clustermesh-apiserver companion component that caches remote cluster information in the local kvstore for increased scalability and separation.

@giorio94 giorio94 added the kind/performance, release-note/major, area/clustermesh and release-blocker/1.14 labels Jun 9, 2023
@giorio94 giorio94 requested review from a team as code owners June 9, 2023 15:47
@giorio94
Member Author

giorio94 commented Jun 9, 2023

/test

@giorio94
Member Author

/test

@giorio94
Member Author

/test

@giorio94
Member Author

I've pushed a couple of minor changes to improve the reliability of the integration test, as one failure was observed in the previous run.

@giorio94
Member Author

/test

@giorio94
Member Author

/ci-aks

@giorio94
Member Author

/test

Member

@tklauser tklauser left a comment

Thanks!

@joestringer joestringer added the kind/feature This introduces new functionality. label Jun 14, 2023
Currently, the KVstoreLeaseTTL option value is directly used to
initialize the etcd lease manager. Yet, that option is unset when
running the integration tests, causing the etcd server to fall back
to very low TTL values (i.e., 2 seconds), which contributes to test
flakiness. Hence, let's use the default value (15 minutes) when that
option is unset, to prevent issues due to lease expiration.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
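
The fallback described in the commit message above can be sketched in a few lines of Go. The 15-minute default is taken from the message; the constant and function names are hypothetical and do not claim to match the identifiers used in Cilium.

```go
package main

import (
	"fmt"
	"time"
)

// defaultLeaseTTL mirrors the 15-minute default mentioned in the commit
// message. The name is an assumption made for this sketch.
const defaultLeaseTTL = 15 * time.Minute

// EffectiveLeaseTTL returns the configured TTL, or the default when the
// option was left unset (zero value).
func EffectiveLeaseTTL(configured time.Duration) time.Duration {
	if configured == 0 {
		return defaultLeaseTTL
	}
	return configured
}

func main() {
	fmt.Println(EffectiveLeaseTTL(0))                // unset: falls back to 15m0s
	fmt.Println(EffectiveLeaseTTL(30 * time.Second)) // explicit value is kept: 30s
}
```
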
Extend the CiliumClusterConfig struct with the additional Cached
capability. It conveys the fact that the information concerning the
given cluster is cached from an external kvstore (for instance, by
kvstoremesh). This implies that all keys are stored under the dedicated
`cilium/cache` prefix (rather than the standard `cilium/state` one),
and all are cluster-scoped (including IPs and identities).

The usage of completely disjoint prefixes is required because currently
node and service entries are cluster-scoped, but IP and identity
entries are not. This makes it impossible to watch only the information of a
given cluster if a single kvstore caches entries for multiple clusters.
At the same time, this approach removes the need for the full downgrade/upgrade
logic that would be required by a complete switchover (it would be
tricky if not impossible to get right, since old agents would not
understand the new prefixes). Finally, it also clearly highlights that
the information stored therein is relayed from an external source.

For reference, the full "cached" prefixes are:
* cilium/cache/nodes/v1/<cluster-name>/
* cilium/cache/services/v1/<cluster-name>/
* cilium/cache/ip/v1/<cluster-name>/
* cilium/cache/identities/v1/<cluster-name>/

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
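
To make the prefix layout above concrete, here is a small Go sketch that builds the cached prefixes and checks whether a key belongs to a given cluster. The prefix format follows the list in the commit message; the helper names and the example cluster and key names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// cacheRoot is the dedicated prefix for cached entries named in the commit message.
const cacheRoot = "cilium/cache"

// CachedPrefix builds the cluster-scoped prefix for one resource type, e.g.
// "nodes", "services", "ip" or "identities". (Hypothetical helper.)
func CachedPrefix(resource, clusterName string) string {
	return fmt.Sprintf("%s/%s/v1/%s/", cacheRoot, resource, clusterName)
}

// BelongsTo reports whether key is a cached entry of the given resource type
// for the given cluster. Because every cached prefix ends with the cluster
// name, a plain prefix match is enough to isolate one cluster.
func BelongsTo(key, resource, clusterName string) bool {
	return strings.HasPrefix(key, CachedPrefix(resource, clusterName))
}

func main() {
	for _, r := range []string{"nodes", "services", "ip", "identities"} {
		fmt.Println(CachedPrefix(r, "cluster1"))
	}
	// An identity cached for cluster2 is not visible under cluster1's prefix.
	fmt.Println(BelongsTo("cilium/cache/identities/v1/cluster2/id/1234", "identities", "cluster1"))
}
```
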
This commit introduces the logic responsible for watching remote
kvstores and caching the retrieved information in the local one.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This commit introduces the scaffolding for kvstoremesh, specifically:
* the kvstoremesh main file, which builds the hive and registers the
  appropriate flags. Most notably, it leverages the newly introduced
  kvstoremesh cell and the kvstore cell;
* the metrics server, to expose the information concerning kvstore
  operations;
* the makefile, to build kvstoremesh (modeled on the one already
  available for the clustermesh-apiserver).

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This commit extends the metrics documentation with the information about
the newly introduced kvstoremesh metrics. Configuration options will be
added alongside the helm chart modifications.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
kvstoremesh is a companion component of the already existing
clustermesh-apiserver, hence it feels natural to assign it to the
sig-clustermesh team.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
@giorio94
Member Author

Rebased onto main to fix CI failure

@giorio94
Member Author

/test

@giorio94
Member Author

/ci-ginkgo

Contributor

@michi-covalent michi-covalent left a comment

mostly rubber stamping on behalf of the tophat team. lgtm assuming it's getting reviewed by the actual code owner (sig-clustermesh team) ✅

@giorio94
Member Author

@tommyp1ckles PTAL

Member

@aditighag aditighag left a comment

@giorio94 It's not obvious if there are any policy related changes in the PR. It looks like sig-policy review was requested after this change - https://github.com/cilium/cilium/compare/9a82196551f3ce7b212c8884c3a5b62895d1d489..d5abdcadbb249cc901ae0511ab332d3346864ee8.
Let me know if that's not the case.

Contributor

@tommyp1ckles tommyp1ckles left a comment

Metrics LGTM 🙏

@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Jun 15, 2023
@giorio94
Member Author

> @giorio94 It's not obvious if there are any policy related changes in the PR. It looks like sig-policy review was requested after this change - https://github.com/cilium/cilium/compare/9a82196551f3ce7b212c8884c3a5b62895d1d489..d5abdcadbb249cc901ae0511ab332d3346864ee8. Let me know if that's not the case.

Yeah, there's no policy-related change. It's only the metrics documentation page which seems to be also owned by sig-policy. Thanks!

@sayboras sayboras merged commit f014717 into cilium:main Jun 15, 2023
60 of 65 checks passed
giorio94 added a commit to giorio94/cilium-cli that referenced this pull request Jun 19, 2023
This commit extends the `cilium clustermesh enable` command with a new
flag which can be used to enable kvstoremesh (cilium/cilium#26083). The
new flag is supported only when using the new Helm mode, and requires
cilium >= 1.14 (it has no effect on earlier versions).

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
michi-covalent pushed a commit to cilium/cilium-cli that referenced this pull request Jun 21, 2023
Labels
area/clustermesh: Relates to multi-cluster routing functionality in Cilium.
kind/feature: This introduces new functionality.
kind/performance: There is a performance impact of this.
ready-to-merge: This PR has passed all tests and received consensus from code owners to merge.
release-blocker/1.14: This issue will prevent the release of the next version of Cilium.
release-note/major: This PR introduces major new functionality to Cilium.
9 participants