New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce kvstoremesh, a clustermesh-apiserver companion component #26083
Conversation
/test |
9a82196
to
d5abdca
Compare
/test |
d5abdca
to
d8db544
Compare
/test |
I've pushed a couple of minor changes to improve the reliability of the integration test, as one failure was observed in the previous run. |
/test |
/ci-aks |
4b1593b
to
d2d0371
Compare
/test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Currently, the KVstoreLeaseTTL option value is directly used to initialize the etcd lease manager. Yet, that option is unset when running the integration tests, causing the etcd server to fallback to very low TTL values (i.e., 2 seconds), which contribute to test flakiness. Hence, let's use the default value (15 minutes) when that option is unset, to prevent issues due to lease expiration. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Extend the CiliumClusterConfig struct with the additional Cached capability. It conveys the fact that the information concerning the given cluster is cached from an external kvstore (for instance, by kvstoremesh). This implies that all keys are stored under the dedicated `cilium/cache` prefix (rather than the standard `cilium/state` one), and all are cluster-scoped (including also IPs and identities). The usage of completely disjoint prefixes is required because currently nodes and services entries are cluster-scoped, but IPs and identities ones not. This makes it impossible to watch only the information of a given cluster if a single kvstore caches entries for multiple clusters. At the same time, this approach removes the need full downgrade/upgrade logic which, would be required by a complete switch over (it would be tricky if not impossible to get right, since old agents would not understand the new prefixes). Finally, it also clearly highlights that the information therein stored is relayed from an external source. For reference, the full "cached" prefixes are: * cilium/cache/nodes/v1/<cluster-name>/ * cilium/cache/services/v1/<cluster-name>/ * cilium/cache/ip/v1/<cluster-name>/ * cilium/cache/identities/v1/<cluster-name>/ Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
In the current clustermesh implementation, each agent connects to the kvstore in all remote clusters, pulling information about nodes, services, identities and IPs. The number of etcd watchers can grow significantly large (~ 5 * total nodes in the clustermesh), as well as the number of events to be relayed by etcd to the agents (especially if the churn rate is high). kvstoremesh is a new component responsible for synchronizing the information from the remote kvstores to the local one. In other words, the local kvstore caches the state of the entire clustermesh, so that all agents only need to connect to the local kvstore, and not the remote ones. Overall, this provides better isolation (because the only component connecting to remote kvstores is now the kvstoremesh operator) and reduces the number of agents connecting to a single kvstore. While the number of events to be processed by a single kvstore is similar to the vanilla clustermesh case, with kvstoremesh the load is less spiky (because a specific event is propagated to a much lower number of agents) and it is also easier to rate limit (configuring the maximum number of keys that can be written by kvstoremesh per second). This allows to trade latency for reliability in case of high churn in the clustermesh. Finally, given that the number of watchers is proportional to the number of nodes in the local cluster only (rather than the nodes in the entire clustermesh), the dimensioning of the kvstore is also simplified. The kvstoremesh concept has been initially proposed by the Cloud-Network team @Trip.com, led by Arthur Chiao: http://arthurchiao.art/blog/trip-first-step-towards-cloud-native-security/ This commit introduces the logic responsible for watching remote kvstores and caching the retrieved information in the local one. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This commit introduces the scaffolding for kvstoremesh, specifically: * the kvstoremesh main file, which builds the hive and registers the appropriate flags. Most notably, it leverages the newly introduced kvstoremesh cell and the kvstore cell; * the metrics server, to expose the information concerning kvstore operations; * the makefile, to build kvstoremesh (inspired from that already available for the clustermesh-apiserver). Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This commit extends the metrics documentation with the information about the newly introduced kvstoremesh metrics. Configuration options will be added alongside the helm chart modifications. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
kvstoremesh is a companion component of the already existing clustermesh-apiserver, hence it feels natural to assign it to the sig-clustermesh team. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
d2d0371
to
fb0a7d4
Compare
Rebased onto main to fix CI failure |
/test |
/ci-ginkgo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mostly rubber stamping on behalf of the tophat team. lgtm assuming it's getting reviewed by the actual code owner (sig-clustermesh team) ✅
@tommyp1ckles PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@giorio94 It's not obvious if there are any policy related changes in the PR. It looks like sig-policy review was requested after this change - https://github.com/cilium/cilium/compare/9a82196551f3ce7b212c8884c3a5b62895d1d489..d5abdcadbb249cc901ae0511ab332d3346864ee8.
Let me know if that's not the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metrics LGTM 🙏
Yeah, there's no policy-related change. It's only the metrics documentation page which seems to be also owned by sig-policy. Thanks! |
This commit extends the `cilium clustermesh enable` command with a new flag which can be used to enable kvstoremesh (cilium/cilium#26083). The new flag is supported only when using the new Helm mode, and requires cilium >= 1.14 (it has no effect on earlier versions). Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
This commit extends the `cilium clustermesh enable` command with a new flag which can be used to enable kvstoremesh (cilium/cilium#26083). The new flag is supported only when using the new Helm mode, and requires cilium >= 1.14 (it has no effect on earlier versions). Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Please review commit by commit. I'm reporting the description of the main commit here for your convenience:
The Dockerfile and CI modifications required to build the new image, as well as the appropriate extensions to the helm chart, will be introduced in a subsequent PR to limit the size of the current one.