etcd backend sharding support #63
Comments
Why close? I think it's a good idea.
Seems to be a duplicate of kubernetes/kubernetes#98814
Can we reopen this, @wilsonwang371? The duplicate was closed because they wanted to turn it into a KEP, but that didn't go through. Having both issues about such an important topic closed doesn't seem ideal. I guess apiserver is the right repo to track this. Or, if the proposal is being continued somewhere else, could you please link it here to help others? Thanks.
I was considering this earlier, but I ended up contributing to etcd 3.5 instead, where TXN performance improved significantly. That's why I have not worked on this for a long time. It seems many people are interested in this topic.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close not-planned
@k8s-triage-robot: Closing this issue, marking it as "Not Planned". In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
As the number of nodes in our k8s cluster increased significantly, we saw etcd gradually become our cluster's performance bottleneck. The apiserver already supports storing different object types in different etcd clusters. But in our case we have a very large number of Pod objects, so we are investigating whether it is possible to use multiple etcd shards within a single resource type.
Some companies already shard based on key ranges, similar to what TiKV does. Here, however, I want to discuss the possibility of sharding across etcd clusters based on a hash of the key.
For example, given keys k1 and k2, after taking their md5 hashes modulo the number of shards, k1 might land on etcd shard1 and k2 on etcd shard2. This balances the load across the etcd clusters and yields higher aggregate throughput.
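A minimal sketch of the shard selection described above, assuming a fixed shard count and md5-based hashing; the function and key names are illustrative, not an existing apiserver API:

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
)

// shardFor maps an etcd key to one of numShards shards by hashing the key
// with md5 and taking the digest modulo the shard count.
func shardFor(key string, numShards int) int {
	sum := md5.Sum([]byte(key))
	// Use the first 8 bytes of the digest as an unsigned integer.
	h := binary.BigEndian.Uint64(sum[:8])
	return int(h % uint64(numShards))
}

func main() {
	for _, k := range []string{
		"/registry/pods/default/k1",
		"/registry/pods/default/k2",
	} {
		fmt.Printf("%s -> shard %d\n", k, shardFor(k, 2))
	}
}
```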
AFAIK, the apiserver uses only a limited subset of the operations etcd supports: Range, Txn, and Watch. Moreover, the Txn operations it issues are simple transactions performing a single Create, Update, or Delete.
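For reference, the single-operation transactions mentioned here look roughly like the guarded create below. This is a sketch against the public etcd clientv3 API, not the exact code the apiserver runs; the key, value, and endpoint are placeholders:

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	key, value := "/registry/pods/default/k1", "serialized-object"

	// A "create" is a Txn that only succeeds if the key does not exist yet
	// (ModRevision == 0); updates and deletes follow the same single-op shape.
	resp, err := cli.Txn(context.Background()).
		If(clientv3.Compare(clientv3.ModRevision(key), "=", 0)).
		Then(clientv3.OpPut(key, value)).
		Commit()
	if err != nil {
		panic(err)
	}
	fmt.Println("created:", resp.Succeeded)
}
```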
With the proposed sharding, single Create/Update/Delete operations seem straightforward. But for Watch and Range requests, the apiserver would need to maintain a connection to each etcd shard. Could this cause any problems for Range/Watch performance?
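One way to picture the Watch fan-out is to open a watch per shard and merge the event streams into a single channel. The sketch below assumes one clientv3 client per shard with placeholder endpoints; it deliberately ignores cross-shard event ordering, which is exactly the open question:

```go
package main

import (
	"context"
	"fmt"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// mergeWatches opens a prefix watch on every shard client and forwards all
// events into one channel. Cross-shard ordering is not addressed here.
func mergeWatches(ctx context.Context, shards []*clientv3.Client, prefix string) <-chan *clientv3.Event {
	out := make(chan *clientv3.Event)
	for _, c := range shards {
		wch := c.Watch(ctx, prefix, clientv3.WithPrefix())
		go func(wch clientv3.WatchChan) {
			for wresp := range wch {
				for _, ev := range wresp.Events {
					out <- ev
				}
			}
		}(wch)
	}
	return out
}

func main() {
	// One client per shard; the endpoints are placeholders.
	var shards []*clientv3.Client
	for _, ep := range []string{"etcd-shard1:2379", "etcd-shard2:2379"} {
		c, err := clientv3.New(clientv3.Config{Endpoints: []string{ep}})
		if err != nil {
			panic(err)
		}
		shards = append(shards, c)
	}

	for ev := range mergeWatches(context.Background(), shards, "/registry/pods/") {
		fmt.Printf("%s %s\n", ev.Type, ev.Kv.Key)
	}
}
```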
When the apiserver holds connections to multiple etcd shards, it also needs to remember each shard's latest revision. For this I am thinking of changing the APIObjectVersioner so that it supports a resource version vector, something like "{Shard1:Rev1,Shard2:Rev2}", so that the revision position on each shard connection is preserved.
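A sketch of what such a composite resource version could look like if it were encoded as a plain string; the "{Shard1:Rev1,Shard2:Rev2}" format comes from the paragraph above, while the type and function names are purely illustrative:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// resourceVersionVector records the latest observed etcd revision per shard,
// e.g. {Shard1:1234,Shard2:987}.
type resourceVersionVector map[string]int64

// Encode renders the vector as a deterministic string that could be handed
// to clients in place of today's single-revision resourceVersion.
func (v resourceVersionVector) Encode() string {
	names := make([]string, 0, len(v))
	for name := range v {
		names = append(names, name)
	}
	sort.Strings(names)
	parts := make([]string, 0, len(names))
	for _, name := range names {
		parts = append(parts, fmt.Sprintf("%s:%d", name, v[name]))
	}
	return "{" + strings.Join(parts, ",") + "}"
}

// decodeResourceVersionVector parses the string form back into a map.
func decodeResourceVersionVector(s string) (resourceVersionVector, error) {
	s = strings.TrimSuffix(strings.TrimPrefix(s, "{"), "}")
	v := resourceVersionVector{}
	if s == "" {
		return v, nil
	}
	for _, part := range strings.Split(s, ",") {
		kv := strings.SplitN(part, ":", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("malformed shard revision %q", part)
		}
		rev, err := strconv.ParseInt(kv[1], 10, 64)
		if err != nil {
			return nil, err
		}
		v[kv[0]] = rev
	}
	return v, nil
}

func main() {
	v := resourceVersionVector{"Shard1": 1234, "Shard2": 987}
	encoded := v.Encode()
	fmt.Println(encoded) // {Shard1:1234,Shard2:987}

	decoded, err := decodeResourceVersionVector(encoded)
	if err != nil {
		panic(err)
	}
	fmt.Println(decoded["Shard2"])
}
```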
Do you see any issues with this general design? I would like to get some feedback, so that if it is doable we can make this change and contribute it back to the open-source community.