New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for paginated lists in etcd, and propagate global config options #25469
Conversation
/test |
e27678f
to
bf11026
Compare
/test |
TravisCI failed with the same error reported in #25272. Rerunning |
A couple of Cilium Conformance E2E tests failed while provisioning LVH: #25483. Manually retriggering the affected ones. |
/test-1.25-4.19 Failed during initialization with errors like:
|
/test-1.25-4.19 It seems it didn't restart with the previous comment... Probably I've edited it too early. |
/test-1.25-4.19 |
|
||
log.WithFields(logrus.Fields{ | ||
fieldNumEntries: len(res.Kvs), | ||
fieldRemainingEntries: res.Count - int64(len(res.Kvs)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the documentation of .Count
it seems it's not always set?
// count is set to the number of keys within the range when requested.
I couldn't think any option other than isCountOnly
though 🤔
Does it work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIU it is always returned by etcd [1], [2], and from an empirical test it works (elements = 10, limit = 4):
level=debug msg="Received list response from etcd" numEntries=4 remainingEntries=6 subsys=kvstore
level=debug msg="Received list response from etcd" numEntries=4 remainingEntries=2 subsys=kvstore
level=debug msg="Received list response from etcd" numEntries=2 remainingEntries=0 subsys=kvstore
[1]: https://github.com/etcd-io/etcd/blob/08407ff7600eb16c4445d5f21c4fafaf19412e24/server/etcdserver/apply.go#L418
[2]: https://github.com/etcd-io/etcd/blob/08407ff7600eb16c4445d5f21c4fafaf19412e24/server/mvcc/kvstore_txn.go#L179
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While checking the etcd source code, I also discovered that sorting is not performed by default if the WithSort
option is not specified [1]. That shouldn't introduce a big penalty though, since pdqsort is optimized to complete in linear time when the input is already sorted [2], which is the case [3].
[1]: https://github.com/etcd-io/etcd/blob/08407ff7600eb16c4445d5f21c4fafaf19412e24/server/etcdserver/apply.go#L390-L410
[2]: https://github.com/orlp/pdqsort#the-best-case
[3]: https://github.com/etcd-io/etcd/blob/08407ff7600eb16c4445d5f21c4fafaf19412e24/server/etcdserver/apply.go#L385-L387
This commit extends the ListAndWatch implementation of the etcd client with the support for pagination during the initial list operation. This allows to retrieve the initial state in batches, reducing the overall load in case a large number of entries needs to be obtained. The batch size can be configured through the `etcd.limit` option, which defaults to 256. Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Currently, it is possible to specify etcd-related options (e.g., rate limiting, keepalive heartbeat and timeout, ...) through the `--kvstore-opts agent` flag. Yet, these options apply only to the "main" etcd connection (if enabled), and not to the ones towards remote clusters. Let's change this behavior and apply them also in the clustermesh scenario, to enable better customization and for consistency with other configuration parameters (e.g., `--kvstore-lease-ttl` and `--kvstore-max-consecutive-quorum-errors`). Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
bf11026
to
0f3f4f9
Compare
I've updated the PR making |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks!
/test Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed: Click to show.Test Name
Failure Output
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/71/ If it is a flake and a GitHub issue doesn't already exist to track it, comment Then please upload the Jenkins artifacts to that issue. |
A few Cilium Conformance E2E jobs failed during provisioning: #25483. Manually rerunning. |
/test-runtime |
/test-1.26-net-next |
External workloads workflow hit unrelated issue: #25636 |
/test-runtime Hit new flake #25637 |
/ci-external-workloads |
/test-1.26-net-next Failed with known flakes: #25605, #25524 Job 'Cilium-PR-K8s-1.26-kernel-net-next' failed: Click to show.Test Name
Failure Output
Jenkins URL: https://jenkins.cilium.io/job/Cilium-PR-K8s-1.26-kernel-net-next/138/ If it is a flake and a GitHub issue doesn't already exist to track it, comment Then please upload the Jenkins artifacts to that issue. |
k8s-1.26-kernel-net-next failed again due to known flake #25605. Given the currently high failure rate (>50%) it wouldn't be much useful to rerun it again. Marking this PR as ready to merge, since all the other tests are green, and all reviews are in. |
This PR extends the ListAndWatch implementation of the etcd client with the support for pagination during the initial list operation. This allows to retrieve the initial state in batches, reducing the overall load in case a large number of entries needs to be obtained. Additionally, the global etcd-related options are applied also to the etcd clients targeting remote clusters, to enable better customization and for consistency with other configuration parameters.
Tagging as
release-note/minor
due to the slight behavioral change as etcd options are propagated also to clients targeting remote clusters.