kvserver: delay range quiescence #103266

Merged 2 commits into cockroachdb:master from raft-delay-quiescence on May 18, 2023

@erikgrinaker erikgrinaker commented May 14, 2023

This patch quiesces ranges only after 6 ticks (3 seconds) without any proposals, configurable via COCKROACH_QUIESCE_AFTER_TICKS. Unquiescence incurs a proposal, which has a non-negligible cost. Without this delay, low-latency clusters under steady write load may (un)quiesce ranges very frequently, as often as every tick.

Resolves #63295.
Epic: none

Release note (performance improvement): ranges now quiesce only after 3 seconds without proposals, to avoid frequent unquiescence, which incurs an additional Raft proposal. This is configurable via COCKROACH_QUIESCE_AFTER_TICKS, which defaults to 6.
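
To make the effect of the delay concrete, here is a minimal toy model of the behavior described above. It is a sketch in Python, not CockroachDB's Go implementation: `RangeModel`, `simulate`, and `quiesce_after_ticks` are hypothetical names, and the "before" behavior is approximated as a threshold of 1 tick. The 0.5 s tick interval follows from 6 ticks being 3 seconds.

```python
import os

# Hypothetical model of the behavior described in this PR; none of these
# names come from the CockroachDB codebase.
TICK_INTERVAL_SECONDS = 0.5  # 6 ticks == 3 seconds, per the PR description


def quiesce_after_ticks(default: int = 6) -> int:
    """Read the threshold from COCKROACH_QUIESCE_AFTER_TICKS, else use the default."""
    return int(os.environ.get("COCKROACH_QUIESCE_AFTER_TICKS", default))


class RangeModel:
    """A single range that quiesces after `threshold` ticks without proposals."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.ticks_since_proposal = 0
        self.quiesced = False
        self.unquiesce_count = 0  # each unquiescence costs an extra proposal

    def propose(self) -> None:
        # A proposal on a quiesced range wakes it up first.
        if self.quiesced:
            self.quiesced = False
            self.unquiesce_count += 1
        self.ticks_since_proposal = 0

    def tick(self) -> None:
        if self.quiesced:
            return
        self.ticks_since_proposal += 1
        if self.ticks_since_proposal >= self.threshold:
            self.quiesced = True


def simulate(threshold: int, write_every: int, total_ticks: int) -> int:
    """Return how many times the range unquiesced under a steady write load."""
    r = RangeModel(threshold)
    for t in range(total_ticks):
        if t % write_every == 0:
            r.propose()
        r.tick()
    return r.unquiesce_count


if __name__ == "__main__":
    # One write every 2 ticks, for 20 ticks.
    print("threshold 1:", simulate(1, 2, 20), "unquiescences")
    print("threshold 6:", simulate(6, 2, 20), "unquiescences")
```

In this model, a range that receives a write every other tick unquiesces on nearly every write with a threshold of 1, and never quiesces at all with a threshold of 6, so the extra proposals disappear.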

@erikgrinaker erikgrinaker requested review from pav-kv and tbg May 14, 2023 21:11
@erikgrinaker erikgrinaker self-assigned this May 14, 2023
@erikgrinaker erikgrinaker requested review from a team as code owners May 14, 2023 21:11

erikgrinaker commented May 14, 2023

A quick kv95 run on a 3-node 4-CPU cluster didn't show a significant difference -- a slight increase in throughput, but possibly just noise. The resource graphs didn't show a significant difference either. Let's see what the nightlies say.

Before:

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  180.0s        0        2450660        13614.8      4.2      3.5     11.5     18.9     71.3  read

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  180.0s        0         129802          721.1      8.8      7.9     18.9     30.4     75.5  write

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
  180.0s        0        2580462        14335.9      4.5      3.7     12.1     19.9     75.5  

After:

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  180.0s        0        2472175        13734.3      4.2      3.5     11.5     18.9     92.3  read

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  180.0s        0         130129          722.9      8.3      7.9     17.8     24.1     65.0  write

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
  180.0s        0        2602304        14457.2      4.4      3.7     12.1     18.9     92.3  
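
For a quick read of the two runs, the relative changes can be computed directly from the figures above. This is only a sketch: `pct_change` and the `results` labels are illustrative names, and a single run per configuration says nothing about statistical significance.

```python
# Figures copied from the kv95 output above, as (before, after) pairs.
results = {
    "total ops/sec": (14335.9, 14457.2),
    "read ops/sec": (13614.8, 13734.3),
    "write ops/sec": (721.1, 722.9),
    "write p99 (ms)": (30.4, 24.1),
}


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    for name, (before, after) in results.items():
        print(f"{name}: {pct_change(before, after):+.2f}%")
```

All three throughput deltas come out under 1%, which is consistent with the "possibly just noise" reading.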

There was a big difference in the quiesced range equilibrium though, demonstrating how most ranges are continually (un)quiescing on a workload like this.

Before:

[Image: Screenshot 2023-05-14 at 23 39 34]

After:

[Image: Screenshot 2023-05-14 at 23 50 57]

@erikgrinaker

Bumped this from 4 to 6 ticks (2 to 3 seconds) considering the amount of quiescence churn we still saw with the above kv95 workload.

@erikgrinaker

bors r+

craig bot commented May 18, 2023

Build succeeded.

@craig craig bot merged commit e0c35b4 into cockroachdb:master May 18, 2023
6 of 7 checks passed
@erikgrinaker erikgrinaker deleted the raft-delay-quiescence branch May 30, 2023 12:17
Successfully merging this pull request may close these issues.

kvserver: single node unquiesces on every tick