Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

loqrecovery: parallelize the data collection phase #122639

Closed
miraradeva opened this issue Apr 18, 2024 · 0 comments · Fixed by #123011
Closed

loqrecovery: parallelize the data collection phase #122639

miraradeva opened this issue Apr 18, 2024 · 0 comments · Fixed by #123011
Assignees
Labels
A-kv Anything in KV that doesn't belong in a more specific category. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-postmortem Originated from a Postmortem action item. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs T-kv KV Team
Projects

Comments

@miraradeva
Copy link
Contributor

miraradeva commented Apr 18, 2024

When preparing a recovery plan, the LoQ tool runs on a coordinator node that collects info from all available nodes in the cluster.

  1. Visiting the available nodes is done sequentially, and each visitor function returns only after it's heard back from all replicas on the node.
  2. When a node visits all its local replicas, it does so one store at a time.

We are essentially iterating over all range descriptors on the cluster sequentially. For a cluster with over a million ranges, it can take several hours to collect all the needed info. We should try parallelizing each of (1) and (2) above, or at least (1).

Jira issue: CRDB-38011

Epic CRDB-37617

@miraradeva miraradeva added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs A-kv Anything in KV that doesn't belong in a more specific category. labels Apr 18, 2024
@arulajmani arulajmani added the O-postmortem Originated from a Postmortem action item. label Apr 19, 2024
@nicktrav nicktrav added the T-kv KV Team label Apr 23, 2024
@blathers-crl blathers-crl bot added this to Incoming in KV Apr 23, 2024
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 24, 2024
Informs cockroachdb#122639.

This commit adds a `--max-concurrency` flag to the following `debug
recover` commands:
- `debug recovery collect-info`
- `debug recovery make-plan`
- `debug recovery apply-plan`
- `debug recovery verify`

The flag controls the maximum concurrency when fanning out RPCs to nodes
in the cluster while servicing the command. It defaults to `2 x num_cpus`,
which is a decent proxy for how much fanout the process can tolerate without
overloading itself.

Some plumbing is performed, but the flags are currently unused.

Release note (cli change): Added `--max-concurrency` flag to `debug
recover` commands to control the maximum concurrency when fanning out
RPCs to nodes in the cluster. The flag defaults to `2 x num_cpus`.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 24, 2024
…Info

Informs cockroachdb#122639.

This commit updates `CollectRemoteReplicaInfo` to permit out-of-order
ReplicaInfo responses, instead of assuming that the responses are in
order by node ID. This is a precursor to being able to perform a
parallel fanout in RecoveryCollectReplicaInfo.

The migration story for this is simple — this is only needed if the
MaxParallelism passed to RecoveryCollectReplicaInfo is greater than 1.
Old versions of the cockroachdb binary that can not tolerate out of
order responses will always pass a MaxParallelism of 0, so the
assumption that they are making about in-order responses will still
hold.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 24, 2024
Informs cockroachdb#122639.

This commit uses the MaxConcurrency parameters added in the previous
commits to parallelize the fanout of RPCs to nodes in the cluster during
the data collection phase of LoQ.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 24, 2024
Closes cockroachdb#122639.

This commit parallelizes the fanout of store iteration during the data
collection phase of LoQ. This fanout is not limited, as we expect nodes
to have a sufficiently high cpu-to-store ratio to handle the fanout.

Release note: None
@nvanbenschoten nvanbenschoten self-assigned this Apr 24, 2024
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 25, 2024
Informs cockroachdb#122639.

This commit adds a `--max-concurrency` flag to the following `debug
recover` commands:
- `debug recover collect-info`
- `debug recover make-plan`
- `debug recover apply-plan`
- `debug recover verify`

The flag controls the maximum concurrency when fanning out RPCs to nodes
in the cluster while servicing the command. It defaults to `2 x num_cpus`,
which is a decent proxy for how much fanout the process can tolerate without
overloading itself.

Some plumbing is performed, but the flags are currently unused.

Release note (cli change): Added `--max-concurrency` flag to `debug
recover` commands to control the maximum concurrency when fanning out
RPCs to nodes in the cluster. The flag defaults to `2 x num_cpus`.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 25, 2024
…Info

Informs cockroachdb#122639.

This commit updates `CollectRemoteReplicaInfo` to permit out-of-order
ReplicaInfo responses, instead of assuming that the responses are in
order by node ID. This is a precursor to being able to perform a
parallel fanout in RecoveryCollectReplicaInfo.

The migration story for this is simple — this is only needed if the
MaxParallelism passed to RecoveryCollectReplicaInfo is greater than 1.
Old versions of the cockroachdb binary that can not tolerate out of
order responses will always pass a MaxParallelism of 0, so the
assumption that they are making about in-order responses will still
hold.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 25, 2024
Informs cockroachdb#122639.

This commit uses the MaxConcurrency parameters added in the previous
commits to parallelize the fanout of RPCs to nodes in the cluster during
the data collection phase of LoQ.

Release note: None
craig bot pushed a commit that referenced this issue Apr 25, 2024
123011: loqrecovery: parallelize data collection across nodes and stores r=arulajmani,miraradeva a=nvanbenschoten

Fixes #122639.

This PR adds a `--max-concurrency` flag to the following `debug recover` commands:
- `debug recover collect-info`
- `debug recover make-plan`
- `debug recover apply-plan`
- `debug recover verify`

The flag controls the maximum concurrency when fanning out RPCs to nodes in the cluster while servicing the command. It defaults to `2 x num_cpus`, which is a decent proxy for how much fannout the process can tolerate without overloading itself.

The PR then uses this maximum concurrency flag to control the parallelism for the RPC fanout to nodes in the cluster during the data collection phase of LoQ.

The PR also parallelizes the fanout of store iteration during the data collection phase of LoQ. This fanout is not limited, as we expect nodes to have a sufficiently high cpu-to-store ratio to handle the fanout.

### Experiment

We measure the time it takes to recover from loss-of-quorum on a 16 (32vCPU) node x 4 store-per-node cluster with 200k ranges.

```bash
roachprod create nathan-loq -n 16 --gce-machine-type=n2-standard-32 --local-ssd=true --gce-local-ssd-count=4 --gce-enable-multiple-stores
roachprod stage  nathan-loq cockroach
roachprod start  nathan-loq --store-count=4
roachprod run    nathan-loq:1 -- './cockroach workload init kv --splits=200000 {pgurl:1}'

# create some range descriptor churn to build up multiple versions of each descriptor.
# wait for full replication between steps.
roachprod sql nathan-loq:1 -- -e "ALTER RANGE default CONFIGURE ZONE USING num_replicas = 9"
roachprod sql nathan-loq:1 -- -e "ALTER RANGE default CONFIGURE ZONE USING num_replicas = 3"

# import and run a workload for a bit to push range descriptors out of cache.
roachprod run nathan-loq:1 -- './cockroach workload init tpcc --warehouses=10000 {pgurl:1}'
roachprod run nathan-loq:1 -- './cockroach workload run kv --min-block-bytes=40000 --max-block-bytes=40000 --concurrency=256 --read-percent=50 --duration=5m {pgurl:1-9}'

# create range unavailability.
roachprod stop nathan-loq:6,9

# create LoQ recovery plan while measuring how long this takes.
time roachprod run nathan-loq:1 -- './cockroach debug recover make-plan --confirm y --insecure --port={pgport:1} > plan.json'
```

Before this change, the `cockroach debug recover make-plan` command took **2m37.265s**.

After this change, the `cockroach debug recover make-plan` command takes **0m7.520s**, a **95.2%** speedup.

----

Release note (cli change): Added `--max-concurrency` flag to `debug recover` commands to control the maximum concurrency when fanning out RPCs to nodes in the cluster. The flag defaults to `2 x num_cpus`.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig craig bot closed this as completed in 07442ed Apr 25, 2024
blathers-crl bot pushed a commit that referenced this issue Apr 25, 2024
Informs #122639.

This commit adds a `--max-concurrency` flag to the following `debug
recover` commands:
- `debug recover collect-info`
- `debug recover make-plan`
- `debug recover apply-plan`
- `debug recover verify`

The flag controls the maximum concurrency when fanning out RPCs to nodes
in the cluster while servicing the command. It defaults to `2 x num_cpus`,
which is a decent proxy for how much fanout the process can tolerate without
overloading itself.

Some plumbing is performed, but the flags are currently unused.

Release note (cli change): Added `--max-concurrency` flag to `debug
recover` commands to control the maximum concurrency when fanning out
RPCs to nodes in the cluster. The flag defaults to `2 x num_cpus`.
blathers-crl bot pushed a commit that referenced this issue Apr 25, 2024
…Info

Informs #122639.

This commit updates `CollectRemoteReplicaInfo` to permit out-of-order
ReplicaInfo responses, instead of assuming that the responses are in
order by node ID. This is a precursor to being able to perform a
parallel fanout in RecoveryCollectReplicaInfo.

The migration story for this is simple — this is only needed if the
MaxParallelism passed to RecoveryCollectReplicaInfo is greater than 1.
Old versions of the cockroachdb binary that can not tolerate out of
order responses will always pass a MaxParallelism of 0, so the
assumption that they are making about in-order responses will still
hold.

Release note: None
blathers-crl bot pushed a commit that referenced this issue Apr 25, 2024
Informs #122639.

This commit uses the MaxConcurrency parameters added in the previous
commits to parallelize the fanout of RPCs to nodes in the cluster during
the data collection phase of LoQ.

Release note: None
blathers-crl bot pushed a commit that referenced this issue Apr 25, 2024
Closes #122639.

This commit parallelizes the fanout of store iteration during the data
collection phase of LoQ. This fanout is not limited, as we expect nodes
to have a sufficiently high cpu-to-store ratio to handle the fanout.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 25, 2024
Informs cockroachdb#122639.

This commit adds a `--max-concurrency` flag to the following `debug
recover` commands:
- `debug recover collect-info`
- `debug recover make-plan`
- `debug recover apply-plan`
- `debug recover verify`

The flag controls the maximum concurrency when fanning out RPCs to nodes
in the cluster while servicing the command. It defaults to `2 x num_cpus`,
which is a decent proxy for how much fanout the process can tolerate without
overloading itself.

Some plumbing is performed, but the flags are currently unused.

Release note (cli change): Added `--max-concurrency` flag to `debug
recover` commands to control the maximum concurrency when fanning out
RPCs to nodes in the cluster. The flag defaults to `2 x num_cpus`.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 25, 2024
…Info

Informs cockroachdb#122639.

This commit updates `CollectRemoteReplicaInfo` to permit out-of-order
ReplicaInfo responses, instead of assuming that the responses are in
order by node ID. This is a precursor to being able to perform a
parallel fanout in RecoveryCollectReplicaInfo.

The migration story for this is simple — this is only needed if the
MaxParallelism passed to RecoveryCollectReplicaInfo is greater than 1.
Old versions of the cockroachdb binary that can not tolerate out of
order responses will always pass a MaxParallelism of 0, so the
assumption that they are making about in-order responses will still
hold.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 25, 2024
Informs cockroachdb#122639.

This commit uses the MaxConcurrency parameters added in the previous
commits to parallelize the fanout of RPCs to nodes in the cluster during
the data collection phase of LoQ.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 25, 2024
Closes cockroachdb#122639.

This commit parallelizes the fanout of store iteration during the data
collection phase of LoQ. This fanout is not limited, as we expect nodes
to have a sufficiently high cpu-to-store ratio to handle the fanout.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 25, 2024
Informs cockroachdb#122639.

This commit adds a `--max-concurrency` flag to the following `debug
recover` commands:
- `debug recover collect-info`
- `debug recover make-plan`
- `debug recover apply-plan`
- `debug recover verify`

The flag controls the maximum concurrency when fanning out RPCs to nodes
in the cluster while servicing the command. It defaults to `2 x num_cpus`,
which is a decent proxy for how much fanout the process can tolerate without
overloading itself.

Some plumbing is performed, but the flags are currently unused.

Release note (cli change): Added `--max-concurrency` flag to `debug
recover` commands to control the maximum concurrency when fanning out
RPCs to nodes in the cluster. The flag defaults to `2 x num_cpus`.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 25, 2024
…Info

Informs cockroachdb#122639.

This commit updates `CollectRemoteReplicaInfo` to permit out-of-order
ReplicaInfo responses, instead of assuming that the responses are in
order by node ID. This is a precursor to being able to perform a
parallel fanout in RecoveryCollectReplicaInfo.

The migration story for this is simple — this is only needed if the
MaxParallelism passed to RecoveryCollectReplicaInfo is greater than 1.
Old versions of the cockroachdb binary that can not tolerate out of
order responses will always pass a MaxParallelism of 0, so the
assumption that they are making about in-order responses will still
hold.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 25, 2024
Informs cockroachdb#122639.

This commit uses the MaxConcurrency parameters added in the previous
commits to parallelize the fanout of RPCs to nodes in the cluster during
the data collection phase of LoQ.

Release note: None
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Apr 25, 2024
Closes cockroachdb#122639.

This commit parallelizes the fanout of store iteration during the data
collection phase of LoQ. This fanout is not limited, as we expect nodes
to have a sufficiently high cpu-to-store ratio to handle the fanout.

Release note: None
msbutler pushed a commit to msbutler/cockroach that referenced this issue Apr 26, 2024
Informs cockroachdb#122639.

This commit adds a `--max-concurrency` flag to the following `debug
recover` commands:
- `debug recover collect-info`
- `debug recover make-plan`
- `debug recover apply-plan`
- `debug recover verify`

The flag controls the maximum concurrency when fanning out RPCs to nodes
in the cluster while servicing the command. It defaults to `2 x num_cpus`,
which is a decent proxy for how much fanout the process can tolerate without
overloading itself.

Some plumbing is performed, but the flags are currently unused.

Release note (cli change): Added `--max-concurrency` flag to `debug
recover` commands to control the maximum concurrency when fanning out
RPCs to nodes in the cluster. The flag defaults to `2 x num_cpus`.
msbutler pushed a commit to msbutler/cockroach that referenced this issue Apr 26, 2024
…Info

Informs cockroachdb#122639.

This commit updates `CollectRemoteReplicaInfo` to permit out-of-order
ReplicaInfo responses, instead of assuming that the responses are in
order by node ID. This is a precursor to being able to perform a
parallel fanout in RecoveryCollectReplicaInfo.

The migration story for this is simple — this is only needed if the
MaxParallelism passed to RecoveryCollectReplicaInfo is greater than 1.
Old versions of the cockroachdb binary that can not tolerate out of
order responses will always pass a MaxParallelism of 0, so the
assumption that they are making about in-order responses will still
hold.

Release note: None
msbutler pushed a commit to msbutler/cockroach that referenced this issue Apr 26, 2024
Informs cockroachdb#122639.

This commit uses the MaxConcurrency parameters added in the previous
commits to parallelize the fanout of RPCs to nodes in the cluster during
the data collection phase of LoQ.

Release note: None
msbutler pushed a commit to msbutler/cockroach that referenced this issue Apr 26, 2024
Closes cockroachdb#122639.

This commit parallelizes the fanout of store iteration during the data
collection phase of LoQ. This fanout is not limited, as we expect nodes
to have a sufficiently high cpu-to-store ratio to handle the fanout.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv Anything in KV that doesn't belong in a more specific category. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-postmortem Originated from a Postmortem action item. O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs T-kv KV Team
Projects
KV
Incoming
Development

Successfully merging a pull request may close this issue.

4 participants