Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: admin scatter should retry processing on a descriptor change error #124522

Closed
kvoli opened this issue May 21, 2024 · 0 comments · Fixed by #124537
Closed

kvserver: admin scatter should retry processing on a descriptor change error #124522

kvoli opened this issue May 21, 2024 · 0 comments · Fixed by #124537
Labels
A-kv-distribution Relating to rebalancing and leasing. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Projects

Comments

@kvoli
Copy link
Collaborator

kvoli commented May 21, 2024

Is your feature request related to a problem? Please describe.
AdminScatter currently only retries processing upon encountering a snapshot error:

if err != nil {
// TODO(tbg): can this use IsRetriableReplicationError?
if isSnapshotError(err) {
continue
}
break
}

It is commonly the case that splits will interleave scatters, as callers attempt to distribute the keyspace before some operation, such as RESTORE or an index backfill.

When a split races with the scatter and changes the descriptor whilst a scatter is being processed, the scatter will fail.

Describe the solution you'd like

Retry descriptor change errors, similar to snapshot errors for admin scatter.

Jira issue: CRDB-38942

@kvoli kvoli added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-kv-distribution Relating to rebalancing and leasing. T-kv KV Team labels May 21, 2024
@blathers-crl blathers-crl bot added this to Incoming in KV May 21, 2024
kvoli added a commit to kvoli/cockroach that referenced this issue May 21, 2024
Previously, scatter processing would only be retried when encountering a
snapshot error. Other errors commonly occur, which we expect to be
transient and retryable, such as the range descriptor changing or
rejected lease transfers. The range descriptor change error being most
common, due to the proclivity of clients to issue splits alongside
scatter requests, which would update the range descriptor.

Retry failed scatter replicate processing if the returned error matches
any of `IsRetriableReplicationChangeError`s, similar to range splits.
Note the maximum number of retries remains at 5 for scatter.

Resolves: cockroachdb#124522
Release note: None
craig bot pushed a commit that referenced this issue May 28, 2024
124537: kvserver: allow retrying scatter processing with more errors r=nvanbenschoten a=kvoli

Previously, scatter processing would only be retried when encountering a snapshot error. Other errors commonly occur, which we expect to be transient and retryable, such as the range descriptor changing or rejected lease transfers. The range descriptor change error being most common, due to the proclivity of clients to issue splits alongside scatter requests, which would update the range descriptor.

Retry failed scatter replicate processing if the returned error matches any of `IsRetriableReplicationChangeError`s, similar to range splits. Note the maximum number of retries remains at 5 for scatter.

Resolves: #124522
Release note: None

124751: catalog: collecting virtual schemas can be expensive for some ORM queries  r=fqazi a=fqazi

Previously, when running ORM queries with larger schemas or our ORM query bench test we noticed that aggregating virtual schema objects could take a big chunk of time. This was because converting the internal the VirtualTable / VirtualSchemas into a nstree.Catalog was not cheap. To address this, this patch will:

1. Convert the VirtualTable / VirtualSchemas into a nstree.Catalog which can be used between collections (the entire state here is immutable).
2. Reduce some extra copies that happen when copying betwen nstree.Catalog objects, by allowing references to be ingested.

Fixes: #124750

```Orignal:
BenchmarkORMQueries/asyncpg_types         	       1	16914664491 ns/op	         0 roundtrips	6151796312 B/op	33275332 allocs/op
After having a shared nstree.Catalog object:
BenchmarkORMQueries/asyncpg_types         	       1	13252306945 ns/op	         0 roundtrips	5442736128 B/op	29586155 allocs/op
After optimizing MutableCatalog.AddAll:
BenchmarkORMQueries/asyncpg_types         	       1	12399752839 ns/op	         0 roundtrips	4602288096 B/op	27474078 allocs/op
````

Co-authored-by: Austen McClernon <austen@cockroachlabs.com>
Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com>
@craig craig bot closed this as completed in bde1e38 May 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-distribution Relating to rebalancing and leasing. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv KV Team
Projects
KV
Incoming
Development

Successfully merging a pull request may close this issue.

1 participant