Skip to content

allocator/mmaprototype: convert remove-target Fatalf to ok-return#168241

Draft
tbg wants to merge 1 commit intocockroachdb:masterfrom
tbg:mma-panic-audit/remove-target
Draft

allocator/mmaprototype: convert remove-target Fatalf to ok-return#168241
tbg wants to merge 1 commit intocockroachdb:masterfrom
tbg:mma-panic-audit/remove-target

Conversation

@tbg
Copy link
Copy Markdown
Member

@tbg tbg commented Apr 13, 2026

makeRebalanceReplicaChanges fatals when the remove target is not found
in existing replicas. This can happen when an external change (e.g.
replicate queue) removes a replica before the rebalancer's adjusted state
is refreshed by the next StoreLeaseholderMsg.

Change the function to return an ok bool instead of fataling. Callers
skip the range when the target is missing.

Part of: #167723
Epic: none
Release note: None

@trunk-io
Copy link
Copy Markdown
Contributor

trunk-io Bot commented Apr 13, 2026

Merging to master in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@blathers-crl
Copy link
Copy Markdown

blathers-crl Bot commented Apr 13, 2026

Detected infrastructure failure (matched: ). Automatically rerunning failed jobs. (run link)

`makeRebalanceReplicaChanges` fatals when the remove target is not found
in existing replicas. This can happen when an external change (e.g.
replicate queue) removes a replica before the rebalancer's adjusted state
is refreshed by the next `StoreLeaseholderMsg`.

Change the function to return an `ok` bool instead of fataling. Callers
skip the range when the target is missing.

Part of: cockroachdb#167723
Epic: none
Release note: None
@tbg tbg force-pushed the mma-panic-audit/remove-target branch from 2676510 to e993dd5 Compare April 24, 2026 14:19
@cockroach-teamcity
Copy link
Copy Markdown
Member

⚪ Sysbench [SQL, 3node, oltp_read_write]
Metric Old Commit New Commit Delta Note
sec/op 10.60m ±2% 10.64m ±2% ~ p=0.806 n=15
allocs/op 8.113k ±1% 8.151k ±1% ~ p=0.205 n=15
Reproduce

benchdiff binaries:

mkdir -p benchdiff/e993dd5/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/e993dd5bd036cf79ab7bbc475e29167c77fecfa8/bin/pkg_sql_tests benchdiff/e993dd5/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/e993dd5/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/3cd2804/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/3cd2804e2e5fa261ee662949dad84ac8fad5d272/bin/pkg_sql_tests benchdiff/3cd2804/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/3cd2804/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

# NB: for best (most stable) results, also add a suitable `--benchtime` that
# results in ~1s to ~5s of benchmark runs. For example, if ops average ~3ms, a
# benchtime of `1000x` is appropriate.
#
# Some benchmarks (in particular BenchmarkSysbench) output additional memory
# profiles covering only the execution (excluding the setup/teardown) - those
# should be preferred for analysis since they more closely correspond to what's
# reported as B/op and alloc/op.
benchdiff --run=^BenchmarkSysbench/SQL/3node/oltp_read_write$ --old=3cd2804 --new=e993dd5 --memprofile ./pkg/sql/tests
⚪ Sysbench [KV, 3node, oltp_read_only]
Metric Old Commit New Commit Delta Note
sec/op 3.084m ±1% 3.078m ±1% ~ p=0.345 n=15
allocs/op 2.100k ±1% 2.100k ±0% ~ p=0.771 n=15
Reproduce

benchdiff binaries:

mkdir -p benchdiff/e993dd5/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/e993dd5bd036cf79ab7bbc475e29167c77fecfa8/bin/pkg_sql_tests benchdiff/e993dd5/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/e993dd5/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/3cd2804/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/3cd2804e2e5fa261ee662949dad84ac8fad5d272/bin/pkg_sql_tests benchdiff/3cd2804/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/3cd2804/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

# NB: for best (most stable) results, also add a suitable `--benchtime` that
# results in ~1s to ~5s of benchmark runs. For example, if ops average ~3ms, a
# benchtime of `1000x` is appropriate.
#
# Some benchmarks (in particular BenchmarkSysbench) output additional memory
# profiles covering only the execution (excluding the setup/teardown) - those
# should be preferred for analysis since they more closely correspond to what's
# reported as B/op and alloc/op.
benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_read_only$ --old=3cd2804 --new=e993dd5 --memprofile ./pkg/sql/tests
🔴 Sysbench [KV, 3node, oltp_write_only]
Metric Old Commit New Commit Delta Note
🔴 sec/op 2.886m ±1% 2.942m ±2% +1.95% p=0.003 n=15
allocs/op 4.199k ±0% 4.202k ±0% +0.07% p=0.014 n=15
Reproduce

benchdiff binaries:

mkdir -p benchdiff/e993dd5/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/e993dd5bd036cf79ab7bbc475e29167c77fecfa8/bin/pkg_sql_tests benchdiff/e993dd5/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/e993dd5/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/3cd2804/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/3cd2804e2e5fa261ee662949dad84ac8fad5d272/bin/pkg_sql_tests benchdiff/3cd2804/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/3cd2804/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

# NB: for best (most stable) results, also add a suitable `--benchtime` that
# results in ~1s to ~5s of benchmark runs. For example, if ops average ~3ms, a
# benchtime of `1000x` is appropriate.
#
# Some benchmarks (in particular BenchmarkSysbench) output additional memory
# profiles covering only the execution (excluding the setup/teardown) - those
# should be preferred for analysis since they more closely correspond to what's
# reported as B/op and alloc/op.
benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_write_only$ --old=3cd2804 --new=e993dd5 --memprofile ./pkg/sql/tests
Artifacts

download:

mkdir -p new
gcloud storage cp gs://cockroach-microbench-ci/artifacts/e993dd5bd036cf79ab7bbc475e29167c77fecfa8/24894355240-1/\* new/
mkdir -p old
gcloud storage cp gs://cockroach-microbench-ci/artifacts/3cd2804e2e5fa261ee662949dad84ac8fad5d272/24894355240-1/\* old/

built with commit: e993dd5bd036cf79ab7bbc475e29167c77fecfa8

@cockroach-teamcity cockroach-teamcity added the X-perf-check Microbenchmarks CI: Added to a PR if a performance regression is detected and should be checked label Apr 24, 2026
@tbg tbg removed the X-perf-check Microbenchmarks CI: Added to a PR if a performance regression is detected and should be checked label Apr 24, 2026
@tbg tbg marked this pull request as ready for review April 24, 2026 15:28
@tbg tbg requested review from a team as code owners April 24, 2026 15:28
@tbg tbg requested a review from angeladietz April 24, 2026 15:29
@tbg tbg marked this pull request as draft May 4, 2026 10:57
@tbg tbg removed the request for review from angeladietz May 4, 2026 10:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants