kvserver: compute MVCCStats in AdminSplit instead of splitTrigger #119894
Conversation
655a2a5
to
b88c331
Compare
Took a first pass, looking good. I'll wait for @nvanbenschoten to take a look before doing another round.
The spikes seem to somewhat correlate with admission delay.
This would be indicative that it is unrelated to the split latches.
I ran a sequential KV workload with this PR, comparing to master.
Would it be possible to run a uniform write-only workload without pre-splitting? We should see splits occur at the same time across ranges, and that'd give a better indication of latency. I'm guessing for the sequential workload the splits were load-based, so we might not get as much data scanned under latches vs size-based splits.
Reviewed 3 of 5 files at r1, 11 of 11 files at r3, 11 of 11 files at r4, 7 of 7 files at r5, 6 of 6 files at r6, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miraradeva, @nvanbenschoten, and @sumeerbhola)
-- commits
line 61 at r5:
Consider mentioning the default diff thresholds for bytes/count in the commit message; also how they were selected.
-- commits
line 83 at r6:
nit: add a release note as an ops change for these new metrics.
pkg/kv/kvserver/metrics.go
line 3444 at r6 (raw file):
ReplicaReadBatchWithoutInterleavingIter: metric.NewCounter(metaReplicaReadBatchWithoutInterleavingIter), // Estimated MVCC stats in split
nit: period.
pkg/kv/kvserver/replica_command.go
line 452 at r4 (raw file):
ba.AddRawRequest(&req)
if err = r.store.DB().Run(ctx, &ba); err != nil {
	log.Event(ctx, "failed to re-compute MVCCStats pre split")
nit: does it make sense to include the error message here?
pkg/kv/kvserver/batcheval/cmd_end_transaction.go
line 61 at r5 (raw file):
settings.SystemVisible,
"kv.split.max_mvcc_stat_count_diff",
"defines the max number of units that are acceptable for an individual MVCC stat to diverge; needs kv.split.estimated_mvcc_stats.enabled to be true",
nit: wrap to 80 chars.
pkg/kv/kvserver/batcheval/cmd_end_transaction.go
line 72 at r5 (raw file):
settings.SystemVisible,
"kv.split.max_mvcc_stat_bytes_diff",
"defines the max number of bytes that are acceptable for an individual MVCC stat to diverge; needs kv.split.estimated_mvcc_stats.enabled to be true",
Same as above.
pkg/kv/kvserver/batcheval/split_stats_helper.go
line 136 at r4 (raw file):
// be scanned.
ScanRightFirst bool
// LeftIsEmpty denotes that the LHS is empty. If this is the case, the entire
nit: the brevity of LHS vs RHS is good but inconsistent with the field comments above, consider spelling them out.
pkg/kv/kvserver/batcheval/split_stats_helper.go
line 221 at r4 (raw file):
// to be 100% accurate, so they are marked with ContainsEstimates. However,
// if there are no writes concurrent with the split (i.e. the pre-computed
// PreSplitLeftUser still correspond to the correct LHS user stats), then the
nit: double space.
pkg/kv/kvserver/batcheval/split_stats_helper.go
line 260 at r5 (raw file):
}
// If the user pre-split stats differ significantly form the current stats
nit: from.
pkg/kv/kvserver/batcheval/split_stats_helper.go
line 276 at r6 (raw file):
h.in.PreSplitStats, h.in.MaxCountDiff, h.in.MaxBytesDiff) {
	log.VEventf(context.Background(), 2, "split falling back to accurate stats computation because of large difference in pre- and post-split MVCC stats; pre: %v, post: %v",
nit: wrap to 80 chars.
pkg/storage/enginepb/mvcc.go
line 207 at r5 (raw file):
// HasUserDataCloseTo compares the fields corresponding to user data and returns
// whether their absolute difference is within a certain limit. Separate limits
// are passed in for stats measures in count, bytes, and nanos.
Should the comment be updated or method changed? The nano fields (age) aren't being compared here.
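For reference, a count/bytes-only comparison could look something like the self-contained sketch below. The `userStats` type, `hasUserDataCloseTo`, and the field set are simplified stand-ins for the real `enginepb.MVCCStats` fields, not the actual implementation; the point is that no nano/age fields participate in the comparison:

```go
package main

import "fmt"

// userStats is a toy stand-in for the user-data fields of
// enginepb.MVCCStats (the real struct has many more fields).
type userStats struct {
	KeyCount, KeyBytes, ValCount, ValBytes int64
}

// abs avoids math.Abs's float64 round-trip for int64 inputs.
func abs(x int64) int64 {
	if x < 0 {
		return -x
	}
	return x
}

// hasUserDataCloseTo reports whether every count-typed and byte-typed
// field of a and b differs by at most the corresponding limit. No
// nanosecond (age) fields are compared.
func hasUserDataCloseTo(a, b userStats, maxCountDiff, maxBytesDiff int64) bool {
	return abs(a.KeyCount-b.KeyCount) <= maxCountDiff &&
		abs(a.ValCount-b.ValCount) <= maxCountDiff &&
		abs(a.KeyBytes-b.KeyBytes) <= maxBytesDiff &&
		abs(a.ValBytes-b.ValBytes) <= maxBytesDiff
}

func main() {
	pre := userStats{KeyCount: 1000, KeyBytes: 1 << 20, ValCount: 1000, ValBytes: 2 << 20}
	post := userStats{KeyCount: 3500, KeyBytes: 1<<20 + 100, ValCount: 1200, ValBytes: 2 << 20}
	fmt.Println(hasUserDataCloseTo(pre, post, 5000, 5<<20)) // true: all diffs within limits
	fmt.Println(hasUserDataCloseTo(pre, post, 1000, 5<<20)) // false: KeyCount diff (2500) too large
}
```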
pkg/kv/kvserver/rditer/stats_test.go
line 1 at r3 (raw file):
// Copyright 2023 The Cockroach Authors.
s/2023/2024/
b88c331
to
05daef9
Compare
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kvoli, @nvanbenschoten, and @sumeerbhola)
Previously, kvoli (Austen) wrote…
Consider mentioning the default diff thresholds for bytes/count in the commit message; also how they were selected.
Done.
pkg/kv/kvserver/replica_command.go
line 452 at r4 (raw file):
Previously, kvoli (Austen) wrote…
nit: does it makes sense to include the error message here?
I returned the error instead to be more consistent with the rest of the error handling in this function.
pkg/kv/kvserver/batcheval/cmd_end_transaction.go
line 72 at r5 (raw file):
Previously, kvoli (Austen) wrote…
Same as above.
Done.
pkg/storage/enginepb/mvcc.go
line 207 at r5 (raw file):
Previously, kvoli (Austen) wrote…
Should the comment be updated or method changed? The nano fields (age) aren't being compared here.
Done.
pkg/kv/kvserver/rditer/stats_test.go
line 1 at r3 (raw file):
Previously, kvoli (Austen) wrote…
s/2023/2024/
Done.
05daef9
to
9bca223
Compare
Reviewed 2 of 5 files at r1, 2 of 11 files at r3, 18 of 18 files at r7, 6 of 11 files at r8.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kvoli, @miraradeva, and @sumeerbhola)
pkg/kv/kvserver/replica_command.go
line 467 at r8 (raw file):
// post-split LHS stats by combining these stats with the non-user stats
// computed in splitTrigger. More details in makeEstimatedSplitStatsHelper.
userOnlyLeftStats, err := rditer.ComputeStatsForRangeUserOnly(
Should this scan be subject to the batcheval.EnableEstimatedMVCCStatsInSplit cluster setting?
Also, should it all be behind a cluster version check?
pkg/kv/kvserver/batcheval/cmd_end_transaction.go
line 65 at r8 (raw file):
"if enabled, MVCC stats will be recomputed at the beginning of a split "+
	"to prevent stats estimates from drifting",
util.ConstantWithMetamorphicTestBool("kv.split.mvcc_stats_recomputation.enabled", true))
I had thought we were going to default this to false, given your recent experiments.
pkg/kv/kvserver/batcheval/cmd_end_transaction.go
line 1064 at r8 (raw file):
startKey := split.LeftDesc.StartKey.AsRawKey()
endKey := split.LeftDesc.EndKey.AsRawKey()
if startKey.Compare(keys.LocalMax) <= 0 && endKey.Compare(keys.LocalMax) > 0 {
I think split.LeftDesc.KeySpan().AsRawSpanWithNoLocals() will give you what you want.
pkg/kv/kvserver/batcheval/cmd_end_transaction.go
line 1185 at r8 (raw file):
// makeEstimatedSplitStatsHelper.
if rec.ClusterSettings().Version.IsActive(ctx, clusterversion.V24_1_EstimatedMVCCStatsInSplit) &&
	EnableEstimatedMVCCStatsInSplit.Get(&rec.ClusterSettings().SV) {
I think we should be checking statsInput.PreSplitLeftUser (statsInput.PreSplitLeftUser == enginepb.MVCCStats{}) to decide whether to enter this branch, and then lift these cluster setting and version checks to AdminSplit. Doing the checks here:
- prevents you from skipping the scan in AdminSplit when the cluster setting is disabled
- could race with a cluster version switch. The PreSplitLeftUser might not be supplied from an old node, but the cluster version may change before this request is processed.
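The suggested split of responsibilities can be sketched with toy types (all names here are illustrative, not the real kvserver code): the sender (AdminSplit) evaluates the gates once and decides whether to populate the field; the evaluator (splitTrigger) only checks whether the field arrived:

```go
package main

import "fmt"

// stats is a toy stand-in for enginepb.MVCCStats.
type stats struct{ KeyCount, KeyBytes int64 }

// shouldPrecomputeLeftUserStats is the sender-side decision: both the
// cluster setting and the version gate are checked here, before any
// user-keyspace scan happens.
func shouldPrecomputeLeftUserStats(settingEnabled, versionActive bool) bool {
	return settingEnabled && versionActive
}

// useEstimatedStats is the evaluator-side decision: a zero value means
// either an old-version sender or a genuinely empty LHS; in both cases
// falling back to accurate computation is cheap and correct, and no
// setting/version re-check is needed (avoiding the race).
func useEstimatedStats(preSplitLeftUser stats) bool {
	return preSplitLeftUser != (stats{})
}

func main() {
	fmt.Println(shouldPrecomputeLeftUserStats(true, false)) // false: gate fails, skip the scan
	fmt.Println(useEstimatedStats(stats{}))                 // false: field unset, accurate path
	fmt.Println(useEstimatedStats(stats{KeyCount: 10}))     // true: field supplied, estimated path
}
```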
pkg/kv/kvserver/batcheval/split_stats_helper.go
line 136 at r8 (raw file):
// be scanned.
ScanRightFirst bool
// LeftIsEmpty denotes that the left hand side is empty. If this is the case,
For both of these fields, these flags are only saying that the user keyspaces are empty, right? So they're not an indication that the stats are empty, just that an accurate stats computation will be cheap. Consider commenting that and also changing the names to something like LeftUserIsEmpty and RightUserIsEmpty.
pkg/kv/kvserver/batcheval/split_stats_helper.go
line 144 at r8 (raw file):
// accurately. This is cheap because the range is empty.
RightIsEmpty bool
// PreSplitLeftUser contains the pre-split user-only stats of the left hand
nit: feel free to move these fields around to where they make sense. For example, move this up below the other MVCCStats fields, and move PostSplitScanLocalLeftFn up below the other splitStatsScanFn fields.
pkg/kv/kvserver/batcheval/split_stats_helper.go
line 233 at r8 (raw file):
}
// Fast path: if both ranges are guaranteed to be empty, just scan them and
Would it be simpler to handle all of the h.in.LeftIsEmpty || h.in.RightIsEmpty cases in makeSplitStatsHelper and then only use makeEstimatedSplitStatsHelper if !h.in.LeftIsEmpty && !h.in.RightIsEmpty? Then this function will always leave the stats on both sides estimated.
pkg/roachpb/data.proto
line 133 at r8 (raw file):
reserved 3, 4;
// Pre-computed stats for the LHS spans corresponding to user data.
nit: move above reserved.
9bca223
to
14f7ade
Compare
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kvoli, @nvanbenschoten, and @sumeerbhola)
pkg/kv/kvserver/replica_command.go
line 467 at r8 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should this scan be subject to the batcheval.EnableEstimatedMVCCStatsInSplit cluster setting?
Also, should it all be behind a cluster version check?
Yes, it should check EnableEstimatedMVCCStatsInSplit.
But if we have the statsInput.PreSplitLeftUser == enginepb.MVCCStats{} check in cmd_end_transaction, we don't need the version gate anymore, right?
If the leaseholder is running an old version, it won't populate PreSplitLeftUser and we fall back to accurate stats. If the follower is running an old version, it only knows about accurate stats computation (and ignores PreSplitLeftUser). The only discrepancy we can get is if some replicas have the old version, and some have the new version; they would end up with different stats. But that should be ok? I can keep the version gate to prevent this but let me know if you think it's simpler to remove it.
pkg/kv/kvserver/batcheval/cmd_end_transaction.go
line 65 at r8 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I had thought we were going to default this to false, given your recent experiments.
I re-ran the experiment with stats re-computation on (ran it for over 2 hours), and couldn't repro the performance degradation I saw in that first experiment. Maybe a longer-term run will help decide but I don't have any concrete evidence that re-computation is causing issues.
pkg/kv/kvserver/batcheval/cmd_end_transaction.go
line 1185 at r8 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I think we should be checking statsInput.PreSplitLeftUser (statsInput.PreSplitLeftUser == enginepb.MVCCStats{}) to decide whether to enter this branch, and then lift these cluster setting and version checks to AdminSplit. Doing the checks here:
- prevents you from skipping the scan in AdminSplit when the cluster setting is disabled
- could race with a cluster version switch. The PreSplitLeftUser might not be supplied from an old node, but the cluster version may change before this request is processed.
Yes, that's a great point. I hesitated to compare PreSplitLeftUser to enginepb.MVCCStats{} because the cases of (1) the field being unset, and (2) the left user stats being empty, are indistinguishable. But it's ok to fall back to accurate stats computation in both of these cases; in the former we have no choice, and in the latter the LHS is empty so it's not expensive to scan.
pkg/kv/kvserver/batcheval/split_stats_helper.go
line 233 at r8 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Would it be simpler to handle all of the h.in.LeftIsEmpty || h.in.RightIsEmpty cases in makeSplitStatsHelper and then only use makeEstimatedSplitStatsHelper if !h.in.LeftIsEmpty && !h.in.RightIsEmpty? Then this function will always leave the stats on both sides estimated.
Yes, so much simpler!
Reviewed 2 of 11 files at r8, 12 of 12 files at r12, 7 of 7 files at r13, 6 of 6 files at r14, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kvoli, @miraradeva, and @sumeerbhola)
-- commits
line 77 at r13:
Have we confirmed that the customer case which motivated this work will fall below these thresholds? We could look at the QPS on the range and the time the stats computation took to perform (the latch delay) and determine whether the thresholds would have been tripped.
pkg/kv/kvserver/replica_command.go
line 467 at r8 (raw file):
Previously, miraradeva (Mira Radeva) wrote…
Yes, it should check EnableEstimatedMVCCStatsInSplit.
But if we have the statsInput.PreSplitLeftUser == enginepb.MVCCStats{} check in cmd_end_transaction, we don't need the version gate anymore, right?
If the leaseholder is running an old version, it won't populate PreSplitLeftUser and we fall back to accurate stats. If the follower is running an old version, it only knows about accurate stats computation (and ignores PreSplitLeftUser). The only discrepancy we can get is if some replicas have the old version, and some have the new version; they would end up with different stats. But that should be ok? I can keep the version gate to prevent this but let me know if you think it's simpler to remove it.
I think it's probably good to keep the version gate, if only to avoid double-computing the stats in mixed-version clusters where the PreSplitLeftUser is computed and then ignored.
pkg/kv/kvserver/replica_command.go
line 500 at r13 (raw file):
if err := r.store.DB().Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
	return splitTxnAttempt(ctx, r.store, txn, leftDesc, rightDesc, desc, reason,
		userOnlyLeftStats, r.GetMVCCStats())
Should we grab r.GetMVCCStats at the same time that we're grabbing userOnlyLeftStats? Do we want to grab a new value on each txn retry?
pkg/kv/kvserver/batcheval/cmd_end_transaction.go
line 1159 at r12 (raw file):
// to accurate stats computation because scanning the empty LHS is not
// expensive.
if split.PreSplitLeftUserStats.Equal(enginepb.MVCCStats{}) ||
nit: does split.PreSplitLeftUserStats == (enginepb.MVCCStats{}) not work? The Equal method is auto-generated and operates on an interface{}, so it will be less performant.
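Since every field of the stats struct is a comparable type, Go's built-in struct equality applies and avoids the interface{} boxing of a generated Equal method. A minimal illustration with a stand-in struct (not the real enginepb.MVCCStats):

```go
package main

import "fmt"

// stats stands in for enginepb.MVCCStats; because all of its fields
// are comparable (int64s), the built-in == operator works on it.
type stats struct{ KeyCount, KeyBytes, ValCount, ValBytes int64 }

func main() {
	var s stats
	fmt.Println(s == (stats{}))                  // true: zero value equals the empty literal
	fmt.Println(stats{KeyCount: 1} == (stats{})) // false: non-zero field
}
```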
pkg/storage/enginepb/mvcc.go
line 212 at r13 (raw file):
) bool {
	countWithinLimit := func(v1 int64, v2 int64) bool {
		return math.Abs(float64(v1)-float64(v2)) <= float64(maxCountDiff)
nit: Abs is easy enough to define for int64. Want to just pull out a helper right above countWithinLimit?
14f7ade
to
f1b8546
Compare
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @kvoli, @nvanbenschoten, and @sumeerbhola)
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Have we confirmed that the customer case which motivated this work will fall below these thresholds? We could look at the QPS on the range and the time the stats computation took to perform (the latch delay) and determine whether the thresholds would have been tripped.
I bumped these up to 5k count and 5.12MB respectively, which ensures no splits fall through to accurate stats for the kv sequential workload (which was doing ~15k QPS). The thresholds (5k and 5.12MB) handle this comfortably; as I was tuning them, in the logs I saw count diffs of ~2k and bytes diffs of ~2M for individual stats.
I looked through some old escalations where the split latency was an issue. There isn't a consistent latch wait duration (it varies from a few hundred ms to 1.5-2 seconds); the QPS in these escalations seems to be around 8k on average, though in some it ramps up to 10k, and one escalation mentions 30k for some other high-qps workloads.
Since we won't be holding latches for 2s anymore, it's probably too extreme to expect (30k QPS) * (2s) = 60k concurrent writes. If we want to handle 30k QPS, and extrapolate from the kv sequential workload, we'd need about 4-5k count diff and 4-5M bytes diff. The bumped-up values should handle just about that much but not more. I'm happy to increase these more, especially if we decide to keep the re-computation cluster setting on.
pkg/kv/kvserver/replica_command.go
line 467 at r8 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I think it's probably good to keep the version gate, if only to avoid double-computing the stats in mixed-version clusters where the PreSplitLeftUser is computed and then ignored.
Good point, I'll keep it.
pkg/kv/kvserver/replica_command.go
line 500 at r13 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should we grab r.GetMVCCStats at the same time that we're grabbing userOnlyLeftStats? Do we want to grab a new value on each txn retry?
Yes, we should keep them consistent. In the next commit, I use the total pre-split stats (r.GetMVCCStats) to export a metric on the total bytes of estimates produced. But the actual estimates being produced come from the discrepancy between userOnlyLeftStats and the actual user left stats.
Reviewed 19 of 19 files at r15.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @sumeerbhola)
Reviewed 2 of 19 files at r15, 6 of 11 files at r16, 5 of 7 files at r17, 6 of 6 files at r18, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @miraradeva and @sumeerbhola)
pkg/kv/kvserver/batcheval/cmd_end_transaction.go
line 1212 at r18 (raw file):
statsInput.PreSplitStats, statsInput.AbsPreSplitBothStored)
}
log.VEventf(ctx, 2, "falling back to accurate stats computation because %v", reason)
We should probably log.Info if preComputedStatsDiff. I imagine this will be a common suspect for tail latency in the near future, especially as we try to find the right values for the kv.split.max_mvcc_stat_XXX_diff settings.
pkg/storage/enginepb/mvcc.go
line 212 at r13 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: Abs is easy enough to define for int64. Want to just pull out a helper right above countWithinLimit?
nit: I was thinking you could do it without math.Abs or the cast through a float64 (with the corresponding loss of precision). Just

abs := func(x int64) int64 {
	if x < 0 {
		return -x
	}
	return x
}
f1b8546
to
d0e5ee4
Compare
TYFTR!
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten and @sumeerbhola)
pkg/storage/enginepb/mvcc.go
line 212 at r13 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: I was thinking you could do it without math.Abs or the cast through a float64 (with the corresponding loss of precision). Just

abs := func(x int64) int64 {
	if x < 0 {
		return -x
	}
	return x
}
Lol yes, easy enough.
…-user spans

ComputeStatsForRange iterates over an entire range and computes its MVCC stats. This patch adds ComputeStatsForRangeUserOnly and ComputeStatsForRangeExcludingUser, which compute the MVCC stats corresponding to the user-only and non-user spans in the range, respectively.

These functions are useful for computing the estimated stats proposed in cockroachdb#119499 by making it possible to quickly iterate only over the non-user key spans in the range, and separately iterate over the user-only key spans.

Part of: cockroachdb#119499

Release note: None
Previously, we computed MVCC stats for the LHS and RHS of a split in splitTrigger, while holding latches. This resulted in high tail latencies because foreground work could not proceed for as long as the split held these latches. More details in cockroachdb#119499.

This change moves some of the stats computation to AdminSplit, where no latches are held. In particular, in AdminSplit, we pre-compute the part of the LHS stats corresponding to user-data spans, and pass that to the splitTrigger. In the splitTrigger, we iterate only over the non-user spans in the range (which are small), and combine the resulting stats with the passed-in user stats to arrive at the new LHS stats. The RHS stats are derived by subtracting the LHS stats from the total on-disk stats.

This computation does not produce 100% correct stats because any writes concurrent with the split may not be attributed correctly to the LHS or RHS. Therefore, both LHS and RHS stats are marked with ContainsEstimates. To prevent the stats from drifting after successive splits, we kick off a stats re-computation in AdminSplit (only if the on-disk stats contain estimates). Thus, each split may introduce some stats estimates, but before doing so it will correct any previous estimates.

There are two invariants that the stats computation guarantees:
1. Non-user stats (SysBytes, SysCount, etc.) are always correct. This is because we scan those spans while holding latches.
2. If there are no writes concurrent with the split, the stats are always correct.

Fixes: cockroachdb#119499

Release note (performance improvement): Splits no longer hold latches for time proportional to the range size while computing MVCC stats. Instead, MVCC stats are pre-computed before the critical section of the split. As a side effect, the resulting stats are no longer 100% accurate because they may not correctly distribute writes concurrent with the split. To mitigate this, and to prevent the stats from drifting after successive splits, the existing stored stats are re-computed and corrected (if needed) as part of the non-critical section of the split.
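The stats arithmetic described in the commit message can be sketched in a few lines. Here `stats`, `add`/`sub`, and `estimatedSplitStats` are simplified stand-ins; the real MVCCStats has many more fields and tracks a ContainsEstimates counter:

```go
package main

import "fmt"

// stats is a simplified stand-in for enginepb.MVCCStats.
type stats struct {
	KeyCount, KeyBytes int64
}

func (s stats) add(o stats) stats { return stats{s.KeyCount + o.KeyCount, s.KeyBytes + o.KeyBytes} }
func (s stats) sub(o stats) stats { return stats{s.KeyCount - o.KeyCount, s.KeyBytes - o.KeyBytes} }

// estimatedSplitStats mirrors the description above: the LHS is the
// pre-computed user stats plus the (accurately scanned) non-user
// stats, and the RHS is whatever remains of the on-disk total.
func estimatedSplitStats(preSplitLeftUser, leftNonUser, totalOnDisk stats) (left, right stats) {
	left = preSplitLeftUser.add(leftNonUser)
	right = totalOnDisk.sub(left)
	return left, right
}

func main() {
	left, right := estimatedSplitStats(
		stats{KeyCount: 900, KeyBytes: 90_000}, // user LHS, computed in AdminSplit
		stats{KeyCount: 10, KeyBytes: 1_000},   // non-user LHS, scanned under latches
		stats{KeyCount: 2000, KeyBytes: 200_000},
	)
	fmt.Println(left, right) // prints "{910 91000} {1090 109000}"
}
```

Note how a write concurrent with the split shows up only in totalOnDisk, so it lands entirely in the RHS estimate even if it actually went to the LHS, which is why both sides must be marked as containing estimates.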
…ccurate-stats split

This patch is a follow-up to cockroachdb#119499, and adds thresholds on the number of keys and bytes of the difference between the pre-split MVCC stats (retrieved in AdminSplit) and the stats when the split holds latches (retrieved in splitTrigger). This difference corresponds to writes concurrent with the split. If the difference is too large, the split falls back to computing LHS and RHS stats accurately. The difference is computed only for stats corresponding to user data; system stats are always kept accurate.

These thresholds are tunable by two new cluster settings, MaxMVCCStatCountDiff and MaxMVCCStatBytesDiff, which denote the maximum number of entities (e.g. keys, intents) and maximum number of bytes (e.g. value bytes, range value bytes), respectively, that are acceptable as the difference between the pre- and post-split values of an individual stat. The former defaults to 5000, and the latter to 5.12MB (1% of the maximum range size).

Fixes: cockroachdb#119503

Release note: None
This patch adds logging and two new metrics to track estimated MVCC stats computed during splits.

`kv.split.estimated_stats`: the number of splits that computed estimated MVCC stats, as opposed to 100% accurate ones.

`kv.split.total_bytes_estimates`: the number of total bytes of estimates introduced by splits. These are calculated as the difference between the pre-computed stats before the split and the stored stats during the split (while holding latches).

Fixes: cockroachdb#119516

Release note (ops change): Two new metrics (kv.split.estimated_stats and kv.split.total_bytes_estimates) were added to track the number of splits that produce MVCC stats estimates, and the total bytes of estimates produced.
d0e5ee4
to
af328f4
Compare
bors r=nvanbenschoten,kvoli
This PR includes four commits:
Results:
We ran the following workloads to confirm the effect of the change on p99.99 latency. In the following screenshots, the green cluster is running with accurate stats computation (i.e. kv.split.estimated_mvcc_stats.enabled = false), and the yellow cluster is running with estimated stats computation, which is the default.

./cockroach workload run kv --drop --insert-count=5000000 --max-block-bytes=1024 --min-block-bytes=1024 --concurrency=1024 --max-rate=10000
We see significant spikes in p99.99 latency on the accurate-stats cluster, and none on the estimated-stats cluster.
./cockroach workload run kv --drop --sequential --insert-count=5000000 --max-block-bytes=1024 --min-block-bytes=1024 --concurrency=1024 --max-rate=10000
Next, we tested a sequential-write load to see an even more pronounced effect on p99.99 latency. With the default cluster settings, the p99.99 latency is dominated by latch waits caused by load-based lease transfers (specifically LeaseTransfer requests holding latches).
To get a more meaningful test, we turned off load-based splitting and rebalancing (kv.range_split.by_load_enabled, kv.allocator.load_based_rebalancing).

The main trade-off of this change is the increased split duration due to re-computing MVCC stats at the start of a split.
This hasn't been an issue, but if it becomes one, turning stats re-computation off (kv.split.mvcc_stats_recomputation.enabled) brings the split time back to normal. Stats re-computation itself is not cheap; on a cluster that's close to being overloaded, multiple splits with stats re-computation can add significant extra load.