pebble: speed up root package tests#6007

Merged
RaduBerinde merged 11 commits into cockroachdb:master from RaduBerinde:make-tests-faster on Apr 30, 2026

@RaduBerinde RaduBerinde commented Apr 29, 2026

Speed up the slowest tests in the root package.

Total package time dropped from 40.5s to 24.9s (~38% less).

┌──────────────────────────────────────────────────┬────────┬───────┐
│                       Test                       │ Before │ After │
├──────────────────────────────────────────────────┼────────┼───────┤
│ TestDBCompactionCrash                            │ 7.89s  │ 1.62s │
│ TestMetricsWALBytesWrittenMonotonicity           │ 3.60s  │ <0.3s │
│ TestOpenCloseOpenClose                           │ 3.21s  │ 1.75s │
│ TestLargeKeys                                    │ 2.58s  │ 0.30s │
│ TestOpenAlreadyLocked                            │ 2.28s  │ 2.04s │
│ TestCheckpointCompaction                         │ 2.21s  │ <0.5s │
│ TestElevateThresholdAfterWriteStallUnblocksStall │ 1.83s  │ 1.08s │
│ TestReadOnlyRecovery                             │ 1.41s  │ <0.5s │
│ TestWALFailoverAvoidsWriteStall                  │ 1.36s  │ 0.38s │
│ TestSnapshotRangeDeletionStress                  │ 1.17s  │ <0.5s │
│ TestFlushDelayStress                             │ 1.12s  │ <0.6s │
└──────────────────────────────────────────────────┴────────┴───────┘

pebble: speed up TestDBCompactionCrash

Increase the outer-loop step from `rng.Int32N(5)+1` to `rng.Int32N(10)+3`.
Each iteration opens a database and replays writes up to a crash point;
sampling crash points at a coarser stride reduces test time roughly
2-3x while still exercising a representative range of crash positions.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

pebble: speed up TestMetricsWALBytesWrittenMonotonicity

Cut the run-time budget from 1s to 200ms. The test asserts that
`Metrics.WAL.BytesWritten` never decreases under concurrent writers and
flushes; many violations would surface within tens of milliseconds, so
200ms is sufficient to detect the regressed behavior tracked in #3505.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

pebble: speed up TestOpenCloseOpenClose

Drop the 100,000-byte value length and parallelize the
`(startFromEmpty, walDirname)` subtests. The four resulting subtests are
independent on disk; running them concurrently halves wall time while
preserving the sequential-reopen semantics within each group.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
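
The structure described above (independent groups run concurrently, with strictly sequential work inside each group) can be sketched outside the `testing` package with goroutines. In an actual Go test the usual way to get this is `t.Parallel()` inside each `t.Run` subtest; the `config` type, the `"wal"` directory name, and the helper names below are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"sync"
)

// config identifies one independent group of open/close cycles.
type config struct {
	startFromEmpty bool
	walDirname     string
}

// reopenSequence simulates the sequential open/close cycles of one group
// and returns the order in which the cycles ran.
func reopenSequence(cycles int) []int {
	var order []int
	for i := 0; i < cycles; i++ {
		order = append(order, i) // stand-in for Open followed by Close
	}
	return order
}

// runGroups runs every group concurrently while keeping the cycles inside
// each group sequential.
func runGroups(configs []config, cycles int) map[config][]int {
	results := make(map[config][]int, len(configs))
	var mu sync.Mutex
	var wg sync.WaitGroup
	for _, c := range configs {
		wg.Add(1)
		go func(c config) {
			defer wg.Done()
			seq := reopenSequence(cycles)
			mu.Lock()
			results[c] = seq
			mu.Unlock()
		}(c)
	}
	wg.Wait()
	return results
}

func main() {
	configs := []config{
		{startFromEmpty: true, walDirname: ""},
		{startFromEmpty: true, walDirname: "wal"},
		{startFromEmpty: false, walDirname: ""},
		{startFromEmpty: false, walDirname: "wal"},
	}
	for c, seq := range runGroups(configs, 3) {
		fmt.Printf("%+v ran cycles %v\n", c, seq)
	}
}
```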

pebble: shrink testdata/large_keys

Reduce the shared key prefix from 1MB to 100KB and drop the SST target
file size from 4MB to 400KB. Multi-block index partitioning (the
property `TestLargeKeys` exercises, see #4518) is preserved at the
smaller scale, with the same number of files and index partitions.

To avoid flakes, we disable compactions and set `MemTableSize` to
16MiB to keep the SET batch (~3MB) below `largeBatchThreshold`.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

pebble: speed up TestOpenAlreadyLocked

Move the WAL recovery directory setup inside `runTest` so each subtest
gets its own pre-acquired lock and unlocked recovery directory, then run
the `memfs` and `disk/absolute` subtests in parallel. The
`disk/relative` subtests can't be parallelized because they call
`t.Chdir`.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

pebble: speed up TestCheckpointCompaction

Reduce the number of checkpoints created from 200 to 50. The test
exercises concurrent writes, compactions, and checkpoint creation/open;
50 iterations is enough to catch concurrency issues without the extra
cost of validating the remaining 150.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

pebble: speed up TestElevateThresholdAfterWriteStallUnblocksStall

Reduce the number of 1MB writes from 200 to 50. The write stall fires
after ~8 writes; once the elevation goroutine clears it, the remaining
writes only verify that the elevated threshold continues to admit
writes — 50 is enough to exercise that.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

pebble: speed up TestReadOnlyRecovery

Reduce the metamorphic operation count from 10,000 to 3,000. The test
generates a random database state and verifies a read-only re-open
doesn't mutate the filesystem; 3,000 ops produce a sufficiently rich
state with multiple SSTs and WAL segments.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

pebble: speed up TestWALFailoverAvoidsWriteStall

Drop `UnhealthySamplingInterval` from 100ms to 10ms and
`UnhealthyOperationLatencyThreshold` from 1s to 50ms. The test runs
entirely against in-memory `vfs.NewMem`, so the latency thresholds only
serve to gate when the failover decision triggers; tightening them
shortens the time-to-failover and cuts the test runtime ~3x.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

pebble: speed up TestSnapshotRangeDeletionStress

Reduce the number of runs from 200 to 50. Each run writes O(runs) keys
and takes a snapshot, so total work scales as O(runs^2). The expanding
DeleteRange pattern is fully exercised at 50 runs.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>
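
To make the quadratic scaling concrete, the sketch below assumes each run writes exactly `runs` keys (the commit only says O(runs), so the constant is an assumption) and totals the writes across all runs. The name `totalKeysWritten` is hypothetical.

```go
package main

import "fmt"

// totalKeysWritten sums the keys written over all runs, assuming every run
// writes exactly `runs` keys.
func totalKeysWritten(runs int) int {
	total := 0
	for i := 0; i < runs; i++ {
		total += runs
	}
	return total
}

func main() {
	before := totalKeysWritten(200)
	after := totalKeysWritten(50)
	fmt.Printf("200 runs: %d keys, 50 runs: %d keys (%dx fewer)\n",
		before, after, before/after)
}
```

Cutting the run count by 4x cuts the modeled write volume by 16x; the measured wall-time improvement is smaller because per-test fixed costs do not shrink with it.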

pebble: speed up TestFlushDelayStress

Drop the run count from 100 to 25 and replace the random 1-10ms flush
delays with a fixed 1ms. The test runs inside `synctest.Test`, which
makes wall-clock variance irrelevant, so the random delay range adds
nothing but extra simulated time per iteration.

Co-Authored-By: roachdev-claude <roachdev-claude-bot@cockroachlabs.com>

RaduBerinde and others added 3 commits April 29, 2026 09:05
@RaduBerinde RaduBerinde requested a review from a team as a code owner April 29, 2026 16:44
@cockroach-teamcity

This change is Reviewable

RaduBerinde and others added 8 commits April 29, 2026 13:14

@xxmplus xxmplus left a comment

How nice!
:lgtm:

@xxmplus made 1 comment.
Reviewable status: 0 of 10 files reviewed, all discussions resolved (waiting on annrpom and sumeerbhola).

@RaduBerinde

TFTR!

@RaduBerinde RaduBerinde merged commit c449e62 into cockroachdb:master Apr 30, 2026
9 checks passed
@RaduBerinde RaduBerinde deleted the make-tests-faster branch April 30, 2026 21:46