Commit 0175c8c

metamorphic: bound time spent level checking
The metamorphic test runs with the level checker validating every new readState. This level checking happens while holding the database mutex, which prevents the test from making forward progress. Some configurations can create pathological LSMs with tens of thousands of sstables, which are slow to validate with the level checker.

This commit addresses the issue by tracking the test's wall time and the cumulative wall time spent in the level checker. It skips the level checker if the cumulative time spent in it exceeds 25% of the run's total elapsed time.

Additionally, regardless of the cumulative time budget, it skips the level checker half the time. This is intended to ensure that the artificial latency that level checking inserts during read-state installation doesn't obscure subtle races.

Fix #4517. Fix #4338. Fix #4202.
1 parent e905172 commit 0175c8c

1 file changed: +25 -19 lines changed
metamorphic/options.go

Lines changed: 25 additions & 19 deletions
@@ -322,8 +322,7 @@ func defaultOptions(kf KeyFormat) *pebble.Options {
 	opts := &pebble.Options{
 		// Use an archive cleaner to ease post-mortem debugging.
 		Cleaner: base.ArchiveCleaner{},
-		// Always use our custom comparer which provides a Split method,
-		// splitting keys at the trailing '@'.
+		// Always use our custom comparer which provides a Split method.
 		Comparer: kf.Comparer,
 		KeySchema: kf.KeySchema.Name,
 		KeySchemas: sstable.MakeKeySchemas(kf.KeySchema),
@@ -336,27 +335,34 @@ func defaultOptions(kf KeyFormat) *pebble.Options {
 	}
 	opts.Experimental.EnableColumnarBlocks = func() bool { return true }
 
-	// We don't want to run the level checker every time because it can slow down
-	// downloads and background compactions too much.
+	// The level checker runs every time a new read state is installed: every
+	// compaction, flush, ingest completion, etc. It runs while the database
+	// mutex DB.mu is held, preventing the scheduling of new compactions or
+	// flushes.
 	//
-	// We aim to run it once every 500ms (on average). To do this with some
-	// randomization, each time we get a callback we see how much time passed
-	// since the last call and run the check with a proportional probability.
-	const meanTimeBetweenChecks = 500 * time.Millisecond
+	// We only consider running the level checker 50% of the time (to ensure
+	// we're not obscuring races post-read state installation).
+	//
+	// Additionally, some option configurations can create pathological numbers
+	// of sstables, causing the level checker to consume an excessive amount of
+	// time. To prevent pathological cases, we limit the cumulative time spent
+	// in the level checker to 25% of the test's runtime. If DebugCheck is
+	// invoked but we've spent more than 25% of our total test time within the
+	// level checker, we skip invoking DebugCheckLevels.
 	startTime := time.Now()
-	// lastCallTime stores the time of the last DebugCheck call, as the duration
-	// since startTime.
-	var lastCallTime atomic.Uint64
+	var cumulativeTime atomic.Int64
 	opts.DebugCheck = func(db *pebble.DB) error {
-		now := time.Since(startTime)
-		last := time.Duration(lastCallTime.Swap(uint64(now)))
-		// Run the check with probability equal to the time (as a fraction of
-		// meanTimeBetweenChecks) passed since the last time we had a chance, as a
-		// fraction of meanTimeBetweenChecks.
-		if rand.Float64() < float64(now-last)/float64(meanTimeBetweenChecks) {
-			return pebble.DebugCheckLevels(db)
+		if rand.Float64() < 0.50 {
+			return nil
+		}
+		now := time.Now()
+		testDur := now.Sub(startTime)
+		if checkerDur := time.Duration(cumulativeTime.Load()); checkerDur > testDur/4 {
+			return nil
 		}
-		return nil
+		err := pebble.DebugCheckLevels(db)
+		cumulativeTime.Add(int64(time.Since(now)))
+		return err
 	}
 
 	return opts
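
The gating logic in the new DebugCheck closure can be tried in isolation. The following is a minimal sketch, not code from this commit: the names `gate`, `newGate`, `run`, and `simulate` are hypothetical, and the injectable clock and coin exist only so the sketch runs instantly and deterministically. The 50% coin flip and the 25% cumulative budget mirror the diff above.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
	"time"
)

// gate is a hypothetical stand-alone version of the DebugCheck gating logic.
type gate struct {
	start time.Time
	spent atomic.Int64     // cumulative nanoseconds spent inside checks
	now   func() time.Time // injectable clock (assumption, for testing)
	coin  func() float64   // injectable random source in [0, 1)
}

// newGate builds a gate backed by the real clock and math/rand.
func newGate() *gate {
	return &gate{start: time.Now(), now: time.Now, coin: rand.Float64}
}

// run invokes check unless the coin flip or the time budget says to skip.
// It reports whether check actually ran.
func (g *gate) run(check func() error) (bool, error) {
	if g.coin() < 0.50 {
		return false, nil // skip half the time, regardless of budget
	}
	begin := g.now()
	elapsed := begin.Sub(g.start)
	if time.Duration(g.spent.Load()) > elapsed/4 {
		return false, nil // already over 25% of elapsed time
	}
	err := check()
	g.spent.Add(int64(g.now().Sub(begin)))
	return true, err
}

// simulate drives the gate with a fake clock: each call is preceded by 50ms
// of other work, and each check that runs costs 100ms. The coin always
// passes, so only the time budget causes skips. It returns the number of
// checks that ran, plus checker time and total elapsed time in milliseconds.
func simulate(calls int) (ran int, spentMs, elapsedMs int64) {
	clock := time.Unix(0, 0)
	g := &gate{
		start: clock,
		now:   func() time.Time { return clock },
		coin:  func() float64 { return 1 },
	}
	slowCheck := func() error {
		clock = clock.Add(100 * time.Millisecond)
		return nil
	}
	for i := 0; i < calls; i++ {
		clock = clock.Add(50 * time.Millisecond)
		if ok, _ := g.run(slowCheck); ok {
			ran++
		}
	}
	return ran, g.spent.Load() / int64(time.Millisecond),
		clock.Sub(g.start).Milliseconds()
}

func main() {
	ran, spentMs, elapsedMs := simulate(20)
	fmt.Printf("ran=%d checkerMs=%d elapsedMs=%d\n", ran, spentMs, elapsedMs)
}
```

Because the budget is evaluated before a check starts, a single slow check can push the checker's share of the run above 25%; later checks are then skipped until elapsed time catches up to four times the cumulative checker time. In the simulation only a few of the 20 calls run the check, even though the coin never skips.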
