Add more labels to the quickcheck-lockstep StateMachine tests #431

dcoutts · 2024-10-14T15:23:33Z

Add labels for:

number of tables
number of actions per table
logical size of tables
per-run number of actions (failing and succeeding)
label failing actions with the exception
label interleaving depth on duplicate tables

Initial results for each in each patch.

It's clear from several of these measures that the generation is suboptimal. One idea to improve it is to make the generation more sophisticated by (usually) only generating actions on live tables (or cursors, or existant snapshots). This is however difficult-to-impossible to do with the current quickcheck-lockstep package which does not provide a lookUp function to the action generation arbitraryWithVars. So there's no way to filter vars by their liveness (from looking at the model). It may make sense to extend quickcheck-lockstep to support this.

jorisdral

Very nice to get a better sense of our test distribution! I only have minor suggestions

test/Test/Database/LSMTree/Normal/StateMachine.hs

jorisdral · 2024-10-15T14:06:50Z

test/Test/Database/LSMTree/Normal/StateMachine.hs

+    updDupTableActionLog stats = case action of
+        Lookups     ks   tableVar
+          | not (null ks)         -> updateLastActionLog tableVar
+        RangeLookup _    tableVar -> updateLastActionLog tableVar


You could check in this case if the range argument to the action does not represent an empty range

Doesn't look like this was done

Doh! I did it on the wrong one! (updNumActionsPerTable rather than updDupTableActionLog)

Ok. Done. 🤦

test/Test/Database/LSMTree/Normal/StateMachine.hs

dcoutts · 2024-10-16T16:40:22Z

@jorisdral everything should be addressed now. Follow-up PR nearly ready too...

jorisdral

LGTM, though there is one unresolved comment

jorisdral · 2024-10-21T08:21:12Z

test/Test/Database/LSMTree/Normal/StateMachine.hs

+    updDupTableActionLog stats = case action of
+        Lookups     ks   tableVar
+          | not (null ks)         -> updateLastActionLog tableVar
+        RangeLookup _    tableVar -> updateLastActionLog tableVar


Doesn't look like this was done

Instead of using Dynamic. This will allow inspecting the Table at an unknown type, for use with polymrphic operations like size that don't care about the concrete table type.

In the QLS tests. Initial results Number of tables (100 in total): 33% NumTables "2 <= n < 4" 23% NumTables "4 <= n < 8" 17% NumTables "1 <= n < 2" 16% NumTables "8 <= n < 16" 9% NumTables "n == 0" 2% NumTables "16 <= n < 32" Number of actions per table (410 in total): 27.8% NumTableActions "n == 0" 22.0% NumTableActions "2 <= n < 4" 19.3% NumTableActions "4 <= n < 8" 16.1% NumTableActions "1 <= n < 2" 12.4% NumTableActions "8 <= n < 16" 2.4% NumTableActions "16 <= n < 32" As we can see, 9% of runs have no tables at all! This seems a bit surprising, and not terribly useful. Although such runs will be quick, in CI we typically only run 100 tests. And also in many runs there a lot of tables (nearly 20% have 8 or more). This is probably too many to get interesting interleavings of operations between related (duplicated) tables. So we may want to reduce the number of tables, and make more related tables so we can get better coverage of interleaved operations on related tables. And as we can also see, over a quarter of tables have no actions at all! And generaly the number of actions per table is quite low. This would probably be better with fewer tables.

Initial results: Table sizes (440 in total): 41.1% TableSize "n == 0" 21.1% TableSize "32 <= n < 64" 13.4% TableSize "16 <= n < 32" 8.9% TableSize "8 <= n < 16" 7.5% TableSize "64 <= n < 128" 2.7% TableSize "4 <= n < 8" 2.3% TableSize "128 <= n < 256" 2.0% TableSize "2 <= n < 4" 0.7% TableSize "1 <= n < 2" 0.2% TableSize "256 <= n < 512" Again, it's not great that 40% of tables are empty. We'll try and improve that. On the othe hand, the long tail is quite long, going out over 100 elements.

Both success, failing and total. Initial results: Actions total (100 in total): 55% NumActions "10 <= n < 100" 36% NumActions "1 <= n < 10" 7% NumActions "n == 0" 2% NumActions "100 <= n < 1000" Actions that succeeded total (100 in total): 52% NumActions "10 <= n < 100" 38% NumActions "1 <= n < 10" 9% NumActions "n == 0" 1% NumActions "100 <= n < 1000" Actions that failed total (100 in total): 68% NumActions "1 <= n < 10" 22% NumActions "n == 0" 10% NumActions "10 <= n < 100" As we can see, there's a surprising number of runs with no actions at all (7%) or no successful actions (9%). While it's important to get coverage of failing actions, we don't need lots of coverage since the reasons are always the same ones. There are not a lot of complex interactions in these cases. So we're wasting run actions on them.

Initial results: Actions that failed (409 in total): 35.0% ActionFail "Open" "MErr ErrSnapshotDoesNotExist" 10.3% ActionFail "DeleteSnapshot" "MErr ErrSnapshotDoesNotExist" 7.1% ActionFail "Inserts" "MErr ErrTableHandleClosed" 6.8% ActionFail "Deletes" "MErr ErrTableHandleClosed" 6.8% ActionFail "Lookups" "MErr ErrTableHandleClosed" 6.1% ActionFail "RetrieveBlobs" "MErr ErrBlobRefInvalidated" 6.1% ActionFail "Updates" "MErr ErrTableHandleClosed" 5.4% ActionFail "ReadCursor" "MErr ErrCursorClosed" 3.9% ActionFail "Snapshot" "MErr ErrSnapshotExists" 3.4% ActionFail "RangeLookup" "MErr ErrTableHandleClosed" 3.2% ActionFail "Open" "MErr ErrSnapshotWrongType" 2.0% ActionFail "Duplicate" "MErr ErrTableHandleClosed" 2.0% ActionFail "NewCursor" "MErr ErrTableHandleClosed" 2.0% ActionFail "Snapshot" "MErr ErrTableHandleClosed" As we can see, each action has exactly one reason for failing. This shows clearly that we don't need a lot of coverage for these failures. The only non-trivial one is the blob refs becomming invalidated. Otherwise, there's no complex logic or state for any of these failures.

This adds a measure of how much coverage we are getting of uses of duplicate tables, to help us make sure we are getting good test coverage of the multiple writable table handles feature. We count the number of interleaved actions on different but related tables. In particular we count multiple actions on the same table only once: we are only interested in actions that interleave between different but related tables. Tables are defined to be related if they derive (via 'duplicate') from the same ultimate parent table. Initial results: Interleaved actions on table duplicates (270 in total): 88.1% DupTableActionLog "1 <= n < 2" 6.3% DupTableActionLog "2 <= n < 4" 3.3% DupTableActionLog "4 <= n < 8" 1.9% DupTableActionLog "8 <= n < 16" 0.4% DupTableActionLog "16 <= n < 32" Obviously this isn't great yet: 88% of runs have no interleaved actions at all.

Occurrences of `ModelFindVariables` and `ModelLookup` in the `InLockstep` class are replaced by the newly exposed `ModelVarContext`. A `ModelFindVariables` can be recovered from a `ModelVarContext` using the new `findVars` functions. A `ModelLookup` can be recovered from a `ModelVarContext` using the new `lookupVars` function. Since these functions can be recovered from `ModelVarContext`, existing tests are guaranteed to be adaptable to the new `InLockstep` API. Motivation: previously in the `InLockstep` class, member functions would be passed a `ModelFindVariables` or a `ModelLookup`, but never both. In practice this turned out to be too restrictive, because one might want access to both in the same function, see IntersectMBO/lsm-tree#431. For example, `arbitraryWithVars` was previously only passed a `ModelFindVariables`, but without a `ModelLookup` one can not filter these variables based on the (modelled) outcome of the corresponding actions. The use case for #431 in particular was to filter variables that reference stateful handles (e.g., file handles) for handles that are open. Because of this filtering, one can skew the action distribution to generate more actions on open handles. It is often more interesting to test actions on open handles than it is to test actions on closed handles, which will presumably always return the same error.

dcoutts marked this pull request as ready for review October 15, 2024 09:59

dcoutts requested review from jorisdral, mheinzel and recursion-ninja as code owners October 15, 2024 09:59

jorisdral changed the title ~~Add more label to the quickcheck-lockstep StateMachine tests~~ Add more labels to the quickcheck-lockstep StateMachine tests Oct 15, 2024

jorisdral requested changes Oct 15, 2024

View reviewed changes

dcoutts force-pushed the dcoutts/lockstep-io-sim-tweak-generation branch 3 times, most recently from c85bd32 to 2ef54a4 Compare October 16, 2024 16:39

dcoutts requested a review from jorisdral October 16, 2024 16:39

dcoutts force-pushed the dcoutts/lockstep-io-sim-tweak-generation branch from 2ef54a4 to b04fddd Compare October 16, 2024 16:45

jorisdral approved these changes Oct 21, 2024

View reviewed changes

dcoutts force-pushed the dcoutts/lockstep-io-sim-tweak-generation branch from b04fddd to c970ab4 Compare October 21, 2024 12:18

dcoutts added 6 commits October 21, 2024 13:19

QLS tests: change model's SomeTable type rep to use existentials

07ee7b8

Instead of using Dynamic. This will allow inspecting the Table at an unknown type, for use with polymrphic operations like size that don't care about the concrete table type.

dcoutts force-pushed the dcoutts/lockstep-io-sim-tweak-generation branch from c970ab4 to 28c4890 Compare October 21, 2024 12:20

dcoutts enabled auto-merge October 21, 2024 12:21

dcoutts added this pull request to the merge queue Oct 21, 2024

Merged via the queue into main with commit 3f5bda8 Oct 21, 2024
24 checks passed

dcoutts deleted the dcoutts/lockstep-io-sim-tweak-generation branch October 21, 2024 13:10

jorisdral mentioned this pull request Nov 4, 2024

Introduce ModelVarContext as a generalisation of ModelLookup and ModelFindVariables well-typed/quickcheck-lockstep#25

Merged

Add more labels to the quickcheck-lockstep StateMachine tests #431

Add more labels to the quickcheck-lockstep StateMachine tests #431

Uh oh!

Conversation

dcoutts commented Oct 14, 2024

Uh oh!

jorisdral left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jorisdral Oct 15, 2024

Choose a reason for hiding this comment

Uh oh!

dcoutts Oct 16, 2024

Choose a reason for hiding this comment

Uh oh!

jorisdral Oct 21, 2024

Choose a reason for hiding this comment

Uh oh!

dcoutts Oct 21, 2024

Choose a reason for hiding this comment

Uh oh!

dcoutts Oct 21, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

dcoutts commented Oct 16, 2024

Uh oh!

jorisdral left a comment

Choose a reason for hiding this comment

Uh oh!

jorisdral Oct 21, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants