QC Lockstep tests: tweak test case generation #436

dcoutts · 2024-10-22T15:07:06Z

This is an attempt to get the QLS tests to get more coverage of the duplicate table feature, and more generally to have more interesting coverage.

In particular, we move several of the boring special cases out of the QLS test and into unit tests. This is for things like operating on closed table handles, or opening non-existent snapshots. These things of course must be covered, but they don't have interesting interactions so they don't need to be covered in complex situations. This lets the limited length of action sequences to be used for more interesting things.

See quickcheck-dynamic#85 for further details on why action sequences are so short.

We also change how actions are generated to have fewer tables, and thus more actions per table. We limit the number of tables at once to 5.

Initial results (from 28c4890):

      Interleaved actions on table duplicates (270 in total):
      88.1% DupTableActionLog "1 <= n < 2"
       6.3% DupTableActionLog "2 <= n < 4"
       3.3% DupTableActionLog "4 <= n < 8"
       1.9% DupTableActionLog "8 <= n < 16"
       0.4% DupTableActionLog "16 <= n < 32"

and now we get:

      Interleaved actions on table duplicates (125 in total):
      55.2% DupTableActionLog "1 <= n < 2"
      12.8% DupTableActionLog "4 <= n < 8"
      11.2% DupTableActionLog "2 <= n < 4"
      11.2% DupTableActionLog "8 <= n < 16"
       7.2% DupTableActionLog "16 <= n < 32"
       2.4% DupTableActionLog "32 <= n < 64"

So this is considerably improved, with 45% of runs now having non-trivial interleavings rather than 12%. And where previously we had only 2.3% of original tables with interleavings of length 8 or more, we now have over 20%.

One important caveat is that the change in the key and value types mean the tests run longer. Summary of running times from my system:

    propLockstep_ModelIOImpl:           OK (11.15s)
    propLockstep_RealImpl_RealFS_IO:    OK (24.24s)
    propLockstep_RealImpl_MockFS_IO:    OK (77.06s)
    propLockstep_RealImpl_MockFS_IOSim: OK (111.07s)

The purpose of the change is to allow the generation of actions to look up information from the model. Instead of action generation only being given a function to find model variables, it now gets a context from which one can both find model variables, and look those variables up. This will allow filtering variables based on information from the model, such as whether tables are closed. This first patch just updates to the API changes without making any changes in functionalty.

In preparation for adding more unit tests.

jorisdral

Nice!

Tests are timing out because CI currently only allows tests to run for around 5 minutes. These lines will have to be tweaked:

lsm-tree/.github/workflows/haskell.yml

Lines 123 to 133 in 3f5bda8

    
               # https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#setting-an-environment-variable 
        
               - name: Set test timeout (Unix) 
        
                 if: ${{ startsWith(matrix.os, 'ubuntu') || startsWith(matrix.os, 'macOS') }} 
        
                 run: | 
        
                   echo "TASTY_TIMEOUT=5m" >> "$GITHUB_ENV" 
        
               # https://github.com/actions/runner/issues/2281#issuecomment-1326748709 
        
               - name: Set test timeout (Windows) 
        
                 if: ${{ startsWith(matrix.os, 'windows') }} 
        
                 run: | 
        
                   echo "TASTY_TIMEOUT=5m" >> "$env:GITHUB_ENV"

src/Database/LSMTree/Internal/Paths.hs

test/Test/Database/LSMTree/Normal/StateMachine.hs

To make it rather more convenient to use for literal constant names. This is exposed in the API, not just for tests, but I think this is a helpful API addition.

Previously the QLS test spent half of its search space on making sure we get coverage for using operations at two different table types. While it is important to get this coverage, we do not need to divide our search space in half to achieve it. That search space can be better spent on longer sequences of actions on fewer tables that can have sharing and thus interesting interactions. So we move the {Key,Value,Blob}{1,2} types into the unit test module, and do a simple unit test where we create, insert, snapshot and restore tables of two different types. And we also simplify the QLS test to use a single set of key,value,blob types.

For snapshots we cover the cases of: DeleteMissingSnapshot, OpenMissingSnapshot, SnapshotTwice and OpenSnapshotWrongLabel. (These are all tags from the QLS test.) For tables and cursors we cover use after close and idempotent close.

The intention here is to cover the "boring" cases of using tables that are not open using unit tests, so that more of the QLS search space can be spent on interesting things that may take longer action sequences to find. So having added specific unit tests for snapshots, tables and cursors we now adjust the generation of actions so that we only ever generate references to resources that are currently open (or equivalent for snapshots). This involves consulting the model when we generate actions and filtering out variables that do not correspond to available resources. We leave blobrefs unfiltered since they have some interesting logic to them. But these cover only a small portion of the search space. In particular we also limit the number open tables and open cursors to 5 at once. By having fewer tables, we have more operations per table, and thus more opportunity for interesting interleavings on duplicate tables.

Fewer: * new tables * opening of snapshots * new cursors * table closes * fewer snapshots More: * table duplicates Note also that tables and cursors are limited to 5 each at any one time.

This is intended to allow us to increase the QC size for the QLS tests to get longer action sequences, without also getting individually larger actions (which makes things too slow).

dcoutts · 2024-10-23T14:12:44Z

In the first version of this PR the running times on my machine were:

    propLockstep_ModelIOImpl:           OK (17.80s)
    propLockstep_RealImpl_RealFS_IO:    OK (86.42s)
    propLockstep_RealImpl_MockFS_IO:    OK (251.20s)
    propLockstep_RealImpl_MockFS_IOSim: OK (284.86s)
Test.Database.LSMTree.Normal.StateMachine.DL
    prop_example:                       OK (455.09s)

But with a couple changes they're down to this:

    propLockstep_ModelIOImpl:           OK (11.15s)
    propLockstep_RealImpl_RealFS_IO:    OK (24.24s)
    propLockstep_RealImpl_MockFS_IO:    OK (77.06s)
    propLockstep_RealImpl_MockFS_IOSim: OK (111.07s)

Those changes are:

Use SerialisedBlob rather than LargeRawBytes for the Blob type.
Disable the StateMachine.DL.prop_example test

In particular, the latter was not justifying its long running time with any crucial coverage.

dcoutts · 2024-10-23T14:13:53Z

Oh and so hopefully this will mean we do not need to adjust the CI testsuite timeouts. 🤞

Use pre-existing test types created for this purpose with reasonable looking Arbitrary instances. These are definately larger. They may be too large for this kind of test. The test running times are longer. Also have to adjust the DL example to match the key type. In particular the key type is now designed to create some collisions, so the size of a sequence of keys is not the same as the size of the map of the same keys.

It's utility does not justify its running time.

To better reflect its content.

jorisdral

LGTM! There is just the hlint failure to fix, but then we can merge

dcoutts added 2 commits October 22, 2024 14:28

Restructure existing blobs unit test

6cee94c

In preparation for adding more unit tests.

dcoutts requested review from jorisdral, mheinzel and recursion-ninja as code owners October 22, 2024 15:07

dcoutts force-pushed the dcoutts/lockstep-io-sim-tweak-generation-2 branch from 8669753 to b95de4c Compare October 22, 2024 15:10

jorisdral reviewed Oct 23, 2024

View reviewed changes

test/Test/Database/LSMTree/Normal/StateMachine.hs Show resolved Hide resolved

dcoutts added 8 commits October 23, 2024 14:41

Add instance IsString SnapshotName

58f8a14

To make it rather more convenient to use for literal constant names. This is exposed in the API, not just for tests, but I think this is a helpful API addition.

Remove now-redundant QLS tracking and tagging of table types.

f8d4f99

Tweak the QLS generation action frequency

a748b9b

Fewer: * new tables * opening of snapshots * new cursors * table closes * fewer snapshots More: * table duplicates Note also that tables and cursors are limited to 5 each at any one time.

QLS: limit the max action size to 100

40f99bd

This is intended to allow us to increase the QC size for the QLS tests to get longer action sequences, without also getting individually larger actions (which makes things too slow).

Correct a comment on the KeyForIndexCompact type

94b55cd

dcoutts force-pushed the dcoutts/lockstep-io-sim-tweak-generation-2 branch from 15dd804 to 2b3e9db Compare October 23, 2024 14:09

dcoutts force-pushed the dcoutts/lockstep-io-sim-tweak-generation-2 branch from 2b3e9db to 0a130e3 Compare October 23, 2024 14:30

dcoutts added 3 commits October 23, 2024 15:32

Disable the DL.prop_example

c4ecec3

It's utility does not justify its running time.

Rename Examples test module to UnitTests

9c68700

To better reflect its content.

dcoutts force-pushed the dcoutts/lockstep-io-sim-tweak-generation-2 branch from 0a130e3 to 9c68700 Compare October 23, 2024 14:32

jorisdral approved these changes Oct 24, 2024

View reviewed changes

Ignore a couple hlint warnings

f279f5e

dcoutts enabled auto-merge October 24, 2024 12:41

dcoutts added this pull request to the merge queue Oct 24, 2024

Merged via the queue into main with commit 2e76991 Oct 24, 2024
24 checks passed

dcoutts deleted the dcoutts/lockstep-io-sim-tweak-generation-2 branch October 24, 2024 13:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

QC Lockstep tests: tweak test case generation #436

QC Lockstep tests: tweak test case generation #436

Uh oh!

dcoutts commented Oct 22, 2024 •

edited

Loading

Uh oh!

jorisdral left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dcoutts commented Oct 23, 2024

Uh oh!

dcoutts commented Oct 23, 2024

Uh oh!

jorisdral left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	# https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#setting-an-environment-variable
	- name: Set test timeout (Unix)
	if: ${{ startsWith(matrix.os, 'ubuntu') \|\| startsWith(matrix.os, 'macOS') }}
	run: \|
	echo "TASTY_TIMEOUT=5m" >> "$GITHUB_ENV"

	# https://github.com/actions/runner/issues/2281#issuecomment-1326748709
	- name: Set test timeout (Windows)
	if: ${{ startsWith(matrix.os, 'windows') }}
	run: \|
	echo "TASTY_TIMEOUT=5m" >> "$env:GITHUB_ENV"

QC Lockstep tests: tweak test case generation #436

QC Lockstep tests: tweak test case generation #436

Uh oh!

Conversation

dcoutts commented Oct 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jorisdral left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dcoutts commented Oct 23, 2024

Uh oh!

dcoutts commented Oct 23, 2024

Uh oh!

jorisdral left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

dcoutts commented Oct 22, 2024 •

edited

Loading

jorisdral left a comment •

edited

Loading