QC Lockstep tests: tweak test case generation #436
The purpose of the change is to allow the generation of actions to look up information from the model. Instead of action generation only being given a function to find model variables, it now gets a context from which one can both find model variables and look those variables up. This will allow filtering variables based on information from the model, such as whether tables are closed. This first patch just updates to the API changes without making any changes in functionality.
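A minimal sketch of the idea, using only `base`. Every name here (`Var`, `TableState`, `VarContext`, `mkContext`, `openTableVars`) is hypothetical and chosen for illustration; this is not the quickcheck-lockstep or lsm-tree API. It only shows why a context that can both enumerate variables and resolve them against the model makes filtering possible.

```haskell
-- Hypothetical model-side view of a table; not the lsm-tree types.
data TableState = Open | Closed
  deriving (Eq, Show)

-- Hypothetical variable: names the result of an earlier action.
newtype Var = Var Int
  deriving (Eq, Show)

-- A context that can both enumerate variables and resolve them
-- against the model. With only 'findVars', filtering on model
-- information would not be possible.
data VarContext = VarContext
  { findVars  :: [Var]
  , lookupVar :: Var -> Maybe TableState
  }

-- Build a context from an association list standing in for the model.
mkContext :: [(Var, TableState)] -> VarContext
mkContext env = VarContext
  { findVars  = map fst env
  , lookupVar = \v -> lookup v env
  }

-- Variables that action generation may refer to: only tables that
-- the model says are still open.
openTableVars :: VarContext -> [Var]
openTableVars ctx =
  [ v | v <- findVars ctx, lookupVar ctx v == Just Open ]
```

With a model containing `Var 0` (open), `Var 1` (closed) and `Var 2` (open), `openTableVars` keeps only `Var 0` and `Var 2`, so a generator built on top of it never refers to the closed table.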
In preparation for adding more unit tests.
8669753 to b95de4c
Nice!
Tests are timing out because CI currently only allows tests to run for around 5 minutes. These lines will have to be tweaked:
lsm-tree/.github/workflows/haskell.yml
Lines 123 to 133 in 3f5bda8
# https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#setting-an-environment-variable
- name: Set test timeout (Unix)
  if: ${{ startsWith(matrix.os, 'ubuntu') || startsWith(matrix.os, 'macOS') }}
  run: |
    echo "TASTY_TIMEOUT=5m" >> "$GITHUB_ENV"
# https://github.com/actions/runner/issues/2281#issuecomment-1326748709
- name: Set test timeout (Windows)
  if: ${{ startsWith(matrix.os, 'windows') }}
  run: |
    echo "TASTY_TIMEOUT=5m" >> "$env:GITHUB_ENV"
To make it rather more convenient to use for literal constant names. This is exposed in the API, not just for tests, but I think this is a helpful API addition.
Previously the QLS test spent half of its search space on making sure we
get coverage for using operations at two different table types. While it
is important to get this coverage, we do not need to divide our search
space in half to achieve it. That search space can be better spent on
longer sequences of actions on fewer tables that can have sharing and
thus interesting interactions.
So we move the {Key,Value,Blob}{1,2} types into the unit test module,
and do a simple unit test where we create, insert, snapshot and restore
tables of two different types.
And we also simplify the QLS test to use a single set of key,value,blob
types.
For snapshots we cover the cases of: DeleteMissingSnapshot, OpenMissingSnapshot, SnapshotTwice and OpenSnapshotWrongLabel. (These are all tags from the QLS test.) For tables and cursors we cover use after close and idempotent close.
The intention here is to cover the "boring" cases of using tables that are not open using unit tests, so that more of the QLS search space can be spent on interesting things that may take longer action sequences to find. So having added specific unit tests for snapshots, tables and cursors, we now adjust the generation of actions so that we only ever generate references to resources that are currently open (or equivalent for snapshots). This involves consulting the model when we generate actions and filtering out variables that do not correspond to available resources. We leave blobrefs unfiltered since they have some interesting logic to them, but these cover only a small portion of the search space. We also limit the number of open tables and open cursors to 5 at once. By having fewer tables, we have more operations per table, and thus more opportunity for interesting interleavings on duplicate tables.
Fewer:
* new tables
* opening of snapshots
* new cursors
* table closes
* snapshots

More:
* table duplicates

Note also that tables and cursors are limited to 5 each at any one time.
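One way to picture the reweighting and the cap of 5 is as a weighted list of action kinds that depends on how many tables are currently open. The sketch below uses only `base`; the `ActionKind` constructors and all of the weights are invented for illustration and are not the values used in this PR. Only the limit of 5 comes from the description above.

```haskell
-- Hypothetical action kinds; names are illustrative only.
data ActionKind
  = NewTable | OpenSnapshot | Duplicate | Close | Update | Lookup
  deriving (Eq, Show)

-- Cap taken from the description: at most 5 tables open at once.
maxOpenTables :: Int
maxOpenTables = 5

-- Weighted candidates given the number of currently open tables.
-- The weights are made up: they only illustrate "fewer new tables,
-- snapshot opens and closes; more duplicates".
weightedKinds :: Int -> [(Int, ActionKind)]
weightedKinds nOpen = concat
  [ [ (1,  NewTable)     | nOpen < maxOpenTables ]
  , [ (1,  OpenSnapshot) | nOpen < maxOpenTables ]
  , [ (3,  Duplicate)    | nOpen > 0, nOpen < maxOpenTables ]
  , [ (1,  Close)        | nOpen > 0 ]
  , [ (10, Update)       | nOpen > 0 ]
  , [ (10, Lookup)       | nOpen > 0 ]
  ]
```

A list of this shape could be mapped to generators and passed to QuickCheck's `frequency`. Once 5 tables are open, no action that would create another table is offered, so further actions necessarily land on existing tables.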
This is intended to allow us to increase the QC size for the QLS tests to get longer action sequences, without also getting individually larger actions (which makes things too slow).
15dd804 to 2b3e9db
In the first version of this PR the running times on my machine were:
But with a couple changes they're down to this:
Those changes are:
In particular, the latter was not justifying its long running time with any crucial coverage.
Oh and so hopefully this will mean we do not need to adjust the CI testsuite timeouts. 🤞
2b3e9db to 0a130e3
Use pre-existing test types created for this purpose with reasonable-looking Arbitrary instances. These are definitely larger. They may be too large for this kind of test. The test running times are longer. Also have to adjust the DL example to match the key type. In particular, the key type is now designed to create some collisions, so the size of a sequence of keys is not the same as the size of the map of the same keys.
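The point about collisions can be illustrated with a small sketch. `SmallKey` is a hypothetical stand-in for a key type with a deliberately small domain, not the test type used in this PR; `Data.Map.Strict` comes from the `containers` package.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical key type with a small domain, so duplicates are likely.
newtype SmallKey = SmallKey Int
  deriving (Eq, Ord, Show)

-- A sequence of five keys, two of which repeat earlier ones.
keys :: [SmallKey]
keys = map SmallKey [1, 2, 1, 3, 2]

-- Inserting the same keys into a map collapses the duplicates.
inserted :: Map.Map SmallKey String
inserted = Map.fromList [ (k, "v") | k <- keys ]
```

Here `length keys` is 5 while `Map.size inserted` is 3, so any expectation that ties the table size to the number of generated keys has to count distinct keys instead.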
Its utility does not justify its running time.
To better reflect its content.
0a130e3 to 9c68700
jorisdral
left a comment
LGTM! There is just the hlint failure to fix, but then we can merge
This is an attempt to get the QLS tests to get more coverage of the duplicate table feature, and more generally to have more interesting coverage.
In particular, we move several of the boring special cases out of the QLS test and into unit tests. This is for things like operating on closed table handles, or opening non-existent snapshots. These things of course must be covered, but they don't have interesting interactions, so they don't need to be covered in complex situations. This lets the limited length of action sequences be used for more interesting things.
See quickcheck-dynamic#85 for further details on why action sequences are so short.
We also change how actions are generated to have fewer tables, and thus more actions per table. We limit the number of tables at once to 5.
Initial results (from 28c4890):
and now we get:
So this is considerably improved, with 45% of runs now having non-trivial interleavings rather than 12%. And where previously we had only 2.3% of original tables with interleavings of length 8 or more, we now have over 20%.
One important caveat is that the change in the key and value types means the tests run longer. Summary of running times from my system: