Improve generated and random testing support #18
Added largescale report to address point 1. |
Ah! I came here to create this issue :P I'll come back to add details, ideas and a plan. OK, quick look at the readme. I think a new type of test makes sense (I'll confirm once I've looked at the implementation). |
Here's the current behaviour, to help highlight what needs improvement:
(defpackage #:randomized-testing
  (:documentation "Scratch file to explore randomized tests with parachute")
  (:use #:cl)
  (:import-from #:parachute
                #:define-test
                #:define-test+run
                #:is
                #:true
                #:false))
(in-package #:randomized-testing)
;; Define a dummy function to test
(defun buggy (x)
  "Like cl:identity, but returns the wrong value for 2."
  (if (= 2 x) 3 x))
;; Define a dummy PRNG (it just counts up from 1, so runs are deterministic)
(defun make-fake-prng ()
  (let ((x 0))
    (lambda ()
      (incf x))))
;; This is what a randomized test would look like _with the current implementation_
(define-test+run buggy
  (let ((prng (make-fake-prng)))
    (loop
      :repeat 10 ; we run the randomized test 10 times
      :for x = (funcall prng)
      :do (is = x (buggy x)))))
;; Here's what the output looks like:
#|
? RANDOMIZED-TESTING::BUGGY
0.000 ✔ (is = x (buggy x))
0.000 ✘ (is = x (buggy x))
0.000 ✔ (is = x (buggy x))
0.000 ✔ (is = x (buggy x))
0.000 ✔ (is = x (buggy x)) ; These are not useful
0.000 ✔ (is = x (buggy x)) ; They are not descriptive either
0.000 ✔ (is = x (buggy x))
0.000 ✔ (is = x (buggy x))
0.000 ✔ (is = x (buggy x))
0.000 ✔ (is = x (buggy x))
0.004 ✘ RANDOMIZED-TESTING::BUGGY
;; Summary:
Passed: 9 ; Should we count them differently?
Failed: 1
Skipped: 0
;; Failures:
1/ 10 tests failed in RANDOMIZED-TESTING::BUGGY ; What about this count?
The test form (buggy x) ; For more complex tests, this would not be
evaluated to 3 ; enough to know which inputs failed the test
when 2
was expected to be equal under =.
#<PARACHUTE:PLAIN 11, FAILED results>
((IS = X (BUGGY X)))
|#
The user can improve the situation a bit by supplying a description to the test result:
(define-test+run buggy
  (let ((prng (make-fake-prng)))
    (loop
      :repeat 10 ; we run the randomized test 10 times
      :for x = (funcall prng)
-     :do (is = x (buggy x)))))
+     :do (is = x (buggy x) "Failed with input ~a" x))))
Then the output would be slightly improved:
[...]
;; Summary:
Passed: 9
Failed: 1
Skipped: 0
;; Failures:
1/ 10 tests failed in RANDOMIZED-TESTING::BUGGY
The test form (buggy x)
evaluated to 3
when 2
was expected to be equal under =.
+Failed with input 2
[...]
If we expand the (is ...) form:
(define-test+run buggy
  (let ((prng (make-fake-prng)))
    (loop
      :repeat 10 ; we run the randomized test 10 times
      :for x = (funcall prng)
      :do (eval-in-context *context*
            (make-instance 'comparison-result
              :expression '(is = x (buggy x))
              :value-form '(buggy x)
              :body (lambda () (buggy x))
              :expected x
              :comparison '=)))))
And if we tweak it just a little bit:
(define-test+run buggy
  (let ((prng (make-fake-prng)))
    (loop
      :repeat 10 ; we run the randomized test 10 times
      :for x = (funcall prng)
      :do (eval-in-context *context*
            (make-instance 'comparison-result
-             :expression '(is = x (buggy x))
-             :value-form '(buggy x)
+             :expression `(is = x (buggy ,x))
+             :value-form `(buggy ,x)
              :body (lambda () (buggy x))
              :expected x
              :comparison '=)))))
We get a much more useful output:
Still too verbose for now. |
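For anyone skimming: the whole effect of that tweak comes from the difference between QUOTE and backquote. A minimal sketch in plain Common Lisp (no parachute involved; the function name is made up for illustration):

```lisp
;; Hypothetical helper, only here to show QUOTE vs. backquote side by side.
(defun quoted-vs-backquoted (x)
  "Return the same test form built with QUOTE and with backquote."
  (list '(is = x (buggy x))     ; X stays a symbol in the reported form
        `(is = x (buggy ,x))))  ; ,X splices in the runtime value

(quoted-vs-backquoted 2)
;; => ((IS = X (BUGGY X)) (IS = X (BUGGY 2)))
```

The second form is what makes the failing input visible in the report, at the cost of building the result object by hand.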
In This leads to much better reports, but is obviously quite unwieldy. |
My current thought is that we need some way to communicate variable parts in a test form to the test macro, so that it knows which parts to take from the runtime environment, and which parts to take from the expression. That would at least solve the reports issue. How exactly that could be worked out, I'm not sure. It is very likely though that we'll need separate analogues of |
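One possible shape for "telling the macro which parts are variable", as a rough sketch only. The macro name and its interface are assumptions for illustration and are not part of parachute:

```lisp
;; Hypothetical macro: the caller lists the variables whose runtime values
;; should replace their names in the reported form.
(defmacro form-with-values ((&rest variables) form)
  "Return FORM as data, with each symbol in VARIABLES replaced by its current value."
  `(sublis (list ,@(mapcar (lambda (variable) `(cons ',variable ,variable))
                           variables))
           ',form))

(let ((x 2))
  (form-with-values (x) (is = x (buggy x))))
;; => (IS = 2 (BUGGY 2))
```

This is deliberately naive: SUBLIS replaces every occurrence of the symbol, including ones in binding positions, so a real version would probably need a code walker or explicit markers in the form.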
Another thought: for instance, I would like to randomly generate trees, then perform a random sequence of interactions on the tree, checking an invariant at each step. Ideally the test report on failure would say something like:
Meaning: it captures the i/o on each operation, and then stores them in the failure report so we know how to replicate the results. In cases where global state is modified we might also need to instruct it to capture that global state, too. |
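A rough sketch of that capture idea, under the assumption that operations are pure functions from an old state to a new state (all names here are hypothetical, and nothing is tied to parachute's result classes):

```lisp
(defun run-logged-operations (operations invariant state)
  "Apply each (NAME FUNCTION . ARGS) entry of OPERATIONS to STATE in order.
Return T and the full trace if INVARIANT held after every step; otherwise
return NIL and the trace up to and including the failing step."
  (let ((trace '()))
    (dolist (operation operations (values t (nreverse trace)))
      (destructuring-bind (name function &rest args) operation
        (setf state (apply function state args))
        ;; Record the operation, its inputs and the state it produced.
        (push (list name args :-> state) trace)
        (unless (funcall invariant state)
          (return (values nil (nreverse trace))))))))

;; Toy example: a list that must stay sorted, with one operation that breaks it.
(defun sorted-insert (list item)
  (merge 'list (copy-list list) (list item) #'<))

(defun push-front (list item)
  (cons item list))

(defun sortedp (list)
  (every #'<= list (rest list)))

(run-logged-operations (list (list 'insert #'sorted-insert 3)
                             (list 'insert #'sorted-insert 1)
                             (list 'push-front #'push-front 5))
                       #'sortedp
                       '())
;; => NIL
;;    ((INSERT (3) :-> (3)) (INSERT (1) :-> (1 3)) (PUSH-FRONT (5) :-> (5 1 3)))
```

The trace is enough to replay the failure by hand. Operations that mutate the state in place, or that touch global state, would need the log entry to hold a copy instead.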
Generalising this: it seems useful to be able to attach some contextual operation to another test form. Maybe something like |
To avoid macroising it too much, how about a |
Again, I'll come back later to add details, but I was able to hack something up (as in "it works, but the code is very horrible and I didn't think about all the implications yet"):
(define-test+run buggy
  (randomize ((x (gen-integer 0 3)))
    (is = x (identity x))
    (is = x (buggy x))))
And it outputs something like this:
Note to myself: also print the failed result's expression. |
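For readers who want to picture what such a randomize form could expand into, here is a standalone sketch. The names randomize and gen-integer come from the snippet above, but this implementation is a guess written for illustration; it is not the code from the breeze scratch file, and *randomize-runs* is an invented knob:

```lisp
(defvar *randomize-runs* 100
  "Hypothetical setting: how many times RANDOMIZE runs its body.")

(defun gen-integer (min max)
  "Return a generator: a function of a RANDOM-STATE yielding an integer in [MIN, MAX]."
  (lambda (state)
    (+ min (random (1+ (- max min)) state))))

(defmacro randomize ((&rest bindings) &body body)
  "Run BODY repeatedly, binding each (VARIABLE GENERATOR-FORM) to a fresh value."
  (let ((state (gensym "STATE"))
        (generators (mapcar (lambda (binding)
                              (declare (ignore binding))
                              (gensym "GENERATOR"))
                            bindings)))
    `(let ((,state (make-random-state t))
           ;; Evaluate each generator form once, outside the loop.
           ,@(mapcar (lambda (generator binding)
                       (list generator (second binding)))
                     generators bindings))
       (loop repeat *randomize-runs*
             do (let ,(mapcar (lambda (generator binding)
                                (list (first binding)
                                      (list 'funcall generator state)))
                              generators bindings)
                  ,@body)))))

;; Usage without parachute: collect the inputs for which the property fails.
(defun buggy (x)
  (if (= 2 x) 3 x))

(defun failing-inputs ()
  (let ((failures '()))
    (randomize ((x (gen-integer 0 3)))
      (unless (= x (buggy x))
        (pushnew x failures)))
    failures))
```

Inside a real test the body would contain (is ...) forms instead of the UNLESS, which is what makes it easy to convert existing tests.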
Here are the things I've tried: breeze/scratch-files/randomized-testing.lisp It's low quality, because I just wanted to try anything and because I needed to learn a bit more about how parachute works. Lastly, the code is in my project "breeze" because the whole point of this project is to try stuff. So, I would like some feedback before I try making something cleaner. Oh! I also thought about the DSL for generating more complex data (like trees), and I think it might not be that hard. Using a function that looks like an eval, but that returns a function that takes a PRNG as input... Something along those lines |
Sorry, I'm having trouble gleaning your thinking from the lisp file, so I'm not entirely sure what, concretely, you're proposing. Let me try to outline my current understanding, and please correct me if I'm wrong or missing something: We want to be able to do three things:
I'm not talking about data at all here, as 3. seems like a much more general and powerful concept than explicitly generating data structures. If we can support randomising operations, we can use that to generate the random data, too, and also test protocols that don't primarily rely on data structures, or rely on data structures that are highly private and annoying to construct via anything but these operations we want to test, anyhow. I think 3. could also be left as a separate issue, as with 1. we can also leave it up to the user to decide how to "generate" the operations, as long as the system can take care of capturing the context we care about to analyse the failure. Finally, 2. should be rather trivial and more a question of taste. I can see either approach working, the latter probably just with some kinda flag to mark results as "unimportant", which are then skipped in the report. |
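To make the "skip unimportant results in the report" idea concrete, a tiny sketch of the aggregation step, assuming each generated case has already been reduced to an (input . passed-p) pair. That representation is an assumption for illustration, not parachute's result model:

```lisp
(defun summarize-results (results)
  "RESULTS is a list of (INPUT . PASSED-P) pairs, one per generated case.
Return two values: the number of passing cases and the list of failing inputs."
  (loop for (input . passed-p) in results
        count passed-p into passes
        unless passed-p collect input into failures
        finally (return (values passes failures))))

(summarize-results '((1 . t) (2 . nil) (3 . t)))
;; => 2
;;    (2)
```

A report built this way prints one line for the passing cases and one entry per failing input, instead of ten identical check marks.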
That's totally fine and I was pretty much expecting it. We're on the same page about the 3 points. But I think there's one thing on which we might not be exactly on the same page: I'm trying to implement 2 kinds of randomized tests, PBT and MBT (more on that in a bit), whereas you seem (to me) to be more focused on the MBT kind. I'm also trying to make sure that everything is reproducible by using PRNGs that can be seeded, and that it will be possible to add shrinking later on.
Some background info
PBT was popularized by a Haskell library named QuickCheck, and now there are "quick/fast/rapid" test libraries everywhere xD (see the Wikipedia page). I'm trying to use the same vocabulary where it makes sense.
What I tried so far
Reporting and PBT
I tried two things to help with reporting:
Furthermore, the macro I use to rebind the
(randomize ((x (gen-integer 0 3)))
  (is = x (identity x)))
It expands into a loop that generates the data for the tests. Implementation aside, I think I quite like this syntax. What about you? Almost forgot the thing I like the most about it: it's easy to modify existing tests to use generated data.
MBT
For model-based testing, I did a very quick and very, very dirty prototype.
Finally, there's a loop that
My questions so far
I have more, but let's start with that.
Roadmap
Here's my high-level plan:
I swear I didn't mean to write a novel, but I just thought about something (that is super-evident): I could split the
Both parts would be bundled together in one
P.S. I didn't re-read all of this... Knowing myself, it's probably full of typos and weird syntax. I'll try to re-read it later when I have slept 😅 Bonne nuit! |
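On the reproducibility point from this comment: standard Common Lisp has no portable way to seed RANDOM from an integer, but a RANDOM-STATE can be copied with MAKE-RANDOM-STATE and replayed. A minimal sketch (the function name is hypothetical):

```lisp
(defun replayable-samples (n limit &optional (state (make-random-state t)))
  "Draw N integers below LIMIT using STATE.
Return the samples and a copy of STATE taken before any value was drawn."
  (let ((saved (make-random-state state))) ; copy first, then consume STATE
    (values (loop repeat n collect (random limit state))
            saved)))

;; Replaying with the saved copy reproduces the same inputs.
(multiple-value-bind (samples saved) (replayable-samples 5 100)
  (equal samples (replayable-samples 5 100 saved)))
;; => T
```

A failure report could print or store the saved state so that a failing run can be repeated exactly. Note that passing a saved state consumes it, so a second replay should use the copy returned by the first.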
More food for thought: https://lobste.rs/s/xk9mhh/run_property_tests_until_coverage_stops |
Parachute should better support tests that are generated at runtime (possibly randomly). In particular, this would involve: