Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generator reproducability with function generators #236

Closed
jmid opened this issue Apr 8, 2022 · 0 comments · Fixed by #240
Closed

Generator reproducability with function generators #236

jmid opened this issue Apr 8, 2022 · 0 comments · Fixed by #240

Comments

@jmid
Copy link
Collaborator

jmid commented Apr 8, 2022

One nice property of a QuickCheck setup is the ability to reproduce a failing test's input when providing the same seed.
We were just bitten by a situation where this doesn't hold under special circumstances that involve

  • a function generator
  • a non-deterministic property

When started from the same seed, a test sequence generating 10 triples may differ (note the difference in the last 4 triples):

(7,fun,50) (60,fun,9) (4,fun,8) (8,fun,5) (8,fun,7) (1,fun,7) (5,fun,0) (8,fun,1) (0,fun,1) (4,fun,6)
or
(7,fun,50) (60,fun,9) (4,fun,8) (8,fun,5) (8,fun,7) (1,fun,7) (4,fun,1) (2,fun,8) (3,fun,0) (0,fun,4) 

Here is a minimal repro:

open QCheck

(* [apply_n f x n] computes f(f(...f(x))) with n applications of f *)
let rec apply_n f x n =
  if n=0
  then x
  else apply_n f (f x) (pred n)

let t =
  Test.make ~count:10 ~name:"t1"
    (set_shrink Shrink.nil (triple small_int (fun1 Observable.int small_int) small_int))
    (fun (i,f,j) ->
       Printf.printf "(%i,fun,%i) %!" i j;
       if 1 = Float.to_int (Unix.time ()) mod 2
       then
         (ignore(apply_n (Fn.apply f) i j > 0); true)
       else
         (ignore(apply_n (Fn.apply f) i i > 0); true)
    )

let sleep_time = 0.175;;
QCheck_runner.run_tests ~rand:(Random.State.make [|1234|]) [t];;
Unix.sleepf sleep_time;;
QCheck_runner.run_tests ~rand:(Random.State.make [|1234|]) [t];;
Unix.sleepf sleep_time;;
QCheck_runner.run_tests ~rand:(Random.State.make [|1234|]) [t];;
Unix.sleepf sleep_time;;
QCheck_runner.run_tests ~rand:(Random.State.make [|1234|]) [t];;
Unix.sleepf sleep_time;;
QCheck_runner.run_tests ~rand:(Random.State.make [|1234|]) [t];;

This (admittedly ridiculous) test generates a function and two small_ints, then tests a vacuously true property, that happens to perform one function application or another, depending on Unix.time.
As another side-effect in prints the generated triple - for our viewing pleasure.

When I simply #use the file repeatedly from utop I'm able to observe changes in the generated triples, despite starting the test from fresh RNGs seeded identically:

val apply_n : ('a -> 'a) -> 'a -> int -> 'a = <fun>
val t : Test.t = QCheck2.Test.Test <abstr>
val sleep_time : float = 0.175
(7,fun,50) (60,fun,9) (4,fun,8) (8,fun,5) (8,fun,7) (1,fun,7) (5,fun,0) (8,fun,1) (0,fun,1) (4,fun,6) ================================================================================
success (ran 1 tests)
- : int = 0
- : unit = ()
(7,fun,50) (60,fun,9) (4,fun,8) (8,fun,5) (8,fun,7) (1,fun,7) (5,fun,0) (8,fun,1) (0,fun,1) (4,fun,6) ================================================================================
success (ran 1 tests)
- : int = 0
- : unit = ()
(7,fun,50) (60,fun,9) (4,fun,8) (8,fun,5) (8,fun,7) (1,fun,7) (5,fun,0) (8,fun,1) (0,fun,1) (4,fun,6) ================================================================================
success (ran 1 tests)
- : int = 0
- : unit = ()
(7,fun,50) (60,fun,9) (4,fun,8) (8,fun,5) (8,fun,7) (1,fun,7) (4,fun,1) (2,fun,8) (3,fun,0) (0,fun,4) ================================================================================
success (ran 1 tests)
- : int = 0
- : unit = ()
(7,fun,50) (60,fun,9) (4,fun,8) (8,fun,5) (8,fun,7) (1,fun,7) (4,fun,1) (2,fun,8) (3,fun,0) (0,fun,4) ================================================================================
success (ran 1 tests)

In retrospect, given how we generate functions (lazily inhabiting an underlying function table by generation, IIUC), this is probably not too surprising.

Nevertheless, by doing so, we lose a key property of QuickCheck, namely that the test value generation is independent of the usage.
For testing multicore programs, where behavior can be non-deterministic depending on scheduling, we depend and build on that property and would therefore like to see it restored.

I have an initial, minimal fix at https://github.com/jmid/qcheck/tree/splittable-qcheck1
but this makes the test fail_pred_strings in the test suite succeed (it was supposed to fail).
We probably want to use a splittable PRNG to restore independence.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant