Sub-optimal QCheck2 function shrinkers #163

jmid · 2021-08-18T21:03:46Z

Sub-optimal function shrinkers

A number of the function tests illustrate a difference in the function shrinker's output (terminal cuts off at width 160):

--- Failure -----------------------------------------------------------------	--- Failure -----------------------------------------------------------------

Test fail_pred_map_commute failed (127 shrink steps):			      |	Test fail_pred_map_commute failed (16 shrink steps):

([3], {_ -> 0}, {3 -> false; _ -> true})				      |	([2], {_ -> 0}, {1 -> false; 2 -> true; _ -> false})

...

--- Failure -----------------------------------------------------------------	--- Failure -----------------------------------------------------------------

Test fold_left fold_right failed (25 shrink steps):			      |	Test fold_left fold_right failed (22 shrink steps):

(0, [1], {(1, 0) -> 1; _ -> 0})						      |	(0, [1], {(1, 0) -> 1; (8, 0) -> 0; (8, 8) -> 0; (8, 93) -> 0; (7, 7) -> 0; (

...

--- Failure -----------------------------------------------------------------	--- Failure -----------------------------------------------------------------

Test fold_left fold_right uncurried failed (111 shrink steps):		      |	Test fold_left fold_right uncurried failed (325 shrink steps):

({(5, 7) -> 0; _ -> 7}, 0, [5; 0])					      |	({(23, 62) -> 0; (9, 42) -> 0; (8, 61) -> 0; (8, 5) -> 0; (30, 5) -> 0; (9, 6

--- Failure -----------------------------------------------------------------	--- Failure -----------------------------------------------------------------

Test fold_left fold_right uncurried fun last failed (26 shrink steps):	      |	Test fold_left fold_right uncurried fun last failed (25 shrink steps):

(0, [1], {(0, 1) -> 1; _ -> 0})						      |	(0, [1], {(0, 2) -> 0; (8, 80) -> 0; (93, 9) -> 0; (7, 24) -> 0; (8, 0) -> 0;

The steps spent vary between the two approaches. The last two are perhaps most saying.
First off, here's the counterexamples reported by QCheck2:

--- Failure --------------------------------------------------------------------

Test fold_left fold_right uncurried failed (325 shrink steps):

({(23, 62) -> 0; (9, 42) -> 0; (8, 61) -> 0; (8, 5) -> 0; (30, 5) -> 0; (9, 6) -> 0; (76, 6) -> 0; (19, 31) -> 0; (7, 62) -> 0; (0, 7) -> 1; (7, 1) -> 0; (78, 4) -> 0; (8, 2) -> 0; (78, 0) -> 0; (3, 47) -> 0; (4, 8) -> 0; (98, 9) -> 0; (1, 38) -> 0; (0, 26) -> 0; (1, 7) -> 0; (86, 3) -> 0; (9, 37) -> 0; (8, 1) -> 0; (79, 9) -> 0; (3, 5) -> 0; (56, 8) -> 0; (2, 5) -> 0; (8, 8) -> 0; (56, 67) -> 0; (5, 60) -> 0; (2, 31) -> 0; (61, 6) -> 0; (12, 5) -> 0; (76, 2) -> 0; (78, 8) -> 0; (1, 1) -> 0; (8, 9) -> 0; (7, 8) -> 0; (2, 9) -> 0; (29, 7) -> 0; (5, 8) -> 0; (28, 6) -> 0; (1, 4) -> 0; (9, 79) -> 0; (0, 1) -> 0; (1, 41) -> 0; (82, 98) -> 0; (6, 79) -> 0; (7, 6) -> 0; (4, 3) -> 0; (8, 12) -> 0; (5, 1) -> 0; (39, 1) -> 0; (3, 6) -> 0; (1, 2) -> 0; (76, 31) -> 0; (4, 1) -> 0; (6, 5) -> 0; (0, 8) -> 0; (8, 7) -> 0; (2, 6) -> 0; (52, 5) -> 0; (8, 47) -> 0; (5, 3) -> 0; (7, 9) -> 0; (13, 13) -> 0; (0, 87) -> 0; (82, 0) -> 0; (34, 8) -> 0; (1, 14) -> 0; (2, 71) -> 0; (52, 4) -> 0; (1, 3) -> 0; (85, 6) -> 0; (8, 19) -> 0; (3, 13) -> 0; (69, 1) -> 0; (5, 62) -> 0; (0, 15) -> 0; (34, 0) -> 0; (9, 4) -> 0; (0, 6) -> 0; (1, 8) -> 0; (86, 6) -> 0; (4, 5) -> 0; (3, 1) -> 0; (57, 2) -> 0; (3, 3) -> 0; (4, 0) -> 0; (30, 6) -> 0; (5, 34) -> 0; (0, 4) -> 0; (2, 3) -> 0; (5, 6) -> 0; (5, 7) -> 0; (5, 0) -> 0; (4, 4) -> 0; (7, 5) -> 0; (78, 2) -> 0; (9, 8) -> 0; (7, 70) -> 0; (35, 1) -> 0; (64, 7) -> 0; (60, 0) -> 0; (1, 9) -> 0; _ -> 0}, 0, [0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 0; 7])

--- Failure --------------------------------------------------------------------

Test fold_left fold_right uncurried fun last failed (25 shrink steps):

(0, [1], {(0, 2) -> 0; (8, 80) -> 0; (93, 9) -> 0; (7, 24) -> 0; (8, 0) -> 0; (9, 7) -> 0; (0, 24) -> 0; (0, 7) -> 0; (7, 1) -> 0; (8, 9) -> 0; (24, 0) -> 0; (5, 8) -> 0; (1, 0) -> 1; (4, 8) -> 0; (7, 0) -> 0; (5, 7) -> 0; (8, 4) -> 0; (24, 5) -> 0; (0, 1) -> 0; (2, 8) -> 0; (9, 1) -> 0; (8, 8) -> 0; _ -> 0})

It is clear that generating functions last in a tuple rather than first (a hard-learned lesson from issue #8) is still a clear win for integrated shrinking. Still, there is quite a few more bindings reported in the above counterexample compared to QCheck.
Looking at the function generator and shrinker I can see a list of bindings is reduced using Tree.applicative_take which takes at 0-len bindings:

    let shrinks : (k, v) t Tree.t Seq.t = fun () ->
      (* This only gets evaluated *after* the test was run for [tbl], meaning it is correctly
         populated with bindings recorded during the test already *)
      let current_bindings : (k * v Tree.t) list = List.rev !(root.p_tree_bindings_rev) in
      let take_at_most_tree : int Tree.t = Tree.make_primitive (Shrink.int_towards 0) (List.length current_bindings) in
      let current_tree_bindings : (k * v) Tree.t list = List.map (fun (k, tree) -> Tree.map (fun v -> (k, v)) tree) current_bindings in
      let shrunk_bindings_tree : (k * v) list Tree.t = Tree.bind take_at_most_tree (fun take_at_most -> Tree.applicative_take take_at_most current_tree_bindings) in
      (* During shrinking, we don't want to record/add bindings, so [~extend:false]. *)
      let shrunk_poly_tbl_tree : (k, v) t Tree.t = Tree.map (fun bindings -> List.to_seq bindings |> T.of_seq |> make ~extend:false) shrunk_bindings_tree in
      (* [shrunk_poly_tbl_tree] is a bit misleading: its root *should* be the same as [root] but because of the required laziness
         induced by the mutation of bindings, we don't use it, only graft its children to the original [root]. *)
      Tree.children shrunk_poly_tbl_tree ()
    in
    Tree.Tree (root, shrinks)

This looks like a quick algorithmic list shrinker to me - also with opportunities for improvements.

It puzzled me that the take_at_most wasn't reducing the above list of bindings further. After all, the outlier binding (1, 0) -> 1 is in the middle!

I therefore added a shrink log of successful shrink attempts by replacing l.355 of QCheck2_expect_test.ml with:

  QCheck_base_runner.run_tests ~colors:false ~debug_shrink:(Some (open_out "funshrinklog.txt")) ~debug_shrink_list:["fold_left fold_right uncurried fun last"] ([

After rerunning the tests, this revealed the following in _build/default/test/core/funshrinklog.txt:


~~~ Shrink ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Test fold_left fold_right uncurried fun last sucessfully shrunk counter example (step 0) to:

(5, [7; 8; 9; 24], {(8, 80) -> 0; (7, 24) -> 7; (8, 9) -> 7; (7, 0) -> 3; (5, 7) -> 8; (24, 5) -> 1; (9, 1) -> 80; (8, 8) -> 8; _ -> 25})

~~~ Shrink ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Test fold_left fold_right uncurried fun last sucessfully shrunk counter example (step 1) to:

(0, [7; 8; 9; 24], {(8, 80) -> 0; (93, 9) -> 0; (7, 24) -> 7; (9, 7) -> 4; (0, 24) -> 1; (0, 7) -> 5; (7, 1) -> 8; (8, 9) -> 7; (24, 0) -> 7; (5, 8) -> 93; (7, 0) -> 3; (5, 7) -> 8; (8, 4) -> 1; (24, 5) -> 1; (9, 1) -> 80; (8, 8) -> 8; _ -> 25})

~~~ Shrink ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Test fold_left fold_right uncurried fun last sucessfully shrunk counter example (step 2) to:

(0, [2; 8], {(0, 2) -> 4; (8, 80) -> 0; (93, 9) -> 0; (7, 24) -> 7; (8, 0) -> 8; (9, 7) -> 4; (0, 24) -> 1; (0, 7) -> 5; (7, 1) -> 8; (8, 9) -> 7; (24, 0) -> 7; (5, 8) -> 93; (4, 8) -> 18; (7, 0) -> 3; (5, 7) -> 8; (8, 4) -> 1; (24, 5) -> 1; (2, 8) -> 1; (9, 1) -> 80; (8, 8) -> 8; _ -> 25})

~~~ Shrink ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Test fold_left fold_right uncurried fun last sucessfully shrunk counter example (step 3) to:

(0, [1], {(0, 2) -> 4; (8, 80) -> 0; (93, 9) -> 0; (7, 24) -> 7; (8, 0) -> 8; (9, 7) -> 4; (0, 24) -> 1; (0, 7) -> 5; (7, 1) -> 8; (8, 9) -> 7; (24, 0) -> 7; (5, 8) -> 93; (1, 0) -> 6; (4, 8) -> 18; (7, 0) -> 3; (5, 7) -> 8; (8, 4) -> 1; (24, 5) -> 1; (0, 1) -> 0; (2, 8) -> 1; (9, 1) -> 80; (8, 8) -> 8; _ -> 25})

~~~ Shrink ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Test fold_left fold_right uncurried fun last sucessfully shrunk counter example (step 4) to:

(0, [1], {(0, 2) -> 4; (8, 80) -> 0; (93, 9) -> 0; (7, 24) -> 7; (8, 0) -> 8; (9, 7) -> 4; (0, 24) -> 1; (0, 7) -> 5; (7, 1) -> 8; (8, 9) -> 7; (24, 0) -> 7; (5, 8) -> 93; (1, 0) -> 6; (4, 8) -> 18; (7, 0) -> 3; (5, 7) -> 8; (0, 0) -> 3; (8, 4) -> 1; (24, 5) -> 1; (0, 1) -> 0; (2, 8) -> 1; (9, 1) -> 80; (8, 8) -> 8; _ -> 0})

[...]

This looks fishy to me: while the first elements of the tuple are reduced, the last one (the function table) actually gets longer in each attempt!
I think this is actually due to how Gen.triple is currently defined (and thus shrinks). I'll therefore add some tests for it and report separately.

Originally posted by @jmid in #153 (comment)

The text was updated successfully, but these errors were encountered:

jmid mentioned this issue Sep 7, 2021

Factor shrink loggers #172

Open

jmid mentioned this issue Mar 18, 2022

Disable QCheck2 shrinker with a parameter #232

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sub-optimal QCheck2 function shrinkers #163

Sub-optimal QCheck2 function shrinkers #163

jmid commented Aug 18, 2021

Sub-optimal QCheck2 function shrinkers #163

Sub-optimal QCheck2 function shrinkers #163

Comments

jmid commented Aug 18, 2021