sql: introduce the random cost fuzzer test #76093

jordanlewis · 2022-02-04T22:50:43Z

This is a new test that combines the optimizer's cost perturbation test
mode with SQLSmith statement generation. The test runs in a loop: it
creates a random statement, runs it normally, runs it again with
perturbed costs, and compares the results. If the results differ with
and without cost perturbation, there is likely a bug in either the
execution engine (one of the operators doing the wrong thing) or the
optimizer (one of the transformation rules doing the wrong thing).

Release note: None
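
For readers skimming the thread, the loop described above can be sketched roughly as follows. This is not the code in this PR: `execFn`, `sameRows`, and `checkStmt` are hypothetical names, the executor is a fake rather than a real database connection, and both the order-insensitive comparison and the choice to skip statements that fail in the unperturbed run are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// execFn is a hypothetical stand-in for "run this statement against the
// database". The perturbed flag selects whether cost perturbation is on.
type execFn func(stmt string, perturbed bool) ([]string, error)

// sameRows reports whether two result sets hold the same rows, ignoring
// order (assumption: a different plan may return rows in another order).
func sameRows(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

// checkStmt runs stmt without and then with cost perturbation and returns
// an error if the two runs disagree.
func checkStmt(stmt string, exec execFn) error {
	want, err := exec(stmt, false)
	if err != nil {
		// Assumption: a randomly generated statement that fails on its
		// own is uninteresting, so it is skipped.
		return nil
	}
	got, err := exec(stmt, true)
	if err != nil {
		return fmt.Errorf("perturbed run of %q failed: %w", stmt, err)
	}
	if !sameRows(want, got) {
		return fmt.Errorf("results differ for %q: %v vs %v", stmt, want, got)
	}
	return nil
}

func main() {
	// Fake executor: same rows in a different order when perturbed.
	exec := func(stmt string, perturbed bool) ([]string, error) {
		if perturbed {
			return []string{"2", "1"}, nil
		}
		return []string{"1", "2"}, nil
	}
	if err := checkStmt("SELECT a FROM t", exec); err != nil {
		fmt.Println("mismatch:", err)
		return
	}
	fmt.Println("results match")
}
```

In a real test the fake executor would be replaced by calls against a live cluster, and the statement would come from the random generator on each loop iteration.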

cockroach-teamcity · 2022-02-04T22:50:50Z

This change is

jordanlewis · 2022-02-04T22:52:32Z

This is still a draft for a couple of reasons.

Probably, this should be a roachtest that can run for longer, rather than a unit test.
It's frustrating trying to reproduce any issues found (and so far, I believe I've actually found some issues!). Reproducing a failure depends on replaying the mutations that ran before it, but I haven't yet generated a usable mutation log to send to a reproduction database. Once that's done, I imagine this tool should be fairly easy to use: run the test, feed the mutation log to a fresh database, and then run the statement normally and again with the random cost seed that's returned in the output.

rytaft

Very cool! Exciting to see this

Reviewed 9 of 10 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis)

pkg/sql/opt/xform/coster.go, line 450 at r2 (raw file):

	if perturbation != 0 || seed != 0 {
		c.rng, _ = randutil.NewPseudoRand()
		c.rng.Seed(seed)

Doesn't look like you're ever using the random generator. Did you want to use it here?

cockroach/pkg/sql/opt/xform/coster.go

Line 558 in 9de35f5

multiplier := 2*rand.Float64() - 1

pkg/sql/tests/cost_fuzzer_test.go, line 50 at r2 (raw file):

	// Initialize a smither that generates only INSERT, UPDATE, and DELETE
	// statements with the MutationsOnly option. Smither.GenerateTLP always

Doesn't look like you're using GenerateTLP

pkg/sql/tests/cost_fuzzer_test.go, line 84 at r2 (raw file):

		stmt := smither.Generate()
		//t.Logf("Running SQL %s", stmt)

debug code

jordanlewis

Responded to comments. I also added a SetComplexity method that sets the "complexity" (likelihood to recurse) on the Smither, to try to convince it to generate queries with more joins, and bumped the complexity for this test to 0.3.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)

pkg/sql/opt/xform/coster.go, line 450 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Doesn't look like you're ever using the random generator. Did you want to use it here?

cockroach/pkg/sql/opt/xform/coster.go

Line 558 in 9de35f5

multiplier := 2*rand.Float64() - 1

Done.

pkg/sql/tests/cost_fuzzer_test.go, line 50 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Doesn't look like you're using GenerateTLP

Done.

pkg/sql/tests/cost_fuzzer_test.go, line 84 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

debug code

Done.

jordanlewis · 2022-02-12T00:10:16Z

I've added the random rule disabler to this test as well. It turned up some interesting problems. I'm not totally sure whether it's a good idea to keep the rule disabler on, as it seems to cause a bunch of unrealistic problems (see #76455).

I added a few new rules to the "don't disable" list because they caused some of those unrealistic problems, but it seems like most of the rules can cause problems when disabled, so I'm not quite sure what to do there.

I also added a test clause that fails the test if the perturbed query returns an internal error, which turned up a bunch of stuff that motivated #76458. I think it would be good practice to keep hammering on the "no internal errors allowed unless they are really internal" rule to make sure that we can run fuzzers like this one and actually care when an XX000 internal error comes up.

Before this merges, I need to turn it into a roachtest. So consider the changes still in draft.

rytaft

Looking good!

It's true that a number of the normalization rules are required for correctness, and the only way I know of to figure out which ones is unfortunately trial and error. But I think it would still be valuable to include the rule disabler if we can.

Reviewed 7 of 7 files at r3, 3 of 3 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis)

pkg/sql/opt/xform/optimizer.go, line 102 at r4 (raw file):

	jb JoinOrderBuilder

	// rng is used to randomly disable transformation rules for test scenarios.

Isn't it also used to randomly perturb the cost?

michae2 · 2022-04-06T17:40:11Z

@jordanlewis are you still working on this? How would you feel if I tried to turn the cost fuzzing portion into a roachtest?

jordanlewis · 2022-04-06T17:43:57Z

I've been unable to find time to focus on this. I very much support you taking it over in whatever form you find useful!

jordanlewis requested review from a team as code owners February 4, 2022 22:50

jordanlewis marked this pull request as draft February 4, 2022 22:50

jordanlewis force-pushed the cost-fuzzer branch from dffacc1 to eeb6a80 Compare February 6, 2022 02:55

rytaft reviewed Feb 10, 2022

jordanlewis force-pushed the cost-fuzzer branch from eeb6a80 to 9ed7785 Compare February 11, 2022 21:05

jordanlewis commented Feb 11, 2022

jordanlewis mentioned this pull request Feb 11, 2022

sql: internal error: optimizer factory constructor call stack exceeded max (with disabled rules) #76455

Closed

jordanlewis added 2 commits February 11, 2022 19:04

tests: randomly disable xform rules in cost fuzzer test

fa593f7

Release note: None

jordanlewis force-pushed the cost-fuzzer branch from 9ed7785 to fa593f7 Compare February 12, 2022 00:06

rytaft reviewed Feb 14, 2022

michae2 mentioned this pull request Apr 14, 2022

roachtest: add costfuzz test #79973

Merged

jordanlewis closed this Jul 1, 2022

jordanlewis deleted the cost-fuzzer branch July 17, 2022 05:27
