Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: introduce the random cost fuzzer test #76093

Closed
wants to merge 2 commits into from

Conversation

jordanlewis
Copy link
Member

This is a new test that leverages the optimizer cost perturbation test
mode and SQLSmith statemente generation. The test runs in a loop, by
creating a random statement, running it as normal, then running it with
perturbed costs, and comparing the results. If the results are not
identical with and without cost perturbation, there should be a bug
somewhere in either the execution engine (one of the operators doing the
wrong thing) or the optimizer (one of the transformation rules doing the
wrong thing).

Release note: None

@jordanlewis jordanlewis requested review from a team as code owners February 4, 2022 22:50
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@jordanlewis jordanlewis marked this pull request as draft February 4, 2022 22:50
@jordanlewis
Copy link
Member Author

This is still a draft for a couple of reasons.

  1. Probably, this should be a roachtest that can run for longer, rather than a unit test.
  2. It's frustrating trying to reproduce any issues found (and so far, I believe I've actually found some issues!). The reason is that the mutations are crucial to run, but I haven't yet generated a usable mutation log to send to a reproduction database. Once that's done, I imagine this tool should be fairly easy to use: run the test, input the mutation log to a fresh database, and then run the statement as normal and again with the random cost seed that's returned in the output.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool! Exciting to see this

Reviewed 9 of 10 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis)


pkg/sql/opt/xform/coster.go, line 450 at r2 (raw file):

	if perturbation != 0 || seed != 0 {
		c.rng, _ = randutil.NewPseudoRand()
		c.rng.Seed(seed)

Doesn't look like you're ever using the random generator. Did you want to use it here?

multiplier := 2*rand.Float64() - 1


pkg/sql/tests/cost_fuzzer_test.go, line 50 at r2 (raw file):

	// Initialize a smither that generates only INSERT, UPDATE, and DELETE
	// statements with the MutationsOnly option. Smither.GenerateTLP always

Doesn't look like you're using GenerateTLP


pkg/sql/tests/cost_fuzzer_test.go, line 84 at r2 (raw file):

		stmt := smither.Generate()
		//t.Logf("Running SQL %s", stmt)

debug code

Copy link
Member Author

@jordanlewis jordanlewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Responded to comments. I also added a SetComplexity method that allows setting the "complexity" (likelihood to recurse) to the Smither to try to convince it to generate queries with more joins, and bumped the complexity for this test to .3.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)


pkg/sql/opt/xform/coster.go, line 450 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Doesn't look like you're ever using the random generator. Did you want to use it here?

multiplier := 2*rand.Float64() - 1

Done.


pkg/sql/tests/cost_fuzzer_test.go, line 50 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Doesn't look like you're using GenerateTLP

Done.


pkg/sql/tests/cost_fuzzer_test.go, line 84 at r2 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

debug code

Done.

This is a new test that leverages the optimizer cost perturbation test
mode and SQLSmith statemente generation. The test runs in a loop, by
creating a random statement, running it as normal, then running it with
perturbed costs, and comparing the results. If the results are not
identical with and without cost perturbation, there should be a bug
somewhere in either the execution engine (one of the operators doing the
wrong thing) or the optimizer (one of the transformation rules doing the
wrong thing).

Release note: None
@jordanlewis
Copy link
Member Author

I've added the random rule disabler to this test as well. It turned up some interesting problems. I'm not totally sure whether it's a good idea to keep the rule disabler on, as it seems to cause a bunch of unrealistic problems (see #76455).

I added a few new rules to the "don't disable" list because they caused some of those unrealistic problems, but it seems like most of the rules can cause problems when disabled, so I'm not quite sure what to do there.

I also added a test clause that fails the test if the perturbed query returns an internal error, which turned up a bunch of stuff that motivated #76458. I think it would be good practice to keep hammering on the "no internal errors allowed unless they are really internal" rule to make sure that we can run fuzzers like this one and actually care when an XX000 internal error comes up.

Before this merges, I need to turn it into a roachtest. So consider the changes still in draft.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good!

It's true that a number of the normalization rules are required for correctness, and the only way I know of to figure out which ones is unfortunately trial and error. But I think it would still be valuable to include the rule disabler if we can.

Reviewed 7 of 7 files at r3, 3 of 3 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jordanlewis)


pkg/sql/opt/xform/optimizer.go, line 102 at r4 (raw file):

	jb JoinOrderBuilder

	// rng is used to randomly disable transformation rules for test scenarios.

Isn't it also used to randomly perturb the cost?

@michae2
Copy link
Collaborator

michae2 commented Apr 6, 2022

@jordanlewis are you still working on this? How would you feel if I tried to turn the cost fuzzing portion into a roachtest?

@jordanlewis
Copy link
Member Author

I've been unable to find time to focus on this. I very much support you taking it over in whatever form you find useful!

@jordanlewis jordanlewis closed this Jul 1, 2022
@jordanlewis jordanlewis deleted the cost-fuzzer branch July 17, 2022 05:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants