release-19.2: opt: adjust cost of lookup join on non-key columns #43059

RaduBerinde · 2019-12-09T20:23:11Z

I am a little unsure of backporting this. It is definitely the right thing to do (given the change in lookup join execution in 19.2), but it could cause regressions due to plan changes. Would love to know what @andy-kimball and @jordanlewis think.

Backport 3/3 commits from #43003.

/cc @cockroachdb/release

opt: add cols-are-key flag in the lookup join private

We currently determine if the lookup columns of a join form a key in
the execbuilder. This change moves this logic in the code that creates
lookup joins, which will allow costing to take this flag into account.

Release note: None

opt: allow lax key for lookup join cols-are-key

If the lookup join columns form a lax key, we can consider that a key
because NULLs don't equal anything and are just discarded.

Release note: None

opt: adjust cost of lookup join on non-key columns

In 19.2, there was a change in lookup join execution: unless we know
that each lookup returns at most one row (i.e. the lookup columns are
a key), we now limit the number of results in each batch to avoid OOM
conditions. This disables parallelization of spans going to different
ranges, resulting in slower performance. I ran some experiments on a 4
node cluster which showed a ~5x degradation when the lookups are
random; repro details here.

This change attempts to adjust the optimizer costing to account for
this degradation, by multiplying the cost of each lookup by 5 in this
case.

Four TPCH queries change. I ran them before and after and these are
the results (median time after a couple of runs on a 4 node cluster):

       before  after
  Q2   1.52s   2.16s
  Q8   40.42s  37.79s
  Q9   777s    461s
  Q18  9.16    11.80

The improvement on Q9 (by far the slowest query) is significant. I
experimented with smaller factors to see if I could get only Q8 and Q9
to change but that wasn't possible.

Release note (performance improvement): Adjusted the optimizer's cost
of lookup join when the lookup columns aren't a key in the table. This
will cause some queries to switch to using a hash or merge join
instead of a lookup join, improving performance in most cases.

We currently determine if the lookup columns of a join form a key in the execbuilder. This change moves this logic in the code that creates lookup joins, which will allow costing to take this flag into account. Release note: None

If the lookup join columns form a lax key, we can consider that a key because NULLs don't equal anything and are just discarded. Release note: None

In 19.2, there was a change in lookup join execution: unless we know that each lookup returns at most one row (i.e. the lookup columns are a key), we now limit the number of results in each batch to avoid OOM conditions. This disables parallelization of spans going to different ranges, resulting in slower performance. I ran some experiments on a 4 node cluster which showed a ~5x degradation when the lookups are random; repro details [here](https://gist.github.com/RaduBerinde/9228dd1feea6d39ce5c80ee6a6485079). This change attempts to adjust the optimizer costing to account for this degradation, by multiplying the cost of each lookup by 5 in this case. Four TPCH queries change. I ran them before and after and these are the results (median time after a couple of runs on a 4 node cluster): before after Q2 1.52s 2.16s Q8 40.42s 37.79s Q9 777s 461s Q18 9.16 11.80 The improvement on Q9 (by far the slowest query) is significant. I experimented with smaller factors to see if I could get only Q8 and Q9 to change but that wasn't possible. Release note (performance improvement): Adjusted the optimizer's cost of lookup join when the lookup columns aren't a key in the table. This will cause some queries to switch to using a hash or merge join instead of a lookup join, improving performance in most cases.

cockroach-teamcity · 2019-12-09T20:23:24Z

This change is

rytaft

Reviewed 37 of 37 files at r1, 4 of 4 files at r2, 17 of 17 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained

jordanlewis · 2019-12-12T21:52:22Z

In which cases could the chosen plan regress? Could you give an example?

RaduBerinde · 2019-12-12T21:59:06Z

It is a very broad question.. whenever the cost of the lookup join was close enough to the next best method. Depending on how accurate our stats are, cluster configuration, distribution of lookup keys etc, the other plan might or might not be faster. If you want concrete examples, TPCH Q2 and Q18 :)

jordanlewis · 2019-12-12T22:06:17Z

Haha. Okay.

I think this is okay to backport. I wonder if it's worth exposing a knob so people can revert to the old behavior.

RaduBerinde added 3 commits December 9, 2019 14:51

opt: add cols-are-key flag in the lookup join private

c13cf23

We currently determine if the lookup columns of a join form a key in the execbuilder. This change moves this logic in the code that creates lookup joins, which will allow costing to take this flag into account. Release note: None

opt: allow lax key for lookup join cols-are-key

d534b19

If the lookup join columns form a lax key, we can consider that a key because NULLs don't equal anything and are just discarded. Release note: None

RaduBerinde requested a review from rytaft December 9, 2019 20:23

RaduBerinde requested a review from a team as a code owner December 9, 2019 20:23

rytaft approved these changes Dec 12, 2019

View reviewed changes

RaduBerinde merged commit dd838bc into cockroachdb:release-19.2 Dec 17, 2019

RaduBerinde deleted the backport19.2-43003 branch December 17, 2019 02:08

RaduBerinde mentioned this pull request Jan 17, 2020

opt: query taking over 400s after upgrade #43958

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release-19.2: opt: adjust cost of lookup join on non-key columns #43059

release-19.2: opt: adjust cost of lookup join on non-key columns #43059

RaduBerinde commented Dec 9, 2019

cockroach-teamcity commented Dec 9, 2019

rytaft left a comment

jordanlewis commented Dec 12, 2019

RaduBerinde commented Dec 12, 2019

jordanlewis commented Dec 12, 2019

release-19.2: opt: adjust cost of lookup join on non-key columns #43059

release-19.2: opt: adjust cost of lookup join on non-key columns #43059

Conversation

RaduBerinde commented Dec 9, 2019

opt: add cols-are-key flag in the lookup join private

opt: allow lax key for lookup join cols-are-key

opt: adjust cost of lookup join on non-key columns

cockroach-teamcity commented Dec 9, 2019

rytaft left a comment

Choose a reason for hiding this comment

jordanlewis commented Dec 12, 2019

RaduBerinde commented Dec 12, 2019

jordanlewis commented Dec 12, 2019