-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release-19.2: opt: adjust cost of lookup join on non-key columns #43059
release-19.2: opt: adjust cost of lookup join on non-key columns #43059
Conversation
We currently determine if the lookup columns of a join form a key in the execbuilder. This change moves this logic in the code that creates lookup joins, which will allow costing to take this flag into account. Release note: None
If the lookup join columns form a lax key, we can consider that a key because NULLs don't equal anything and are just discarded. Release note: None
In 19.2, there was a change in lookup join execution: unless we know that each lookup returns at most one row (i.e. the lookup columns are a key), we now limit the number of results in each batch to avoid OOM conditions. This disables parallelization of spans going to different ranges, resulting in slower performance. I ran some experiments on a 4 node cluster which showed a ~5x degradation when the lookups are random; repro details [here](https://gist.github.com/RaduBerinde/9228dd1feea6d39ce5c80ee6a6485079). This change attempts to adjust the optimizer costing to account for this degradation, by multiplying the cost of each lookup by 5 in this case. Four TPCH queries change. I ran them before and after and these are the results (median time after a couple of runs on a 4 node cluster): before after Q2 1.52s 2.16s Q8 40.42s 37.79s Q9 777s 461s Q18 9.16 11.80 The improvement on Q9 (by far the slowest query) is significant. I experimented with smaller factors to see if I could get only Q8 and Q9 to change but that wasn't possible. Release note (performance improvement): Adjusted the optimizer's cost of lookup join when the lookup columns aren't a key in the table. This will cause some queries to switch to using a hash or merge join instead of a lookup join, improving performance in most cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 37 of 37 files at r1, 4 of 4 files at r2, 17 of 17 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained
In which cases could the chosen plan regress? Could you give an example? |
It is a very broad question.. whenever the cost of the lookup join was close enough to the next best method. Depending on how accurate our stats are, cluster configuration, distribution of lookup keys etc, the other plan might or might not be faster. If you want concrete examples, TPCH Q2 and Q18 :) |
Haha. Okay. I think this is okay to backport. I wonder if it's worth exposing a knob so people can revert to the old behavior. |
I am a little unsure of backporting this. It is definitely the right thing to do (given the change in lookup join execution in 19.2), but it could cause regressions due to plan changes. Would love to know what @andy-kimball and @jordanlewis think.
Backport 3/3 commits from #43003.
/cc @cockroachdb/release
opt: add cols-are-key flag in the lookup join private
We currently determine if the lookup columns of a join form a key in
the execbuilder. This change moves this logic in the code that creates
lookup joins, which will allow costing to take this flag into account.
Release note: None
opt: allow lax key for lookup join cols-are-key
If the lookup join columns form a lax key, we can consider that a key
because NULLs don't equal anything and are just discarded.
Release note: None
opt: adjust cost of lookup join on non-key columns
In 19.2, there was a change in lookup join execution: unless we know
that each lookup returns at most one row (i.e. the lookup columns are
a key), we now limit the number of results in each batch to avoid OOM
conditions. This disables parallelization of spans going to different
ranges, resulting in slower performance. I ran some experiments on a 4
node cluster which showed a ~5x degradation when the lookups are
random; repro details here.
This change attempts to adjust the optimizer costing to account for
this degradation, by multiplying the cost of each lookup by 5 in this
case.
Four TPCH queries change. I ran them before and after and these are
the results (median time after a couple of runs on a 4 node cluster):
The improvement on Q9 (by far the slowest query) is significant. I
experimented with smaller factors to see if I could get only Q8 and Q9
to change but that wasn't possible.
Release note (performance improvement): Adjusted the optimizer's cost
of lookup join when the lookup columns aren't a key in the table. This
will cause some queries to switch to using a hash or merge join
instead of a lookup join, improving performance in most cases.