Optimizer that removes redundant DELIM_GET and DELIM_JOIN operators #1296
Thanks for the PR! It looks great, and very nice results. I have one minor piece of feedback, otherwise I think it can be merged already.
src/optimizer/deliminator.cpp
for (auto &child : op->children) {
    FindCandidates(&child, candidates);
}
if ( // Projection/Aggregate
I would prefer to break up this huge if statement into several blocks using early outs, e.g.
// search for a projection or aggregate
if (op->type != LogicalOperatorType::LOGICAL_PROJECTION &&
    op->type != LogicalOperatorType::LOGICAL_AGGREGATE_AND_GROUP_BY) {
    return;
}
// followed by a join
if (op->children[0]->type != LogicalOperatorType::LOGICAL_COMPARISON_JOIN) {
    return;
}
...
This keeps the same logic but is more readable than a ton of nested conditions (in my opinion).
Agreed. I split it up, and it is much more readable now.
Thanks, looks great now!
This PR implements an optimizer, which I have affectionately called the Deliminator, that removes redundant operators related to correlated subqueries.
When a query containing a correlated subquery is issued, a dependent join is created. Consider the following query:
DuckDB has implemented a way of generically flattening these queries, following Unnesting Arbitrary Queries by Thomas Neumann and Alfons Kemper, to avoid quadratic complexity. This is done by creating a duplicate-eliminated join, or delim join, and pushing it down the query plan to decorrelate it.
This process introduces delim scans, which are usually joined (or combined in a cross product) with another part of the query plan. Under specific circumstances, joining with a delim scan does not introduce new information, and the join can be removed. If all delim scans belonging to a delim join can be removed, the delim join can be transformed into a comparison join instead.
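To give some intuition for why such a join can be redundant, here is a small toy model. It is not the code in this PR and does not use DuckDB's classes: Row, DistinctKeys, JoinWithDelimScan and the two Plan functions are names made up for this sketch, and it assumes integer keys that are never NULL. The delim scan is modelled as the set of distinct keys of the outer side. Joining the inner side with that set only filters the inner side down to keys that the outer join would require anyway, so the plan with and without the delim scan should produce the same rows.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Toy model, NOT DuckDB code: a row is (key, payload) and keys are never NULL.
using Row = std::pair<int, int>;
using JoinedRow = std::pair<Row, Row>; // (outer row, inner row)

// Stand-in for a delim scan: the distinct keys of the outer side.
std::set<int> DistinctKeys(const std::vector<Row> &outer) {
    std::set<int> keys;
    for (const auto &row : outer) {
        keys.insert(row.first);
    }
    return keys;
}

// Equi-join of the inner side with the distinct keys. Because the keys are
// distinct, each inner row matches at most once, so this acts as a filter.
std::vector<Row> JoinWithDelimScan(const std::vector<Row> &inner, const std::set<int> &keys) {
    std::vector<Row> result;
    for (const auto &row : inner) {
        if (keys.count(row.first) > 0) {
            result.push_back(row);
        }
    }
    return result;
}

// Plain inner equi-join on the key (nested loop). The result is sorted so
// that the output of two plans can be compared directly.
std::vector<JoinedRow> EquiJoin(const std::vector<Row> &outer, const std::vector<Row> &inner) {
    std::vector<JoinedRow> result;
    for (const auto &o : outer) {
        for (const auto &i : inner) {
            if (o.first == i.first) {
                result.push_back({o, i});
            }
        }
    }
    std::sort(result.begin(), result.end());
    return result;
}

// Plan that keeps the delim scan: outer JOIN (delim scan JOIN inner).
std::vector<JoinedRow> PlanWithDelimScan(const std::vector<Row> &outer, const std::vector<Row> &inner) {
    return EquiJoin(outer, JoinWithDelimScan(inner, DistinctKeys(outer)));
}

// Plan with the delim scan removed: outer JOIN inner.
std::vector<JoinedRow> PlanWithoutDelimScan(const std::vector<Row> &outer, const std::vector<Row> &inner) {
    return EquiJoin(outer, inner);
}
```

Calling both plan functions on the same inputs and comparing the results with == is enough to check the equivalence in this toy setting. The actual optimizer works on logical operators and has to check more conditions (for example around NULL values) than this sketch models.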
This is more efficient for two reasons:
For the example query this means that the original query plan
can be simplified to:
The original projection can also be removed in this specific case, but in other cases the projection may project +(i, 3), in which case it is still necessary.
This will improve the performance of the FTS extension and some TPC-H queries.
I am happy to receive any feedback on this PR!