Optimizer that removes redundant DELIM_GET and DELIM_JOIN operators #1296

Merged
merged 28 commits into from Jan 18, 2021

lnkuiper
Contributor

This PR implements an optimizer, which I have affectionately called the Deliminator, that removes redundant operators related to correlated subqueries.

When a query containing a correlated subquery is issued, a dependent join is created. Consider the following query:

EXPLAIN SELECT i=ANY(SELECT i FROM integers WHERE i=i1.i) FROM integers i1 ORDER BY i;

DuckDB has implemented a way of generically flattening these queries, following "Unnesting Arbitrary Queries" by Thomas Neumann and Alfons Kemper, to avoid quadratic complexity. This is done by creating a duplicate-eliminated join, or delim join, and pushing it down the query plan to decorrelate it.
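
To make the complexity argument concrete, here is a minimal C++ sketch (not DuckDB code) that evaluates the example query's `i = ANY(...)` predicate in two ways: by re-running the subquery for every outer row, and by building a hash table once and probing it, which is roughly the shape of a flattened mark join. The function names are hypothetical, and NULL handling is deliberately left out.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Nested-loop evaluation: the subquery is re-executed for every outer row,
// so the cost grows with outer.size() * inner.size().
std::vector<bool> EvaluateNestedLoop(const std::vector<int> &outer, const std::vector<int> &inner) {
    std::vector<bool> result;
    result.reserve(outer.size());
    for (int outer_value : outer) {
        bool found = false;
        for (int inner_value : inner) {
            // WHERE i = i1.i keeps only matching rows; i = ANY(...) is then
            // true as soon as one such row exists
            if (inner_value == outer_value) {
                found = true;
                break;
            }
        }
        result.push_back(found);
    }
    return result;
}

// Flattened evaluation: build a hash table over the inner side once and
// probe it once per outer row (assumed here to mimic a hash-based mark join).
std::vector<bool> EvaluateMarkJoin(const std::vector<int> &outer, const std::vector<int> &inner) {
    std::unordered_set<int> build_side(inner.begin(), inner.end());
    std::vector<bool> result;
    result.reserve(outer.size());
    for (int outer_value : outer) {
        result.push_back(build_side.count(outer_value) > 0);
    }
    // a mark join emits exactly one marker per outer row
    assert(result.size() == outer.size());
    return result;
}
```

Both functions return one boolean per outer row. The second does a single pass over each input instead of a pass over the inner table per outer row.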

This process introduces delim scans, which are usually joined (or combined in a cross product) with another part of the query plan. Under specific circumstances, joining with a delim scan does not introduce new information, and the join can be removed. If all delim scans belonging to a delim join can be removed, the delim join can be transformed into a regular comparison join instead.

This is more efficient for two reasons:

  1. Joins with delim scans can be removed, so there is less work to be done
  2. Delim joins can become regular comparison joins, which are easier to parallelise (and are already parallelised in DuckDB)
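
The redundancy can be illustrated with a small C++ sketch (hypothetical helper names, non-NULL integers only, not DuckDB's implementation). `WithDelimScan` mimics the original plan shape: it deduplicates the outer values (the delim scan), inner-joins them with the inner table, and then runs the mark join on the same equality. `WithoutDelimScan` skips the delim scan entirely. Because the mark join already compares on the same column, the delim-scan join only drops inner rows that could never match, so both functions return the same result.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Mark join: for every outer value, report whether the build side contains a match.
std::vector<bool> MarkJoin(const std::vector<int> &outer, const std::vector<int> &build) {
    std::unordered_set<int> build_set(build.begin(), build.end());
    std::vector<bool> marks;
    marks.reserve(outer.size());
    for (int value : outer) {
        marks.push_back(build_set.count(value) > 0);
    }
    return marks;
}

// Original plan shape: delim scan joined with the inner table, then the mark join.
std::vector<bool> WithDelimScan(const std::vector<int> &outer, const std::vector<int> &inner) {
    // duplicate-eliminated outer values, as a delim scan would produce them
    std::unordered_set<int> delim_scan(outer.begin(), outer.end());
    // inner join on i = i: keep inner rows whose value occurs in the delim scan
    std::vector<int> joined;
    for (int value : inner) {
        if (delim_scan.count(value) > 0) {
            joined.push_back(value);
        }
    }
    // the join can only drop inner rows, never add new ones
    assert(joined.size() <= inner.size());
    return MarkJoin(outer, joined);
}

// Simplified plan shape: the mark join reads the inner table directly.
std::vector<bool> WithoutDelimScan(const std::vector<int> &outer, const std::vector<int> &inner) {
    return MarkJoin(outer, inner);
}
```

This only covers the equality case from the example query. The actual conditions under which the optimizer may drop a delim scan are in the PR's code, not in this sketch.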

For the example query this means that the original query plan

┌───────────────────────────┐                                                                                                                    
│         PROJECTION        │                                                                                                                    
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                                                                                                                    
│             #0            │                                                                                                                    
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                                                                                                                    
│             3             │                                                                                                                    
│          (0.00s)          │                                                                                                                    
└─────────────┬─────────────┘                                                                                                                                                 
┌─────────────┴─────────────┐                                                                                                                    
│          ORDER_BY         │                                                                                                                    
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                                                                                                                    
│           #1 ASC          │                                                                                                                    
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                                                                                                                    
│             3             │                                                                                                                    
│          (0.00s)          │                                                                                                                    
└─────────────┬─────────────┘                                                                                                                                                 
┌─────────────┴─────────────┐                                                                                                                    
│         PROJECTION        │                                                                                                                    
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                                                                                                                    
│          SUBQUERY         │                                                                                                                    
│             i             │                                                                                                                    
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                                                                                                                    
│             3             │                                                                                                                    
│          (0.00s)          │                                                                                                                    
└─────────────┬─────────────┘                                                                                                                                                 
┌─────────────┴─────────────┐                                                                                                                    
│         DELIM_JOIN        │                                                                                                                    
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                                                                                                                    
│            MARK           │                                                                                                                    
│            i=i            │                                                                                                                    
│            i=#0           ├──────────────┐──────────────────────────────────────────────────────────────────────────────────────┐              
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │              │                                                                                      │              
│             3             │              │                                                                                      │              
│          (0.00s)          │              │                                                                                      │              
└─────────────┬─────────────┘              │                                                                                      │                                           
┌─────────────┴─────────────┐┌─────────────┴─────────────┐                                                          ┌─────────────┴─────────────┐
│          SEQ_SCAN         ││         HASH_JOIN         │                                                          │       HASH_GROUP_BY       │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│          integers         ││            MARK           │                                                          │             #0            │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││            i=i            │                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│             i             ││            i=#0           ├──────────────┐                                           │             0             │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │              │                                           │          (0.00s)          │
│             3             ││             3             │              │                                           │                           │
│          (0.00s)          ││          (0.00s)          │              │                                           │                           │
└───────────────────────────┘└─────────────┬─────────────┘              │                                           └───────────────────────────┘                             
                             ┌─────────────┴─────────────┐┌─────────────┴─────────────┐                                                          
                             │         CHUNK_SCAN        ││         PROJECTION        │                                                          
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                                                          
                             │             3             ││             i             │                                                          
                             │          (0.00s)          ││             #0            │                                                          
                             │                           ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                                                          
                             │                           ││             3             │                                                          
                             │                           ││          (0.00s)          │                                                          
                             └───────────────────────────┘└─────────────┬─────────────┘                                                                                       
                                                          ┌─────────────┴─────────────┐                                                          
                                                          │         HASH_JOIN         │                                                          
                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                                                          
                                                          │           INNER           │                                                          
                                                          │            i=i            ├──────────────┐                                           
                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │              │                                           
                                                          │             3             │              │                                           
                                                          │          (0.00s)          │              │                                           
                                                          └─────────────┬─────────────┘              │                                                                        
                                                          ┌─────────────┴─────────────┐┌─────────────┴─────────────┐                             
                                                          │         DELIM_SCAN        ││          SEQ_SCAN         │                             
                                                          │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             
                                                          │             3             ││          integers         │                             
                                                          │          (0.00s)          ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             
                                                          │                           ││             i             │                             
                                                          │                           ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             
                                                          │                           ││             3             │                             
                                                          │                           ││          (0.00s)          │                             
                                                          └───────────────────────────┘└───────────────────────────┘                                                          

can be simplified to:

┌───────────────────────────┐                             
│         PROJECTION        │                             
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             
│             #0            │                             
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             
│             3             │                             
│          (0.00s)          │                             
└─────────────┬─────────────┘                                                          
┌─────────────┴─────────────┐                             
│          ORDER_BY         │                             
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             
│           #1 ASC          │                             
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             
│             3             │                             
│          (0.00s)          │                             
└─────────────┬─────────────┘                                                          
┌─────────────┴─────────────┐                             
│         PROJECTION        │                             
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             
│          SUBQUERY         │                             
│             i             │                             
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             
│             3             │                             
│          (0.00s)          │                             
└─────────────┬─────────────┘                                                          
┌─────────────┴─────────────┐                             
│         HASH_JOIN         │                             
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │                             
│            MARK           │                             
│            i=i            │                             
│            i=#0           ├──────────────┐              
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │              │              
│             3             │              │              
│          (0.00s)          │              │              
└─────────────┬─────────────┘              │                                           
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│          SEQ_SCAN         ││         PROJECTION        │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│          integers         ││             i             │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││             i             │
│             i             ││   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   ││             3             │
│             3             ││          (0.00s)          │
│          (0.00s)          ││                           │
└───────────────────────────┘└─────────────┬─────────────┘                             
                             ┌─────────────┴─────────────┐
                             │          SEQ_SCAN         │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                             │          integers         │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                             │             i             │
                             │   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │
                             │             3             │
                             │          (0.00s)          │
                             └───────────────────────────┘                             

The projection left over from the original plan (directly above the second SEQ_SCAN) can also be removed in this specific case, but in other cases it may compute an expression such as +(i, 3), in which case it is still necessary.

This will improve the performance of the full-text search (FTS) extension and of some TPC-H queries.

I am happy to receive any feedback on this PR!

Collaborator

Mytherin left a comment

Thanks for the PR! It looks great, and very nice results. I have one minor piece of feedback, otherwise I think it can be merged already.

for (auto &child : op->children) {
    FindCandidates(&child, candidates);
}
if ( // Projection/Aggregate
Collaborator

I would prefer to break up this huge if statement into several blocks using early outs, e.g.

// search for a projection or aggregate
if (op->type != LogicalOperatorType::LOGICAL_PROJECTION &&
    op->type != LogicalOperatorType::LOGICAL_AGGREGATE_AND_GROUP_BY) {
    return;
}
// followed by a join
if (op->children[0]->type != LogicalOperatorType::LOGICAL_COMPARISON_JOIN) {
    return;
}
...

This keeps the same logic but is more readable than a ton of nested conditions (in my opinion).
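
As a self-contained illustration of the early-out style, the sketch below uses hypothetical stand-in types (`OpType`, `Op`, `MakeOp`, `IsCandidate`) rather than DuckDB's real `LogicalOperator` classes, and it only checks the two conditions quoted above. The remaining conditions of the actual optimizer are elided in the snippet and are not reproduced here.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the logical operator types named in the review.
enum class OpType { PROJECTION, AGGREGATE_AND_GROUP_BY, COMPARISON_JOIN, GET };

struct Op {
    OpType type;
    std::vector<std::unique_ptr<Op>> children;
};

// Small helper to build an operator node of the given type.
std::unique_ptr<Op> MakeOp(OpType type) {
    auto op = std::make_unique<Op>();
    op->type = type;
    return op;
}

// Each failed condition returns immediately instead of nesting further.
bool IsCandidate(const Op &op) {
    // search for a projection or aggregate
    if (op.type != OpType::PROJECTION && op.type != OpType::AGGREGATE_AND_GROUP_BY) {
        return false;
    }
    // followed by a join (also guards against operators without children)
    if (op.children.empty() || op.children[0]->type != OpType::COMPARISON_JOIN) {
        return false;
    }
    return true;
}

// Visits the children first, then records the operator if it passes the checks.
void FindCandidates(Op &op, std::vector<Op *> &candidates) {
    for (auto &child : op.children) {
        assert(child);
        FindCandidates(*child, candidates);
    }
    if (!IsCandidate(op)) {
        return;
    }
    candidates.push_back(&op);
}
```

Calling `FindCandidates` on a projection whose first child is a comparison join adds that projection to `candidates`; any other operator falls out at the first check it fails.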

Contributor Author

Agreed. I split it up and it is much more readable now.

@Mytherin
Collaborator

Thanks, looks great now!

Mytherin merged commit 483eba5 into duckdb:master Jan 18, 2021
lnkuiper deleted the deliminator branch January 18, 2021 09:49