Distinguish between inner and semijoins in `QueryExpr` AST.
#969
This commit adds a flag `semi: bool` to `JoinExpr`, which signifies a semijoin, as opposed to an inner join. A new optimization pass, `QueryExpr::try_semi_join`, is defined which can detect a certain common case of inner joins and rewrite them into semijoins. The punchline here is that `core::vm::join_inner` used to accept a flag `semi: bool` which it could use to avoid some expensive `Header` mutations, but that flag was always passed as `false` because we had no way to distinguish semijoins. With this commit, the flag is actually used, so evaluating non-indexed semijoins should avoid allocating a new `Header`.
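To make the distinction concrete, here is a minimal sketch of the two ideas in the description: a rewrite that turns an inner join into a semijoin, and an evaluator that skips building the combined row when `semi` is set. Everything in it is a toy stand-in. `Row`, `Op`, `try_semi_join_sketch`, and `join_sketch` are hypothetical names, not the real `QueryExpr`, `JoinExpr`, `Header`, or `core::vm::join_inner`. The condition used for the rewrite (a join immediately followed by a projection onto exactly the left-hand columns) is an assumption about what the "common case" might look like, not a statement of what `QueryExpr::try_semi_join` actually checks.

```rust
// Toy model only: none of these types or functions exist in the real codebase.
type Row = Vec<i64>;

#[derive(Debug, Clone, PartialEq)]
enum Op {
    /// Join on `lhs[lhs_col] == rhs[rhs_col]`; `semi` selects semijoin output.
    Join { lhs_col: usize, rhs_col: usize, semi: bool },
    /// Keep only the first `n` columns of each row.
    ProjectFirst(usize),
}

/// ASSUMPTION: the rewritable case is an inner join immediately followed by a
/// projection that keeps exactly the left-hand columns (`lhs_width` of them).
/// Any other plan is returned unchanged.
fn try_semi_join_sketch(ops: Vec<Op>, lhs_width: usize) -> Vec<Op> {
    if let [Op::Join { lhs_col, rhs_col, semi: false }, Op::ProjectFirst(n)] = ops.as_slice() {
        if *n == lhs_width {
            // The projection is now redundant: a semijoin already yields left rows only.
            return vec![Op::Join { lhs_col: *lhs_col, rhs_col: *rhs_col, semi: true }];
        }
    }
    ops
}

/// Nested-loop join over in-memory rows. In this sketch the left row is emitted
/// once per matching right row in both modes.
fn join_sketch(lhs: &[Row], rhs: &[Row], lhs_col: usize, rhs_col: usize, semi: bool) -> Vec<Row> {
    let mut out = Vec::new();
    for l in lhs {
        for r in rhs.iter().filter(|r| r[rhs_col] == l[lhs_col]) {
            if semi {
                // Semijoin: output keeps the left schema, so no combined row is built.
                out.push(l.clone());
            } else {
                // Inner join: concatenate left and right columns into a wider row.
                let mut joined = l.clone();
                joined.extend_from_slice(r);
                out.push(joined);
            }
        }
    }
    out
}
```

With `lhs = [[1, 10], [2, 20]]` and `rhs = [[1, 100], [3, 300]]` joined on column 0, the inner join in this sketch produces the four-column row `[1, 10, 1, 100]`, while the semijoin produces the two-column row `[1, 10]`. The semijoin path never constructs the wider row, which is loosely analogous to skipping the `Header` work the description mentions.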
let mut sources = SourceSet::default();
let rhs_source_expr = sources.add_mem_table(data);

let q = query(&schema).with_join_inner(rhs_source_expr, FieldName::positional("inventory", 0), rhs, true);
I think I would prefer a slightly higher level test that goes through the entire optimizer.
As you pointed out, we already have test coverage of this part, so these are redundant. Feel free to remove.
let source_expr = sources.add_mem_table(table.clone());
let second_source_expr = sources.add_mem_table(table);

let q = query(source_expr).with_join_inner(second_source_expr, field.clone(), field, true);
Again, I think I would prefer a higher level test that goes from SQL to result set. You can use `RelationalDB::create_table_for_test` along with `sql::execute::run`.
I just want to request one more test case: that we correctly transform an `IndexJoin` between delta tables to the corresponding semijoin.
Correction: we already have test coverage for this. If you can update the other tests, this should be good to merge.
crates/vm/src/expr.rs
let q = QueryExpr {
    source: lhs_source,
    // Build the query manually, because `.with_select` will attempt to push selections before the join.
This is unfortunate. All of this should be included as part of `optimize`. But of course this is a preexisting issue.
But I should add that this change does improve the performance of incremental join exactly as expected.
- Remove a test that was silly and backwards, and intentionally thwarted the optimizer in a way that will hopefully stop working soon.
- Add a test that an `IncrementalJoin`'s `virtual_plan` looks like we expect.
- Rename the `JoinExpr` argument to `core::vm::join_inner` for clarity.
- Sprinkle comments around about how we compile and optimize joins.
Description of Changes
This commit adds a flag `semi: bool` to `JoinExpr`, which signifies a semijoin, as opposed to an inner join.
A new optimization pass, `QueryExpr::try_semi_join`, is defined which can detect a certain common case of inner joins and rewrite them into semijoins.
The punchline here is that `core::vm::join_inner` used to accept a flag `semi: bool` which it could use to avoid some expensive `Header` mutations, but that flag was always passed as `false` because we had no way to distinguish semijoins. With this commit, the flag is actually used, so evaluating non-indexed semijoins should avoid allocating a new `Header`.
API and ABI breaking changes
N/A
Expected complexity level and risk
3 - Our query planner and evaluator depend strongly (more strongly than they should) on specific parses and representations in some places. I believe I found all such places and changed them, but I am not confident.