-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Issue #3207: ASOF JOIN Compilation #6719
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Checkpoint commit with the ASOF grammar additions.
Checkpoint commit.
Implement ASOF joins using IEJoin + Window.
Fix and test equality clauses in ASOF Joins.
Add missing EnumSerializer::StringToEnum case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Looks good. Some comments:
Move the join projection logic into PhysicalProjectionOperator.
Fix tidy include.
* Decouple the JoinType from the JoinRefType. * Convert ASOF to be a JoinRefType. * Attach a JoinRefType to LogicalConditionalJoin to indicate how the conditions should be interpreted * Stop the optimiser from thinking it knows how to deal with ASOF joins.
Regenerate plans file.
Fix copy elision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes! Looks great. Some additional comments:
| LEAD(begin, 1, 'infinity'::DOUBLE) OVER (ORDER BY begin ASC) AS end | ||
| FROM events0 | ||
| ) e | ||
| ON p.ts >= e.begin AND p.ts < e.end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add some extra tests:
- Join on a timestamp range
- Join on a string range (it is fine if that throws a non-internal exception)
- Join on integer ranges (it is fine if that throws a non-internal exception)
- Multiple ranges in one join condition
- What if we invert the comparisons (e.g.
e.begin <= p.ts AND e.end > p.ts) - What if we have a different closed range (e.g.
p.ts > e.begin AND p.ts <= e.end) - What if we have a range that is not actually a range (e.g.
p.ts >= e.begin and p.other_col < e.end) - ASOF join on a constant (e.g.
ON 1=1) - Could we add some tests where we have NULL values in the tables?
- Could we add some tests where we have infinity values (e.g.
NaN, orinfinity::timestamp) in the tables? - What happens if we use an ASOF join inside of a correlated subquery? (it is fine if that throws an exception)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiple ranges was covered by the last error test, and you can't have invalid ranges because the bounds are implied (i.e., end is synthesised). But I'll have a go at the rest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got all of these except the last one. How would you write an ASOF join inside a correlated subquery? Aren't the joins in correlated subqueries generated by the planner?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I meant an ASOF join inside a subquery that has correlated columns in the left and/or right side, see e.g. here
Switch to using a separate LogicalOperator to model AsOf joins.
Beef up the test suite with multiple data types and corner cases. Fix arbitrary expression planning issue.
Add test to show it works.
|
Thanks for the fixes! Looks great. |
Part 1 of the ASOF JOIN implementation.
This PR adds ASOF to the grammar and converts it into an inequality join on a window:
where the left join is implemented using the
IEJoinoperator.This is not very efficient (it does three sorts), and part 2 will add a new physical operator that uses only one partitioned sort and the outer partitioning from hash join. But it will hopefully make it easier to specify these joins in a future-proof manner.