PhysicalStreamingWindow operator #2792
Thanks for the PR! Looks great. @hawkfish maybe you want to have a look as well?
Looks great - just a few small nits.
Thanks for the feedback! Hopefully good to go now.
Can we tweak the operator ordering?
@hawkfish I've tweaked the operator ordering. Thanks for the feedback. I was wondering what you thought about this:
Is this something we want? Or would we rather have a non-streaming and a streaming window for these queries?
You mean if they have the same partitioning and ordering? Like maybe SELECT last_value(i) OVER(), avg(i) OVER() FROM table; ? I suspect it makes sense to divide them so that the streaming goes last and we don't have to materialize the streamable columns. But since they will be constants (or sequences?) maybe that is not such a big deal.
@hawkfish Thanks, that makes sense. I've separated them.
@Mytherin Should be good to go now. I believe the CI failures are unrelated.
Thanks! Looks good.
This PR implements a streaming variant of the PhysicalWindow operator, which is selected when we have a window function and all of the following are true:
- The OVER clause is empty
- There is no IGNORE NULLS clause
- The function is one of {first_value, percent_rank, rank, rank_dense, row_number}

The operator does not run in parallel (but is now deterministic!). It does give a nice speedup for these specific cases because it does not materialize anything.
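To illustrate the selection rule and why nothing needs to be materialized, here is a minimal standalone sketch. It is not the code from this PR: `WindowExprSketch`, `IsStreamable`, and `StreamingRowNumber` are hypothetical names made up for the example, and chunks are reduced to a row count. The idea is that with an empty OVER clause, row_number is just a running counter carried from one chunk to the next, so each chunk can be emitted as soon as it arrives.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical description of one window expression (not DuckDB's real types).
struct WindowExprSketch {
    std::string function; // e.g. "row_number"
    bool over_is_empty;   // OVER () with no partitioning, ordering, or frame
    bool ignore_nulls;    // IGNORE NULLS was specified
};

// Returns true when the expression meets all three conditions listed above.
bool IsStreamable(const WindowExprSketch &expr) {
    static const std::unordered_set<std::string> streamable = {
        "first_value", "percent_rank", "rank", "rank_dense", "row_number"};
    return expr.over_is_empty && !expr.ignore_nulls &&
           streamable.count(expr.function) > 0;
}

// Streaming row_number for an empty OVER clause: the only state is a counter,
// so no input rows are buffered between chunks.
class StreamingRowNumber {
public:
    // Produces the row numbers for one incoming chunk of `chunk_size` rows.
    std::vector<int64_t> ProcessChunk(std::size_t chunk_size) {
        std::vector<int64_t> result(chunk_size);
        for (auto &value : result) {
            value = ++rows_seen_;
        }
        return result;
    }

private:
    int64_t rows_seen_ = 0; // rows emitted so far across all chunks
};
```

Because the counter depends on the order in which chunks arrive, a sketch like this only produces a deterministic result when chunks are processed sequentially, which matches the single-threaded behavior described above. The other listed functions are even simpler under an empty OVER clause: with no ordering, every row is a peer, so rank is 1 for every row.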
Quick benchmark on TPC-H SF1: