Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PhysicalStreamingWindow operator #2792

Merged
merged 10 commits into from
Dec 16, 2021
Merged

Conversation

lnkuiper
Copy link
Contributor

This PR implements a streaming variant of the PhysicalWindow operator, which is selected when we have a window function, and the following is true:

  1. The OVER clause is empty
  2. There is no IGNORE NULLS clause
  3. The window function is one of {first_value, percent_rank, rank, rank_dense, row_number}

The operator does not work in parallel (but is now deterministic!), but does give a nice speedup for these specific cases because it does not materialize anything.

Quick benchmark on TPC-H SF1:

select sum(fv) from (select first_value(l_orderkey) over () as fv from lineitem) sq;
Old New
0.116 0.008

Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Looks great. @hawkfish maybe you want to have a look as well?

Copy link
Contributor

@hawkfish hawkfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great - just a few small nits.

src/execution/physical_plan/plan_window.cpp Outdated Show resolved Hide resolved
test/sql/window/test_streaming_window.test Show resolved Hide resolved
@lnkuiper
Copy link
Contributor Author

Thanks for the feedback! Hopefully good to go now.

Copy link
Contributor

@hawkfish hawkfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we tweak the operator ordering?

@lnkuiper
Copy link
Contributor Author

@hawkfish I've tweaked the operator ordering. Thanks for the feedback.

I was wondering what you thought about this:

# When we combine a streaming and non-streaming window function over the same window,
# We get a non-streaming window operator

Is this something we want? Or would we rather have a non-streaming and a streaming window for these queries?

@hawkfish
Copy link
Contributor

hawkfish commented Dec 15, 2021

@hawkfish I've tweaked the operator ordering. Thanks for the feedback.

I was wondering what you thought about this:

When we combine a streaming and non-streaming window function over the same window,
We get a non-streaming window operator

Is this something we want? Or would we rather have a non-streaming and a streaming window for these queries?

You mean if they have the same partitioning and ordering? Like maybe

SELECT last_value(i) OVER(), avg(i) OVER() FROM table;

?

I suspect it makes sense to divide them so that the streaming goes last and we don't have to materialize the streamable columns. But since they will be constants (or sequences?) maybe that is not such a big deal.

@lnkuiper
Copy link
Contributor Author

@hawkfish Thanks, that makes sense. I've separated them.

@lnkuiper
Copy link
Contributor Author

@Mytherin Should be good to go now. I believe the CI failures are unrelated.

@Mytherin
Copy link
Collaborator

Thanks! Looks good.

@Mytherin Mytherin merged commit 000fbff into duckdb:master Dec 16, 2021
@lnkuiper lnkuiper deleted the streamingwindow branch December 16, 2021 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants