Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve WindowTransform::updateAggregationState by addBatchSinglePlace #46321

Closed
wants to merge 3 commits into from

Conversation

lgbo-ustc
Copy link
Contributor

@lgbo-ustc lgbo-ustc commented Feb 13, 2023

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Use addBatchSinglePlace instead of add in some cases to accelate WindowTransform::updateAggregationState

Test

insert into test_data select intDiv(number,16) as a, number % 16 as b from numbers(10000000);

-- Q1
select a, sum(b) over (partition by a) from test_data;

-- Q2
select a, sum(b) over (partition by a order by b) from test_data;

-- Q3
select a, sum(b) over (partition by a rows between unbounded preceding and current row) from test_data;

-- Q4
select a, sum(b) over (partition by a order by b rows between unbounded preceding and current row) from test_data;

-- Q5
select a, sum(b) over (partition by a rows between unbounded preceding and unbounded following) from test_data;

-- Q6
select a, sum(b) over (partition by a order by b rows between unbounded preceding and unbounded following) from test_data;

-- Q7
select a, sum(b) over (partition by a rows between unbounded preceding and 4 following) from test_data;

-- Q8
select a, sum(b) over (partition by a order by b rows between unbounded preceding and 4 following) from test_data;

-- Q9
select a, sum(b) over (partition by a rows between 4 preceding and 4 following) from test_data;

-- Q10
select a, sum(b) over (partition by a order by b rows between 4 preceding and 4 following) from test_data;

Result

query before after speedup
Q1 11.73 million rows/s., 117.31 MB/s 11.97 million rows/s., 119.73 MB/s. -
Q2 8.81 million rows/s., 88.12 MB/s 8.79 million rows/s., 87.93 MB/s -
Q3 11.74 million rows/s., 117.37 MB/s 11.75 million rows/s., 117.53 MB/s -
Q4 11.73 million rows/s., 117.28 MB/s 11.81 million rows/s., 118.09 MB/s -
Q5 12.35 million rows/s., 123.53 MB/s 13.55 million rows/s., 135.49 MB/s +9%
Q6 12.66 million rows/s., 126.55 MB/s 13.56 million rows/s., 135.57 MB/s. +9%
Q7 11.03 million rows/s., 110.32 MB/s. 11.13 million rows/s., 111.28 MB/s -
Q8 11.65 million rows/s., 116.53 MB/s 11.66 million rows/s., 116.65 MB/s -
Q9 7.40 million rows/s., 74.01 MB/s 8.31 million rows/s., 83.11 MB/s +12%
Q10 7.50 million rows/s., 74.98 MB/s 8.51 million rows/s., 85.10 MB/s +14%

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

@lgbo-ustc lgbo-ustc marked this pull request as ready for review February 13, 2023 07:30
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-performance Pull request with some performance improvements label Feb 13, 2023
@lgbo-ustc lgbo-ustc changed the title Improve updateAggregationState by addBatchSinglePlace Improve WindowTransform::updateAggregationState by addBatchSinglePlace Feb 13, 2023
@antonio2368 antonio2368 added the can be tested Allows running workflows for external contributors label Feb 13, 2023
@novikd novikd self-assigned this Feb 13, 2023
@lgbo-ustc lgbo-ustc force-pushed the window_batch_add branch 2 times, most recently from c0dd92a to cbbdc96 Compare February 14, 2023 07:50
@novikd
Copy link
Member

novikd commented Feb 15, 2023

Please add performance tests: https://clickhouse.com/docs/en/development/tests#performance-tests

@lgbo-ustc lgbo-ustc force-pushed the window_batch_add branch 2 times, most recently from 8f4fcda to b9c4a78 Compare February 21, 2023 02:58
@lgbo-ustc
Copy link
Contributor Author

There is a failure in test_storage_rabbitmq/test.py::test_rabbitmq_drop_mv, seems not related to this pr.

@@ -131,6 +131,7 @@ class WindowTransform final : public IProcessor
void advanceFrameEnd();
void advanceFrameEndRangeOffset();

template<bool enable_batch_aggregate>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
template<bool enable_batch_aggregate>
template <bool enable_batch_aggregate>

@@ -1157,7 +1187,7 @@ void WindowTransform::appendChunk(Chunk & chunk)
// the frame boundaries, but it would require some care not to
// perform unnecessary work while we are still looking for the frame
// start, so do it the simple way for now.
updateAggregationState();
(this->*update_aggregate_state)();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it will be the same performance with if, because it will be predictable, but it will be cleaner.

Copy link
Member

@alexey-milovidov alexey-milovidov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's resolve conflicts and merge.

@alexey-milovidov
Copy link
Member

Merged in #56120.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
can be tested Allows running workflows for external contributors pr-performance Pull request with some performance improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants