Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve count if rewrite #44728

Merged
merged 8 commits into from
Feb 8, 2023
Merged

Improve count if rewrite #44728

merged 8 commits into from
Feb 8, 2023

Conversation

taiyang-li
Copy link
Contributor

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

sumIf(123, cond) -> 123 * countIf(1, cond) 
sum(if(cond, 123, 0)) -> 123 * countIf(cond)
sum(if(cond, 0, 123)) -> 123 * countIf(not(cond))

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

@robot-clickhouse-ci-1 robot-clickhouse-ci-1 added the pr-improvement Pull request with some product improvements label Dec 30, 2022
@zhanglistar
Copy link
Contributor

How much improve?

@Algunenano
Copy link
Member

There are cases where this changes the result:

:) Select sumIf(NaN, number > 10) from numbers(10);

SELECT sumIf(nan, number > 10)
FROM numbers(10)

Query id: 3610ba57-4d00-41d7-b2ca-da2c31937b02

┌─sumIf(nan, greater(number, 10))─┐
│                               0 │
└─────────────────────────────────┘

1 row in set. Elapsed: 0.004 sec. 

:) Select NaN * countIf(number > 10) from numbers(10);

SELECT nan * countIf(number > 10)
FROM numbers(10)

Query id: b9158e73-bb1c-49e3-8e56-f555637c6d63

┌─multiply(nan, countIf(greater(number, 10)))─┐
│                                         nan │
└─────────────────────────────────────────────┘

1 row in set. Elapsed: 0.009 sec. 

@taiyang-li
Copy link
Contributor Author

How much improve?

Query: SELECT sum(if(number % 2 == 0, 123, 0)) FROM numbers(1000000000)
Before: 1 rows in set. Elapsed: 1.233 sec. Processed 1.00 billion rows, 8.00 GB (810.95 million rows/s., 6.49 GB/s.)
After: 1 rows in set. Elapsed: 1.076 sec. Processed 1.00 billion rows, 8.00 GB (929.22 million rows/s., 7.43 GB/s.)

@taiyang-li
Copy link
Contributor Author

@Algunenano Please notice that
Select sumIf(NaN, number > 10) from numbers(10); won't be rewritten to Select NaN * countIf(number > 10) from numbers(10);
Because NaN's type is not UInt64 or Int64.

@Avogar Avogar self-assigned this Jan 4, 2023
@Avogar
Copy link
Member

Avogar commented Jan 11, 2023

@taiyang-li please look at the Fuzzer failures, they look related to the changes

@taiyang-li
Copy link
Contributor Author

@taiyang-li please look at the Fuzzer failures, they look related to the changes

@Avogar Thanks for reminding. They are fixed now.

@Avogar
Copy link
Member

Avogar commented Jan 13, 2023

It will be great if you will also add the same implementation in the new analyzer. See src/Analyzer/Passes/SumIfToCountIfPass.cpp

@taiyang-li
Copy link
Contributor Author

It will be great if you will also add the same implementation in the new analyzer. See src/Analyzer/Passes/SumIfToCountIfPass.cpp

Ok. I'll finish it later.

@taiyang-li
Copy link
Contributor Author

It will be great if you will also add the same implementation in the new analyzer. See src/Analyzer/Passes/SumIfToCountIfPass.cpp

Done.

@taiyang-li
Copy link
Contributor Author

@Avogar Do you have time to reivew again? Thanks very much.

@Avogar
Copy link
Member

Avogar commented Feb 8, 2023

Sorry for long response, I was on vacation. let me review it now

@Avogar Avogar merged commit 8bdb1c3 into ClickHouse:master Feb 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-improvement Pull request with some product improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants