
Add SUM(BOOL) overload #15042

Merged: 2 commits merged on Nov 30, 2024

Mytherin (Collaborator):

This is a useful overload that lets the user quickly count the number of rows matching a boolean condition. It is essentially syntactic sugar for:

SELECT SUM(CASE WHEN [cond] THEN 1 END) FROM tbl

As an added bonus, it is also slightly faster than the above expression:

SELECT SUM(l_extendedprice > 500) FROM lineitem;
-- 0.020s
SELECT SUM(CASE WHEN l_extendedprice > 500 THEN 1 END) FROM lineitem;
-- 0.028s
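A minimal sketch of the equivalence between the two forms. It uses Python's built-in sqlite3 module as a stand-in for DuckDB; this is an assumption, chosen because SQLite already evaluates a comparison to the integer 0 or 1, so SUM(cond) works there. The table `tbl`, column `price`, and helper `count_matches` are hypothetical. The second call shows an edge case in SQLite: when no row matches, the CASE form returns NULL because SUM has no non-NULL input, while SUM(cond) sums zeros and returns 0.

```python
import sqlite3

def count_matches(prices, threshold):
    """Hypothetical helper: return (SUM(cond), SUM(CASE WHEN cond THEN 1 END))."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (price REAL)")
    con.executemany("INSERT INTO tbl VALUES (?)", [(p,) for p in prices])
    # In SQLite a comparison evaluates to 0 or 1, so SUM over it counts matching rows.
    sum_bool = con.execute(
        "SELECT SUM(price > ?) FROM tbl", (threshold,)
    ).fetchone()[0]
    # The CASE form yields 1 for matches and NULL otherwise; SUM skips the NULLs.
    sum_case = con.execute(
        "SELECT SUM(CASE WHEN price > ? THEN 1 END) FROM tbl", (threshold,)
    ).fetchone()[0]
    con.close()
    return sum_bool, sum_case

print(count_matches([100.0, 600.0, 750.0, 20.0], 500))  # (2, 2)
print(count_matches([100.0, 20.0], 500))                # (0, None)
```

This thread does not say whether DuckDB's SUM(BOOL) returns 0 or NULL for an all-false input, so it is worth checking before relying on either result.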

Mytherin added the Needs Documentation label (use for issues or PRs that require changes in the documentation) on Nov 30, 2024.
carlopi (Contributor) commented Nov 30, 2024

I guess this becomes equivalent to:

SELECT count(l_extendedprice > 500) FROM lineitem;

but it's nice to provide the overload.

Mytherin (Collaborator, Author):

No, COUNT counts the number of non-NULL values, so count(l_extendedprice > 500) is equivalent to count(l_extendedprice) (which is equivalent to count(*) as lineitem does not have NULL values).
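To illustrate the distinction, here is a sketch that again uses sqlite3 as a stand-in, with a hypothetical table `t` whose column `v` contains one NULL. COUNT(cond) counts every row where the condition is non-NULL, whether true or false, so it matches COUNT(v); SUM(cond) counts only the rows where the condition is true.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (v REAL)")
# One NULL value: the comparison v > 500 is NULL for that row.
con.executemany("INSERT INTO t VALUES (?)", [(100.0,), (600.0,), (None,), (900.0,)])

count_cond, count_col, count_star, sum_cond = con.execute(
    "SELECT COUNT(v > 500), COUNT(v), COUNT(*), SUM(v > 500) FROM t"
).fetchone()
con.close()

# COUNT(cond) counts non-NULL results (true and false alike), so it equals COUNT(v);
# COUNT(*) also counts the NULL row; SUM(cond) counts only the true results.
print(count_cond, count_col, count_star, sum_cond)  # 3 3 4 2
```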

carlopi (Contributor) commented Nov 30, 2024

Thanks again!

duckdb-draftbot marked this pull request as draft on November 30, 2024, 10:24
Mytherin marked this pull request as ready for review on November 30, 2024, 10:24
soerenwolfers (Contributor) commented Nov 30, 2024

Isn't

SELECT count() FILTER (l_extendedprice > 500) FROM lineitem;

the canonical way of doing this? It's also faster than the CASE WHEN expression for me. While more verbose, it has the slight advantage that there is no chance to footgun oneself into thinking the aggregation happens in boolean algebra, i.e., modulo 2. (That being said, I have written SUM(BOOL) before, expecting it to act as in this PR, and was sad it didn't exist, so I'd welcome its addition.)
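A sketch comparing the FILTER form with SUM(cond), once more using sqlite3 as a stand-in. It assumes the bundled SQLite is version 3.30 or newer, which added FILTER on aggregates. The DuckDB spelling `count() FILTER (...)` from the comment is written as `count(*) FILTER (WHERE ...)`, the form SQLite accepts. The `lineitem` table here is a tiny hypothetical one. The last step contrasts the integer count with the "boolean algebra" misreading: folding the flags with XOR yields only the parity, i.e., the count modulo 2.

```python
import sqlite3
from functools import reduce
from operator import xor

prices = [100.0, 600.0, 750.0, 20.0, 510.0]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE lineitem (l_extendedprice REAL)")
con.executemany("INSERT INTO lineitem VALUES (?)", [(p,) for p in prices])
filter_count, sum_bool = con.execute(
    "SELECT count(*) FILTER (WHERE l_extendedprice > 500),"
    "       SUM(l_extendedprice > 500) FROM lineitem"
).fetchone()
con.close()

# Misreading the aggregate as boolean algebra: XOR-folding the flags gives parity only.
flags = [p > 500 for p in prices]
parity = reduce(xor, flags, False)

print(filter_count, sum_bool, parity)  # 3 3 True
```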

Mytherin (Collaborator, Author):

That's another alternative, yeah, but FILTER is supported in far fewer systems. It is about as fast as the CASE expression for me (i.e., still slightly slower than the SUM(BOOL) provided in this PR, but the difference is negligible since both are fast).

soerenwolfers (Contributor) commented Nov 30, 2024

By the way, the count_if macro could now be rewritten as SUM(x::BOOL); see duckdb/duckdb-web#3294.
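A minimal sketch of what a count_if defined as SUM(x::BOOL) would compute. It is written as a Python user-defined aggregate registered through sqlite3's `create_aggregate`. The class `CountIf` and its registration under the name `count_if` are hypothetical and are not DuckDB's implementation. The sketch mirrors SUM semantics: NULL inputs are skipped, true adds 1, false adds 0, and the result stays NULL when there is no non-NULL input.

```python
import sqlite3

class CountIf:
    """Hypothetical aggregate mirroring SUM(x::BOOL): NULLs skipped, true -> 1, false -> 0."""

    def __init__(self):
        self.total = None  # stays None (SQL NULL) until a non-NULL input arrives

    def step(self, value):
        if value is None:
            return  # like SUM, ignore NULL inputs
        self.total = (self.total or 0) + (1 if value else 0)

    def finalize(self):
        return self.total

con = sqlite3.connect(":memory:")
con.create_aggregate("count_if", 1, CountIf)
con.execute("CREATE TABLE t (v REAL)")
con.executemany("INSERT INTO t VALUES (?)", [(100.0,), (600.0,), (None,), (900.0,)])
matches = con.execute("SELECT count_if(v > 500) FROM t").fetchone()[0]
no_matches = con.execute("SELECT count_if(v > 5000) FROM t").fetchone()[0]
con.close()

print(matches, no_matches)  # 2 0
```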

Mytherin merged commit 440bdb6 into duckdb:main on Nov 30, 2024 (42 checks passed)
Mytherin (Collaborator, Author):

Opened a PR for that here: #15061

Mytherin added a commit that referenced this pull request Dec 1, 2024
Mytherin deleted the sumbool branch on December 8, 2024, 06:51
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request Dec 27, 2024
github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Dec 27, 2024
github-actions bot added a commit to duckdb/duckdb-r that referenced this pull request Dec 27, 2024
Add SUM(BOOL) overload (duckdb/duckdb#15042)

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
Labels: Needs Documentation (use for issues or PRs that require changes in the documentation)
3 participants