Skip to content

Conversation

@izveigor
Copy link
Contributor

@izveigor izveigor commented Jun 7, 2023

Which issue does this PR close?

Follow on to #6276

Rationale for this change

We can use BIT_AND(DISTINCT), BIT_OR(DISTINCT), BOOL_AND(DISTINCT) and BOOL_OR(DISTINCT) but it won't have any effect, so the mechanism of action remains the same. Another case is BIT_XOR(DISTINCT), for this function the results with the default version will be different.

For example (where a = [4, 7, 4, 7, 15]):

select bit_xor(a);
----
15

select bit_xor(distinct a);
----
12

What changes are included in this PR?

All bitwise and boolean aggregate functions start to support DISTINCT cases.

Are these changes tested?

Yes

Are there any user-facing changes?

Yes

@github-actions github-actions bot added core Core DataFusion crate physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt) labels Jun 7, 2023
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me @izveigor -- thank you ❤️ The only thing I think this PR needs is basic test SQL level coverage for distinct of the other bitwise aggregates

----
1632751011 5960911605712039654 148 54789 169634700

# csv_query_bit_xor_distinct
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# csv_query_bit_xor_distinct
# csv_query_bit_xor_distinct (should be different than above)

SELECT bit_xor(distinct c5 % 2) FROM aggregate_test_100
----
-2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even though, as you note,BIT_AND(DISTINCT), BIT_OR(DISTINCT), BOOL_AND(DISTINCT) and BOOL_OR(DISTINCT) are the same as their non distinct versions, I think we should still add SQL coverage so that we don't accidentally break the feature during some future refactoring


/// Expression for a BIT_XOR(DISTINCT) aggregation.
#[derive(Debug, Clone)]
pub struct DistinctBitXor {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it is possible to avoid duplicated code for distinct aggregates -- they are all basically doing the same thing 🤔

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thank you @izveigor

@alamb alamb merged commit dbd92e6 into apache:main Jun 9, 2023
jayzhan211 pushed a commit to jayzhan211/datafusion that referenced this pull request Jun 12, 2023
* feat: distinct bitwise and boolean aggregate functions

* feat: add distinct sqllogictests for other bitwise and boolean aggregate functions

* feat: improve docs for csv_query_bit_xor_distinct
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants