Skip to content

Conversation

@izveigor
Copy link
Contributor

@izveigor izveigor commented May 7, 2023

Which issue does this PR close?

Closes #6275

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Yes

Are there any user-facing changes?

Yes

@izveigor izveigor marked this pull request as draft May 7, 2023 19:32
@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label May 7, 2023
@github-actions github-actions bot added core Core DataFusion crate logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) labels May 8, 2023
@izveigor izveigor marked this pull request as ready for review May 9, 2023 19:44
@izveigor
Copy link
Contributor Author

izveigor commented May 9, 2023

@alamb, @tustvold, @mingmwang I wonder if you have time to review this PR.

@alamb
Copy link
Contributor

alamb commented May 9, 2023

Thanks @izveigor -- I will try and review this PR in the next day or two

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @izveigor -- this is looking good. I have gotten through this PR execept for the content of datafusion/physical-expr/src/aggregate/bit_and_or_xor.rs which hopefully I'll find time to work on tomorrow

('2021-01-01T05:11:10.432', 'Row 3');


statement ok
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given postgres also runs these queries what do you think about also adding them into a pg_compat test in https://github.com/apache/arrow-datafusion/tree/main/datafusion/core/tests/sqllogictests/test_files/pg_compat?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a nice idea for bitwise aggregate functions.

query III
SELECT bit_and(c1), bit_and(c2), bit_and(c3) FROM bit_aggregate_functions
----
1 8 11
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked this is consistent with postgres 👍

| approx_median(expr) | Calculates an approximation of the median for `expr`. |
| approx_percentile_cont(expr, percentile) | Calculates an approximation of the specified `percentile` for `expr`. |
| approx_percentile_cont_with_weight(expr, weight_expr, percentile) | Calculates an approximation of the specified `percentile` for `expr` and `weight_expr`. |
| bit_and(expr) | Computes the bitwise AND of all non-null input values for `expr`. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for also updating the documentation

}

fn supports_bounded_execution(&self) -> bool {
true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this aggregator actually supports bounded execution (that are used in sliding window functions) -- to do so I think it would need to remember how many tries it has seen perhaps.

@mustafasrepo or @ozankabak can you double check this?

Copy link
Contributor

@mustafasrepo mustafasrepo May 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it supports bounded execution either. To support bounded execution accumulator should implement retract_batch method. (Theoretically, I don't think there is a way to implement retract_batch for bitwise and , bitwise or. However, it can be implemented for bit wise xor though see discussion). Hence, I think we should not implement this method for and, or cases (it defaults to false). For xor we can support it, however, we may do these in followup PRs. As far as this Pr is concerned, deleting supports_bounded_execution implementation is enough I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also want to mention that this situation concerns relatively DISTINCT cases (only BIT_XOR(DISTINCT) makes sense, the rest of DISTINCT cases will produce the same results as without them)


// bool_and/bool_or of two scalar values.
macro_rules! typed_bool_and_or {
($VALUE:expr, $DELTA:expr, $SCALAR:ident, $OP:ident) => {{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that you added ScalarValue::bitand etc, maybe you could also add ScalarValue::and ? I think the overall code complexity would be the same but then the logic wouldn't be hidden in macros inside the aggregator

A similar comment applies to ScalarValue:or etc

use std::ops::BitAnd as BitAndImplementation;
use std::ops::BitOr as BitOrImplementation;

fn bool_and(array: &BooleanArray) -> Option<bool> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 these are good implementations

use std::ops::BitOr as BitOrImplementation;
use std::ops::BitXor as BitXorImplementation;

fn bit_and<T>(array: &PrimitiveArray<T>) -> Option<T::Native>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should strive to move these kernels into arrow-rs eventually.

Can you please file a ticket in arrow-rs to do so (or I can do to so if you prefer) and leave a comment here with a reference to that ticket?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @izveigor -- this looks like a great start to me and is quite close. I think the sliding accumulator stuff needs to be fixed prior to merge (as it would produce wrong results).

I also left some comments on how to make the code easier to maintain

Thanks again for this really nice contribution

None => {
let bit_and = data
.iter()
.fold(T::Native::ONE.neg_wrapping(), |accumulator, value| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// bit_and/bit_or/bit_xor the array and returns a ScalarValue of its corresponding type.
macro_rules! bit_and_or_xor_batch {
($VALUES:expr, $OP:ident) => {{
match $VALUES.data_type() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thin you could probably use https://docs.rs/arrow/latest/arrow/macro.downcast_primitive_array.html to avoid some of this replication

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the usual way is more preferable (because other aggregate functions use this method). But, I think the idea can be continued in the separate PR for all aggregate functions.

@izveigor
Copy link
Contributor Author

@alamb I have updated the PR based on your comments.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great -- thanks again @izveigor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support bitwise and boolean aggregate functions

3 participants