Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support Binary bitwise shift operators (<< and >>) #3037

Merged
merged 4 commits into from
Aug 15, 2022

Conversation

ovr
Copy link
Contributor

@ovr ovr commented Aug 4, 2022

Which issue does this PR close?

This PR introduces support for binary bitwise shift operators like >> and <<.

Refs #1619

Are there any user-facing changes?

It's new functionality, and there are no breaking changes.

Thanks!

@github-actions github-actions bot added core Core datafusion crate logical-expr Logical plan and expressions physical-expr Physical Expressions sql labels Aug 4, 2022
@ovr ovr force-pushed the binary-bitwise-shifts-pr branch 2 times, most recently from a5a06ff to 6449570 Compare August 4, 2022 17:29
@ovr
Copy link
Contributor Author

ovr commented Aug 4, 2022

Right now, DF uses Generic Dialect for sqlparser, but it's allowed only for PostgreSqlDialect

Token::ShiftLeft if dialect_of!(self is PostgreSqlDialect) => {
    Some(BinaryOperator::PGBitwiseShiftLeft)
}
Token::ShiftRight if dialect_of!(self is PostgreSqlDialect) => {
    Some(BinaryOperator::PGBitwiseShiftRight)
}

@alamb what should I do with it? Should I Introduce new methods for Dialect to allow these operators with Generic dialect?

Thanks

@alamb
Copy link
Contributor

alamb commented Aug 4, 2022

@alamb what should I do with it? Should I Introduce new methods for Dialect to allow these operators with Generic dialect?

What about allowing BinaryOperator::PGBitwiseShiftRight in GenericDialect ? I think the GenericDialect in sqlparser is sort of a "anything goes" dialect so it should be fine to extend it to allow these operators.

I can make another sql-parser release soon if you need one as well

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at the PR and I think the basic direction is very nice 👍 thanks @ovr

@@ -94,6 +94,50 @@ pub(crate) fn bitwise_and(left: ArrayRef, right: ArrayRef) -> Result<ArrayRef> {
}
}

pub(crate) fn bitwise_shift_right(left: ArrayRef, right: ArrayRef) -> Result<ArrayRef> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@liukun4515
Copy link
Contributor

@alamb what should I do with it? Should I Introduce new methods for Dialect to allow these operators with Generic dialect?

What about allowing BinaryOperator::PGBitwiseShiftRight in GenericDialect ? I think the GenericDialect in sqlparser is sort of a "anything goes" dialect so it should be fine to extend it to allow these operators.

I can make another sql-parser release soon if you need one as well

LGTM, if we can merge the sql-rs first

@ovr
Copy link
Contributor Author

ovr commented Aug 5, 2022

Looks likes I forget about overflowing here. I think I can use overflowing_shl to handle it correctly to the Error https://doc.rust-lang.org/std/primitive.i64.html#method.overflowing_shl

❯ select 2 << 1024;
thread 'main' panicked at 'attempt to shift left with overflow', datafusion/physical-expr/src/expressions/binary.rs:524:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

@ovr ovr force-pushed the binary-bitwise-shifts-pr branch from 4e1dfde to 36aff67 Compare August 5, 2022 14:46
@ovr
Copy link
Contributor Author

ovr commented Aug 5, 2022

Panic on overflow was fixed in 474ed5a

Without error to be similar with PostgreSQL

image

@alamb
Copy link
Contributor

alamb commented Aug 6, 2022

https://crates.io/crates/sqlparser/0.20.0 is released, FYI

@ovr
Copy link
Contributor Author

ovr commented Aug 8, 2022

Awaiting #3072

@alamb
Copy link
Contributor

alamb commented Aug 8, 2022

Awaiting #3072

Is merged 🎉

@@ -49,7 +50,7 @@ macro_rules! binary_bitwise_array_op {
/// like int64, int32.
/// It is used to do bitwise operation on an array with a scalar.
macro_rules! binary_bitwise_array_scalar {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed binary_bitwise_array_scalar to pass function as expr to do casting, because wrapping_shr and wrapping_shl requires u32. I tried to do it with let right: $TYPE = scalar.try_into().unwrap(); but I found that it will show a error because it's not possible to cast all numeric types to u32.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense -- thank you

@ovr ovr marked this pull request as ready for review August 11, 2022 20:59
@ovr ovr marked this pull request as draft August 11, 2022 21:14
@codecov-commenter
Copy link

codecov-commenter commented Aug 11, 2022

Codecov Report

Merging #3037 (810f4b0) into master (48f9b7a) will decrease coverage by 0.03%.
The diff coverage is 63.15%.

@@            Coverage Diff             @@
##           master    #3037      +/-   ##
==========================================
- Coverage   85.95%   85.91%   -0.04%     
==========================================
  Files         291      291              
  Lines       52382    52475      +93     
==========================================
+ Hits        45025    45084      +59     
- Misses       7357     7391      +34     
Impacted Files Coverage Δ
...on/physical-expr/src/expressions/binary/kernels.rs 52.17% <44.44%> (-10.99%) ⬇️
datafusion/physical-expr/src/expressions/binary.rs 97.71% <93.10%> (-0.17%) ⬇️
datafusion/core/tests/sql/expr.rs 99.84% <100.00%> (+<0.01%) ⬆️
datafusion/expr/src/binary_rule.rs 84.61% <100.00%> (ø)
datafusion/expr/src/operator.rs 96.00% <100.00%> (+0.16%) ⬆️
datafusion/sql/src/planner.rs 81.96% <100.00%> (+0.01%) ⬆️
datafusion/core/src/physical_plan/metrics/value.rs 86.93% <0.00%> (-0.51%) ⬇️
datafusion/expr/src/logical_plan/plan.rs 77.60% <0.00%> (-0.18%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is looking pretty good 👍

Please mark it as "ready for review" when it is ready.

Thanks !

@ovr ovr marked this pull request as ready for review August 12, 2022 12:32
@ovr
Copy link
Contributor Author

ovr commented Aug 12, 2022

Marked PR as ready for review ✅

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me @ovr - thank you for the contribution!

I had one tiny testing suggestion, but otherwise I think this PR is ready to go

let modules = Arc::new(Int32Array::from(vec![Some(100)])) as ArrayRef;
let result = bitwise_shift_left(input.clone(), modules.clone())?;

let expected = Int32Array::from(vec![Some(32)]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@liukun4515 liukun4515 Aug 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2<<100 =>> 2<<(100%bit_width(i32)) =>> 2 << 4?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am surprised that this is rotational shifting

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am surprised that this is rotational shifting

me too...

#[test]
fn bitwise_shift_array_test() -> Result<()> {
let input = Arc::new(Int32Array::from(vec![Some(2), None, Some(10)])) as ArrayRef;
let modules =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest a test for when the modules is Null (you cover NULL for the input already)

#[test]
fn bitwise_shift_scalar_test() -> Result<()> {
let input = Arc::new(Int32Array::from(vec![Some(2), None, Some(4)])) as ArrayRef;
let module = ScalarValue::from(10i32);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise, here a test for null handling might be good

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing unit test here, I did a simple SQL test which test the whole pipeline, and I found that DF doesn't handle nulls for bitwise operators.

Fixed in:

810f4b0

@@ -49,7 +50,7 @@ macro_rules! binary_bitwise_array_op {
/// like int64, int32.
/// It is used to do bitwise operation on an array with a scalar.
macro_rules! binary_bitwise_array_scalar {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense -- thank you

@@ -139,12 +139,14 @@ pub fn coerce_types(
fn bitwise_coercion(left_type: &DataType, right_type: &DataType) -> Option<DataType> {
use arrow::datatypes::DataType::*;

if !is_numeric(left_type) || !is_numeric(right_type) {
if !both_numeric_or_null_and_numeric(left_type, right_type) {
Copy link
Contributor Author

@ovr ovr Aug 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

To be similar with default SQL behaviour for null.

Co-authored-by: Jiayu Liu <Jimexist@users.noreply.github.com>
@andygrove
Copy link
Member

Can we update the user guide to show this new functionality? This can be a follow-on issue.

@ovr
Copy link
Contributor Author

ovr commented Aug 15, 2022

@andygrove

Can we update the user guide to show this new functionality? This can be a follow-on issue.

https://github.com/apache/arrow-datafusion/blob/master/docs/source/user-guide/sql/sql_status.md#supported-sql

I didn't find any place where I should update the documentation, as I can see there is a small comment about binary expressions, but it doesn't include all binary expressions.

Many mathematical unary and binary expressions such as +, /, sqrt, tan, >=.

I updated the description on this PR to reference #1619

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again @ovr

@alamb alamb merged commit ee55d89 into apache:master Aug 15, 2022
@ursabot
Copy link

ursabot commented Aug 15, 2022

Benchmark runs are scheduled for baseline = 5ddad47 and contender = ee55d89. ee55d89 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core datafusion crate logical-expr Logical plan and expressions physical-expr Physical Expressions sql
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants