Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: error instead of panic when date_bin interval is 0 #6522

Merged
merged 3 commits into from
Jun 2, 2023

Conversation

NGA-TRAN
Copy link
Contributor

@NGA-TRAN NGA-TRAN commented Jun 1, 2023

Which issue does this PR close?

Closes ##6517

Rationale for this change

Instead of panic when the stride of date_bin is 0, let us just make it an error message

What changes are included in this PR?

Check if the stride is 0 and error out

Are these changes tested?

Yes

Are there any user-facing changes?

User will see a proper error message Execution error: DATE_BIN stride must be non-zero and positive instead of
thread 'main' panicked at 'attempt to calculate the remainder with a divisor of zero', /arrow-datafusion/datafusion/physical-expr/src/datetime_expressions.rs:354:34 note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

@github-actions github-actions bot added core Core DataFusion crate physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt) labels Jun 1, 2023
(TIMESTAMP '2021-06-10 17:05:00Z', 0.5),
(TIMESTAMP '2021-06-10 17:19:10Z', 0.3)
) as t (time, val)
group by time;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I remember @alamb want just right number of tests, I think these 3 tests are good to capture 2 different forms of stride on constant (second --> nanosecond in the code, and month -> month) and an aggregate on data.

@NGA-TRAN
Copy link
Contributor Author

NGA-TRAN commented Jun 1, 2023

Clippy error on things I did not touch. I am happy to change them but I am not sure if I am supposed to do so. Please let me know.
My ran clippy locally and it passed

@comphead
Copy link
Contributor

comphead commented Jun 1, 2023

Please try to update rust and clippy

rustup update
cargo update

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw in https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-BIN

The stride interval must be greater than zero and cannot contain units of month or larger.
This is not part of this PR, but we may want consider this as well

@alamb
Copy link
Contributor

alamb commented Jun 2, 2023

Clippy errors are related to #6529 -- I'll have a fix shortly

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution @NGA-TRAN

@@ -522,6 +522,13 @@ fn date_bin_impl(

let (stride, stride_fn) = stride.bin_fn();

// Return error if stride is 0
if stride == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error says the stride must be positive and non zero, but this check is only for 0

It seems like the main branch supports negative intervals:

❯ SELECT DATE_BIN(INTERVAL '-15 minutes', TIMESTAMP '2022-08-03 14:38:50Z', TIMESTAMP '1970-01-01T00:00:00Z');
+-----------------------------------------------------------------------------------------------------------------+
| datebin(IntervalMonthDayNano("18446743173709551616"),Utf8("2022-08-03 14:38:50Z"),Utf8("1970-01-01T00:00:00Z")) |
+-----------------------------------------------------------------------------------------------------------------+
| 2022-08-03T14:30:00                                                                                             |
+-----------------------------------------------------------------------------------------------------------------+
1 row in set. Query took 0.001 seconds.

Thus, given what @comphead found in #6522 (review)

I think we should either:

  1. Update the error to say that the DATE_BIN stride must be non zero
  2. Update the code to match the error (prevent negative strides) and then add test coverage for negative intervals

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I do not know if the negative stride is therefor a specific purpose and someone may use it, I chose not to change its behavior. I go with the option 1 to only fix the panic and have change the error message to "DATE_BIN stride must be non-zero"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @comphead and @alamb, I have addressed your comments.

@comphead : we supported interval month and year interval in this PR

Thanks I'll check that, PG doesn't support strides as months and larger, need to check what is the reason for that, as we consider PG behavior as an example

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using months as strides is significantly more complicated because they are not a fixed size (sometimes 28 days, sometimes 29, 30 or 31, etc)

Binning by months / years was important to us in IOx so @NGA-TRAN implemented that logic for #5689

Copy link
Contributor Author

@NGA-TRAN NGA-TRAN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @comphead and @alamb, I have addressed your comments.

@comphead : we supported interval month and year interval in this PR

@@ -522,6 +522,13 @@ fn date_bin_impl(

let (stride, stride_fn) = stride.bin_fn();

// Return error if stride is 0
if stride == 0 {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I do not know if the negative stride is therefor a specific purpose and someone may use it, I chose not to change its behavior. I go with the option 1 to only fix the panic and have change the error message to "DATE_BIN stride must be non-zero"

@NGA-TRAN NGA-TRAN requested review from comphead and alamb June 2, 2023 17:24
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @NGA-TRAN and @comphead -- makes sense to me!

@alamb
Copy link
Contributor

alamb commented Jun 2, 2023

I took the liberty of merging main to this PR to get the clippy fixes and get a clean CI run

@alamb
Copy link
Contributor

alamb commented Jun 2, 2023

The doc CI failure appears unrelated (it is an infrastructure looking thins): https://github.com/apache/arrow-datafusion/actions/runs/5158371209/jobs/9292110237?pr=6522

@alamb alamb merged commit 859251b into apache:main Jun 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants