fix: error instead of panic when date_bin interval is 0 #6522

Merged
merged 3 commits into from
Jun 2, 2023
Changes from 2 commits
21 changes: 19 additions & 2 deletions datafusion/core/tests/sqllogictests/test_files/timestamps.slt
@@ -392,11 +392,28 @@ drop table ts_data_millis
statement ok
drop table ts_data_secs



##########
## test date_bin function
##########

# not support interval 0
statement error Execution error: DATE_BIN stride must be non-zero
SELECT DATE_BIN(INTERVAL '0 second', TIMESTAMP '2022-08-03 14:38:50.000000006Z', TIMESTAMP '1970-01-01T00:00:00Z')

statement error Execution error: DATE_BIN stride must be non-zero
SELECT DATE_BIN(INTERVAL '0 month', TIMESTAMP '2022-08-03 14:38:50.000000006Z')

statement error Execution error: DATE_BIN stride must be non-zero
SELECT
DATE_BIN(INTERVAL '0' minute, time) AS time,
count(val)
FROM (
VALUES
(TIMESTAMP '2021-06-10 17:05:00Z', 0.5),
(TIMESTAMP '2021-06-10 17:19:10Z', 0.3)
) as t (time, val)
group by time;
Contributor Author


I remember @alamb wants just the right number of tests. I think these 3 tests are good: they cover 2 different forms of constant stride (second -> nanosecond in the code, and month -> month) and an aggregate on data.


query P
SELECT DATE_BIN(INTERVAL '15 minutes', TIMESTAMP '2022-08-03 14:38:50Z', TIMESTAMP '1970-01-01T00:00:00Z')
----
18 changes: 18 additions & 0 deletions datafusion/physical-expr/src/datetime_expressions.rs
@@ -522,6 +522,13 @@ fn date_bin_impl(

    let (stride, stride_fn) = stride.bin_fn();

    // Return error if stride is 0
    if stride == 0 {
Contributor


The error says the stride must be positive and non-zero, but this check only rejects 0.

It seems like the main branch supports negative intervals:

❯ SELECT DATE_BIN(INTERVAL '-15 minutes', TIMESTAMP '2022-08-03 14:38:50Z', TIMESTAMP '1970-01-01T00:00:00Z');
+-----------------------------------------------------------------------------------------------------------------+
| datebin(IntervalMonthDayNano("18446743173709551616"),Utf8("2022-08-03 14:38:50Z"),Utf8("1970-01-01T00:00:00Z")) |
+-----------------------------------------------------------------------------------------------------------------+
| 2022-08-03T14:30:00                                                                                             |
+-----------------------------------------------------------------------------------------------------------------+
1 row in set. Query took 0.001 seconds.

Thus, given what @comphead found in #6522 (review)

I think we should either:

  1. Update the error to say that the DATE_BIN stride must be non-zero
  2. Update the code to match the error (prevent negative strides) and then add test coverage for negative intervals

Contributor Author


Since I do not know whether negative strides serve a specific purpose and someone may be relying on them, I chose not to change that behavior. I went with option 1: this PR only fixes the panic, and I have changed the error message to "DATE_BIN stride must be non-zero".
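For readers following the thread, here is a minimal sketch of option 1 for a fixed-size stride. The function name `bin_fixed` and the use of `rem_euclid` are assumptions made for illustration; this is not the code in `date_bin_impl`. It shows why a zero stride needs an explicit guard (integer remainder by zero panics in Rust) and why a negative stride can still produce a usable bin.

```rust
/// Hypothetical helper, not DataFusion's implementation: bins `source` down to
/// the start of its `stride`-wide bin, counted from `origin`. All values share
/// one unit (for example nanoseconds or seconds).
fn bin_fixed(stride: i64, source: i64, origin: i64) -> Result<i64, String> {
    // Option 1 from the review: reject only zero. Without this guard the
    // remainder below would panic, because `rem_euclid(0)` panics.
    if stride == 0 {
        return Err("DATE_BIN stride must be non-zero".to_string());
    }
    // Overflow of `source - origin` is ignored here to keep the sketch short.
    let delta = source - origin;
    // `rem_euclid` returns a value in `0..|stride|`, so the sign of the stride
    // does not change the result and negative strides keep working.
    Ok(origin + delta - delta.rem_euclid(stride))
}
```

With seconds as the unit, `bin_fixed(900, 52_730, 0)` and `bin_fixed(-900, 52_730, 0)` both return `Ok(52_200)`, which corresponds to 14:38:50 being binned to 14:30:00, while `bin_fixed(0, 52_730, 0)` returns the error instead of panicking.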

Contributor


Thanks @comphead and @alamb, I have addressed your comments.

@comphead: we added support for month and year intervals in this PR

Thanks, I'll check that. PG doesn't support strides of months or larger; I need to check the reason for that, since we treat PG behavior as the reference.

Contributor


Using months as strides is significantly more complicated because months are not a fixed size (28, 29, 30, or 31 days).

Binning by months / years was important to us in IOx so @NGA-TRAN implemented that logic for #5689
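To illustrate why month strides cannot reuse fixed-size arithmetic, the sketch below bins on a calendar month index rather than on a nanosecond count. It is only an illustration under stated assumptions: the helper name `bin_year_month` and the fixed 1970-01 origin are hypothetical, and this is not the logic implemented for #5689.

```rust
/// Hypothetical helper: returns the (year, month) that starts the
/// `stride_months`-wide bin containing the given month, counting bins from
/// January 1970. `month` is 1-based. Returns `None` for a zero stride.
fn bin_year_month(stride_months: i32, year: i32, month: u32) -> Option<(i32, u32)> {
    if stride_months == 0 {
        return None;
    }
    // Work in "months since 1970-01" so that every bin has the same width,
    // regardless of how many days each calendar month has.
    let index = (year - 1970) * 12 + (month as i32 - 1);
    let binned = index - index.rem_euclid(stride_months);
    // Convert the month index back to a calendar (year, month) pair.
    Some((1970 + binned.div_euclid(12), binned.rem_euclid(12) as u32 + 1))
}
```

For example, `bin_year_month(3, 2022, 8)` yields `Some((2022, 7))`: August 2022 falls into the three-month bin that starts in July 2022.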

        return Err(DataFusionError::Execution(
            "DATE_BIN stride must be non-zero".to_string(),
        ));
    }

    let f_nanos = |x: Option<i64>| x.map(|x| stride_fn(stride, x, origin));
    let f_micros = |x: Option<i64>| {
        let scale = 1_000;
@@ -1029,6 +1036,17 @@ mod tests {
            "Execution error: DATE_BIN expects stride argument to be an INTERVAL but got Interval(YearMonth)"
        );

        // stride: invalid value
        let res = date_bin(&[
            ColumnarValue::Scalar(ScalarValue::IntervalDayTime(Some(0))),
            ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(Some(1), None)),
            ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(Some(1), None)),
        ]);
        assert_eq!(
            res.err().unwrap().to_string(),
            "Execution error: DATE_BIN stride must be non-zero"
        );

        // stride: overflow of day-time interval
        let res = date_bin(&[
            ColumnarValue::Scalar(ScalarValue::IntervalDayTime(Some(i64::MAX))),