Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

date_bin cannot use a string origin parameter when the expression has a time zone. #7697

Closed
Tracked by #3148
mhilton opened this issue Sep 29, 2023 · 1 comment · Fixed by #7720
Closed
Tracked by #3148

date_bin cannot use a string origin parameter when the expression has a time zone. #7697

mhilton opened this issue Sep 29, 2023 · 1 comment · Fixed by #7720
Labels
bug Something isn't working

Comments

@mhilton
Copy link
Contributor

mhilton commented Sep 29, 2023

Describe the bug

The optional third parameter to date_bin, which is the origin time, can normally be specified with a time constant in a string. If the input expression has a type that includes a time zone then datafusion returns the following error:

Error during planning: No function matches the given name and argument types 'date_bin(Utf8, Timestamp(Nanosecond, Some("+00:00")), Utf8)'. You might need to add explicit type casts.
	Candidate functions:
	date_bin(Interval(MonthDayNano), Timestamp(Nanosecond, None), Timestamp(Nanosecond, None))
	date_bin(Interval(MonthDayNano), Timestamp(Nanosecond, Some("+TZ")), Timestamp(Nanosecond, Some("+TZ")))
	date_bin(Interval(DayTime), Timestamp(Nanosecond, None), Timestamp(Nanosecond, None))
	date_bin(Interval(DayTime), Timestamp(Nanosecond, Some("+TZ")), Timestamp(Nanosecond, Some("+TZ")))
	date_bin(Interval(MonthDayNano), Timestamp(Nanosecond, None))
	date_bin(Interval(MonthDayNano), Timestamp(Nanosecond, Some("+TZ")))
	date_bin(Interval(DayTime), Timestamp(Nanosecond, None))
	date_bin(Interval(DayTime), Timestamp(Nanosecond, Some("+TZ")))
	date_bin(Interval(MonthDayNano), Timestamp(Microsecond, None), Timestamp(Nanosecond, None))
	date_bin(Interval(MonthDayNano), Timestamp(Microsecond, Some("+TZ")), Timestamp(Nanosecond, Some("+TZ")))
	date_bin(Interval(DayTime), Timestamp(Microsecond, None), Timestamp(Nanosecond, None))
	date_bin(Interval(DayTime), Timestamp(Microsecond, Some("+TZ")), Timestamp(Nanosecond, Some("+TZ")))
	date_bin(Interval(MonthDayNano), Timestamp(Microsecond, None))
	date_bin(Interval(MonthDayNano), Timestamp(Microsecond, Some("+TZ")))
	date_bin(Interval(DayTime), Timestamp(Microsecond, None))
	date_bin(Interval(DayTime), Timestamp(Microsecond, Some("+TZ")))
	date_bin(Interval(MonthDayNano), Timestamp(Millisecond, None), Timestamp(Nanosecond, None))
	date_bin(Interval(MonthDayNano), Timestamp(Millisecond, Some("+TZ")), Timestamp(Nanosecond, Some("+TZ")))
	date_bin(Interval(DayTime), Timestamp(Millisecond, None), Timestamp(Nanosecond, None))
	date_bin(Interval(DayTime), Timestamp(Millisecond, Some("+TZ")), Timestamp(Nanosecond, Some("+TZ")))
	date_bin(Interval(MonthDayNano), Timestamp(Millisecond, None))
	date_bin(Interval(MonthDayNano), Timestamp(Millisecond, Some("+TZ")))
	date_bin(Interval(DayTime), Timestamp(Millisecond, None))
	date_bin(Interval(DayTime), Timestamp(Millisecond, Some("+TZ")))
	date_bin(Interval(MonthDayNano), Timestamp(Second, None), Timestamp(Nanosecond, None))
	date_bin(Interval(MonthDayNano), Timestamp(Second, Some("+TZ")), Timestamp(Nanosecond, Some("+TZ")))
	date_bin(Interval(DayTime), Timestamp(Second, None), Timestamp(Nanosecond, None))
	date_bin(Interval(DayTime), Timestamp(Second, Some("+TZ")), Timestamp(Nanosecond, Some("+TZ")))
	date_bin(Interval(MonthDayNano), Timestamp(Second, None))
	date_bin(Interval(MonthDayNano), Timestamp(Second, Some("+TZ")))
	date_bin(Interval(DayTime), Timestamp(Second, None))
	date_bin(Interval(DayTime), Timestamp(Second, Some("+TZ")))

To Reproduce

To reproduce run the following script in datafusion-cli:

-- Create a table with some timestamps in it.
CREATE TABLE test (
  time TIMESTAMP WITH TIME ZONE
) AS VALUES
  ('2000-01-01T00:00:00Z'),
  ('2000-01-01T00:00:01Z');

-- Check that the column preserved the time zone.
SELECT arrow_typeof(time) FROM test;

-- Check that date_bin runs without an origin.
SELECT date_bin('1 minute', time) FROM test;

-- Check that date_bin runs with an origin.
SELECT date_bin('1 minute', time, '1970-01-01T00:00:00Z') FROM test;

Expected behavior

The SELECT date_bin('1 minute', time) FROM test; and SELECT date_bin('1 minute', time, '1970-01-01T00:00:00Z') FROM test; queries should produce identical results.

Additional context

The time zone support for date_bin doesn't seem to be in a released version yet. I was testing this with a datafusion-cli built from commit 4b2b7dcfc63abfc03b0279abe122c5bdfcca5275.

@mhilton mhilton added the bug Something isn't working label Sep 29, 2023
@alamb
Copy link
Contributor

alamb commented Sep 29, 2023

@mhilton and I just had chat and here are some notes:

The way the code currently works is that date_bin lists several signatures it can handle, such as

https://github.com/apache/arrow-datafusion/blob/d19e9d684bbe1fd820674d48a96795bfbea9db7d/datafusion/expr/src/built_in_function.rs#L1041-L1046

If the user specifies arguments to that function as, eg, a String that doesn't match the required types, the datafusion coercion logic kicks in and adds casts on the specific arguments to make them conform to one of the available signatures.

The current convention is that Timestamp(Second, Some("+TZ"))) effectively means "any Timestamp type with a timezone, and then the implementation of the function (in this case date_bin) must handle any possible timezone that comes in.

The problem with the current convention is that the coercion rules for Timestamp(Second, Some("+TZ"))) https://github.com/apache/arrow-datafusion/blob/d19e9d684bbe1fd820674d48a96795bfbea9db7d/datafusion/expr/src/type_coercion/functions.rs#L222-L226

Only Support casting from another timestamp, not from Strings, the way Timestamp(Second, None)) does:
https://github.com/apache/arrow-datafusion/blob/d19e9d684bbe1fd820674d48a96795bfbea9db7d/datafusion/expr/src/type_coercion/functions.rs#L209-L216

Thus I think the solution @mhilton and I brainstormed is to change the signature to Timestamp(Second, Some("+00:00))) (aka only accept UTC timestamps) and teach the coercion logic to cast all argument to that time. Then the implementation of date_bin only needs to handle one timezone (UTC)

cc @wiedld

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants