Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support date_bin with 2 arguments (rather than requiring three) #5641

Closed
Tracked by #3148
alamb opened this issue Mar 19, 2023 · 3 comments · Fixed by #5643
Closed
Tracked by #3148

Support date_bin with 2 arguments (rather than requiring three) #5641

alamb opened this issue Mar 19, 2023 · 3 comments · Fixed by #5643
Labels
enhancement New feature or request good first issue Good for newcomers sql

Comments

@alamb
Copy link
Contributor

alamb commented Mar 19, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I am trying to bin my data by some particular interval using the date_bin function. Datafusion follows the postgres model https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-BIN that requires a third argument for "origin" (where to start the bins)

However for most queries, the unix epoch is likely the bin that is desired (or something that starts on the start of a day)

This query in datafusion-cli with this file works great on cpu.zip

select date_bin('1 hour', time, '1970-01-01')  from 'cpu.parquet';

This query doesn't (omitting the '1970-01-01' origin)

select date_bin('1 hour', time)  from 'cpu.parquet';
Error during planning: Coercion from [Utf8, Timestamp(Nanosecond, None)] to the signature Exact([Interval(DayTime), Timestamp(Nanosecond, None), Timestamp(Nanosecond, None)]) failed.

Describe the solution you'd like
I would like

select date_bin('1 hour', time)  from 'cpu.parquet';

to run and give the same answer as

select date_bin('1 hour', time, '1970-01-01')  from 'cpu.parquet';

Describe alternatives you've considered

Additional context
I think this is a good way to add something to datafusion as most of the plumbing is there already and we simply need to add another variant of date_bin

The existing tests can be followed: https://github.com/apache/arrow-datafusion/blob/3ccf1aebb6959fbc6bbbf74d2821522ddfd7d484/datafusion/core/tests/sqllogictests/test_files/timestamps.slt#L100

@alamb alamb added enhancement New feature or request good first issue Good for newcomers sql labels Mar 19, 2023
@alamb alamb changed the title Support date_bin with 2 arguments (rather than requiring three) Support date_bin with 2 arguments (rather than requiring three) Mar 19, 2023
@alamb
Copy link
Contributor Author

alamb commented Mar 19, 2023

cc @waitingkuo

@waitingkuo
Copy link
Contributor

hi @alamb thank you for raising this, it looks great.

one thing i'm still struggleing is whether
select date_bin('30 minutes', '1970-01-01T00:31:01+08:00');
should output
1969-12-31T16:30:00
or
1970-01-01T00:30:00

note that datafusion currently outputs timestamp

select date_bin('30 minutes', timestamp '1970-01-01T00:31:01+08:00', '1970-01-01');
+----------------------------------------------------------------------------------+
| datebin(Utf8("30 minutes"),Utf8("1970-01-01T00:31:01+08:00"),Utf8("1970-01-01")) |
+----------------------------------------------------------------------------------+
| 1969-12-31T16:30:00                                                              |
+----------------------------------------------------------------------------------+
1 row in set. Query took 0.003 seconds.

while posgresql outputs timestamptz if either source or origin is timestamptz

        date_bin        
------------------------
 1969-12-31 16:30:00+00
(1 row)

@alamb
Copy link
Contributor Author

alamb commented Mar 20, 2023

while posgresql outputs timestamptz if either source or origin is timestamptz

I think this behavior would make sense to me (though that probably involves making date_bin return a different type based on the types of its arguments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers sql
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants