Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support SUM AVG, MIN, MAX on Time columns. #3166

Closed
Tracked by #3148
alamb opened this issue Aug 15, 2022 · 14 comments · Fixed by #3178
Closed
Tracked by #3148

Support SUM AVG, MIN, MAX on Time columns. #3166

alamb opened this issue Aug 15, 2022 · 14 comments · Fixed by #3178
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@alamb
Copy link
Contributor

alamb commented Aug 15, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

DataFusion does not support SUM, AVG, MIN or MAX on Time type columns a standard SQL feature that could be used to sum how much cumulative time was spent doing some operation, for example

Reproducer is as follows:

select 
  sum(t), 
  avg(t)
from 
  (select cast ('11:00:00' as TIME) as t) as sq;

Doing so today will result in

select 
  sum(t), 
  avg(t)
from 
  (select cast ('11:00:00' as TIME) as t) as sq;
Plan("The function Sum does not support inputs of type Time64(Nanosecond).")
❯ 

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
N/A

Additional context
As described in #200

@alamb alamb added the enhancement New feature or request label Aug 15, 2022
@alamb
Copy link
Contributor Author

alamb commented Aug 15, 2022

I also think this is a relatively good first issue as it requires following the model for existing aggregates (e.g. SUM(int64)) so marking it as such.

@waitingkuo
Copy link
Contributor

waitingkuo commented Aug 15, 2022

@alamb postgrseql cast time to interval and then do SUM and AVG

@alamb
Copy link
Contributor Author

alamb commented Aug 15, 2022

@waitingkuo notes that postgres does:

willy=# select pg_typeof(SUM(time '00:00:00')),  pg_typeof(AVg(time '00:00:00'));
 pg_typeof | pg_typeof 
-----------+-----------
 interval  | interval
(1 row)

this make sense otherwise SUM(Time) is just quite strange

@alamb
Copy link
Contributor Author

alamb commented Aug 15, 2022

java.lang.ConcurrentModificationException 😆 (our updates got crossed)

@waitingkuo
Copy link
Contributor

@alamb i think we could add more expressions like MAX and MIN here (which is already implemented for timestamp

@alamb alamb changed the title Support SUM or AVG on Time columns. Support SUM AVG, MIN, MAX on Time columns. Aug 15, 2022
@waitingkuo
Copy link
Contributor

do we have interval for now?

select '00:00:00'::interval;
NotImplemented("Unsupported SQL type Interval")

but this works

select interval '1 second';
+------------------------------------------------+
| IntervalDayTime("1000")                        |
+------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins 1.00 secs |
+------------------------------------------------+

@waitingkuo
Copy link
Contributor

@alamb should we use IntervalMonthDayNano insead?

this is what we have for now

    /// Number of elapsed whole months
    IntervalYearMonth(Option<i32>),

    /// Number of elapsed days and milliseconds (no leap seconds)
    /// stored as 2 contiguous 32-bit signed integers
    IntervalDayTime(Option<i64>),

    /// A triple of the number of elapsed months, days, and nanoseconds.
    /// Months and days are encoded as 32-bit signed integers.
    /// Nanoseconds is encoded as a 64-bit signed integer (no leap seconds).
    IntervalMonthDayNano(Option<i128>),

@alamb
Copy link
Contributor Author

alamb commented Aug 15, 2022

Since we are treating Time literals as Nanosecond precision, I agree @waitingkuo that the output of SUM() and AVG() of TIME should be Interval(MonthDayNano) 👍

select arrow_typeof(cast('10:00:00' as time));
+-----------------------------------------------------------+
| arrowtypeof(CAST(Utf8("10:00:00") AS Time64(Nanosecond))) |
+-----------------------------------------------------------+
| Time64(Nanosecond)                                        |
+-----------------------------------------------------------+
1 row in set. Query took 0.038 seconds.

@alamb
Copy link
Contributor Author

alamb commented Aug 15, 2022

do we have interval for now?

🤔 looks like another feature gap #3167 (comment)

@waitingkuo
Copy link
Contributor

waitingkuo commented Aug 15, 2022

Since we are treating Time literals as Nanosecond precision, I agree @waitingkuo that the output of SUM() and AVG() of TIME should be Interval(MonthDayNano) 👍

select arrow_typeof(cast('10:00:00' as time));
+-----------------------------------------------------------+
| arrowtypeof(CAST(Utf8("10:00:00") AS Time64(Nanosecond))) |
+-----------------------------------------------------------+
| Time64(Nanosecond)                                        |
+-----------------------------------------------------------+
1 row in set. Query took 0.038 seconds.

@alamb
what about select interval '1 second', should we output Interval(MonthDayNano) for making them all consistent?

@waitingkuo
Copy link
Contributor

@alamb i've created a pull request for max and min #3178

sum and avg requires more work depends on some change for interval, i'll create another issue to discuss it. once we finished, i'll get back to this issue

select min(t), max(t) from  (select '00:00:00' as t union select '00:00:01' union select '00:00:02');
+----------+----------+
| MIN(t)   | MAX(t)   |
+----------+----------+
| 00:00:00 | 00:00:02 |
+----------+----------+
1 row in set. Query took 0.009 seconds.
❯ 

@waitingkuo
Copy link
Contributor

@alamb i was wrong, current interval implementation outputs different data type according to different input #3180

@alamb
Copy link
Contributor Author

alamb commented Aug 16, 2022

Thanks @waitingkuo -- I am starting to work through my review backlog and will catch up with this shortly

@alamb
Copy link
Contributor Author

alamb commented Aug 16, 2022

cc @ovr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants