Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-11055: [Rust] [DataFusion] Support date_trunc function #9040

Closed
wants to merge 7 commits into from

Conversation

paveltiunov
Copy link
Contributor

date_trunc SQL function implementation and GROUP BY timestamp support.

@github-actions
Copy link

@jhorstmann
Copy link
Contributor

Cool! A few small comments:

  • In postgres the argument order is the other way around date_trunc('week', timestamp). I haven't compared with other databases, but following postgres makes sense to me most of the time :)
  • A few unit tests, especially for the week truncation around the beginning/end of a year would be nice. Maybe the test input could be prepared using to_timestamp so the test is easier to review.
  • In most usages, the date part parameter would be a literal and it might be worthwhile to optimize for that. Doesn't fit that nicely into the current BuiltinScalarFunction infrastructure, so I'm fine with considering that as out of scope for now.

@alamb
Copy link
Contributor

alamb commented Dec 31, 2020

The full set of Rust CI tests did not run on this PR :(

Can you please rebase this PR against apache/master to pick up the changes in #9056 so that they do?

I apologize for the inconvenience.

@paveltiunov
Copy link
Contributor Author

Hey @jhorstmann ! Thanks for the review!

  • In postgres the argument order is the other way around date_trunc('week', timestamp). I haven't compared with other databases, but following postgres makes sense to me most of the time :)

I've swapped arguments order. It's 50/50 among databases however agree Postgres order is a little bit more common.

  • A few unit tests, especially for the week truncation around the beginning/end of a year would be nice. Maybe the test input could be prepared using to_timestamp so the test is easier to review.

I've added some tests at the beginning of the year. Is it something you were looking for?

  • In most usages, the date part parameter would be a literal and it might be worthwhile to optimize for that. Doesn't fit that nicely into the current BuiltinScalarFunction infrastructure, so I'm fine with considering that as out of scope for now.

Yep. Agree it's out of scope for this PR.

@alamb I rebased it against the latest master.

@codecov-io
Copy link

codecov-io commented Jan 2, 2021

Codecov Report

Merging #9040 (7a1a618) into master (3992456) will increase coverage by 0.00%.
The diff coverage is 79.89%.

Impacted file tree graph

@@           Coverage Diff            @@
##           master    #9040    +/-   ##
========================================
  Coverage   82.56%   82.56%            
========================================
  Files         203      203            
  Lines       50042    50236   +194     
========================================
+ Hits        41315    41476   +161     
- Misses       8727     8760    +33     
Impacted Files Coverage Δ
rust/datafusion/src/physical_plan/group_scalar.rs 67.85% <0.00%> (-2.52%) ⬇️
rust/datafusion/src/scalar.rs 56.17% <0.00%> (-2.83%) ⬇️
rust/arrow/src/csv/reader.rs 93.15% <55.55%> (-1.34%) ⬇️
...ust/datafusion/src/physical_plan/hash_aggregate.rs 86.48% <60.00%> (-0.74%) ⬇️
rust/datafusion/src/physical_plan/hash_join.rs 89.53% <66.66%> (-0.41%) ⬇️
rust/datafusion/src/test/mod.rs 89.92% <75.00%> (-0.48%) ⬇️
...tafusion/src/physical_plan/datetime_expressions.rs 92.60% <90.74%> (-1.66%) ⬇️
rust/datafusion/src/execution/context.rs 89.20% <100.00%> (+0.36%) ⬆️
rust/datafusion/src/physical_plan/functions.rs 80.00% <100.00%> (+0.65%) ⬆️
rust/datafusion/src/physical_plan/type_coercion.rs 98.54% <100.00%> (+0.01%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3992456...7a1a618. Read the comment docs.

if array.is_null(i) {
Ok(0_i64)
} else {
let date_time = match granularity_array.value(i) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This list seems consistent with https://www.postgresql.org/docs/9.1/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC 👍 I think it is fine that we don't yeet support some of the more esoteric stuff (like milleniumm')

string_builder.append_value("week")?;

ts_builder.append_value("2020-01-01T13:42:29.190855Z")?;
truncated_builder.append_value("2019-12-30T00:00:00.000000Z")?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -35,6 +35,8 @@ pub(crate) enum GroupByScalar {
Int32(i32),
Int64(i64),
Utf8(Box<String>),
TimeMicrosecond(i64),
TimeNanosecond(i64),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @paveltiunov -- this looks like a great initial addition.

The only thing I think might be worthwhile (as a follow on PR) would be a sql level test in https://github.com/apache/arrow/blob/master/rust/datafusion/tests/sql.rs (to ensure everything was hooked up correctly)

Again, thanks again

@alamb alamb closed this in 4b7cdcb Jan 2, 2021
@paveltiunov
Copy link
Contributor Author

@alamb Gotcha. I actually added SQL test however it went to context.rs. I saw sql.rs test and I wasn't sure what's the best place for these integration tests. It makes sense to move it there. Good to know. Thanks!

GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
`date_trunc` SQL function implementation and GROUP BY timestamp support.

Closes apache#9040 from paveltiunov/date-trunc

Authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants