Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(compute): Support week0 (PostgreSQL behaviour) for temporal #2052

Merged
merged 5 commits into from
Jul 14, 2022

Conversation

ovr
Copy link
Contributor

@ovr ovr commented Jul 12, 2022

Rationale for this change

DataFusion uses temporal to operate with date like structures. I want to implement EXTRACT(DOW from NOW()).

What changes are included in this PR?

Implementation of temporal::dow

Are there any user-facing changes?

No

Thanks

@ovr ovr changed the title feat(compute): Support dow for temporal feat(compute): Support DOW (day of week) for temporal Jul 12, 2022
@github-actions github-actions bot added the arrow Changes to the arrow crate label Jul 12, 2022
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some research and implementations seem to be wildly inconsistent on whether the start of the week is Sunday or Monday, and whether the returned value start at 0 or 1.

postgres extract(dow ...) and datepart is 0-6 starting on Sunday
mysql weekday is 1-7 starting on Monday
SQL server lets you set the start date to anything, although its datepart starts at 1

I don't know if @alamb has any source on what the "correct" behaviour is. The ISO standard is 1-7 starting on Monday, but I understand America has its own date standards...


fn quarter0(&self) -> u32 {
self.month0() / 3
/// Extracts the day of week of a given temporal array as an array of integers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Extracts the day of week of a given temporal array as an array of integers
/// Extracts the day of week of a given temporal array as an array of integers
/// with each integer representing the number of days from sunday.
///
/// For example, Sunday 1st 2010, would return 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! Sorry, I didnt saw your commit suggestion while I was working on the update for PR.

image

Anyway, if you think that updated comment is inadequate, feel free to add addition, I'll commit it.

Thanks

fn quarter0(&self) -> u32 {
self.month0() / 3
/// Extracts the day of week of a given temporal array as an array of integers
pub fn dow<T>(array: &PrimitiveArray<T>) -> Result<Int32Array>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder how this is different than weekday: https://docs.rs/arrow/18.0.0/arrow/compute/kernels/temporal/fn.weekday.html (defined on line 304 of this same file)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, I didn't saw this new method, because we use old DF & old Arrow.

@alamb
Copy link
Contributor

alamb commented Jul 12, 2022

My opinion is that DataFusion should follow the postgres behavior, but arrow is not thus constrained. If it would be more convenient to have kernels that started on 0 and one that started on 1 (e.g. weekday and weekday0) I think that would also be fine

@ovr
Copy link
Contributor Author

ovr commented Jul 12, 2022

If it would be more convenient to have kernels that started on 0 and one that started on 1 (e.g. weekday and weekday0) I think that would also be fine

👍

If it's okey, I can update this PR to reflect similar way.

@@ -654,6 +620,26 @@ mod tests {
assert_eq!(2, b.value(2));
}

#[test]
fn test_temporal_array_date64_weekday0() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same behaivour as PostgreSQL:

image

@ovr ovr changed the title feat(compute): Support DOW (day of week) for temporal feat(compute): Support week0 (PostgreSQL behaivour) for temporal Jul 12, 2022
@ovr ovr changed the title feat(compute): Support week0 (PostgreSQL behaivour) for temporal feat(compute): Support week0 (PostgreSQL behaviour) for temporal Jul 12, 2022
@codecov-commenter
Copy link

Codecov Report

Merging #2052 (a610ccb) into master (742a590) will increase coverage by 0.00%.
The diff coverage is 81.81%.

@@           Coverage Diff           @@
##           master    #2052   +/-   ##
=======================================
  Coverage   83.55%   83.56%           
=======================================
  Files         222      222           
  Lines       58216    58234   +18     
=======================================
+ Hits        48645    48662   +17     
- Misses       9571     9572    +1     
Impacted Files Coverage Δ
arrow/src/compute/kernels/temporal.rs 94.95% <81.81%> (+0.30%) ⬆️
...row/src/array/builder/string_dictionary_builder.rs 90.64% <0.00%> (-0.72%) ⬇️
parquet/src/encodings/encoding.rs 93.43% <0.00%> (-0.20%) ⬇️
parquet_derive/src/parquet_field.rs 65.98% <0.00%> (+0.22%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 742a590...a610ccb. Read the comment docs.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @ovr -- this looks good to me. I think we should consider different names (but I realize you already renamed things once, so I would be happy to do the renaming as a follow on).

cc @tustvold

arrow/src/compute/kernels/temporal.rs Outdated Show resolved Hide resolved
arrow/src/compute/kernels/temporal.rs Outdated Show resolved Hide resolved
fn quarter0(&self) -> u32 {
self.month0() / 3
fn weekday0(&self) -> i32 {
self.weekday().num_days_from_sunday() as i32
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 now I see that chrono uses the names num_days_from_sunday and num_days_from_monday

https://docs.rs/chrono/0.4.19/chrono/enum.Weekday.html#method.num_days_from_sunday
https://docs.rs/chrono/0.4.19/chrono/enum.Weekday.html#method.num_days_from_monday

What would you think about using those names instead of weekday0 and weekday @ovr (I would be happy to rename things in a follow on PR prior to release)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ done in 6e80702

is it ok?

Thanks

ovr and others added 3 commits July 13, 2022 13:48
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@alamb
Copy link
Contributor

alamb commented Jul 14, 2022

Thanks @ovr

@alamb alamb merged commit 4444cb7 into apache:master Jul 14, 2022
@ursabot
Copy link

ursabot commented Jul 14, 2022

Benchmark runs are scheduled for baseline = 50e285f and contender = 4444cb7. 4444cb7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants