ARROW-13022: [R] bindings for lubridate's year, isoyear, quarter, month, day, wday, yday, isoweek, hour, minute, and second functions #10507

thisisnic · 2021-06-10T20:18:33Z

No description provided.

github-actions · 2021-06-10T20:18:50Z

https://issues.apache.org/jira/browse/ARROW-13022

jorisvandenbossche · 2021-06-11T06:31:23Z

r/R/dplyr-functions.R

+#' 
+#' Arrow's `day_of_week` kernel counts from 0 (Monday) to 6 (Sunday), whereas
+#' `lubridate::wday` counts from 1 to 7, and allows users to specify which day
+#' of the week is first (Sunday by default).  This function converts the returned


I was going to mention that we actually also have an "ISO weekday" in the C++ kernels, but only as a field in the iso_calendar kernel, not as a separate kernel. We could also add iso_day_of_week if that helps, or even just make day_of_week follow the ISO conventions, because the 0 (Monday) - 6 (Sunday) numbering might be something Python-specific.

But then I looked at what C++ does, and there the default is actually 0 (Sunday) - 6 (Saturday), so yet something else ;) (although I see that Postgres also uses that for dow).

In the end, once you have it in one form, it's an easy conversion to any of the others, of course, so it might not matter that much.
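
To make the "easy conversion" concrete, here is a minimal base-R sketch that maps the 0 (Monday) - 6 (Sunday) numbering onto lubridate-style `wday` values with a configurable first day. The helper name `arrow_dow_to_wday` is hypothetical and is not code from this PR; its `week_start` argument is assumed to follow lubridate's convention (1 = Monday ... 7 = Sunday).

```r
# Hypothetical helper for illustration; not part of the arrow package.
# dow:        integer day of week, 0 = Monday ... 6 = Sunday
# week_start: which day counts as day 1 (1 = Monday ... 7 = Sunday)
arrow_dow_to_wday <- function(dow, week_start = 7L) {
  iso <- dow + 1L                   # ISO numbering: 1 = Monday ... 7 = Sunday
  ((iso - week_start) %% 7L) + 1L   # shift so that week_start maps to 1
}

arrow_dow_to_wday(0:6)                   # Sunday-first: Monday becomes day 2
arrow_dow_to_wday(0:6, week_start = 1L)  # Monday-first: Monday becomes day 1
```

R's `%%` returns a non-negative remainder for a positive divisor, which is why the subtraction needs no extra guard against negative values.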

We could add options to the C++ kernel to enable different behaviors there.

That probably makes more sense than my workaround here actually.

If we go that way it would probably be best to have another Jira for "TemporalOptions". It's probably best you proceed with the workaround and we loop back to this later.

OK, will do, and will open a JIRA, thanks!

https://issues.apache.org/jira/browse/ARROW-13054

Please leave a comment referencing that JIRA issue somewhere in this code, which we expect to remove.

jorisvandenbossche · 2021-06-11T06:32:25Z

r/R/expression.R

+  "day" = "day",
+  "yday" = "day_of_year",
+  "isoweek" = "iso_week",
+  "minute" = "minute",


"hour" is missing here?

Oops, will add in now.

jorisvandenbossche · 2021-06-11T06:33:27Z

r/R/expression.R

+  "yday" = "day_of_year",
+  "isoweek" = "iso_week",
+  "minute" = "minute",
+  "second" = "second"


Note that second might do something different. I think "second" in lubridate is the equivalent of "second + subsecond" in Arrow.

Good catch. Hmm, I think this might be more complicated than that, actually: lubridate rounds it to 1 decimal place using R's odd-even rounding, whereas Arrow doesn't do the rounding, so I'm not sure this can be done without rounding being implemented (see https://issues.apache.org/jira/browse/ARROW-12744).

What do you mean exactly with rounding? A quick try gives me:

> second(ymd_hms("2011-06-04 12:00:01.123456"))
[1] 1.123456

which seems to give all decimals

So it does; I think I must have been getting mixed up with something I'd changed when I needed to update a test I wrote. Never mind.

What about nanoseconds?

> second(ymd_hms("2011-06-04 12:00:01.123456789"))
[1] 1.123457

Arrow would probably return 1.123456789.

I don't think lubridate / R supports nanosecond resolution

I think so too. So probably it should be "second = second + round(subsecond, 6)" to match that behavior?

IDK that we should truncate the data like this. A slight (technical) API difference is probably better than throwing away precision.

Agreed. I would also say that just because lubridate does not support nanoseconds doesn't mean that, if you actually have nanoseconds (which is possible, since Arrow supports them), they should be discarded.
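
A small base-R sketch of why the nanosecond digits do not survive on the R side. It is an illustration under stated assumptions, not part of the PR: it assumes the timestamp is held as a double counting seconds since the epoch (as POSIXct does), and `epoch_2011` is that count for 2011-06-04 12:00:01 UTC.

```r
# R prints 7 significant digits by default, so the value is rounded for display.
x <- 1.123456789
format(x, digits = 7)    # "1.123457"
format(x, digits = 10)   # "1.123456789"

# Near 1.3e9, adjacent doubles are roughly 2.4e-7 apart, so a double holding
# seconds since the epoch cannot register an extra nanosecond.
epoch_2011 <- 1307188801            # 2011-06-04 12:00:01 UTC, in seconds
(epoch_2011 + 1e-9) == epoch_2011   # TRUE
```

So the 1.123457 shown above is partly a printing effect and partly a storage limit, whereas an Arrow nanosecond timestamp keeps the digits.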

nealrichardson · 2021-06-22T17:48:41Z

r/R/dplyr-functions.R

+        Expression$create(
+          "cast",
+          Expression$create("divide_checked", e1, e2),
+          options = cast_options(to_type = int32(), allow_float_truncate = TRUE,


If you make all of the scalars integers (e.g. 1L), do you still need to cast?

nealrichardson · 2021-06-22T17:49:15Z

r/R/dplyr-functions.R

+
+  e2 = Expression$scalar(7)
+
+  # (e1 - e2 * ( e1 %/% e2 )) + 1


Did you try doing exactly this expression? I would think it should just work because Ops.Expression is defined for all of those.

Awesome, I had no idea, deleted a lot of unnecessary lines of code now, thanks @nealrichardson !
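
For reference, the commented expression is the remainder written out via integer division, plus one. The sketch below checks that arithmetic on plain R integer vectors, which stand in for the Arrow Expressions used in the PR; it says nothing about how Arrow itself types the result, so whether integer scalars remove the need for a cast there is not shown.

```r
e1 <- 0:13   # stand-in values; in the PR these are Arrow Expressions
e2 <- 7L

# Remainder spelled out with integer division, then shifted to a 1-based range
via_division <- (e1 - e2 * (e1 %/% e2)) + 1L
via_modulo   <- (e1 %% e2) + 1L

identical(via_division, via_modulo)  # TRUE
class(via_division)                  # "integer", since every operand is an integer
```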

nealrichardson · 2021-06-22T17:50:26Z

r/R/expression.R

@@ -29,8 +29,17 @@
  # stringr spellings of those
  "str_length" = "utf8_length",
  "str_to_lower" = "utf8_lower",
-  "str_to_upper" = "utf8_upper"
+  "str_to_upper" = "utf8_upper",
  # str_trim is defined in dplyr.R


Can you update this comment to say dplyr-functions.R?

nealrichardson

One whitespace suggestion but otherwise LGTM, great work!

Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com>

…r the "day_of_week" temporal kernel

This is to resolve [ARROW-13054](https://issues.apache.org/jira/browse/ARROW-13054). This will be needed for casting timezone-naive timestamps [ARROW-13033](https://issues.apache.org/jira/browse/ARROW-13033) and defining [starting day of the week](#10507 (review)).

Closes #10598 from rok/ARROW-13054

Lead-authored-by: Rok <rok@mihevc.org>
Co-authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

github-actions bot added the Component: R label Jun 10, 2021

jorisvandenbossche reviewed Jun 11, 2021

ianmcook reviewed Jun 11, 2021

r/tests/testthat/test-dplyr-lubridate.R (outdated, resolved)

nealrichardson reviewed Jun 11, 2021

r/R/dplyr-functions.R (resolved)

thisisnic force-pushed the ARROW-13022_lubridate branch from 0e0a832 to 660ccfd Compare June 21, 2021 08:31

thisisnic commented Jun 22, 2021

r/tests/testthat/test-dplyr-lubridate.R (resolved)

thisisnic commented Jun 22, 2021

r/tests/testthat/test-dplyr-lubridate.R (resolved)

thisisnic added 12 commits June 22, 2021 08:50

Add functions for extracting time/date components

475962a

Implement day_of_week so it matches lubridate::wday

51f3aca

Sort out spacing

a4bd4b8

Add bindings for hour

5621cd6

Add in implementation of second, update offset func, separate tests

ab8abfd

Entirely refactor wday formulation so it is achievable via Expressions

4803b8b

Separate out tests and add an extra week_start one

7539b76

Call nse_func directly when expecting an error

c6e20b2

Add test for if there is a timezone aware timestamp

e66ac3d

Can't extract date from date32

08d44ba

Fix test

a98623f

Update error message

942e92f

thisisnic force-pushed the ARROW-13022_lubridate branch from d67a224 to 942e92f Compare June 22, 2021 07:54

thisisnic added 2 commits June 22, 2021 12:23

Rearrange and tidy comments

86af140

Remove unnecessary field ref creation

583a996

thisisnic requested a review from nealrichardson June 22, 2021 16:10

nealrichardson reviewed Jun 22, 2021

r/R/dplyr-functions.R (outdated, resolved)

nealrichardson reviewed Jun 22, 2021

r/R/expression.R (resolved)

nealrichardson reviewed Jun 22, 2021

r/tests/testthat/test-dplyr-lubridate.R (resolved)

nealrichardson reviewed Jun 22, 2021

r/tests/testthat/test-dplyr-lubridate.R (resolved)

thisisnic added 7 commits June 23, 2021 14:17

Add comment that certain functions defined in dplyr-functions.R

e8d7a5f

Massively simplify arrow::wday -> lubridate::wday code

bc0f851

Add link to ticket that affects supporting label arg

7390eaa

Simplify expression further

9452071

Simplyify further

fb4bc02

Reference correct file

f08cfdc

Add ticket numbers to unsupported features

5719c03

thisisnic requested a review from nealrichardson June 23, 2021 14:54

nealrichardson reviewed Jun 23, 2021

r/R/dplyr-functions.R (outdated, resolved)

nealrichardson approved these changes Jun 23, 2021

Update r/R/dplyr-functions.R

0ce4c5a

Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com>

nealrichardson closed this in 5275e72 Jun 24, 2021

rok mentioned this pull request Jun 24, 2021

ARROW-13054: [C++] Add option to specify the first day of the week for the "day_of_week" temporal kernel #10598

Closed

This was referenced Nov 24, 2021

[R] bindings for lubridate's year, isoyear, quarter, month, day, wday, yday, isoweek, hour, minute, and second functions #28736

Closed

[R] Add support for locale-specific day of week (and month of year?) returns from timestamp accessor functions #28834

Open

thisisnic commented Jun 10, 2021

github-actions bot commented Jun 10, 2021

jorisvandenbossche Jun 11, 2021

rok Jun 11, 2021

thisisnic Jun 11, 2021

rok Jun 11, 2021

thisisnic Jun 11, 2021

thisisnic Jun 11, 2021

nealrichardson Jun 11, 2021

thisisnic Jun 23, 2021

jorisvandenbossche Jun 11, 2021

thisisnic Jun 11, 2021

jorisvandenbossche Jun 11, 2021

thisisnic Jun 11, 2021

jorisvandenbossche Jun 11, 2021

thisisnic Jun 11, 2021

rok Jun 11, 2021

jorisvandenbossche Jun 11, 2021

rok Jun 11, 2021 (edited)

nealrichardson Jun 11, 2021

jorisvandenbossche Jun 14, 2021

nealrichardson Jun 22, 2021

nealrichardson Jun 22, 2021

thisisnic Jun 23, 2021

nealrichardson Jun 22, 2021

thisisnic Jun 23, 2021
