Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-15098 [R] Add binding for lubridate::duration() and/or as.difftime() #12506

Closed
wants to merge 51 commits into from

Conversation

dragosmg
Copy link
Contributor

@dragosmg dragosmg commented Feb 24, 2022

This PR adds bindings for base::difftime() and base::as.difftime().

@github-actions
Copy link

@dragosmg dragosmg marked this pull request as ready for review March 7, 2022 18:10
@dragosmg dragosmg marked this pull request as draft March 7, 2022 18:10
@dragosmg dragosmg marked this pull request as ready for review March 7, 2022 19:25
@dragosmg dragosmg marked this pull request as draft March 7, 2022 19:26
@dragosmg dragosmg marked this pull request as ready for review March 8, 2022 14:19
Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few thoughts! Feel free to ignore any of these as I haven't been following the latest bindings PRs.

r/R/dplyr-funcs-type.R Outdated Show resolved Hide resolved
r/R/dplyr-funcs-type.R Outdated Show resolved Hide resolved
r/R/dplyr-funcs-type.R Outdated Show resolved Hide resolved
r/R/dplyr-funcs-type.R Outdated Show resolved Hide resolved
r/tests/testthat/test-dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
r/tests/testthat/test-dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
r/tests/testthat/test-dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
r/tests/testthat/test-dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
r/tests/testthat/test-dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
r/tests/testthat/test-dplyr-funcs-type.R Outdated Show resolved Hide resolved
@dragosmg
Copy link
Contributor Author

I feel this is a trade-off between:

  • erroring on all doubles and having a clear message
  • erroring on pure doubles (function works on integer-like doubles) with a C++ message that might be less clear

Not sure where to go.

@dragosmg dragosmg marked this pull request as ready for review March 14, 2022 11:00
Copy link
Member

@thisisnic thisisnic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor point on testing C++ error message, otherwise LGTM

r/R/dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
r/R/dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
r/R/dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
r/R/dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more comment on the tests.

r/tests/testthat/test-dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
@dragosmg dragosmg requested a review from jonkeane March 16, 2022 11:41
Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good — sorry for the relatively large number of comments here, I think I've finally started seeing some things for the first time now that a lot of the if/elses with tz got out of the way.

Mostly comments about messages + default args. Though you might want to look into time32() for as.difftime().

r/R/dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
r/R/dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
r/R/dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
r/R/dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
r/R/dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
r/tests/testthat/test-dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
r/tests/testthat/test-dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
r/tests/testthat/test-dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
r/tests/testthat/test-dplyr-funcs-datetime.R Show resolved Hide resolved
r/tests/testthat/test-dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
@dragosmg dragosmg requested a review from jonkeane March 21, 2022 10:09
@dragosmg dragosmg changed the title ARROW-15098 [R] Add binding for lubridate::duration() and/or as.difftime() ARROW-15098 [R] Add binding for lubridate::duration() and/or as.difftime() Mar 21, 2022
Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! One small clean up + a comment about follow on jiras

Comment on lines 253 to 262
if (!(inherits(time1, "Expression") &&
time1$type_id() %in% Type[c("TIMESTAMP", "DATE32", "DATE64")])) {
time1 <- build_expr("cast", time1, options = cast_options(to_type = timestamp(timezone = "UTC")))
}

if (!(inherits(time2, "Expression") &&
time2$type_id() %in% Type[c("TIMESTAMP", "DATE32", "DATE64")])) {
time2 <- build_expr("cast", time2, options = cast_options(to_type = timestamp(timezone = "UTC")))
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might have talked about this (though it might have been in a different PR and not this one — I've looked through the comments and haven't found it).

Could we use !call_binding("is.instant", time1) in the if conditions here? The expression bits are the same, and build_expr should take care of the converting the R types to Arrow types.

Or do we need to do something totally different if we get R objects (e.g. always skip to line 263 below)? If that's the case, maybe we should do something like if (!inherits(time1, "Expression") && !call.binding(is.instant, time1)) { to make that a bit clearer?

Copy link
Contributor Author

@dragosmg dragosmg Mar 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the final line in that binding (row 264):

build_expr("cast", time1 - time2, options = cast_options(to_type = duration("s")))

if we use !call_binging("is.instant", time1) it means that when time1 is a POSIXct object it will bypass the cast to timestamp and jump straight to building the subtraction -> casting expression. We get an error because of the subtraction operation -.

via_table <- rlang::eval_tidy(expr, rlang::new_data_mask(rlang::env(.input = arrow_table(tbl))))` threw an unexpected warning.
Message: Incompatible methods ("-.POSIXt", "Ops.Expression") for "-"
Class:   simpleWarning/warning/condition

I guess we could build a subtraction expression, but that feels more complicated to me - too big a trade-off for using is.instant -> my decision to only use a part of the is.instant binding.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what it's worth the behavior you've run into here should probably have a jira + be cleaned up. But I'll leave it up to you as to how you want to work around that here (with is.instant + other catches or the approach you have already)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on a second thought, the "is.instant" binding + "subtract_checked" Expression isn't that clunky and a bit cleaner, so we could go with that:

    if (!call_binding("is.instant", time1)) {
      time1 <- build_expr("cast", time1, options = cast_options(to_type = timestamp(timezone = "UTC")))
    }

    if (!call_binding("is.instant", time2)) {
      time2 <- build_expr("cast", time2, options = cast_options(to_type = timestamp(timezone = "UTC")))
    }

    # we need to do this instead of `time1 - time2` to prevent complaints when
    # we try to subtract an R object from an Expression
    subtract_output <- build_expr("subtract_checked", time1, time2)
    build_expr("cast", subtract_output, options = cast_options(to_type = duration("s")))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build_expr("-", time1, time2) works too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonkeane what behaviour were you thinking of? - complaining when having to figure out how to subtract an R object from an Expression?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the behaviour that seemd off:

via_table <- rlang::eval_tidy(expr, rlang::new_data_mask(rlang::env(.input = arrow_table(tbl))))` threw an unexpected warning.
Message: Incompatible methods ("-.POSIXt", "Ops.Expression") for "-"
Class:   simpleWarning/warning/condition

But maybe that's actually a result of setting up the expressions wrong. You should check the tests + code for how we define these generics and see if one is missing such that someone might run into this or if it's that the code that produced that error was an error in the PR instead and that's a setup that would never happen for a user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think that is a bit PR specific, as it's hard to envisage a situation in which a user will be faced with subtracting an R object from an Expression.

r/tests/testthat/test-dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
@dragosmg dragosmg force-pushed the difftime branch 2 times, most recently from 81dde4a to 0e0445c Compare March 23, 2022 09:53
Copy link
Member

@jonkeane jonkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sticking through this. It looks good, I've made a few suggestions that I think can be accepted + run the tests and confirm that that looks good.

The three changes all together were intended to remove the mid-function returns() + de duplicate the duration cast to seconds at the end.

Thoughts?

r/R/dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
r/R/dplyr-funcs-datetime.R Outdated Show resolved Hide resolved
@dragosmg
Copy link
Contributor Author

@jonkeane when you get the chance, would you mind having another look?

@jonkeane jonkeane closed this in e83ef42 Mar 24, 2022
@ursabot
Copy link

ursabot commented Mar 24, 2022

Benchmark runs are scheduled for baseline = a17137f and contender = e83ef42. e83ef42 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Failed ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.17% ⬆️0.04%] test-mac-arm
[Failed ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.04% ⬆️0.04%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants