ARROW-15098 [R] Add binding for lubridate::duration() and/or as.difftime() #12506
Just a few thoughts! Feel free to ignore any of these as I haven't been following the latest bindings PRs.
I feel this is a trade-off between:
Not sure where to go.
Minor point on testing C++ error message, otherwise LGTM
One more comment on the tests.
Looking good — sorry for the relatively large number of comments here, I think I've finally started seeing some things for the first time now that a lot of the if/elses with tz got out of the way.
Mostly comments about messages + default args. Though you might want to look into time32() for as.difftime().
Thanks for this! One small clean up + a comment about follow on jiras
r/R/dplyr-funcs-datetime.R
Outdated
if (!(inherits(time1, "Expression") &&
  time1$type_id() %in% Type[c("TIMESTAMP", "DATE32", "DATE64")])) {
  time1 <- build_expr("cast", time1, options = cast_options(to_type = timestamp(timezone = "UTC")))
}

if (!(inherits(time2, "Expression") &&
  time2$type_id() %in% Type[c("TIMESTAMP", "DATE32", "DATE64")])) {
  time2 <- build_expr("cast", time2, options = cast_options(to_type = timestamp(timezone = "UTC")))
}
I think we might have talked about this (though it might have been in a different PR and not this one — I've looked through the comments and haven't found it).
Could we use !call_binding("is.instant", time1) in the if conditions here? The expression bits are the same, and build_expr should take care of converting the R types to Arrow types.
Or do we need to do something totally different if we get R objects (e.g. always skip to line 263 below)? If that's the case, maybe we should do something like if (!inherits(time1, "Expression") && !call_binding("is.instant", time1)) { to make that a bit clearer?
This is the final line in that binding (row 264):
build_expr("cast", time1 - time2, options = cast_options(to_type = duration("s")))
If we use !call_binding("is.instant", time1), then when time1 is a POSIXct object it will bypass the cast to timestamp and jump straight to building the subtraction -> casting expression. We get an error because of the subtraction operation (-):
`via_table <- rlang::eval_tidy(expr, rlang::new_data_mask(rlang::env(.input = arrow_table(tbl))))` threw an unexpected warning.
Message: Incompatible methods ("-.POSIXt", "Ops.Expression") for "-"
Class: simpleWarning/warning/condition
I guess we could build a subtraction expression, but that feels more complicated to me: too big a trade-off for using is.instant, hence my decision to only use a part of the is.instant binding.
For what it's worth, the behavior you've run into here should probably have a jira + be cleaned up. But I'll leave it up to you as to how you want to work around that here (with is.instant + other catches or the approach you have already).
On second thought, the "is.instant" binding + "subtract_checked" Expression isn't that clunky and is a bit cleaner, so we could go with that:
if (!call_binding("is.instant", time1)) {
  time1 <- build_expr("cast", time1, options = cast_options(to_type = timestamp(timezone = "UTC")))
}
if (!call_binding("is.instant", time2)) {
  time2 <- build_expr("cast", time2, options = cast_options(to_type = timestamp(timezone = "UTC")))
}
# we need to do this instead of `time1 - time2` to prevent complaints when
# we try to subtract an R object from an Expression
subtract_output <- build_expr("subtract_checked", time1, time2)
build_expr("cast", subtract_output, options = cast_options(to_type = duration("s")))
build_expr("-", time1, time2) works too
@jonkeane what behaviour were you thinking of? The - operator complaining when having to figure out how to subtract an R object from an Expression?
This is the behaviour that seemed off:
`via_table <- rlang::eval_tidy(expr, rlang::new_data_mask(rlang::env(.input = arrow_table(tbl))))` threw an unexpected warning.
Message: Incompatible methods ("-.POSIXt", "Ops.Expression") for "-"
Class: simpleWarning/warning/condition
But maybe that's actually a result of setting up the expressions wrong. You should check the tests + code for how we define these generics and see whether one is missing such that a user might run into this, or whether the code that produced that warning was a mistake in the PR, in which case it's a setup that would never happen for a user.
I do think that is a bit PR specific, as it's hard to envisage a situation in which a user will be faced with subtracting an R object from an Expression.
81dde4a to 0e0445c
Thanks for sticking through this. It looks good. I've made a few suggestions that I think can be accepted; then run the tests to confirm that everything still passes.
The three changes all together were intended to remove the mid-function return() calls + de-duplicate the duration cast to seconds at the end.
Thoughts?
… since we now return an `hms::difftime` object
@jonkeane when you get the chance, would you mind having another look?
Benchmark runs are scheduled for baseline = a17137f and contender = e83ef42. e83ef42 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
This PR adds bindings for base::difftime() and base::as.difftime().