-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-9856: [R] Add bindings for string compute functions #9423
Conversation
@github-actions crossbow submit homebrew-r-autobrew |
Revision: 66d81b8 Submitted crossbow builds: ursacomputing/crossbow @ actions-79
|
Aside from possibly adding more tests here (arguably redundant since the functions themselves are tested in C++, but perhaps worth more assurance that they behave as an R user would expect), this is done. Other TODOs have been made into followup JIRAs. |
r/tests/testthat/test-dplyr.R
Outdated
expect_dplyr_equal( | ||
input %>% | ||
filter(dbl > 2, toupper(chr) %in% c("D", "F")) %>% | ||
collect(), | ||
tbl | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test would pass without the changes in this PR. Do you want to try to test toupper()
and tolower()
in a way that would have failed before but succeeds now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's tricky with the R string functions like this that start with
if (!is.character(x))
x <- as.character(x)
because that will work by coercing the Array data silently.
Maybe we can test that by defining as.character <- function(...) stop("🙅")
in the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into this. You can't just define as.character
in the test because you need to replace the one that toupper
calls. You can't use testthat::with_mock
to mock base functions anymore, and my other hack of using trace
to insert a crash seems to crash too many things (presumably legitimate places where as.character gets called elsewhere in the dplyr code). mockery
might allow this more targeted but I haven't had great success setting it up before; there are other ways we could instrument our code to make sure the Arrow function is called, but I'm not sure it's worth it. It definitely won't work if you called toupper
on a Dataset expression, so we get some related test coverage there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did work out a test for this, PTAL
b55bd62
to
de9ae12
Compare
d4608a9
to
356c300
Compare
de9ae12
to
9603cb2
Compare
I'm going to merge this now. We're going to be iterating on this code for a while so I don't feel compelled to make everything perfect and thorough here. |
collect(), | ||
tbl | ||
), | ||
NA) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a clever solution
Includes dependency toolchain updates (utf8proc, re2) and bindings for * "nchar", "str_length" -> "binary_length" * "tolower", "str_to_lower" -> "utf8_lower" * "toupper", "str_to_upper" -> "utf8_upper" * "str_trim" is handled a little differently since it's one function in `stringr` but three functions in arrow (utf8_ltrim_whitespace, utf8_rtrim_whitespace, utf8_trim_whitespace) Closes apache#9423 from nealrichardson/r-strings Authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Includes dependency toolchain updates (utf8proc, re2) and bindings for * "nchar", "str_length" -> "binary_length" * "tolower", "str_to_lower" -> "utf8_lower" * "toupper", "str_to_upper" -> "utf8_upper" * "str_trim" is handled a little differently since it's one function in `stringr` but three functions in arrow (utf8_ltrim_whitespace, utf8_rtrim_whitespace, utf8_trim_whitespace) Closes apache#9423 from nealrichardson/r-strings Authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Includes dependency toolchain updates (utf8proc, re2) and bindings for
stringr
but three functions in arrow (utf8_ltrim_whitespace, utf8_rtrim_whitespace, utf8_trim_whitespace)