GH-41323: [R] Redo how summarize() evaluates expressions #41223

nealrichardson · 2024-04-15T18:38:34Z

Rationale for this change

Previously, the NSE expression handling in summarize() worked differently from filter/mutate/etc. Among the implications, it would not have been possible to define bindings for other aggregation functions that can be translated into some combination of supported aggregations, such as weighted.mean().

What changes are included in this PR?

Expressions in summarize() can now be evaluated with "regular" arrow_eval(). Aggregation bindings stick the contents of the aggregation data they previously returned into an ..aggregations list that lives in an enclosing environment, and then return a FieldRef pointing to that. This makes the code in e.g. summarize_eval() a little harder to follow, since it's grabbing and pointing to objects out of its immediate scope, but I've tried to comment thoroughly and am happy to add more.
arrow_eval() inspects the expression it receives for any functions that are not in the NSE mask and not in some other package's namespace (i.e. hopefully just user functions) and inserts them into the NSE mask, setting the enclosing environment for that copy of the function to be the mask, so that if the function calls other functions that we do have bindings for, the bindings get called. This is the approach I suggested back in [R] Try to arrow_eval user-defined functions #29667 (comment), and it is what fixes [R] Try to arrow_eval user-defined functions #29667 and [R] Improve user experience when mixing R code with Arrow dplyr pipelines #40938.

Are these changes tested?

Existing tests, which are pretty comprehensive, pass. But it would be good to try to be more evil in manual testing with the user-defined R function support.

Are there any user-facing changes?

Yes.

GitHub Issue: [R] Redo how summarize() evaluates #41323

jonkeane

Thanks for this! I haven't pulled it locally (but should sometime soon). A few comments, mostly around comments

r/R/dplyr-eval.R

r/R/dplyr-summarize.R

jonkeane · 2024-04-15T21:41:05Z

r/R/dplyr-summarize.R

-  .data$aggregations <- ctx$aggregations
+  .data$aggregations <- ..aggregations


I'm curious to know more about going from the ctx object to storing these as ..aggregations. I'm not at all opposed, and think this looks more natural given some of our other machinery — but can't tell directly here if/why that's necessary

I'll explain here, and then I'll add more in comments.

summarize() is complicated because you can do a mixture of scalar operations and aggregations, but that's not how acero works. So we have to pull out the aggregations, collect them in one list (that will become an Aggregate ExecNode), and in the expressions, replace them with FieldRefs so that further operations can happen (in what will become a ProjectNode that works on the result of the Aggregate).

In "normal" arrow_eval, like in mutate(), each expression/quosure results in a single Arrow Expression. But in summarize(), it could generate one or more aggregations that go into Aggregate, and then an Expression after that. Example from the comments in do_arrow_summarize():

# For example, # summarize(mean = sum(x) / n()) # is effectively implemented as # summarize(..temp0 = sum(x), ..temp1 = n()) %>% # mutate(mean = ..temp0 / ..temp1) %>% # select(-starts_with("..temp"))

So, each aggregation binding needs to push a ..tempN aggregation onto some list somewhere and return a FieldRef so that any projections happening after evaluate "normally".

ctx was useful previously because we weren't calling arrow_eval() on the expressions, we were walking them, looking for known aggregation functions, pulling them out and inserting into the expression the ..tempN symbols, and then once it was all scalar functions left, calling arrow_eval(). But that doesn't allow for defining a weighted.mean binding as function(x, w) sum(x * w) / sum(w) because you can't just substitute into the call to weighted.mean(x, w), you really just want to evaluate it and let the usual Arrow Expression logic work.

The challenge was: where is that "somewhere" to collect the aggregations? It needs to be somewhere where the binding functions can find it, we can't pass it in as an argument everywhere.

Generally, R looks up symbols in each parent frame of where the function is defined, like this:

> x <- 1 > f <- function() x > f() [1] 1

So you'd think that you could just have ..aggregations in this function, and when you evaluate the expressions, they would find them in the enclosing environment. But no. It's like this:

> g <- function() { + x <- 2 + f() + } > g() [1] 1

f() has its own environment and looks up symbols from its parents.

So, I put ..aggregations in this environment, and then set this environment as the parent for each of the aggregation bindings in arrow_mask(). That way, they can find it and assign into it. This seemed better than the alternatives I almost gave up and fell back to, like something global/in the package namespace.

After doing that, I realized that I could do a similar thing with user functions: copy them into the mask and set their environment to be the mask, so they'd find the other bindings there.

Does that make sense?

Aaah yes, got it. I didn't totally put together that ..aggregations wasn't at the package scope, but that's really clever. And because it's transient within the call we don't have to worry about flushing it at the end / cleaning it up / managing state, yeah?

I added/reworked the comments in this function, and otherwise did some simplification so that I hope the code is more readable too. LMK what you think.

jonkeane · 2024-04-15T21:41:14Z

r/R/dplyr-summarize.R

+      # We can tell the expression is invalid if it references fields not in
+      # the schema of the data after summarize(). Evaulating its type will
+      # throw an error if it's invalid.
+      tryCatch(..post_mutate[[post]]$type(out$.data$schema), error = function(e) {
+        msg <- paste(
+          "Expression", as_label(exprs[[post]]),
+          "is not a valid aggregation expression or is"
+        )
+        arrow_not_supported(msg)
+      })


jonkeane · 2024-04-15T21:43:08Z

r/tests/testthat/test-dplyr-across.R

@@ -279,7 +277,17 @@ test_that("purrr-style lambda functions are supported", {
  )
 })

-test_that("ARROW-14071 - function(x)-style lambda functions are not supported", {
+test_that("ARROW-14071 - user-defined R functions", {


Suggested change

test_that("ARROW-14071 - user-defined R functions", {

test_that("ARROW-14071 - R functions from a user's environment", {

Just to be super clear this isn't about UDFs

jonkeane · 2024-04-15T21:44:34Z

r/tests/testthat/test-dplyr-summarize.R

+  # We can also define functions that call supported aggregation functions
+  # and it just works
+  wtd_mean <- function(x, w) sum(x * w) / sum(w)
+  withr::local_options(list(arrow.debug = TRUE))


Nice, this is a helpful catch / test that honestly I could see being helpful when debugging this too — but makes this super clear that it's hitting this code and not some other path

r/tests/testthat/test-dplyr-summarize.R

thisisnic · 2024-04-16T16:56:55Z

This is cool. A few additional tests it might be worth chucking in to show how it works (or in the docs somewhere?) that are illustrative of a few things I wanted to check:

library(dplyr)
library(arrow)

single_transform <- function(x){
  str_remove_all(x, "[aeiou]")
}

multistep_transform <- function(x){
  y = stringr::str_replace_all(x, "B", "c")
  z = str_remove_all(y, "[aeiou]")
  z2 = str_to_upper(z)
  z2
}

multistep_transform_in_one <- function(x){
  str_to_upper(str_remove_all(stringr::str_replace_all(x, "B", "c"), "[aeiou]"))
}

tibble::tibble(x = c("Foo", "Bar", "Baz", "Qux")) %>%
  arrow_table() %>%
  mutate(y = single_transform(x)) %>%
  collect()
#> # A tibble: 4 × 2
#>   x     y    
#>   <chr> <chr>
#> 1 Foo   F    
#> 2 Bar   Br   
#> 3 Baz   Bz   
#> 4 Qux   Qx

tibble::tibble(x = c("Foo", "Bar", "Baz", "Qux")) %>%
  arrow_table() %>%
  mutate(y = multistep_transform(x)) %>%
  collect()
#> # A tibble: 4 × 2
#>   x     y    
#>   <chr> <chr>
#> 1 Foo   F    
#> 2 Bar   CR   
#> 3 Baz   CZ   
#> 4 Qux   QX

tibble::tibble(x = c("Foo", "Bar", "Baz", "Qux")) %>%
  arrow_table() %>%
  mutate(y = multistep_transform_in_one(x)) %>%
  collect()
#> # A tibble: 4 × 2
#>   x     y    
#>   <chr> <chr>
#> 1 Foo   F    
#> 2 Bar   CR   
#> 3 Baz   CZ   
#> 4 Qux   QX

thisisnic · 2024-04-18T15:59:10Z

This is what I was referring to with "not supported in arrow" @nealrichardson though I hadn't realised it already works really well when calling functions directly in terms of reporting the failed function, so things might be fine as-is, though printing out which function doesn't have bindings would be a nice-to-have.

library(arrow)
library(dplyr)

single_transform <- function(x){
  # this function does have bindings
  stringr::str_remove_all(x, "[aeiou]")
}

# succeeds
tibble::tibble(x = c("Foo", "Bar", "Baz", "Qux")) %>%
  arrow_table() %>%
  mutate(y = single_transform(x)) %>%
  collect()
#> # A tibble: 4 × 2
#>   x     y    
#>   <chr> <chr>
#> 1 Foo   F    
#> 2 Bar   Br   
#> 3 Baz   Bz   
#> 4 Qux   Qx

single_transform2 <- function(x){
  # this function doesn't have bindings
  stringr::str_to_sentence(x)
}

tibble::tibble(x = c("Foo", "Bar", "Baz", "Qux")) %>%
  arrow_table() %>%
  mutate(y = stringr::str_to_sentence(x)) %>%
  collect()
#> Warning: Expression stringr::str_to_sentence(x) not supported in Arrow; pulling
#> data into R
#> # A tibble: 4 × 2
#>   x     y    
#>   <chr> <chr>
#> 1 Foo   Foo  
#> 2 Bar   Bar  
#> 3 Baz   Baz  
#> 4 Qux   Qux

tibble::tibble(x = c("Foo", "Bar", "Baz", "Qux")) %>%
  arrow_table() %>%
  mutate(y = single_transform2(x)) %>%
  collect()
#> Warning: Expression single_transform2(x) not supported in Arrow; pulling data
#> into R
#> # A tibble: 4 × 2
#>   x     y    
#>   <chr> <chr>
#> 1 Foo   Foo  
#> 2 Bar   Bar  
#> 3 Baz   Baz  
#> 4 Qux   Qux

github-actions · 2024-04-21T16:07:18Z

⚠️ GitHub issue #41323 has been automatically assigned in GitHub to PR creator.

nealrichardson · 2024-04-22T13:48:00Z

This is ready to go AFAIK @jonkeane @thisisnic, LMK if you have any more feedback.

thisisnic

Looks good though probably worth addressing this comment in a follow-up ticket perhaps?

nealrichardson · 2024-04-22T20:02:40Z

Looks good though probably worth addressing this comment in a follow-up ticket perhaps?

I don't think I can do any better than what it does now. With single_transform2(x) failing, there's not anything in that expression that would give away what is not supported, you have to step through the function, and it could be doing anything so IDK that looking for unsupported functions named within that would do much. And unfortunately the error that stringr::str_to_sentence raises isn't obvious either: if you set options(arrow.debug=TRUE), the warning shown includes the actual error, saying: "Warning: In single_transform2(x), cannot coerce type 'environment' to vector of type 'character'; pulling data into R".

conbench-apache-arrow · 2024-04-22T23:45:13Z

After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit 5865e96.

There was 1 benchmark result indicating a performance regression:

Commit Run on ursa-i9-9960x at 2024-04-22 20:34:10Z
- tpch (R) with engine=arrow, format=native, language=R, memory_map=False, query_id=TPCH-01, scale_factor=1

The full Conbench report has more details. It also includes information about 5 possible false positives for unstable benchmarks that are known to sometimes produce them.

r/NEWS.md

…e#41223) ### Rationale for this change Previously, the NSE expression handling in `summarize()` worked differently from filter/mutate/etc. Among the implications, it would not have been possible to define bindings for other aggregation functions that can be translated into some combination of supported aggregations, such as `weighted.mean()`. ### What changes are included in this PR? * Expressions in `summarize()` can now be evaluated with "regular" `arrow_eval()`. Aggregation bindings stick the contents of the aggregation data they previously returned into an `..aggregations` list that lives in an enclosing environment, and then return a FieldRef pointing to that. This makes the code in e.g. `summarize_eval()` a little harder to follow, since it's grabbing and pointing to objects out of its immediate scope, but I've tried to comment thoroughly and am happy to add more. * `arrow_eval()` inspects the expression it receives for any functions that are not in the NSE mask and not in some other package's namespace (i.e. hopefully just user functions) and inserts them into the NSE mask, setting the enclosing environment for that copy of the function to be the mask, so that if the function calls other functions that we do have bindings for, the bindings get called. This is the approach I suggested back in apache#29667 (comment), and it is what fixes apache#29667 and apache#40938. ### Are these changes tested? Existing tests, which are pretty comprehensive, pass. But it would be good to try to be more evil in manual testing with the user-defined R function support. ### Are there any user-facing changes? Yes. * GitHub Issue: apache#41323

### Rationale for this change This clarifies the language added in #41223, as discussed in a post-merge review in #41223 (comment). ### What changes are included in this PR? Just a tweak to R's NEWS.md file. ### Are these changes tested? No. ### Are there any user-facing changes? No. Authored-by: Bryce Mecum <petridish@gmail.com> Signed-off-by: Bryce Mecum <petridish@gmail.com>

### Rationale for this change Since it doesn't look like Acero will be getting window functions any time soon, implement support in `mutate()` for transformations that involve aggregations, like `x - mean(x)`, via left_join. ### What changes are included in this PR? Following #41223, I realized I could reuse that evaluation path in `mutate()`. Evaluating expressions accumulates `..aggregations` and `mutate_stuff`; in summarize() we apply aggregations and then mutate on the result. If expressions in the `mutate_stuff` reference columns in the original data and not just the result of aggregations, we reject it. Here, if there are aggregations, we apply them on a copy of the query up to that point, and join the result back onto the query, then apply the mutations on that. It's not a problem for those mutate expressions to reference both columns in the original data and the results of the aggregations because both are present. There are ~three~ two caveats: * Join has non-deterministic order, so while `mutate()` doesn't generally affect row order, if this code path is activated, row order may not be stable. With datasets, it's not guaranteed anyway. * ~Acero's join seems to have a limitation currently where missing values are not joined to each other. If your join key has NA in it, and you do a left_join, your new columns will all be NA, even if there is a corresponding value in the right dataset. I made #41358 to address that, and in the meantime, I've added a workaround (b9de504) that's not awesome but has the right behavior.~ Fixed and rebased. * I believe it is possible in dplyr to get this behavior in other verbs: filter, arrange, even summarize. I've only done this for mutate. Are we ok with that? ### Are these changes tested? Yes ### Are there any user-facing changes? This works now: ``` r library(arrow) library(dplyr) mtcars |> arrow_table() |> select(cyl, mpg, hp) |> group_by(cyl) |> mutate(stdize_mpg = (mpg - mean(mpg)) / sd(mpg)) |> collect() #> # A tibble: 32 × 4 #> # Groups: cyl [3] #> cyl mpg hp stdize_mpg #> <dbl> <dbl> <dbl> <dbl> #> 1 6 21 110 0.865 #> 2 6 21 110 0.865 #> 3 4 22.8 93 -0.857 #> 4 6 21.4 110 1.14 #> 5 8 18.7 175 1.41 #> 6 6 18.1 105 -1.13 #> 7 8 14.3 245 -0.312 #> 8 4 24.4 62 -0.502 #> 9 4 22.8 95 -0.857 #> 10 6 19.2 123 -0.373 #> # ℹ 22 more rows ``` Created on 2024-04-23 with [reprex v2.1.0](https://reprex.tidyverse.org) * GitHub Issue: #29537

…e#41223) ### Rationale for this change Previously, the NSE expression handling in `summarize()` worked differently from filter/mutate/etc. Among the implications, it would not have been possible to define bindings for other aggregation functions that can be translated into some combination of supported aggregations, such as `weighted.mean()`. ### What changes are included in this PR? * Expressions in `summarize()` can now be evaluated with "regular" `arrow_eval()`. Aggregation bindings stick the contents of the aggregation data they previously returned into an `..aggregations` list that lives in an enclosing environment, and then return a FieldRef pointing to that. This makes the code in e.g. `summarize_eval()` a little harder to follow, since it's grabbing and pointing to objects out of its immediate scope, but I've tried to comment thoroughly and am happy to add more. * `arrow_eval()` inspects the expression it receives for any functions that are not in the NSE mask and not in some other package's namespace (i.e. hopefully just user functions) and inserts them into the NSE mask, setting the enclosing environment for that copy of the function to be the mask, so that if the function calls other functions that we do have bindings for, the bindings get called. This is the approach I suggested back in apache#29667 (comment), and it is what fixes apache#29667 and apache#40938. ### Are these changes tested? Existing tests, which are pretty comprehensive, pass. But it would be good to try to be more evil in manual testing with the user-defined R function support. ### Are there any user-facing changes? Yes. * GitHub Issue: apache#41323

…he#41368) ### Rationale for this change This clarifies the language added in apache#41223, as discussed in a post-merge review in apache#41223 (comment). ### What changes are included in this PR? Just a tweak to R's NEWS.md file. ### Are these changes tested? No. ### Are there any user-facing changes? No. Authored-by: Bryce Mecum <petridish@gmail.com> Signed-off-by: Bryce Mecum <petridish@gmail.com>

…he#41350) ### Rationale for this change Since it doesn't look like Acero will be getting window functions any time soon, implement support in `mutate()` for transformations that involve aggregations, like `x - mean(x)`, via left_join. ### What changes are included in this PR? Following apache#41223, I realized I could reuse that evaluation path in `mutate()`. Evaluating expressions accumulates `..aggregations` and `mutate_stuff`; in summarize() we apply aggregations and then mutate on the result. If expressions in the `mutate_stuff` reference columns in the original data and not just the result of aggregations, we reject it. Here, if there are aggregations, we apply them on a copy of the query up to that point, and join the result back onto the query, then apply the mutations on that. It's not a problem for those mutate expressions to reference both columns in the original data and the results of the aggregations because both are present. There are ~three~ two caveats: * Join has non-deterministic order, so while `mutate()` doesn't generally affect row order, if this code path is activated, row order may not be stable. With datasets, it's not guaranteed anyway. * ~Acero's join seems to have a limitation currently where missing values are not joined to each other. If your join key has NA in it, and you do a left_join, your new columns will all be NA, even if there is a corresponding value in the right dataset. I made apache#41358 to address that, and in the meantime, I've added a workaround (apache@b9de504) that's not awesome but has the right behavior.~ Fixed and rebased. * I believe it is possible in dplyr to get this behavior in other verbs: filter, arrange, even summarize. I've only done this for mutate. Are we ok with that? ### Are these changes tested? Yes ### Are there any user-facing changes? This works now: ``` r library(arrow) library(dplyr) mtcars |> arrow_table() |> select(cyl, mpg, hp) |> group_by(cyl) |> mutate(stdize_mpg = (mpg - mean(mpg)) / sd(mpg)) |> collect() #> # A tibble: 32 × 4 #> # Groups: cyl [3] #> cyl mpg hp stdize_mpg #> <dbl> <dbl> <dbl> <dbl> #> 1 6 21 110 0.865 #> 2 6 21 110 0.865 #> 3 4 22.8 93 -0.857 #> 4 6 21.4 110 1.14 #> 5 8 18.7 175 1.41 #> 6 6 18.1 105 -1.13 #> 7 8 14.3 245 -0.312 #> 8 4 24.4 62 -0.502 #> 9 4 22.8 95 -0.857 #> 10 6 19.2 123 -0.373 #> # ℹ 22 more rows ``` Created on 2024-04-23 with [reprex v2.1.0](https://reprex.tidyverse.org) * GitHub Issue: apache#29537

…e#41223) ### Rationale for this change Previously, the NSE expression handling in `summarize()` worked differently from filter/mutate/etc. Among the implications, it would not have been possible to define bindings for other aggregation functions that can be translated into some combination of supported aggregations, such as `weighted.mean()`. ### What changes are included in this PR? * Expressions in `summarize()` can now be evaluated with "regular" `arrow_eval()`. Aggregation bindings stick the contents of the aggregation data they previously returned into an `..aggregations` list that lives in an enclosing environment, and then return a FieldRef pointing to that. This makes the code in e.g. `summarize_eval()` a little harder to follow, since it's grabbing and pointing to objects out of its immediate scope, but I've tried to comment thoroughly and am happy to add more. * `arrow_eval()` inspects the expression it receives for any functions that are not in the NSE mask and not in some other package's namespace (i.e. hopefully just user functions) and inserts them into the NSE mask, setting the enclosing environment for that copy of the function to be the mask, so that if the function calls other functions that we do have bindings for, the bindings get called. This is the approach I suggested back in apache#29667 (comment), and it is what fixes apache#29667 and apache#40938. ### Are these changes tested? Existing tests, which are pretty comprehensive, pass. But it would be good to try to be more evil in manual testing with the user-defined R function support. ### Are there any user-facing changes? Yes. * GitHub Issue: apache#41323

…he#41350) ### Rationale for this change Since it doesn't look like Acero will be getting window functions any time soon, implement support in `mutate()` for transformations that involve aggregations, like `x - mean(x)`, via left_join. ### What changes are included in this PR? Following apache#41223, I realized I could reuse that evaluation path in `mutate()`. Evaluating expressions accumulates `..aggregations` and `mutate_stuff`; in summarize() we apply aggregations and then mutate on the result. If expressions in the `mutate_stuff` reference columns in the original data and not just the result of aggregations, we reject it. Here, if there are aggregations, we apply them on a copy of the query up to that point, and join the result back onto the query, then apply the mutations on that. It's not a problem for those mutate expressions to reference both columns in the original data and the results of the aggregations because both are present. There are ~three~ two caveats: * Join has non-deterministic order, so while `mutate()` doesn't generally affect row order, if this code path is activated, row order may not be stable. With datasets, it's not guaranteed anyway. * ~Acero's join seems to have a limitation currently where missing values are not joined to each other. If your join key has NA in it, and you do a left_join, your new columns will all be NA, even if there is a corresponding value in the right dataset. I made apache#41358 to address that, and in the meantime, I've added a workaround (apache@b9de504) that's not awesome but has the right behavior.~ Fixed and rebased. * I believe it is possible in dplyr to get this behavior in other verbs: filter, arrange, even summarize. I've only done this for mutate. Are we ok with that? ### Are these changes tested? Yes ### Are there any user-facing changes? This works now: ``` r library(arrow) library(dplyr) mtcars |> arrow_table() |> select(cyl, mpg, hp) |> group_by(cyl) |> mutate(stdize_mpg = (mpg - mean(mpg)) / sd(mpg)) |> collect() #> # A tibble: 32 × 4 #> # Groups: cyl [3] #> cyl mpg hp stdize_mpg #> <dbl> <dbl> <dbl> <dbl> #> 1 6 21 110 0.865 #> 2 6 21 110 0.865 #> 3 4 22.8 93 -0.857 #> 4 6 21.4 110 1.14 #> 5 8 18.7 175 1.41 #> 6 6 18.1 105 -1.13 #> 7 8 14.3 245 -0.312 #> 8 4 24.4 62 -0.502 #> 9 4 22.8 95 -0.857 #> 10 6 19.2 123 -0.373 #> # ℹ 22 more rows ``` Created on 2024-04-23 with [reprex v2.1.0](https://reprex.tidyverse.org) * GitHub Issue: apache#29537

github-actions bot added Component: R awaiting review Awaiting review labels Apr 15, 2024

jonkeane approved these changes Apr 15, 2024

View reviewed changes

nealrichardson added 9 commits April 21, 2024 11:53

[R] Rework summarize() to use normal(ish) arrow_eval

8ead1ea

Fix segfault on nested query

1cecb11

More comment

4111f53

ARROW-14071: [R] Try to arrow_eval user-defined functions

5e5725e

Cleanup

74357ab

Simplify summarize_eval

335217b

More/better comments

01010d9

Add lots of comments, a few tests, and fix some edge cases they exposed

678c3f6

Refactor for readability

7312933

github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Apr 21, 2024

Add NEWS

b59cda7

nealrichardson force-pushed the redo-summarize-eval branch from 2b44bd1 to b59cda7 Compare April 21, 2024 16:02

github-actions bot added the Component: Documentation label Apr 21, 2024

nealrichardson mentioned this pull request Apr 21, 2024

[R] Redo how summarize() evaluates #41323

Closed

nealrichardson changed the title ~~[R] Redo how summarize() evaluates expressions~~ GH-41323: [R] Redo how summarize() evaluates expressions Apr 21, 2024

nealrichardson linked an issue Apr 21, 2024 that may be closed by this pull request

[R] Redo how summarize() evaluates #41323

Closed

nealrichardson mentioned this pull request Apr 21, 2024

[R] Add binding for stats::weighted.mean() #41324

Open

apache deleted a comment from github-actions bot Apr 21, 2024

Add another test and clean up comments

c0a599c

nealrichardson marked this pull request as ready for review April 22, 2024 13:46

nealrichardson requested review from paleolimbot and thisisnic as code owners April 22, 2024 13:46

thisisnic approved these changes Apr 22, 2024

View reviewed changes

github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Apr 22, 2024

nealrichardson merged commit 5865e96 into apache:main Apr 22, 2024
12 checks passed

nealrichardson deleted the redo-summarize-eval branch April 22, 2024 20:20

This was referenced Apr 23, 2024

GH-29537: [R] Support mutate/summarize with implicit join #41350

Merged

[R] Improve user experience when mixing R code with Arrow dplyr pipelines #40938

Closed

amoeba reviewed Apr 24, 2024

View reviewed changes

r/NEWS.md Show resolved Hide resolved

nealrichardson mentioned this pull request Apr 24, 2024

[R] Functionality for evaluating function in its binding environment #39446

Closed

amoeba mentioned this pull request Apr 24, 2024

MINOR: [R] Update language in NEWS.md related to GH-41223 #41368

Merged

nealrichardson mentioned this pull request May 3, 2024

[R] Simplify arrow_eval() logic and bindings environments #41537

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GH-41323: [R] Redo how summarize() evaluates expressions #41223

GH-41323: [R] Redo how summarize() evaluates expressions #41223

nealrichardson commented Apr 15, 2024 •

edited

jonkeane left a comment

jonkeane Apr 15, 2024

nealrichardson Apr 16, 2024

jonkeane Apr 16, 2024

nealrichardson Apr 16, 2024

nealrichardson Apr 21, 2024

jonkeane Apr 15, 2024

jonkeane Apr 15, 2024

jonkeane Apr 15, 2024

thisisnic commented Apr 16, 2024

thisisnic commented Apr 18, 2024 •

edited

github-actions bot commented Apr 21, 2024

nealrichardson commented Apr 22, 2024

thisisnic left a comment

nealrichardson commented Apr 22, 2024

conbench-apache-arrow bot commented Apr 22, 2024

		.data$aggregations <- ctx$aggregations
		.data$aggregations <- ..aggregations

	test_that("ARROW-14071 - user-defined R functions", {
	test_that("ARROW-14071 - R functions from a user's environment", {

GH-41323: [R] Redo how summarize() evaluates expressions #41223

GH-41323: [R] Redo how summarize() evaluates expressions #41223

Conversation

nealrichardson commented Apr 15, 2024 • edited

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

jonkeane left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thisisnic commented Apr 16, 2024

thisisnic commented Apr 18, 2024 • edited

github-actions bot commented Apr 21, 2024

nealrichardson commented Apr 22, 2024

thisisnic left a comment

Choose a reason for hiding this comment

nealrichardson commented Apr 22, 2024

conbench-apache-arrow bot commented Apr 22, 2024

nealrichardson commented Apr 15, 2024 •

edited

thisisnic commented Apr 18, 2024 •

edited