Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-11657: [R] group_by with .drop specified errors #9510

Closed
wants to merge 3 commits into from

Conversation

nealrichardson
Copy link
Contributor

tidyverse/dplyr#5763 concisely exposes multiple issues:

  • We don't expect the .drop argument to group_by (fixed here)
  • You can apparently provide expressions to ... in group_by, which effectively do mutate() to add the columns and then group by them (detected here with a useful error; implementation of the feature deferred to ARROW-11658)
  • Our input validation in [.ArrowTabular and the Table/RecordBatch SelectColumns methods was incomplete, and where it was present was not very helpful (fixed here)

Other issues observed here and deferred:

  • Table has a proper SelectColumns method in the C++ library but the RecordBatch one is in the R library and should be pushed down to C++ (ARROW-11660)
  • The .drop argument is not actually implemented here, it is only caught if specified, and if the value given is other than the default, it errors. We should keep it around like we do the group_vars themselves (ARROW-11658), and we'll need to implement it too in the C++ query engine eventually.

@github-actions
Copy link

GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
tidyverse/dplyr#5763 concisely exposes multiple issues:

* We don't expect the .drop argument to group_by (fixed here)
* You can apparently provide expressions to `...` in group_by, which effectively do `mutate()` to add the columns and then group by them (detected here with a useful error; implementation of the feature deferred to ARROW-11658)
* Our input validation in `[.ArrowTabular` and the Table/RecordBatch `SelectColumns` methods was incomplete, and where it was present was not very helpful (fixed here)

Other issues observed here and deferred:

* Table has a proper SelectColumns method in the C++ library but the RecordBatch one is in the R library and should be pushed down to C++ (ARROW-11660)
* The .drop argument is not actually implemented here, it is only caught if specified, and if the value given is other than the default, it errors. We should keep it around like we do the group_vars themselves (ARROW-11658), and we'll need to implement it too in the C++ query engine eventually.

Closes apache#9510 from nealrichardson/group-by-drop

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
michalursa pushed a commit to michalursa/arrow that referenced this pull request Jun 13, 2021
tidyverse/dplyr#5763 concisely exposes multiple issues:

* We don't expect the .drop argument to group_by (fixed here)
* You can apparently provide expressions to `...` in group_by, which effectively do `mutate()` to add the columns and then group by them (detected here with a useful error; implementation of the feature deferred to ARROW-11658)
* Our input validation in `[.ArrowTabular` and the Table/RecordBatch `SelectColumns` methods was incomplete, and where it was present was not very helpful (fixed here)

Other issues observed here and deferred:

* Table has a proper SelectColumns method in the C++ library but the RecordBatch one is in the R library and should be pushed down to C++ (ARROW-11660)
* The .drop argument is not actually implemented here, it is only caught if specified, and if the value given is other than the default, it errors. We should keep it around like we do the group_vars themselves (ARROW-11658), and we'll need to implement it too in the C++ query engine eventually.

Closes apache#9510 from nealrichardson/group-by-drop

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant