ARROW-11704: [R] Wire up dplyr::mutate() for datasets #9586

Closed


nealrichardson
Member

@nealrichardson nealrichardson commented Feb 26, 2021

No description provided.


@nealrichardson nealrichardson marked this pull request as ready for review March 4, 2021 23:55
@nealrichardson
Member Author

I'll add a couple more tests, particularly around error handling, but I'd like to move on to other issues and get this merged so that others can test it out more widely. cc @ianmcook @jonkeane

Review thread on r/NEWS.md (outdated, resolved)
Co-authored-by: Ian Cook <ianmcook@gmail.com>
@ianmcook
Member

ianmcook commented Mar 5, 2021

Looks like mutate() on datasets errors when expressions are literals:

ds %>% transmute(x=42) %>% head(1)
## Error in dataset___ScannerBuilder__ProjectExprs(self, cols, names(cols)) : 
##  Invalid R object for std::__1::shared_ptr<arrow::dataset::Expression>, must be an ArrowObject

ds %>% transmute(x="foo") %>% head(1)
## Error in dataset___ScannerBuilder__ProjectExprs(self, cols, names(cols)) : 
##  Invalid R object for std::__1::shared_ptr<arrow::dataset::Expression>, must be an ArrowObject
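For comparison, here is a minimal sketch of what plain dplyr does with the same two calls on an in-memory data frame. The data frame `df` is a hypothetical stand-in for `ds`, whose contents are not shown in this thread, and the sketch assumes the Dataset method is meant to match this behavior: a length-1 literal is recycled to every row.

```r
library(dplyr)

# Hypothetical in-memory stand-in for `ds` (the real dataset is not shown here).
df <- data.frame(a = 1:3)

# A length-1 literal on the right-hand side is recycled to every row.
full <- df %>% transmute(x = 42)

# The two calls from the report above, run against the stand-in.
num <- df %>% transmute(x = 42) %>% head(1)
chr <- df %>% transmute(x = "foo") %>% head(1)
```

Both calls succeed on a data frame, so the error above appears specific to how literals are converted to Dataset expressions.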

@nealrichardson
Member Author

Thanks @ianmcook, fixed in 3ac4086, PTAL.

@ianmcook
Member

ianmcook commented Mar 5, 2021

> Thanks @ianmcook, fixed in 3ac4086, PTAL.

Works fine now 👍 Thanks!

@ianmcook
Member

ianmcook commented Mar 5, 2021

transmute() with no arguments should return no columns. Currently it returns all columns. This is true for Tables and RecordBatches too.
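As a reference point, a minimal sketch of the dplyr behavior described here, using a hypothetical in-memory data frame `df` rather than an Arrow Table, RecordBatch, or Dataset. It assumes a recent dplyr (1.0 or later).

```r
library(dplyr)

# Hypothetical two-column data frame used only for illustration.
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

# transmute() keeps only the columns it creates, so with no arguments
# the result has no columns at all.
res <- transmute(df)
ncol(res)
```

This contrasts with `mutate(df)`, which with no arguments returns all of the original columns.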

Review thread on r/R/dplyr.R (resolved)
@ianmcook
Member

ianmcook commented Mar 5, 2021

So now we have scalar recycling in mutate() for Datasets but not yet for Tables and RecordBatches, correct?

@nealrichardson
Member Author

> So now we have scalar recycling in mutate() for Datasets but not yet for Tables and RecordBatches, correct?

Correct, that's ARROW-11705

GeorgeAp pushed a commit to sirensolutions/arrow that referenced this pull request Jun 7, 2021
Closes apache#9586 from nealrichardson/r-dataset-projection

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
michalursa pushed a commit to michalursa/arrow that referenced this pull request Jun 13, 2021
Closes apache#9586 from nealrichardson/r-dataset-projection

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>