[R] Implementing tidyr interface #24956

Open
asfimport opened this issue May 15, 2020 · 4 comments

@asfimport

I think it would be reasonable to implement an interface to the tidyr package. Such an implementation would allow Arrow Tables to be processed lazily before they are pulled back into memory. Currently, however, you need to collect the table first before applying tidyr methods. The following code chunk shows an example routine:

library(magrittr)
arrow_table <- arrow::read_feather("table.feather", as_data_frame = FALSE) 
nested_df <-
   arrow_table %>%
   dplyr::select(ID, 4:7, Value) %>%
   dplyr::filter(Value >= 5) %>%
   dplyr::group_by(ID) %>%
   dplyr::collect() %>%
   tidyr::nest()

The main focus might be the following three methods:

  • tidyr::[un]nest(),

  • tidyr::pivot_[longer|wider](), and

  • tidyr::separate().

I suppose the last two can be implemented fairly quickly, but tidyr::nest() and tidyr::unnest() cannot be implemented until conversion to the List type is available.

Reporter: Dominic Dennenmoser

Note: This issue was originally created as ARROW-8813. Please see the migration documentation for further details.

@asfimport
Author

Neal Richardson / @nealrichardson:
If you wanted to explore this, one challenge I see is that pivot_longer and pivot_wider aren't generics, so you can't just make arrow methods for them.
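To illustrate why this matters, here is a minimal base R sketch of S3 dispatch. All names in it (`reshape_plain`, `reshape_generic`, `fake_arrow_table`) are hypothetical stand-ins and are not part of tidyr or arrow; the point is only that a package can attach its own method to a function that calls `UseMethod()`, but not to a plain function.

```r
# Hypothetical stand-ins; none of these names come from tidyr or arrow.

# A plain function: always runs the same body, whatever class `data` has.
reshape_plain <- function(data, ...) {
  "data.frame implementation"
}

# An S3 generic: UseMethod() dispatches on class(data).
reshape_generic <- function(data, ...) {
  UseMethod("reshape_generic")
}

# Method used for ordinary data frames.
reshape_generic.data.frame <- function(data, ...) {
  "data.frame implementation"
}

# A method that another package could supply for its own class.
reshape_generic.fake_arrow_table <- function(data, ...) {
  "lazy Arrow implementation"
}

df  <- data.frame(ID = 1:2, Value = c(5, 7))
tbl <- structure(list(), class = "fake_arrow_table")

plain_result       <- reshape_plain(tbl)    # no dispatch possible
generic_df_result  <- reshape_generic(df)   # picks the data.frame method
generic_tbl_result <- reshape_generic(tbl)  # picks the fake_arrow_table method
```

With the plain function, the Arrow-like object always ends up in the data frame code path; with the generic, it reaches its own method without the caller changing anything.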

@asfimport
Author

Dominic Dennenmoser:
Thanks for pointing that out. I've just looked for issues or pull requests mentioning anything in that direction. Fortunately, a generic version of pivot_[longer|wider]() will be available in the upcoming version of tidyr, and is already implemented in the development version (#800).

@asfimport
Author

Nigel McKernan:
The issue Dominic Dennenmoser references was committed into tidyr 1.1.0 back in May of 2020, more than 2 years ago.

Would it be possible now to incorporate into arrow some of the tidyr methods that have been converted to generics?

EDIT: As well, the nest() generic is now lazily-evaluated, making it easier to do remote operations, as of the tidyr 1.2.0 release earlier this year.

@eitsupi
Contributor

eitsupi commented Feb 21, 2023

Related to #34265
