[R] add max_partitions argument to write_dataset() #28117

Closed
asfimport opened this issue Apr 9, 2021 · 3 comments

asfimport commented Apr 9, 2021

The Python docs show that we can pass a larger limit, say 1025 partitions, via max_partitions:
https://arrow.apache.org/docs/_modules/pyarrow/dataset.html

In R this argument doesn't exist. It would be good to add it for arrow v4.0.0.

This is useful, for example, with international trade datasets:

library(arrow)
library(dplyr)

# d = UN COMTRADE - World's bilateral flows 2019
# 13,050,535 x 22 data.frame
d %>%
  group_by(Year, `Reporter ISO`, `Partner ISO`) %>%
  write_dataset("parquet", hive_style = FALSE)

Error: Invalid: Fragment would be written into 12808 partitions. This exceeds the maximum of 1024

Reporter: Mauricio 'Pachá' Vargas Sepúlveda / @pachadotdev

Note: This issue was originally created as ARROW-12315. Please see the migration documentation for further details.

Mauricio 'Pachá' Vargas Sepúlveda / @pachadotdev:
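Related to ARROW-12373: the PR for this ticket adds a check so that negative max_partitions values (-n, ..., -3, -2, -1) are no longer silently converted to huge unsigned numbers such as 18,446,744,073,709,551,613 (what -3 wraps around to as an unsigned 64-bit integer). Instead, it returns an error message saying the value is not feasible.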
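
The kind of check described here can be sketched in plain R. This is only an illustration and not the code from the PR: `check_max_partitions()` is a hypothetical helper name, and the error wording is an assumption.

```r
# Hypothetical helper: NOT the validation from the actual PR, just a sketch
# of rejecting values that would otherwise wrap around to a huge unsigned number.
check_max_partitions <- function(max_partitions) {
  ok <- is.numeric(max_partitions) &&
    length(max_partitions) == 1L &&
    is.finite(max_partitions) &&
    max_partitions >= 1 &&
    max_partitions == floor(max_partitions)
  if (!ok) {
    stop("max_partitions must be a single positive whole number", call. = FALSE)
  }
  max_partitions
}

# A valid value is passed through unchanged.
check_max_partitions(1025)

# A negative value raises an error; here the message is captured for inspection.
res <- tryCatch(check_max_partitions(-3), error = function(e) conditionMessage(e))
print(res)
```

Failing early in R keeps the bad value from ever reaching the C++ writer, where it would be interpreted as an unsigned count.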

Matt Matolcsi:
Hello, I am running into this issue with Arrow 6.0.0.9000. Would it be possible to implement the max_partitions argument for write_dataset()?

Thanks everyone for your hard work on Arrow. It is really great to be able to use it, especially in R.

Jonathan Keane / @jonkeane:
Issue resolved by pull request #9972.
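
For readers landing here, a minimal sketch of raising the limit from R, assuming an installed arrow version whose `write_dataset()` accepts `max_partitions`. The toy data frame, its column names, the output directory, and the value `20000L` are all illustrative stand-ins and not taken from the original report.

```r
library(arrow)
library(dplyr)

# Small stand-in for the trade data; column names are made up for this sketch.
d <- data.frame(
  Year = 2019L,
  reporter = rep(c("chl", "arg", "per"), each = 2),
  partner = rep(c("usa", "chn"), times = 3),
  value = runif(6)
)

# Write to a temporary directory so the example leaves no files behind.
out_dir <- file.path(tempdir(), "trade_parquet")

# The grouping columns become the partitioning; max_partitions raises the
# default cap of 1024 mentioned in the error message above.
d %>%
  group_by(Year, reporter, partner) %>%
  write_dataset(out_dir, format = "parquet", hive_style = FALSE,
                max_partitions = 20000L)

# List the files that were written, one or more per partition directory.
list.files(out_dir, recursive = TRUE)
```

Pick a limit at or above the number of distinct group combinations in the data (12808 in the original report).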
