ARROW-5393: [R] Add tests and example for read_parquet() #4371
|
@nealrichardson Can you use the same parquet files we also use for C++? There is a git submodule that brings them all in. |
|
@xhochy I don't think that's necessary. Here we just want a small file we can use for an end-to-end test and to use in documentation examples. (FWIW, I copied this file from the pyarrow test suite.) The purpose is not to add R integration tests against every funky Parquet file out there--I'll trust that those are covered in the C++ tests, and if parquet-cpp can make an Arrow Table out of a Parquet file, then the R package can handle it because it can work with Arrow Tables. |
|
Hm, we're really trying not to check binary files into the codebase. In the event that a binary file is needed (e.g. to exercise some scenario that we are unable to produce dynamically) then we use either the arrow-testing or parquet-testing repos. The best scenario is to write a new file as part of the unit test. |
|
I agree that we should generally avoid adding binary files, but this one is only 4k and already exists (along with others) in https://github.com/apache/arrow/tree/master/python/pyarrow/tests/data/parquet, so it didn't seem like a big deal. The other reason I included it is for use in an example in the docs. R package documentation is expected to include executable examples (in fact, CRAN will reject packages that have no examples, which we'd probably hear about if we could get past the Linux build issue), and it is customary to include example datasets in packages. I guess the example could write a Parquet file (once that's supported) and read it back, but that's not a very clean example. |
|
I think it's OK in this particular case but we should avoid adding many more files |
|
Roger that. Thanks for the dispensation. |
wesm left a comment
+1
This PR also sets the ARROW_R_WITH_PARQUET feature flag to on for all installation methods. A future patch should add write_parquet() and align these functions with read/write_feather() and any other file readers/writers.