
Conversation

@nealrichardson (Member)

This PR also turns on the ARROW_R_WITH_PARQUET feature flag for all installation methods.

A future patch should add write_parquet() and align these functions with read/write_feather() and any other file readers/writers.
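
For reference, a minimal usage sketch of the new reader; the file name, the default return type (data.frame vs. Arrow Table), and any additional arguments are assumptions here rather than details taken from this PR:

```r
library(arrow)

# Read a small Parquet file into R with the reader this PR adds.
# "example.parquet" is a placeholder path for illustration.
df <- read_parquet("example.parquet")

# Inspect the result as an ordinary data.frame.
head(df)
dim(df)
```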

@xhochy (Member) commented May 23, 2019

@nealrichardson Can you use the same Parquet files we use for C++? There is a git submodule that brings them all in.

@nealrichardson (Member Author)

@xhochy I don't think that's necessary. Here we just want a small file we can use for an end-to-end test and in documentation examples. (FWIW, I copied this file from the pyarrow test suite.) The purpose is not to add R integration tests against every funky Parquet file out there; I'll trust that those are covered in the C++ tests, and if parquet-cpp can make an Arrow Table out of a Parquet file, then the R package can handle it, because it can work with Arrow Tables.
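
Concretely, the end-to-end test being described amounts to something like the following sketch; the file name and the specific expectations are placeholders, not the actual test in this PR:

```r
library(testthat)
library(arrow)

test_that("the sample Parquet file can be read end to end", {
  # "example.parquet" stands in for the small file copied from the
  # pyarrow test suite; the real name and column checks may differ.
  df <- read_parquet("example.parquet")
  expect_true(is.data.frame(df))
  expect_gt(nrow(df), 0)
})
```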

@wesm (Member) commented May 23, 2019

Hm, we're really trying not to check binary files into the codebase. In the event that a binary file is needed (e.g., to exercise some scenario that we are unable to produce dynamically), we use either the arrow-testing or parquet-testing repos.

The best scenario is to write a new file as part of the unit test.

@nealrichardson (Member Author)

I agree that we should generally avoid adding binary files, but this one is only 4k and already exists (along with others) in https://github.com/apache/arrow/tree/master/python/pyarrow/tests/data/parquet, so it didn't seem like a big deal. write_parquet() doesn't yet exist in R, so I can't write a file and then prove that we can read it back in.

The other reason I included it is for use in an example in the docs. R package documentation is expected to include executable examples (in fact, CRAN will reject packages that have no examples, which we'd probably hear about if we could get past the Linux build issue), and it is customary to include example datasets in packages. I guess the example could write a Parquet file (once that's supported) and read it back, but that's not a very clean example.
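
For illustration, a round-trip documentation example might eventually look like the sketch below; write_parquet() and its signature are assumed here, since it does not exist yet:

```r
#' @examples
#' \donttest{
#' # Hypothetical round trip, assuming a future write_parquet(df, path):
#' tf <- tempfile(fileext = ".parquet")
#' write_parquet(mtcars, tf)
#' df <- read_parquet(tf)
#' unlink(tf)
#' }
```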

@wesm (Member) commented May 23, 2019

I think it's OK in this particular case, but we should avoid adding many more files.

@nealrichardson (Member Author)

Roger that. Thanks for the dispensation.

@wesm (Member) left a comment

+1
