
Conversation

@nealrichardson (Member)

This PR also turns on the ARROW_R_WITH_PARQUET feature flag for all installation methods.

A future patch should add write_parquet() and align these functions with read/write_feather() and any other file readers/writers.
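
For reference, a minimal usage sketch of the new reader; the file name, the default return type (data.frame vs. Arrow Table), and any additional arguments are assumptions here rather than details taken from this PR:

```r
library(arrow)

# Read a small Parquet file into R with the reader this PR adds.
# "example.parquet" is a placeholder path for illustration.
df <- read_parquet("example.parquet")

# Inspect the result as an ordinary data.frame.
head(df)
dim(df)
```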

@xhochy (Member) commented May 23, 2019

@nealrichardson Can you use the same Parquet files we use for C++? There is a git submodule that brings them all in.

@nealrichardson (Member Author)

@xhochy I don't think that's necessary. Here we just want a small file we can use for an end-to-end test and in documentation examples. (FWIW, I copied this file from the pyarrow test suite.) The purpose is not to add R integration tests against every funky Parquet file out there; I'll trust that those are covered in the C++ tests, and if parquet-cpp can make an Arrow Table out of a Parquet file, then the R package can handle it, because it can work with Arrow Tables.
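
Concretely, the end-to-end test being described amounts to something like the following sketch; the file name and the specific expectations are placeholders, not the actual test in this PR:

```r
library(testthat)
library(arrow)

test_that("the sample Parquet file can be read end to end", {
  # "example.parquet" stands in for the small file copied from the
  # pyarrow test suite; the real name and column checks may differ.
  df <- read_parquet("example.parquet")
  expect_true(is.data.frame(df))
  expect_gt(nrow(df), 0)
})
```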

@wesm (Member) commented May 23, 2019

Hm, we're really trying not to check binary files into the codebase. In the event that a binary file is needed (e.g., to exercise some scenario that we are unable to produce dynamically), we use either the arrow-testing or parquet-testing repos.

The best scenario is to write a new file as part of the unit test.

@nealrichardson (Member Author)

I agree that we should generally avoid adding binary files, but this one is only 4k and already exists (along with others) in https://github.com/apache/arrow/tree/master/python/pyarrow/tests/data/parquet, so it didn't seem like a big deal. write_parquet() doesn't yet exist in R, so I can't write a file and then prove that we can read it back in.

The other reason I included it is for use in an example in the docs. R package documentation is expected to include executable examples (in fact, CRAN will reject packages that have no examples, which we'd probably hear about if we could get past the Linux build issue), and it is customary to include example datasets in packages. I guess the example could write a Parquet file (once that's supported) and read it back, but that's not a very clean example.
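
For illustration, a round-trip documentation example might eventually look like the sketch below; write_parquet() and its signature are assumed here, since it does not exist yet:

```r
#' @examples
#' \donttest{
#' # Hypothetical round trip, assuming a future write_parquet(df, path):
#' tf <- tempfile(fileext = ".parquet")
#' write_parquet(mtcars, tf)
#' df <- read_parquet(tf)
#' unlink(tf)
#' }
```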

@wesm (Member) commented May 23, 2019

I think it's OK in this particular case, but we should avoid adding many more files.

@nealrichardson (Member Author)

Roger that. Thanks for the dispensation.

@wesm (Member) left a comment

+1
