[R] Unable to read/write parquet files – regression of 15.0.1 CRAN binary for OSX #41050

Closed
torfason opened this issue Apr 6, 2024 · 5 comments

Comments

@torfason

torfason commented Apr 6, 2024

Describe the bug, including details regarding any error messages, version, and platform.

After upgrading to arrow 15.0.1 for R, I was unable to read or write parquet files, which for me is a fundamental feature of the package, because parquet files are so amazing for big datasets :-)

Downgrading to 14.0.0.2 fixes this for me.

There are other issues filed about compilation problems, but this one concerns the CRAN release specifically, so it seems separate, although it may be somewhat related. Reprexes for each version follow (based on manually installing the binaries I had downloaded from CRAN for each version).

15.0.1

dir <- ".../arrow_releases"
#install.packages(file.path(dir, "arrow_14.0.0.2.tgz"), repos = NULL)
install.packages(file.path(dir, "arrow_15.0.1.tgz"), repos = NULL)
#> Installing package into '.../R/x86_64/4.3/library'
#> (as 'lib' is unspecified)

library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
write_parquet(cars, "temp.parquet")
#> Error in parquet___WriterProperties___Builder__create(): Cannot call parquet___WriterProperties___Builder__create(). See https://arrow.apache.org/docs/r/articles/install.html for help installing Arrow C++ libraries.
read_parquet("temp.parquet") |> tibble::as_tibble() |> print()
#> Error in parquet___arrow___ArrowReaderProperties__Make(isTRUE(use_threads)): Cannot call parquet___arrow___ArrowReaderProperties__Make(). See https://arrow.apache.org/docs/r/articles/install.html for help installing Arrow C++ libraries.

Created on 2024-04-06 with reprex v2.1.0

14.0.0.2

dir <- ".../arrow_releases"
install.packages(file.path(dir, "arrow_14.0.0.2.tgz"), repos = NULL)
#> Installing package into '.../R/x86_64/4.3/library'
#> (as 'lib' is unspecified)
#install.packages(file.path(dir, "arrow_15.0.1.tgz"), repos = NULL)

library(arrow, warn.conflicts = FALSE)
write_parquet(cars, "temp.parquet")
read_parquet("temp.parquet") |> tibble::as_tibble() |> print()
#> # A tibble: 50 × 2
#>    speed  dist
#>    <dbl> <dbl>
#>  1     4     2
#>  2     4    10
#>  3     7     4
#>  4     7    22
#>  5     8    16
#>  6     9    10
#>  7    10    18
#>  8    10    26
#>  9    10    34
#> 10    11    17
#> # ℹ 40 more rows

Created on 2024-04-06 with reprex v2.1.0

Component(s)

R

@amoeba
Member

amoeba commented Apr 6, 2024

Hi @torfason, the macOS CRAN binary for 15.0.1 unfortunately got built by CRAN without the usual features (including Parquet). Until we get a fixed version up, you can install a full-featured version from r-universe:

install.packages("arrow", repos = c("https://apache.r-universe.dev"))

Let me know if that works for your case.
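To confirm whether a given install actually has Parquet enabled, a minimal sketch along these lines may help. The helper name `has_parquet_support()` is hypothetical and not part of arrow; it relies on `arrow::arrow_with_parquet()`, and returns `NA` when arrow is not installed at all.

```r
# Hypothetical helper (not part of arrow): reports whether the installed
# arrow build can read/write Parquet. Returns NA if arrow is not installed.
has_parquet_support <- function() {
  if (!requireNamespace("arrow", quietly = TRUE)) {
    return(NA)
  }
  # arrow_with_parquet() is TRUE only when the bundled C++ library
  # was built with Parquet enabled.
  isTRUE(arrow::arrow_with_parquet())
}

status <- has_parquet_support()
if (is.na(status)) {
  message("arrow is not installed")
} else if (status) {
  message("Parquet support is enabled in arrow ",
          as.character(utils::packageVersion("arrow")))
} else {
  message("This arrow build lacks Parquet; run arrow::arrow_info() for details")
}
```

If the check reports a build without Parquet, reinstalling from r-universe as described above and restarting the R session before re-running it is a reasonable next step.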

@torfason
Author

torfason commented Apr 6, 2024

Thanks for the quick response. Yes, installing from r-universe according to your directions worked like a charm and everything seems to work with 15.0.1. The Parquet files do differ from the ones generated with the last version, but they seem to read well either way so I guess that change represents some improvement rather than an issue.

kou changed the title from "Unable to read/write parquet files – regression of 15.0.1 CRAN binary for OSX" to "[R] Unable to read/write parquet files – regression of 15.0.1 CRAN binary for OSX" on Apr 6, 2024
@amoeba
Member

amoeba commented Apr 6, 2024

Differences in things like checksums and file size are expected from version to version due to things like metadata and enabled compression routines, but if you see any differences in the data itself (values, types, dimensions) between arrow R versions, please do file an issue.
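As a rough illustration of checking the data rather than the files, the sketch below compares dimensions, column types, and values of two data frames. `compare_frames()` and the two file names are hypothetical placeholders, not part of arrow, and the file-reading step only runs if arrow is installed and both files exist.

```r
# Hypothetical helper: compares the data held in two data frames and ignores
# file-level details such as size or checksum.
compare_frames <- function(a, b) {
  a <- as.data.frame(a)
  b <- as.data.frame(b)
  # Same number of rows and columns?
  same_dims <- identical(dim(a), dim(b))
  # Same column names and same leading class for each column?
  same_types <- identical(
    vapply(a, function(col) class(col)[1], character(1)),
    vapply(b, function(col) class(col)[1], character(1))
  )
  # Same values, within all.equal()'s default numeric tolerance?
  same_values <- isTRUE(all.equal(a, b, check.attributes = FALSE))
  c(dimensions = same_dims, types = same_types, values = same_values)
}

# Usage sketch: placeholder names for files written by two arrow versions.
old_path <- "old_version.parquet"
new_path <- "new_version.parquet"
if (requireNamespace("arrow", quietly = TRUE) &&
    file.exists(old_path) && file.exists(new_path)) {
  print(compare_frames(arrow::read_parquet(old_path),
                       arrow::read_parquet(new_path)))
}
```

Any `FALSE` in the result would be the kind of data-level difference worth reporting, whereas differing file sizes alone would not be.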

amoeba closed this as completed on Apr 6, 2024
@barcadad

Thank you, I had the same issue and this fixed it.

@srweintraub

srweintraub commented May 6, 2024

Hello. I found this post because I seem to be having the same trouble using the arrow package on a Mac after updating to 15.0.1. Unfortunately, the suggestion above to install from r-universe did not fix it for me. Any other suggestions?

EDIT: After manually downgrading to arrow 13.0.0 (I had trouble installing anything higher), the package works again as expected.
