-
-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rPackages.arrow: workaround to reduce futurearrow-cpp
mismatches and fix build for darwin
#294933
rPackages.arrow: workaround to reduce futurearrow-cpp
mismatches and fix build for darwin
#294933
Conversation
Package compiles and works as expected on > .libPaths(.libPaths()[-1])
> arrow::arrow_info()
Arrow package version: 14.0.0.2
Capabilities:
acero TRUE
dataset TRUE
substrait TRUE
parquet TRUE
json TRUE
s3 TRUE
gcs FALSE
utf8proc TRUE
re2 TRUE
snappy TRUE
gzip TRUE
brotli TRUE
zstd TRUE
lz4 TRUE
lz4_frame TRUE
lzo FALSE
bz2 TRUE
jemalloc FALSE
mimalloc TRUE
Memory:
Allocator mimalloc
Current 0 bytes
Max 0 bytes
Runtime:
SIMD Level none
Detected SIMD Level none
Build:
C++ Library Version 14.0.0
C++ Compiler Clang
C++ Compiler Version 16.0.6
> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin23.3.0 (64-bit)
Running under: macOS Sonoma 14.2.1
Matrix products: default
BLAS: /nix/store/aknvmhykghnk87i9fj4v8f8wk2s36pil-blas-3/lib/libblas.dylib
LAPACK: /nix/store/v6dhb17ps35n1b0hh07qsgwvpzqpgk64-openblas-0.3.26/lib/libopenblasp-r0.3.26.dylib; LAPACK version 3.12.0
locale:
[1] en_US.UTF-8/UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: Europe/Zurich
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tidyselect_1.2.0 bit_4.0.5 compiler_4.3.2 magrittr_2.0.3
[5] assertthat_0.2.1 R6_2.5.1 cli_3.6.2 glue_1.6.2
[9] bit64_4.0.5 vctrs_0.6.5 lifecycle_1.0.4 arrow_14.0.0.2
[13] rlang_1.1.2 purrr_1.0.2 |
works on > .libPaths(.libPaths()[-1])
> arrow::arrow_info()
Arrow package version: 14.0.0.2
Capabilities:
acero TRUE
dataset TRUE
substrait TRUE
parquet TRUE
json TRUE
s3 TRUE
gcs TRUE
utf8proc TRUE
re2 TRUE
snappy TRUE
gzip TRUE
brotli TRUE
zstd TRUE
lz4 TRUE
lz4_frame TRUE
lzo FALSE
bz2 TRUE
jemalloc TRUE
mimalloc TRUE
Memory:
Allocator jemalloc
Current 0 bytes
Max 0 bytes
Runtime:
SIMD Level avx512
Detected SIMD Level avx512
Build:
C++ Library Version 14.0.0
C++ Compiler GNU
C++ Compiler Version 13.2.0
> sessionInfo()
timedatectl: /lib64/libc.so.6: version `GLIBC_2.35' not found (required by /nix/store/lvzp8051nk5v3frx94jpcam7lvb9f0ci-gfortran-13.2.0-lib/lib/libgcc_s.so.1)
timedatectl: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /nix/store/lvzp8051nk5v3frx94jpcam7lvb9f0ci-gfortran-13.2.0-lib/lib/libgcc_s.so.1)
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Rocky Linux 8.7 (Green Obsidian)
Matrix products: default
BLAS/LAPACK: /nix/store/p3fad3big11ad6b9grahhf5fb23k94f6-blas-3/lib/libblas.so.3; LAPACK version 3.12.0
locale:
[1] C
time zone: Europe/Zurich
tzcode source: system (glibc)
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] tidyselect_1.2.0 bit_4.0.5 compiler_4.3.2 magrittr_2.0.3
[5] assertthat_0.2.1 R6_2.5.1 cli_3.6.2 glue_1.6.2
[9] bit64_4.0.5 vctrs_0.6.5 lifecycle_1.0.4 arrow_14.0.0.2
[13] rlang_1.1.2 purrr_1.0.2
Warning message:
In system("timedatectl", intern = TRUE) :
running command 'timedatectl' had status 1 and error message 'Function not implemented' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good, just one little change needed, arrow should go into buildInputs instead of nativeBuildInputs. It doesn't actually have an effect yet, but it's more future-proof in case we rethink the r-modules.
Otherwise, LGTM. write_parquet examples work too!
Arrow package version: 14.0.0.2
Capabilities:
acero TRUE
dataset TRUE
substrait TRUE
parquet TRUE
json TRUE
s3 TRUE
gcs TRUE
utf8proc TRUE
re2 TRUE
snappy TRUE
gzip TRUE
brotli TRUE
zstd TRUE
lz4 TRUE
lz4_frame TRUE
lzo FALSE
bz2 TRUE
jemalloc TRUE
mimalloc TRUE
Memory:
Allocator jemalloc
Current 0 bytes
Max 0 bytes
Runtime:
SIMD Level avx2
Detected SIMD Level avx2
Build:
C++ Library Version 14.0.0
C++ Compiler GNU
C++ Compiler Version 13.2.0
4c8f22b
to
e4f91ab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good, let's gooooo
The rPackages tree has been bumped (#298837) so I guess this PR should get updated to reflect arrow’s update as well. I guess {arrow} should build, since version 15 has been released and is now in sync with arrow-cpp, but probably we should adopt your approach here as it would avoid a desync sometime in the future. What do you think? |
i agree with that, i think we should keep pushing this change to avoid any version conflicts. |
@jbedo would be great if this can be merged, so that we have a robust way in future plus we also have it running on darwin. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hash doesn't seem to match:
error: hash mismatch in fixed-output derivation '/nix/store/wgrbgg0lnagdrqaqk6368wd4imrxkhpd-apache-arrow-15.0.1.tar.gz.drv':
specified: sha256-TrDaUOwHG68V/BY8tIBYkx4AbxyGLI3vDhgP0H1TECE=
got: sha256-Vdtj7Z/WkXt6v+XUGGyfUyy+SKpT9AQNV+fCmtcLzvo=
Has the tarball changed?
Yes, rPackages.arrow v15 has been released, right? Maybe we can first test for version equality. If it's a match, use pkgs.arrow-cpp, otherwise the overridden one. But we'll always have to update the hash when arrow-cpp overtakes rPackages.arrow in nixpkgs. At least Idon't know of a mechanism to do it automatically. Ideally, arrow-cpp should update the R package too, or hold back until the R package is released on CRAN |
Exellent idea to test that the R package major version at least corresponds to the libarrow (arrow-cpp) major version. This is also what is expected when building the R package. |
3d9af17
to
6774431
Compare
9c02e9d
to
058ccee
Compare
its still v14 for rPackages.arrow, so i'd suggest to also bump that in this PR to make it efficient and get v15 directly. Before, to show that it still works with the conditional override (arrow-cpp in nixpkgs is already at v15, hence the download and build of custom arrow-cpp v14.0.0 input version is triggered for rPackages.arrow (14.0.0.2). nix-shell -I nixpkgs=. -p rPackages.arrow R
[nix-shell:~/git/nixpkgs]$ R
R version 4.3.2 (2023-10-31) -- "Eye Holes"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: aarch64-apple-darwin23.3.0 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
[Previously saved workspace restored]
> .libPaths(.libPaths()[-1])
> arrow::arrow_info()
Arrow package version: 14.0.0.2
Capabilities:
acero TRUE
dataset TRUE
substrait TRUE
parquet TRUE
json TRUE
s3 TRUE
gcs FALSE
utf8proc TRUE
re2 TRUE
snappy TRUE
gzip TRUE
brotli TRUE
zstd TRUE
lz4 TRUE
lz4_frame TRUE
lzo FALSE
bz2 TRUE
jemalloc FALSE
mimalloc TRUE
Memory:
Allocator mimalloc
Current 0 bytes
Max 0 bytes
Runtime:
SIMD Level none
Detected SIMD Level none
Build:
C++ Library Version 14.0.0
C++ Compiler Clang
C++ Compiler Version 16.0.6 |
The package tree has been bumped and arrow 15 is now available, you're not on the latest commit I believe |
you need to rebase on latest master |
@Kupac yeah
ah yes indeed thanks, just checked here and there it was still on 14 https://search.nixos.org/packages?channel=unstable&from=0&size=50&sort=relevance&type=packages&query=rPackages.arrow (is this based on queue?( |
In situations like that, I nuke the branch, update my fork, recreate the branch with the same name and put my changes back and force push 🤣 |
e4d1df2
to
a01b4cd
Compare
Yeah. In case major(arrow-cpp) > major(rPackages.arrow), then we now need to manually update the hash. I have an idea for part of @Kupac is suggesting. We could have a trigger via a bot that opens a PR in nixpkgs, via a CI that fetches https://raw.githubusercontent.com/rstats-on-nix/nixpkgs/r-daily/pkgs/development/r-modules/cran-packages.nix and compares that to the nixpkgs upstream main branch, and also parses the version of arrow-cpp via Nixpkgs whether that one advances. Ideally, we'd have a shiny dashboard that would remind us that we need a manual update, something along what @b-rodrigues has started :-) |
arrow-cpp
mismatches and fix build for darwin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Builds and runs successfully on arm64 linux
> arrow::arrow_info()
Arrow package version: 15.0.1
Capabilities:
acero TRUE
dataset TRUE
substrait TRUE
parquet TRUE
json TRUE
s3 TRUE
gcs TRUE
utf8proc TRUE
re2 TRUE
snappy TRUE
gzip TRUE
brotli TRUE
zstd TRUE
lz4 TRUE
lz4_frame TRUE
lzo FALSE
bz2 TRUE
jemalloc TRUE
mimalloc TRUE
Memory:
Allocator jemalloc
Current 0 bytes
Max 0 bytes
Runtime:
SIMD Level none
Detected SIMD Level none
Build:
C++ Library Version 15.0.0
C++ Compiler GNU
C++ Compiler Version 13.2.0
It's a bit dissastifying to accept that the hash is going to break as part of the automatic update in future. It might be preferable to just accept that we can't syncronise with CRAN for this package in a sensible way and do:
Then we won't have any regular breakages. |
I’m not sure I understand, what does this do? It builds the R package from the source included in the arrow-cpp project, instead of building the r package source and linking to arrow-cpp? In that case, we may end up in situations where nixpkgs will supply a more recent (still in-development version) of the R package. This may break in other ways, no? For example, arrow 15 was released on 21st of January: https://github.com/apache/arrow/releases/tag/apache-arrow-15.0.0 but some more work continued for the r package for quite some time. Wouldn’t your approach install the R packages as it was on the 21st of January? |
Where does the R development take place if not that repository? The CRAN metadata points to https://github.com/apache/arrow. |
Ah I get it. It's probably ahead of CRAN, likely synced with arrow-cpp. So indeed probably the best solution. Because it's part of the arrow-cpp repo, we dont have to check hash, correct? Minimal maintenance achieved. |
If the versions don't match (only if arrow-cpp major version > major rPackages.arrow, we will have breakages, won't we? This will terminate the build process: https://github.com/apache/arrow/blob/6aa33210ecd9773f18ceb775306b79de132b6c6f/r/configure#L191-L192 |
Yeah good idea @jbedo the releases are synchronised like at stage @b-rodrigues posted above (https://github.com/apache/arrow/tree/a61f4af724cd06c3a9b4abd20491345997e532c0) , so they are likely building. Means we never have the CRAN state but it will likely work. |
What do we do with the version of arrow here where we'll have a mismatch in source and version?: https://raw.githubusercontent.com/NixOS/nixpkgs/master/pkgs/development/r-modules/cran-packages.nix |
yes but we don't have any guarantee that the R code is in a state that will build the R package when a major release of arrow cpp is released, I believe |
it could happen, but it is rather unlikely. at release stage they seem to synchronize all downstream package like for R, python etc. apache/arrow@a61f4af#diff-71852d5b939936af45e8e9b501688fd48fff455de44fed8130896dd5ea8b89d5 pretty sure they take care it compiles for releases. |
If that's the case then what @jbedo suggests would be the best approach |
We'll have a mismatch in the versions. It reports 15.0.1 (CRAN from cran-packages.nix )in the nix path but it uses the 15.0.0 for the source. > .libPaths()
[1] "/Users/philipp/Library/R/arm64/4.3/library"
[2] "/nix/store/zkvd1icg3k4smsajc4m4bsfmgx67c68j-r-arrow-15.0.1/library"
[3] "/nix/store/hnr0v66cah81941bxzn1g3q90dn8phfk-r-assertthat-0.2.1/library"
[4] "/nix/store/1083x08shvisrak9sg28k6ga1121fw9a-r-bit64-4.0.5/library"
[5] "/nix/store/c352djcc65lgv8ym9w349nd5kb91fy5y-r-bit-4.0.5/library"
[6] "/nix/store/9yjksf1l03iwigyhyhf4x4lxammfb77n-r-cpp11-0.4.7/library"
[7] "/nix/store/krnp8xvda19g32h9zc58516a049dd00n-r-glue-1.7.0/library"
[8] "/nix/store/ywq4nq7h4is4y1sf17ah78bfxyfppwvf-r-purrr-1.0.2/library"
[9] "/nix/store/0khkc353iqsga6capbngcydbdb68z8bg-r-cli-3.6.2/library"
[10] "/nix/store/3pr8ia83blvb4i9w14ghdfgh7hfds1dc-r-lifecycle-1.0.4/library"
[11] "/nix/store/2vikwgfmmp5sqiw92zizcn6pfacnqp8z-r-rlang-1.1.3/library"
[12] "/nix/store/vd5qg62gagixxk7mf0xz2fy16gbbf6lp-r-magrittr-2.0.3/library"
[13] "/nix/store/vzkix10k1l19vfk8h7xfl0ajax9l2v1x-r-vctrs-0.6.5/library"
[14] "/nix/store/alkqhfhd0a2cx1m73zj9i43h6ab5pig5-r-R6-2.5.1/library"
[15] "/nix/store/q136l6pybrivh94xxvd0z7b9kj6azny3-r-tidyselect-1.2.1/library"
[16] "/nix/store/svazjq003nxhhxsbajcf4az2hxpl4x72-r-withr-3.0.0/library"
[17] "/nix/store/5wgl1kh8klm45pv4g55k5fw9ffdyz070-apple-framework-CoreFoundation-11.0.0/Library"
[18] "/nix/store/7dhbarll7piq26cx0pcig5sk9cd9df21-R-4.3.3/lib/R/library"
> .libPaths(.libPaths()[-1])
> arrow::arrow_info()
Arrow package version: 15.0.0
Capabilities:
acero TRUE
dataset TRUE
substrait TRUE
parquet TRUE
json TRUE
s3 TRUE
gcs FALSE
utf8proc TRUE
re2 TRUE
snappy TRUE
gzip TRUE
brotli TRUE
zstd TRUE
lz4 TRUE
lz4_frame TRUE
lzo FALSE
bz2 TRUE
jemalloc FALSE
mimalloc TRUE
Memory:
Allocator mimalloc
Current 0 bytes
Max 0 bytes
Runtime:
SIMD Level none
Detected SIMD Level none
Build:
C++ Library Version 15.0.0
C++ Compiler Clang
C++ Compiler Version 16.0.6 |
Add |
@jbedo does it justify an extension of the update process/script? because we would then need to have a specific list of packages from CRAN but with specific source handling. |
thanks :-) > arrow::arrow_info()
Arrow package version: 15.0.0
Capabilities:
acero TRUE
dataset TRUE
substrait TRUE
parquet TRUE
json TRUE
s3 TRUE
gcs FALSE
utf8proc TRUE
re2 TRUE
snappy TRUE
gzip TRUE
brotli TRUE
zstd TRUE
lz4 TRUE
lz4_frame TRUE
lzo FALSE
bz2 TRUE
jemalloc FALSE
mimalloc TRUE
Memory:
Allocator mimalloc
Current 0 bytes
Max 0 bytes
Runtime:
SIMD Level none
Detected SIMD Level none
Build:
C++ Library Version 15.0.0
C++ Compiler Clang
C++ Compiler Version 16.0.6 The main reason this is easier is the time it needs for merging PRs, when arrow-cpp increases in major release on Nixpkgs. Since this now works i agree its a preferable solution. It looks not much happened at v15 on CRAN except probably fixing build stuff for cran comments. https://github.com/apache/arrow/blob/main/r/NEWS.md https://cran.r-project.org/web/packages/arrow/news/news.html https://cran.r-project.org/web/packages/arrow/news/news.html |
9c34a3c
to
470a490
Compare
# ...
Auto-merging pkgs/development/r-modules/default.nix
Automatic merge went well; stopped before committing as requested
$ nix-env --extra-experimental-features no-url-literals --option system aarch64-darwin -f <nixpkgs> --nix-path nixpkgs=/Users/philipp/.cache/nixpkgs-review/rev-470a49066c264cd50f197b9817b7101817905205/nixpkgs nixpkgs-overlays=/tmp/tmpcqncjo00 -qaP --xml --out-path --show-trace --no-allow-import-from-derivation --meta
Nothing to be built.
$ /nix/store/97wfvkpc87y75ngqka565wp3djjm072f-nix-2.18.1/bin/nix-shell /Users/philipp/.cache/nixpkgs-review/rev-470a49066c264cd50f197b9817b7101817905205/shell.nix --nix-path nixpkgs=/Users/philipp/.cache/nixpkgs-review/rev-470a49066c264cd50f197b9817b7101817905205/nixpkgs nixpkgs-overlays=/tmp/tmpcqncjo00
$ git merge --no-commit --no-ff 470a49066c264cd50f197b9817b7101817905205
Auto-merging pkgs/development/r-modules/default.nix
Automatic merge went well; stopped before committing as requested
$ nix-env --extra-experimental-features no-url-literals --option system x86_64-linux -f /home/spectral-cockpit/.cache/nixpkgs-review/rev-470a49066c264cd50f197b9817b7101817905205/nixpkgs -qaP --xml --out-path --show-trace --no-allow-import-from-derivation --meta
Nothing to be built. |
@jbedo this should be ready like requested, it builds. We do not exactly match CRAN source but it should work well for future releases of |
@ofborg build rPackages.arrow |
Thanks @jbedo :-) ! |
Description of changes
Fix compiling
rPackages.arrow
. Current {arrow} on CRAN is now at 14.0.0.2: https://cran.r-project.org/web/packages/arrow/index.html .arrow-cpp
is a run-time dependency ofrPackages.arrow
.arrow-cpp
(libarrow) in Nixpkgs master is currently at 15.0.0, which means that the versions are not compatible when compilingrPackages.arrow
. Currently, the release cycle of the R package is out-of-sync with the current libarrow release, more details here: apache/arrow#39698 .Things done
Added cmake dependency to
nativeBuildInputs
forrPackages.arrow
. Added missinglibintl
headers for darwin. Adjusted version/source ofarrow-cpp
to match the CRANrPackages.arrow
version currently packaged in nixpkgs => as suggested by @Kupac . Additionally, the the override just happens when the major version ofrPackages.arrow
is lower thanarrow-cpp
. Only in that case, future mismatches still need a PR to update the hash.nix.conf
? (See Nix manual)sandbox = relaxed
sandbox = true
nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"
. Note: all changes have to be committed, also see nixpkgs-review usage./result/bin/
)Add a 👍 reaction to pull requests you find important.