Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rPackages.arrow: workaround to reduce futurearrow-cpp mismatches and fix build for darwin #294933

Merged
merged 1 commit into from
Apr 18, 2024

Conversation

philipp-baumann
Copy link
Contributor

@philipp-baumann philipp-baumann commented Mar 11, 2024

Description of changes

Fix compiling rPackages.arrow. Current {arrow} on CRAN is now at 14.0.0.2: https://cran.r-project.org/web/packages/arrow/index.html . arrow-cpp is a run-time dependency of rPackages.arrow. arrow-cpp (libarrow) in Nixpkgs master is currently at 15.0.0, which means that the versions are not compatible when compiling rPackages.arrow. Currently, the release cycle of the R package is out-of-sync with the current libarrow release, more details here: apache/arrow#39698 .

Things done

Added cmake dependency to nativeBuildInputs for rPackages.arrow. Added missing libintl headers for darwin. Adjusted version/source of arrow-cpp to match the CRAN rPackages.arrow version currently packaged in nixpkgs => as suggested by @Kupac . Additionally, the the override just happens when the major version of rPackages.arrow is lower than arrow-cpp. Only in that case, future mismatches still need a PR to update the hash.

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.05 Release Notes (or backporting 23.05 and 23.11 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@philipp-baumann
Copy link
Contributor Author

philipp-baumann commented Mar 11, 2024

Package compiles and works as expected on darwin-aarch64 (For SIMD support via xsimd on darwin-aarch64, seems that things need to be tweaked for darwin there https://github.com/NixOS/nixpkgs/blob/c8e74c2f83fe12b4e5a8bd1abbc090575b0f7611/pkgs/development/libraries/xsimd/default.nix#L18C1-L21C47).

> .libPaths(.libPaths()[-1])
> arrow::arrow_info()
Arrow package version: 14.0.0.2

Capabilities:
               
acero      TRUE
dataset    TRUE
substrait  TRUE
parquet    TRUE
json       TRUE
s3         TRUE
gcs       FALSE
utf8proc   TRUE
re2        TRUE
snappy     TRUE
gzip       TRUE
brotli     TRUE
zstd       TRUE
lz4        TRUE
lz4_frame  TRUE
lzo       FALSE
bz2        TRUE
jemalloc  FALSE
mimalloc   TRUE

Memory:
                  
Allocator mimalloc
Current    0 bytes
Max        0 bytes

Runtime:
                        
SIMD Level          none
Detected SIMD Level none

Build:
                           
C++ Library Version  14.0.0
C++ Compiler          Clang
C++ Compiler Version 16.0.6

> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin23.3.0 (64-bit)
Running under: macOS Sonoma 14.2.1

Matrix products: default
BLAS:   /nix/store/aknvmhykghnk87i9fj4v8f8wk2s36pil-blas-3/lib/libblas.dylib 
LAPACK: /nix/store/v6dhb17ps35n1b0hh07qsgwvpzqpgk64-openblas-0.3.26/lib/libopenblasp-r0.3.26.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Zurich
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0 bit_4.0.5        compiler_4.3.2   magrittr_2.0.3  
 [5] assertthat_0.2.1 R6_2.5.1         cli_3.6.2        glue_1.6.2      
 [9] bit64_4.0.5      vctrs_0.6.5      lifecycle_1.0.4  arrow_14.0.0.2  
[13] rlang_1.1.2      purrr_1.0.2

@philipp-baumann
Copy link
Contributor Author

works on linux-x86_64.

> .libPaths(.libPaths()[-1])
> arrow::arrow_info()
Arrow package version: 14.0.0.2

Capabilities:
               
acero      TRUE
dataset    TRUE
substrait  TRUE
parquet    TRUE
json       TRUE
s3         TRUE
gcs        TRUE
utf8proc   TRUE
re2        TRUE
snappy     TRUE
gzip       TRUE
brotli     TRUE
zstd       TRUE
lz4        TRUE
lz4_frame  TRUE
lzo       FALSE
bz2        TRUE
jemalloc   TRUE
mimalloc   TRUE

Memory:
                  
Allocator jemalloc
Current    0 bytes
Max        0 bytes

Runtime:
                          
SIMD Level          avx512
Detected SIMD Level avx512

Build:
                           
C++ Library Version  14.0.0
C++ Compiler            GNU
C++ Compiler Version 13.2.0

> sessionInfo()
timedatectl: /lib64/libc.so.6: version `GLIBC_2.35' not found (required by /nix/store/lvzp8051nk5v3frx94jpcam7lvb9f0ci-gfortran-13.2.0-lib/lib/libgcc_s.so.1)
timedatectl: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /nix/store/lvzp8051nk5v3frx94jpcam7lvb9f0ci-gfortran-13.2.0-lib/lib/libgcc_s.so.1)
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Rocky Linux 8.7 (Green Obsidian)

Matrix products: default
BLAS/LAPACK: /nix/store/p3fad3big11ad6b9grahhf5fb23k94f6-blas-3/lib/libblas.so.3;  LAPACK version 3.12.0

locale:
[1] C

time zone: Europe/Zurich
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0 bit_4.0.5        compiler_4.3.2   magrittr_2.0.3  
 [5] assertthat_0.2.1 R6_2.5.1         cli_3.6.2        glue_1.6.2      
 [9] bit64_4.0.5      vctrs_0.6.5      lifecycle_1.0.4  arrow_14.0.0.2  
[13] rlang_1.1.2      purrr_1.0.2     
Warning message:
In system("timedatectl", intern = TRUE) :
  running command 'timedatectl' had status 1 and error message 'Function not implemented'

Copy link
Contributor

@Kupac Kupac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good, just one little change needed, arrow should go into buildInputs instead of nativeBuildInputs. It doesn't actually have an effect yet, but it's more future-proof in case we rethink the r-modules.

Otherwise, LGTM. write_parquet examples work too!

Arrow package version: 14.0.0.2

Capabilities:
               
acero      TRUE
dataset    TRUE
substrait  TRUE
parquet    TRUE
json       TRUE
s3         TRUE
gcs        TRUE
utf8proc   TRUE
re2        TRUE
snappy     TRUE
gzip       TRUE
brotli     TRUE
zstd       TRUE
lz4        TRUE
lz4_frame  TRUE
lzo       FALSE
bz2        TRUE
jemalloc   TRUE
mimalloc   TRUE

Memory:
                  
Allocator jemalloc
Current    0 bytes
Max        0 bytes

Runtime:
                        
SIMD Level          avx2
Detected SIMD Level avx2

Build:
                           
C++ Library Version  14.0.0
C++ Compiler            GNU
C++ Compiler Version 13.2.0

pkgs/development/r-modules/default.nix Outdated Show resolved Hide resolved
Copy link
Contributor

@Kupac Kupac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good, let's gooooo

@b-rodrigues
Copy link
Contributor

The rPackages tree has been bumped (#298837) so I guess this PR should get updated to reflect arrow’s update as well. I guess {arrow} should build, since version 15 has been released and is now in sync with arrow-cpp, but probably we should adopt your approach here as it would avoid a desync sometime in the future. What do you think?

@philipp-baumann
Copy link
Contributor Author

The rPackages tree has been bumped (#298837) so I guess this PR should get updated to reflect arrow’s update as well. I guess {arrow} should build, since version 15 has been released and is now in sync with arrow-cpp, but probably we should adopt your approach here as it would avoid a desync sometime in the future. What do you think?

i agree with that, i think we should keep pushing this change to avoid any version conflicts.

@philipp-baumann
Copy link
Contributor Author

@jbedo would be great if this can be merged, so that we have a robust way in future plus we also have it running on darwin.

Copy link
Contributor

@jbedo jbedo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hash doesn't seem to match:

error: hash mismatch in fixed-output derivation '/nix/store/wgrbgg0lnagdrqaqk6368wd4imrxkhpd-apache-arrow-15.0.1.tar.gz.drv':
        specified: sha256-TrDaUOwHG68V/BY8tIBYkx4AbxyGLI3vDhgP0H1TECE=
           got:    sha256-Vdtj7Z/WkXt6v+XUGGyfUyy+SKpT9AQNV+fCmtcLzvo=

Has the tarball changed?

@Kupac
Copy link
Contributor

Kupac commented Apr 6, 2024

Yes, rPackages.arrow v15 has been released, right? Maybe we can first test for version equality. If it's a match, use pkgs.arrow-cpp, otherwise the overridden one.

But we'll always have to update the hash when arrow-cpp overtakes rPackages.arrow in nixpkgs. At least Idon't know of a mechanism to do it automatically. Ideally, arrow-cpp should update the R package too, or hold back until the R package is released on CRAN

@philipp-baumann
Copy link
Contributor Author

Yes, rPackages.arrow v15 has been released, right? Maybe we can first test for version equality. If it's a match, use pkgs.arrow-cpp, otherwise the overridden one.

But we'll always have to update the hash when arrow-cpp overtakes rPackages.arrow in nixpkgs. At least Idon't know of a mechanism to do it automatically. Ideally, arrow-cpp should update the R package too, or hold back until the R package is released on CRAN

Exellent idea to test that the R package major version at least corresponds to the libarrow (arrow-cpp) major version. This is also what is expected when building the R package.
https://github.com/apache/arrow/blob/6aa33210ecd9773f18ceb775306b79de132b6c6f/r/tools/check-versions.R#L33

@philipp-baumann philipp-baumann force-pushed the fix2_rPackages.arrow_14.0.0.2 branch 3 times, most recently from 9c02e9d to 058ccee Compare April 6, 2024 07:54
@philipp-baumann
Copy link
Contributor Author

philipp-baumann commented Apr 6, 2024

Yes, rPackages.arrow v15 has been released, right? Maybe we can first test for version equality. If it's a match, use pkgs.arrow-cpp, otherwise the overridden one.

its still v14 for rPackages.arrow, so i'd suggest to also bump that in this PR to make it efficient and get v15 directly. Before, to show that it still works with the conditional override (arrow-cpp in nixpkgs is already at v15, hence the download and build of custom arrow-cpp v14.0.0 input version is triggered for rPackages.arrow (14.0.0.2).

 nix-shell -I nixpkgs=. -p rPackages.arrow R

[nix-shell:~/git/nixpkgs]$ R

R version 4.3.2 (2023-10-31) -- "Eye Holes"
Copyright (C) 2023 The R Foundation for Statistical Computing
Platform: aarch64-apple-darwin23.3.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

> .libPaths(.libPaths()[-1])
> arrow::arrow_info()
Arrow package version: 14.0.0.2

Capabilities:
               
acero      TRUE
dataset    TRUE
substrait  TRUE
parquet    TRUE
json       TRUE
s3         TRUE
gcs       FALSE
utf8proc   TRUE
re2        TRUE
snappy     TRUE
gzip       TRUE
brotli     TRUE
zstd       TRUE
lz4        TRUE
lz4_frame  TRUE
lzo       FALSE
bz2        TRUE
jemalloc  FALSE
mimalloc   TRUE

Memory:
                  
Allocator mimalloc
Current    0 bytes
Max        0 bytes

Runtime:
                        
SIMD Level          none
Detected SIMD Level none

Build:
                           
C++ Library Version  14.0.0
C++ Compiler          Clang
C++ Compiler Version 16.0.6

@b-rodrigues
Copy link
Contributor

The package tree has been bumped and arrow 15 is now available, you're not on the latest commit I believe

@b-rodrigues
Copy link
Contributor

you need to rebase on latest master

@philipp-baumann
Copy link
Contributor Author

@Kupac yeah

Yes, rPackages.arrow v15 has been released, right? Maybe we can first test for version equality. If it's a match, use pkgs.arrow-cpp, otherwise the overridden one.

But we'll always have to update the hash when arrow-cpp overtakes rPackages.arrow in nixpkgs. At least Idon't know of a mechanism to do it automatically. Ideally, arrow-cpp should update the R package too, or hold back until the R package is released on CRAN

you need to rebase on latest master

ah yes indeed thanks, just checked here and there it was still on 14 https://search.nixos.org/packages?channel=unstable&from=0&size=50&sort=relevance&type=packages&query=rPackages.arrow (is this based on queue?(

@b-rodrigues
Copy link
Contributor

In situations like that, I nuke the branch, update my fork, recreate the branch with the same name and put my changes back and force push 🤣

@philipp-baumann philipp-baumann force-pushed the fix2_rPackages.arrow_14.0.0.2 branch 3 times, most recently from e4d1df2 to a01b4cd Compare April 6, 2024 09:35
@philipp-baumann
Copy link
Contributor Author

philipp-baumann commented Apr 6, 2024

Now it works and does not use the override.

But it will if there's a mismatch, right? I really like this approach

Yeah. In case major(arrow-cpp) > major(rPackages.arrow), then we now need to manually update the hash. I have an idea for part of @Kupac is suggesting. We could have a trigger via a bot that opens a PR in nixpkgs, via a CI that fetches https://raw.githubusercontent.com/rstats-on-nix/nixpkgs/r-daily/pkgs/development/r-modules/cran-packages.nix and compares that to the nixpkgs upstream main branch, and also parses the version of arrow-cpp via Nixpkgs whether that one advances. Ideally, we'd have a shiny dashboard that would remind us that we need a manual update, something along what @b-rodrigues has started :-)

@philipp-baumann philipp-baumann changed the title rPackages.arrow: fix build v14.0.0.2 rPackages.arrow: workaround to reduce futurearrow-cpp mismatches and fix build for darwin Apr 6, 2024
Copy link
Contributor

@b-rodrigues b-rodrigues left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Builds and runs successfully on arm64 linux

> arrow::arrow_info()
Arrow package version: 15.0.1

Capabilities:
               
acero      TRUE
dataset    TRUE
substrait  TRUE
parquet    TRUE
json       TRUE
s3         TRUE
gcs        TRUE
utf8proc   TRUE
re2        TRUE
snappy     TRUE
gzip       TRUE
brotli     TRUE
zstd       TRUE
lz4        TRUE
lz4_frame  TRUE
lzo       FALSE
bz2        TRUE
jemalloc   TRUE
mimalloc   TRUE

Memory:
                  
Allocator jemalloc
Current    0 bytes
Max        0 bytes

Runtime:
                        
SIMD Level          none
Detected SIMD Level none

Build:
                           
C++ Library Version  15.0.0
C++ Compiler            GNU
C++ Compiler Version 13.2.0

@jbedo
Copy link
Contributor

jbedo commented Apr 7, 2024

It's a bit dissastifying to accept that the hash is going to break as part of the automatic update in future. It might be preferable to just accept that we can't syncronise with CRAN for this package in a sensible way and do:

     arrow = old.arrow.overrideAttrs (attrs: {
      src = pkgs.arrow-cpp.src;
      prePatch = "cd r";
      postPatch = ''
        patchShebangs configure
      '';
      buildInputs = attrs.buildInputs ++ [
        pkgs.arrow-cpp
      ];
    });

Then we won't have any regular breakages.

@b-rodrigues
Copy link
Contributor

b-rodrigues commented Apr 7, 2024

I’m not sure I understand, what does this do? It builds the R package from the source included in the arrow-cpp project, instead of building the r package source and linking to arrow-cpp? In that case, we may end up in situations where nixpkgs will supply a more recent (still in-development version) of the R package. This may break in other ways, no?

For example, arrow 15 was released on 21st of January: https://github.com/apache/arrow/releases/tag/apache-arrow-15.0.0

but some more work continued for the r package for quite some time. Wouldn’t your approach install the R packages as it was on the 21st of January?

@jbedo
Copy link
Contributor

jbedo commented Apr 7, 2024

Where does the R development take place if not that repository? The CRAN metadata points to https://github.com/apache/arrow.

@Kupac
Copy link
Contributor

Kupac commented Apr 7, 2024

Ah I get it. It's probably ahead of CRAN, likely synced with arrow-cpp. So indeed probably the best solution. Because it's part of the arrow-cpp repo, we dont have to check hash, correct? Minimal maintenance achieved.

@philipp-baumann
Copy link
Contributor Author

philipp-baumann commented Apr 7, 2024

It's a bit dissastifying to accept that the hash is going to break as part of the automatic update in future. It might be preferable to just accept that we can't syncronise with CRAN for this package in a sensible way and do:

     arrow = old.arrow.overrideAttrs (attrs: {
      src = pkgs.arrow-cpp.src;
      prePatch = "cd r";
      postPatch = ''
        patchShebangs configure
      '';
      buildInputs = attrs.buildInputs ++ [
        pkgs.arrow-cpp
      ];
    });

Then we won't have any regular breakages.

If the versions don't match (only if arrow-cpp major version > major rPackages.arrow, we will have breakages, won't we? This will terminate the build process:

https://github.com/apache/arrow/blob/6aa33210ecd9773f18ceb775306b79de132b6c6f/r/configure#L191-L192
via calling this script via Rscript
https://github.com/apache/arrow/blob/6aa33210ecd9773f18ceb775306b79de132b6c6f/r/tools/check-versions.R#L23-L35

@philipp-baumann
Copy link
Contributor Author

It's a bit dissastifying to accept that the hash is going to break as part of the automatic update in future. It might be preferable to just accept that we can't syncronise with CRAN for this package in a sensible way and do:

     arrow = old.arrow.overrideAttrs (attrs: {
      src = pkgs.arrow-cpp.src;
      prePatch = "cd r";
      postPatch = ''
        patchShebangs configure
      '';
      buildInputs = attrs.buildInputs ++ [
        pkgs.arrow-cpp
      ];
    });

Then we won't have any regular breakages.

If the versions don't match (only if arrow-cpp major version > major rPackages.arrow, we will have breakages, won't we? This will terminate the build process:

https://github.com/apache/arrow/blob/6aa33210ecd9773f18ceb775306b79de132b6c6f/r/configure#L191-L192 via calling this script via Rscript https://github.com/apache/arrow/blob/6aa33210ecd9773f18ceb775306b79de132b6c6f/r/tools/check-versions.R#L23-L35

Yeah good idea @jbedo the releases are synchronised like at stage @b-rodrigues posted above (https://github.com/apache/arrow/tree/a61f4af724cd06c3a9b4abd20491345997e532c0) , so they are likely building. Means we never have the CRAN state but it will likely work.

@philipp-baumann
Copy link
Contributor Author

It's a bit dissastifying to accept that the hash is going to break as part of the automatic update in future. It might be preferable to just accept that we can't syncronise with CRAN for this package in a sensible way and do:

     arrow = old.arrow.overrideAttrs (attrs: {
      src = pkgs.arrow-cpp.src;
      prePatch = "cd r";
      postPatch = ''
        patchShebangs configure
      '';
      buildInputs = attrs.buildInputs ++ [
        pkgs.arrow-cpp
      ];
    });

Then we won't have any regular breakages.

If the versions don't match (only if arrow-cpp major version > major rPackages.arrow, we will have breakages, won't we? This will terminate the build process:
https://github.com/apache/arrow/blob/6aa33210ecd9773f18ceb775306b79de132b6c6f/r/configure#L191-L192 via calling this script via Rscript https://github.com/apache/arrow/blob/6aa33210ecd9773f18ceb775306b79de132b6c6f/r/tools/check-versions.R#L23-L35

Yeah good idea @jbedo the releases are synchronised like at stage @b-rodrigues posted above (https://github.com/apache/arrow/tree/a61f4af724cd06c3a9b4abd20491345997e532c0) , so they are likely building. Means we never have the CRAN state but it will likely work.

What do we do with the version of arrow here where we'll have a mismatch in source and version?: https://raw.githubusercontent.com/NixOS/nixpkgs/master/pkgs/development/r-modules/cran-packages.nix

@b-rodrigues
Copy link
Contributor

Where does the R development take place if not that repository? The CRAN metadata points to https://github.com/apache/arrow.

yes but we don't have any guarantee that the R code is in a state that will build the R package when a major release of arrow cpp is released, I believe

@philipp-baumann
Copy link
Contributor Author

philipp-baumann commented Apr 7, 2024

Where does the R development take place if not that repository? The CRAN metadata points to https://github.com/apache/arrow.

yes but we don't have any guarantee that the R code is in a state that will build the R package when a major release of arrow cpp is released, I believe

it could happen, but it is rather unlikely. at release stage they seem to synchronize all downstream package like for R, python etc. apache/arrow@a61f4af#diff-71852d5b939936af45e8e9b501688fd48fff455de44fed8130896dd5ea8b89d5 pretty sure they take care it compiles for releases.

@b-rodrigues
Copy link
Contributor

If that's the case then what @jbedo suggests would be the best approach

@philipp-baumann
Copy link
Contributor Author

philipp-baumann commented Apr 7, 2024

We'll have a mismatch in the versions. It reports 15.0.1 (CRAN from cran-packages.nix )in the nix path but it uses the 15.0.0 for the source.

> .libPaths()
 [1] "/Users/philipp/Library/R/arm64/4.3/library"                                               
 [2] "/nix/store/zkvd1icg3k4smsajc4m4bsfmgx67c68j-r-arrow-15.0.1/library"                       
 [3] "/nix/store/hnr0v66cah81941bxzn1g3q90dn8phfk-r-assertthat-0.2.1/library"                   
 [4] "/nix/store/1083x08shvisrak9sg28k6ga1121fw9a-r-bit64-4.0.5/library"                        
 [5] "/nix/store/c352djcc65lgv8ym9w349nd5kb91fy5y-r-bit-4.0.5/library"                          
 [6] "/nix/store/9yjksf1l03iwigyhyhf4x4lxammfb77n-r-cpp11-0.4.7/library"                        
 [7] "/nix/store/krnp8xvda19g32h9zc58516a049dd00n-r-glue-1.7.0/library"                         
 [8] "/nix/store/ywq4nq7h4is4y1sf17ah78bfxyfppwvf-r-purrr-1.0.2/library"                        
 [9] "/nix/store/0khkc353iqsga6capbngcydbdb68z8bg-r-cli-3.6.2/library"                          
[10] "/nix/store/3pr8ia83blvb4i9w14ghdfgh7hfds1dc-r-lifecycle-1.0.4/library"                    
[11] "/nix/store/2vikwgfmmp5sqiw92zizcn6pfacnqp8z-r-rlang-1.1.3/library"                        
[12] "/nix/store/vd5qg62gagixxk7mf0xz2fy16gbbf6lp-r-magrittr-2.0.3/library"                     
[13] "/nix/store/vzkix10k1l19vfk8h7xfl0ajax9l2v1x-r-vctrs-0.6.5/library"                        
[14] "/nix/store/alkqhfhd0a2cx1m73zj9i43h6ab5pig5-r-R6-2.5.1/library"                           
[15] "/nix/store/q136l6pybrivh94xxvd0z7b9kj6azny3-r-tidyselect-1.2.1/library"                   
[16] "/nix/store/svazjq003nxhhxsbajcf4az2hxpl4x72-r-withr-3.0.0/library"                        
[17] "/nix/store/5wgl1kh8klm45pv4g55k5fw9ffdyz070-apple-framework-CoreFoundation-11.0.0/Library"
[18] "/nix/store/7dhbarll7piq26cx0pcig5sk9cd9df21-R-4.3.3/lib/R/library"                        
> .libPaths(.libPaths()[-1])
> arrow::arrow_info()
Arrow package version: 15.0.0

Capabilities:
               
acero      TRUE
dataset    TRUE
substrait  TRUE
parquet    TRUE
json       TRUE
s3         TRUE
gcs       FALSE
utf8proc   TRUE
re2        TRUE
snappy     TRUE
gzip       TRUE
brotli     TRUE
zstd       TRUE
lz4        TRUE
lz4_frame  TRUE
lzo       FALSE
bz2        TRUE
jemalloc  FALSE
mimalloc   TRUE

Memory:
                  
Allocator mimalloc
Current    0 bytes
Max        0 bytes

Runtime:
                        
SIMD Level          none
Detected SIMD Level none

Build:
                           
C++ Library Version  15.0.0
C++ Compiler          Clang
C++ Compiler Version 16.0.6

@jbedo
Copy link
Contributor

jbedo commented Apr 7, 2024

Add name = "r-arrow-${pkgs.arrow-cpp.version}" into the overrides to update the path name too.

@philipp-baumann
Copy link
Contributor Author

We'll have a mismatch in the versions. It reports 15.0.1 (CRAN from cran-packages.nix )in the nix path but it uses the 15.0.0 for the source.

> .libPaths()
 [1] "/Users/philipp/Library/R/arm64/4.3/library"                                               
 [2] "/nix/store/zkvd1icg3k4smsajc4m4bsfmgx67c68j-r-arrow-15.0.1/library"                       
 [3] "/nix/store/hnr0v66cah81941bxzn1g3q90dn8phfk-r-assertthat-0.2.1/library"                   
 [4] "/nix/store/1083x08shvisrak9sg28k6ga1121fw9a-r-bit64-4.0.5/library"                        
 [5] "/nix/store/c352djcc65lgv8ym9w349nd5kb91fy5y-r-bit-4.0.5/library"                          
 [6] "/nix/store/9yjksf1l03iwigyhyhf4x4lxammfb77n-r-cpp11-0.4.7/library"                        
 [7] "/nix/store/krnp8xvda19g32h9zc58516a049dd00n-r-glue-1.7.0/library"                         
 [8] "/nix/store/ywq4nq7h4is4y1sf17ah78bfxyfppwvf-r-purrr-1.0.2/library"                        
 [9] "/nix/store/0khkc353iqsga6capbngcydbdb68z8bg-r-cli-3.6.2/library"                          
[10] "/nix/store/3pr8ia83blvb4i9w14ghdfgh7hfds1dc-r-lifecycle-1.0.4/library"                    
[11] "/nix/store/2vikwgfmmp5sqiw92zizcn6pfacnqp8z-r-rlang-1.1.3/library"                        
[12] "/nix/store/vd5qg62gagixxk7mf0xz2fy16gbbf6lp-r-magrittr-2.0.3/library"                     
[13] "/nix/store/vzkix10k1l19vfk8h7xfl0ajax9l2v1x-r-vctrs-0.6.5/library"                        
[14] "/nix/store/alkqhfhd0a2cx1m73zj9i43h6ab5pig5-r-R6-2.5.1/library"                           
[15] "/nix/store/q136l6pybrivh94xxvd0z7b9kj6azny3-r-tidyselect-1.2.1/library"                   
[16] "/nix/store/svazjq003nxhhxsbajcf4az2hxpl4x72-r-withr-3.0.0/library"                        
[17] "/nix/store/5wgl1kh8klm45pv4g55k5fw9ffdyz070-apple-framework-CoreFoundation-11.0.0/Library"
[18] "/nix/store/7dhbarll7piq26cx0pcig5sk9cd9df21-R-4.3.3/lib/R/library"                        
> .libPaths(.libPaths()[-1])
> arrow::arrow_info()
Arrow package version: 15.0.0

Capabilities:
               
acero      TRUE
dataset    TRUE
substrait  TRUE
parquet    TRUE
json       TRUE
s3         TRUE
gcs       FALSE
utf8proc   TRUE
re2        TRUE
snappy     TRUE
gzip       TRUE
brotli     TRUE
zstd       TRUE
lz4        TRUE
lz4_frame  TRUE
lzo       FALSE
bz2        TRUE
jemalloc  FALSE
mimalloc   TRUE

Memory:
                  
Allocator mimalloc
Current    0 bytes
Max        0 bytes

Runtime:
                        
SIMD Level          none
Detected SIMD Level none

Build:
                           
C++ Library Version  15.0.0
C++ Compiler          Clang
C++ Compiler Version 16.0.6

@jbedo does it justify an extension of the update process/script? because we would then need to have a specific list of packages from CRAN but with specific source handling.

@philipp-baumann
Copy link
Contributor Author

philipp-baumann commented Apr 7, 2024

Add name = "r-arrow-${pkgs.arrow-cpp.version}" into the overrides to update the path name too.

thanks :-)

> arrow::arrow_info()
Arrow package version: 15.0.0

Capabilities:
               
acero      TRUE
dataset    TRUE
substrait  TRUE
parquet    TRUE
json       TRUE
s3         TRUE
gcs       FALSE
utf8proc   TRUE
re2        TRUE
snappy     TRUE
gzip       TRUE
brotli     TRUE
zstd       TRUE
lz4        TRUE
lz4_frame  TRUE
lzo       FALSE
bz2        TRUE
jemalloc  FALSE
mimalloc   TRUE

Memory:
                  
Allocator mimalloc
Current    0 bytes
Max        0 bytes

Runtime:
                        
SIMD Level          none
Detected SIMD Level none

Build:
                           
C++ Library Version  15.0.0
C++ Compiler          Clang
C++ Compiler Version 16.0.6

The main reason this is easier is the time it needs for merging PRs, when arrow-cpp increases in major release on Nixpkgs. Since this now works i agree its a preferable solution. It looks not much happened at v15 on CRAN except probably fixing build stuff for cran comments. https://github.com/apache/arrow/blob/main/r/NEWS.md https://cran.r-project.org/web/packages/arrow/news/news.html https://cran.r-project.org/web/packages/arrow/news/news.html

@ofborg ofborg bot added the 6.topic: darwin Running or building packages on Darwin label Apr 12, 2024
@philipp-baumann
Copy link
Contributor Author

philipp-baumann commented Apr 12, 2024

nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD"

  • on aarch64-darwin
# ...
Auto-merging pkgs/development/r-modules/default.nix
Automatic merge went well; stopped before committing as requested
$ nix-env --extra-experimental-features no-url-literals --option system aarch64-darwin -f <nixpkgs> --nix-path nixpkgs=/Users/philipp/.cache/nixpkgs-review/rev-470a49066c264cd50f197b9817b7101817905205/nixpkgs nixpkgs-overlays=/tmp/tmpcqncjo00 -qaP --xml --out-path --show-trace --no-allow-import-from-derivation --meta
Nothing to be built.
$ /nix/store/97wfvkpc87y75ngqka565wp3djjm072f-nix-2.18.1/bin/nix-shell /Users/philipp/.cache/nixpkgs-review/rev-470a49066c264cd50f197b9817b7101817905205/shell.nix --nix-path nixpkgs=/Users/philipp/.cache/nixpkgs-review/rev-470a49066c264cd50f197b9817b7101817905205/nixpkgs nixpkgs-overlays=/tmp/tmpcqncjo00
  • on x86_64-linux
$ git merge --no-commit --no-ff 470a49066c264cd50f197b9817b7101817905205
Auto-merging pkgs/development/r-modules/default.nix
Automatic merge went well; stopped before committing as requested
$ nix-env --extra-experimental-features no-url-literals --option system x86_64-linux -f /home/spectral-cockpit/.cache/nixpkgs-review/rev-470a49066c264cd50f197b9817b7101817905205/nixpkgs -qaP --xml --out-path --show-trace --no-allow-import-from-derivation --meta
Nothing to be built.

@philipp-baumann
Copy link
Contributor Author

philipp-baumann commented Apr 12, 2024

@jbedo this should be ready like requested, it builds. We do not exactly match CRAN source but it should work well for future releases of arrow-cpp on the development source repo. Thanks!

@jbedo
Copy link
Contributor

jbedo commented Apr 14, 2024

@ofborg build rPackages.arrow

@jbedo jbedo merged commit 7a7403a into NixOS:master Apr 18, 2024
25 checks passed
@philipp-baumann
Copy link
Contributor Author

Thanks @jbedo :-) !

@philipp-baumann philipp-baumann deleted the fix2_rPackages.arrow_14.0.0.2 branch April 21, 2024 17:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants