Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R] configure script for R fails to set PKG_CFLAGS #35140

Closed
atsuik opened this issue Apr 14, 2023 · 13 comments · Fixed by #35147
Closed

[R] configure script for R fails to set PKG_CFLAGS #35140

atsuik opened this issue Apr 14, 2023 · 13 comments · Fixed by #35147
Assignees
Milestone

Comments

@atsuik
Copy link

atsuik commented Apr 14, 2023

Describe the bug, including details regarding any error messages, version, and platform.

arrow version: 11.0.0.3
R version: 4.2.3
platform: arm64

I found that current r/configure fails to set PKG_CFLAGS parsing ARROW_OPTS_CMAKE file.

This is a log I tried to install arrow to my m1 mac.

> install.packages("arrow")
Installing package into ‘/opt/homebrew/lib/R/4.2/site-library’
(as ‘lib’ is unspecified)
trying URL 'https://cran.ism.ac.jp/src/contrib/arrow_11.0.0.3.tar.gz'
Content type 'application/x-gzip' length 3921484 bytes (3.7 MB)
==================================================
downloaded 3.7 MB

* installing *source* package ‘arrow’ ...
files ‘tools/cpp/src/arrow/Testing/Temporary/CTestCostData.txt’, ‘tools/cpp/src/arrow/Testing/Temporary/LastTest.log’ are missing
** using staged installation
*** Arrow C++ libraries found via pkg-config at /opt/homebrew/Cellar/apache-arrow/11.0.0_3/lib
**** Warning: library version mismatch
**** C++ is 11.0.0 but R is 11.0.0.3
**** If installation fails, upgrade the C++ library to match
**** or retry with ARROW_USE_PKG_CONFIG=false
PKG_CFLAGS= 
PKG_LIBS=-L/opt/homebrew/Cellar/apache-arrow/11.0.0_3/lib -larrow

The log shows PKG_CFLAGS is empty.
This causes that capabilities, such as dataset, parquet, etc..., are FALSE.

> arrow_info()
Arrow package version: 11.0.0.3

Capabilities:
               
dataset   FALSE
substrait FALSE
parquet   FALSE
json      FALSE
s3        FALSE
gcs       FALSE
utf8proc   TRUE
re2        TRUE
snappy     TRUE
gzip       TRUE
brotli     TRUE
zstd       TRUE
lz4        TRUE
lz4_frame  TRUE
lzo       FALSE
bz2        TRUE
jemalloc   TRUE
mimalloc  FALSE

Memory:
                  
Allocator jemalloc
Current    0 bytes
Max        0 bytes

Runtime:
                        
SIMD Level          none
Detected SIMD Level none

Build:
                                    
C++ Library Version           11.0.0
C++ Compiler              AppleClang
C++ Compiler Version 14.0.3.14030022

This is caused by $LIB_DIR is empty at this line

ARROW_OPTS_CMAKE="$LIB_DIR/cmake/Arrow/ArrowOptions.cmake"

I found that the deletion of line at this pull request
https://github.com/apache/arrow/pull/14235/files#diff-089697faebdb7820ca629a2bb316b878cc0ba18a5bfb0b60996f8dbcd1fa11e7L234
causes this problem.

LIB_DIR=`echo $PKG_DIRS | sed -e 's/^-L//'`

I have already checked that to recover this line resolves this problem.

This is the log.

install.packages("./arrow", repos=NULL, type="source")
Installing package into ‘/opt/homebrew/lib/R/4.2/site-library’
(as ‘lib’ is unspecified)
* installing *source* package ‘arrow’ ...
files ‘tools/cpp/src/arrow/Testing/Temporary/CTestCostData.txt’, ‘tools/cpp/src/arrow/Testing/Temporary/LastTest.log’ are missing
file ‘configure’ has the wrong MD5 checksum
** using staged installation
*** Arrow C++ libraries found via pkg-config at /opt/homebrew/Cellar/apache-arrow/11.0.0_3/lib
**** Warning: library version mismatch
**** C++ is 11.0.0 but R is 11.0.0.3
**** If installation fails, upgrade the C++ library to match
**** or retry with ARROW_USE_PKG_CONFIG=false
PKG_CFLAGS=  -DARROW_R_WITH_PARQUET -DARROW_R_WITH_DATASET -DARROW_R_WITH_JSON -DARROW_R_WITH_S3
PKG_LIBS=-L/opt/homebrew/Cellar/apache-arrow/11.0.0_3/lib -larrow_dataset -lparquet -larrow 
** libs
make: Nothing to be done for `all'.
installing to /opt/homebrew/lib/R/4.2/site-library/00LOCK-arrow/00new/arrow/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (arrow)
> library(arrow)
Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.

Attaching package: ‘arrow’

The following object is masked from ‘package:utils’:

    timestamp

> arrow_info()
Arrow package version: 11.0.0.3

Capabilities:
               
dataset    TRUE
substrait FALSE
parquet    TRUE
json       TRUE
s3         TRUE
gcs       FALSE
utf8proc   TRUE
re2        TRUE
snappy     TRUE
gzip       TRUE
brotli     TRUE
zstd       TRUE
lz4        TRUE
lz4_frame  TRUE
lzo       FALSE
bz2        TRUE
jemalloc   TRUE
mimalloc  FALSE

Memory:
                  
Allocator jemalloc
Current    0 bytes
Max        0 bytes

Runtime:
                        
SIMD Level          none
Detected SIMD Level none

Build:
                                    
C++ Library Version           11.0.0
C++ Compiler              AppleClang
C++ Compiler Version 14.0.3.14030022

Component(s)

R

atsuik added a commit to atsuik/arrow that referenced this issue Apr 14, 2023
@paleolimbot
Copy link
Member

I wonder if the root of your problem is that you have Arrow installed via homebrew which is getting picked up by pkg-config

Is there a reason you need to install the package from source? The binary install from CRAN is the preferred way to install on MacOS since it already contains all the components noted.

@nealrichardson
Copy link
Member

I found that the deletion of line at this pull request
https://github.com/apache/arrow/pull/14235/files#diff-089697faebdb7820ca629a2bb316b878cc0ba18a5bfb0b60996f8dbcd1fa11e7L234
causes this problem.

If I recall, the point of the change you identified was that we're using the generated cmake configuration to understand the features of the C++ library. So I would guess then that the Homebrew package isn't including these cmake files, and it probably should.

@paleolimbot
Copy link
Member

The homebrew package does, but the pkg-config name for that is arrow-dataset not arrow as we test for:

ls /opt/homebrew/Cellar/apache-arrow/11.0.0_2/lib/pkgconfig/
arrow-compute.pc	arrow-flight-sql.pc	arrow.pc
arrow-csv.pc		arrow-flight.pc		gandiva.pc
arrow-dataset.pc	arrow-json.pc		parquet.pc
arrow-filesystem.pc	arrow-orc.pc		plasma.pc

That's a problem wherever a system install of Arrow exists (e.g., #31989 )

@nealrichardson
Copy link
Member

But this is about cmake and not pkg-config. It turns out the cmake files are present in the homebrew build:

% ls /usr/local/Cellar/apache-arrow/11.0.0_3/lib/cmake/Arrow
ArrowConfig.cmake		FindBrotliAlt.cmake		Findlz4Alt.cmake
ArrowConfigVersion.cmake	FindOpenSSLAlt.cmake		Findre2Alt.cmake
ArrowOptions.cmake		FindProtobufAlt.cmake		Findutf8proc.cmake
ArrowTargets-release.cmake	FindSnappyAlt.cmake		FindzstdAlt.cmake
ArrowTargets.cmake		FindThriftAlt.cmake		arrow-config.cmake
FindAWSSDKAlt.cmake		FindgRPCAlt.cmake

I misread the original report, it's about LIB_DIR not being set in the case where the libs are found by pkg-config, and that's why ArrowOptions.cmake isn't being found. I think restoring the deleted line they mentioned, probably protected inside if [ "$LIB_DIR" = "" ]; for the other cases where LIB_DIR does get set correctly, would fix this (and possibly #31989?).

arrow vs. arrow-dataset isn't relevant here because all of the option flags for all of the libs are in lib/cmake/Arrow/ArrowOptions.cmake.

@nealrichardson
Copy link
Member

conda-forge/r-arrow-feedstock#56 (comment) may be another manifestation of this (or may not, I haven't investigated)

nealrichardson added a commit to nealrichardson/arrow that referenced this issue Apr 25, 2023
nealrichardson added a commit to nealrichardson/arrow that referenced this issue Apr 25, 2023
@nealrichardson
Copy link
Member

In conda-forge/r-arrow-feedstock#63 we found a workaround for this for 11.0.0: in your case, set the env var LIB_DIR=/opt/homebrew/Cellar/apache-arrow/11.0.0_3/lib and it should find the features correctly. It turns out this has already been fixed in main, so it will no longer be an issue in the upcoming release.

@kou
Copy link
Member

kou commented Apr 25, 2023

FYI: We can use LIB_DIR=$(brew --prefix apache-arrow)/lib instead of LIB_DIR=/opt/homebrew/Cellar/apache-arrow/11.0.0_3/lib.

@nealrichardson
Copy link
Member

Oh yes, of course. I wasn't trying to suggest a robust solution since this will be no longer useful after the 12.0.0 release closes.

nealrichardson added a commit to nealrichardson/arrow that referenced this issue Apr 30, 2023
nealrichardson added a commit to nealrichardson/arrow that referenced this issue Apr 30, 2023
nealrichardson added a commit that referenced this issue May 3, 2023
…ched libarrow (#35147)

I've significantly rewritten `r/configure` to make it easier to reason about and harder for issues like #34229 and #35140 to happen. I've also added a version check to make sure that we don't obviously try to use a system C++ library that doesn't match the R package version. Making sure this was applied in all of the right places and handling what to do if the versions didn't match was the impetus for the whole refactor. 

`configure` has been broken up into some functions, and the flow of the script is, as is documented at the top of the file:

```
# * Find libarrow on the system. If it is present, make sure
#   that its version is compatible with the R package.
# * If no suitable libarrow is found, download it (where allowed)
#   or build it from source.
# * Determine what features this libarrow has and what other
#   flags it requires, and set them in src/Makevars for use when
#   compiling the bindings.
# * Run a test program to confirm that arrow headers are found
```

All of the detection of CFLAGS and `-L` dirs etc. happen in one place now, and they all prefer using `pkg-config` to read from the libarrow build what libraries and flags it requires, rather than hard-coding. (autobrew is the only remaining exception, but I didn't feel like messing with that today.) This should make the builds more future proof, should make it so more build configurations work (e.g. I suspect that a static build in ARROW_HOME wouldn't have gotten picked up correctly because it didn't add `-larrow_bundled_dependencies` to the libs, but now it will), and it may eliminate the redundant `-l` and `-D` setting I've observed in some builds (not harmful but definitely sloppy).

Version checking has been added in an R script for ease of testing (and for easier handling of arithmetic), and there is an accompanying `test-check-versions.R` added. These are run on all the builds that use `ci/scripts/r_test.sh`. 

### Behavior changes

* If libarrow is found on the system (via ARROW_HOME, pkg-config, or brew), but the version does not match, it will not be used, and we will try a bundled build. This should mean that users installing a released version will never have libarrow version problems. 
* If both the found C++ library and R package are on matching dev versions (i.e. not identical given the x.y.z.9000 vs x+1.y.z-SNAPSHOT difference), it will proceed with a warning that you may need to rebuild if there are issues. This means that regular developers will see an extra message in the build output.
* autobrew is only used on a release version unless you set FORCE_AUTOBREW=true. This eliminates another source of version mismatches (C++ release version, R dev version).
* The path where you could set `LIB_DIR` and `INCLUDE_DIR` env vars has been removed. Use `ARROW_HOME` instead.

* Closes: #35140
* Closes: #31989

Lead-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
@nealrichardson nealrichardson added this to the 13.0.0 milestone May 3, 2023
liujiacheng777 pushed a commit to LoongArch-Python/arrow that referenced this issue May 11, 2023
…mismatched libarrow (apache#35147)

I've significantly rewritten `r/configure` to make it easier to reason about and harder for issues like apache#34229 and apache#35140 to happen. I've also added a version check to make sure that we don't obviously try to use a system C++ library that doesn't match the R package version. Making sure this was applied in all of the right places and handling what to do if the versions didn't match was the impetus for the whole refactor. 

`configure` has been broken up into some functions, and the flow of the script is, as is documented at the top of the file:

```
# * Find libarrow on the system. If it is present, make sure
#   that its version is compatible with the R package.
# * If no suitable libarrow is found, download it (where allowed)
#   or build it from source.
# * Determine what features this libarrow has and what other
#   flags it requires, and set them in src/Makevars for use when
#   compiling the bindings.
# * Run a test program to confirm that arrow headers are found
```

All of the detection of CFLAGS and `-L` dirs etc. happen in one place now, and they all prefer using `pkg-config` to read from the libarrow build what libraries and flags it requires, rather than hard-coding. (autobrew is the only remaining exception, but I didn't feel like messing with that today.) This should make the builds more future proof, should make it so more build configurations work (e.g. I suspect that a static build in ARROW_HOME wouldn't have gotten picked up correctly because it didn't add `-larrow_bundled_dependencies` to the libs, but now it will), and it may eliminate the redundant `-l` and `-D` setting I've observed in some builds (not harmful but definitely sloppy).

Version checking has been added in an R script for ease of testing (and for easier handling of arithmetic), and there is an accompanying `test-check-versions.R` added. These are run on all the builds that use `ci/scripts/r_test.sh`. 

### Behavior changes

* If libarrow is found on the system (via ARROW_HOME, pkg-config, or brew), but the version does not match, it will not be used, and we will try a bundled build. This should mean that users installing a released version will never have libarrow version problems. 
* If both the found C++ library and R package are on matching dev versions (i.e. not identical given the x.y.z.9000 vs x+1.y.z-SNAPSHOT difference), it will proceed with a warning that you may need to rebuild if there are issues. This means that regular developers will see an extra message in the build output.
* autobrew is only used on a release version unless you set FORCE_AUTOBREW=true. This eliminates another source of version mismatches (C++ release version, R dev version).
* The path where you could set `LIB_DIR` and `INCLUDE_DIR` env vars has been removed. Use `ARROW_HOME` instead.

* Closes: apache#35140
* Closes: apache#31989

Lead-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
ArgusLi pushed a commit to Bit-Quill/arrow that referenced this issue May 15, 2023
…mismatched libarrow (apache#35147)

I've significantly rewritten `r/configure` to make it easier to reason about and harder for issues like apache#34229 and apache#35140 to happen. I've also added a version check to make sure that we don't obviously try to use a system C++ library that doesn't match the R package version. Making sure this was applied in all of the right places and handling what to do if the versions didn't match was the impetus for the whole refactor. 

`configure` has been broken up into some functions, and the flow of the script is, as is documented at the top of the file:

```
# * Find libarrow on the system. If it is present, make sure
#   that its version is compatible with the R package.
# * If no suitable libarrow is found, download it (where allowed)
#   or build it from source.
# * Determine what features this libarrow has and what other
#   flags it requires, and set them in src/Makevars for use when
#   compiling the bindings.
# * Run a test program to confirm that arrow headers are found
```

All of the detection of CFLAGS and `-L` dirs etc. happen in one place now, and they all prefer using `pkg-config` to read from the libarrow build what libraries and flags it requires, rather than hard-coding. (autobrew is the only remaining exception, but I didn't feel like messing with that today.) This should make the builds more future proof, should make it so more build configurations work (e.g. I suspect that a static build in ARROW_HOME wouldn't have gotten picked up correctly because it didn't add `-larrow_bundled_dependencies` to the libs, but now it will), and it may eliminate the redundant `-l` and `-D` setting I've observed in some builds (not harmful but definitely sloppy).

Version checking has been added in an R script for ease of testing (and for easier handling of arithmetic), and there is an accompanying `test-check-versions.R` added. These are run on all the builds that use `ci/scripts/r_test.sh`. 

### Behavior changes

* If libarrow is found on the system (via ARROW_HOME, pkg-config, or brew), but the version does not match, it will not be used, and we will try a bundled build. This should mean that users installing a released version will never have libarrow version problems. 
* If both the found C++ library and R package are on matching dev versions (i.e. not identical given the x.y.z.9000 vs x+1.y.z-SNAPSHOT difference), it will proceed with a warning that you may need to rebuild if there are issues. This means that regular developers will see an extra message in the build output.
* autobrew is only used on a release version unless you set FORCE_AUTOBREW=true. This eliminates another source of version mismatches (C++ release version, R dev version).
* The path where you could set `LIB_DIR` and `INCLUDE_DIR` env vars has been removed. Use `ARROW_HOME` instead.

* Closes: apache#35140
* Closes: apache#31989

Lead-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
rtpsw pushed a commit to rtpsw/arrow that referenced this issue May 16, 2023
…mismatched libarrow (apache#35147)

I've significantly rewritten `r/configure` to make it easier to reason about and harder for issues like apache#34229 and apache#35140 to happen. I've also added a version check to make sure that we don't obviously try to use a system C++ library that doesn't match the R package version. Making sure this was applied in all of the right places and handling what to do if the versions didn't match was the impetus for the whole refactor. 

`configure` has been broken up into some functions, and the flow of the script is, as is documented at the top of the file:

```
# * Find libarrow on the system. If it is present, make sure
#   that its version is compatible with the R package.
# * If no suitable libarrow is found, download it (where allowed)
#   or build it from source.
# * Determine what features this libarrow has and what other
#   flags it requires, and set them in src/Makevars for use when
#   compiling the bindings.
# * Run a test program to confirm that arrow headers are found
```

All of the detection of CFLAGS and `-L` dirs etc. happen in one place now, and they all prefer using `pkg-config` to read from the libarrow build what libraries and flags it requires, rather than hard-coding. (autobrew is the only remaining exception, but I didn't feel like messing with that today.) This should make the builds more future proof, should make it so more build configurations work (e.g. I suspect that a static build in ARROW_HOME wouldn't have gotten picked up correctly because it didn't add `-larrow_bundled_dependencies` to the libs, but now it will), and it may eliminate the redundant `-l` and `-D` setting I've observed in some builds (not harmful but definitely sloppy).

Version checking has been added in an R script for ease of testing (and for easier handling of arithmetic), and there is an accompanying `test-check-versions.R` added. These are run on all the builds that use `ci/scripts/r_test.sh`. 

### Behavior changes

* If libarrow is found on the system (via ARROW_HOME, pkg-config, or brew), but the version does not match, it will not be used, and we will try a bundled build. This should mean that users installing a released version will never have libarrow version problems. 
* If both the found C++ library and R package are on matching dev versions (i.e. not identical given the x.y.z.9000 vs x+1.y.z-SNAPSHOT difference), it will proceed with a warning that you may need to rebuild if there are issues. This means that regular developers will see an extra message in the build output.
* autobrew is only used on a release version unless you set FORCE_AUTOBREW=true. This eliminates another source of version mismatches (C++ release version, R dev version).
* The path where you could set `LIB_DIR` and `INCLUDE_DIR` env vars has been removed. Use `ARROW_HOME` instead.

* Closes: apache#35140
* Closes: apache#31989

Lead-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
@vkhodygo
Copy link

vkhodygo commented Jul 3, 2023

@nealrichardson just to make this clear: is it included into 12.0.1? Judging by the timestamps it should be, but the release files do not reflect these changes.

@nealrichardson
Copy link
Member

@vkhodygo no it is not, the PR that resolved this, #35147, was a significant rewrite and not a mere bugfix, so it was not included in the 12.0.1 patch release. But 13.0.0 should be coming in the next couple of weeks, and it will be in that.

@vkhodygo
Copy link

vkhodygo commented Jul 4, 2023

@nealrichardson Thanks for the info, do you think you could provide any more accurate ETA for v13.0.0? Also, is there a temporary fix for this? I could use 13.0.0dev, but I'm not sure it'll work properly.

@nealrichardson
Copy link
Member

@nealrichardson Thanks for the info, do you think you could provide any more accurate ETA for v13.0.0?

No, but it will be discussed on the dev@arrow.apache.org mailing list, so you can follow the discussion there.

Also, is there a temporary fix for this? I could use 13.0.0dev, but I'm not sure it'll work properly.

I suggested a workaround above, does that work for you?

@vkhodygo
Copy link

vkhodygo commented Jul 20, 2023

@nealrichardson

I suggested a workaround above, does that work for you?

Apologies, I decided to put this on hold and wait for the official release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants