ARROW-8103: [R] Make default Linux build more minimal
The status quo on Linux is that installing the R package produces a shell package that tells you to install the C++ libraries and reinstall, unless you either (1) have already installed system packages or a dev build, or (2) set an environment variable: `LIBARROW_DOWNLOAD=true`, or `NOT_CRAN=true`, which also works because many R users already have it set locally. In case (2), installation looks for a binary to download and falls back to downloading and building C++ from source.

This patch changes the default behavior when no system packages are found: instead of building without Arrow C++, installation will attempt to download and build the C++ dependencies from source, with optional features (compression libraries _but not jemalloc_) turned off. Setting the environment variable `LIBARROW_MINIMAL=false` turns those optional features on, or they can be toggled individually with cmake-flavored env vars (e.g. `ARROW_WITH_SNAPPY=ON`). If `LIBARROW_DOWNLOAD=true` or `NOT_CRAN=true` is set, installation will, as before, look for a binary to download and, if none is found, build from source with `LIBARROW_MINIMAL=false`.
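
The defaulting rules above can be sketched as a small shell function. This is only an illustration, assuming the rules are applied in the order described; `resolve_libarrow_env` is a hypothetical helper that is not part of this patch, and reporting an unset `LIBARROW_DOWNLOAD` as `false` and an unset `LIBARROW_MINIMAL` as `true` is a labeling choice made for the sketch.

```shell
# Hypothetical helper (not part of the patch): prints the effective settings
# for a given combination of NOT_CRAN, LIBARROW_DOWNLOAD and LIBARROW_MINIMAL.
# The body runs in a subshell so the variables do not leak into the caller.
resolve_libarrow_env() (
  not_cran="$1"
  download="$2"
  minimal="$3"
  # NOT_CRAN is used as a fallback when LIBARROW_DOWNLOAD is not set
  if [ "$download" = "" ] && [ "$not_cran" != "" ]; then
    download="$not_cran"
  fi
  # Opting in to downloads implies a full-featured build unless overridden
  if [ "$minimal" = "" ] && [ "$download" = "true" ]; then
    minimal=false
  fi
  # NOT_CRAN=true also implies a full-featured build unless overridden
  if [ "$minimal" = "" ] && [ "$not_cran" = "true" ]; then
    minimal=false
  fi
  # Assumption for display only: unset values are shown as the new defaults
  echo "download=${download:-false} minimal=${minimal:-true}"
)

resolve_libarrow_env "" "" ""          # nothing set: minimal source build
resolve_libarrow_env "true" "" ""      # NOT_CRAN=true: binary, else full build
resolve_libarrow_env "" "" "false"     # only LIBARROW_MINIMAL=false
```

An explicit `LIBARROW_MINIMAL` value is never overwritten in this sketch, which matches the description: the implied `false` only applies when the variable is empty.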

In sum, this means that we will always attempt to build the C++ libs from source on CRAN (and thus on everyone's Linux machines, assuming they don't already have Arrow C++ libs), as minimally as we can while still producing a fully functional `arrow` package. The do-nothing fallback shell package should only result if the C++ build fails for some reason, which we have found to be highly unlikely now that we've eliminated the flex and bison dependencies: having enough of a C++ toolchain for R and Rcpp is sufficient.

Closes #6647 from nealrichardson/build-on-cran

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
nealrichardson committed Mar 21, 2020
1 parent 54b87c8 commit b57b955
Showing 13 changed files with 207 additions and 64 deletions.
6 changes: 6 additions & 0 deletions ci/scripts/r_deps.sh
@@ -30,3 +30,9 @@ ${R_BIN} -e "remotes::install_deps(dependencies = TRUE)"
${R_BIN} -e "remotes::install_github('nealrichardson/decor')"

popd

if [ "$RPREFIX" = "/opt/R-devel" ]; then
# We need this on R-devel, which we test on rhub images, which have this env var set
curl -L https://sourceforge.net/projects/checkbaskisms/files/2.0.0.2/checkbashisms/download > /usr/local/bin/checkbashisms
chmod 755 /usr/local/bin/checkbashisms
fi
5 changes: 4 additions & 1 deletion ci/scripts/r_test.sh
@@ -34,6 +34,8 @@ export _R_CHECK_COMPILATION_FLAGS_KNOWN_=${ARROW_R_CXXFLAGS}
if [ "$ARROW_R_DEV" = "TRUE" ]; then
# These are used in the Arrow C++ build and are not a problem
export _R_CHECK_COMPILATION_FLAGS_KNOWN_="${_R_CHECK_COMPILATION_FLAGS_KNOWN_} -Wno-attributes -msse4.2"
# Note that NOT_CRAN=true means (among other things) that optional dependencies are built
export NOT_CRAN=true
fi
export TEST_R_WITH_ARROW=TRUE
export _R_CHECK_TESTS_NLINES_=0
@@ -46,7 +48,8 @@ BEFORE=$(ls -alh ~/)

# Conditionally run --as-cran because crossbow jobs aren't using _R_CHECK_COMPILATION_FLAGS_KNOWN_
# (maybe an R version thing, needs 3.6.2?)
${R_BIN} -e "rcmdcheck::rcmdcheck(build_args = '--no-build-vignettes', args = c('--no-manual', if (!identical(Sys.getenv('ARROW_R_DEV'), 'TRUE')) '--as-cran', '--ignore-vignettes', '--run-donttest'), error_on = 'warning', check_dir = 'check')"
# Also only --run-donttest if NOT_CRAN because Parquet example requires snappy (optional dependency)
${R_BIN} -e "cran <- !identical(tolower(Sys.getenv('NOT_CRAN')), 'true'); rcmdcheck::rcmdcheck(build_args = '--no-build-vignettes', args = c('--no-manual', '--ignore-vignettes', ifelse(cran, '--as-cran', '--run-donttest')), error_on = 'warning', check_dir = 'check')"

AFTER=$(ls -alh ~/)
if [ "$BEFORE" != "$AFTER" ]; then
3 changes: 2 additions & 1 deletion dev/tasks/r/azure.linux.yml
@@ -47,7 +47,8 @@ jobs:
R_IMAGE={{ r_image }}
R_TAG={{ r_tag }}
# we have to export this (right?) because we need it in the build env
export ARROW_R_DEV={{ verbose }}
export ARROW_R_DEV={{ not_cran }}
# Note that ci/scripts/r_test.sh sets NOT_CRAN=true if ARROW_R_DEV=TRUE
docker-compose run r
displayName: Docker run
67 changes: 67 additions & 0 deletions dev/tasks/r/github.linux.cran.yml
@@ -0,0 +1,67 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# NOTE: must set "Crossbow" as name to have the badge links working in the
# github comment reports!
name: Crossbow

on:
  push:
    branches:
      - "*-github-*"

jobs:
  as-cran:
    name: "rhub/{{ MATRIX }}"
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        # See https://hub.docker.com/r/rhub
        r_image:
          - debian-gcc-devel
          - debian-gcc-patched
          - debian-gcc-release
          - fedora-gcc-devel
          - fedora-clang-devel
    env:
      R_ORG: "rhub"
      R_IMAGE: {{ MATRIX }}
      R_TAG: "latest"
      ARROW_R_DEV: "FALSE"
    steps:
      - name: Checkout Arrow
        run: |
          git clone --no-checkout {{ arrow.remote }} arrow
          git -C arrow fetch -t {{ arrow.remote }} {{ arrow.branch }}
          git -C arrow checkout FETCH_HEAD
          git -C arrow submodule update --init --recursive
      - name: Fetch Submodules and Tags
        shell: bash
        run: cd arrow && ci/scripts/util_checkout.sh
      - name: Docker Pull
        shell: bash
        run: cd arrow && docker-compose pull --ignore-pull-failures r
      - name: Docker Build
        shell: bash
        run: cd arrow && docker-compose build r
      - name: Docker Run
        shell: bash
        run: cd arrow && docker-compose run r
      - name: Dump install logs on failure
        if: failure()
        run: cat arrow/r/check/arrow.Rcheck/00install.out
34 changes: 15 additions & 19 deletions dev/tasks/tasks.yml
@@ -104,8 +104,8 @@ groups:
- test-ubuntu-18.04-python-3
- test-fedora-30-python-3
- test-r-rhub-ubuntu-gcc-release
- test-r-rhub-debian-gcc-devel
- test-r-rocker-r-base-latest
- test-r-linux-as-cran
- test-r-rocker-r-base-latest
- test-r-rstudio-r-base-3.6-bionic
- test-r-rstudio-r-base-3.6-centos6
- test-r-rstudio-r-base-3.6-opensuse15
@@ -149,7 +149,7 @@ groups:
- test-ubuntu-18.04-python-3
- test-fedora-30-python-3
- test-r-rhub-ubuntu-gcc-release
- test-r-rhub-debian-gcc-devel
- test-r-linux-as-cran
- test-r-rocker-r-base-latest
- test-r-rstudio-r-base-3.6-bionic
- test-r-rstudio-r-base-3.6-centos6
@@ -187,7 +187,7 @@ groups:
- test-conda-r-3.6
- test-ubuntu-18.04-r-sanitizer
- test-r-rhub-ubuntu-gcc-release
- test-r-rhub-debian-gcc-devel
- test-r-linux-as-cran
- test-r-rocker-r-base-latest
- test-r-rstudio-r-base-3.6-bionic
- test-r-rstudio-r-base-3.6-centos6
@@ -276,7 +276,7 @@ groups:
- test-ubuntu-18.04-python-3
- test-fedora-30-python-3
- test-r-rhub-ubuntu-gcc-release
- test-r-rhub-debian-gcc-devel
- test-r-linux-as-cran
- test-r-rocker-r-base-latest
- test-r-rstudio-r-base-3.6-bionic
- test-r-rstudio-r-base-3.6-centos6
@@ -1850,15 +1850,12 @@ tasks:
- docker-compose build fedora-python
- docker-compose run fedora-python

test-r-rhub-debian-gcc-devel:
ci: azure
test-r-linux-as-cran:
ci: github
platform: linux
template: r/azure.linux.yml
template: r/github.linux.cran.yml
params:
r_org: rhub
r_image: debian-gcc-devel
r_tag: latest
verbose: "TRUE"
MATRIX: "${{ matrix.r_image }}"

test-r-rhub-ubuntu-gcc-release:
ci: azure
@@ -1868,7 +1865,7 @@ tasks:
r_org: rhub
r_image: ubuntu-gcc-release
r_tag: latest
verbose: "TRUE"
not_cran: "TRUE"

test-r-rocker-r-base-latest:
ci: azure
@@ -1878,7 +1875,7 @@ tasks:
r_org: rocker
r_image: r-base
r_tag: latest
verbose: "TRUE"
not_cran: "TRUE"

test-r-rstudio-r-base-3.6-bionic:
ci: azure
@@ -1888,8 +1885,7 @@ tasks:
r_org: rstudio
r_image: r-base
r_tag: 3.6-bionic
# Turn off verbosity on some tests in order to check compiler flags
verbose: "FALSE"
not_cran: "TRUE"

test-r-rstudio-r-base-3.6-centos6:
ci: azure
@@ -1899,7 +1895,7 @@ tasks:
r_org: rstudio
r_image: r-base
r_tag: 3.6-centos6
verbose: "FALSE"
not_cran: "TRUE"

test-r-rstudio-r-base-3.6-opensuse15:
ci: azure
@@ -1909,7 +1905,7 @@ tasks:
r_org: rstudio
r_image: r-base
r_tag: 3.6-opensuse15
verbose: "TRUE"
not_cran: "TRUE"

test-r-rstudio-r-base-3.6-opensuse42:
ci: azure
@@ -1919,7 +1915,7 @@ tasks:
r_org: rstudio
r_image: r-base
r_tag: 3.6-opensuse42
verbose: "TRUE"
not_cran: "TRUE"

test-ubuntu-18.04-r-3.6:
ci: circle
1 change: 0 additions & 1 deletion docker-compose.yml
@@ -824,7 +824,6 @@ services:
base: ${R_ORG}/${R_IMAGE}:${R_TAG}
shm_size: *shm-size
environment:
NOT_CRAN: "true"
LIBARROW_DOWNLOAD: "false"
ARROW_HOME: "/arrow"
ARROW_USE_PKG_CONFIG: "false"
13 changes: 10 additions & 3 deletions r/R/install-arrow.R
@@ -34,6 +34,9 @@
#' @param use_system logical: Should we use `pkg-config` to look for Arrow
#' system packages? Default is `FALSE`. If `TRUE`, source installation may be
#' faster, but there is a risk of version mismatch.
#' @param minimal logical: If building from source, should we build without
#' optional dependencies (compression libraries, for example)? Default is
#' `FALSE`.
#' @param repos character vector of base URLs of the repositories to install
#' from (passed to `install.packages()`)
#' @param ... Additional arguments passed to `install.packages()`
@@ -45,12 +48,16 @@
install_arrow <- function(nightly = FALSE,
binary = Sys.getenv("LIBARROW_BINARY", TRUE),
use_system = Sys.getenv("ARROW_USE_PKG_CONFIG", FALSE),
minimal = Sys.getenv("LIBARROW_MINIMAL", FALSE),
repos = getOption("repos"),
...) {
if (tolower(Sys.info()[["sysname"]]) %in% c("windows", "darwin", "linux")) {
Sys.setenv(LIBARROW_DOWNLOAD = "true")
Sys.setenv(LIBARROW_BINARY = binary)
Sys.setenv(ARROW_USE_PKG_CONFIG = use_system)
Sys.setenv(
LIBARROW_DOWNLOAD = "true",
LIBARROW_BINARY = binary,
LIBARROW_MINIMAL = minimal,
ARROW_USE_PKG_CONFIG = use_system
)
install.packages("arrow", repos = arrow_repos(repos, nightly), ...)
if ("arrow" %in% loadedNamespaces()) {
# If you've just sourced this file, "arrow" won't be (re)loaded
6 changes: 3 additions & 3 deletions r/README.md
@@ -33,8 +33,8 @@ install.packages("arrow")
Installing a released version of the `arrow` package should require no
additional system dependencies. For macOS and Windows, CRAN hosts binary
packages that contain the Arrow C++ library. On Linux, source package
installation will download necessary C++ dependencies if you set the
environment variable `LIBARROW_DOWNLOAD=true`.
installation will also build necessary C++ dependencies. For a faster,
more complete installation, set the environment variable `NOT_CRAN=true`.
See `vignette("install", package = "arrow")` for details.

If you install the `arrow` package from source and the C++ library is
@@ -61,7 +61,7 @@ Conda users on Linux and macOS can install `arrow` from conda-forge with

## Installing a development version

Binary R packages for macOS and Windows are built daily and hosted at
Development versions of the package (binary and source) are built daily and hosted at
<https://dl.bintray.com/ursalabs/arrow-r/>. To install from there:

``` r
19 changes: 15 additions & 4 deletions r/configure
@@ -112,15 +112,26 @@ else
PKG_CFLAGS="-I$BREWDIR/opt/$PKG_BREW_NAME/include"
PKG_LIBS="-L$BREWDIR/opt/$PKG_BREW_NAME/lib $PKG_LIBS"
elif [ "$UNAME" = "Linux" ]; then
# Set some default values/backwards compatibility
if [ "${LIBARROW_DOWNLOAD}" = "" ] && [ "${NOT_CRAN}" != "" ]; then
export LIBARROW_DOWNLOAD=$NOT_CRAN
fi
if [ "${LIBARROW_BINARY}" = "" ] && [ "${NOT_CRAN}" != "" ]; then
export LIBARROW_BINARY=$NOT_CRAN
fi
if [ "${LIBARROW_MINIMAL}" = "" ] && [ "${LIBARROW_DOWNLOAD}" = "true" ]; then
export LIBARROW_MINIMAL=false
fi
if [ "${LIBARROW_MINIMAL}" = "" ] && [ "${NOT_CRAN}" = "true" ]; then
export LIBARROW_MINIMAL=false
fi
${R_HOME}/bin/Rscript tools/linuxlibs.R $VERSION
PKG_CFLAGS="-I$(pwd)/libarrow/arrow-${VERSION}/include $PKG_CFLAGS"
PKG_LIBS="-L$(pwd)/libarrow/arrow-${VERSION}/lib $PKG_LIBS"
# Also enumerate the static libs included in there
# TODO: this should be generated based on what's in the lib dir
PKG_LIBS="$PKG_LIBS -lthrift -lsnappy -lz -lzstd -llz4 -lbrotlidec-static -lbrotlienc-static -lbrotlicommon-static -lboost_filesystem -lboost_regex -lboost_system -ljemalloc_pic"
# Also enumerate the static libs (technically repeating arrow libs so they're in the right order)
# TODO: there must be a better way; also what about non-bundled deps?
BUNDLED_LIBS=`cd libarrow/arrow-${VERSION}/lib && ls *.a`
BUNDLED_LIBS=`echo $BUNDLED_LIBS | sed -E "s/lib(.*)\.a/-l\1/" | sed -e "s/\\.a lib/ -l/g"`
PKG_LIBS="-L$(pwd)/libarrow/arrow-${VERSION}/lib $PKG_LIBS $BUNDLED_LIBS"
fi
fi
fi
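
To see what the two `sed` passes on `BUNDLED_LIBS` produce, here is a standalone sketch. `libs_to_flags` is a hypothetical name used only for illustration, and the file names are sample inputs rather than the actual contents of the lib dir.

```shell
# Hypothetical helper: turn a space-separated list of static library file
# names into linker flags, using the same two substitutions as the patch.
libs_to_flags() {
  # "$*" joins the arguments with single spaces, like the unquoted echo
  # of the `ls *.a` output does.
  # Pass 1 is greedy: it rewrites only the first "lib" prefix and strips
  # the final ".a" of the whole line.
  # Pass 2 rewrites every remaining ".a lib" boundary into " -l".
  echo "$*" | sed -E 's/lib(.*)\.a/-l\1/' | sed -e 's/\.a lib/ -l/g'
}

libs_to_flags libarrow.a libparquet.a libsnappy.a
# prints: -larrow -lparquet -lsnappy
```

The flags come out in the same order as the input list, so the alphabetical order of the files in the lib dir becomes the link order.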
32 changes: 24 additions & 8 deletions r/inst/build_arrow_static.sh
@@ -43,6 +43,16 @@ if [ "$CMAKE_GENERATOR" = "" ]; then
fi
fi

if [ "$LIBARROW_MINIMAL" = "false" ]; then
ARROW_JEMALLOC=ON
ARROW_WITH_BROTLI=ON
ARROW_WITH_BZ2=ON
ARROW_WITH_LZ4=ON
ARROW_WITH_SNAPPY=ON
ARROW_WITH_ZLIB=ON
ARROW_WITH_ZSTD=ON
fi

mkdir -p "${BUILD_DIR}"
pushd "${BUILD_DIR}"
${CMAKE} -DARROW_BOOST_USE_SHARED=OFF \
@@ -52,17 +62,17 @@ ${CMAKE} -DARROW_BOOST_USE_SHARED=OFF \
-DARROW_COMPUTE=ON \
-DARROW_CSV=ON \
-DARROW_DATASET=ON \
-DARROW_DEPENDENCY_SOURCE=BUNDLED \
-DARROW_DEPENDENCY_SOURCE=${ARROW_DEPENDENCY_SOURCE:-AUTO} \
-DARROW_FILESYSTEM=ON \
-DARROW_JEMALLOC=ON \
-DARROW_JEMALLOC=${ARROW_JEMALLOC:-ON} \
-DARROW_JSON=ON \
-DARROW_PARQUET=ON \
-DARROW_WITH_BROTLI=ON \
-DARROW_WITH_BZ2=ON \
-DARROW_WITH_LZ4=ON \
-DARROW_WITH_SNAPPY=ON \
-DARROW_WITH_ZLIB=ON \
-DARROW_WITH_ZSTD=ON \
-DARROW_WITH_BROTLI=${ARROW_WITH_BROTLI:-OFF} \
-DARROW_WITH_BZ2=${ARROW_WITH_BZ2:-OFF} \
-DARROW_WITH_LZ4=${ARROW_WITH_LZ4:-OFF} \
-DARROW_WITH_SNAPPY=${ARROW_WITH_SNAPPY:-OFF} \
-DARROW_WITH_ZLIB=${ARROW_WITH_ZLIB:-OFF} \
-DARROW_WITH_ZSTD=${ARROW_WITH_ZSTD:-OFF} \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_LIBDIR=lib \
-DCMAKE_INSTALL_PREFIX=${DEST_DIR} \
@@ -83,4 +93,10 @@ fi
# Copy the bundled static libs from the build to the install dir
# See https://issues.apache.org/jira/browse/ARROW-7499 for moving this to CMake
find . -regex .*/.*/lib/.*\\.a\$ | xargs -I{} cp -u {} ${DEST_DIR}/lib
# jemalloc makes both libjemalloc.a and libjemalloc_pic.a; we can't use the former, only the latter
rm ${DEST_DIR}/lib/libjemalloc.a || true
# -lbrotlicommon-static needs to come after the other brotli libs, so rename it so alpha sort works
if [ -f "${DEST_DIR}/lib/libbrotlicommon-static.a" ]; then
mv "${DEST_DIR}/lib/libbrotlicommon-static.a" "${DEST_DIR}/lib/libbrotlizzz-static.a"
fi
popd
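
The rename matters because `r/configure` turns the alphabetical `ls *.a` listing directly into linker flags, and with static libraries a library generally has to appear after the libraries that depend on it. A small sketch of the effect on the ordering, using empty placeholder files in a temporary directory (the directory and the `before`/`after` variables are illustrative only):

```shell
# Create empty stand-ins for the three brotli static libs
demo_dir=$(mktemp -d)
touch "$demo_dir/libbrotlicommon-static.a" \
      "$demo_dir/libbrotlidec-static.a" \
      "$demo_dir/libbrotlienc-static.a"

# Glob expansion is sorted, so "common" comes first before the rename
before=$(cd "$demo_dir" && echo *.a)

# Same rename as in the build script
mv "$demo_dir/libbrotlicommon-static.a" "$demo_dir/libbrotlizzz-static.a"

# After the rename the common lib sorts last, behind dec and enc
after=$(cd "$demo_dir" && echo *.a)

echo "before: $before"
echo "after:  $after"
rm -r "$demo_dir"
```

This relies purely on file naming, which is why the surrounding comments flag the approach as something to move into CMake later.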
5 changes: 5 additions & 0 deletions r/man/install_arrow.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.