ARROW-7833: [R] Make install_arrow() actually install arrow
`install_arrow()` now takes a few arguments, which let you

* Install the latest CRAN version (no arguments)
* Install the latest nightly build (`install_arrow(nightly = TRUE)`)
* Install with Linux C++ binaries by default (they are off by default in the current CRAN release), and conveniently change that setting without messing with env vars
* Ignore system-installed arrow packages by default (to ensure that R and C++ versions match)

It will also attempt to reload the package after installation, if `pkgload` is available.
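The `nightly` switch above comes down to which repositories are handed to `install.packages()`. A minimal sketch of that selection, adapted from `arrow_repos()` in this patch: the name `pick_repos` and the explicit `dev_repo` argument are illustrative assumptions, and the development URL is copied verbatim from the patch and may not resolve today.

```r
# Sketch of the repository selection behind install_arrow(nightly = ...).
# `pick_repos` and `dev_repo` are hypothetical names used for illustration;
# the logic follows arrow_repos() as it appears in this patch.
pick_repos <- function(repos = getOption("repos"),
                       nightly = FALSE,
                       dev_repo = "https://dl.bintray.com/ursalabs/arrow-r") {
  if (length(repos) == 0 || identical(repos, c(CRAN = "@CRAN@"))) {
    # No usable repository configured: fall back to the CRAN CDN
    repos <- "https://cloud.r-project.org/"
  }
  # Drop the development repo so nightly = FALSE never pulls from it
  repos <- setdiff(repos, dev_repo)
  if (nightly) {
    # Put the development repo first so it is consulted before CRAN
    repos <- c(dev_repo, repos)
  }
  repos
}

cran <- c(CRAN = "https://cloud.r-project.org/")
release_repos <- pick_repos(cran)
nightly_repos <- pick_repos(cran, nightly = TRUE)
print(release_repos)
print(nightly_repos)
```

Removing the development repository before conditionally re-adding it keeps a release install clean even if a user has the nightly repository in their `repos` option.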

The other important change in this patch is to turn `LIBARROW_DOWNLOAD` off by default, in order to appease CRAN. Unfortunately, this means Linux installation only "just works" if you have set that env var, which `install_arrow()` does for you.
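For a plain `install.packages()` call on Linux, the opt-in would have to be set by hand. A minimal sketch using only base R, assuming the value `"true"` that `install_arrow()` sets in this patch; the install call itself is commented out because it needs network access.

```r
# Opt in to downloading the Arrow C++ library for this R session.
# install_arrow() in this patch sets the same variable to "true".
Sys.setenv(LIBARROW_DOWNLOAD = "true")

# Read the value back; child processes started from this session
# (such as a source package install) inherit it.
download_ok <- Sys.getenv("LIBARROW_DOWNLOAD")
print(download_ok)

# Not run here: a source install that is now allowed to download.
# install.packages("arrow")
```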

Other improvements in this patch:

* Rename the env var `LIBARROW_BINARY_DISTRO` to `LIBARROW_BINARY` (brevity, and it also takes boolean values to enable or disable binary downloading)
* Fix the default value setting of that variable
* Update installation guide and README accordingly
* Remove README.Rmd and just keep the static README.md. There's no value for us to have an R Markdown readme and have to worry about keeping it in sync; we have vignettes and help pages for examples.

Closes #6406 from nealrichardson/install-arrow-binary and squashes the following commits:

767668c <Neal Richardson> Script fixes
4a1550e <Neal Richardson> LIBARROW_BINARY on should entail download_ok
4710500 <Neal Richardson> Update docs for new configure reality
660d0e7 <Neal Richardson> LIBARROW_DOWNLOAD is false by default now
3e02b72 <Neal Richardson> Reload the package if loaded already
c731542 <Neal Richardson> Fix for the fix
42dc327 <Neal Richardson> Fix test setup
e48ecbb <Neal Richardson> Docs
e92f4b7 <Neal Richardson> Make these tests always run
87951bd <Neal Richardson> Update readme and add message
e886116 <Neal Richardson> Delete README.Rmd (keep static README.md)
bcd0237 <Neal Richardson> Update docs
a459b49 <Neal Richardson> Switch var name to LIBARROW_BINARY
705d28a <Neal Richardson> Fix default value of LIBARROW_BINARY_DISTRO
7ff1365 <Neal Richardson> Change install_arrow() to actually install

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
nealrichardson committed Feb 13, 2020
1 parent 7bcdbb3 commit 1a5cb56
Showing 11 changed files with 193 additions and 402 deletions.
1 change: 1 addition & 0 deletions r/DESCRIPTION
@@ -44,6 +44,7 @@ Suggests:
     hms,
     knitr,
     lubridate,
+    pkgload,
     rmarkdown,
     testthat,
     tibble
2 changes: 1 addition & 1 deletion r/Makefile
@@ -19,7 +19,7 @@ VERSION=$(shell grep ^Version DESCRIPTION | sed s/Version:\ //)
 ARROW_R_DEV="TRUE"

 doc:
-	R --slave -e 'rmarkdown::render("README.Rmd"); roxygen2::roxygenize()'
+	R --slave -e 'roxygen2::roxygenize()'
 	-git add --all man/*.Rd

 test:
2 changes: 1 addition & 1 deletion r/NAMESPACE
@@ -242,7 +242,7 @@ importFrom(tidyselect,vars_pull)
 importFrom(tidyselect,vars_rename)
 importFrom(tidyselect,vars_select)
 importFrom(utils,head)
-importFrom(utils,packageVersion)
+importFrom(utils,install.packages)
 importFrom(utils,tail)
 importFrom(vctrs,s3_register)
 importFrom(vctrs,vec_size)
142 changes: 66 additions & 76 deletions r/R/install-arrow.R
@@ -15,95 +15,85 @@
 # specific language governing permissions and limitations
 # under the License.

-#' Help installing the Arrow C++ library
+#' Install or upgrade the Arrow library
 #'
-#' Binary package installations should come with a working Arrow C++ library,
-#' but when installing from source, you'll need to obtain the C++ library
-#' first. This function offers guidance on how to get the C++ library depending
-#' on your operating system and package version.
+#' Use this function to install the latest release of `arrow`, to switch to or
+#' from a nightly development version, or on Linux to try reinstalling with
+#' all necessary C++ dependencies.
+#'
+#' @param nightly logical: Should we install a development version of the
+#' package, or should we install from CRAN (the default).
+#' @param binary On Linux, value to set for the environment variable
+#' `LIBARROW_BINARY`, which governs how C++ binaries are used, if at all.
+#' The default value, `TRUE`, tells the installation script to detect the
+#' Linux distribution and version and find an appropriate C++ library. `FALSE`
+#' would tell the script not to retrieve a binary and instead build Arrow C++
+#' from source. Other valid values are strings corresponding to a Linux
+#' distribution-version, to override the value that would be detected.
+#' See `vignette("install", package = "arrow")` for further details.
+#' @param use_system logical: Should we use `pkg-config` to look for Arrow
+#' system packages? Default is `FALSE`. If `TRUE`, source installation may be
+#' faster, but there is a risk of version mismatch.
+#' @param repos character vector of base URLs of the repositories to install
+#' from (passed to `install.packages()`)
+#' @param ... Additional arguments passed to `install.packages()`
 #' @export
-#' @importFrom utils packageVersion
-#' @examples
-#' install_arrow()
-install_arrow <- function() {
-  os <- tolower(Sys.info()[["sysname"]])
-  # c("windows", "darwin", "linux", "sunos") # win/mac/linux/solaris
-  version <- packageVersion("arrow")
-  message(install_arrow_msg(arrow_available(), version, os))
+#' @importFrom utils install.packages
+#' @seealso [arrow_available()] to see if the package was configured with
+#' necessary C++ dependencies. `vignette("install", package = "arrow")` for
+#' more ways to tune installation on Linux.
+install_arrow <- function(nightly = FALSE,
+                          binary = TRUE,
+                          use_system = FALSE,
+                          repos = getOption("repos"),
+                          ...) {
+  if (tolower(Sys.info()[["sysname"]]) %in% c("windows", "darwin", "linux")) {
+    Sys.setenv(LIBARROW_DOWNLOAD = "true")
+    Sys.setenv(LIBARROW_BINARY = binary)
+    Sys.setenv(ARROW_USE_PKG_CONFIG = use_system)
+    install.packages("arrow", repos = arrow_repos(repos, nightly), ...)
+    if ("arrow" %in% loadedNamespaces()) {
+      # If you've just sourced this file, "arrow" won't be (re)loaded
+      reload_arrow()
+    }
+  } else {
+    # Solaris
+    message(SEE_README)
+  }
 }

-install_arrow_msg <- function(has_arrow, version, os) {
-  # TODO: check if there is a newer version on CRAN?
+arrow_repos <- function(repos = getOption("repos"), nightly = FALSE) {
+  if (length(repos) == 0 || identical(repos, c(CRAN = "@CRAN@"))) {
+    # Set the default/CDN
+    repos <- "https://cloud.r-project.org/"
+  }
+  bintray <- getOption("arrow.dev.repo", "https://dl.bintray.com/ursalabs/arrow-r")
+  # Remove it if it's there (so nightly=FALSE won't accidentally pull from it)
+  repos <- setdiff(repos, bintray)
+  if (nightly) {
+    # Add it first
+    repos <- c(bintray, repos)
+  }
+  repos
+}

-  # install_arrow() sends "version" as a "package_version" class, but for
-  # convenience, this also accepts a string like "0.13.0". Calling
-  # `package_version` is idempotent so do it again, and then `unclass` to get
-  # the integers. Then see how many there are.
-  dev_version <- length(unclass(package_version(version))[[1]]) > 3
-  # Based on these parameters, assemble a string with installation advice
-  if (has_arrow) {
-    # Respond that you already have it
-    msg <- ALREADY_HAVE
-  } else if (os == "sunos") {
-    # Good luck with that.
-    msg <- c(SEE_DEV_GUIDE, THEN_REINSTALL)
-  } else if (os == "linux") {
-    if (dev_version) {
-      # Point to compilation instructions on readme
-      msg <- c(SEE_DEV_GUIDE, THEN_REINSTALL)
+reload_arrow <- function() {
+  if (requireNamespace("pkgload", quietly = TRUE)) {
+    is_attached <- "package:arrow" %in% search()
+    pkgload::unload("arrow")
+    if (is_attached) {
+      require("arrow", character.only = TRUE, quietly = TRUE)
     } else {
-      # Suggest arrow.apache.org/install, or compilation instructions
-      msg <- c(paste(SEE_ARROW_INSTALL, OR_SEE_DEV_GUIDE), THEN_REINSTALL)
+      requireNamespace("arrow", quietly = TRUE)
     }
   } else {
-    # We no longer allow builds without libarrow on macOS or Windows so this
-    # case shouldn't happen
-    msg <- ""
+    message("Please restart R to use the 'arrow' package.")
   }
-  # Common postscript
-  msg <- c(msg, SEE_README, REPORT_ISSUE)
-  paste(msg, collapse="\n\n")
 }

-ALREADY_HAVE <- paste(
-  "It appears you already have Arrow installed successfully:",
-  "are you trying to install a different version of the library?"
-)
-
-SEE_DEV_GUIDE <- paste(
-  "See the Arrow developer guide",
-  "<https://arrow.apache.org/docs/developers/index.html>",
-  "for instructions on building the C++ library from source."
-)
-# Variation of that
-OR_SEE_DEV_GUIDE <- paste0(
-  "Or, s",
-  substr(SEE_DEV_GUIDE, 2, nchar(SEE_DEV_GUIDE))
-)
-
-SEE_ARROW_INSTALL <- paste(
-  "See the Apache Arrow project installation page",
-  "<https://arrow.apache.org/install/>",
-  "to find pre-compiled binary packages for some common Linux distributions,",
-  "including Debian, Ubuntu, and CentOS. You'll need to install",
-  "'libparquet-dev' on Debian and Ubuntu, or 'parquet-devel' on CentOS. This",
-  "will also automatically install the Arrow C++ library as a dependency."
-)
-
-THEN_REINSTALL <- paste(
-  "After you've installed the C++ library,",
-  "you'll need to reinstall the R package from source to find it."
-)
-
 SEE_README <- paste(
   "Refer to the R package README",
   "<https://github.com/apache/arrow/blob/master/r/README.md>",
   "and `vignette('install', package = 'arrow')`",
-  "for further details."
-)
-
-REPORT_ISSUE <- paste(
-  "If you have other trouble, or if you think this message could be improved,",
-  "please report an issue here:",
-  "<https://issues.apache.org/jira/projects/ARROW/issues>"
+  "for installation guidance."
 )
200 changes: 0 additions & 200 deletions r/README.Rmd

This file was deleted.
