apacheGH-37945: [R] Update developer documentation (apache#38220)
### Rationale for this change

Several PRs over the last few months have updated the build system to be more friendly for developers. During this process it has also come to light that we haven't supported the Windows development setup documented here since R 4.1 (released in spring 2021). I had to remove Windows from the test-r-devdocs job because the approach used there was not compatible with the `setup-r@v2` action, and the job was failing with the `@v1` action.

### What changes are included in this PR?

- Updated the sections on using pre-built static libraries and bundled builds
- Removed the Windows section regarding the bundled build. This section would need rewriting to support the last two minor releases of R, but in the meantime I think it is mostly confusing.

### Are these changes tested?

They are documentation changes. They are also slightly optimistic: we can fix problems with the developer setup incrementally between releases, but it's more difficult to update our documentation. This PR documents the intended behaviour after apache#38236.

### Are there any user-facing changes?

No.
* Closes: apache#37945

Lead-authored-by: Dewey Dunnington <dewey@voltrondata.com>
Co-authored-by: Dewey Dunnington <dewey@fishandwhistle.net>
Co-authored-by: Jacob Wujciak-Jens <jacob@wujciak.de>
Signed-off-by: Dewey Dunnington <dewey@fishandwhistle.net>
2 people authored and dgreiss committed Feb 17, 2024
1 parent 5266051 commit 29e5b01
Showing 1 changed file with 26 additions and 110 deletions.
136 changes: 26 additions & 110 deletions r/vignettes/developers/setup.Rmd
@@ -38,50 +38,32 @@ set -e
set -x
```


```{bash, save=run & windows, hide=TRUE}
# For some reason CRAN Mirror goes missing in CI
echo 'options(repos=structure(c(CRAN="https://cloud.r-project.org")))' > $HOME/.Rprofile
```

Linux, macOS, and Windows users who wish to contribute to the R package and
don't need to alter libarrow (Arrow's C++ library) may be able to obtain a
recent version of the library without building from source.

### Linux

On Linux, you can download a .zip file containing libarrow from the
[nightly repository](https://nightlies.apache.org/arrow/r/libarrow/bin/).

The directory names correspond to the OpenSSL version the binaries were built with:

- "linux-openssl-1.0" (OpenSSL 1.0)
- "linux-openssl-1.1" (OpenSSL 1.1)
- "linux-openssl-3.0" (OpenSSL 3.0)

Version numbers in that repository correspond to dates.
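If you are not sure which OpenSSL version your system has, `openssl version` reports it. The sketch below maps that output to one of the directory names listed above. `libarrow_openssl_dir` is a hypothetical helper, not part of the arrow repository, and treating every OpenSSL 3.x release as compatible with the 3.0 build is an assumption (OpenSSL 3.x releases share the `libssl.so.3` soname).

```bash
# Hypothetical helper (not part of the arrow repository): map the output of
# `openssl version` to the matching nightly directory name.
libarrow_openssl_dir() {
  # Use the version string passed in, or ask the local openssl binary.
  local version_string="${1:-$(openssl version)}"
  # "OpenSSL 3.0.2 15 Mar 2022" -> "3.0.2"
  local version
  version=$(printf '%s\n' "$version_string" | awk '{print $2}')
  case "$version" in
    1.0.*) echo "linux-openssl-1.0" ;;
    1.1.*) echo "linux-openssl-1.1" ;;
    3.*)   echo "linux-openssl-3.0" ;;  # Assumption: any 3.x can use the 3.0 build
    *)
      echo "unsupported OpenSSL version: $version" >&2
      return 1
      ;;
  esac
}
```

With no argument the helper queries the local `openssl` binary; `libarrow_openssl_dir 'OpenSSL 3.0.2 15 Mar 2022'` prints `linux-openssl-3.0`.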

You'll need to create a `libarrow` directory inside the R package directory and unzip the zip file containing the compiled libarrow binary files into it.
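A minimal sketch of that step, assuming a shell with `unzip` installed. `install_libarrow_zip` is a hypothetical helper; the zip path is whichever file you downloaded from the nightly repository, and the second argument is the R package directory (`arrow/r`).

```bash
# Hypothetical helper: unpack a downloaded nightly libarrow .zip into the
# `libarrow` directory of the R package.
# Usage: install_libarrow_zip path/to/downloaded.zip path/to/arrow/r
install_libarrow_zip() {
  local zip_file="$1"
  local pkg_dir="${2:-.}"
  if [ ! -f "$zip_file" ]; then
    echo "zip file not found: $zip_file" >&2
    return 1
  fi
  # Create <pkg_dir>/libarrow if needed and extract into it, overwriting old files.
  mkdir -p "$pkg_dir/libarrow"
  unzip -o -q "$zip_file" -d "$pkg_dir/libarrow"
}
```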

### macOS

On macOS, you can install libarrow using [Homebrew](https://brew.sh/):

```bash
# For the released version:
brew install apache-arrow
# Or for a development version, you can try:
brew install apache-arrow --HEAD
```

### Windows

On Windows, you can download a .zip file containing libarrow from the
[nightly repository](https://nightlies.apache.org/arrow/r/libarrow/bin/windows/).

Version numbers in that repository correspond to dates.

You can set the `RWINLIB_LOCAL` environment variable to point to the zip file containing libarrow before installing the arrow R package.
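As a sketch, assuming an Rtools Bash shell opened in the `arrow/r` directory with R on the `PATH`, the variable can be set for just the install command. `install_with_local_libarrow` is a hypothetical wrapper whose only addition is a check that the zip file exists.

```bash
# Hypothetical wrapper: install the arrow R package against a local libarrow zip.
# Run from the arrow/r directory in a shell where R is on the PATH.
install_with_local_libarrow() {
  local zip_file="$1"
  if [ ! -f "$zip_file" ]; then
    echo "zip file not found: $zip_file" >&2
    return 1
  fi
  # Point RWINLIB_LOCAL at the zip for this one command only.
  RWINLIB_LOCAL="$zip_file" R CMD INSTALL .
}
```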

## R and C++

The Arrow R package is unique compared to other R packages that you may have
contributed to because it builds on top of the large and feature-rich Arrow C++
implementation. Because the R package integrates tightly with Arrow C++,
it typically requires a dedicated copy of the library (i.e., it is usually
not possible to link to a system version of libarrow during development).

## Option 1: Using nightly libarrow binaries

On Linux, macOS, and Windows you can use the same workflow you might use for another
package that contains compiled code (e.g., `R CMD INSTALL .` from
a terminal, `devtools::load_all()` from an R prompt, or `Install & Restart` from
RStudio). If the `arrow/r/libarrow` directory is not populated, the configure script will
attempt to download the latest nightly libarrow binary, extract it to the
`arrow/r/libarrow` directory (macOS, Linux) or `arrow/r/windows`
directory (Windows), and continue building the R package as usual.
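Since the download happens when that directory is not populated, a quick check like the following can tell you what to expect before you build. `libarrow_is_populated` is a hypothetical helper, not part of the repository; run it from the `arrow/r` directory and pass `windows` instead of `libarrow` on Windows.

```bash
# Hypothetical helper: succeed if the directory exists and contains any files.
libarrow_is_populated() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Run from arrow/r; use `windows` instead of `libarrow` on Windows.
if libarrow_is_populated libarrow; then
  echo "libarrow/ is populated: configure should reuse the existing binaries"
else
  echo "libarrow/ is missing or empty: configure will try to download a nightly build"
fi
```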

Most of the time, you won't need to update your version of libarrow because
the R package rarely changes with updates to the C++ library; however, if you
start to get errors when rebuilding the R package, you may have to remove the
`libarrow` directory (macOS, Linux) or `windows` directory (Windows)
and do a "clean" rebuild. You can do this from a terminal with
`R CMD INSTALL . --preclean`, from RStudio using the "Clean and Install"
option from the "Build" tab, or using `make clean` if you are using the `Makefile`
located in the root of the R package.
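The terminal route can be wrapped in a small function. This is a sketch under stated assumptions: `libarrow_cache_dir` and `clean_rebuild` are hypothetical names, Windows is detected from the `uname -s` prefixes reported by MSYS2-based shells such as Rtools Bash, and R must be on your `PATH`.

```bash
# Hypothetical helper: name of the directory holding the downloaded libarrow,
# given the output of `uname -s` (defaults to the current system).
libarrow_cache_dir() {
  case "${1:-$(uname -s)}" in
    MINGW*|MSYS*|CYGWIN*) echo "windows" ;;   # Windows (Rtools/MSYS2 shells)
    *)                    echo "libarrow" ;;  # macOS and Linux
  esac
}

# Hypothetical helper: remove the cached libarrow and reinstall from scratch.
# Run from the arrow/r directory.
clean_rebuild() {
  rm -rf "./$(libarrow_cache_dir)"
  R CMD INSTALL . --preclean
}
```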

## Option 2: Use a local Arrow C++ development build

If you need to alter both libarrow and the R package code, or if you can't get a binary version of the latest libarrow elsewhere, you'll need to build it from source. This section discusses how to set up a C++ libarrow build configured to work with the R package. For more general resources, see the [Arrow C++ developer guide](https://arrow.apache.org/docs/developers/cpp/building.html).

@@ -103,43 +85,6 @@ sudo apt install -y cmake libcurl4-openssl-dev libssl-dev
brew install cmake openssl
```

#### Windows

The package can be built on Windows using [RTools 4](https://cran.r-project.org/bin/windows/Rtools/). It can be built for mingw32 (i386), mingw64 (x64), or ucrt64 (UCRT x64). mingw64 is the recommended 64-bit installation.

Open the corresponding RTools Bash, for example "Rtools MinGW 64-bit" for mingw64.

Install CMake, ccache, and Ninja with:

```{bash, save=run & windows}
pacman --sync --refresh --noconfirm \
${MINGW_PACKAGE_PREFIX}-{ccache,cmake,ninja,openssl}
export CMAKE_GENERATOR=Ninja
```

You will need to add R to your path. For a user-level installation, R will be at something like `~/Documents/R/R-4.1.2/bin`. For a global installation, R will be at something like `/c/Program\ Files/R/R-4.1.2/bin`. The R on your path needs to match the architecture you are compiling for, so if you are compiling for 32-bit, specify `.../bin/i386` instead of `.../bin/x64`.

```{bash}
export PATH=~/Documents/R/R-4.1.2/bin/x64:$PATH
```

You can install additional dependencies like so. Note that you are limited to the packages in [the RTools repo](https://github.com/r-windows/rtools-packages), which does not contain every dependency used by Arrow.

```{bash, save=run & windows}
pacman --sync --refresh --noconfirm \
${MINGW_PACKAGE_PREFIX}-boost \
${MINGW_PACKAGE_PREFIX}-brotli \
${MINGW_PACKAGE_PREFIX}-lz4 \
${MINGW_PACKAGE_PREFIX}-protobuf \
${MINGW_PACKAGE_PREFIX}-snappy \
${MINGW_PACKAGE_PREFIX}-thrift \
${MINGW_PACKAGE_PREFIX}-zlib \
${MINGW_PACKAGE_PREFIX}-zstd \
${MINGW_PACKAGE_PREFIX}-aws-sdk-cpp \
${MINGW_PACKAGE_PREFIX}-re2 \
${MINGW_PACKAGE_PREFIX}-libutf8proc
```

### Step 2 - Configure the libarrow build

We recommend that you configure libarrow to be built to a user-level directory rather than a system directory for your development work. This is so that the development version you are using doesn't overwrite a released version of libarrow you may already have installed, and so that you are also able to work with more than one version of libarrow (by using different `ARROW_HOME` directories for the different versions).
@@ -158,13 +103,6 @@ export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH
echo "export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH" >> ~/.bash_profile
```
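Because each libarrow build lives under its own `ARROW_HOME`, switching between versions amounts to changing that variable and the library search path together. A minimal sketch, with `use_arrow_home` as a hypothetical helper and the directory name in the example as a placeholder:

```bash
# Hypothetical helper: point the current shell at one libarrow install,
# e.g. `use_arrow_home "$HOME/arrow-builds/dev"`.
use_arrow_home() {
  export ARROW_HOME="$1"
  # Prepend this install's lib directory; the ${VAR:+...} form avoids a
  # trailing empty entry when LD_LIBRARY_PATH was previously unset.
  export LD_LIBRARY_PATH="$ARROW_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

Calling `use_arrow_home` a second time in the same shell leaves the earlier `lib` directory on `LD_LIBRARY_PATH` (just later in the search order), so open a fresh shell when you need a clean switch.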

_Special instructions on Windows:_ You will need to add `$ARROW_HOME/bin` to your `PATH` if you are using dynamic libraries (which is recommended).

```{bash, save=run & windows}
export PATH=$ARROW_HOME/bin:$PATH
echo "export PATH=\"$ARROW_HOME/bin:$PATH\"" >> ~/.bash_profile
```

Start by navigating in a terminal to the arrow repository. You will need to create a directory into which the C++ build will put its contents. We recommend that you make a `build` directory inside of the `cpp` directory of the Arrow git repository (it is git-ignored, so you won't accidentally check it in). Next, change directories to be inside `cpp/build`:

```{bash, save=run & !sys_install}
@@ -197,32 +135,10 @@ cmake \
..
```

##### Windows

```{bash, save=run & !sys_install & windows}
cmake \
-DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
-DCMAKE_INSTALL_LIBDIR=lib \
-DARROW_COMPUTE=ON \
-DARROW_CSV=ON \
-DARROW_DATASET=ON \
-DARROW_EXTRA_ERROR_CONTEXT=ON \
-DARROW_FILESYSTEM=ON \
-DARROW_MIMALLOC=ON \
-DARROW_JSON=ON \
-DARROW_PARQUET=ON \
-DARROW_WITH_SNAPPY=OFF \
-DARROW_WITH_ZLIB=ON \
..
```

#### {-}

`..` refers to the C++ source directory: you're in `cpp/build` and the source is in `cpp`.

**For Windows**: some options, including `-DARROW_JEMALLOC`, are not supported on Windows.


```{bash, save=run & !sys_install, hide=TRUE}
# For testing purposes, build with only shared libraries
cmake \
