3 changes: 2 additions & 1 deletion r/R/record-batch.R
@@ -46,9 +46,10 @@
#' - `$column(i)`: Extract an `Array` by integer position from the batch
#' - `$column_name(i)`: Get a column's name by integer position
#' - `$names()`: Get all column names (called by `names(batch)`)
#' - `$RenameColumns(value)`: Set all column names (called by `names(batch) <- value`)
#' - `$GetColumnByName(name)`: Extract an `Array` by string name
#' - `$RemoveColumn(i)`: Drops a column from the batch by integer position
#' - `$selectColumns(indices)`: Return a new record batch with a selection of columns, expressed as 0-based integers.
#' - `$SelectColumns(indices)`: Return a new record batch with a selection of columns, expressed as 0-based integers.
#' - `$Slice(offset, length = NULL)`: Create a zero-copy view starting at the
#' indicated integer offset and going for the given length, or to the end
#' of the table if `NULL`, the default.
1 change: 1 addition & 0 deletions r/R/table.R
@@ -53,6 +53,7 @@
#'
#' - `$column(i)`: Extract a `ChunkedArray` by integer position from the table
#' - `$ColumnNames()`: Get all column names (called by `names(tab)`)
#' - `$RenameColumns(value)`: Set all column names (called by `names(tab) <- value`)
#' - `$GetColumnByName(name)`: Extract a `ChunkedArray` by string name
#' - `$field(i)`: Extract a `Field` from the table schema by integer position
#' - `$SelectColumns(indices)`: Return new `Table` with specified columns, expressed as 0-based integers.
10 changes: 6 additions & 4 deletions r/README.md
@@ -97,7 +97,7 @@ For the R package, you'll need to enable several features in the C++ library
using `-D` flags:

```
cmake
cmake \
-DARROW_COMPUTE=ON \
-DARROW_CSV=ON \
-DARROW_DATASET=ON \
@@ -106,6 +106,7 @@ cmake
-DARROW_JSON=ON \
-DARROW_PARQUET=ON \
-DCMAKE_BUILD_TYPE=release \
-DARROW_INSTALL_NAME_RPATH=OFF \
..
```

@@ -125,7 +126,6 @@ If you want to enable support for compression libraries, add some or all of thes
Other flags that may be useful:

* `-DARROW_EXTRA_ERROR_CONTEXT=ON` makes errors coming from the C++ library point to files and line numbers
* `-DARROW_INSTALL_NAME_RPATH=OFF` may be needed on macOS if there are problems at link time
* `-DBOOST_SOURCE=BUNDLED`, for example, or any other dependency `*_SOURCE`, if you have a system version of a C++ dependency that doesn't work correctly with Arrow. This tells the build to compile its own version of the dependency from source.

Note that after any change to the C++ library, you must reinstall it and
@@ -161,8 +161,10 @@ If the package fails to install/load with an error like this:
unable to load shared object '/Users/you/R/00LOCK-r/00new/arrow/libs/arrow.so':
dlopen(/Users/you/R/00LOCK-r/00new/arrow/libs/arrow.so, 6): Library not loaded: @rpath/libarrow.14.dylib

try setting the environment variable `R_LD_LIBRARY_PATH` to wherever
Arrow C++ was put in `make install`, e.g. `export
ensure that `-DARROW_INSTALL_NAME_RPATH=OFF` was passed (this is important on
macOS to prevent problems at link time and is a no-op on other platforms).
Alternatively, try setting the environment variable `R_LD_LIBRARY_PATH` to
wherever Arrow C++ was put in `make install`, e.g. `export
R_LD_LIBRARY_PATH=/usr/local/lib`, and retry installing the R package.
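
The `R_LD_LIBRARY_PATH` workaround can be sketched as a few shell lines. This is only an illustration: it assumes Arrow C++ was installed under the default `/usr/local` prefix, and `ARROW_LIB_DIR` is a hypothetical helper variable that is not part of the README or of Arrow.

```shell
# Assumption: `make install` put the Arrow C++ libraries in /usr/local/lib.
# ARROW_LIB_DIR is a hypothetical helper variable used only in this sketch;
# set it beforehand if your install prefix is different.
ARROW_LIB_DIR="${ARROW_LIB_DIR:-/usr/local/lib}"

# Prepend the directory, keeping any value that was already set.
export R_LD_LIBRARY_PATH="${ARROW_LIB_DIR}${R_LD_LIBRARY_PATH:+:$R_LD_LIBRARY_PATH}"

# Show the resulting value before retrying the R package install.
echo "R_LD_LIBRARY_PATH=$R_LD_LIBRARY_PATH"
```

With the variable exported, retrying the install from the same shell session (for example with `R CMD INSTALL .` from the `r/` directory) should let R find the Arrow shared library.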

When installing from source, if the R and C++ library versions do not
9 changes: 8 additions & 1 deletion r/_pkgdown.yml
@@ -85,7 +85,7 @@ reference:
- title: C++ reader/writer interface
contents:
- ParquetFileReader
- ParquetReaderProperties
- ParquetArrowReaderProperties
- ParquetFileWriter
- ParquetWriterProperties
- FeatherReader
@@ -143,10 +143,17 @@ reference:
- compression
- Codec
- codec_is_available
- title: Computation
contents:
- match_arrow
- title: Configuration
contents:
- arrow_info
- cpu_count
- arrow_available
- install_arrow
- install_pyarrow

repo:
url:
source: https://github.com/apache/arrow/blob/master/r/
3 changes: 2 additions & 1 deletion r/man/RecordBatch.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions r/man/Table.Rd

98 changes: 49 additions & 49 deletions r/vignettes/dataset.Rmd
@@ -131,31 +131,31 @@ ds
```
```{r, echo = FALSE, eval = !file.exists("nyc-taxi")}
cat("
## FileSystemDataset with 125 Parquet files
## vendor_id: string
## pickup_at: timestamp[us]
## dropoff_at: timestamp[us]
## passenger_count: int8
## trip_distance: float
## pickup_longitude: float
## pickup_latitude: float
## rate_code_id: string
## store_and_fwd_flag: string
## dropoff_longitude: float
## dropoff_latitude: float
## payment_type: string
## fare_amount: float
## extra: float
## mta_tax: float
## tip_amount: float
## tolls_amount: float
## total_amount: float
## improvement_surcharge: float
## pickup_location_id: int32
## dropoff_location_id: int32
## congestion_surcharge: float
## year: int32
## month: int32
FileSystemDataset with 125 Parquet files
vendor_id: string
pickup_at: timestamp[us]
dropoff_at: timestamp[us]
passenger_count: int8
trip_distance: float
pickup_longitude: float
pickup_latitude: float
rate_code_id: string
store_and_fwd_flag: string
dropoff_longitude: float
dropoff_latitude: float
payment_type: string
fare_amount: float
extra: float
mta_tax: float
tip_amount: float
tolls_amount: float
total_amount: float
improvement_surcharge: float
pickup_location_id: int32
dropoff_location_id: int32
congestion_surcharge: float
year: int32
month: int32

See $metadata for additional Schema metadata
")
@@ -212,22 +212,22 @@ system.time(ds %>%

```{r, echo = FALSE, eval = !file.exists("nyc-taxi")}
cat("
## # A tibble: 10 x 3
## passenger_count tip_pct n
## <int> <dbl> <int>
## 1 0 9.84 380
## 2 1 16.7 143087
## 3 2 16.6 34418
## 4 3 14.4 8922
## 5 4 11.4 4771
## 6 5 16.7 5806
## 7 6 16.7 3338
## 8 7 16.7 11
## 9 8 16.7 32
## 10 9 16.7 42
##
## user system elapsed
## 4.436 1.012 1.402
# A tibble: 10 x 3
passenger_count tip_pct n
<int> <dbl> <int>
1 0 9.84 380
2 1 16.7 143087
3 2 16.6 34418
4 3 14.4 8922
5 4 11.4 4771
6 5 16.7 5806
7 6 16.7 3338
8 7 16.7 11
9 8 16.7 32
10 9 16.7 42

user system elapsed
4.436 1.012 1.402
")
```

@@ -246,14 +246,14 @@ ds %>%

```{r, echo = FALSE, eval = !file.exists("nyc-taxi")}
cat("
## FileSystemDataset (query)
## tip_amount: float
## total_amount: float
## passenger_count: int8
##
## * Filter: ((total_amount > 100:double) and (year == 2015:double))
## * Grouped by passenger_count
## See $.data for the source Arrow object
FileSystemDataset (query)
tip_amount: float
total_amount: float
passenger_count: int8

* Filter: ((total_amount > 100:double) and (year == 2015:double))
* Grouped by passenger_count
See $.data for the source Arrow object
")
```
