ARROW-17954: [R] Update news for 10.0 (#14337)
Lead-authored-by: Will Jones <willjones127@gmail.com>
Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Co-authored-by: Dewey Dunnington <dewey@fishandwhistle.net>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
3 people authored and zeroshade committed Oct 20, 2022
1 parent 4a9bf70 commit 8b18060
Showing 3 changed files with 67 additions and 2 deletions.
65 changes: 65 additions & 0 deletions r/NEWS.md
@@ -19,6 +19,71 @@

# arrow 9.0.0.9000

## Arrow dplyr queries

Several new functions can be used in queries:

* `dplyr::across()` can be used to apply the same computation across multiple
columns, and the `where()` selection helper is supported in `across()`;
* `add_filename()` can be used to get the filename a row came from (only
available when querying `?Dataset`);
* Added five functions in the `slice_*` family: `dplyr::slice_min()`,
`dplyr::slice_max()`, `dplyr::slice_head()`, `dplyr::slice_tail()`, and
`dplyr::slice_sample()`.
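
As a rough illustration of the functions listed above, the sketch below applies `across()` with `where()` and then `slice_max()` to a small in-memory Arrow table. The data and column names are invented for this example, and `with_ties = FALSE` is passed on the assumption that the Arrow method does not handle ties the way dplyr's default does.

```r
library(arrow)
library(dplyr)

# Illustrative data; the column names are hypothetical.
tbl <- arrow_table(
  g = c("a", "b", "c"),
  x = c(1.4, 2.6, 3.2),
  y = c(10.2, 20.7, 30.1)
)

# Round every numeric column in a single step.
rounded <- tbl %>%
  mutate(across(where(is.numeric), round)) %>%
  collect()

# Keep the single row with the largest `x`.
# Assumption: ties are not supported by the Arrow method, so disable them.
top <- tbl %>%
  slice_max(x, n = 1, with_ties = FALSE) %>%
  collect()
```

Both pipelines stay lazy until `collect()` pulls the result into an R data frame.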

The package now has documentation that lists all `dplyr` methods and R function
mappings that are supported on Arrow data, along with notes about any
differences in functionality between queries evaluated in R versus in Acero, the
Arrow query engine. See `?acero`.

A few new features and bugfixes were implemented for joins:

* Extension arrays are now supported in joins, allowing, for example, joining
datasets that contain [geoarrow](https://paleolimbot.github.io/geoarrow/) data.
* The `keep` argument is now supported, allowing separate columns for the left-
and right-hand side join keys in the join output. Full joins now coalesce the
join keys (when `keep = FALSE`), avoiding the issue where the join keys would
be all `NA` for rows in the right-hand side without any matches on the left.
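
The join behavior described above might look like the following minimal sketch. The tables and the column names `id` and `key` are made up for illustration; differently named keys are used so the example does not depend on how name collisions are suffixed.

```r
library(arrow)
library(dplyr)

# Hypothetical inputs with differently named key columns.
left <- arrow_table(id = c(1L, 2L), x = c("a", "b"))
right <- arrow_table(key = c(2L, 3L), y = c("B", "C"))

# Full join with the default keep = FALSE: the key column is coalesced,
# so the row that only exists on the right still has a non-missing `id`.
full <- left %>%
  full_join(right, by = c("id" = "key")) %>%
  collect()

# keep = TRUE retains both key columns in the output.
kept <- left %>%
  left_join(right, by = c("id" = "key"), keep = TRUE) %>%
  collect()
```

Row order after a join is not guaranteed, so sort the result before comparing it against expected values.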

A few breaking changes that improve the consistency of the API:

* `dplyr::pull()` now returns a `?ChunkedArray` instead of an R vector.
* `dplyr::compute()` on a grouped query now returns a `?Table` instead of a
query object.

Finally, long-running queries can now be cancelled and will abort their
computation immediately.

## Arrays and tables

`as_arrow_array()` can now take `blob::blob` and `?vctrs::list_of`, which
convert to binary and list arrays, respectively. Also fixed an issue where
`as_arrow_array()` ignored the `type` argument when passed a `StructArray`.
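
A short sketch of the new conversions, assuming the `blob` and `vctrs` packages are installed; the input values are arbitrary.

```r
library(arrow)

# A blob vector with two elements converts to a binary array.
bin <- as_arrow_array(blob::blob(as.raw(1:3), as.raw(4:5)))

# A vctrs list_of with integer elements converts to a list array.
lst <- as_arrow_array(vctrs::list_of(1:2, 3L))

# Inspect the resulting Arrow types.
bin$type
lst$type
```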

The `unique()` function works on `?Table`, `?RecordBatch`, `?Dataset`, and
`?RecordBatchReader`.
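
For instance, deduplicating a small table could look like the sketch below. The data is invented, and `collect()` is called on the result so the example does not depend on whether `unique()` returns a table or a lazy query.

```r
library(arrow)
library(dplyr)

# Hypothetical table with one duplicated row.
tbl <- arrow_table(x = c(1L, 1L, 2L), y = c("a", "a", "b"))

# Drop duplicate rows, then pull the result into R.
deduped <- unique(tbl) %>% collect()
```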

## Reading and writing

`write_feather()` now accepts `compression = FALSE` to write uncompressed files.

Also, a breaking change for IPC files in `write_dataset()`: passing
`"ipc"` or `"feather"` to `format` will now write files with the `.arrow`
extension instead of `.ipc` or `.feather`.
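
The two changes above can be tried with temporary paths, as in this minimal sketch; the data frame is illustrative only.

```r
library(arrow)

df <- data.frame(x = 1:3)

# Write an uncompressed Feather (IPC) file and read it back.
path <- tempfile(fileext = ".arrow")
write_feather(df, path, compression = FALSE)
roundtrip <- read_feather(path)

# Write a dataset in the IPC format and list the files it produced.
# With this change, the part files are expected to end in ".arrow".
dir <- tempfile()
write_dataset(df, dir, format = "feather")
files <- list.files(dir)
```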

## Installation

As of version 10.0.0, `arrow` requires C++17 to build. This means that:

* On Windows, you need `R >= 4.0`. Version 9.0.0 was the last version to support
R 3.6.
* On CentOS 7, you can build the latest version of `arrow`,
but you first need to install a newer compiler than the default system compiler,
gcc 4.8. See `vignette("install", package = "arrow")` for guidance.
Note that you only need the newer compiler to build `arrow`:
installing a binary package, as from RStudio Package Manager,
or loading a package you've already installed works fine with the system defaults.

# arrow 9.0.0

## Arrow dplyr queries
2 changes: 1 addition & 1 deletion r/R/dplyr-funcs-doc.R
@@ -83,7 +83,7 @@
#' Functions can be called either as `pkg::fun()` or just `fun()`, i.e. both
#' `str_sub()` and `stringr::str_sub()` work.
#'
-#' In addition to these functions, you can call any of Arrow's 244 compute
+#' In addition to these functions, you can call any of Arrow's 243 compute
#' functions directly. Arrow has many functions that don't map to an existing R
#' function. In other cases where there is an R function mapping, you can still
#' call the Arrow function directly if you don't want the adaptations that the R
2 changes: 1 addition & 1 deletion r/man/acero.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.