ARROW-16276: [R] Arrow 8.0 News #13005

wjones127 · 2022-04-26T22:46:39Z

Let me know if I've missed anything important in this release!

github-actions · 2022-04-26T22:47:01Z

https://issues.apache.org/jira/browse/ARROW-16272

github-actions · 2022-04-26T22:47:03Z

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

github-actions · 2022-04-26T22:48:24Z

https://issues.apache.org/jira/browse/ARROW-16276

github-actions · 2022-04-26T22:48:26Z

⚠️ Ticket has no components in JIRA, make sure you assign one.

eitsupi · 2022-04-27T10:25:11Z

It might be better to describe function and package names as `{dplyr}` or `write_dataset()` so that pkgdown can create the links automatically.

https://github.com/r-lib/pkgdown/blob/feb91bf46c3ea78f8a03aead9f9a4934e3965ba4/vignettes/linking.Rmd?rgh-link-date=2022-02-09T10%3A52%3A57Z#L20-L36

paleolimbot

This is awesome...thank you for putting it together!

nealrichardson

Thanks a lot for doing this! A few suggestions inline, and one general one: see if you can remove the word "now" in as many places as you can. It's redundant: we're describing what's in the release, so of course we mean "now".

nealrichardson · 2022-04-29T13:16:40Z

r/NEWS.md

+
+Custom extension arrays can be created and registered, allowing other packages to
+define their own array types. Extension arrays wrap regular Arrow array types and
+provide customized behavior and/or storage. A common use-case for extension types 


Should we link to the format docs on extension types? https://arrow.apache.org/docs/format/Columnar.html#extension-types

There are also some use cases described there.

Also, I'm not sure the use case here is correct. If it's just about custom serialization of R objects, isn't that what as_arrow_array is for? Extension types are about when you need to define a standard outside of just this implementation, like when you want to have Python and R both understand the semantics of the data. If you're just trying to round trip data with R, the regular R metadata mechanism works for you, and if you need to serialize/deserialize the data differently, define an S3 method.

Good point, and I think I agree with you. I borrowed this language from the extension types R docs:

arrow/r/R/extension.R

Lines 262 to 268 in d6ca3e2

#' Extension arrays are wrappers around regular Arrow [Array] objects
#' that provide some customized behaviour and/or storage. A common use-case
#' for extension types is to define a customized conversion between an
#' an Arrow [Array] and an R object when the default conversion is slow
#' or looses metadata important to the interpretation of values in the array.
#' For most types, the built-in
#' [vctrs extension type][vctrs_extension_type] is probably sufficient.

@paleolimbot Any thoughts on that?

I can simply cut that part out for now and link to the existing docs on extension types:

Custom extension arrays can be created and registered, allowing other packages to define their own array types. Extension arrays wrap regular Arrow array types and provide customized behavior and/or storage. See further description and an example with ?new_extension_type.

The existing documentation is correct, although a little confusing: as_arrow_array() is about converting to an Arrow Array, but you need to use an ExtensionType subclass in order to customize converting from an Arrow Array. Another compelling use of ExtensionType is, as Neal mentioned, where the type is defined in a Python package as well.

Perhaps a solution here is to group the S3 Generics heading and the ExtensionType heading, because they're both under the theme of extensibility? Maybe:

Extensibility

Added S3 generic methods to create the core Arrow object types. In particular, packages can define the as_arrow_array() generic to ensure that a custom vector type is converted to an Arrow Array in a particular way (e.g., when converting a data.frame to an Arrow Table). Packages can also define an as_arrow_table() method to customize conversion of a table-like object (e.g., when an object is passed to write_parquet() or write_feather()).

Custom ExtensionTypes can be created and registered, allowing other packages to define their own array types and/or conversions from Arrow Arrays to R vectors. Extension arrays wrap regular Arrow array types and provide customized behavior and/or storage. See documentation for new_extension_type() for details.

Implemented a generic extension type and as_arrow_array() methods for all objects where vctrs::vec_is() returns TRUE (i.e., any object that can be used as a column in a tibble::tibble()), provided that the underlying vctrs::vec_data() can be converted to an Arrow Array.

(feel free to mix/match/scramble/disregard this with what you've written already!)

Yeah agreed that combining the sections makes sense.

nealrichardson · 2022-04-29T13:32:04Z

r/NEWS.md

+  - now can take a list of datasets with differing schemas and attempt to unify the 
+    schemas to produce a `UnionDataset`.
+* Arrow `{dplyr}` queries:
+  - are now supported on `RecordBatchReader`. This allows results from DuckDB


This is just one example use case; there are/will be others (for example, you can pass a RecordBatchReader over the C interface, so you can get one from anywhere in pyarrow, including Flight, and do dplyr on it).

nealrichardson

Some minor notes, but this looks good. Let's get this finished and merged before the next RC is cut if we can.

nealrichardson · 2022-05-03T16:49:34Z

r/NEWS.md

+    * `lubridate::tz()` (timezone),
+    * `lubridate::semester()` (semester), 
+    * `lubridate::dst()` (daylight savings time indicator),
+    * `lubridate::date()` (extract date), 


is this correct?

Suggested change

-    * `lubridate::date()` (extract date),
+    * `lubridate::date()` (extract date from timestamp),

nealrichardson · 2022-05-03T16:49:47Z

r/NEWS.md

+  instead of `timestamp(unit = "s")`.
+* For Arrow dplyr queries, added additional `{lubridate}` features and fixes:
+  * New component extraction functions: 
+    * `lubridate::tz()` (timezone),


?

Suggested change

-    * `lubridate::tz()` (timezone),
+    * `lubridate::tz()` (string timezone),

I think I'd rather limit these parentheticals to explaining abbreviations (tz, dst, epiyear), rather than have them try to function as docs. We link to the lubridate function docs directly for each bullet, so more detail is readily available to the reader.

nealrichardson · 2022-05-03T16:50:05Z

r/NEWS.md

+  * New component extraction functions: 
+    * `lubridate::tz()` (timezone),
+    * `lubridate::semester()` (semester), 
+    * `lubridate::dst()` (daylight savings time indicator),


this?

Suggested change

-    * `lubridate::dst()` (daylight savings time indicator),
+    * `lubridate::dst()` (daylight savings time indicator, logical/boolean),

nealrichardson · 2022-05-03T16:50:45Z

r/NEWS.md

+    * `lubridate::semester()` (semester), 
+    * `lubridate::dst()` (daylight savings time indicator),
+    * `lubridate::date()` (extract date), 
+    * `lubridate::epiyear()` (epiyear),


what is epiyear? drop the parenthetical if we don't have anything to clarify

"year according to epidemilogical week calendar".

nealrichardson · 2022-05-03T16:51:01Z

r/NEWS.md

+    * `lubridate::date()` (extract date), 
+    * `lubridate::epiyear()` (epiyear),
+  * `lubridate::month()` works with integer inputs.
+  * Added `lubridate::make_date()` & `lubridate::make_datetime()` + 


Drop "Added" from all of these, seems inconsistent with the ones above

nealrichardson · 2022-05-03T16:52:33Z

r/NEWS.md

+   and chunking is acceptable, using `ChunkedArray$create()`.
+ * ChunkedArrays can be concatenated with `c()`.
+ * RecordBatches and Tables support `cbind()`.
+ * Tables support `rbind()`. `concat_tables()` is also provided to


Is this correct: no `rbind()` for RecordBatch? Wasn't there some alternative to concatenate batches?

The alternative is to make it a Table, but that's not really new IMO. https://github.com/apache/arrow/blob/master/r/R/record-batch.R#L195

nealrichardson

Thanks a lot!

ursabot · 2022-05-07T11:44:01Z

Benchmark runs are scheduled for baseline = 6b32c30 and contender = 526fa07. 526fa07 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.08% ⬆️0.0%] test-mac-arm
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.28% ⬆️0.04%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 526fa070 ec2-t3-xlarge-us-east-2
[Finished] 526fa070 test-mac-arm
[Finished] 526fa070 ursa-i9-9960x
[Finished] 526fa070 ursa-thinkcentre-m75q
[Finished] 6b32c300 ec2-t3-xlarge-us-east-2
[Finished] 6b32c300 test-mac-arm
[Finished] 6b32c300 ursa-i9-9960x
[Finished] 6b32c300 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

github-actions bot added the Component: R label Apr 26, 2022

wjones127 changed the title ~~ARROW-16272: Arrow 8.0 News~~ ARROW-16276: Arrow 8.0 News Apr 26, 2022

wjones127 added 2 commits April 27, 2022 14:01

Go through commits and draft news

337cb46

Enhance links and add a few c++ changes

671a7f8

wjones127 force-pushed the ARROW-16276-r-news-8 branch from d995012 to 671a7f8 Compare April 27, 2022 21:01

wjones127 marked this pull request as ready for review April 27, 2022 21:01

Revert version

968eb0a

wjones127 changed the title ~~ARROW-16276: Arrow 8.0 News~~ ARROW-16276: [R] Arrow 8.0 News Apr 27, 2022

paleolimbot approved these changes Apr 29, 2022

nealrichardson reviewed Apr 29, 2022

wjones127 and others added 3 commits April 29, 2022 08:05

Apply suggestions from code review

2e09fb0

Co-authored-by: Dewey Dunnington <dewey@fishandwhistle.net> Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com>

No more 'now'

e375f58

Combine into on extensibility section

b62cdf8

eitsupi reviewed Apr 30, 2022

nealrichardson reviewed May 2, 2022

nealrichardson reviewed May 2, 2022

wjones127 and others added 3 commits May 2, 2022 08:43

Apply suggestions from code review

73f1e9a

Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com>

Remove duplicated lines caused by git merge

c9d6c93

A couple more dupes removed

bb62e10

nealrichardson reviewed May 3, 2022

nealrichardson reviewed May 3, 2022

wjones127 and others added 3 commits May 3, 2022 10:55

Apply suggestions from code review

b942c28

Co-authored-by: Neal Richardson <neal.p.richardson@gmail.com>

Clean up date and time news

22cfd27

Lubridate actually mispells this word in their docs

b666173

wjones127 requested a review from nealrichardson May 3, 2022 18:01

nealrichardson approved these changes May 3, 2022

nealrichardson closed this in 526fa07 May 3, 2022
