ARROW-17188: [R] Update news for 9.0.0 #13726

wjones127 · 2022-07-27T15:29:37Z

No description provided.

dragosmg · 2022-07-27T15:32:52Z

r/NEWS.md

+  * Aggregations over partition columns return correct results. (ARROW-16700)
+* `dplyr::union` and `dplyr::union_all` are supported. (ARROW-15622)
+* `dplyr::glimpse` is supported. (ARROW-16776)
+* `show_exec_plan()` can be added to the end of a dplyr pipeline to show the underlying plan, similar to `dplyr::show_query`. (ARROW-15016)


Do we want to add here that both dplyr::show_query() and dplyr::explain() also work?

We now support namespacing (`pkg::` prefixing), so something like the chunk below works:

```r
library(arrow)
library(dplyr)

mtcars %>%
  mutate(make_model = rownames(.)) %>%
  arrow_table() %>%
  mutate(name_length = stringr::str_length(make_model)) %>%
  collect()
```
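The plan-inspection functions raised in this comment can be sketched as follows. This assumes arrow 9.0.0 and dplyr are installed. The query (filtering `mtcars` on `mpg`) is illustrative only. Each call should print the underlying execution plan rather than return a data frame.

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

# An illustrative lazy query; nothing is evaluated until collect()
query <- arrow_table(mtcars) %>%
  filter(mpg > 20) %>%
  select(mpg, cyl)

# Print the ExecPlan that would run this query
show_exec_plan(query)

# The dplyr generics should also work on Arrow queries
show_query(query)
explain(query)
```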

raulcd · 2022-07-27T15:34:32Z

r/NEWS.md

@@ -19,19 +19,54 @@

 # arrow 8.0.0.9000


Shouldn't this be 9.0.0?

Suggested change

# arrow 8.0.0.9000

# arrow 9.0.0

OK, I see this is supposed to be done by the `utils-prepare.sh` script, as with the other versions.

Correct, we don't make this change manually.

nealrichardson

Thanks for doing this. A few suggestions:

nealrichardson · 2022-07-27T18:05:51Z

r/NEWS.md

-* `lubridate::parse_date_time()` datetime parser:
-  * `orders` with year, month, day, hours, minutes, and seconds components are supported.
-  * the `orders` argument in the Arrow binding works as follows: `orders` are transformed into `formats` which subsequently get applied in turn. There is no `select_formats` parameter and no inference takes place (like is the case in `lubridate::parse_date_time()`).
+## Arrays and tables


Let's reorder: First dplyr, then reading/writing, then this (or general assorted bugfixes), then packaging

nealrichardson · 2022-07-27T18:07:42Z

r/NEWS.md

+* UnionDatasets can unify schemas of multiple InMemoryDatasets with varying
+  schemas. (ARROW-16085)
+* `write_dataset()` preserves all schema metadata again. In 8.0.0, it would drop most metadata, breaking packages such as sfarrow. (ARROW-16511)
+* Reading and writing functions (such as `write_csv_arrow()`) will automatically (de-)compress data if the file path contains a compression extension (e.g. `"data.csv.gz"`). This works locally as well as on remote filesystems like S3 and GCS. (ARROW-16144)


This was already sort of the case for CSV and JSON, but there were some bugs. Parquet and Feather, however, don't automatically do anything based on the file path.
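Here is a minimal sketch of the extension-based behavior in the NEWS entry. It assumes arrow 9.0.0. The `.gz` suffix on the temporary path is what should trigger gzip compression for CSV. Per this comment, the same is not expected for Parquet or Feather paths.

```r
library(arrow, warn.conflicts = FALSE)

# The ".csv.gz" extension should make write_csv_arrow() gzip the output
path <- tempfile(fileext = ".csv.gz")
write_csv_arrow(mtcars, path)

# Gzip files start with the magic bytes 1f 8b
magic <- readBin(path, what = "raw", n = 2)

# read_csv_arrow() decompresses based on the same extension
roundtrip <- read_csv_arrow(path)
```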

nealrichardson · 2022-07-27T18:08:06Z

r/NEWS.md

+* Reading and writing functions (such as `write_csv_arrow()`) will automatically (de-)compress data if the file path contains a compression extension (e.g. `"data.csv.gz"`). This works locally as well as on remote filesystems like S3 and GCS. (ARROW-16144)
+* `FileSystemFactoryOptions` can be provided to `open_dataset()`, allowing you to pass options such as which file prefixes to ignore. (ARROW-15280)
+* By default, `S3FileSystem` will not create or delete buckets. To enable that, pass the configuration option `allow_bucket_creation` or `allow_bucket_deletion`. (ARROW-15906)
+* `GcsFileSystem` and `gs_bucket()` allow connecting to Google Cloud Storage. (ARROW-13404, ARROW-16887)


Maybe lead with this one? We should sort the section based on relevance/priority

nealrichardson · 2022-07-27T18:08:35Z

r/NEWS.md

+
+## Arrow dplyr queries
+
+* Bugfixes:


Likewise, let's lead with major new features (new dplyr verbs, then new functions) and put bug fixes at the end.

Sure, though I was putting these at the top because they seemed like critical bugfixes.

nealrichardson · 2022-07-27T18:08:55Z

r/NEWS.md

+* Functions can be called with package namespace prefixes (e.g. `stringr::`, `lubridate::`) within queries. For example, `stringr::str_length` will now dispatch to the same kernel as `str_length`. (ARROW-14575)
+* User-defined functions are supported in queries. Use `register_scalar_function()` to create them. (ARROW-16444)
+* `lubridate::parse_date_time()` datetime parser: (ARROW-14848, ARROW-16407)
+  * `orders` with year, month, day, hours, minutes, and seconds components are supported.


Are some orders not supported?

Yes, #13506 adds the remaining formats/orders. So far, the focus has been on supporting the orders that enable the higher-level parsers (e.g. `ymd_hms()`).

nealrichardson · 2022-07-27T18:09:51Z

r/NEWS.md

+  * the `orders` argument in the Arrow binding works as follows: `orders` are transformed into `formats` which subsequently get applied in turn. There is no `select_formats` parameter and no inference takes place (like is the case in `lubridate::parse_date_time()`).
+* `lubridate::ymd()` and related string date parsers supported. (ARROW-16394). Month (`ym`, `my`) and quarter (`yq`) resolution parsers are also added. (ARROW-16516)
+* lubridate family of `ymd_hms` datetime parsing functions are supported. (ARROW-16395)
+* `lubridate::fast_strptime()` supported. (ARROW-16439)


I'm not sure we need a separate bullet point for every function that just says "supported". We can group them as is relevant, and we don't need to include all of the JIRA issue ids.

I second ☝🏻 that.

Yeah I'll consolidate. Wrote those bullets in a hurry :)

I include the JIRA IDs because it makes it way easier for me to revise the news. We can strip them at the end if people don't want them shown publicly.
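The lubridate parsers discussed in this thread can be sketched as follows. This assumes arrow 9.0.0 and dplyr. The calls inside `mutate()` are translated to Arrow compute functions rather than evaluated in R. The column names and strings are illustrative only.

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

tbl <- arrow_table(
  date_str = c("2022-07-27", "2022-07-28"),
  dt_str = c("2022-07-27 15:29:37", "2022-07-28 06:41:21")
)

# ymd() should yield a date; ymd_hms() a timestamp (UTC by default, as in lubridate)
parsed <- tbl %>%
  mutate(
    date = ymd(date_str),
    dt = ymd_hms(dt_str)
  ) %>%
  collect()
```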

nealrichardson · 2022-07-27T18:10:27Z

r/NEWS.md

+* `dplyr::glimpse` is supported. (ARROW-16776)
+* `show_exec_plan()` can be added to the end of a dplyr pipeline to show the underlying plan, similar to `dplyr::show_query()`. `dplyr::show_query()` and `dplyr::explain()` also work in Arrow dplyr pipelines. (ARROW-15016)
+* Functions can be called with package namespace prefixes (e.g. `stringr::`, `lubridate::`) within queries. For example, `stringr::str_length` will now dispatch to the same kernel as `str_length`. (ARROW-14575)
+* User-defined functions are supported in queries. Use `register_scalar_function()` to create them. (ARROW-16444)


This should go higher up. We should also discuss `map_batches()` alongside this, since they're both kinds of UDF.
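Here is a minimal sketch of `register_scalar_function()`. It assumes the arrow 9.0.0 interface, in which the R function receives a `context` argument first and `auto_convert = TRUE` passes plain R vectors. The function name `times_two_plus_one` is hypothetical. Once registered, it can be called inside a query like a built-in function.

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

# Register a hypothetical UDF; `context` carries batch size and output type
register_scalar_function(
  "times_two_plus_one",
  function(context, x) {
    x * 2 + 1
  },
  in_type = float64(),
  out_type = float64(),
  auto_convert = TRUE
)

# Call the UDF inside an Arrow dplyr query
result <- arrow_table(x = c(1, 2, 3)) %>%
  mutate(y = times_two_plus_one(x)) %>%
  collect()
```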

nealrichardson · 2022-07-27T18:10:36Z

r/NEWS.md

+* `dplyr::union` and `dplyr::union_all` are supported. (ARROW-15622)
+* `dplyr::glimpse` is supported. (ARROW-16776)
+* `show_exec_plan()` can be added to the end of a dplyr pipeline to show the underlying plan, similar to `dplyr::show_query()`. `dplyr::show_query()` and `dplyr::explain()` also work in Arrow dplyr pipelines. (ARROW-15016)
+* Functions can be called with package namespace prefixes (e.g. `stringr::`, `lubridate::`) within queries. For example, `stringr::str_length` will now dispatch to the same kernel as `str_length`. (ARROW-14575)


This is also significant

nealrichardson · 2022-07-27T18:11:26Z

r/NEWS.md

+  * Count distinct now gives correct result across multiple row groups. (ARROW-16807)
+  * Aggregations over partition columns return correct results. (ARROW-16700)
+* `dplyr::union` and `dplyr::union_all` are supported. (ARROW-15622)
+* `dplyr::glimpse` is supported. (ARROW-16776)


Can we say more than just "supported"?
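Here is a minimal sketch of what "supported" means for these verbs. It assumes arrow 9.0.0 and dplyr. `union_all()` keeps every row from both inputs, `union()` drops duplicates, and `glimpse()` prints a column-wise preview, as it does for data frames. The tables `a` and `b` are illustrative only.

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

a <- arrow_table(id = c(1L, 2L, 3L))
b <- arrow_table(id = c(3L, 4L))

# union_all() keeps duplicates; union() removes them
all_rows <- union_all(a, b) %>% collect()
distinct_rows <- union(a, b) %>% collect()

# Column-wise preview of an Arrow Table
glimpse(a)
```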

github-actions · 2022-07-27T18:46:58Z

https://issues.apache.org/jira/browse/ARROW-17188

Authored-by: Will Jones <willjones127@gmail.com> Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>

ursabot · 2022-07-28T06:41:21Z

Benchmark runs are scheduled for baseline = 71ccff9 and contender = a5f0c56. a5f0c56 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Failed ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.24% ⬆️0.0%] test-mac-arm
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.21% ⬆️0.07%] ursa-thinkcentre-m75q
Buildkite builds:
[Failed] a5f0c56a ec2-t3-xlarge-us-east-2
[Failed] a5f0c56a test-mac-arm
[Finished] a5f0c56a ursa-i9-9960x
[Finished] a5f0c56a ursa-thinkcentre-m75q
[Failed] 71ccff9c ec2-t3-xlarge-us-east-2
[Finished] 71ccff9c test-mac-arm
[Finished] 71ccff9c ursa-i9-9960x
[Finished] 71ccff9c ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

doc: write news

d18702d

dragosmg reviewed Jul 27, 2022

raulcd reviewed Jul 27, 2022

docs: cleanup

9cc9caa

wjones127 marked this pull request as ready for review July 27, 2022 15:48

nealrichardson reviewed Jul 27, 2022

github-actions bot added Component: Documentation Component: R labels Jul 27, 2022

docs: reorder and consolidate

2084df2

wjones127 requested a review from nealrichardson July 27, 2022 19:16

kszucs approved these changes Jul 28, 2022

kszucs merged commit a5f0c56 into apache:master Jul 28, 2022

kszucs pushed a commit that referenced this pull request Jul 28, 2022

ARROW-17188: [R] Update news for 9.0.0 (#13726)

2e96455

Authored-by: Will Jones <willjones127@gmail.com> Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
