Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-8300: [R] Documentation and changelog updates for 0.17 #6833

Closed
wants to merge 8 commits into from

Conversation

nealrichardson
Copy link
Member

This edits the NEWS.md and adds a python.Rmd (static) example of using the reticulate bindings.

cc @wesm @pitrou

@github-actions
Copy link

github-actions bot commented Apr 3, 2020

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you also want to make some statements about the state of memory allocators in each platform? If any UNIX platform is not using jemalloc, and if Windows is not using mimalloc, we are not yet delivering the ultimate performance to R users. Might be worth stating that we're aware of the issue, working on it, and expect performance to improve once the packaging issues are resolved.

That being said I might try to use the air cover of this week to address the static lib bundling at least for the private allocators

r/NEWS.md Outdated Show resolved Hide resolved
r/NEWS.md Outdated
## Python interoperability

The 0.17 Apache Arrow release includes a C data interface that allows you to
share or pass data from one runtime to another without copying it. This enables
Copy link
Member

@wesm wesm Apr 6, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might provoke comments like "Wait, couldn't Arrow already do that?"

I would say that the "C data face allows exchanging Arrow data in-process at the C level without copying and without libraries having an build or runtime dependency on the other". The real game changer is the lack of any dependency / coupling, code or otherwise.

The key differentiators with the C interface to help people not get confused:

  • Exchanging data in-process from one library/function to another
  • Projects have zero knowledge of each other except for their use of the C interface
  • Neither project is required to depend on the other at build-time or run-time

By contrast, the IPC format:

  • Truly language-independent (e.g. does not presume any C ABI)
  • Allows exchanging data between processes
  • Can be stored on disk or transferred over TCP

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, revised slightly, but I think most of that comment belongs in @pitrou's blog post, not here.

r/_pkgdown.yml Show resolved Hide resolved
Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments on the vignette

r/vignettes/python.Rmd Outdated Show resolved Hide resolved
r/vignettes/python.Rmd Outdated Show resolved Hide resolved
We can apply R methods on it:

```r
a[a > 1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is magical, how does this work?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, > and similar operators dispatch through the Ops generic, which here is implemented basically to capture the parse tree.

Then, this function implements [, and (lacking any better options at this point) calls as.vector() to evaluate in R the expression we captured before. So in this case, that results in an R logical selection vector, and that gets passed to Array::Filter here.

r/vignettes/python.Rmd Outdated Show resolved Hide resolved
@wesm
Copy link
Member

wesm commented Apr 6, 2020

Another small pedantic comment (thinking about the peanut gallery) on the vignette is the "transferring" terminology. After the export to the C interface, the data is still available in the origin interpreter, but ownership is now shared.

@nealrichardson
Copy link
Member Author

Do you also want to make some statements about the state of memory allocators in each platform?

Not really :) I mean, nothing has changed in 0.17 wrt to that.

Another small pedantic comment (thinking about the peanut gallery) on the vignette is the "transferring" terminology. After the export to the C interface, the data is still available in the origin interpreter, but ownership is now shared.

If you have better language, I'll gladly use it. My approach was to try to describe naively what was going on, in hopes that you and Antoine would correct me.

@nealrichardson nealrichardson marked this pull request as ready for review April 6, 2020 20:45
@nealrichardson
Copy link
Member Author

I'm done polishing docs for now, so whenever you're satisfied with the language @wesm, I'll merge.

@wesm
Copy link
Member

wesm commented Apr 6, 2020

OK @nealrichardson I'll tweak the words a bit per the comment above and then this can be merged

@wesm
Copy link
Member

wesm commented Apr 7, 2020

Hm well this is a mystery

-- 1. Failure: filter() on is.na() (@test-dplyr.R#102)  ------------------------
##[error]Error: R CMD check found ERRORs
`via_table` not equal to `expected`.
Execution halted
Incompatible type for column `chr`: x vctrs_unspecified, y character
Incompatible type for column `int`: x logical, y integer

@nealrichardson
Copy link
Member Author

There are no R code changes in this PR, so that's a flake, though there's no reason I can see that that test should be non-deterministic. There's something fishy going on with the Windows builds in general, given all of the failures we've seen lately. Maybe one of these failures will give us a clue someday.

I'll merge this, and if master is broken, well, I'll fix that tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants