-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-8300: [R] Documentation and changelog updates for 0.17 #6833
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you also want to make some statements about the state of memory allocators in each platform? If any UNIX platform is not using jemalloc, and if Windows is not using mimalloc, we are not yet delivering the ultimate performance to R users. Might be worth stating that we're aware of the issue, working on it, and expect performance to improve once the packaging issues are resolved.
That being said I might try to use the air cover of this week to address the static lib bundling at least for the private allocators
r/NEWS.md
Outdated
## Python interoperability | ||
|
||
The 0.17 Apache Arrow release includes a C data interface that allows you to | ||
share or pass data from one runtime to another without copying it. This enables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might provoke comments like "Wait, couldn't Arrow already do that?"
I would say that the "C data face allows exchanging Arrow data in-process at the C level without copying and without libraries having an build or runtime dependency on the other". The real game changer is the lack of any dependency / coupling, code or otherwise.
The key differentiators with the C interface to help people not get confused:
- Exchanging data in-process from one library/function to another
- Projects have zero knowledge of each other except for their use of the C interface
- Neither project is required to depend on the other at build-time or run-time
By contrast, the IPC format:
- Truly language-independent (e.g. does not presume any C ABI)
- Allows exchanging data between processes
- Can be stored on disk or transferred over TCP
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, revised slightly, but I think most of that comment belongs in @pitrou's blog post, not here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments on the vignette
We can apply R methods on it: | ||
|
||
```r | ||
a[a > 1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is magical, how does this work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, >
and similar operators dispatch through the Ops
generic, which here is implemented basically to capture the parse tree.
Then, this function implements [
, and (lacking any better options at this point) calls as.vector()
to evaluate in R the expression we captured before. So in this case, that results in an R logical
selection vector, and that gets passed to Array::Filter here.
1d641bc
to
5e75792
Compare
Another small pedantic comment (thinking about the peanut gallery) on the vignette is the "transferring" terminology. After the export to the C interface, the data is still available in the origin interpreter, but ownership is now shared. |
Not really :) I mean, nothing has changed in 0.17 wrt to that.
If you have better language, I'll gladly use it. My approach was to try to describe naively what was going on, in hopes that you and Antoine would correct me. |
I'm done polishing docs for now, so whenever you're satisfied with the language @wesm, I'll merge. |
OK @nealrichardson I'll tweak the words a bit per the comment above and then this can be merged |
Hm well this is a mystery
|
There are no R code changes in this PR, so that's a flake, though there's no reason I can see that that test should be non-deterministic. There's something fishy going on with the Windows builds in general, given all of the failures we've seen lately. Maybe one of these failures will give us a clue someday. I'll merge this, and if master is broken, well, I'll fix that tomorrow. |
This edits the NEWS.md and adds a python.Rmd (static) example of using the
reticulate
bindings.cc @wesm @pitrou