R arrow package #81761

Closed
jtrakk opened this issue Mar 4, 2020 · 9 comments

@jtrakk

jtrakk commented Mar 4, 2020

Project description

The Apache Arrow serialization library for R.

Metadata

There is already rPackages.arrow but it's missing runtime libraries. Attempting to use it gives an error:

> arrow::read_feather('my.feather')
Error in io___MemoryMappedFile__Open(path, mode) : 
  Cannot call io___MemoryMappedFile__Open(). Please use arrow::install_arrow() to install required runtime libraries. 
@Zhen-hao

I encountered the same issue. I think the first thing to check is how the R library finds the C++ library.
I asked for help on the arrow project: apache/arrow#7034

@Zhen-hao

Having read https://arrow.apache.org/docs/r/articles/install.html#how-dependencies-are-resolved, I don't think it is possible to build rPackages.arrow in such a way that it always finds the arrow-cpp installation.

I have a workaround now: https://gist.github.com/Zhen-hao/8bdc7b2afe10f270f3f7e280a89c5ef0
If the arrow library is not working in the shell, just manually install the same version one more time.

@stale

stale bot commented Oct 22, 2020

Hello, I'm a bot and I thank you in the name of the community for opening this issue.

To help our human contributors focus on the most-relevant reports, I check up on old issues to see if they're still relevant. This issue has had no activity for 180 days, and so I marked it as stale, but you can rest assured it will never be closed by a non-human.

The community would appreciate your effort in checking if the issue is still valid. If it isn't, please close it.

If the issue persists, and you'd like to remove the stale label, you simply need to leave a comment. Your comment can be as simple as "still important to me". If you'd like it to get more attention, you can ask for help by searching for maintainers and people that previously touched related code and @ mention them in a comment. You can use Git blame or GitHub's web interface on the relevant files to find them.

Lastly, you can always ask for help at our Discourse Forum or at #nixos' IRC channel.

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Oct 22, 2020
@TikhonJelvis
Contributor

Is anyone actively looking at this? If not, I can try putting together a PR fixing this—I got an override working on my local machine on both NixOS and macOS.

@stale stale bot removed the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jan 2, 2021
@jtrakk
Author

jtrakk commented Feb 1, 2021

Still have this issue on NixOS unstable.

@TikhonJelvis
Contributor

Okay. I am trying to fix this.

Last time I tried, I was held up because the R library was at version 1.0.0, but the Arrow C++ library in Nixpkgs was 2.0.0. To fix that, I opened #108268 so that the R Arrow package was brought up to 2.0.0. I had to focus on other things after that got merged; in the meantime, the Arrow C++ library in Nixpkgs got upgraded to 3.0.0.

I had a fix working locally based on @Zhen-hao's examples, but it fails to build now. I'm pretty sure it's because of the 2.0.0 vs 3.0.0 version mismatch.
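A quick way to see the mismatch described above is to compare the two versions that a given Nixpkgs checkout provides. This is only a sketch: the file name `arrow-versions.nix` is made up, and treating "same major version" as the compatibility rule is an assumption drawn from the 2.0.0 vs 3.0.0 failure, not something the Arrow project states here.

```nix
# arrow-versions.nix (hypothetical file name)
# Reports the versions of the R arrow package and the Arrow C++ library
# in the given Nixpkgs, plus whether their major versions agree.
{ pkgs ? import <nixpkgs> { } }:

let
  inherit (pkgs) lib;

  # lib.getVersion falls back to parsing the derivation name
  # when the derivation carries no explicit version attribute.
  rVersion = lib.getVersion pkgs.rPackages.arrow;
  cppVersion = lib.getVersion pkgs.arrow-cpp;
in
{
  inherit rVersion cppVersion;

  # Assumption: a differing major version is what breaks the build.
  sameMajor = lib.versions.major rVersion == lib.versions.major cppVersion;
}
```

Evaluating it with `nix-instantiate --eval --strict arrow-versions.nix` prints the attribute set without building anything.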

What's the best way to proceed? Should I do another CRAN/etc version bump? Is there some lighter-weight way to only upgrade the R Arrow package? Is there a way to point the R package at an older version of the Arrow C++ library? Going forwards, if the C++ library has breaking releases relatively often, would we need to bump the entire R package set often to keep them in sync? Does bumping the R package set more often create an undue maintenance or testing burden?

Would love some advice from somebody more experienced with Nixpkgs and the R package system in particular. @peti or @SuperSandro2000?

@SuperSandro2000
Member

I have no idea about R.

@jtrakk
Author

jtrakk commented Feb 3, 2021

@TikhonJelvis I'm guessing this GitHub issue has a small viewership; maybe some people who know the answer follow the Discourse forum.

peti added a commit that referenced this issue Feb 16, 2021
peti added a commit that referenced this issue Feb 16, 2021
@peti
Member

peti commented Feb 16, 2021

rPackages.arrow should work fine in master starting after 64b5504 and c4d8bd3.

To answer @TikhonJelvis's questions:

What's the best way to proceed? Should I do another CRAN/etc version bump? Is there some lighter-weight way to only upgrade the R Arrow package?

I have updated only the arrow package now because I didn't have much time and I just wanted to get results as quickly as possible, but updating the package set in its entirety would be the better solution here, even if it is more effort. Our package set is pretty old already. Anyhow, it is possible to cherry-pick certain updates for R if a complete update is undesirable for some reason.

Is there a way to point the R package at an older version of the Arrow C++ library?

If we had an older version around, it would be possible to point the R build at it by adding an appropriate override:

arrow = [ pkgs.pkgconfig pkgs.arrow-cpp ];
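The line above appears to be an entry in the native build inputs table inside Nixpkgs' R module set. For anyone who wants the same effect without patching Nixpkgs, a user-side override might look roughly like the sketch below. The `arrowCpp` parameter is hypothetical (it stands in for whatever older Arrow C++ derivation you have available), and matching on the `pname` "arrow-cpp" is an assumption about how that derivation is named.

```nix
# Sketch only: rebuild rPackages.arrow against a chosen Arrow C++ library.
# `arrowCpp` is a hypothetical parameter; it defaults to the current package.
{ pkgs ? import <nixpkgs> { }
, arrowCpp ? pkgs.arrow-cpp
}:

let
  # Assumption: the C++ library's derivation has pname "arrow-cpp".
  isArrowCpp = p: (p.pname or "") == "arrow-cpp";

  arrowR = pkgs.rPackages.arrow.overrideDerivation (attrs: {
    nativeBuildInputs =
      # Drop any Arrow C++ library that is already wired in, then add ours.
      builtins.filter (p: !(isArrowCpp p)) attrs.nativeBuildInputs
      # pkgs.pkgconfig is the name used in this thread;
      # newer Nixpkgs revisions call it pkgs.pkg-config.
      ++ [ pkgs.pkgconfig arrowCpp ];
  });
in
# An R wrapper that can load the rebuilt package.
pkgs.rWrapper.override { packages = [ arrowR ]; }
```

Building this file with `nix-build` should give a `result/bin/R` in which `library(arrow)` uses the selected C++ library, provided the two versions are actually compatible.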

Going forwards, if the C++ library has breaking releases relatively often, would we need to bump the entire R package set often to keep them in sync?

That would be the best solution, IMHO. Updating only arrow comes with the risk that dependencies of arrow become outdated and we end up losing features or performance, or maybe even introducing bugs. It's not a major risk, I suppose, but updating the whole package set would still be preferable. The maintenance effort involved in such an update is moderate, IMHO.
