Feature Request: Make RCall depend dynamically on R_HOME #513

Open
schlichtanders opened this issue Nov 22, 2023 · 8 comments

Comments

schlichtanders commented Nov 22, 2023

I am using RCall with CondaPkg. I know there is currently a pull request which uses preferences to set R_HOME. I realized that this would still not be enough for certain setups. I want to sketch why it would be good if RCall's build did not depend on R_HOME.

Ideal World

When installing something which depends on RCall with R provided by CondaPkg, then what you would like to do is

  1. instantiate project
  2. import CondaPkg first and instantiate all CondaPkg dependencies (via CondaPkg.resolve())
  3. set R_HOME (via ENV["R_HOME"] = joinpath(CondaPkg.envdir(), "lib", "R"))
  4. import RCall and use RCall
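
The four steps above can be sketched as follows. This is only an illustration: `conda_r_home` and `set_r_home!` are hypothetical helpers (they are not part of CondaPkg or RCall), and the commented workflow at the end assumes that RCall no longer needs R_HOME at build time, which is exactly what this issue requests.

```julia
# Hypothetical helper: R's home directory inside a conda environment,
# following the layout used in step 3 above.
conda_r_home(envdir::AbstractString) = joinpath(envdir, "lib", "R")

# Hypothetical helper: point the current Julia process at that R installation.
function set_r_home!(envdir::AbstractString)
    ENV["R_HOME"] = conda_r_home(envdir)
    return ENV["R_HOME"]
end

# Intended use (assumes CondaPkg and RCall are dependencies of the project):
#   using Pkg; Pkg.instantiate()          # step 1
#   import CondaPkg; CondaPkg.resolve()   # step 2
#   set_r_home!(CondaPkg.envdir())        # step 3
#   import RCall                          # step 4
```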

Current World

Unfortunately, the above does not work; the procedure that does work is immensely more complicated. The reason is that building RCall depends on R_HOME being set and valid. Hence step 1. the instantiation will fail, as R cannot find a valid R_HOME.

So instead currently you need to do

  1. start Julia in a dummy project environment
  2. find the CondaPkg version in the project (manually parsing the Manifest.toml is the simplest way I could find)
  3. Pkg.add(name="CondaPkg", version="the version we found")
  4. switch project environments so that CondaPkg thinks it is in the final project environment
  5. instantiate all CondaPkg dependencies (via CondaPkg.resolve())
  6. output joinpath(CondaPkg.envdir(), "lib", "R") and exit Julia
  7. set R_HOME to the path that was output in step 6
  8. start julia in the actual project
  9. instantiate project
  10. import RCall and use RCall
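
As a rough illustration of step 2, the pinned CondaPkg version can be read from the Manifest.toml with Julia's TOML standard library. `manifest_version` is a hypothetical helper name, and the sketch assumes the usual manifest layout (packages under `deps` in manifest format 2.0, at top level in older manifests).

```julia
using TOML

# Hypothetical helper: look up the pinned version of a package in a Manifest.toml.
# Returns the version as a String, or nothing if the package (or its version) is absent.
function manifest_version(manifest_path::AbstractString, pkg::AbstractString)
    manifest = TOML.parsefile(manifest_path)
    # Manifest format 2.0 nests packages under "deps"; older manifests list them at top level.
    deps = get(manifest, "deps", manifest)
    entries = get(deps, pkg, nothing)
    entries === nothing && return nothing
    # Each package name maps to an array of entries; take the first one.
    return get(first(entries), "version", nothing)
end
```

The result could then feed step 3, for example `Pkg.add(name="CondaPkg", version=manifest_version("Manifest.toml", "CondaPkg"))`.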

[EDIT added]
This approach probably still has difficulties instantiating the CondaPkg conda packages correctly, because it should also pick up the CondaPkg.toml files of dependent projects in the Julia load path. As those are not yet instantiated, it probably does not find them. So I guess that instead of step 2, what you would actually need to do is write your own instantiate method which instantiates everything except RCall and the packages depending on RCall. Even then, CondaPkg would probably miss the conda dependencies of the packages that depend on RCall; these would need to be integrated manually without triggering a build of RCall.

Conclusion

It would be so much cleaner and simpler if RCall's build did not depend on R_HOME.
RCall's __init__ method can of course still happily depend on R_HOME.

schlichtanders changed the title from "Feature Request: Make RCall build without needing R_HOME. Make RCall depend dynamically on R_HOME" to "Feature Request: Make RCall depend dynamically on R_HOME" on Nov 22, 2023
frankier (Contributor) commented Nov 23, 2023

Probably you will need to make the changes for this yourself. It will require some restructuring of the package to not use stuff from libR during the precompile stage, so I think it might require figuring out some package internals which may take some time. You may have better luck getting a PR merged.

That said, I didn't really understand under what circumstances the preferences approach does not work. Did you try installing my PR? I have been using it together with CondaPkg just fine in my own projects. What steps did you attempt?

> Hence step 1. the instantiation will fail, as R cannot find a valid R_HOME

My PR solves this problem by aborting precompilation if there is no valid R_HOME. This will not cause a failure, merely a warning. Once you set things up with preferences, the preferences system will automatically trigger a recompilation of RCall.jl.

Currently, you have to set the preferences up manually or with a helper script (I have posted the latter in the PR), but once the PR is merged we can either add a package extension or create a mini helper package which sets up the preferences automatically, giving a higher level of convenience.

frankier (Contributor) commented Nov 23, 2023

You might find it informative to read through my PR and my comments since it contains a lot of relevant information to what you are writing. With regards to

> RCall's __init__ method can of course still happily depend on R_HOME.

Please see:

> when other modules which import RCall are precompiled they will run the __init__. I tried an approach that checked for currently_compiling() = ccall(:jl_generating_output, Cint, ()) != 0 in __init__ and skipped the init in that case, however this will mean an R interpreter is not started, and apparently this is needed even during precompilation time for the R_str macro. The resulting segfault was a bit surprising to me -- but I suppose the R runtime is needed even at this stage for such a close integration.
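
For readers unfamiliar with the guard described in the quoted passage, here is a minimal, self-contained sketch. `InitDemo` and `initialized` are made-up names for illustration; only the `currently_compiling` check is taken from the quote. As noted there, this guard alone was not enough for RCall, because the R_str macro needs a running R interpreter even at precompilation time.

```julia
module InitDemo

# True while Julia is generating precompile output (the check quoted above).
currently_compiling() = ccall(:jl_generating_output, Cint, ()) != 0

# Records whether the runtime setup in __init__ actually ran.
const initialized = Ref(false)

function __init__()
    # Skip runtime setup while this or a dependent package is being precompiled.
    currently_compiling() && return
    initialized[] = true
    return
end

end # module
```

In an ordinary session (not a precompilation process), `InitDemo.initialized[]` is set once the module has been loaded.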

frankier (Contributor) commented

Instructions on how to test my branch together with CondaPkg are here: JuliaPy/CondaPkg.jl#100 (comment)

Please let me know if you have any problems or have identified any issues, and I will be happy to discuss and attempt to address. In case you have looked more closely at the different approaches and happen to decide my PR is a reasonable approach, getting behind it could help it get merged.

schlichtanders (Author) commented Nov 23, 2023

Thank you frankier for all your comments and help.

What you write sounds pretty good. (Only that other packages call RCall's __init__ method during precompilation sounds quite horrifying actually... is this really so?)

My biggest wish is that

  1. Pkg.instantiate() should work, i.e. compiling RCall without R_HOME being set up correctly.
    • Assume that the LocalPreferences.toml exists in the same folder and sets R_HOME to the place where it will later find an R installation
    • but as of instantiation time, no such R installation is available yet (will be installed via CondaPkg).
  2. afterwards I would like to call import CondaPkg; CondaPkg.resolve(); import RCall
    • and it should somehow trigger the correct R build
    • such that it won't rebuild every time I run this second step again, but rather reuse the compiled version.

It seems to me that the preferences approach has the difficulty of managing the interaction between step 1. and step 2.
Given the same LocalPreferences, the first time the build should silently fail, while the second time the build should be retriggered (but only retriggered the very first time step 2. is run, subsequent runs shouldn't need a rebuild).

schlichtanders (Author) commented

I guess you could solve this by having a dummy variable in LocalPreferences.toml which indicates whether this is the first instantiation or a subsequent normal build...

Not sure whether this would work.

frankier (Contributor) commented

> Only that other packages call RCall's __init__ method during precompilation sounds quite horrifying actually... is this really so?

This surprised me at first, but this is in fact always the case. It's not spelled out in https://docs.julialang.org/en/v1/manual/modules/#Module-initialization-and-precompilation, which only mentions that __init__ is called during using; it seems that this does in fact include precompiling dependent packages. I asked about this on Slack but didn't get a response; however, a minimal test shows that this is always the case. As mentioned, you could try to abort __init__ while another package is being precompiled, but then we would have problems with R_str.

> My biggest wish is that
>
>   1. Pkg.instantiate() should work, i.e. compiling RCall without R_HOME being set up correctly.
>     • Assume that the LocalPreferences.toml exists in the same folder and sets R_HOME to the place where it will later find an R installation
>     • but as of instantiation time, no such R installation is available yet (will be installed via CondaPkg).
>   2. afterwards I would like to call import CondaPkg; CondaPkg.resolve(); import RCall
>     • and it should somehow trigger the correct R build
>     • such that it won't rebuild every time I run this second step again, but rather reuse the compiled version.
>
> It seems to me that the preferences approach has the difficulty of managing the interaction between step 1. and step 2. Given the same LocalPreferences, the first time the build should silently fail, while the second time the build should be retriggered (but only retriggered the very first time step 2. is run, subsequent runs shouldn't need a rebuild).

I definitely see what you're getting at: it would be convenient to set up LocalPreferences.toml beforehand and have things retriggered automatically whenever R is updated. This isn't how it works at the moment, and to me it seems logical that by specifying the libR preference, you are also saying that it actually exists and is usable. The solution is therefore to run CondaPkg first and then set up the LocalPreferences.toml after R has been installed.
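
The ordering described here (install R first, then record the preference) could be sketched roughly as below, using only Julia's TOML standard library. `write_r_preference` is a hypothetical helper, and the default key name `"Rhome"` is an assumption for illustration; the actual preference names are whatever the PR defines.

```julia
using TOML

# Hypothetical helper: record an R location for RCall in LocalPreferences.toml,
# but only once the R installation actually exists on disk.
# The key name "Rhome" is an assumption, not necessarily the name used by the PR.
function write_r_preference(project_dir::AbstractString, r_home::AbstractString; key::AbstractString="Rhome")
    isdir(r_home) || return false          # R not installed yet: leave preferences untouched
    path = joinpath(project_dir, "LocalPreferences.toml")
    prefs = isfile(path) ? TOML.parsefile(path) : Dict{String,Any}()
    rcall = get!(prefs, "RCall", Dict{String,Any}())
    rcall[key] = r_home
    open(io -> TOML.print(io, prefs), path, "w")
    return true
end
```

With CondaPkg, this would be called after `CondaPkg.resolve()`, for example `write_r_preference(dirname(Base.active_project()), joinpath(CondaPkg.envdir(), "lib", "R"))`.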

In the future, it should be possible to create some kind of post-install CondaPkg hook to automate this, so that the preference is only set up when R actually becomes available. However, using the preference system to configure libR is generally useful beyond CondaPkg. For example, with the current approach, changing R_HOME in one project might affect another project on your machine if they happen to be using the same version of RCall. Using preferences fixes this problem, and this is why I am trying to get the preference PR merged as a first step.

schlichtanders (Author) commented

I agree that Preferences improve the situation. It is just that dynamic resolution is still more handy than Preferences.

frankier (Contributor) commented

Sure. One concrete advantage I can see is that we wouldn't end up with one copy of RCall.jl bytecode for each R installation.


2 participants