Conda as package manager for R in rstudio-server as replacement for packrat #9423
Comments
Hi, is there any suggestion for how to fix this error? I would like to use RStudio Server remotely. Thanks!
Would this work for a server not connected to the internet?
I am linking the results of my experiments here: https://github.com/grst/rstudio-server-conda
I have been using this approach in production for a while now and have not encountered any major issues.
@grst - cool! :) Not sure if you're up for it, but in case you're interested, there's also an attempt to build rstudio server directly for conda-forge: conda-forge/staged-recipes#13760. If you want to contribute, you'd be more than welcome; it's currently a bit stalled.
Hi @h-vetinari, that looks interesting! Do I understand correctly that this approach would require rstudio server to be installed within the same conda env that is used for the analysis?
Not necessarily - conda supports stacking environments - but generally, the idea would be that everything (including rstudio-server) could be installed by conda and work "out of the box". |
As suggested by @mingwandroid, I'm moving this post here:
We are currently working on combining rstudio-server and conda, and we also plan to use conda for python package management (one environment.yml per Data Science application). There are a lot of questions and answers about this topic scattered around the internet, but nowhere is it completely described how to do it and how to troubleshoot or solve the problems that might come up along the way.
We would like to share our experience, hear opinions and criticism of our approach, and start a discussion on the topic. We would also like to encourage everyone to collect references to other material on the topic in this question, and eventually to create a complete guide that will help whoever wants to do the same.
Environment and overview
For each user in our system we have an independent container with rstudio-server. Data Scientists are using both R and python for development. For R we've used packrat as the package manager, but the following concerns have arisen:
- Packrat compiles each package the first time, which means that recreating the environment for a project usually takes a long time. Creating a clean testing environment is also time-consuming.
- Packrat is usable only with R. python developers can't use it, which means that different projects use different package managers.
- Packrat does not handle package dependencies on system libraries (e.g. the RODBC package depends on unixodbc).
- Packrat cannot handle binary packages that are not available on CRAN (such as catboost).
Since conda is a language-agnostic package and environment manager that supports a number of R packages, is gaining influence in the python world, and has a constantly growing community of contributors who add new packages, we decided to try it out. The solution we've thought about looks like this:
We have formulated the following questions:
1. How to combine conda and the containerized rstudio-server?
2. How to handle packages that are not available in conda?
3. Is it possible to combine conda with install.packages() or devtools::install_github() for missing packages, at least to try out packages?
4. Is it possible to combine conda with packrat?
5. How to use conda with spark (sparklyr, pyspark)?
Below are the answers we have so far.
1. How to combine conda and the containerized rstudio-server?
This is quite simple:
First, put the following line inside the `~/.Rprofile` file: `.libPaths(paste0(R.home(), "/library"))`. This excludes every library directory except the one located in the R home directory, so whenever the R home directory is switched, the R library directory changes with it.
Second, change the line `rsession-which-r=path/to/r/home` in the `/etc/rstudio/rserver.conf` file so that it points to R within the conda environment.
We've developed the following script to do it automatically:
This will successfully switch the R interpreter and libraries to a selected conda environment.
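The script itself is not reproduced in this post. As a rough idea of how the two steps above might be automated, here is a minimal sketch. It is not the authors' script: the function name `switch_r_env`, its arguments, and the assumption that `rsession-which-r` should point at `<env>/bin/R` are illustrative only.

```shell
# Sketch only: names and defaults below are hypothetical.
# Usage: switch_r_env <conda-env-prefix> [rserver.conf] [.Rprofile]
# The body runs in a subshell, so its variables do not leak to the caller.
switch_r_env() (
  env_prefix="$1"
  rserver_conf="${2:-/etc/rstudio/rserver.conf}"
  rprofile="${3:-$HOME/.Rprofile}"
  libpaths_line='.libPaths(paste0(R.home(), "/library"))'

  # Step 1: add the .libPaths() line to the R profile unless it is already there.
  touch "$rprofile"
  grep -qxF -- "$libpaths_line" "$rprofile" ||
    printf '%s\n' "$libpaths_line" >> "$rprofile"

  # Step 2: drop any existing rsession-which-r entry and write the new one.
  # Assumption: the R executable lives at <env-prefix>/bin/R.
  touch "$rserver_conf"
  grep -v '^rsession-which-r=' "$rserver_conf" > "$rserver_conf.tmp" || true
  printf 'rsession-which-r=%s\n' "$env_prefix/bin/R" >> "$rserver_conf.tmp"
  mv "$rserver_conf.tmp" "$rserver_conf"
)
```

For example, `switch_r_env /opt/conda/envs/myenv` (a made-up environment path) would update the default files. Writing to `/etc/rstudio/rserver.conf` normally requires root, and rstudio-server most likely has to be restarted before the new setting takes effect.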
2. How to handle packages that are not available in conda?
Here are 2 solutions that we've tried:
Conda skeleton: for packages that are available on CRAN or from some other package managers. It automatically creates and compiles the package for you and lets you upload it (even automatically) to your own channel on anaconda.org. Using `conda skeleton` is easy:
The guide on how to use it is here.
The error we got while doing it is:
This issue can be resolved by creating a file with the address of the CRAN mirror:
Add an argument to the build command:
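The commands, the error message, and the configuration file from the original post are not preserved here. The following is a minimal sketch of what such a workflow could look like, assuming the error came from an undefined `cran_mirror` variable in the generated recipe. The file name `conda_build_config.yaml`, the `cran_mirror` key, and both function names are assumptions, not the authors' exact commands.

```shell
# Sketch only: function names, file name, and the cran_mirror key are assumptions.

# write_cran_mirror_config <dir> [mirror-url]
# Writes a variant config file that defines the CRAN mirror.
write_cran_mirror_config() (
  dir="$1"
  mirror="${2:-https://cran.r-project.org}"
  mkdir -p "$dir"
  printf 'cran_mirror:\n  - %s\n' "$mirror" > "$dir/conda_build_config.yaml"
)

# build_cran_package <cran-package-name> <work-dir>
# Generates a recipe with conda skeleton, then builds it with the config above.
build_cran_package() (
  pkg="$1"
  work="$2"
  write_cran_mirror_config "$work"
  cd "$work" || exit 1
  conda skeleton cran "$pkg"
  # Assumption: the recipe directory is named r-<lowercased package name>.
  recipe="r-$(printf '%s' "$pkg" | tr '[:upper:]' '[:lower:]')"
  conda build --variant-config-files conda_build_config.yaml "$recipe"
)
```

A call such as `build_cran_package RODBC ./recipes` would need conda-build installed. The resulting package could then be uploaded to your own channel.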
For packages that are not available on CRAN, it's possible to use conda-forge and contribute to the community by creating recipes for the missing packages. The process is well documented and available here.
3. Is it possible to combine conda with install.packages() or devtools::install_github() for missing packages, at least to try out packages?
It seems that it should be possible for pure R packages, but there are currently 2 errors with packages that need to be compiled, as the conda build environment is not utilized automatically:
- `x86_64-conda_cos6-linux-gnu-cc not found` when trying to compile a package. There is a workaround for it: using `Sys.setenv()`, prepend the path to the conda environment's bin folder to the PATH variable:
`Sys.setenv(PATH=paste0("path/to/conda/env/bin:", Sys.getenv("PATH")))`
- `<somelibrary> library that is required to build <someotherlibrary> was not found.` There is currently no solution for this, but I assume there should be some complete solution that fixes both this and the previous error.
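To avoid repeating the `Sys.setenv()` call in every session, the same PATH change could in principle be stored in the user's `~/.Renviron`, which R reads at startup and in which `${PATH}` is expanded. This is a different technique from the workaround above and has not been tested in this setup. The function name `add_env_bin_to_renviron` and its arguments are hypothetical.

```shell
# Sketch only: the function name and default file location are assumptions.
# Usage: add_env_bin_to_renviron <conda-env-prefix> [Renviron file]
add_env_bin_to_renviron() (
  env_bin="$1/bin"
  renviron="${2:-$HOME/.Renviron}"
  # The escaped \${PATH} is written literally; R expands it when reading the file.
  line="PATH=\"$env_bin:\${PATH}\""
  touch "$renviron"
  # Append the line only once, so repeated calls do not duplicate it.
  grep -qxF -- "$line" "$renviron" || printf '%s\n' "$line" >> "$renviron"
)
```

Calling `add_env_bin_to_renviron /opt/conda/envs/myenv` (again a made-up path) would leave a single `PATH="/opt/conda/envs/myenv/bin:${PATH}"` line in `~/.Renviron`. It only addresses the first error, not the missing-library one.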
4. Is it possible to combine conda with packrat?
It does not look like it, as packrat does not recognize packages already installed via conda, and it faces the same build issues as in 3.
5. How to use conda with spark (sparklyr, pyspark)?
The question here is how to bring the conda environment to the executors. A similar question was already discussed here, so that's where I'll start my investigation.
6. Is it feasible to use conda?
Seems like yes?!
- More than 90-95% of the required R packages are available via conda-forge anyway. The rest can be handled locally via `conda skeleton cran` and `conda build`, eventually followed by an upload to our own anaconda channel or by contributing to conda-forge.
- Conda + packrat seems not to work, and may also not be required.
- `devtools::install_github()` would be nice to have, but how to get it running if the code needs to be compiled?
Is there any other solution to handle additional packages on top of conda-shell or an R setup script?