Question about R packages #2951
On second thought, packages like lava make me not want to add suggested dependencies. Most of its suggested dependencies aren't in Spack, and concretization of R packages is incredibly slow as it is.
I would imagine that which suggested dependencies you add should depend on your users' requirements.
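For what it's worth, one can enumerate those dependency fields from R itself; a sketch using the standard tools API (the mirror URL is just an example):

db <- available.packages(repos = "https://cloud.r-project.org")
# Hard dependencies - what a Spack package would have to model
tools::package_dependencies("lava", db = db, which = c("Depends", "Imports", "LinkingTo"))
# Optional (suggested) dependencies
tools::package_dependencies("lava", db = db, which = "Suggests")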
I'd say I know lots about R - it's been my main development environment for the last 16+ years.
These packages are part of the core R distribution and are tied to the R version installed. You can basically consider them to be "R itself" (non-official mirror: https://github.com/wch/r-source/tree/trunk/src/library). They are so essential to R that it would not make sense for them to be updatable via CRAN - if they were, you would basically get a different version of R. Thus, they're updated when R is updated.
These are so-called "Recommended" packages (a legacy term). For historical reasons they are quite tied to the core R distribution, developed by the R core team or people closely related to it. The R core distribution "knows" about these packages (cf. https://github.com/wch/r-source/blob/trunk/share/make/vars.mk), but they are indeed distributed via CRAN. Because they're distributed via CRAN, they can also be updated between R version releases.
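For reference, an existing R installation can report both groups directly (standard base R calls):

# Packages shipped with R itself vs. the "Recommended" set
rownames(installed.packages(priority = "base"))
rownames(installed.packages(priority = "recommended"))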
Basically, all of these base and recommended packages come with the R installation itself.

The difference between Depends, Imports, and LinkingTo mostly matters to R internally; as far as installation goes, all of them are required at build/run time, whereas Suggests is optional.

No, I would not install suggested packages by default. As a rule of thumb, you can assume that dependencies listed under Depends, Imports, and LinkingTo are available from the mainstream R repositories(*).

(*) Both CRAN and Bioconductor are actually considered mainstream R repositories, so you can have dependencies from a CRAN package to a Bioconductor package, e.g. https://cran.r-project.org/package=PSCBS (imports DNAcopy from Bioconductor - yeah, I'm guilty of that one).

Now to something I wanted to ask for a while. As you might be aware, CRAN hit 10,000 R packages yesterday. Do you intend to provide Spack packages for all of them, including tracking package-version dependencies? There exist a few efforts targeting reproducibility (package version dependencies etc.) for R that build on top of the R framework to handle this (https://cran.r-project.org/web/views/ReproducibleResearch.html), e.g. packrat and checkpoint.
It might be a better idea to have the R community worry about this, especially since that's most likely where users are going to get support for these types of needs. The way I can see Spack being most valuable for R, in addition to installing R itself, is to provide / install the necessary compilers and libraries needed on the PATH / LD_LIBRARY_PATH / ... so that packages install out of the box when the user calls: install.packages("foo"). I'd assume this will be the 99.9% use case everyone has. Personally, I don't think that I'll be installing R packages via Spack.
This is a really good writeup. I've tagged this thread so it can make it into our documentation.
@HenrikBengtsson Thanks for the thorough explanation! So it sounds like as far as Spack is concerned, depends/imports/linkingTo are all basically equivalent, and all of these dependencies are needed at build/run time but aren't needed for linking (with RPATH). I agree with you that it is a total pain in the ass to package all of these dependencies manually. I do plan on making automatic creation of R packages simpler, where you could run something like:
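(a sketch - the package URL is a placeholder)

$ spack create https://cran.r-project.org/src/contrib/foo_1.0.tar.gz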
and Spack would create something that looked like:
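(roughly; this skeleton is a guess at the generated boilerplate - class name, URL, and checksum are placeholders)

from spack import *

class RFoo(RPackage):
    """FIXME: Put a proper description of your package here."""

    homepage = "https://cran.r-project.org/package=foo"
    url      = "https://cran.r-project.org/src/contrib/foo_1.0.tar.gz"

    # FIXME: checksum computed from the downloaded tarball
    version('1.0', '0123456789abcdef0123456789abcdef')

    # FIXME: Add dependencies, e.g.:
    # depends_on('r-bar', type=('build', 'run'))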
Then, you would just have to add a description and add dependencies. This really isn't that bad, but updating all of the packages to the latest version is kind of annoying. We do need a better way of automating that.

I'm very new to the R ecosystem (about a week into it), but I see a lot of parallels with the difficulty I've had installing Python packages with Spack. Basically, Spack is an incredible build system when it comes to C/C++/Fortran packages that need to be compiled and linked, but it's a bit overboard for packages that don't require linking (Python, R). For these packages, it isn't hard to upgrade a single package without breaking others, and I've yet to have a user who really cared exactly how it was built or what version it was built with. For things like HDF5 or NetCDF, people really care about building with certain compilers or MPI libraries, but for Python and R, no one really cares. I'm torn between investing time in Spack's R/Python packages and just giving up and using Anaconda. Commands like conda install make this kind of thing trivial. We've flirted with the idea of using other package managers like pip internally in Spack to install things, but we've never committed.

One thing I can say is that for users who do not have internet access (on restricted clusters), Spack makes it easy for them to download an entire mirror of R packages and install them on their own. And for R packages that require non-R dependencies (not sure how common this is), Spack makes it easy to install them.

As someone much more familiar with R, how likely is it that different users will want different versions of an R package, or ones built in a different way? Can I safely assume that they really don't care and just want the latest and greatest of everything they require? Can I run install.packages("foo") and have it automatically install all of foo's dependencies too?
CRAN provides a single file, https://cran.r-project.org/src/contrib/PACKAGES, which is used by R's install.packages() / available.packages() machinery.
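That database can also be queried directly from R, e.g. (the mirror URL is just an example):

db <- available.packages(repos = "https://cloud.r-project.org")
db["RcppArmadillo", c("Version", "Depends", "Imports", "LinkingTo", "Suggests")]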
This should allow you to pull down dependencies too. I'm actually surprised that it does not provide the full package descriptions. A more modern alternative that also provides this information would be to use the METACRAN (https://r-pkg.org/services#api & https://github.com/metacran/crandb#readme) API. For instance, compare https://cran.r-project.org/package=RcppArmadillo with:

curl https://crandb.r-pkg.org/RcppArmadillo
{"Package":"RcppArmadillo","Type":"Package","Title":"'Rcpp' Integration for the 'Armadillo' Templated Linear Algebra\u000aLibrary","Version":"0.7.600.1.0","Date":"2016-12-16","Author":"Dirk Eddelbuettel, Romain Francois and Doug Bates","Maintainer":"Dirk Eddelbuettel <edd@debian.org>","Description":"'Armadillo' is a templated C++ linear algebra library (by Conrad\u000aSanderson) that aims towards a good balance between speed and ease of use. Integer,\u000afloating point and complex numbers are supported, as well as a subset of\u000atrigonometric and statistics functions. Various matrix decompositions are\u000aprovided through optional integration with LAPACK and ATLAS libraries.\u000aThe 'RcppArmadillo' package includes the header files from the templated\u000a'Armadillo' library. Thus users do not need to install 'Armadillo' itself in\u000aorder to use 'RcppArmadillo'. 'Armadillo' is licensed under the MPL 2.0, while\u000a'RcppArmadillo' (the 'Rcpp' bindings/bridge to Armadillo) is licensed under the\u000aGNU GPL version 2 or later, as is the rest of 'Rcpp'. Note that Armadillo\u000arequires a fairly recent compiler; for the g++ family at least version 4.6.*\u000ais required.","License":"GPL (>= 2)","LazyLoad":"yes","LinkingTo":{"Rcpp":"*"},"Imports":{"Rcpp":">= 0.11.0","stats":"*","utils":"*"},"Suggests":{"RUnit":"*","Matrix":"*","pkgKitten":"*"},"URL":"http://dirk.eddelbuettel.com/code/rcpp.armadillo.html","BugReports":"https://github.com/RcppCore/RcppArmadillo/issues","NeedsCompilation":"yes","Packaged":"2016-12-16 11:55:10.107195 UTC; edd","Repository":"CRAN","Date/Publication":"2016-12-18 10:31:12","crandb_file_date":"2016-12-18 09:32:52","date":"2016-12-18T09:31:12+00:00","releases":[]} The guy behind METACRAN is a solid active long-term contributor to the R community. I'd consider this source reliable and sustainable.
Yes, there are quite a few people who use different versions of R in parallel. For instance, the Bioconductor Project provides a "release" and a "devel" branch of R packages. The "release" set is frozen twice a year (and afterwards only allows bug fixes). People who need access to the latest bioinformatics methods are likely to use the "devel" branch. Now, the "release" branch is tied to the most recent R version (e.g. R 3.3.2) whereas the "devel" branch requires the user to run the development version of R (e.g. R 3.4.0 devel). (It's actually a bit more complicated than this depending on the time of the year, but let's ignore that.)

When a user installs a package in R, it defaults to installing under ~/R/<platform>-library/<x.y>, where <x.y> is the R major.minor version - cf. the example below.
R automatically takes care of which library a package gets installed to; the user doesn't have to worry about that, e.g.

$ Rscript --version
R scripting front-end version 3.3.2 (2016-10-31)
$ Rscript -e ".libPaths()[1]"
[1] "/home/hb/R/x86_64-pc-linux-gnu-library/3.3" It is possible to control these paths via different environment variables, so you could imagine that you extend the above directory structure to reflect various compiler options etc.
Yes, that will automatically install dependencies:

install.packages("rminer")

defaults to

install.packages("rminer", dependencies = c("Depends", "Imports", "LinkingTo"))

An R user can update all installed R packages using:

update.packages(ask = FALSE)

That's all. Note that this will only update packages within the running R x.y.* series. That is, I can update my R 3.3.* packages this way, but when R is updated to the next major release (e.g. R 3.4.0), I have to reinstall all packages again.

2017-01-29: Updated links to point directly to https://r-pkg.org/services#api and https://github.com/metacran/crandb#readme
@HenrikBengtsson A couple more questions for you.

What build phases make sense for R packages? For example, Autotools has: configure, build, install; while Python has: build, install. We are currently using a single install phase that runs R CMD INSTALL, but I noticed there are a few other phases as well. There is an R CMD build and an R CMD check. Would it make sense to use separate build/check/install phases for each package to separate things out? I'm hoping the check phase could be a reliable way to tell whether or not the package was built correctly.
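For reference, the usual command-line sequence would be something like this (assuming a source directory foo/ producing foo_1.0.tar.gz):

$ R CMD build foo/              # build the source tarball
$ R CMD check foo_1.0.tar.gz    # run the package's checks/tests on the tarball
$ R CMD INSTALL foo_1.0.tar.gz  # install into the library path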
I saw somewhere that someone recommended:

MAKE='make -j8' R CMD INSTALL

How reliable is this?
It looks like R has some built-in support for testing installed packages. We might want to add that. We can also attempt to import (require/library) these packages after installation.
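That built-in support is presumably tools::testInstalledPackage(), e.g.:

# Run the examples and tests of an already-installed package
tools::testInstalledPackage("survival", types = c("examples", "tests"))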
I finally got all of my packages to install, only to find that I can't activate them. R comes with a few pre-installed libraries:

base, boot, class, cluster, codetools, compiler, datasets, foreign, graphics, grDevices, grid, KernSmooth, lattice, MASS, Matrix, methods, mgcv, nlme, nnet, parallel, rpart, spatial, splines, stats, stats4, survival, tcltk, tools, translations, utils

Due to this, I can't activate any of these packages, or packages that depend on them. Perhaps we should remove these packages from Spack and always depend on the versions that come with R?
When you install an R package from source, you install the tar.gz tarball that CRAN serves for the current version.

If you want a specific version, you can pass the tar.gz URL, as in install.packages("https://cran.r-project.org/src/contrib/Archive/foo/foo_0.9.tar.gz", repos = NULL, type = "source").

To run tests post-installation, you would run them on the downloaded tar.gz file, i.e. R CMD check foo_1.0.tar.gz.

I'm not aware of a way to install in parallel from the command line using R CMD INSTALL, but it can be done via:

Rscript -e "install.packages('<url>', type = 'source', Ncpus = 4)"

For more information on how R does the parallel builds, see [...].
@HenrikBengtsson @tgamblin @glennpj @JavierCVilla Ok, I need an executive decision here. R comes with a few pre-installed packages:

$ ls /blues/gpfs/home/software/spack-0.10.0/opt/spack/linux-centos6-x86_64/gcc-6.1.0/r-3.3.2-puezz6voxkdfcnjbq7jxcmraojulsw72/rlib/R/library/
base codetools graphics lattice mgcv rpart stats4 translations
boot compiler grDevices MASS nlme spatial survival utils
class datasets grid Matrix nnet splines tcltk
cluster foreign KernSmooth methods parallel stats tools

These packages are also available on CRAN, and some of them are in Spack. The problem is that since the packages are already present in the R installation, I am unable to activate any of them or any of the packages that depend on them due to conflicts. I can think of two options:

1. Remove these packages from Spack

Never depend on Spack-built versions of these packages. Instead of removing them completely, I might just leave them present but raise an error during install saying to never depend on this fake package. That will prevent this problem from creeping back.

Pros: Less building, less concretization.
Cons: Can't pick a specific version for these packages; ugly fake packages.

2. Keep them but ignore everything when symlinking

Users can still link to whatever version they want, but when they activate the package, they'll get the versions that come with R.

Pros: Can pick a specific version if you really want to; no fake packages.
Cons: Non-deterministic behavior? If you activate a package, you will get the version from R, not the version from Spack. May be more trouble than it's worth.

Until we make a decision on this, spack activate + R is a no-go.
Option 1. It was mentioned above that people typically upgrade R when they want a newer core package. Does mix 'n match of core packages even work in a practical sense?
I'd also say Option 1. Since these packages work as R extensions, I think there's no reason to provide them twice, as R core and as R packages.

In case a user needs a specific version of one of these packages, Spack should suggest using a different R version that includes it. Giving the option to specify one version and then activating a different one could end up in a mess for the user.
Technically, R will automatically (try to) install all R package dependencies on CRAN, but it does not automatically install the (non-R) system requirements listed in some packages' DESCRIPTION files in the SystemRequirements field. Many R packages don't have non-R system requirements, nor depend on any R packages with system requirements, in which case install.packages() works out of the box.
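For illustration, a made-up DESCRIPTION excerpt carrying such a field:

Package: foo
Version: 1.0
Imports: Rcpp
SystemRequirements: GNU make, libcurl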
I've never used R before, but a user asked for a few R modules, which happen to depend on hundreds of others, so I've found myself adding a lot of new packages. Since I don't really know much about R, I have a few questions on how I should proceed with these packages:
I noticed there are some modules that come with R (methods, grid, stats) that don't exist on CRAN and can be imported easily. But there are also a few modules that come with R (rpart, survival) that are on CRAN and print a message when imported. Should I add packages for/dependencies on the latter?
I noticed there are several types of dependencies (depends, imports, linkingTo, suggests). What are the differences? If I had to guess, I would say depends==build/run, imports==run, linkingTo==build/link, and suggests==build/run but is optional. Is this correct?
Should I add suggested dependencies as long as they don't create a circular dependency? I imagine that build failures are rare and things build quickly, just like with Python, so I'm inclined to add them.
@glennpj @JavierCVilla