options("repos") only changed to MRAN when scanForPackages = TRUE and there is a new package to install #274

Closed
markusdumke opened this issue Nov 9, 2018 · 5 comments


@markusdumke markusdumke commented Nov 9, 2018

First of all, thanks for the checkpoint package. I use it a lot to ensure the reproducibility of my analyses!

Recently I came across some behaviour of checkpoint that surprised me and that looks like a bug to me.

What I thought calling checkpoint::checkpoint would do:

  • Change the .libPaths so new packages are loaded and installed to the checkpoint folder
  • Set options("repos") to the MRAN snapshot, so calling install.packages() will install from the MRAN website instead of CRAN.

But the second point only seems to hold if I run checkpoint with scanForPackages = TRUE and a new package is found that is not already installed. Otherwise options("repos") is not changed, so install.packages() will install the latest package version from CRAN into the checkpoint folder.
I think this is very confusing and probably has negative effects on reproducibility.

I see this code inside the checkpoint function:

if(length(packages.to.install) > 0) {
    # set repos
    setMranMirror(snapshotUrl = snapshoturl)

So repos is only changed when there are new packages to install. Wouldn't it be better to change it unconditionally, even if there are no new packages to install? Users will still install new packages with install.packages(), and if these packages are installed from cran.rstudio.com, the whole point of reproducibility with checkpoint is defeated.
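
The proposed change can be sketched in base R without touching checkpoint itself. `snapshot_url()` and `set_snapshot_repos()` are hypothetical helpers, not part of the checkpoint API, and the URL pattern is simply the MRAN snapshot pattern that appears in this issue. The only point of the sketch is the ordering: the repository option is set unconditionally, before any decision about installing packages.

```r
# Hypothetical helpers (not part of checkpoint): build the snapshot URL
# and point options("repos") at it regardless of what needs installing.
snapshot_url <- function(snapshot_date,
                         mran_url = "https://mran.microsoft.com") {
  # Validate the date early so a typo does not silently produce a bad URL.
  parsed <- as.Date(snapshot_date, format = "%Y-%m-%d")
  if (is.na(parsed)) stop("snapshot_date must be in YYYY-MM-DD format")
  paste0(sub("/+$", "", mran_url), "/snapshot/", format(parsed, "%Y-%m-%d"))
}

set_snapshot_repos <- function(snapshot_date,
                               packages_to_install = character()) {
  # Set repos first, unconditionally ...
  old <- options(repos = c(CRAN = snapshot_url(snapshot_date)))
  # ... and only then decide whether anything has to be installed.
  if (length(packages_to_install) > 0) {
    message("Would install: ", paste(packages_to_install, collapse = ", "))
  }
  # Return the previous options so the caller can restore them.
  invisible(old)
}
```

With this ordering, calling `set_snapshot_repos("2018-06-01")` with nothing to install still leaves `getOption("repos")` pointing at the snapshot, and `options(old)` restores the previous setting.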

Here is example code to reproduce the problem:

.libPaths()
#> [1] "C:/ProgrammePAM/R-3.5.1/library"

options("repos")
#> $repos
#> [1] "https://cran.rstudio.com/"   "https://cloud.r-project.org"

checkpoint::checkpoint("2018-06-01",
                       checkpointLocation = "C:/R",
                       scanForPackages = FALSE)
#> Skipping package scanning
#> checkpoint process complete
#> ---

.libPaths()
#> [1] "C:/R/.checkpoint/2018-06-01/lib/x86_64-w64-mingw32/3.5.1"
#> [2] "C:/R/.checkpoint/R-3.5.1"                                
#> [3] "C:/PROGRA~4/R-35~1.1/library"

# repos is not changed to MRAN!
options("repos")
#> $repos
#> [1] "https://cran.rstudio.com/"   "https://cloud.r-project.org"

checkpoint::checkpoint("2018-06-01",
                       checkpointLocation = "C:/R",
                       scanForPackages = TRUE)
#> Scanning for packages used in this project
#> No file at path 'C:\Users\QXV6024\AppData\Local\Temp\Rtmpek7pGt\file344416693e26.Rmd'.
#> - Discovered 3 packages
#> Installing packages used in this project
#>  - Installing 'A3'
#> A3
#> also installing the dependency 'pbapply'
#> checkpoint process complete
#> ---

library(A3)
#> Loading required package: xtable
#> Loading required package: pbapply

.libPaths()
#> [1] "C:/R/.checkpoint/2018-06-01/lib/x86_64-w64-mingw32/3.5.1"
#> [2] "C:/R/.checkpoint/R-3.5.1"                                
#> [3] "C:/PROGRA~4/R-35~1.1/library"

# Now repos is changed to mran!
options("repos")
#> $repos
#> [1] "https://mran.microsoft.com/snapshot/2018-06-01"
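
Until the behaviour changes, a script can guard against a silently unchanged repository. The check below is a base-R sketch; `assert_snapshot_repos()` is a hypothetical name, and it assumes that a snapshot URL ends in `/snapshot/<date>`, as in the output above. It looks at the entry named `CRAN` if there is one and at the first entry otherwise.

```r
# Hypothetical guard (not part of checkpoint): stop if the CRAN repository
# does not point at the expected snapshot date.
assert_snapshot_repos <- function(snapshot_date, repos = getOption("repos")) {
  # Prefer the entry named "CRAN"; fall back to the first entry if unnamed.
  cran <- if ("CRAN" %in% names(repos)) repos[["CRAN"]] else repos[[1]]
  # Tolerate a trailing slash, then require the URL to end in the date.
  ok <- endsWith(sub("/+$", "", cran), paste0("/snapshot/", snapshot_date))
  if (!ok) {
    stop("repos points at '", cran, "', not at the ", snapshot_date,
         " snapshot")
  }
  invisible(TRUE)
}
```

Placed right after the `checkpoint::checkpoint()` call, `assert_snapshot_repos("2018-06-01")` would stop the script in the `scanForPackages = FALSE` case shown above and pass once repos has been switched to the MRAN snapshot.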

@martincadek martincadek commented Mar 13, 2019

I agree with the comment above; I've been scratching my head over similar behaviour. After installing my packages, I realised that when I start the project again, the checkpoint date doesn't get updated automatically.

I thought something was wrong, but it's probably the expected behaviour, as suggested in the comment above.

library("checkpoint")
# Create a checkpoint by specifying a snapshot date
checkpoint("2019-03-10", scanForPackages = TRUE) # R version 3.5.1 (2018-07-02)

Outputs:

Scanning for packages used in this project
|==============================================================================| 100%
- Discovered 8 packages
All detected packages already installed
checkpoint process complete
---
# Check that CRAN mirror is set to MRAN snapshot
getOption("repos")

Outputs: (note: I am using Open R)

 CRAN 
"https://mran.microsoft.com/snapshot/2018-08-01" 
                                       CRANextra 
            "http://www.stats.ox.ac.uk/pub/RWin" 

However, I would have expected "https://mran.microsoft.com/snapshot/2019-03-10", as this is THE checkpoint date I've specified. Is there a rationale behind this behaviour? It would be helpful to describe it in the help file.


@markusdumke markusdumke commented Mar 13, 2019

Yes, I agree this is confusing, and it would help a lot if it were clarified in the checkpoint documentation.

The second point you have to think about is the library paths where R looks for packages. checkpoint puts the path to the checkpoint library in first place, but your normal user library is still there in second position. This means that if a package is missing from your checkpoint library (e.g. because installation failed) but is installed in your normal user library (in any version), R will just use it. This is also very dangerous in terms of reproducibility. So I am now using a solution similar to this:

checkpoint::checkpoint("2019-03-13", scanForPackages = TRUE)

# To change the CRAN mirror to MRAN mirror of specified date
checkpoint::setSnapshot("2019-03-13")

# Make sure that packages are loaded from checkpoint directory
library(data.table, lib.loc = .libPaths()[1])
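
To see whether a package actually resolves to the checkpoint library or falls through to a later entry in `.libPaths()`, the lookup can be made explicit. This is a base-R sketch; `package_library()` and `from_first_library()` are hypothetical helper names, and it assumes the checkpoint library is the first element of `.libPaths()`, as in the outputs in this issue.

```r
# Hypothetical helper: report the library directory a package resolves to.
package_library <- function(pkg, lib.loc = .libPaths()) {
  # find.package() searches lib.loc in order and errors if the package
  # is not found in any of the given libraries.
  normalizePath(dirname(find.package(pkg, lib.loc = lib.loc)), winslash = "/")
}

# TRUE only if the package is found in the first library on the search path.
from_first_library <- function(pkg) {
  identical(package_library(pkg),
            normalizePath(.libPaths()[1], winslash = "/"))
}
```

For example, `from_first_library("data.table")` returning `FALSE` after a checkpoint call would indicate that the package is being picked up from somewhere other than the checkpoint library.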


@martincadek martincadek commented Mar 14, 2019

So I am using now a solution similar to this:

checkpoint::checkpoint("2019-03-13", scanForPackages = TRUE)

# To change the CRAN mirror to MRAN mirror of specified date
checkpoint::setSnapshot("2019-03-13")

# Make sure that packages are loaded from checkpoint directory
library(data.table, lib.loc = .libPaths()[1])

This seems like a good way to ensure your collaborators use the appropriate libraries. I'd probably even set .libPaths() in the project's .Rprofile, as suggested here for example. For now I've lazily built on what you suggest above and use assign(".lib.loc", .libPaths()[1], envir = environment(.libPaths)), which makes the first entry of .libPaths() the only library path for the session. It is probably safer to check afterwards that .libPaths() is still set correctly, but this saves time. Maybe this could be implemented in checkpoint as setLibrary (to complement setSnapshot). It would assume that all packages are in the checkpoint library, though.
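
A less intrusive variant of the same idea is to go through the public `.libPaths()` setter instead of assigning to its internal `.lib.loc` variable. Calling `.libPaths(new)` replaces the search path with `new` followed by the site and base libraries, so other entries such as a user library are dropped while base packages stay loadable. `use_only_first_library()` is a hypothetical name, not a checkpoint function, and the sketch assumes the checkpoint library is currently first in `.libPaths()`.

```r
# Hypothetical helper: keep only the first library (assumed to be the
# checkpoint library) plus R's own site and base libraries.
use_only_first_library <- function() {
  old <- .libPaths()
  # .libPaths(new) sets the path to `new` plus .Library.site and .Library;
  # every other entry (for example a user library) is removed.
  .libPaths(old[1])
  # Return the previous search path so it can be restored with .libPaths(old).
  invisible(old)
}
```

Unlike the `assign()` approach, this keeps the base library on the path, so recommended and base packages that are not in the checkpoint library can still be loaded.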


@vspinu vspinu commented Nov 18, 2019

Any news on this?

One currently needs a whole set of workarounds to make it work as advertised.

This is what I have currently:

snapshot <- "2019-11-01"
# set it by default; otherwise pinging takes ages
options(checkpoint.mranUrl = "https://mran.microsoft.com/")
# Scanning takes ages (due to slow url checks), but we need to scan if the
# repo doesn't exist
# https://github.com/RevolutionAnalytics/checkpoint/issues/281
do_scan <- !snapshot %in% checkpoint::checkpointArchives()
checkpoint::checkpoint(snapshot, scanForPackages = do_scan, verbose = interactive())
## https://github.com/RevolutionAnalytics/checkpoint/issues/274
checkpoint::setSnapshot(snapshot, FALSE)

Contributor

@hongooi73 hongooi73 commented Mar 31, 2020

This should be resolved in the new v1.0 checkpoint, just pushed to master. If you want to use an existing checkpoint without installing any packages:

use_checkpoint("snapshot_date")
