Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
7 changed files
with
1,024 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
NOT_CRAN="true" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,67 @@ | ||
## ----setup, include=FALSE, fig.asp=4/3----------------------------------- | ||
|
||
## Create temporary project and set working directory | ||
|
||
|
||
example_project <- tempdir() | ||
|
||
dir.create(example_project, recursive = TRUE, showWarnings = FALSE) | ||
oldRepos <- getOption("repos") | ||
oldLibPaths <- .libPaths() | ||
|
||
|
||
## Write dummy code file to project | ||
|
||
example_code <- ' | ||
library(checkpoint) | ||
checkpoint("2015-04-26", checkpointLocation = tempdir()) | ||
# Example from ?darts | ||
library(darts) | ||
x = c(12,16,19,3,17,1,25,19,17,50,18,1,3,17,2,2,13,18,16,2,25,5,5, | ||
1,5,4,17,25,25,50,3,7,17,17,3,3,3,7,11,10,25,1,19,15,4,1,5,12,17,16, | ||
50,20,20,20,25,50,2,17,3,20,20,20,5,1,18,15,2,3,25,12,9,3,3,19,16,20, | ||
5,5,1,4,15,16,5,20,16,2,25,6,12,25,11,25,7,2,5,19,17,17,2,12) | ||
mod = simpleEM(x, niter=100) | ||
e = simpleExpScores(mod$s.final) | ||
oldpar <- par(mfrow=c(1, 2)) | ||
drawHeatmap(e) | ||
drawBoard(new=TRUE) | ||
drawAimSpot(e, cex = 5) | ||
par(oldpar) | ||
' | ||
|
||
cat(example_code, file = file.path(example_project, "checkpoint_example_code.R")) | ||
|
||
|
||
## ----checkpoint, warning=FALSE------------------------------------------- | ||
## Create a folder to contain the checkpoint | ||
## This is optional - the default is to use ~/.checkpoint | ||
|
||
dir.create(file.path(tempdir(), ".checkpoint"), recursive = TRUE, showWarnings = FALSE) | ||
|
||
## Create a checkpoint by specifying a snapshot date | ||
|
||
library(checkpoint) | ||
checkpoint("2017-04-01", project = example_project, | ||
checkpointLocation = tempdir()) | ||
|
||
## ----inspect-1----------------------------------------------------------- | ||
getOption("repos") | ||
|
||
## ----inspect-2----------------------------------------------------------- | ||
normalizePath(.libPaths(), winslash = "/") | ||
|
||
## ----inspect-3, eval=FALSE----------------------------------------------- | ||
# installed.packages(.libPaths()[1])[, "Package"] | ||
|
||
## ----cleanup, include=TRUE----------------------------------------------- | ||
## cleanup | ||
|
||
unlink(example_project, recursive = TRUE) | ||
unlink(file.path(tempdir(), "checkpoint_example_code.R")) | ||
unlink(file.path(tempdir(), ".checkpoint"), recursive = TRUE) | ||
options(repos = oldRepos) | ||
unCheckpoint(oldLibPaths) | ||
.libPaths() | ||
|
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,204 @@ | ||
# Using checkpoint for reproducible research | ||
Andrie de Vries | ||
`r Sys.Date()` | ||
|
||
# The Reproducible R Toolkit (RRT) | ||
|
||
The **Reproducible R Toolkit** provides tools to ensure the results of R code are repeatable over time, by anyone. Most R scripts rely on packages, but new versions of packages are released daily. To ensure that results from R are reproducible, it's important to run R scripts using exactly the same package version in use when the script was written. | ||
|
||
The Reproducible R Toolkit provides an R function checkpoint, which ensures that all of the necessary R packages are installed with the correct version. This makes it easy to reproduce your results at a later date or on another system, and makes it easier to share your code with the confidence that others will get the same results you did. | ||
|
||
The Reproducible R Toolkit also works in conjunction with the "checkpoint-server", which makes a daily copy of all CRAN packages, to guarantee that every package version is available to all R developers thereby ensuring reproducibility. | ||
|
||
## Components of RRT | ||
|
||
RRT is a collection of R packages and the checkpoint server that together make your work with R packages more reproducible over time by anyone. | ||
|
||
### The checkpoint server | ||
|
||
To achieve reproducibility, daily snapshots of CRAN are stored on our checkpoint server. At midnight UTC each day, we refresh our mirror of [CRAN](https://cran.r-project.org/) is refreshed. When the rsync process is complete, the checkpoint server takes and stores a snapshot of the CRAN mirror as it was at that very moment. These daily snapshots can then be accessed on the [MRAN website](https://mran.microsoft.com/snapshot) or using the `checkpoint` package, which installs and consistently use these packages just as they existed at midnight UTC on a specified snapshot date. Daily snapshots are available as far back as `2014-09-17`. For more information, visit the [checkpoint server GitHub site](https://github.com/RevolutionAnalytics/checkpoint-server). | ||
|
||
![checkpoint server](checkpoint-server.png) | ||
|
||
### The checkpoint package | ||
|
||
The goal of the `checkpoint` package is to solve the problem of package reproducibility in R. Since packages get updated on CRAN all the time, it can be difficult to recreate an environment where all your packages are consistent with some earlier state. To solve this issue, `checkpoint` allows you to install packages locally as they existed on a specific date from the corresponding snapshot (stored on the checkpoint server) and it configures your R session to use only these packages. Together, the `checkpoint` package and the checkpoint server act as a CRAN time machine so that anyone using `checkpoint()` can ensure the reproducibility of their scripts or projects at any time. | ||
|
||
![checkpoint package](checkpoint-pkg.png) | ||
|
||
|
||
|
||
# Using checkpoint | ||
|
||
Using `checkpoint` is simple: | ||
|
||
- `checkpoint` has only a single function, `checkpoint()` where you specify the snapshot date. | ||
|
||
## Using the checkpoint function | ||
|
||
When you create a checkpoint, the `checkpoint()` function: | ||
|
||
- Creates a snapshot folder to install packages. This library folder is located at `~/.checkpoint` | ||
- Scans your project folder for all packages used. Specifically, it searches for all instances of `library()` and `require()` in your code. | ||
- Installs these packages from the MRAN snapshot into your snapshot folder using `install.packages()` | ||
- Sets options for your CRAN mirror to point to a MRAN snapshot, i.e. modify `options(repos)` | ||
|
||
This means the remainder of your script will run with the packages from a specific date. | ||
|
||
|
||
## Sharing your scripts for reproducibility | ||
|
||
Sharing your script to be reproducible is as easy as: | ||
|
||
- Load the `checkpoint` package using `library(checkpoint)` | ||
- Ensure you specify `checkpoint()` with your checkpoint date, e.g. `checkpoint("2014-09-17")` | ||
|
||
Then send this script to your collaborators. When they run this script on their machine, `checkpoint` will perform the same steps of installing the necessary packages, creating the `checkpoint` snapshot folder and producing the same results. | ||
|
||
|
||
## Resetting the checkpoint | ||
|
||
To reset the checkpoint, simply restart your R session. | ||
|
||
|
||
|
||
## Worked example | ||
|
||
To create a checkpoint project, you do: | ||
|
||
1. Create a new folder and change your working directory to this folder. If you use an IDE like RStudio, this is identical to creating a new RStudio project. | ||
2. Add your R script files to this folder. | ||
3. Add a checkpoint to the top of the script: | ||
|
||
```r | ||
library(checkpoint) | ||
checkpoint("2015-04-26", checkpointLocation = tempdir()) | ||
``` | ||
|
||
4. Run the script. | ||
|
||
### Create a checkpoint project | ||
|
||
For example, your script may look like this: | ||
|
||
```r | ||
library(checkpoint) | ||
checkpoint("2015-04-26", checkpointLocation = tempdir()) | ||
|
||
# Example from ?darts | ||
library(darts) | ||
x = c(12,16,19,3,17,1,25,19,17,50,18,1,3,17,2,2,13,18,16,2,25,5,5, | ||
1,5,4,17,25,25,50,3,7,17,17,3,3,3,7,11,10,25,1,19,15,4,1,5,12,17,16, | ||
50,20,20,20,25,50,2,17,3,20,20,20,5,1,18,15,2,3,25,12,9,3,3,19,16,20, | ||
5,5,1,4,15,16,5,20,16,2,25,6,12,25,11,25,7,2,5,19,17,17,2,12) | ||
mod = simpleEM(x, niter=100) | ||
e = simpleExpScores(mod$s.final) | ||
oldpar <- par(mfrow=c(1, 2)) | ||
drawHeatmap(e) | ||
drawBoard(new=TRUE) | ||
drawAimSpot(e, cex = 5) | ||
par(oldpar) | ||
``` | ||
|
||
|
||
### Run the checkpoint code | ||
|
||
Next you want to run the script. Here is what `checkpoint` does: | ||
|
||
|
||
|
||
|
||
|
||
```r | ||
## Create a folder to contain the checkpoint | ||
## This is optional - the default is to use ~/.checkpoint | ||
|
||
dir.create(file.path(tempdir(), ".checkpoint"), recursive = TRUE, showWarnings = FALSE) | ||
|
||
## Create a checkpoint by specifying a snapshot date | ||
|
||
library(checkpoint) | ||
checkpoint("2017-04-01", project = example_project, | ||
checkpointLocation = tempdir()) | ||
``` | ||
|
||
``` | ||
## Scanning for packages used in this project | ||
``` | ||
|
||
``` | ||
## - Discovered 2 packages | ||
``` | ||
|
||
``` | ||
## Installing packages used in this project | ||
``` | ||
|
||
``` | ||
## - Installing 'darts' | ||
``` | ||
|
||
``` | ||
## darts | ||
``` | ||
|
||
``` | ||
## checkpoint process complete | ||
``` | ||
|
||
``` | ||
## --- | ||
``` | ||
|
||
### Inspecting the results | ||
|
||
Now inspect the results. First, check that your CRAN mirror is set to MRAN snapshot: | ||
|
||
|
||
|
||
```r | ||
getOption("repos") | ||
``` | ||
|
||
``` | ||
## [1] "https://mran.microsoft.com/snapshot/2017-04-01" | ||
``` | ||
|
||
|
||
Next, check that the library path is set to `~/.checkpoint`: | ||
|
||
|
||
```r | ||
normalizePath(.libPaths(), winslash = "/") | ||
``` | ||
|
||
``` | ||
## [1] "C:/Users/adevries/AppData/Local/Temp/RtmpINl2hR/.checkpoint/2017-04-01/lib/x86_64-w64-mingw32/3.4.1" | ||
## [2] "C:/Users/adevries/AppData/Local/Temp/RtmpINl2hR/.checkpoint/R-3.4.1" | ||
## [3] "C:/R/R-3.4.1/library" | ||
``` | ||
|
||
Finally, check which packages are installed in checkpoint library: | ||
|
||
|
||
```r | ||
installed.packages(.libPaths()[1])[, "Package"] | ||
``` | ||
|
||
|
||
|
||
```r | ||
## cleanup | ||
|
||
unlink(example_project, recursive = TRUE) | ||
unlink(file.path(tempdir(), "checkpoint_example_code.R")) | ||
unlink(file.path(tempdir(), ".checkpoint"), recursive = TRUE) | ||
options(repos = oldRepos) | ||
unCheckpoint(oldLibPaths) | ||
.libPaths() | ||
``` | ||
|
||
``` | ||
## [1] "C:/Users/adevries/Documents/R/win-library/3.4" | ||
## [2] "C:/R/R-3.4.1/library" | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
## ----setup-1, include=FALSE---------------------------------------------- | ||
|
||
## Write dummy code file to project | ||
example_code <- ' | ||
library(darts) | ||
' | ||
dir.create(tempdir(), recursive = TRUE) | ||
cat(example_code, file = file.path(tempdir(), "managing_checkpoint_example_code.R")) | ||
|
||
## ----checkpoint, results="hide", message=FALSE, warning=FALSE------------ | ||
dir.create(file.path(tempdir(), ".checkpoint"), recursive = TRUE, showWarnings = FALSE) | ||
options(install.packages.compile.from.source = "no") | ||
oldLibPaths <- .libPaths() | ||
|
||
## Create a checkpoint by specifying a snapshot date | ||
library(checkpoint) | ||
checkpoint("2015-04-26", project = tempdir(), checkpointLocation = tempdir()) | ||
|
||
|
||
## ----archives-1---------------------------------------------------------- | ||
# List checkpoint archives on disk. | ||
checkpointArchives(tempdir()) | ||
|
||
## ----archives-2---------------------------------------------------------- | ||
checkpointArchives(tempdir(), full.names = TRUE) | ||
|
||
## ----access-------------------------------------------------------------- | ||
# Returns the date the snapshot was last accessed. | ||
getAccessDate(tempdir()) | ||
|
||
|
||
## ----remove-1, eval=FALSE------------------------------------------------ | ||
# # Remove singe checkpoint archive from disk. | ||
# checkpointRemove("2015-04-26") | ||
|
||
## ----remove-2, eval=FALSE------------------------------------------------ | ||
# # Remove range of checkpoint archives from disk. | ||
# checkpointRemove("2015-04-26", allSinceSnapshot = TRUE) | ||
# checkpointRemove("2015-04-26", allUntilSnapshot = = TRUE) | ||
# | ||
|
||
## ----remove-3, eval=FALSE------------------------------------------------ | ||
# # Remove snapshot archives that have not been used recently | ||
# checkpointRemove("2015-04-26", notUsedSince = TRUE) | ||
# | ||
|
||
## ----logfile-1----------------------------------------------------------- | ||
dir(file.path(tempdir(), ".checkpoint")) | ||
|
||
## ----logfile-2----------------------------------------------------------- | ||
|
||
log_file <- file.path(tempdir(), ".checkpoint", "checkpoint_log.csv") | ||
log <- read.csv(log_file) | ||
head(log) | ||
|
||
## ----uncheckpoint-1------------------------------------------------------ | ||
.libPaths() | ||
|
||
## ----uncheckpoint-2------------------------------------------------------ | ||
# Note this is still experimental | ||
unCheckpoint(oldLibPaths) | ||
.libPaths() | ||
|
||
## ----cleanup, include=TRUE----------------------------------------------- | ||
## cleanup | ||
unlink("manifest.R") | ||
unlink(file.path(tempdir(), "managing_checkpoint_example_code.R")) | ||
unlink(file.path(tempdir(), ".checkpoint"), recursive = TRUE) | ||
|
Oops, something went wrong.