A template project for challenge analysis in R and Python
The motivation for this project is to encourage the use of portable development environments in research and engineering. The environment should be intuitive to use so that anyone can deploy it and reproduce your results - even you six months from now!
This project provides a portable development environment that enables you to run the analysis included in this repository (see Notebooks). The Docker image sagebionetworks/challenge-analysis provided by this project, which lets you run the notebooks seamlessly, is based on the image sagebionetworks/rstudio.
For more information on how to use this repository to develop and publish your own analysis, please read the section How to use this repository.
All packages:
- R (see renv.lock).
- Python (see conda/sage-bionetworks/environment.yml).
- Docker Engine >=19.03.0
The notebooks below are rendered to HTML and published to GitHub Pages by the CI/CD workflow of this repository.
Rmd Notebook | Description | HTML Notebook |
---|---|---|
compare-models-to-baseline.Rmd | A simple description of a bootstrap analysis to determine the performance of participants relative to a comparator model. | |
determine-top-performers.Rmd | A simple description of a bootstrap analysis to determine the top performers in a challenge. | |
ensemble-analysis.Rmd | A simple description of an ensemble analysis for a challenge. | |
survey-analysis.Rmd | A simple description of a post-challenge survey analysis. | |
Important: Please make sure when you write your own notebooks that no sensitive information ends up being publicly available. Please check with the information security officer of your organization to confirm that the approach described here can be applied to your use case.
- Create and edit the configuration file. You can initially start RStudio using this configuration as-is.
  cp .env.example .env
- Start RStudio. Add the option -d or --detach to run in the background.
  docker compose up
RStudio is now available at http://localhost. On the login page, enter the default username (rstudio) and the password specified in .env.
To stop RStudio, enter Ctrl+C followed by docker compose down. If running in detached mode, you will only need to enter docker compose down.
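The configuration step can be made safe to repeat. The sketch below copies .env.example to .env only when no .env exists yet, so that a password you have already edited is not overwritten. The function name init_env and its optional directory argument are hypothetical and not part of this repository.

```shell
# Hypothetical helper: create .env from .env.example without overwriting
# an existing .env. The directory argument defaults to the current directory.
init_env() {
  dir="${1:-.}"
  if [ -f "$dir/.env" ]; then
    echo "kept existing $dir/.env"
  else
    cp "$dir/.env.example" "$dir/.env"
    echo "created $dir/.env"
  fi
}
```

Calling init_env from the root of the repository before docker compose up gives the same result as the cp command on a fresh clone and leaves an existing configuration untouched.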
You can use the image sagebionetworks/challenge-analysis as-is to start an instance of RStudio and develop tools that interact with Sage Bionetworks services, e.g. Synapse.
If you want to create a portable development environment, start by creating a new GitHub repository from this template. You can then customize your environment by specifying the R and Python packages to include with your image. Finally, edit the GitHub workflow .github/workflows/ci.yml to indicate the Docker repository where the image should be pushed (see Section Versioning).
Example projects that use this repository / image:
- TBA
In RStudio, use the following options to add and update libraries:
- Tools > Install Packages...
- Tools > Check for Package Updates...
Run the command renv::snapshot() to update the file renv.lock, which is used in Dockerfile to install the required R libraries.
See the content of the folder conda for an example of how to define a Conda environment. The packages to add to this environment must be added to the file requirements.txt. The creation of one or more Conda environments can be specified in Dockerfile.
Set the environment variable SYNAPSE_TOKEN to the value of one of your Synapse Personal Access Tokens. If this variable is set, it will be used to create the configuration file ~/.synapseConfig when the container starts.
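As an illustration of what this startup step could look like, the sketch below writes a minimal configuration file from SYNAPSE_TOKEN. It is not the script shipped in the image: the function name write_synapse_config is hypothetical, and the file layout (an [authentication] section with an authtoken entry) is assumed from the format used by the Synapse clients.

```shell
# Hypothetical helper, not the startup script of the image.
# Writes a minimal Synapse configuration file when SYNAPSE_TOKEN is set.
write_synapse_config() {
  config_file="${1:-$HOME/.synapseConfig}"
  # Do nothing when no token is provided.
  [ -n "${SYNAPSE_TOKEN:-}" ] || return 0
  # The subshell limits the umask change; a newly created file is readable
  # by its owner only, because it holds a secret.
  (
    umask 077
    printf '[authentication]\nauthtoken = %s\n' "$SYNAPSE_TOKEN" > "$config_file"
  )
}
```

When SYNAPSE_TOKEN is empty or unset, the function returns without creating the file, which matches the behaviour described above.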
This Docker image comes with Miniconda installed (see below) and an example Conda environment named challenge-analysis. This environment includes packages used to interact with the collaborative platform Synapse developed by Sage Bionetworks.
Attach to the RStudio container (here assuming that challenge-analysis is the name of the container). For safety, it is recommended to work as a non-root user. You can then list the available environments, activate an existing environment or create a new one.
$ docker exec -it challenge-analysis bash
container # su yourusername
container $ conda env list
container $ conda activate challenge-analysis
The R code below lists the available environments before activating the existing environment named challenge-analysis.
> library(reticulate)
> conda_list()
name python
1 miniconda /opt/miniconda/bin/python
2 challenge-analysis /opt/miniconda/envs/challenge-analysis/bin/python
> use_condaenv("challenge-analysis", required = TRUE)
When using Docker volumes, permission issues can arise between the host OS and the container. You can avoid these issues by letting RStudio know the User ID (UID) and Group ID (GID) it should use when creating and editing files so that these IDs match yours, which you can get using the command id:
$ id
uid=1000(kelsey) gid=1000(kelsey) groups=1000(kelsey)
In this example, we would set USERID=1000 and GROUPID=1000.
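Instead of copying the values by hand, they can be written to the configuration file with id -u and id -g. The sketch below assumes that USERID and GROUPID are read from .env; the function name set_ids and its default path are hypothetical.

```shell
# Hypothetical helper: record the host UID and GID in an env file.
# USERID and GROUPID are the variable names used above; the function name
# and the default path are assumptions made for illustration.
set_ids() {
  env_file="${1:-.env}"
  touch "$env_file"
  tmp_file="$(mktemp)"
  # Keep every line except previous USERID/GROUPID assignments.
  grep -v -E '^(USERID|GROUPID)=' "$env_file" > "$tmp_file" || true
  # Append the IDs of the user running this function.
  printf 'USERID=%s\nGROUPID=%s\n' "$(id -u)" "$(id -g)" >> "$tmp_file"
  # Write back in place so the permissions of the env file are preserved.
  cat "$tmp_file" > "$env_file"
  rm -f "$tmp_file"
}
```

Running set_ids .env before starting the container replaces any earlier USERID and GROUPID entries and leaves the other settings as they were.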
To give the RStudio user root (sudo) privileges inside the container, set the environment variable ROOT=TRUE (default is FALSE).
To follow the logs of the container (here assuming that it is named challenge-analysis), run:
docker logs --follow challenge-analysis
This Docker image provides the command render that generates an HTML or PDF notebook from an R notebook (.Rmd). Run the command below from the host to mount the directory $(pwd)/notebooks where the R notebook is and generate the HTML notebook that will be saved to the same directory with the extension .nb.html.
docker run --rm \
  --env-file .env \
  -v "$(pwd)/notebooks:/notebooks" \
  sagebionetworks/challenge-analysis:latest \
  render /notebooks/*.Rmd
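After rendering, it can be useful to check that every notebook produced an output file. The sketch below lists the .Rmd files in a directory that have no matching .nb.html file next to them. The function name list_unrendered is hypothetical, and the check only covers the HTML naming described above, not PDF output.

```shell
# Hypothetical helper: print the R notebooks that have no rendered
# .nb.html file in the same directory.
list_unrendered() {
  dir="${1:-notebooks}"
  for rmd in "$dir"/*.Rmd; do
    # Skip the literal pattern when the directory contains no .Rmd file.
    [ -e "$rmd" ] || continue
    html="${rmd%.Rmd}.nb.html"
    [ -e "$html" ] || printf '%s\n' "$rmd"
  done
}
```

An empty output from list_unrendered notebooks means that each notebook in the directory has an HTML counterpart.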
This repository uses semantic versioning to track the releases of this project. This repository uses "non-moving" GitHub tags, that is, a tag will always point to the same git commit once it has been created.
The artifact published by this repository is the Docker image sagebionetworks/challenge-analysis. The versions of the image are aligned with the versions of R/RStudio, not the GitHub tags of this repository.
The table below describes the image tags available.
Tag name | Moving | Description |
---|---|---|
latest | Yes | Latest stable release. |
edge | Yes | Latest commit made to the default branch. |
weekly | Yes | Weekly release from the default branch. |
<major> | Yes | Latest stable major release of this analysis. |
<major>.<minor> | Yes | Latest stable minor release of this analysis. |
<major>.<minor>.<patch> | Yes | Latest stable patch release of this analysis. |
<major>.<minor>.<patch>-<sha> | No | Same as above but with the reference to the git commit. |
You should avoid using a moving tag like latest when deploying containers in production, because this makes it hard to track which version of the image is running and hard to roll back.
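A deployment script can guard against moving tags by checking the tag format before pulling the image. The sketch below treats only tags of the form <major>.<minor>.<patch>-<sha> as non-moving, following the table above. The function name tag_kind is hypothetical, and the pattern assumes that <sha> is a lowercase hexadecimal string.

```shell
# Hypothetical helper: classify an image tag as "pinned" or "moving".
# Assumption: <sha> is written as lowercase hexadecimal characters.
tag_kind() {
  if printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+-[0-9a-f]+$'; then
    echo pinned
  else
    echo moving
  fi
}
```

For example, a script could refuse to deploy unless tag_kind reports pinned for the requested tag.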