arraytools/RinDocker

This framework provides a way to reproduce an analysis written in R by running it inside Docker.

File Structure

$ tree ~/RinDocker
├── 01_data
│   └── us-500.csv
├── 02_code
│   ├── install_packages.R
│   └── myScript.R
├── 03_output
│   ├── myplot.png
│   └── plot_data.csv
├── Dockerfile
├── Dockerfile_base
└── Readme.md

Instructions

  1. Create a project directory (e.g. ~/RinDocker) with three subdirectories to store data, code and output files separately.
mkdir -p ~/RinDocker/{01_data,02_code,03_output}
  2. Create an intermediate image containing the most useful OS-level tools and R packages (myname/r-tidyverse:3.5.2). Customize the tools by editing the Dockerfile_base file and the R packages by editing the 02_code/install_packages.R file. In this example, 3 R packages (readr, dplyr, ggplot2) are installed.
# Remove the old image if it already exists
# docker rmi myname/r-tidyverse:3.5.2
docker build --network=host -t myname/r-tidyverse:3.5.2 -f Dockerfile_base .
  3. Create a project-specific image (myname/project001) which will host the R code for the analysis. The R code is stored in 02_code/myScript.R. Note that any missing R packages (forcats in this case) can be installed here through the Dockerfile.
docker build --network=host -t myname/project001 -f Dockerfile .
# If myScript.R is changed, run the following again
#   docker rmi myname/project001
#   docker build --network=host -t myname/project001 -f Dockerfile .
  4. Run the R code in a disposable container. The results will be saved in the 03_output directory.
docker run -it --rm \
  -v ~/RinDocker/01_data:/01_data \
  -v ~/RinDocker/03_output:/03_output \
   myname/project001

# Change the ownership of the output files (they are created by root inside the container)
sudo chown -R "$USER:$USER" ~/RinDocker/03_output
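
The chown step is needed because the container writes to 03_output as root. A possible alternative, sketched below, is to start the container with the caller's user and group IDs through Docker's --user option, so the output files belong to the host user from the start. The function name run_analysis is hypothetical, and the sketch assumes myScript.R needs no root privileges inside the container. The -it flags are dropped because the script runs non-interactively.

```shell
# Hypothetical helper (not part of the repository): run the analysis as the
# calling user so that files written to 03_output are not owned by root.
# Assumption: myScript.R does not need root privileges in the container.
run_analysis() {
  project_dir="${1:-$HOME/RinDocker}"
  docker run --rm \
    --user "$(id -u):$(id -g)" \
    -v "$project_dir/01_data:/01_data" \
    -v "$project_dir/03_output:/03_output" \
    myname/project001
}
```

Calling run_analysis with no argument uses ~/RinDocker; a different project directory can be passed as the first argument.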

Files

Dockerfile_base

We use R 3.5.2 as the base. The R version is also used to tag our Docker image.

FROM r-base:3.5.2

## install debian packages
RUN apt-get update -qq && apt-get -y --no-install-recommends install \
  libxml2-dev \
  libcairo2-dev \
  libcurl4-openssl-dev \
  libssl-dev

## copy files
COPY 02_code/install_packages.R /install_packages.R

## install R-packages
RUN Rscript /install_packages.R
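
After building the intermediate image, it can be useful to confirm that the packages from install_packages.R were really installed. The sketch below loads the three packages named in the instructions inside a throwaway container; library() raises an error for a missing package, which makes Rscript exit with a non-zero status. The function name check_base_image is hypothetical.

```shell
# Hypothetical helper (not part of the repository): load each expected package
# inside the intermediate image. A missing package makes Rscript fail, so the
# function's exit status reports whether the image is usable.
check_base_image() {
  image="${1:-myname/r-tidyverse:3.5.2}"
  docker run --rm "$image" \
    Rscript -e 'for (p in c("readr", "dplyr", "ggplot2")) library(p, character.only = TRUE)'
}
```

For example, `check_base_image && echo "base image OK"` prints the message only if all three packages load.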

Dockerfile (project specific)

We assume the intermediate Docker image already contains the packages necessary for our analysis.

FROM myname/r-tidyverse:3.5.2

## create directories
RUN mkdir -p /01_data
RUN mkdir -p /02_code
RUN mkdir -p /03_output

RUN R -e "install.packages(c('forcats'))"

## copy files
COPY 02_code/myScript.R /02_code/myScript.R

## run the script
CMD ["Rscript", "/02_code/myScript.R"]
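
Because CMD only sets the default command, it can be replaced when the container is started. The sketch below opens an interactive shell in the project image with the same volume mounts, which may help when myScript.R fails and needs to be inspected by hand. The function name debug_shell is hypothetical, and the sketch assumes bash is available in the image (the r-base image is Debian-based).

```shell
# Hypothetical helper (not part of the repository): start an interactive shell
# in the project image instead of the default CMD.
# Assumption: bash exists in the image.
debug_shell() {
  project_dir="${1:-$HOME/RinDocker}"
  docker run -it --rm \
    -v "$project_dir/01_data:/01_data" \
    -v "$project_dir/03_output:/03_output" \
    myname/project001 bash
}
```

Inside the shell, the script can then be run manually with `Rscript /02_code/myScript.R`.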

myScript.R (under 02_code directory)

  • Read the input files from the /01_data directory
  • Save the output files to the /03_output directory
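
Since /01_data and /03_output are mounted from the host, the corresponding host directories have to be in place before the container starts. A minimal pre-flight check could look like the sketch below. The function name check_layout is hypothetical, and us-500.csv is taken from the file structure shown above.

```shell
# Hypothetical helper (not part of the repository): verify the host-side layout
# before running the container.
check_layout() {
  project_dir="${1:-$HOME/RinDocker}"
  input="$project_dir/01_data/us-500.csv"
  if [ ! -f "$input" ]; then
    echo "missing input file: $input" >&2
    return 1
  fi
  # Make sure the output directory exists so the volume mount has a target.
  mkdir -p "$project_dir/03_output"
}
```

The function returns a non-zero status when the input file is absent, so it can guard the run step, e.g. `check_layout && docker run ...`.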

To Dos

  • Make the R script independent of the Docker image, so that changing the script does not require rebuilding the image. This is low priority because building the project-specific image is quick.
  • Integrate packrat in order to lock R package versions.
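
Until package versions are locked, one stopgap is to at least record which versions an image contains. The sketch below is not packrat and does not pin anything. It only prints the package names and versions reported by R's installed.packages() as CSV. The function name snapshot_packages is hypothetical.

```shell
# Hypothetical helper (not part of the repository): print the installed R
# packages and their versions as CSV. This records versions only; it does not
# lock them the way packrat would.
snapshot_packages() {
  image="${1:-myname/project001}"
  docker run --rm "$image" \
    Rscript -e 'write.csv(installed.packages()[, c("Package", "Version")], row.names = FALSE)'
}
```

The output can be redirected to a file next to the results, e.g. `snapshot_packages > ~/RinDocker/03_output/r_packages.csv` (the file name is only an illustration).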
