title: Reproducible Research using Docker and R
output: ioslides_presentation

Motives for using Docker

  • Difficulty of managing dependencies
  • Maximizing isolation and transparency
  • Portability of computational environment
  • Making extension and reuse easy
  • Ease of use generally

Limitations of VMs

  • Size: VMs are large files, which makes them impractical to store and transfer
  • Performance: running VMs consumes significant CPU and memory

VM vs Docker

Drawing

What Docker is

  • A shipping container for the online universe: hardware-agnostic and platform-agnostic
  • A tool that lets programmers neatly package software and move it from machine to machine.
  • Released as open source in March 2013, a big deal on GitHub: 18.6k stars and 3.8k forks at the time of writing

Basic ingredients

Docker's limitations

  • Security: an image hosted on a registry can be written with malicious intent
  • Limited to 64-bit host machines, making it impossible to run on older hardware
  • Does not provide complete virtualization but relies on the Linux kernel provided by the host
  • On OSX and Windows this means a VM must be present (boot2docker installs VirtualBox for this)

Getting started on OSX & Windows

  • Install & start boot2docker
  • docker pull <username>/<image_name> gets an existing image from registry
  • e.g. docker pull ubuntu (note there's no username here, because this is an 'official repo')
  • after docker pull, use docker run to create and start a container
  • or simply use docker run, which will pull, create and run in one step

Common arguments for docker run:

| Argument | Explanation |
|----------|-------------|
| -i | Interactive (usually used with -t) |
| -t | Give a terminal interface for a CLI |
| -p | Publish ports: -p <host port>:<container port> |
| -d | Detached mode: run the container in the background (the opposite of -i -t) |
| -v | Mount a host directory or volume into the container: -v <host path>:<container path> |
| --rm | Remove the container from the host automatically when it stops running (older Docker versions do not allow this together with -d) |

Examples of docker run

  • docker run -it ubuntu
  • gets ubuntu and gives us a terminal for interaction
  • docker run -dp 8787:8787 rocker/rstudio
  • gets R & RStudio and publishes port 8787 for using RStudio Server in a web browser at localhost:8787 (Linux) or 192.168.59.103:8787 (the boot2docker VM address on Windows and OSX)
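The flags above can be combined in a small helper. The following is a minimal sketch, not a definitive recipe: the function name `start_rstudio`, the container name `rstudio` and the placeholder password are assumptions for illustration. Recent rocker images expect a `PASSWORD` environment variable, which is also an assumption relative to the commands shown on this slide.

```shell
# Sketch: start RStudio Server in the background under a fixed container name.
# The function name, container name and password are illustrative placeholders.
start_rstudio() {
  host_port="${1:-8787}"       # port on the host; 8787 is RStudio's port in the container
  password="${2:-change-me}"   # placeholder; recent rocker images expect PASSWORD to be set
  docker run -d \
    -p "${host_port}:8787" \
    --name rstudio \
    -e PASSWORD="${password}" \
    rocker/rstudio
}

# Usage (requires a running Docker daemon):
#   start_rstudio 8787 my-secret
#   docker ps                                  # confirm the container is running
#   docker stop rstudio && docker rm rstudio   # stop and remove it afterwards
```

Naming the container means later commands can refer to `rstudio` rather than looking up the generated container ID.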

Interacting with docker at the command line

| Command | Explanation |
|---------|-------------|
| docker ps | list all the running containers on the host |
| docker ps -a | list all the containers on the host, including those that have stopped |
| docker exec -it <container-id> bash | opens a bash shell in a currently running container |
| docker stop <container-id> | stop a running container |
| docker kill <container-id> | force stop a running container |

Interacting with docker at the command line

| Command | Explanation |
|---------|-------------|
| docker rm <container-id> | removes (deletes) a container |
| docker rmi <image-id> | removes (deletes) an image |
| docker rm -f $(docker ps -a -q) | remove all containers, including running ones |
| docker rmi -f $(docker images -q) | remove all images, even those in use |
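The last two commands above delete everything. A gentler variant, sketched here under the assumption that only exited containers and untagged ("dangling") images should go, uses the `--filter` option of `docker ps` and `docker images`. The function names are hypothetical.

```shell
# Sketch: remove only containers that have already exited, leaving running ones alone.
remove_stopped_containers() {
  ids=$(docker ps -a -q --filter status=exited)
  if [ -z "$ids" ]; then
    echo "no stopped containers to remove"
    return 0
  fi
  # $ids is intentionally unquoted so that each container ID becomes its own argument
  docker rm $ids
}

# Sketch: remove only dangling images (layers no longer referenced by any tag).
remove_dangling_images() {
  ids=$(docker images -q --filter dangling=true)
  if [ -z "$ids" ]; then
    echo "no dangling images to remove"
    return 0
  fi
  docker rmi $ids
}

# Usage (requires a running Docker daemon):
#   remove_stopped_containers
#   remove_dangling_images
```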

Writing a Dockerfile

  • It is possible to use docker commit <container> to commit a container's file changes or settings into a new image
  • But it is better to use Dockerfiles & git to manage your images in a documented and maintainable way
  • A Dockerfile is a short plain text file that is a recipe for making a docker image

Some common Dockerfile elements

  • FROM specifies which base image your image is built on (the rocker images, for example, ultimately build on Debian)
  • MAINTAINER specifies who created and maintains the image (deprecated in newer Docker versions in favour of a LABEL)
  • CMD specifies the command to run immediately when a container is started from this image, unless you specify a different command
  • ADD will copy new files from a source and add them to the container's filesystem at the given path
  • RUN does just that: it runs a command inside the container (e.g. apt-get)
  • EXPOSE tells Docker that the container will listen on the specified port when it starts
  • VOLUME will create a mount point with the specified name and tell Docker that the volume may be mounted by the host
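To make these instructions concrete, the snippet below writes a small Dockerfile from the shell. It is a sketch under stated assumptions: the directory `my-analysis`, the script `analysis.R`, the choice of the `knitr` package and the maintainer details are all placeholders, and `rocker/r-base` is used as the base image only for illustration.

```shell
# Sketch: write a minimal Dockerfile into a new build directory.
# Directory, file, package and maintainer names are illustrative placeholders.
mkdir -p my-analysis
cat > my-analysis/Dockerfile <<'EOF'
# Base image: R on Debian from the rocker project
FROM rocker/r-base

# Who maintains the image (LABEL is the newer replacement for MAINTAINER)
LABEL maintainer="Your Name <you@example.com>"

# Install an R package at build time
RUN Rscript -e "install.packages('knitr', repos = 'https://cloud.r-project.org')"

# Copy an analysis script from the build directory into the image
ADD analysis.R /analysis/analysis.R

# Declare a mount point that the host can attach a directory to
VOLUME /analysis/data

# Default command when a container starts from this image
CMD ["Rscript", "/analysis/analysis.R"]
EOF

# A placeholder script so that the ADD instruction has something to copy
printf '%s\n' 'cat("Hello from the container\n")' > my-analysis/analysis.R
```

EXPOSE is left out because this image does not run a server. An image that serves RStudio would add a line such as `EXPOSE 8787`.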

Using Dockerfiles

  • To build an image from a Dockerfile: docker build --rm -t <username>/<image_name> <path>, where <path> is the directory that contains the Dockerfile
  • simple and moderately complex examples
  • To send an image to the registry: docker push <username>/<image_name>. You need to be registered at the hub before pushing
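The build and push steps can be chained so that an image is only pushed when the build succeeds. This is a minimal sketch: the function name `build_and_push` is hypothetical, and it assumes you have already authenticated with `docker login`.

```shell
# Sketch: build an image from a directory and push it only if the build succeeds.
build_and_push() {
  image="$1"     # image name in the form <username>/<image_name>
  context="$2"   # directory that contains the Dockerfile
  docker build --rm -t "$image" "$context" && docker push "$image"
}

# Usage (requires a running Docker daemon and a prior `docker login`):
#   build_and_push <username>/<image_name> ./my-analysis
```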

Automated Docker image build testing

Doing research with RStudio and Docker

  • The rocker project provides images that include R, key packages and other dependencies (RStudio, pandoc, LaTeX, etc.), and has excellent documentation on the GitHub wiki
  • I run RStudio Server in the browser, with a host folder mounted as a volume, which is very easy to use
  • I store scripts on the host volume because version control is simpler this way, but do development and analysis in the container for isolation

I get started with...

docker run -dp 8787:8787 -v /c/Users/marwick/docker:/home/rstudio/ -e ROOT=TRUE rocker/hadleyverse

  • -dp 8787:8787 runs the container in the background and publishes port 8787 so the web browser can access RStudio
  • -v /c/Users/marwick/docker:/home/rstudio/ gives me read and write access both ways between Windows (C:/Users/marwick/docker) and RStudio
  • -e ROOT=TRUE sets an environment variable to enable root access for me so I can manage dependencies
  • I can access the docker (Debian) shell via RStudio for file manipulation, etc. (or docker exec -it <container-id> bash)
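Besides an interactive shell, `docker exec` can also run a script non-interactively in the running container. The sketch below assumes the script sits in the host folder mounted at /home/rstudio, as in the command above. The function name and the file name `analysis.R` are hypothetical.

```shell
# Sketch: run an R script non-interactively inside an already running container.
# Assumes the script lives in the host folder mounted at /home/rstudio.
run_script_in_container() {
  container="$1"   # container ID or name, as shown by `docker ps`
  script="$2"      # file name relative to the mounted folder
  docker exec "$container" Rscript "/home/rstudio/${script}"
}

# Usage (requires a running container):
#   run_script_in_container <container-id> analysis.R
```

Because the folder is shared, any output the script writes under /home/rstudio also appears in the host folder.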

...and IPython

  • Choose your favourite image from the registry
  • The IPython project has a few images, and there are many user-contributed ones

Cloud computing with Docker is widely supported

  • Amazon EC2 Container Service: Docker clusters in the cloud (no registry)
  • Google Compute Engine: has container-optimized VMs
  • Google Container Registry: secure private Docker image storage on Google Cloud Platform
  • Microsoft Azure: supports Docker containers (Docker Hub is integrated)

References & further reading

Colophon

Presentation written in R Markdown using ioslides

Compiled into HTML5 using RStudio & knitr

Source code hosting: https://github.com/benmarwick/UW-eScience-docker-for-reproducibility

ORCID: http://orcid.org/0000-0001-7879-4531

Licensing: