Skip to content
This repository has been archived by the owner on Jul 31, 2018. It is now read-only.

Reproduce PMID 28904013 supplementary code output

Notifications You must be signed in to change notification settings

coregenomics/suppl-pmid-28904013

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Setup

The original paper used R 3.3.0 on Debian 7 Wheezy, but because CRAN only released R version 3.2.5 for Wheezy, we use a Singularity image to replicate the R executable.

Install the latest version of singularity for your operating system.

Then run the bootstrap.sh script in the cloned directory to:

  1. Download the supplement data to downloads/,
  2. Install the R packages sandboxed under r-lib/ in the current directory, and finally
  3. Run the analysis.

Why do we specifically need R 3.3.0?

The R minor release version matters because Bioconductor package releases are pinned against them.

Why use a Singularityfile instead of a Dockerfile?

Singularity produces portable, read-only, containers based on a non-proprietary image format and suited to scientific workflows.

Docker has lately been favored by the academic community, but has several troubling drawbacks in practice:

  • A reason many academics use Docker is for Makefile-like caching long operations.
  • But without continuous integration, caching does not make the work reproducible, and there are more sensible language specific memoization operations available such as Biobase::cache() in R.
  • Docker images have also become a crutch for distributing poorly packaged software. In other words, distributing containers has become customary instead of writing the traditional language-specific software package and minimal analysis scripts that rely on package controlled code.
  • Images also encourage bad practices of not separating data and code.
  • Docker has non-sensical default security, such as root privileged containers without user namespace remapping (see NCC Group whitepaper).
  • Docker has poor community support with nearly all questions on the docker forum unanswered.

In industry settings, having the smallest possible infrastructure footprint using containers is important, but less so in academia given the drawbacks above. Preferring language-specific software packaging, and minimizing the user of containers is more extensible, maintainable and reproducibile.

About

Reproduce PMID 28904013 supplementary code output

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published