This directory contains Dockerfiles for building Docker images, encapsulating steps of the MMAP analysis pipeline into containers.
Each image implements a step in the pipeline, and should be treated as a single command. Parameters are sent to the containers by setting environment variables, taking cues from bioboxes.
See README.md in each subdirectory for further details.
- Genovo - Genovo assembler
- Glimmer - Glimmer gene finder
- go-blast - NCBI BLAST+ configured to use a local BLAST database containing Gene Ontology terms
- makeblastdb-go - Downloads GO-annotated sequences from archive.geneontology.org and converts them to an NCBI BLAST+ compatible format
- extract-go-terms - Simple Python script that counts/extracts the GO terms from the go-blast results
- mine - MINE - maximal information-based nonparametric exploration
- CSV merging - preprocessing for mine
These images are designed to be run within docker-pipeline, and treated as simple Unix-style command-line tools. Information is passed to a container at run time by configuring volumes and setting environment variables. For example:
    # Running genovo to assemble reads into contigs
    docker run \
      -v /Users/dcl9/Data/reads:/mnt/input:ro \
      -v /Users/dcl9/Data/contigs:/mnt/output \
      -e CONT_INPUT_READS_FILE=/mnt/input/reads.fasta \
      -e CONT_OUTPUT_CONTIGS_FILE=/mnt/output/contigs.fasta \
      -e CONT_INPUT_ASSEMBLE_ITERATIONS=10 \
      dleehr/genovo
Note that the input volume is mounted read-only, and the file paths passed in the environment variables are paths from inside the container.
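The mapping from host paths to container paths can be sketched with a small shell helper. This is an illustrative sketch only: the function name `genovo_cmd` is hypothetical and not part of this repository, and it assumes the output file is always named `contigs.fasta`. It prints the `docker run` command rather than executing it, so the resulting paths can be inspected first.

```shell
# Hypothetical helper (not part of this repository): print the docker run
# command for a reads file and an output directory given as host paths.
# It only prints the command; it does not invoke docker.
genovo_cmd() {
    reads_host=$1          # absolute host path to the reads file
    out_dir_host=$2        # absolute host directory for the results
    iterations=${3:-10}    # assembly iterations, defaulting to 10

    in_dir_host=$(dirname "$reads_host")
    reads_name=$(basename "$reads_host")

    # Host directories are mounted as volumes; the environment variables
    # carry the matching paths as seen from inside the container.
    set -- docker run \
        -v "$in_dir_host:/mnt/input:ro" \
        -v "$out_dir_host:/mnt/output" \
        -e "CONT_INPUT_READS_FILE=/mnt/input/$reads_name" \
        -e "CONT_OUTPUT_CONTIGS_FILE=/mnt/output/contigs.fasta" \
        -e "CONT_INPUT_ASSEMBLE_ITERATIONS=$iterations" \
        dleehr/genovo
    printf '%s\n' "$*"
}
```

Calling `genovo_cmd /Users/dcl9/Data/reads/reads.fasta /Users/dcl9/Data/contigs` prints the same command as the example above on a single line. Paths containing spaces would need quoting before the printed command is pasted into a shell.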
The environment variables are validated and interpreted by simple wrapper scripts inside the image, which execute the underlying command. These scripts are configured as the image's CMD, allowing each image to be treated like a Unix-style command-line tool.
The wrapper scripts output the commands they will execute, as well as the underlying tool versions.
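A wrapper of this kind might look roughly like the sketch below. It is not the script shipped in any of these images: the function names and the `TOOL_CMD` variable are hypothetical, and `cp` stands in for the real tool so that the sketch runs anywhere. Only the `CONT_*` variable names come from the example above. Printing the tool version is omitted because it depends on the tool.

```shell
#!/bin/sh
# Hypothetical wrapper sketch; not the script shipped in any image.

# Fail with a message if the named environment variable is unset or empty.
require_var() {
    eval "value=\${$1:-}"
    if [ -z "$value" ]; then
        echo "Error: $1 is not set" >&2
        return 1
    fi
}

# Print a command, then run it.
run_logged() {
    echo "Executing: $*"
    "$@"
}

main() {
    # Validate the environment before touching the tool.
    require_var CONT_INPUT_READS_FILE || return 1
    require_var CONT_OUTPUT_CONTIGS_FILE || return 1
    if [ ! -r "$CONT_INPUT_READS_FILE" ]; then
        echo "Error: cannot read $CONT_INPUT_READS_FILE" >&2
        return 1
    fi
    # TOOL_CMD is a placeholder for the underlying tool; cp stands in here.
    run_logged ${TOOL_CMD:-cp} "$CONT_INPUT_READS_FILE" "$CONT_OUTPUT_CONTIGS_FILE"
}

# In a real wrapper script the last line would be: main || exit 1
```

Keeping validation in `main` and ending the script with `main || exit 1` means a missing variable produces a non-zero exit status from `docker run`, which a calling pipeline can detect.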