Issue 129 #151

Merged
merged 11 commits into from
Oct 22, 2020
2 changes: 2 additions & 0 deletions .travis.yml
@@ -13,6 +13,7 @@ install:
- docker build -t abricate:1.0.0 abricate/1.0.0
- docker build -t ariba:2.14.4 ariba/2.14.4
- docker build -t ivar:1.2.2_artic20200528 ivar/1.2.2_artic20200528
- docker build -t metaphlan:3.0.3 metaphlan/3.0.3
script:
- bash tests/mash-test.sh
- bash tests/mashtree-test.sh
@@ -23,3 +24,4 @@ script:
- bash tests/abricate.sh
- bash tests/ariba.sh
- bash tests/ivar.sh
- bash tests/metaphlan.sh
1 change: 1 addition & 0 deletions Program_Licenses.md
@@ -38,6 +38,7 @@ The licenses of the open-source software that is contained in these Docker image
| Mash | non-standard license (see link) | https://github.com/marbl/Mash/blob/master/LICENSE.txt |
| mashtree | GNU GPLv3 | https://github.com/lskatz/mashtree/blob/master/LICENSE |
| Medaka | Mozilla Public License 2.0 | https://github.com/nanoporetech/medaka/blob/master/LICENSE.md |
| MetaPhlAn | MIT | https://github.com/biobakery/MetaPhlAn/blob/3.0/license.txt |
| minimap2 | MIT | https://github.com/lh3/minimap2/blob/master/LICENSE.txt |
| mlst | GNU GPLv2 | https://github.com/tseemann/mlst/blob/master/LICENSE |
| Mugsy | Artistic License 2.0 | Archived in: <br/> https://sourceforge.net/projects/mugsy/files/mugsy_x86-64-v1r2.3.tgz |
1 change: 1 addition & 0 deletions README.md
@@ -51,6 +51,7 @@ For many people Docker is not an option, but Singularity is. Most Docker contain
| [Mash](https://hub.docker.com/r/staphb/mash/) <br/> [![docker pulls](https://img.shields.io/docker/pulls/staphb/mash.svg?style=popout)](https://hub.docker.com/r/staphb/mash) | <ul><li>2.1</li><li>2.2</li></ul> | https://github.com/marbl/Mash |
| [mashtree](https://hub.docker.com/r/staphb/mashtree) <br/> [![docker pulls](https://img.shields.io/docker/pulls/staphb/mashtree.svg?style=popout)](https://hub.docker.com/r/staphb/mashtree) | <ul><li>0.52.0</li><li>0.57.0</li><li>1.0.4</li><li>1.2.0</li></ul> | https://github.com/lskatz/mashtree |
| [medaka](https://hub.docker.com/r/staphb/medaka) <br/> [![docker pulls](https://img.shields.io/docker/pulls/staphb/medaka.svg?style=popout)](https://hub.docker.com/r/staphb/medaka) | <ul><li>0.8.1</li><li>1.0.1</li></ul> | https://github.com/nanoporetech/medaka |
| [metaphlan](https://hub.docker.com/r/staphb/metaphlan) <br/> [![docker pulls](https://img.shields.io/docker/pulls/staphb/metaphlan.svg?style=popout)](https://hub.docker.com/r/staphb/metaphlan) | <ul><li>3.0.3-no-db (no database)</li><li>3.0.3 (~3GB db)</li></ul> | https://github.com/biobakery/MetaPhlAn/tree/3.0 |
| [minimap2](https://hub.docker.com/r/staphb/minimap2) <br/> [![docker pulls](https://img.shields.io/docker/pulls/staphb/minimap2.svg?style=popout)](https://hub.docker.com/r/staphb/minimap2) | <ul><li>2.17</li></ul> | https://github.com/lh3/minimap2 |
| [mlst](https://hub.docker.com/r/staphb/mlst) <br/> [![docker pulls](https://img.shields.io/docker/pulls/staphb/mlst.svg?style=popout)](https://hub.docker.com/r/staphb/mlst) | <ul><li>2.16.2</li><li>2.17.6</li><li>2.19.0</li></ul> | https://github.com/tseemann/mlst |
| [Mugsy](https://hub.docker.com/r/staphb/mugsy) <br/> [![docker pulls](https://img.shields.io/docker/pulls/staphb/mugsy.svg?style=popout)](https://hub.docker.com/r/staphb/mugsy) | <ul><li>1r2.3</li></ul> | http://mugsy.sourceforge.net/ |
87 changes: 87 additions & 0 deletions metaphlan/3.0.3-no-db/Dockerfile
@@ -0,0 +1,87 @@
FROM ubuntu:bionic AS builder_metaphlan

# multistage build
# labels associated with final docker image are further down
# label the intermediate image so we can delete later
LABEL stage=builder_metaphlan_nodb

# install python (>3.6) and other dependencies
# R necessary if user wants to run unifrac function in metaphlan

RUN apt-get update && apt-get install -y software-properties-common && \
add-apt-repository ppa:deadsnakes/ppa && \
apt-get update && apt-get install -y --no-install-recommends --no-install-suggests \
gcc \
wget \
python3.7 \
python3.7-dev \
python3-distutils \
python3-setuptools \
python3-pip \
unzip \
r-base=3.4.4-1ubuntu1 && \
python3.7 -m pip install pip --force-reinstall && \
python3.7 -m pip install numpy Cython six --force-reinstall && \
ln -s /usr/bin/python3.7 /usr/bin/python

# bowtie 2 dependency
RUN mkdir /usr/bin/bowtie2 && \
cd /usr/bin/bowtie2 && \
wget https://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.3.3.1/bowtie2-2.3.3.1-linux-x86_64.zip/download && \
unzip download && \
rm download

ENV PATH="$PATH:/usr/bin/bowtie2/bowtie2-2.3.3.1-linux-x86_64" \
LC_ALL=C

# install metaphlan 3
RUN python3.7 -m pip install metaphlan==3.0.3 && \
mkdir /data

# don't install the metaphlan database; the user downloads and mounts it. see README.md
# RUN metaphlan --install

# build onto first stage
# this will make the final docker image ~0.5 GB smaller
# after the build completes, recommend deleting the intermediate image generated from the builder stage
# can do this by filtering on the stage label with the --filter flag:
# docker image ls --filter "label=stage=builder_metaphlan_nodb"
# docker image prune --filter "label=stage=builder_metaphlan_nodb"

FROM ubuntu:bionic

# labels for final image
LABEL base.image="ubuntu:bionic"
LABEL dockerfile.version="1"
LABEL software="MetaPhlAn3.0"
LABEL software.version="3.0.3-no-db"
LABEL description="microbial composition of metagenomes"
LABEL website="https://github.com/biobakery/MetaPhlAn/tree/3.0"
LABEL maintainer="Tara Gallagher"
LABEL maintainer.email="tgallagher@utah.gov"

# copy over necessary bin and packages

COPY --from=builder_metaphlan /usr/local/bin /usr/local/bin
COPY --from=builder_metaphlan /usr/bin/bowtie2/ /usr/bin/bowtie2/
COPY --from=builder_metaphlan /usr/bin/python3.7 /usr/bin/python3.7
COPY --from=builder_metaphlan /usr/lib/python3.7 /usr/lib/python3.7
COPY --from=builder_metaphlan /usr/lib/x86_64-linux-gnu/libexpat.so /usr/lib/x86_64-linux-gnu/libexpat.so
COPY --from=builder_metaphlan /lib/x86_64-linux-gnu/ /lib/x86_64-linux-gnu/
COPY --from=builder_metaphlan /usr/local/lib/python3.7/dist-packages/ /usr/local/lib/python3.7/dist-packages/
COPY --from=builder_metaphlan /usr/lib/python3/dist-packages/* /usr/lib/python3.7/dist-packages/
COPY --from=builder_metaphlan /usr/bin/R /usr/bin/R
COPY --from=builder_metaphlan /usr/bin/Rscript /usr/bin/Rscript
COPY --from=builder_metaphlan /usr/lib/R /usr/lib/R
COPY --from=builder_metaphlan /usr/local/lib/R /usr/local/lib/R
COPY --from=builder_metaphlan /etc/R /etc/R
COPY --from=builder_metaphlan /usr/lib/libR.so /usr/lib/libR.so

ENV PATH="$PATH:/usr/bin/bowtie2/bowtie2-2.3.3.1-linux-x86_64" \
LC_ALL=C

WORKDIR /data

# make dir for metaphlan database
RUN mkdir /usr/local/lib/python3.7/dist-packages/metaphlan/metaphlan_databases && \
ln -s /usr/bin/python3.7 /usr/bin/python
178 changes: 178 additions & 0 deletions metaphlan/3.0.3-no-db/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,178 @@
# MetaPhlAn3 docker image (without database already built in image)

Main tool: [MetaPhlAn/3.0](https://github.com/biobakery/MetaPhlAn/tree/3.0)

This Docker image contains MetaPhlAn 3 and its dependencies. It does not contain the MetaPhlAn 3 database: you will need to download the database to your machine, index it, and mount it into the Docker or Singularity container (see below).

# Example: Downloading the database
```bash
# you must download the database to your machine to run this image
# download the following files from MetaPhlAn's Dropbox or Google Drive (see https://github.com/biobakery/MetaPhlAn/wiki/MetaPhlAn-3.0#installation). recommend putting them in their own directory, e.g. named "metaphlan_database"
# note: I ran into problems when downloading the db from Google Drive - the md5 didn't match, which resulted in empty bowtie2 index files. Dropbox downloads worked fine.
# file_list.txt
# mpa_latest
# mpa_v30_CHOCOPhlAn_201901.tar (or whichever version of database you prefer)
# mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2
# mpa_v30_CHOCOPhlAn_201901.md5


# extract database
$ tar -xvf mpa_v30_CHOCOPhlAn_201901.tar
$ bunzip2 mpa_v30_CHOCOPhlAn_201901.fna.bz2
$ ls
file_list.txt
mpa_latest
mpa_v30_CHOCOPhlAn_201901.fna
mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2
mpa_v30_CHOCOPhlAn_201901.md5
mpa_v30_CHOCOPhlAn_201901.pkl
mpa_v30_CHOCOPhlAn_201901.tar
```
After downloading the database, index the files with bowtie2 inside the Docker or Singularity image (see the examples below). As long as the indexed database stays on your local machine, the bowtie2-build step only needs to be run once. You can then run MetaPhlAn in Docker or Singularity (see the examples below).
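
Since a corrupted download can lead to empty bowtie2 index files (see the note in the download comments above), it may be worth checking the checksum before indexing. The following is a minimal sketch, not part of MetaPhlAn or of this image: `verify_md5` is a hypothetical helper name, and it assumes that the `.md5` file follows the usual `md5sum` layout (hash as the first whitespace-separated field) and that `md5sum` from GNU coreutils is installed.

```bash
#!/usr/bin/env bash
# verify_md5 FILE MD5_FILE   (hypothetical helper, for illustration only)
# Returns 0 when the MD5 of FILE equals the first field of MD5_FILE.
# Assumption: MD5_FILE uses the md5sum layout "<hash>  <filename>".
verify_md5() {
    local file="$1" md5_file="$2"
    local expected actual
    # the first whitespace-separated field of the first line is taken as the expected hash
    expected="$(awk 'NR == 1 { print $1 }' "$md5_file")" || return 2
    # compute the hash of the downloaded file
    actual="$(md5sum "$file" | awk '{ print $1 }')" || return 2
    # succeed only when an expected hash was found and it matches
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

It could be called as `verify_md5 mpa_v30_CHOCOPhlAn_201901.tar mpa_v30_CHOCOPhlAn_201901.md5 && echo "checksum OK"` from the database directory before running bowtie2-build.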

## Example Usage: Docker

### Download Docker image:
```bash
$ docker pull staphb/metaphlan:3.0.3-no-db
```
### Indexing database & running metaphlan in Docker:
```bash
# next, use bowtie2 to build and index the database. can use the metaphlan docker image for this!
# note: this takes ~15 minutes
# change to the directory with the database files
$ cd ./metaphlan_database/
$ docker run -v $PWD:/usr/local/lib/python3.7/dist-packages/metaphlan/metaphlan_databases/ -u $(id -u):$(id -g) \
--rm=True \
staphb/metaphlan:3.0.3-no-db \
bowtie2-build /usr/local/lib/python3.7/dist-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901.fna /usr/local/lib/python3.7/dist-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901
# after bowtie2-build completes, the index files should be in your metaphlan_database directory
$ ls
file_list.txt
mpa_latest
mpa_v30_CHOCOPhlAn_201901.1.bt2
mpa_v30_CHOCOPhlAn_201901.2.bt2
mpa_v30_CHOCOPhlAn_201901.3.bt2
mpa_v30_CHOCOPhlAn_201901.4.bt2
mpa_v30_CHOCOPhlAn_201901.fna
mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2
mpa_v30_CHOCOPhlAn_201901.md5
mpa_v30_CHOCOPhlAn_201901.pkl
mpa_v30_CHOCOPhlAn_201901.rev.1.bt2
mpa_v30_CHOCOPhlAn_201901.rev.2.bt2
mpa_v30_CHOCOPhlAn_201901.tar

# to run metaphlan
# this example uses a stool metagenome downloaded from SRA
# first move back to the parent directory, so that data/ is created next to metaphlan_database/
$ cd ..
# if you have SRA toolkit, can do:
$ fastq-dump --outdir ./data --skip-technical --readids --split-files --clip SRX2474191
$ ls data
SRX2474191_1.fastq

# in this example, there are 2 directories to mount: data/ contains the .fastq and metaphlan_database/ contains the indexed database
# check that both directories are in the current working directory
$ ls
data metaphlan_database

# run docker:
$ docker run -v $PWD/metaphlan_database:/usr/local/lib/python3.7/dist-packages/metaphlan/metaphlan_databases/ \
-v $PWD/data:/data \
-u $(id -u):$(id -g) \
--rm=True \
staphb/metaphlan:3.0.3-no-db metaphlan /data/SRX2474191_1.fastq --input_type fastq -o profiled_metagenome.txt


# OUTPUT:
WARNING: The metagenome profile contains clades that represent multiple species merged into a single representant.
An additional column listing the merged species is added to the MetaPhlAn output.
$ head data/profiled_metagenome.txt
#mpa_v30_CHOCOPhlAn_201901
#/usr/local/bin/metaphlan /data/SRX2474191_1.fastq --input_type fastq -o profiled_metagenome.txt
#SampleID Metaphlan_Analysis
#clade_name NCBI_tax_id relative_abundance additional_species
k__Bacteria 2 100.0
k__Bacteria|p__Bacteroidetes 2|976 52.90346
k__Bacteria|p__Firmicutes 2|1239 45.08223
k__Bacteria|p__Actinobacteria 2|201174 2.01431
k__Bacteria|p__Bacteroidetes|c__Bacteroidia 2|976|200643 52.90346
k__Bacteria|p__Firmicutes|c__Clostridia 2|1239|186801 45.08223

```
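
Before mounting the database, it can help to confirm that bowtie2-build produced all six index files and that none of them is empty. The sketch below checks for the six `.bt2` suffixes shown in the listing above; `check_bt2_index` is a hypothetical helper name and is not provided by the image.

```bash
#!/usr/bin/env bash
# check_bt2_index DIR PREFIX   (hypothetical helper, for illustration only)
# Returns 0 only if all six bowtie2 index files for PREFIX exist in DIR
# and have a size greater than zero.
check_bt2_index() {
    local dir="$1" prefix="$2" suffix
    for suffix in 1.bt2 2.bt2 3.bt2 4.bt2 rev.1.bt2 rev.2.bt2; do
        # -s is true only for files that exist and are non-empty
        if [ ! -s "$dir/$prefix.$suffix" ]; then
            echo "missing or empty: $dir/$prefix.$suffix" >&2
            return 1
        fi
    done
    return 0
}
```

For the database used in this README, the call would be `check_bt2_index ./metaphlan_database mpa_v30_CHOCOPhlAn_201901`.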

## Example usage: Singularity

### Build Singularity image

```bash
# build singularity image
$ singularity build ~/metaphlan-no-db-3.0.3.simg docker://staphb/metaphlan:3.0.3-no-db
# last couple lines of OUTPUT:
INFO: Creating SIF file...
INFO: Build complete: /home/tgallagher/metaphlan-no-db-3.0.3.simg
```

### Index database and run metaphlan in Singularity

```bash
# next, use bowtie2 to build and index the database. can use the metaphlan singularity image for this!
# note: this takes ~15 minutes
# change to the directory with the database files
$ cd ./metaphlan_database/
$ singularity exec --no-home -B $PWD:/usr/local/lib/python3.7/dist-packages/metaphlan/metaphlan_databases/ \
/home/tgallagher/metaphlan-no-db-3.0.3.simg \
bowtie2-build /usr/local/lib/python3.7/dist-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901.fna \
/usr/local/lib/python3.7/dist-packages/metaphlan/metaphlan_databases/mpa_v30_CHOCOPhlAn_201901

$ ls -l | awk '{print $5,"\t", $9}' # look at database index files and sizes
3526 file_list.txt
26 mpa_latest
629609227 mpa_v30_CHOCOPhlAn_201901.1.bt2
299330364 mpa_v30_CHOCOPhlAn_201901.2.bt2
10314872 mpa_v30_CHOCOPhlAn_201901.3.bt2
299330358 mpa_v30_CHOCOPhlAn_201901.4.bt2
1427036330 mpa_v30_CHOCOPhlAn_201901.fna
16175165 mpa_v30_CHOCOPhlAn_201901_marker_info.txt.bz2
64 mpa_v30_CHOCOPhlAn_201901.md5
25998762 mpa_v30_CHOCOPhlAn_201901.pkl
629609227 mpa_v30_CHOCOPhlAn_201901.rev.1.bt2
299330364 mpa_v30_CHOCOPhlAn_201901.rev.2.bt2
384430080 mpa_v30_CHOCOPhlAn_201901.tar

# to run metaphlan
# this example uses a stool metagenome downloaded from SRA
# first move back to the parent directory, so that data/ is created next to metaphlan_database/
$ cd ..
# if you have SRA toolkit, can do:
$ fastq-dump --outdir ./data --skip-technical --readids --split-files --clip SRX2474191
$ ls data
SRX2474191_1.fastq

# in this example, there are 2 directories to mount: data/ contains the .fastq and metaphlan_database/ contains the indexed database
# check that both directories are in the current working directory
$ ls
data metaphlan_database

# run singularity:

$ singularity exec --no-home \
-B $PWD/metaphlan_database:/usr/local/lib/python3.7/dist-packages/metaphlan/metaphlan_databases/ \
-B $PWD/data:/data \
/home/tgallagher/metaphlan-no-db-3.0.3.simg \
metaphlan /data/SRX2474191_1.fastq \
--input_type fastq -o /data/profiled_metagenome.txt

# OUTPUT:
WARNING: The metagenome profile contains clades that represent multiple species merged into a single representant.
An additional column listing the merged species is added to the MetaPhlAn output.
$ head data/profiled_metagenome.txt
#mpa_v30_CHOCOPhlAn_201901
#/usr/local/bin/metaphlan /data/SRX2474191_1.fastq --input_type fastq -o /data/profiled_metagenome.txt
#SampleID Metaphlan_Analysis
#clade_name NCBI_tax_id relative_abundance additional_species
k__Bacteria 2 100.0
k__Bacteria|p__Bacteroidetes 2|976 52.90346
k__Bacteria|p__Firmicutes 2|1239 45.08223
k__Bacteria|p__Actinobacteria 2|201174 2.01431
k__Bacteria|p__Bacteroidetes|c__Bacteroidia 2|976|200643 52.90346
k__Bacteria|p__Firmicutes|c__Clostridia 2|1239|186801 45.08223

```
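
When several samples need profiling, the same Singularity invocation can be generated once per FASTQ file. The sketch below only prints the commands (a dry run) so they can be reviewed before anything is executed. `print_metaphlan_cmds` and the `_profile.txt` output suffix are assumptions made for illustration; the flags are the ones used in the example above, and paths containing spaces are not handled.

```bash
#!/usr/bin/env bash
# print_metaphlan_cmds IMAGE DB_DIR DATA_DIR   (hypothetical helper)
# Prints, without running, one singularity command per *.fastq file in DATA_DIR.
print_metaphlan_cmds() {
    local image="$1" db_dir="$2" data_dir="$3"
    # database location inside the container, as used in this README
    local db_mount="/usr/local/lib/python3.7/dist-packages/metaphlan/metaphlan_databases/"
    local fastq sample
    for fastq in "$data_dir"/*.fastq; do
        # when nothing matches, the glob stays literal, so skip it
        [ -e "$fastq" ] || continue
        sample="$(basename "$fastq" .fastq)"
        # output name <sample>_profile.txt is an illustrative choice
        echo "singularity exec --no-home -B $db_dir:$db_mount -B $data_dir:/data $image" \
             "metaphlan /data/$sample.fastq --input_type fastq -o /data/${sample}_profile.txt"
    done
}
```

For example, `print_metaphlan_cmds ~/metaphlan-no-db-3.0.3.simg $PWD/metaphlan_database $PWD/data` prints one command line per sample, which can then be inspected and run by hand or piped to `bash`.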

86 changes: 86 additions & 0 deletions metaphlan/3.0.3/Dockerfile
@@ -0,0 +1,86 @@
FROM ubuntu:bionic AS builder_metaphlan

# multistage build
# labels associated with final docker image are further down
# label the intermediate image so we can delete later
LABEL stage=builder_metaphlan

# install python (>3.6) and other dependencies
# R necessary if user wants to run unifrac function in metaphlan

RUN apt-get update && apt-get install -y software-properties-common && \
add-apt-repository ppa:deadsnakes/ppa && \
apt-get update && apt-get install -y --no-install-recommends --no-install-suggests \
gcc \
wget \
python3.7 \
python3.7-dev \
python3-distutils \
python3-setuptools \
python3-pip \
unzip \
r-base=3.4.4-1ubuntu1 && \
python3.7 -m pip install pip --force-reinstall && \
python3.7 -m pip install numpy Cython six --force-reinstall && \
ln -s /usr/bin/python3.7 /usr/bin/python

# bowtie 2 dependency
RUN mkdir /usr/bin/bowtie2 && \
cd /usr/bin/bowtie2 && \
wget https://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.3.3.1/bowtie2-2.3.3.1-linux-x86_64.zip/download && \
unzip download && \
rm download

ENV PATH="$PATH:/usr/bin/bowtie2/bowtie2-2.3.3.1-linux-x86_64" \
LC_ALL=C

# install metaphlan 3
RUN python3.7 -m pip install metaphlan==3.0.3

# download metaphlan database
RUN metaphlan --install && \
mkdir /data

# build onto first stage
# this will make the final docker image ~0.5 GB smaller
# after the build completes, recommend deleting the intermediate image generated from the builder stage
# can do this by filtering on the stage label with the --filter flag:
# docker image ls --filter "label=stage=builder_metaphlan"
# docker image prune --filter "label=stage=builder_metaphlan"

FROM ubuntu:bionic

# labels for final image
LABEL base.image="ubuntu:bionic"
LABEL dockerfile.version="1"
LABEL software="MetaPhlAn3.0"
LABEL software.version="3.0.3"
LABEL description="microbial composition of metagenomes"
LABEL website="https://github.com/biobakery/MetaPhlAn/tree/3.0"
LABEL maintainer="Tara Gallagher"
LABEL maintainer.email="tgallagher@utah.gov"

# copy over necessary bin and packages

COPY --from=builder_metaphlan /usr/local/bin /usr/local/bin
COPY --from=builder_metaphlan /usr/bin/bowtie2/ /usr/bin/bowtie2/
COPY --from=builder_metaphlan /usr/bin/python3.7 /usr/bin/python3.7
COPY --from=builder_metaphlan /usr/lib/python3.7 /usr/lib/python3.7
COPY --from=builder_metaphlan /usr/lib/x86_64-linux-gnu/libexpat.so /usr/lib/x86_64-linux-gnu/libexpat.so
COPY --from=builder_metaphlan /lib/x86_64-linux-gnu/ /lib/x86_64-linux-gnu/
COPY --from=builder_metaphlan /usr/local/lib/python3.7/dist-packages/ /usr/local/lib/python3.7/dist-packages/
COPY --from=builder_metaphlan /usr/lib/python3/dist-packages/* /usr/lib/python3.7/dist-packages/
COPY --from=builder_metaphlan /usr/bin/R /usr/bin/R
COPY --from=builder_metaphlan /usr/bin/Rscript /usr/bin/Rscript
COPY --from=builder_metaphlan /usr/lib/R /usr/lib/R
COPY --from=builder_metaphlan /usr/local/lib/R /usr/local/lib/R
COPY --from=builder_metaphlan /etc/R /etc/R
COPY --from=builder_metaphlan /usr/lib/libR.so /usr/lib/libR.so

ENV PATH="$PATH:/usr/bin/bowtie2/bowtie2-2.3.3.1-linux-x86_64" \
LC_ALL=C

WORKDIR /data

# link to python; the soft link doesn't get copied from the intermediate stage
RUN ln -s /usr/bin/python3.7 /usr/bin/python