Genscale Team edited this page Jun 26, 2017 · 18 revisions

Introduction

This project consists mainly of a Dockerfile used to create a single Docker image called "gatb_tools_machine":

  • based on a Debian 8 Linux system
  • containing GATB-Tools; list of available tools is here

Building the "gatb_tools_machine" Docker Image

As stated in the README.md file of this project:

# prepare a directory
cd <some-working-directory>
# clone this project locally
git clone https://github.com/GATB/gatb-tools-machine.git
# enter the project
cd gatb-tools-machine
# build the image
# (do not forget ending '.': it is part of the docker build command)
docker build -f Dockerfile -t gatb_tools_machine .

Testing the Docker Image

This project contains a 'data' directory that illustrates how to execute a GATB-Tool inside a container started from the 'gatb_tools_machine' image, using real data located outside the container.

First of all, create a 'tmp' (i.e. working) directory:

cd <gatb-tools-machine Github home project>
mkdir tmp

Now, within "gatb-tools-machine Github home project", we have:

./tmp: a working directory
./data: test data for each tool; in turn, we have a sub-folder for each tool to test
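
Before running a test, it can help to see which tools actually have a test-data sub-folder. The sketch below is one way to do that; `list_test_tools` is a hypothetical helper name, not something shipped with this project, and it only assumes the './data/<tool>/' layout described above.

```shell
#!/bin/sh
# Hypothetical helper (not part of gatb-tools-machine): print the name of
# every sub-folder found in '<project home>/data', one per line.
#   $1 = project home (defaults to the current directory)
list_test_tools() {
    home=${1:-.}
    for dir in "$home"/data/*/; do
        # skip the unexpanded pattern when 'data' is missing or empty
        [ -d "$dir" ] || continue
        basename "$dir"
    done
}

# Run from the gatb-tools-machine project home
list_test_tools .
```

Each printed name corresponds to a directory that can be mounted into the container with '-v', as done in the commands below.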

Here is the command that runs a test of:

Simka:

docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/simka/:/simka gatb_tools_machine -c simka -- -in /simka/simka_input.txt -out /tmp/simka_results/ -out-tmp /tmp/simka_temp_output

You'll see the results in '$PWD/tmp' once the command has finished executing.

This command explained:

docker run                                 [1]
   --rm                                    [2]
   -i -t                                   [3]
   -v $PWD/tmp/:/tmp                       [4]
   -v $PWD/data/simka/:/simka              [4']
   gatb_tools_machine                      [5] 
   -c simka                                [6]
   --                                      [7]
   -in /simka/simka_input.txt              [8]
   -out /tmp/simka_results/                [9]
   -out-tmp /tmp/simka_temp_output         [10]

   [1]-[5]: Docker arguments
   [6]-[7]: arguments for the container's invoker program
   [8]-[10]: 'bin/simka' arguments

   [1]: start Docker container
   [2]: remove the container when it exits
        (it does NOT delete the 'gatb_tools_machine' image)
   [3]: start an interactive job
        (for instance, you'll see messages on stdout, if any)
   [4]: mount a volume. This is required to get the results from Simka.
        Here, we say that '$PWD/tmp' (the 'tmp' subdirectory located within
        the current local directory) will be viewed as '/tmp' from the inside
        of the container.
   [4']: mount a second volume. Here, we say that the $PWD/data/simka/
        directory will be viewed as '/simka' from the inside of the
        container. This gives us an easy way to provide OUR data (located
        within $PWD/data/simka/) to the program 'simka' located within the
        Docker container. In turn, 'simka' will produce results in '/tmp',
        i.e. actually in '$PWD/tmp'.
   [5]: tell Docker which image to start: the 'gatb_tools_machine' of course.
   [6]: ask to start the simka program. See companion file 'run-tool.sh' for
        more information.
   [7]: '--' is required to separate arguments [6] from the rest of the
        command line
   [8]: the data file to process with simka. Here we use a data file
        provided with the simka software to test it.
   [9]: tells simka where to put results. Simka writes to the /tmp
        directory inside the container. However, because of
        directive [4], the data is actually written to $PWD/tmp,
        i.e. a local directory.
   [10]: tells simka where to put temporary files. 
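
The three groups of arguments described above (Docker arguments, invoker arguments, tool arguments) follow the same pattern for every tool, so the command line can be assembled by a small function. The following is only a sketch: `gatb_cmd` is a hypothetical helper, not part of this project, and it prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of gatb-tools-machine): print the
# 'docker run' command line for one GATB tool.
#   $1 = tool name passed to '-c' (e.g. simka)
#   $2 = host data directory
#   $3 = mount point of that directory inside the container
#   remaining arguments = the tool's own arguments, placed after '--'
gatb_cmd() {
    tool=$1; data_dir=$2; mount_point=$3
    shift 3
    # [1]-[5] Docker arguments, [6]-[7] invoker arguments, then tool arguments
    echo docker run --rm -i -t \
        -v "$PWD/tmp/:/tmp" \
        -v "$data_dir:$mount_point" \
        gatb_tools_machine -c "$tool" -- "$@"
}

# Print the Simka test command
gatb_cmd simka "$PWD/data/simka/" /simka \
    -in /simka/simka_input.txt -out /tmp/simka_results/ -out-tmp /tmp/simka_temp_output
```

The printed line can be pasted into a terminal to run the test. The sketch does no quoting, so it assumes that the paths contain no spaces.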

Now that you have seen how to start the 'simka' GATB-Tool, you can run all the other tools in the same way.

Here is a list of basic commands to test all the provided GATB-Tools with sample data.

Simka visualization:

docker run --rm -i -t -v $PWD/tmp/:/tmp gatb_tools_machine -c simka-visu -- -in /tmp/simka_results/ -out /tmp/simka_results/ -pca -heatmap -tree

Bloocoo:

docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/bloocoo/:/bloocoo gatb_tools_machine -c bloocoo -- -file /bloocoo/errclose.fasta -out /tmp/errclose_bloocoo_corr_errs.tab -kmer-size 31 -abundance-min 5 -err-tab

MetaBloocoo:

cd $PWD/data/bloocoo
curl -O http://downloads.hmpdacc.org/data/Illumina/anterior_nares/SRS018585.tar.bz2
tar -xjf SRS018585.tar.bz2
cd ../..
docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/bloocoo/:/bloocoo gatb_tools_machine -c metabloocoo -- count -file /bloocoo/SRS018585/SRS018585.denovo_duplicates_marked.trimmed.1.fastq -out /tmp/SRS018585

DSK:

docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/dsk/:/dsk gatb_tools_machine -c dsk -- -file /dsk/read50x_ref10K_e001.fasta.gz -kmer-size 27 -out /tmp/dsk27 -max-memory 200 -verbose 0
docker run --rm -i -t -v $PWD/tmp/:/tmp gatb_tools_machine -c h5dump -- -y -d histogram/histogram /tmp/dsk27.h5 | grep "^\ *[0-9]" | tr -d " " | tr -d "," | paste - - > $PWD/tmp/dsk27.histo
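
The second DSK command pipes the h5dump output through grep, tr and paste to build a two-column histogram file. The sketch below replays that filter chain on a few made-up lines so that each step can be checked without Docker. The sample lines and the helper names `sample_dump` and `to_histo` are assumptions for illustration only; real h5dump output may be laid out differently.

```shell
#!/bin/sh
# Made-up h5dump-like lines (NOT real DSK output): each entry is assumed
# to span two numeric lines, a value followed by its count.
sample_dump() {
    printf '%s\n' \
        'DATA {' \
        '   {' \
        '      1,' \
        '      6567' \
        '   },' \
        '   {' \
        '      2,' \
        '      48' \
        '   }' \
        '}'
}

# Same filter chain as in the DSK command above:
#   grep  keeps only the lines that start with a number
#   tr    removes the spaces, then the commas
#   paste joins every two lines into one tab-separated row
to_histo() {
    grep '^ *[0-9]' | tr -d ' ' | tr -d ',' | paste - -
}

sample_dump | to_histo
```

With this sample, every pair of numeric lines becomes one tab-separated row, which is the layout written to '$PWD/tmp/dsk27.histo' by the real command.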

MindTheGap:

docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/MindTheGap/:/mdg gatb_tools_machine -c MindTheGap -- find -in /mdg/master.fasta -ref /mdg/deleted.fasta -kmer-size 31 -out /tmp/mdg_find -insert-only

Short Read Connector:

docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/ShortReadConnector/:/src gatb_tools_machine -c rconnector -- -b /src/c1.fasta.gz -q /src/fof.txt -p src_linker

DiscoSNP++:

docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/DiscoSnp/:/disco gatb_tools_machine -c discosnp -- -r /disco/fof.txt -T

TakeABreak:

docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/TakeABreak/:/tab gatb_tools_machine -c takeabreak -- -in /tab/test4.fasta.gz -out /tmp/test4.takeabreak
