Home
This project mostly consists of a Dockerfile used to build a single Docker image called "gatb_tools_machine":
- based on a Debian 8 Linux system
- containing GATB-Tools; list of available tools is here
As stated in the README.md file of this project:
# prepare a directory
cd <some-working-directory>
# clone this project locally
git clone https://github.com/GATB/gatb-tools-machine.git
# enter the project
cd gatb-tools-machine
# build the image
# (do not forget the trailing '.': it is part of the docker build command)
docker build -f Dockerfile -t gatb_tools_machine .
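Once the build has finished, you may want to confirm that the image exists before going further. The following is a minimal sketch: 'image_built' is a hypothetical helper name, not something provided by this project. It relies on 'docker image inspect', which exits with a non-zero status when the image is not known locally.

```shell
# Hypothetical helper (not part of gatb-tools-machine):
# succeeds if the named image is known to the local Docker daemon.
image_built() {
    docker image inspect "$1" >/dev/null 2>&1
}
```

For instance, `image_built gatb_tools_machine && echo "image ready"` prints a message only if the build succeeded.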
This project contains a 'data' directory that can be used to illustrate how to run a GATB-Tool inside a container started from the 'gatb_tools_machine' image, using real data located outside the container.
First of all, create a 'tmp' (i.e. working) directory:
cd <gatb-tools-machine Github home project>
mkdir tmp
Now, within "gatb-tools-machine Github home project", we have:
./tmp: a working directory
./data: test data for each tool; in turn, we have a sub-folder for each tool to test
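All commands on this page assume that both directories exist in the current directory. As a minimal sketch, the layout described above can be checked before running anything. 'check_layout' is a hypothetical helper name, not part of this project.

```shell
# Hypothetical helper (not part of gatb-tools-machine):
# verify that 'tmp' and 'data' exist under the given project directory
# (defaults to the current directory).
check_layout() {
    project_dir=${1:-$PWD}
    for sub in tmp data; do
        if [ ! -d "$project_dir/$sub" ]; then
            echo "missing directory: $project_dir/$sub" >&2
            return 1
        fi
    done
    echo "layout ok: $project_dir"
}
```

Run `check_layout` from the project home directory; a "missing directory" message for 'tmp' means the 'mkdir tmp' step was skipped.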
Here is the command to run a test of Simka:
docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/simka/:/simka gatb_tools_machine -c simka -- -in /simka/simka_input.txt -out /tmp/simka_results/ -out-tmp /tmp/simka_temp_output
You'll see the results in '$PWD/tmp' once the command has finished.
This command explained:
docker run [1]
--rm [2]
-i -t [3]
-v $PWD/tmp/:/tmp [4]
-v $PWD/data/simka/:/simka [4']
gatb_tools_machine [5]
-c simka [6]
-- [7]
-in /simka/simka_input.txt [8]
-out /tmp/simka_results/ [9]
-out-tmp /tmp/simka_temp_output [10]
[1]-[5]: Docker arguments
[6]-[7]: arguments for the container's invoker program
[8]-[10]: 'bin/simka' arguments
[1]: start Docker container
[2]: remove the container when it exits
(it does NOT delete the 'gatb_tools_machine' image)
[3]: start an interactive job
(for instance, you'll see messages on stdout, if any)
[4]: mount a volume. This is required to get the results from Simka.
Here, we say that '$PWD/tmp' (the 'tmp' subdirectory located within
the current local directory) will be viewed as '/tmp' from the inside
of the container. Then, with [4'], we say that the '$PWD/data/simka/'
directory will be viewed as '/simka' from the inside of the container.
In this way, we have an easy way to provide OUR data (located within
'$PWD/data/simka/') to the program 'simka' located within the
Docker container. In turn, 'simka' will produce results in '/tmp',
i.e. actually in '$PWD/tmp'.
[5]: tell Docker which image to start: the 'gatb_tools_machine' of course.
[6]: ask to start the simka program. See companion file 'run-tool.sh' for
more information.
[7]: '--' is required to separate arguments [6] from the rest of the
command line
[8]: the data file to process with simka. Here we use a data file
provided with the simka software to test it.
[9]: tells simka where to put results. Of course, simka will write
within /tmp directory inside the container. However, since we
have directive [4], data writing is actually done in $PWD/tmp,
i.e. a local directory.
[10]: tells simka where to put temporary files.
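The structure explained above (Docker arguments, then invoker arguments, then tool arguments) is the same for every tool. As a minimal sketch, it can be captured in a small function that only prints the command line, so it can be reviewed before being run. 'gatb_cmd' is a hypothetical helper, not shipped with this project, and it assumes that the paths contain no spaces.

```shell
# Hypothetical helper (not part of gatb-tools-machine):
# print, without running it, the docker command for one GATB-Tool.
# Usage: gatb_cmd <tool> <host-data-dir> <container-mount> [tool args...]
gatb_cmd() {
    tool=$1
    data_dir=$2
    mount_point=$3
    shift 3
    # [1]-[5]: Docker arguments; [6]-[7]: invoker arguments;
    # the remaining "$@" is passed to the tool itself.
    set -- docker run --rm -i -t \
        -v "$PWD/tmp/:/tmp" \
        -v "$data_dir:$mount_point" \
        gatb_tools_machine -c "$tool" -- "$@"
    printf '%s\n' "$*"
}
```

Called from the project home directory as `gatb_cmd simka "$PWD/data/simka/" /simka -in /simka/simka_input.txt -out /tmp/simka_results/ -out-tmp /tmp/simka_temp_output`, it prints the Simka command shown earlier, which can then be copied to the shell.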
Now that you have seen how to start the 'simka' GATB-Tool, you can run all the other tools in the same way.
Here is a list of basic commands to test all provided GATB-Tools with sample data.
Simka visualization:
docker run --rm -i -t -v $PWD/tmp/:/tmp gatb_tools_machine -c simka-visu -- -in /tmp/simka_results/ -out /tmp/simka_results/ -pca -heatmap -tree
Bloocoo:
docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/bloocoo/:/bloocoo gatb_tools_machine -c bloocoo -- -file /bloocoo/errclose.fasta -out /tmp/errclose_bloocoo_corr_errs.tab -kmer-size 31 -abundance-min 5 -err-tab
MetaBloocoo:
cd $PWD/data/bloocoo
curl -O http://downloads.hmpdacc.org/data/Illumina/anterior_nares/SRS018585.tar.bz2
tar -xjf SRS018585.tar.bz2
cd ../..
docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/bloocoo/:/bloocoo gatb_tools_machine -c metabloocoo -- count -file /bloocoo/SRS018585/SRS018585.denovo_duplicates_marked.trimmed.1.fastq -out /tmp/SRS018585
DSK:
docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/dsk/:/dsk gatb_tools_machine -c dsk -- -file /dsk/read50x_ref10K_e001.fasta.gz -kmer-size 27 -out /tmp/dsk27 -max-memory 200 -verbose 0
docker run --rm -i -t -v $PWD/tmp/:/tmp gatb_tools_machine -c h5dump -- -y -d histogram/histogram /tmp/dsk27.h5 | grep "^\ *[0-9]" | tr -d " " | tr -d "," | paste - - > $PWD/tmp/dsk27.histo
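The second DSK command pipes the text output of h5dump through a chain of filters to produce a two-column histogram file. The sketch below isolates that chain so its effect can be seen on a small input. 'histo_filter' is a hypothetical name, and the sample input in the usage note is an assumption about the h5dump layout (one value per line inside braces), not actual output from this data set.

```shell
# Hypothetical helper (not part of gatb-tools-machine):
# same filter chain as in the DSK command above.
histo_filter() {
    # keep only the lines that start with a number,
    # strip spaces and commas, then join the lines two by two
    grep '^ *[0-9]' | tr -d ' ' | tr -d ',' | paste - -
}
```

For instance, `printf '%s\n' '{' '   1,' '   120' '},' '{' '   2,' '   45' '}' | histo_filter` yields two tab-separated lines, one pair of values per line.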
MindTheGap:
docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/MindTheGap/:/mdg gatb_tools_machine -c MindTheGap -- find -in /mdg/master.fasta -ref /mdg/deleted.fasta -kmer-size 31 -out /tmp/mdg_find -insert-only
Short Read Connector:
docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/ShortReadConnector/:/src gatb_tools_machine -c rconnector -- -b /src/c1.fasta.gz -q /src/fof.txt -p src_linker
DiscoSNP++:
docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/DiscoSnp/:/disco gatb_tools_machine -c discosnp -- -r /disco/fof.txt -T
TakeABreak:
docker run --rm -i -t -v $PWD/tmp/:/tmp -v $PWD/data/TakeABreak/:/tab gatb_tools_machine -c takeabreak -- -in /tab/test4.fasta.gz -out /tmp/test4.takeabreak