A pipeline for the analysis of sedimentary ancient mtDNA

mpieva/quicksand

MIT License DOI

quicksand

quick analysis of sedimentary ancient DNA

See the documentation for a comprehensive description of the pipeline.

Description

quicksand is a bioinformatic pipeline for the analysis and taxonomic binning of (target-enriched) ancient, mitochondrial, sedimentary DNA. quicksand uses KrakenUniq for metagenomic classification and BWA for mapping DNA sequences, and analyses the mapped sequences for DNA deamination patterns.

Optimized for speed and portability, quicksand is written in Nextflow and requires either Singularity or Docker.

Workflow

Graphical representation of the pipeline workflow

Quickstart

Requirements

To run the pipeline, please install Nextflow and either Singularity or Docker.
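A quick way to confirm the requirements are on your PATH before starting is sketched below. The helper names `have` and `check_requirements` are illustrative and not part of quicksand; the check only tests that the commands exist, not which versions are installed.

```shell
# Return 0 if the named command is available on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# Hypothetical helper: report missing requirements, return 1 if any is missing.
check_requirements() {
  missing=0
  if ! have nextflow; then
    echo "missing: nextflow"
    missing=1
  fi
  # Either container engine is enough.
  if ! have singularity && ! have docker; then
    echo "missing: singularity or docker"
    missing=1
  fi
  return "$missing"
}

if check_requirements; then
  echo "all requirements found"
fi
```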

Input

The pipeline accepts demultiplexed, adapter-trimmed and overlap-merged BAM and FASTQ files. Put all files in one directory (DIR), name them DIR/{READGROUP}.{bam,fastq}, and pass that directory to the pipeline with the --split flag.
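To preview which read-group names follow from the file names in an input directory, a small sketch like the following can help. `list_readgroups` is a hypothetical helper that simply strips the directory and the final extension; it assumes the naming convention above and may not match how the pipeline parses names internally.

```shell
# Print the READGROUP part of every DIR/{READGROUP}.bam and DIR/{READGROUP}.fastq file.
list_readgroups() {
  for f in "$1"/*.bam "$1"/*.fastq; do
    [ -e "$f" ] || continue   # skip glob patterns that matched nothing
    name=${f##*/}             # strip the directory part
    echo "${name%.*}"         # strip the final extension (.bam or .fastq)
  done
}
```

Calling `list_readgroups split/` lists one name per input file; files with other extensions are ignored.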

Download Datastructure

To run quicksand, three resources are required: a Kraken database for metagenomic classification, the reference genomes for mapping, and a set of bed files.

For the most recent RefSeq releases please download the quicksand-datastructure here:

latest=$(curl http://ftp.eva.mpg.de/quicksand/LATEST)
wget -r -np -nc -nH --cut-dirs=3 --reject="*index.html*" -q --show-progress -P refseq http://ftp.eva.mpg.de/quicksand/build/$latest

This step takes a while! Make yourself a coffee and relax.
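Once the download has finished, it can be worth checking that the expected directories are present. The sketch below assumes the layout implied by the run command in this README (kraken/Mito_db_kmer22, genomes and masked under refseq); the actual layout may differ between releases, and `check_datastructure` is an illustrative helper, not part of quicksand.

```shell
# Hypothetical helper: verify that the assumed sub-directories exist under ROOT.
check_datastructure() {
  root=$1
  status=0
  for sub in kraken/Mito_db_kmer22 genomes masked; do
    if [ ! -d "$root/$sub" ]; then
      echo "missing directory: $root/$sub"
      status=1
    fi
  done
  return "$status"
}
```

Run it as `check_datastructure refseq`; a non-zero exit status means at least one of the assumed directories was not found.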

For a custom creation of the datastructure see the quicksand-build pipeline

Download Test-data

To run quicksand with real data, download the Hohlenstein-Stadel mtDNA as input (please see the README for more information):

wget -P split \
http://ftp.eva.mpg.de/neandertal/Hohlenstein-Stadel/BAM/mtDNA/HST.raw_data.ALL.bam

Run quicksand

quicksand is executed directly from GitHub; no local build is required. With the databases and the test data downloaded, run the pipeline:

nextflow run mpieva/quicksand -r v2.1 \
  --db        refseq/kraken/Mito_db_kmer22/ \
  --genomes   refseq/genomes/ \
  --bedfiles  refseq/masked/ \
  --split     split/ \
  -profile    singularity

Output

Please see the documentation for a comprehensive description of the output!

References

This pipeline uses code inspired by the nf-core initiative, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.