This is a remake of the original 16S pipeline developed within the Genome Institute of Singapore, designed for Illumina shotgun sequencing of 16S rRNA amplicon sequences (Ong et al., 2013, PMID 23579286).
The pipeline is based on snakemake. The main program (S16.py) will
write a config file (conf.json) and a snakemake file (snake.make) in
the given output directory. These are then used to call snakemake via
qsub, using a wrapper script (snake.sh) that is created alongside
them. The system will send an email to you upon completion (be it
successful or not).
For help see S16.py --help.
Only upon successful completion will the output directory contain an
empty file called COMPLETE.
Results (abundance tables and piecharts) can then be found in the
results subdirectory (see report.html there).
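A minimal sketch of how a finished run could be checked from a script. It relies only on the file names mentioned above (COMPLETE, results, report.html); the function name find_report is made up for illustration and is not part of the pipeline.

```python
import os


def find_report(outdir):
    """Return the path to results/report.html if the run completed, else None."""
    # The empty COMPLETE file is only written after a successful run.
    if not os.path.exists(os.path.join(outdir, "COMPLETE")):
        return None
    report = os.path.join(outdir, "results", "report.html")
    # Guard against a COMPLETE marker without a report (unexpected, but cheap to check).
    return report if os.path.exists(report) else None
```

A return value of None means the run either failed or is still in progress; the email sent on completion tells you which.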
- Reads are first preprocessed (merging of multiple files, trimming etc.) with famas.
- Preprocessed reads are classified into 'unspecific', 'spike-in' and '16S', based on a BWA-MEM mapping against the original EMIRGE/ARB SSU database containing the spike-in (see ratios.txt for corresponding ratios).
- Only 16S sequences are used for reconstructing the full-length sequences with EMIRGE (Miller et al., 2011, PMID 21595876) or EMIRGE amplicon (Miller et al., 2013; PMID 23405248).
- Primers are clipped from the reconstructed sequences.
- Clipped sequences are mapped with Graphmap against a preclustered version (99% OTU) of the Greengenes database for classification.
- Abundance tables and piecharts are created in the results subdirectory of the output directory. Pairwise identity thresholds for the different taxonomic ranks are implemented as determined by Yarza et al. (2014; PMID 25118885).
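To illustrate the last step, here is a sketch of how rank-specific identity thresholds can be applied to a mapping result. The threshold values are the ones commonly quoted from Yarza et al. (2014) and should be checked against the paper and the pipeline code; the function and variable names are hypothetical, and the pipeline's actual implementation may differ.

```python
# Minimum pairwise identity (percent) per taxonomic rank, as commonly quoted
# from Yarza et al. (2014). Ordered from most to least specific rank.
# Assumption: verify these values against the pipeline before relying on them.
YARZA_THRESHOLDS = [
    ("genus", 94.5),
    ("family", 86.5),
    ("order", 82.0),
    ("class", 78.5),
    ("phylum", 75.0),
]


def most_specific_rank(identity_pct):
    """Return the most specific rank supported by the given identity, or None."""
    for rank, threshold in YARZA_THRESHOLDS:
        if identity_pct >= threshold:
            return rank
    # Below the phylum threshold no rank assignment is supported.
    return None
```

For example, a reconstructed sequence that maps to its best Greengenes hit at 90% identity would be reported at family level only, with the genus left unassigned.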
If you want to first see what the pipeline would do, call S16.py with
--no-run. Then check the created files (see above).
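A small sketch for checking the created files after a --no-run call. It assumes only the three file names given above; the helper names missing_files and load_config are made up for illustration, and nothing is assumed about the keys inside conf.json.

```python
import json
import os

# Files that S16.py is described as writing into the output directory.
EXPECTED_FILES = ("conf.json", "snake.make", "snake.sh")


def missing_files(outdir):
    """Return the expected files that are not present in outdir."""
    return [f for f in EXPECTED_FILES
            if not os.path.isfile(os.path.join(outdir, f))]


def load_config(outdir):
    """Parse conf.json; raises ValueError if it is not valid JSON."""
    with open(os.path.join(outdir, "conf.json")) as fh:
        return json.load(fh)
```

An empty list from missing_files plus a config that parses cleanly is a reasonable sanity check before submitting the job.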
To print all commands that would be run, use:
snakemake -s snake.make --configfile conf.json -n -p
To get a graphical representation of the workflow, run (from the output directory):
snakemake -s snake.make --configfile conf.json --dag --forceall | dot -Tpdf > dag.pdf
and have a look at dag.pdf. Once you're satisfied, just run bash snake.sh.
This is tuned towards an in-house setup. If you want to replicate it,
see the CONF variable in S16.py, which lists the expected binaries and
databases.
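When replicating the setup, it can help to check up front which tools are available on your PATH. The sketch below is independent of S16.py: the TOOLS list is a hypothetical example based on the programs named in this README, and the executable names are assumptions. The authoritative list is the CONF variable itself.

```python
import shutil

# Hypothetical list for illustration only; executable names are assumptions.
# The authoritative list of binaries and databases is CONF in S16.py.
TOOLS = ["snakemake", "famas", "bwa", "graphmap", "dot"]


def missing_tools(tools):
    """Return the tools that cannot be found on the current PATH."""
    return [t for t in tools if shutil.which(t) is None]


if __name__ == "__main__":
    for tool in missing_tools(TOOLS):
        print("not found on PATH:", tool)
```

This only covers executables; database paths listed in CONF would need a separate existence check.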