MetaAll is a collection of methods, combined into a three-step workflow, that enables integrated metagenomic analysis of Illumina short paired-end (PE) reads and Oxford Nanopore Technologies (ONT) long reads. Three methods are combined for pathogen detection: taxonomic classification of reads, taxonomic classification of contigs, and mapping to reference genomes. For more complex metagenomic workflows suitable for execution on HPC, see PatDetect.

Installation & Dependencies

To obtain the scripts, download the repository using git clone or wget, and additionally install:

NOTE: Build the Singularity images from the definition files in the singularity_images folder (each .def file and its .sif image must have the same name).

Obtain the required databases

Download the required databases:

NOTE: Make sure you have enough disk space.

Example of use

Before every run, double-check the workflow parameters and the paths to the samples and databases. Once set, simply run bash

Short PE reads/contigs classification

In the terminal, navigate to the short_reads_classification/ folder, which contains config.yml, a shell script, and the Snakefile. The workflow performs quality checking, trimming, host removal, assembly, read/contig classification, and preparation of the results for visualization. Before running, set the parameters in the config.yml file and in the script. Check the PE read file names (they must end with "_R1.fastq.gz" and "_R2.fastq.gz"). The host reference genome must be indexed (use the bowtie2-build command).
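The file-name and index requirements above can be checked before starting a run. The following is a minimal sketch, not part of MetaAll: the function names `find_pairs` and `host_index_exists` are hypothetical, and the index check assumes the standard set of six files that bowtie2-build writes next to the chosen index prefix (".bt2", or ".bt2l" for large genomes).

```python
from pathlib import Path

R1_SUFFIX = "_R1.fastq.gz"
R2_SUFFIX = "_R2.fastq.gz"
# Name parts of the six files bowtie2-build writes for one index prefix.
BT2_PARTS = ("1", "2", "3", "4", "rev.1", "rev.2")


def find_pairs(reads_dir):
    """Return (complete, incomplete) sample names found in reads_dir.

    A sample is complete when both its _R1 and _R2 files are present.
    Files with any other ending are ignored.
    """
    names = {p.name for p in Path(reads_dir).iterdir() if p.is_file()}
    r1 = {n[: -len(R1_SUFFIX)] for n in names if n.endswith(R1_SUFFIX)}
    r2 = {n[: -len(R2_SUFFIX)] for n in names if n.endswith(R2_SUFFIX)}
    return sorted(r1 & r2), sorted(r1 ^ r2)


def host_index_exists(index_prefix):
    """Return True if a full bowtie2 index exists for index_prefix."""
    for ext in (".bt2", ".bt2l"):  # small and large index variants
        if all(Path(f"{index_prefix}.{part}{ext}").is_file() for part in BT2_PARTS):
            return True
    return False
```

Calling `find_pairs("path/to/reads")` returns two sorted lists: samples that have both mates and samples that are missing one. Any name in the second list points to a missing or misnamed file.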

Suggestion: Before running, add the "-n" flag in the shell scripts to perform a dry run.

Short PE reads reference genome alignment

From the short_reads_mapping/ folder, copy the scripts next to the folder containing the short PE reads. The folder containing the sequence data must be named data. A reference sequence of the target pathogen must also be present, e.g. enterovirus_refseq.fasta. The script takes the target virus as the first positional argument (the reference file name without the ".fasta" extension) and the number of threads as the second positional argument. For example: bash enterovirus_refseq 32. Before running, set the parameters in the scripts.
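The layout described above (a folder named data plus a reference FASTA whose name matches the first argument) can be verified before launching the mapping. This is a minimal sketch under one assumption: the reference file sits in the same directory as the copied scripts. The function name `check_mapping_inputs` is hypothetical and not part of MetaAll.

```python
from pathlib import Path


def check_mapping_inputs(workdir, target):
    """Return a list of problems; an empty list means the layout looks right.

    workdir: directory holding the copied scripts (assumed location of the reference).
    target:  first positional argument, i.e. the reference name without ".fasta".
    """
    workdir = Path(workdir)
    problems = []
    # The sequence data folder must be named exactly "data".
    if not (workdir / "data").is_dir():
        problems.append("missing folder: data")
    # The first argument is linked to the reference file name.
    reference = workdir / f"{target}.fasta"
    if not reference.is_file():
        problems.append(f"missing reference: {reference.name}")
    return problems
```

For the example in the text, `check_mapping_inputs(".", "enterovirus_refseq")` looks for ./data and ./enterovirus_refseq.fasta.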

Long reads/contigs classification

In the terminal, navigate to the long_reads_classification/ folder, which contains config.yml, a shell script, and the Snakefile. The workflow performs quality checking, trimming, host removal, assembly, polishing, read/contig classification, and preparation of the results for visualization. Before running, set the parameters in the config.yml file and in the script.

IMPORTANT: Check the input path (the defined path must end one level above the folder containing the reads). For example, if the raw reads are located in ../path_to_sequence_run/fastq_pass/barcode01, the defined path in config.yml must be:


You may rename the folder if you wish (e.g. rename "barcode01" to "sample01").
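The rule that the defined path ends one level above the folder containing the reads can be illustrated with a short sketch. The helper name `config_input_path` is hypothetical, and the config.yml key that receives the value is not shown here.

```python
from pathlib import PurePosixPath


def config_input_path(reads_folder):
    """Return the path one level above the folder that holds the reads."""
    return str(PurePosixPath(reads_folder).parent)


# Using the example location of the raw reads from the text:
print(config_input_path("../path_to_sequence_run/fastq_pass/barcode01"))
# prints ../path_to_sequence_run/fastq_pass
```

Renaming "barcode01" to "sample01" does not change the result, because only the parent directory is used.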

Suggestion: Before running, add the "-n" flag in the shell scripts to perform a dry run.

Long reads reference genome alignment

In the terminal, navigate to the long_reads_mapping/ folder, which contains config.yml, a shell script, and the Snakefile. The workflow maps the reads to the provided reference genome and calculates the mapping statistics. Before running, set the parameters in the config.yml file and in the script. Check the extension of the long-read files (they must end with ".fastq.gz").

Suggestion: Before running, add the "-n" flag in the shell scripts to perform a dry run.


For easier and faster analysis, we recommend detection by classification first, followed by mapping. If you would like to use detection by mapping only, note that the mapping workflows do not perform any preprocessing steps.


List of tools used

FastQC, MultiQC, NanoPack, BBMap, Porechop_ABI, bowtie2, BWA, minimap2, samtools, SPAdes, Flye, medaka, seqtk, KrakenUniq, Krona, Pavian, viralVerify, DIAMOND, MEGAN