phyloFlash - A pipeline to rapidly reconstruct the SSU rRNAs and explore phylogenetic composition of an Illumina (meta)genomic dataset.

phyloFlash v3.3 beta 1


by Harald Gruber-Vodicka, Elmar A. Pruesse, and Brandon Seah.

phyloFlash is a pipeline to rapidly reconstruct the SSU rRNAs and explore phylogenetic composition of an Illumina (meta)genomic or transcriptomic dataset. See the Manual for full documentation.

NOTE: Version 3 changed some input options and also how mapping-based taxa (NTUs) are handled. Please download the last release of v2.0 (tar.gz archive) for the old implementation. No changes have been made to the database setup, so databases prepared for v2.0 can still be used for v3.0.


Download releases from the releases page. If you clone the repository directly off GitHub you might end up with a version that is still under development.

# Download phyloFlash
tar -xzf pf3.0b1.tar.gz

# Check for dependencies
cd phyloFlash-pf3.0b1
./ -check_env

# Get missing dependencies - the easiest way is to install conda/bioconda -
# First add bioconda to the conda channels and then grab what you need
conda config --add channels bioconda
conda install emirge
conda install bbmap
conda install vsearch
conda install spades
conda install mafft
conda install bedtools

# Install reference database
./ --remote

# Run with test data and 16 processors (default is to use all processors available)
-lib TEST -CPUs 16 -read1 test_files/test_F.fq.gz -read2 test_files/test_R.fq.gz

# Run with interleaved reads
-lib LIB -read1 reads_FR.fq.gz -interleaved

# Additionally run EMIRGE for 16S rRNA sequence reconstruction
-lib LIB -emirge -read1 reads_F.fq.gz -read2 reads_R.fq.gz

# Compress output into tar.gz archive and write a log file
-lib LIB -zip -log -read1 reads_F.fq.gz -read2 reads_R.fq.gz

# Run both SPAdes and EMIRGE and produce all optional outputs
-lib LIB -everything -read1 reads_F.fq.gz -read2 reads_R.fq.gz

# Supply trusted contigs containing SSU rRNA sequences to screen vs reads
-lib LIB -read1 reads_F.fq.gz -read2 reads_R.fq.gz -trusted contigs.fasta

# Use SortMeRNA instead of BBmap for initial mapping (slower, but more sensitive)
-lib LIB -read1 reads_F.fq.gz -read2 reads_R.fq.gz -sortmerna

Use the -help option to display a brief help and the -man option to display the full help message.

Use the -sc switch for MDA datasets (single cell) or other hard-to-assemble read sets.

Use the -zip switch to compress output files into a tar.gz archive, and -log to save run messages to a log file.
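Before installing the dependencies listed above, it can help to see which of them are already on your PATH. The following is a minimal sketch of such a check, independent of the pipeline's own -check_env option. The function name missing_tools is hypothetical, and the executable names in the usage comment are assumptions about how these packages are typically installed, so adjust them to your environment.

```shell
# Print the name of every given executable that is not found on PATH,
# one per line. Exit status is 0 only if all of them were found.
missing_tools() (
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    [ "$status" -eq 0 ]
)

# Example call (executable names are assumptions, not taken from this README):
# missing_tools emirge.py bbmap.sh vsearch spades.py mafft bedtools
```

Anything the function prints still needs to be installed, for example with the conda commands shown above.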


phyloFlash screens metagenomic or metatranscriptomic reads for SSU rRNA sequences by mapping against the SILVA SSU Ref database.

Extracted reads are assembled and/or reconstructed into full-length sequences, and the proportion of reads assembled is estimated by re-mapping them to the full-length sequences.

phyloFlash reports a taxonomic summary of the reads from the initial mapping, the best database matches of the full-length sequences, and a second taxonomic summary of the unassembled reads left over.

Plain text and HTML-formatted reports are produced, reporting summary statistics from each run. The HTML report includes an interactive graphical summary.
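To illustrate the re-mapping idea described above, the sketch below computes the fraction of mapped read segments from a SAM file. It is not necessarily how phyloFlash derives its own figure; it only shows the general principle, assuming a standard SAM file in which the 0x4 flag bit marks an unmapped segment. The function name mapped_fraction is hypothetical.

```shell
# Print the fraction of primary SAM records that are mapped.
# A record counts as mapped when FLAG bit 0x4 (segment unmapped) is not set.
# Secondary (0x100) and supplementary (0x800) records are skipped so that
# each read segment is counted once.
mapped_fraction() {
    awk -F '\t' '
        /^@/ { next }                               # skip header lines
        {
            flag = $2
            if (int(flag / 256) % 2 == 1) next      # secondary alignment
            if (int(flag / 2048) % 2 == 1) next     # supplementary alignment
            total++
            if (int(flag / 4) % 2 == 0) mapped++
        }
        END {
            if (total == 0) { print "NA"; exit }    # no alignment records
            printf "%.4f\n", mapped / total
        }
    ' "$1"
}

# Example call (the file name is a placeholder):
# mapped_fraction remapped_reads.sam
```

Because every record is counted individually, the two segments of a read pair contribute separately to the total.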

Going further

The phyloFlash suite also includes other tools for SSU rRNA-centric metagenome analyses. Run the commands without arguments to see help messages.

  • - Automatically download read files from ENA given a read accession number, and run phyloFlash on them
  • - Compare the taxonomic composition of multiple samples from their phyloFlash results. This produces a barplot, heatmap, or distance matrix based on the NTU abundances in two or more samples.
  • - Given a metagenomic assembly graph in Fastg format, identify SSU rRNA sequences and extract contigs connected to them. Optionally compare to phyloFlash results from the same library.
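As a rough illustration of what a distance based on NTU abundances can look like, the sketch below computes the Bray-Curtis dissimilarity between two samples. This README does not state which distance measure the comparison tool uses, so Bray-Curtis is an assumption chosen for illustration. The input format (two-column CSV of taxon name and read count, no header, no commas in taxon names, both files non-empty) and the function name bray_curtis are likewise hypothetical.

```shell
# Print the Bray-Curtis dissimilarity between two abundance tables.
# Each input is a headerless CSV with lines of the form "taxon,count".
# Result is 0 for identical compositions and 1 when no taxon is shared.
bray_curtis() {
    awk -F ',' '
        FNR == NR { a[$1] += $2; sum_a += $2; next }    # first file
        { b[$1] += $2; sum_b += $2 }                    # second file
        END {
            # Sum the smaller of the two counts for every shared taxon.
            for (t in a) if (t in b) shared += (a[t] < b[t] ? a[t] : b[t])
            if (sum_a + sum_b == 0) { print "NA"; exit }
            printf "%.4f\n", 1 - 2 * shared / (sum_a + sum_b)
        }
    ' "$1" "$2"
}

# Example call (file names are placeholders):
# bray_curtis sampleA_ntu.csv sampleB_ntu.csv
```

Computing this value for every pair of samples yields a distance matrix of the kind mentioned above.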


For further information please refer to the Manual.

Versions and changes

  • v3.3 beta 1
    • Add support for using SortMeRNA instead of BBmap for initial mapping step
    • Changes to how mapping data are hashed; process SAM file of initial mapping to fix known bugs with bitflag and read name reporting in BBmap and SortMeRNA
  • v3.2 beta 1
    • Report ambiguous hits during mapping step, use consensus of top hits to assign taxonomy instead of single best hit
    • Add utility to compare taxonomic composition of multiple libraries from phyloFlash output
    • Add utility to extract genome bins from Fastg files
  • v3.1 beta 2
    • Fix bug in Fasta headers with changed output from Bedtools v2.26+
    • Make bbmap and overwrite existing output files of same name
    • Rearrange elements in HTML report file
  • v3.1 beta 1
    • Allow user to supply "trusted contigs" of sequence assemblies containing SSU rRNA which will also be screened against the read libraries
    • Fix bugs in tree plotting
  • v3.0 beta 1
    • Re-map extracted SSU reads onto assembled sequences to check proportion assembled
    • Revamp of HTML report output. Embed interactive graphical summary, use SVG-formatted graphics, remove dependency on R packages for report graphics.
    • Changes to how mapping-based NTUs are calculated. Now count all reads (not only unambiguously-mapped) and count segments of read pairs separately.
    • No change to heatmap script for comparing multiple samples
  • v2.0 complete rewrite


Please report any problems to the phyloFlash Google group or via the GitHub issue tracker.

(Pull requests with suggested fixes are of course also always welcome.)

We also welcome any feedback on the software and its documentation, especially suggestions for improvement!


We thank colleagues and phyloFlash users who have contributed to phyloFlash development by testing the software, reporting bugs, and suggesting new features.