q2-ghost-tree is a plugin for creating hybrid trees within the QIIME 2 environment
ghost-tree is a bioinformatics tool that combines sequence data from two genetic marker databases into one hybrid phylogenetic tree that can be used for diversity analyses. One database is used as a "foundation tree" because it better describes genetic relationships across all phyla, and the other database (the "extensions" or tips of the tree) provides finer taxonomic resolution.
The most popular application of this method is fungal microbiome analysis using ITS sequences, which provide good species-level identification but produce poor-quality multiple sequence alignments (MSAs) and, consequently, poor phylogenetic trees.
Here is an example of results you can achieve with ghost-tree:
Fig 1. Saliva (blue) and restroom (red) ITS sequences compared using A) binary Jaccard, B) unweighted UniFrac with Muscle-aligned ITS sequences, and C) unweighted UniFrac with a tree produced by ghost-tree.
You can use ghost-tree in two ways:
METHOD 1: Use a pre-built ghost tree for analysis (recommended if a pre-built tree is available)
METHOD 2: Install and use the q2-ghost-tree plugin
- This has more features, including the ability to graft at different taxonomic levels (e.g., phylum, class, order, family). The default is genus.
Expanded examples and directions follow.
METHOD 1: Use a pre-built ghost tree for analysis
In addition to your own sequences or feature table with accession IDs, you will need:
a ghost tree built using the same database you use for closed-reference clustering (e.g., UNITE for ITS seqs)
a database related to your region of interest (e.g., UNITE for ITS seqs)
Import your ghost tree into QIIME 2 as a .qza file:
Find the ghost tree you need for your ITS sequences
You can use one of the newer pre-built ghost trees found here (for older trees, look here). What really matters is that the "extension IDs" (the tips of the tree) match the IDs inside your feature table .qza file, so find the ghost tree that corresponds to the accession IDs in the database you are using.
You can also use a ghost tree that you built using the plugin.
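Before importing, you may want to confirm that the tree tips really match your IDs. The sketch below is a minimal helper, not part of ghost-tree or QIIME 2 (the function names newick_tips and shared_ids are hypothetical). It pulls tip labels out of a Newick file with standard Unix tools and lists which IDs from a one-ID-per-line file also appear as tips. It assumes simple, unquoted labels that contain no parentheses, commas, colons, or semicolons.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking ghost tree tips against feature IDs.
# Assumes simple Newick: unquoted labels without ( ) , : ; characters.

# Print tip labels, one per line. Tip labels follow "(" or ","; labels that
# follow ")" are internal-node labels (e.g. support values) and are skipped.
newick_tips() {
  tr -d '\n' < "$1" | grep -oE '[(,][^(),:;]+' | sed 's/^[(,]//'
}

# Print IDs present both as tree tips and in a one-ID-per-line file.
# Uses bash process substitution, so run it with bash.
shared_ids() {
  comm -12 <(newick_tips "$1" | sort -u) <(sort -u "$2")
}
```

For example, `shared_ids ghost_tree.nwk feature_ids.txt | wc -l` counts how many of your IDs (in a hypothetical feature_ids.txt) are tips in the tree. Swapping `comm -12` for `comm -23` would instead list the tips that are absent from your ID list.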
Import the ghost tree you would like to use
Note: ghost trees are already midpoint-rooted by ghost-tree, but apparently QIIME 2 does some additional magic when rooting. :) So we import the tree as "Unrooted" and then root it again in QIIME 2.
qiime tools import \
  --input-path ghost_tree.nwk \
  --type 'Phylogeny[Unrooted]' \
  --output-path ghost-tree-unrooted.qza
Root your ghost tree by midpoint in QIIME 2
qiime phylogeny midpoint-root \
  --i-tree ghost-tree-unrooted.qza \
  --o-rooted-tree ghost-tree-midpoint-root.qza
Import the UNITE database you wish to use for clustering your sequences in QIIME 2:
You can find the UNITE ITS databases here. Select the one appropriate for your analysis and import it as a QIIME 2 type.
This will be the database you use to cluster your sequences against.
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path sh_refs_qiime.fasta \
  --output-path sh_refs_qiime.qza
Recluster your high-quality amplicon sequence variants (ASVs) using the vsearch plugin
Once you have these sequences, you will need to cluster them using qiime vsearch. This is necessary because the IDs in the ghost tree must match the IDs in your feature table for qiime diversity phylogenetic analyses to work. Hash-based feature IDs (common for ASVs) will not match the tree tips, so they will not work here.
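A quick way to tell whether your feature IDs are hashes or accession-style IDs is to count the IDs that look like 32-character hexadecimal strings (the length of an MD5 digest). This is a heuristic sketch: the function name count_hash_like_ids is hypothetical, and it assumes a plain text file with one feature ID per line.

```bash
#!/usr/bin/env bash
# Hypothetical heuristic: count IDs that look like MD5 hashes
# (exactly 32 hex characters). Such IDs will not match ghost tree tips.
count_hash_like_ids() {
  # grep -c prints 0 but exits with status 1 when nothing matches; keep the 0.
  grep -cE '^[0-9a-fA-F]{32}$' "$1" || true
}
```

A non-zero count suggests your table still uses hash IDs and needs the reclustering step.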
See directions here for dereplicating sequences and closed-reference OTU clustering. Briefly, you will:
- Use the qiime vsearch dereplicate-sequences command.
- Then cluster your sequences against the UNITE database you selected using qiime vsearch cluster-features-closed-reference.
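The two steps above might look roughly like this. It is a sketch rather than the official tutorial: the wrapper function recluster_closed_ref and all output file names are hypothetical, the input is assumed to be demultiplexed, quality-controlled sequences (e.g. SampleData[Sequences]), and the 0.97 identity threshold is only an example value to adjust for your study.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the two q2-vsearch steps described above.
# $1: demultiplexed sequences (.qza)   $2: imported UNITE reference (.qza)
# $3: output prefix                    $4: percent identity (default 0.97)
recluster_closed_ref() {
  local seqs="$1" ref="$2" out="$3" pid="${4:-0.97}"

  # Step 1: dereplicate reads into a feature table + representative sequences.
  qiime vsearch dereplicate-sequences \
    --i-sequences "$seqs" \
    --o-dereplicated-table "${out}-derep-table.qza" \
    --o-dereplicated-sequences "${out}-derep-seqs.qza" || return 1

  # Step 2: closed-reference clustering against UNITE, so feature IDs become
  # UNITE reference IDs that can match the ghost tree tips.
  qiime vsearch cluster-features-closed-reference \
    --i-table "${out}-derep-table.qza" \
    --i-sequences "${out}-derep-seqs.qza" \
    --i-reference-sequences "$ref" \
    --p-perc-identity "$pid" \
    --o-clustered-table "${out}-cr-table.qza" \
    --o-clustered-sequences "${out}-cr-seqs.qza" \
    --o-unmatched-sequences "${out}-cr-unmatched.qza"
}
```

Usage would be something like `recluster_closed_ref seqs.qza sh_refs_qiime.qza my-study`. Sequences that do not hit a UNITE reference end up in the unmatched output, which is expected: they have no corresponding tip in the ghost tree.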
Now you have a feature table with IDs that match the pre-built ghost tree!
The tips of the tree match your feature table because you selected the corresponding databases earlier: you performed closed-reference clustering against the same database that was used to build the ITS+18S ghost tree.
If you used a pre-built ghost tree, you just need to filter your feature table to contain only IDs that match the ghost tree. There is a file provided that you can use to filter your table.
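Within QIIME 2, this kind of filtering is usually done with qiime feature-table filter-features and --m-metadata-file. If you would rather inspect the result as text, here is a minimal sketch that works on a feature table exported to tab-separated text (for example with biom convert --to-tsv). The function name filter_table_by_ids is hypothetical; it assumes the ID file has one ID per line and no header, and it keeps comment lines beginning with # (such as the #OTU ID header).

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep only feature-table rows whose ID is listed in an ID file.
# $1: ID file (one ID per line)   $2: tab-separated feature table
filter_table_by_ids() {
  awk -F'\t' '
    FILENAME == ARGV[1] { keep[$1] = 1; next }  # first file: load IDs to keep
    /^#/ || ($1 in keep)                        # second file: headers + matching rows
  ' "$1" "$2"
}
```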
You can then use the feature table and the ghost tree for qiime diversity analyses.
METHOD 2: plugin installation
Install the most recent version of QIIME 2 and activate it, following the directions on the QIIME 2 website.
Make sure you are now working inside the QIIME 2 virtual environment. The command prompt should include the environment name, something like qiime2-2018.8 with your current version of QIIME 2. Work from within the QIIME 2 environment when you install the rest of the code so that your new tools are installed and organized in this software environment.
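If you are unsure which environment is active, conda records its name in the CONDA_DEFAULT_ENV variable. The check below is a small sketch; the function name in_qiime2_env is hypothetical, and it assumes your environment name starts with qiime2 (as in qiime2-2018.8), so adjust the pattern if you named it differently.

```bash
#!/usr/bin/env bash
# Hypothetical check: succeed if the active conda environment name starts with "qiime2".
in_qiime2_env() {
  case "${CONDA_DEFAULT_ENV:-}" in
    qiime2*) return 0 ;;
    *) echo "Active environment is '${CONDA_DEFAULT_ENV:-none}', not QIIME 2." >&2
       return 1 ;;
  esac
}
```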
Install the standalone, original ghost-tree tool from Conda:
ghost-tree is hosted on the Bioconda channel (channels are specified with -c). You can install it using conda install ghost-tree (if Bioconda is already one of your configured channels) or conda install ghost-tree -c bioconda.
Typing ghost-tree should bring up help documentation about ghost-tree. If you do not see the help docs, something went wrong.
ghost-tree relies on three software dependencies: Sumaclust, Muscle, and FastTree. If you installed ghost-tree with Conda, these should have been installed for you!
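To confirm that the tools actually ended up on your PATH, you can check for their executables. This sketch uses the hypothetical function missing_tools and assumes the executables are named ghost-tree, sumaclust, muscle, and FastTree; executable names can differ between packages and versions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each named command that is not found on PATH.
# Returns 1 if anything is missing, 0 otherwise.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" > /dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}
```

For example: `missing_tools ghost-tree sumaclust muscle FastTree || echo "Some dependencies are missing."`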
Next, you will install q2-ghost-tree plugin:
You must have Git installed.
git clone https://github.com/JTFouquier/q2-ghost-tree.git
On the command line, navigate to the cloned q2-ghost-tree directory (the one containing the setup.py file) and enter:
pip install -e .
When you type qiime, you should now see ghost-tree as an available plugin.
You can also type qiime ghost-tree to see the subcommands in q2-ghost-tree.
Adding --help will show you the ghost-tree docs and subcommand docs.
Importing small test files as QIIME 2 data types:
Test that you can import some test files in the small_test_files directory
qiime tools import \
  --input-path extension_seqs.fasta \
  --type 'FeatureData[Sequence]' \
  --output-path extension_seqs.qza
qiime tools import \
  --input-path minitaxonomy.txt \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --output-path minitaxonomy.qza
qiime tools import \
  --input-path miniotus.txt \
  --type OtuMap \
  --output-path miniotus.qza
qiime tools import \
  --input-path foundation_seqs.fasta \
  --type 'FeatureData[Sequence]' \
  --output-path foundation_seqs.qza
qiime tools import \
  --input-path foundation_tree.nwk \
  --type 'Phylogeny[Rooted]' \
  --output-path foundation_tree.qza
qiime tools import \
  --input-path minitaxonomy_foundation.txt \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --output-path minitaxonomy_foundation.qza
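HeaderlessTSVTaxonomyFormat expects a tab-separated file with no header line whose first two columns are a feature ID and a taxonomy string. If a taxonomy import fails, a quick check like the hypothetical check_taxonomy_tsv function below can point to the offending lines. It is only a sketch and does not replicate QIIME 2's own validation.

```bash
#!/usr/bin/env bash
# Hypothetical pre-import check for a headerless taxonomy TSV.
# Reports non-blank, non-comment lines with fewer than two tab-separated columns.
check_taxonomy_tsv() {
  awk -F'\t' '
    /^#/ || /^[ \t]*$/ { next }   # skip comment and blank lines
    NF < 2 { print "line " NR ": expected ID<TAB>taxonomy"; bad = 1 }
    END { exit bad }              # non-zero exit status if any line was bad
  ' "$1"
}
```

A common culprit is a file whose columns are separated by spaces instead of a tab.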
This file is pre-filtered to contain only fungi.
qiime tools import \
  --input-path silva_fungi_only.txt \
  --type 'FeatureData[AlignedSequence]' \
  --output-path silva_fungi_only.qza
Testing subcommands in q2-ghost-tree
Cluster your extension sequences at 90% similarity.
This step handles the abundant 'unidentified' organisms.
qiime ghost-tree extensions-cluster \
  --i-extension-sequences extension_seqs.qza \
  --p-similarity-threshold 0.90 \
  --o-otu-map extensions_otu_map_90.qza
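To see how the 90% clustering grouped your sequences, you can export the OTU map (for example with qiime tools export) and count the members of each cluster. This sketch assumes the exported map is tab-separated, with the cluster ID in the first column and member sequence IDs in the remaining columns (the QIIME 1-style OTU map layout); the function name otu_map_sizes is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<cluster ID><TAB><member count>" for each OTU map
# line, largest clusters first. Assumes: clusterID<TAB>member1<TAB>member2...
otu_map_sizes() {
  awk -F'\t' 'NF > 0 { print $1 "\t" (NF - 1) }' "$1" |
    sort -t "$(printf '\t')" -k2,2nr
}
```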
Create a ghost tree using a foundation .nwk tree
Note the subcommand used is for a foundation TREE.
qiime ghost-tree scaffold-hybrid-tree-foundation-tree \
  --i-otu-map extensions_otu_map_90.qza \
  --i-extension-taxonomy minitaxonomy.qza \
  --i-extension-sequences extension_seqs.qza \
  --i-foundation-tree foundation_tree.qza \
  --i-foundation-taxonomy minitaxonomy_foundation.qza \
  --o-ghost-tree ghost-tree-foundation-tree-90-otus.qza
Create a ghost tree using a foundation .nwk tree and class-level graft points instead of the default genus level.
qiime ghost-tree scaffold-hybrid-tree-foundation-tree \
  --i-otu-map extensions_otu_map_90.qza \
  --i-extension-taxonomy minitaxonomy.qza \
  --i-extension-sequences extension_seqs.qza \
  --i-foundation-tree foundation_tree.qza \
  --i-foundation-taxonomy minitaxonomy_foundation.qza \
  --o-ghost-tree ghost-tree-foundation-tree-90-otus-class-level-graft-points.qza \
  --p-graft-level c
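Choosing a graft level changes how coarsely the extension sequences are grouped onto the foundation. One way to get a feel for this, assuming your extension taxonomy uses UNITE-style rank prefixes such as k__, p__, c__, o__, f__, g__, and s__, is to count how many distinct names exist at a given rank. The helper count_rank below is hypothetical and only inspects the taxonomy file; it does not reproduce ghost-tree's grafting logic.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count distinct taxon names at one rank in a headerless
# taxonomy TSV (ID<TAB>k__...;p__...;c__...). $1: rank letter (p, c, o, f, g); $2: file.
count_rank() {
  awk -F'\t' -v rank="$1" '
    {
      n = split($2, levels, ";")
      for (i = 1; i <= n; i++) {
        gsub(/^ +| +$/, "", levels[i])   # trim spaces around each level
        # keep levels with the requested prefix and a non-empty name
        if (substr(levels[i], 1, 3) == (rank "__") && length(levels[i]) > 3)
          seen[levels[i]] = 1
      }
    }
    END { c = 0; for (t in seen) c++; print c }
  ' "$2"
}
```

For example, comparing `count_rank c minitaxonomy.txt` with `count_rank g minitaxonomy.txt` shows how many fewer groups a class-level graft works with than the default genus level.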
Create a ghost tree using aligned sequences instead of a tree as your foundation
Note the subcommand used is for a foundation ALIGNMENT.
qiime ghost-tree scaffold-hybrid-tree-foundation-alignment \
  --i-otu-map extensions_otu_map_90.qza \
  --i-extension-taxonomy minitaxonomy.qza \
  --i-extension-sequences extension_seqs.qza \
  --i-foundation-alignment silva_fungi_only.qza \
  --o-ghost-tree ghost-tree-foundation-alignment-90-otus.qza
Working with full size files
This walkthrough is specific to UNITE ITS and SILVA 18S trees for fungal ITS analysis.
These files are large, and steps may take anywhere from a few minutes to a few hours.
SILVA (or the foundation)
Using the most recent release of SILVA
Import a SILVA DB and transform from RNA to DNA upon import
qiime tools import \
  --input-path SILVA_132_SSURef_Nr99_tax_silva_full_align_trunc.fasta \
  --type 'FeatureData[AlignedSequence]' \
  --input-format AlignedRNAFASTAFormat \
  --output-path SILVA_132_SSURef_Nr99_tax_silva_full_align_trunc.qza
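Importing with AlignedRNAFASTAFormat takes care of the RNA-to-DNA conversion for you. Conceptually, that amounts to replacing U with T on sequence lines while leaving headers untouched. The hypothetical rna_to_dna function below sketches that idea for a plain FASTA file; it is an illustration, not how QIIME 2 implements the transformation.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: convert RNA FASTA to DNA by replacing U/u with T/t on
# sequence lines only; header lines (starting with ">") are left unchanged.
rna_to_dna() {
  awk '/^>/ { print; next } { gsub(/U/, "T"); gsub(/u/, "t"); print }' "$1"
}
```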
Silva Taxonomy File
qiime tools import \
  --input-path tax_slv_ssu_132.txt \
  --type SilvaTaxonomy \
  --output-path tax_slv_ssu_132.qza \
  --input-format SilvaTaxonomyFormat
Silva Accession ID Map
qiime tools import \
  --input-path tax_slv_ssu_132.acc_taxid \
  --type SilvaAccession \
  --output-path tax_slv_ssu_132.acc_taxid.qza \
  --input-format SilvaAccessionFormat
Extract fungi only from Silva
qiime ghost-tree extract-fungi \
  --i-aligned-silva-file SILVA_132_SSURef_Nr99_tax_silva_full_align_trunc.qza \
  --i-accession-file tax_slv_ssu_132.acc_taxid.qza \
  --i-taxonomy-file tax_slv_ssu_132.qza \
  --o-aligned-seqs silva_fungi_only_full_aligned_132.qza
Filter alignment positions
qiime ghost-tree filter-alignment-positions \
  --i-aligned-sequences-file silva_fungi_only_full_aligned_132.qza \
  --p-maximum-gap-frequency 0.9 \
  --p-maximum-position-entropy 0.8 \
  --o-aligned-seqs silva_fungi_only_full_aligned_132_FILTERED.qza
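The --p-maximum-gap-frequency parameter drops alignment columns that are mostly gaps. The sketch below illustrates that idea on a small aligned FASTA file: for each column it computes the fraction of sequences with a gap character and keeps columns at or below the threshold. It rests on assumptions (the function name filter_gappy_columns is hypothetical, both - and . are treated as gaps, and the entropy filter is not included) and is not q2-ghost-tree's implementation. It also holds the whole alignment in memory, so use it on small test files rather than full SILVA.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of gap-frequency column filtering.
# $1: maximum gap frequency (e.g. 0.9)   $2: aligned FASTA (equal-length sequences)
filter_gappy_columns() {
  awk -v max="$1" '
    /^>/ {
      if (n > 0) seqs[n] = seq        # store the previous record
      ids[++n] = $0; seq = ""; next
    }
    { seq = seq $0 }                  # join wrapped sequence lines
    END {
      if (n == 0) exit
      seqs[n] = seq
      len = length(seqs[1])
      for (c = 1; c <= len; c++) {    # gap fraction per column
        gaps = 0
        for (i = 1; i <= n; i++) {
          ch = substr(seqs[i], c, 1)
          if (ch == "-" || ch == ".") gaps++
        }
        keep[c] = (gaps / n <= max + 0)
      }
      for (i = 1; i <= n; i++) {      # print each record with kept columns only
        out = ""
        for (c = 1; c <= len; c++) if (keep[c]) out = out substr(seqs[i], c, 1)
        print ids[i]; print out
      }
    }' "$2"
}
```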
Import a UNITE ITS sequence database:
qiime tools import \
  --input-path sh_refs_qiime_ver7_dynamic_01.12.2017.fasta \
  --type 'FeatureData[Sequence]' \
  --output-path sh_refs_qiime_ver7_dynamic_01.12.2017.qza
Making a ghost tree
With the small and large file examples above, you now have examples of most of the commands in ghost-tree. Combined with the --help docs in ghost-tree, this should give you enough information to make your own ghost tree.
After you get a ghost tree, see METHOD 1 for using ghost trees in your analysis.
Please don't hesitate to post on the QIIME 2 forum with questions.
Examples of misc. imports
Import a seqs.fna file:
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path seqs.fna \
  --output-path seqs.qza
Import a BIOM table:
qiime tools import \
  --input-path otu_table.biom \
  --type 'FeatureTable[Frequency]' \
  --output-path feature_table.qza \
  --input-format BIOMV100Format