ghost-tree is a bioinformatics tool that combines sequence data from two genetic marker databases into one hybrid phylogenetic tree that can be used for diversity analyses. One database is used as a "foundation tree" because it better describes genetic relationships across all phyla, and the other database (the "extensions" or tips of the tree) provides finer taxonomic resolution.
The most popular application of this method is fungal microbiome analysis using ITS sequences, which provide good species-level identification but produce poor-quality multiple sequence alignments (MSAs) and, consequently, poor phylogenetic trees.
Here is an example of results you can achieve with ghost-tree:
Fig 1. Saliva (blue) and restroom (red) ITS sequences compared using A) binary Jaccard, B) unweighted UniFrac with Muscle-aligned ITS sequences, and C) unweighted UniFrac with a tree produced by ghost-tree.
This is a QIIME 2 plugin. The original ghost-tree repository can be found here, and the paper can be found here.
Expanded examples and directions follow.
- This has more features, including the ability to graft at different taxonomic levels (e.g., phylum, class, order, family). The default is genus.
- a ghost tree built using the same database you use for closed-reference clustering (e.g., UNITE for ITS seqs)
- a database related to your region of interest (e.g., UNITE for ITS seqs)
You can use one of the newer pre-built ghost trees found here (for older trees, look here), because you really just need the "extension IDs" in the tree to match the IDs inside your feature table .qza file. Find the ghost tree which corresponds to the accession IDs in the database you are using.
You can also use a ghost tree that you built using the plugin.
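Because everything downstream depends on the tip IDs of the ghost tree matching the feature IDs in your table, a quick overlap check can save time before you import anything. The following is a minimal sketch and not part of ghost-tree: it assumes a plain Newick file without quoted labels and a text file with one feature ID per line. The file names and IDs are made up for the demonstration.

```shell
set -eu

# Hypothetical demo inputs; replace these with your own tree and ID list.
workdir=$(mktemp -d)
printf '((SH001:0.1,SH002:0.2)0.95:0.3,(SH003:0.1,SH004:0.4):0.2);\n' > "$workdir/ghost_tree.nwk"
printf 'SH002\nSH004\nSH999\n' > "$workdir/feature_ids.txt"

# Tip labels directly follow "(" or ","; internal node labels follow ")".
tip_ids() {
    grep -oE '[(,][^(),:;]+' "$1" | tr -d '(,' | sort -u
}

tip_ids "$workdir/ghost_tree.nwk" > "$workdir/tips.txt"
sort -u "$workdir/feature_ids.txt" > "$workdir/features.txt"

# IDs present in both files, and feature IDs that are missing from the tree.
comm -12 "$workdir/tips.txt" "$workdir/features.txt" > "$workdir/shared.txt"
comm -13 "$workdir/tips.txt" "$workdir/features.txt" > "$workdir/missing.txt"

echo "feature IDs found in tree: $(wc -l < "$workdir/shared.txt" | tr -d ' ')"
echo "feature IDs missing from tree:"
cat "$workdir/missing.txt"
```

If most of your feature IDs end up in the missing list, the tree was probably built from a different database release than the one you clustered against.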
Note: ghost trees are already midpoint-rooted by ghost-tree, but apparently there's additional magic in QIIME 2. :) So we quickly import the tree as "Unrooted" and then root it again.
qiime tools import \
--input-path ghost_tree.nwk \
--type Phylogeny[Unrooted] \
--output-path ghost-tree-unrooted.qza
qiime phylogeny midpoint-root \
--i-tree ghost-tree-unrooted.qza \
--o-rooted-tree ghost-tree-midpoint-root.qza
You can find the UNITE ITS databases here. Select the one appropriate for your analysis and import it as a QIIME 2 type.
This will be the database you use to cluster your sequences against.
qiime tools import \
--type FeatureData[Sequence] \
--input-path sh_refs_qiime.fasta \
--output-path sh_refs_qiime.qza
ASVs (your high-quality representative sequences) of type FeatureData[Sequence] come from DADA2 or Deblur. Once you have these sequences, you will need to cluster them using qiime vsearch. This is necessary because the IDs in the ghost tree must match your feature table for qiime diversity phylogenetic analyses to work. Hash values (common for ASVs) will not work here.
See the directions here for dereplicating sequences and closed-reference OTU clustering. Briefly, you will:
- Use the qiime vsearch dereplicate-sequences command.
- Then cluster your seqs against the UNITE database you selected using qiime vsearch cluster-features-closed-reference.
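To check whether your feature IDs are still hashes, it can help to look at what they actually look like. The sketch below is illustrative only: it assumes you have written your feature IDs to a plain text file, one per line, and it counts how many look like 32-character MD5-style hashes, which is the ID style DADA2 and Deblur typically produce in QIIME 2. The example IDs, including the reference-style accession, are made up.

```shell
set -eu

ids_file=$(mktemp)
# Hypothetical IDs: two MD5-style hashes and one reference-style accession.
printf '%s\n' \
    '4b5eeb300368260019c1fbc7a3c718fc' \
    'fe30ff0f71a38a39cf1717ec2be3a2fc' \
    'SH1234567.08FU_AB123456_refs' > "$ids_file"

count_hash_ids() {
    # grep -c exits with status 1 when it counts zero matches, so tolerate that.
    grep -cE '^[0-9a-f]{32}$' "$1" || true
}

hashed=$(count_hash_ids "$ids_file")
total=$(wc -l < "$ids_file" | tr -d ' ')
echo "$hashed of $total feature IDs look like hashes"
```

After closed-reference clustering, the count of hash-like IDs should drop to zero, because the clustered features take the IDs of the reference sequences.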
The tips of the tree match your feature table because you selected the corresponding databases earlier. You performed closed-reference clustering with the same database that was used to build an ITS+18S ghost tree.
If you used a pre-built ghost tree, you just need to filter your feature table to contain only IDs that match the ghost tree. There is a file provided that you can use to filter your table.
You can then use the feature table and the ghost tree for phylogenetic qiime diversity analysis.
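Put together, the filtering and diversity steps might look like the sketch below. It uses the standard qiime feature-table filter-features and qiime diversity beta-phylogenetic actions. Every file name is a placeholder, and the ID list stands in for the filtering file mentioned above, whose exact name and format are not specified here. The QIIME 2 commands only run if qiime is on your PATH and the ID list exists.

```shell
set -eu

# Placeholder inputs; substitute your own artifacts and ID list.
table=clustered_table.qza
tree=ghost-tree-midpoint-root.qza
tree_ids=ghost_tree_ids.txt   # hypothetical: one tip ID per line

# Build a minimal QIIME 2 metadata file (header line plus IDs) for filtering.
make_id_metadata() {
    printf 'feature-id\n'
    cat "$1"
}

if command -v qiime >/dev/null 2>&1 && [ -f "$tree_ids" ]; then
    make_id_metadata "$tree_ids" > ids_to_keep.tsv

    # Keep only the features whose IDs are tips in the ghost tree.
    qiime feature-table filter-features \
        --i-table "$table" \
        --m-metadata-file ids_to_keep.tsv \
        --o-filtered-table filtered_table.qza

    # Unweighted UniFrac using the ghost tree as the phylogeny.
    qiime diversity beta-phylogenetic \
        --i-table filtered_table.qza \
        --i-phylogeny "$tree" \
        --p-metric unweighted_unifrac \
        --o-distance-matrix unweighted_unifrac_dm.qza
else
    echo "qiime or $tree_ids not found; skipping" >&2
fi
```

Filtering first matters because phylogenetic metrics generally fail when the table contains feature IDs that are absent from the tree.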
- Install the current version of QIIME 2 and activate it following the directions on the QIIME 2 website. Always use the most recent version.
  Make sure you are now working inside the QIIME 2 virtual environment. The command prompt should include something like qiime2-2018.8, with the current version of QIIME 2. Work from within the QIIME 2 environment when you install the rest of the code so that your new tools are organized and installed in this software environment.
- Install the standalone, original ghost-tree tool from Conda:
  ghost-tree is hosted on Conda's Bioconda channel (channels are designated with -c). You can install it using conda install ghost-tree or conda install ghost-tree -c bioconda.
  Typing ghost-tree should bring up help documentation about ghost-tree. If you do not see the help docs, something went wrong.
  ghost-tree has three software dependencies: Sumaclust, Muscle, and FastTree. If you use Conda to install ghost-tree, it should have installed these for you!
- Next, install the q2-ghost-tree plugin:
  You must have Git installed.
  git clone https://github.com/JTFouquier/q2-ghost-tree.git
  Find the setup.py file by navigating to the appropriate directory on the command line and enter
  pip install -e .
  When you type qiime, you should now see ghost-tree as an available plugin. You can also type qiime ghost-tree to see the subcommands in q2-ghost-tree. Adding --help will show you the ghost-tree docs and subcommand docs.
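Before moving on, a quick sanity check of the installation can catch problems early. The following is a sketch rather than an official check: it assumes Conda sets the CONDA_DEFAULT_ENV variable for the active environment, and the executable names passed to the helper (qiime, ghost-tree, sumaclust, muscle, FastTree) are assumptions about how the tools are named on your system.

```shell
set -eu

# Print each of the given executables that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Conda exports the active environment name; treat it as empty outside Conda.
env_name=${CONDA_DEFAULT_ENV:-}
case "$env_name" in
    qiime2-*) echo "active QIIME 2 environment: $env_name" ;;
    *)        echo "warning: no qiime2-* environment appears to be active" ;;
esac

missing=$(missing_tools qiime ghost-tree sumaclust muscle FastTree)
if [ -n "$missing" ]; then
    echo "not found on PATH:"
    echo "$missing"
else
    echo "all expected executables were found"
fi
```

If anything is reported as missing, reinstalling ghost-tree from the Bioconda channel inside the activated environment is a reasonable first thing to try.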
Test that you can import the test files in the small_test_files directory:
qiime tools import \
--input-path extension_seqs.fasta \
--type FeatureData[Sequence] \
--output-path extension_seqs.qza
qiime tools import \
--input-path minitaxonomy.txt \
--type FeatureData[Taxonomy] \
--input-format HeaderlessTSVTaxonomyFormat \
--output-path minitaxonomy.qza
qiime tools import \
--input-path miniotus.txt \
--type OtuMap \
--output-path miniotus.qza
qiime tools import \
--input-path foundation_seqs.fasta \
--type FeatureData[Sequence] \
--output-path foundation_seqs.qza
qiime tools import \
--input-path foundation_tree.nwk \
--type Phylogeny[Rooted] \
--output-path foundation_tree.qza
qiime tools import \
--input-path minitaxonomy_foundation.txt \
--type FeatureData[Taxonomy] \
--input-format HeaderlessTSVTaxonomyFormat \
--output-path minitaxonomy_foundation.qza
This file is prefiltered and extracted to contain only fungi.
qiime tools import \
--input-path silva_fungi_only.txt \
--type FeatureData[AlignedSequence] \
--output-path silva_fungi_only.qza
Clustering the extension sequences handles the abundant 'unidentified' organisms.
qiime ghost-tree extensions-cluster \
--i-extension-sequences extension_seqs.qza \
--p-similarity-threshold 0.90 \
--o-otu-map extensions_otu_map_90.qza
Note the subcommand used is for a foundation TREE.
qiime ghost-tree scaffold-hybrid-tree-foundation-tree \
--i-otu-map extensions_otu_map_90.qza \
--i-extension-taxonomy minitaxonomy.qza \
--i-extension-sequences extension_seqs.qza \
--i-foundation-tree foundation_tree.qza \
--i-foundation-taxonomy minitaxonomy_foundation.qza \
--o-ghost-tree ghost-tree-foundation-tree-90-otus.qza
Create a ghost tree using a foundation .nwk tree, with class-level graft points instead of the default genus level.
qiime ghost-tree scaffold-hybrid-tree-foundation-tree \
--i-otu-map extensions_otu_map_90.qza \
--i-extension-taxonomy minitaxonomy.qza \
--i-extension-sequences extension_seqs.qza \
--i-foundation-tree foundation_tree.qza \
--i-foundation-taxonomy minitaxonomy_foundation.qza \
--o-ghost-tree ghost-tree-foundation-tree-90-otus-class-level-graft-points.qza \
--p-graft-level c
Note the subcommand used is for a foundation ALIGNMENT.
qiime ghost-tree scaffold-hybrid-tree-foundation-alignment \
--i-otu-map extensions_otu_map_90.qza \
--i-extension-taxonomy minitaxonomy.qza \
--i-extension-sequences extension_seqs.qza \
--i-foundation-alignment silva_fungi_only.qza \
--o-ghost-tree ghost-tree-foundation-alignment-90-otus.qza
This walkthrough is specific to UNITE ITS and SILVA 18S trees for fungal ITS analysis.
These files are large, and steps may take a few minutes to a few hours.
Using the most recent release of SILVA
qiime tools import \
--input-path SILVA_132_SSURef_Nr99_tax_silva_full_align_trunc.fasta \
--type FeatureData[AlignedSequence] \
--input-format AlignedRNAFASTAFormat \
--output-path SILVA_132_SSURef_Nr99_tax_silva_full_align_trunc.qza
qiime tools import \
--input-path tax_slv_ssu_132.txt \
--type SilvaTaxonomy \
--output-path tax_slv_ssu_132.qza \
--input-format SilvaTaxonomyFormat
qiime tools import \
--input-path tax_slv_ssu_132.acc_taxid \
--type SilvaAccession \
--output-path tax_slv_ssu_132.acc_taxid.qza \
--input-format SilvaAccessionFormat
qiime ghost-tree extract-fungi \
--i-aligned-silva-file SILVA_132_SSURef_Nr99_tax_silva_full_align_trunc.qza \
--i-accession-file tax_slv_ssu_132.acc_taxid.qza \
--i-taxonomy-file tax_slv_ssu_132.qza \
--o-aligned-seqs silva_fungi_only_full_aligned_132.qza
qiime ghost-tree filter-alignment-positions \
--i-aligned-sequences-file silva_fungi_only_full_aligned_132.qza \
--p-maximum-gap-frequency 0.9 \
--p-maximum-position-entropy 0.8 \
--o-aligned-seqs silva_fungi_only_full_aligned_132_FILTERED.qza
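To build some intuition for the gap-frequency threshold used in the filtering command, the sketch below computes the fraction of gap characters in each column of a tiny aligned FASTA file. It only illustrates the idea and is not the plugin's implementation: the file contents are made up, each sequence is assumed to sit on a single line, and both "-" and "." are treated as gaps (an assumption about the alignment format).

```shell
set -eu

aln=$(mktemp)
# Hypothetical four-sequence alignment with five columns.
printf '%s\n' '>s1' 'AC-GT' '>s2' 'A--GT' '>s3' 'AC-G-' '>s4' 'A--GT' > "$aln"

# Print "column gap_fraction" for every alignment column.
gap_fractions() {
    awk '
        /^>/ { next }                       # skip FASTA header lines
        {
            nseq++
            ncol = length($0)
            for (i = 1; i <= ncol; i++) {
                c = substr($0, i, 1)
                if (c == "-" || c == ".") gaps[i]++
            }
        }
        END {
            for (i = 1; i <= ncol; i++)
                printf "%d %.2f\n", i, gaps[i] / nseq
        }
    ' "$1"
}

gap_fractions "$aln"

# Columns whose gap fraction exceeds 0.9 in this toy example.
gap_fractions "$aln" | awk '$2 > 0.9 { print $1 }'
```

In this toy alignment only the third column is all gaps, so it is the only one above 0.9. A high threshold like this removes only columns that carry almost no sequence information.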
qiime tools import \
--input-path sh_refs_qiime_ver7_dynamic_01.12.2017.fasta \
--type FeatureData[Sequence] \
--output-path sh_refs_qiime_ver7_dynamic_01.12.2017.qza
With the small and large file examples above, you should now have examples of most of the commands in ghost-tree. This, combined with reading the --help docs in ghost-tree, should give you enough information to make your own ghost tree.
After you get a ghost tree, see METHOD 1 for using ghost trees in your analysis.
qiime tools import \
--type FeatureData[Sequence] \
--input-path seqs.fna \
--output-path seqs.qza
qiime tools import \
--input-path otu_table.biom \
--type FeatureTable[Frequency] \
--output-path feature_table.qza \
--input-format BIOMV100Format