-
Notifications
You must be signed in to change notification settings - Fork 73
metaphlan3
MetaPhlAn is a tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.
- Tutorials for MetaPhlAn1 and MetaPhlAn2 are also available.
- Please direct questions to the MetaPhlAn biobakery support forum.
- For information on installation and pre-requisites, see here.
If you're using this tutorial as part of a workshop, or if you'd like to think about the algorithm and implementation details a bit, you'll occasionally see "discussion questions" at the end of tutorial sections, formatted like this:
- What is your quest?
- What is your favorite color?
The basic steps of MetaPhlAn are:
MetaPhlAn accepts as input short reads from a single shotgun metagenomic sequencing experiment and outputs the list of detected microbes and their relative abundances.
MetaPhlAn accepts metagenomic sequence in several formats, including
.fasta
, .fastq
, bowtie2out
and sam
.
To see all the input formats (and other arguments), type metaphlan -h | less
. Use the
arrow key to move up and down. Type q
to quit back to the prompt.
usage: metaphlan --input_type {fastq,fasta,bowtie2out,sam} [--force]
[--bowtie2db METAPHLAN_BOWTIE2_DB] [-x INDEX]
[--bt2_ps BowTie2 presets] [--bowtie2_exe BOWTIE2_EXE]
[--bowtie2_build BOWTIE2_BUILD] [--bowtie2out FILE_NAME]
[--min_mapq_val MIN_MAPQ_VAL] [--no_map] [--tmp_dir]
[--tax_lev TAXONOMIC_LEVEL] [--min_cu_len]
[--min_alignment_len] [--add_viruses] [--ignore_eukaryotes]
[--ignore_bacteria] [--ignore_archaea] [--stat_q]
[--perc_nonzero] [--ignore_markers IGNORE_MARKERS]
[--avoid_disqm] [--stat] [-t ANALYSIS TYPE]
[--nreads NUMBER_OF_READS] [--pres_th PRESENCE_THRESHOLD]
[--clade] [--min_ab] [-o output file] [--sample_id_key name]
[--use_group_representative] [--sample_id value]
[-s sam_output_file] [--legacy-output] [--CAMI_format_output]
[--unknown_estimation] [--biom biom_output] [--mdelim mdelim]
[--nproc N] [--install] [--force_download]
[--read_min_len READ_MIN_LEN] [-v] [-h]
[INPUT_FILE] [OUTPUT_FILE]
DESCRIPTION
MetaPhlAn version 3.0 (20 Mar 2020):
METAgenomic PHyLogenetic ANalysis for metagenomic taxonomic profiling.
AUTHORS: Nicola Segata (nicola.segata@unitn.it), Duy Tin Truong, Francesco Asnicar (f.asnicar@unitn.it),
Francesco Beghini (francesco.beghini@unitn.it)
- Which command line arguments are required?
- Which optional command line arguments seem most important or most commonly used?
For the purpose of this tutorial, we will use the following set of six input files that have been subsampled for rapid analysis:
- SRS014476-Supragingival_plaque.fasta.gz
- SRS014494-Posterior_fornix.fasta.gz
- SRS014459-Stool.fasta.gz
- SRS014464-Anterior_nares.fasta.gz
- SRS014470-Tongue_dorsum.fasta.gz
- SRS014472-Buccal_mucosa.fasta.gz
The original files, and many others, can be downloaded from the HMP DACC.
-
IF you are running this tutorial as a part of short course on a cloud instance, the input files have been pre-downloaded. Open a terminal and enter:
cd Tutorials/metaphlan3 ls input
-
Copy the input files into the current working directory
cp input/*fasta.gz .
Use the ls
command to check if the files have been correctly copied/downloaded.
We are now ready to run MetaPhlAn.
Please proceed to Run a single sample section below.
-
IF you're running this tutorial locally (your PC/laptop), make a new folder in your current working directory with
mkdir
and access it withcd
. Use this folder for all MetaPhlAn analysis.mkdir metaphlan_analysis cd metaphlan_analysis
-
Click on the links to download each file. NOTE that if you are using the Google Chrome browser, it may automatically decompress (gunzip) the gzip-ed files. The files should download to your default
Downloads
folder location. Copy files to this location from your Downloads folder:cp ~/Downloads/SRS*.fasta.gz .
-
OR, you can use the
curl
orwget
programs to download each file directly into the folder you created. Make sure in the correct folder i.e.metaphlan_analysis
withpwd
. Right click on each link, choose "Copy Link Address" (or equivalent), and in a terminal type:curl -LO https://github.com/biobakery/biobakery/raw/master/test_suite/biobakery_tests/data/metaphlan3/input/SRS014476-Supragingival_plaque.fasta.gz or wget https://github.com/biobakery/biobakery/raw/master/test_suite/biobakery_tests/data/metaphlan3/input/SRS014476-Supragingival_plaque.fasta.gz
Use the ls
command to check if the files have been correctly copied/downloaded. We are now ready to run MetaPhlAn.
- If you're familiar with the command line and file structure, does it matter where you place MetaPhlAn's input files, or where you run it from?
- What about these input files might make them particularly appropriate for a short demonstration?
Here is the basic example to profile a single metagenome from raw reads:
metaphlan SRS014476-Supragingival_plaque.fasta.gz --input_type fasta > SRS014476-Supragingival_plaque_profile.txt
Running MetaPhlAn, following the example in the prior section, will
create two output files. Check what files have been created with ls -ltr
.
File 1: SRS014476-Supragingival_plaque.fasta.gz.bowtie2out.txt
This file contains the intermediate mapping results to unique sequence markers.
Alignments are listed one per line in tab-separated columns of read and reference marker.
less -S SRS014476-Supragingival_plaque.fasta.gz.bowtie2out.txt
Output:
HWUSI-EAS1568_102539179:1:100:10001:7882/1 712117__F3PCC2__HMPREF9056_02717
HWUSI-EAS1568_102539179:1:100:10007:17628/1 712357__A0A0K2JD54__AMK43_09330
HWUSI-EAS1568_102539179:1:100:10017:5224/1 2047__E3H3C1__HMPREF0733_12099
HWUSI-EAS1568_102539179:1:100:10023:7402/1 712122__A0A0M4H4P0__AM609_02120
HWUSI-EAS1568_102539179:1:100:10025:16605/1 43768__C0E1K0__murJ
HWUSI-EAS1568_102539179:1:100:10033:18381/1 43768__E0DI09__HMPREF0299_5319
HWUSI-EAS1568_102539179:1:100:10083:4412/1 2047__E3H2T8__coaBC
HWUSI-EAS1568_102539179:1:100:10091:12482/1 505__C4GHX6__GCWU000324_00464
HWUSI-EAS1568_102539179:1:100:10094:10442/1 544581__U1RE23__HMPREF1979_00736
HWUSI-EAS1568_102539179:1:100:10103:1753/1 28133__F9DGE7__CBG57_05925
HWUSI-EAS1568_102539179:1:100:10109:14464/1 43768__E0DGH3__HMPREF0299_6971
HWUSI-EAS1568_102539179:1:100:10112:17904/1 43768__E0DEQ2__HMPREF0299_6337
File 2: SRS014476-Supragingival_plaque_profile.txt
This file contains the final computed organism abundances.
Organism abundances are listed one clade per line, tab-separated from the clade's percent abundance:
less -S SRS014476-Supragingival_plaque_profile.txt
Output:
#mpa_v30_CHOCOPhlAn_201901
#/n/huttenhower_lab/tools/metaphlan3/bin/metaphlan SRS014476-Supragingival_plaque.fasta.gz --input_type fasta
#SampleID Metaphlan_Analysis
#clade_name NCBI_tax_id relative_abundance additional_species
k__Bacteria 2 100.0
k__Bacteria|p__Actinobacteria 2|201174 100.0
k__Bacteria|p__Actinobacteria|c__Actinobacteria 2|201174|1760 100.0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Corynebacteriales 2|201174|1760|85007 65.25681
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Micrococcales 2|201174|1760|85006 34.74319
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Corynebacteriales|f__Corynebacteriaceae 2|201174|1760|85007|1653 65.25681
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Micrococcales|f__Micrococcaceae 2|201174|1760|85006|1268 34.74319
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Corynebacteriales|f__Corynebacteriaceae|g__Corynebacterium 2|201174|1760|85007|1653|1716 65.25681
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Micrococcales|f__Micrococcaceae|g__Rothia 2|201174|1760|85006|1268|32207 34.74319
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Corynebacteriales|f__Corynebacteriaceae|g__Corynebacterium|s__Corynebacterium_matruchotii 2|201174|1760|85007|1653|1716|43768 65.25681
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Micrococcales|f__Micrococcaceae|g__Rothia|s__Rothia_dentocariosa 2|201174|1760|85006|1268|32207|2047 34.74319 k__Bacteria|p__Actinobacteria|
~
~
- The file has a 4-line header. The first line lists the reference marker genes database that MetaPhlAn uses. There are ~1.1M unique clade-specific marker genes identified from ~100k reference genomes (~99,500 bacterial and archaeal and ~500 eukaryotic). The second line lists the path to the tool, the name of the input file and the arguments that mentioned. The fourth line has the column headers for the columns below.
- The first column lists clades, ranging from taxonomic kingdoms
(Bacteria, Archaea, etc.) through species. The taxonomic level of each
clade is prefixed to indicate its level:
Kingdom: k__, Phylum: p__, Class: c__, Order: o__, Family: f__, Genus: g__, Species: s__
. Let us examine these more clearly by listing them out as per taxonomic hierarchy.
We will look for (grep) lines which contain the pattern s__
that is associated with species and print the first match with the -m1
argument. Remember that this file have 4 tab-separated columns and the taxonomy is listed in the first; so we will use cut -f1
to get (cut out) the first column only (the field at position 1). Finally, the taxonomic levels are separated by the |
character which we will replace with the new line character \n
.
grep s__ -m1 SRS014476-Supragingival_plaque_profile.txt | cut -f1 | sed 's/|/\n/g'
Output (The taxonomy of the microbe C. matruchotii):
k__Bacteria
p__Actinobacteria
c__Actinobacteria
o__Corynebacteriales
f__Corynebacteriaceae
g__Corynebacterium
s__Corynebacterium_matruchotii
- The second column lists the respective NCBI Taxon IDs.
- The third column lists relative abundances. Since sequence-based profiling is relative and does not provide absolute
cellular abundance measures, clades are hierarchically summed. Each
level will sum to 100%; that is, the sum of all kingdom-level clades is
100%, the sum of all genus-level clades (including unclassified) is also
100%, and so forth. OTU equivalents can be extracted by using only the
species-level
s__
clades from this file (again, making sure to include clades unclassified at this level).
Let us check if all orders sum to 100% using grep
. The orders 'Corynebacteriales' and 'Micrococcales' are in the class 'Actinobacteria'.
grep o__ SRS014476-Supragingival_plaque_profile.txt | grep -v f__
Output:
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Corynebacteriales 2|201174|1760|85007 65.25681
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Micrococcales 2|201174|1760|85006 34.74319
Similarly, the families must sum to 100%. In this example, let us also display the fields of interest i.e. taxonomy names and percentages for ease of viewing.
grep f__ SRS014476-Supragingival_plaque_profile.txt | grep -v g__ | cut -f1,3
Output:
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Corynebacteriales|f__Corynebacteriaceae 65.25681
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Micrococcales|f__Micrococcaceae 34.74319
- The fourth column lists additional species for cases where the metagenome profile contains clades that represent multiple species. The species listed in column 1 is the representative species in such cases.
- What do the parts of the read identifiers in the first column of the first, per-read/marker output file indicate?
- What do the parts of the gene identifiers in the second column of the first, per-read/marker output file indicate?
- Since the second, per-clade abundance output file is already normalized, you never need to sum-normalize these relative abundances. However, if you tried to, what would the sum of all clades' relative abundances be?
If available, it is recommended to use the bowtie2out file as an input to MetaPhlAn as it significantly speeds up metagenomic profiling. Let us delete the File 2 we created in the previous step and use the bowtie2out file (File1) to regenerate it. Notice that we will have to change the --input_type
argument.
rm -f SRS014476-Supragingival_plaque_profile.txt
metaphlan SRS014476-Supragingival_plaque.fasta.gz.bowtie2out.txt --input_type bowtie2out > SRS014476-Supragingival_plaque_profile.txt
ls -ltr
When available, MetaPhlAn can take advantage of multiple cores using
the nproc
flag:
metaphlan SRS014459-Stool.fasta.gz --input_type fasta --nproc 4 > SRS014459-Stool_profile.txt
Note: nproc
is used by bowtie2 which processes 10K reads per second per thread. Since we have a very small number of reads in this demo, the difference in speed up is negligible.
- Each MetaPhlAn execution processes exactly one sample, but the resulting single-sample analyses can easily be combined into an abundance table spanning multiple samples. Let's finish the last four samples from the input files tutorial section:
metaphlan SRS014464-Anterior_nares.fasta.gz --input_type fasta --nproc 4 > SRS014464-Anterior_nares_profile.txt
metaphlan SRS014470-Tongue_dorsum.fasta.gz --input_type fasta --nproc 4 > SRS014470-Tongue_dorsum_profile.txt
metaphlan SRS014472-Buccal_mucosa.fasta.gz --input_type fasta --nproc 4 > SRS014472-Buccal_mucosa_profile.txt
metaphlan SRS014494-Posterior_fornix.fasta.gz --input_type fasta --nproc 4 > SRS014494-Posterior_fornix_profile.txt
- Alternatively, if you are familiar with shell syntax, you can loop over all input files (make sure you have deleted previously generated output files to prevent errors):
for i in SRS*.fasta.gz; do metaphlan $i --input_type fasta --nproc 4 > ${i%.fasta.gz}_profile.txt; done
Either way, you will now have a complete set of six profile output files and six intermediate mapping files. If you'd like to skip this step to speed things up, the 12 demo file outputs can be downloaded from the following links (right-click on the link and pick 'Save Link as ..' or click on the link and then right-click on the preview page and select "Save Page as...", or copy the URL to download on a server).
-
Profile output files
-
Intermediate mapping output files
-
If you are running this tutorial as part of a short course, the output bowtie2out and profile files are in the
output
folder. To copy them to the current directory, enter:
cp output/*_profile.txt .
Finally, the MetaPhlAn distribution includes a utility script that will create a single tab-delimited table from these files:
merge_metaphlan_tables.py *_profile.txt > merged_abundance_table.txt
The resulting table can be opened in Excel, any gene expression analysis
program, less
(example below), or visualized graphically as per
subsequent tutorial sections:
less -S merged_abundance_table.txt
The first few lines look like:
#mpa_v30_CHOCOPhlAn_201901
clade_name NCBI_tax_id SRS014494-Posterior_fornix_profile SRS014476-Supragingival_plaque_profile SRS014472-Buccal_mucosa_profile SRS014470-Tongue_dorsum_profile SRS014464-Anterior_nares_profile SRS014459-Sto
k__Bacteria 2 100.0 100.0 100.0 100.0 100.0 100.0
k__Bacteria|p__Actinobacteria 2|201174 0 100.0 0 0 0 0
k__Bacteria|p__Actinobacteria|c__Actinobacteria 2|201174|1760 0 100.0 0 0 0 0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Corynebacteriales 2|201174|1760|85007 0 65.25681 0 0 0 0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Corynebacteriales|f__Corynebacteriaceae 2|201174|1760|85007|1653 0 65.25681 0 0 0 0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Corynebacteriales|f__Corynebacteriaceae|g__Corynebacterium 2|201174|1760|85007|1653|1716 0 65.25681 0 0 0 0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Corynebacteriales|f__Corynebacteriaceae|g__Corynebacterium|s__Corynebacterium_matruchotii 2|201174|1760|85007|1653|1716|43768 0 65.25681 0 0 0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Micrococcales 2|201174|1760|85006 0 34.743190000000006 0 0 0 0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Micrococcales|f__Micrococcaceae 2|201174|1760|85006|1268 0 34.743190000000006 0 0 0 0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Micrococcales|f__Micrococcaceae|g__Rothia 2|201174|1760|85006|1268|32207 0 34.743190000000006 0 0 0 0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Micrococcales|f__Micrococcaceae|g__Rothia|s__Rothia_dentocariosa 2|201174|1760|85006|1268|32207|2047 0 34.743190000000006 0 0 0 0
- How might you convert these relative abundance measures to pseudo-RPKs that are sensitive to each sample's read depth?
- In what important ways does analysis of a relative abundance table differ from that of a gene expression (microarray or RNA-seq) transcript table? In what ways are they similar?
- Under what circumstances is this tab-delimited text data format particularly efficient or inefficient? Is this likely to be a problem for species-level taxonomic profiles?
A heatmap is one way to visualize tabular abundance results such as
those from MetaPhlAn. The plotting tool we'll use here, hclust2
, is a
convenience script that can show any, some, or all of the microbes or
samples in a MetaPhlAn table. In this tutorial we will plot the heatmap
for all of the samples.
If you're using this tutorial from the bioBakery VM (locally or on the cloud), hclust2 is already installed! Otherwise, you can install hclust2 and other bioBakery tools automatically with Conda. :
conda install -c biobakery hclust2
This will install hclust2 and all of its dependencies.
Alternatively, you can manually install hclust2 from source by downloading hclust2 and then installing the hclust2 dependencies (numpy, pandas, biopython, scipy, and matplotlib).
Step 1: Generate the species only abundance table
Run the following command to create a species only abundance table, providing the abundance table ( merged_abundance_table.txt ) created in prior tutorial steps: :
grep -E "s__|clade" merged_abundance_table.txt | sed 's/^.*s__//g'\
| cut -f1,3-8 | sed -e 's/clade_name/body_site/g' > merged_abundance_table_species.txt
There are four parts to this command. The first grep searches the file
for the regular expression "s__|clade"
which matches to those lines
with species information and also to the header which contains the names of the body sites. The
sed
removes the full taxonomy from each line so the first column only
includes the species name. The cut
gives us all columns except the NCBI Taxon ID (column 2) and the last sed
helps us replace clade_name
to body_site
.
The new abundance table (merged_abundance_table_species.txt) will contain only the species abundances with just the species names (instead of the full taxonomy).
The first few lines of the file will look like:
body_site SRS014494-Posterior_fornix_profile SRS014476-Supragingival_plaque_profile SRS014472-Buccal_mucosa_profile SRS014470-Tongue_dorsum_profile SRS014464-Anterior_nares_profile SRS014459-Stool_profile
Corynebacterium_matruchotii 0 65.25681 0 0 0 0
Rothia_dentocariosa 0 34.743190000000006 0 0 0 0
Bacteroides_stercoris 0 0 0 0 0 31.62003
Prevotella_histicola 0 0 0 51.36481 0 0
Prevotella_pallens 0 0 0 5.10895 0 0
Gemella_haemolysans 0 0 19.17145 0 0 0
Dolosigranulum_pigrum 0 0 0 0 2.7635099999999997 0
Lactobacillus_crispatus 80.56118000000001 0 0 0 0 0
Lactobacillus_iners 19.43882 0 0 0 0 0
- Why is it useful to visualize (and particularly to cluster) only the "tips" of the taxonomic tree?
Step 2: Generate the heatmap
Next generate the species only heatmap by running the following command:
hclust2.py -i merged_abundance_table_species.txt -o abundance_heatmap_species.png --f_dist_f braycurtis --s_dist_f braycurtis --cell_aspect_ratio 0.5 -l --flabel_size 10 --slabel_size 10 --max_flabel_len 100 --max_slabel_len 100 --minv 0.1 --dpi 300
We have only 16 microbes in this demo file but typically, for ease of viewing, one can show the top 25 species using the --ftop 25
argument. This script uses
Bray-Curtis as the distance measure both between samples (s) and between
features (f) (microbes), sets the ratio between the width/height of cells to
0.5, uses a log scale for assigning heatmap colors, sets the sample and
feature label size to 10, sets the max sample and feature label length to
100, selects the minimum value to display as 0.1, and selects an image
resolution of 300 (in that order!).
Open the resulting heatmap
(abundance_heatmap_species.png)
to take a look. If you generated it on your local computer, just double
click. If you're using a server with just a terminal interface, you
might have to transfer the file locally first using a tool like scp
.
If you're using a server with a graphical interface, you can open the
file using a command like see abundance_heatmap_species.png
). Using
any of these methods, the results should look like:
Notice that due to the very large differences between body site communities in the human microbiome, we can still easily see site-specific species despite the small demonstration input files (each is subsampled to only 10,000 reads for efficiency).
- Which microbes are most abundant at each body site in these demonstration data?
- Under what circumstances is log-scaling the heatmap abundance colors good? When might it be bad (i.e. visually deceptive)?
You can also visualize microbial abundances on a tree of life (also referred to as a phylogeny or cladogram) that captures their taxonomic (or phylogenetic) relatedness. Here, we'll use a tool called GraPhlAn that can render trees and annotate them with microbial names or data such as abundances. The instructions here assume that you will run GraPhlAn from the command line, but if you'd like to use an online Galaxy module instead, see the section on GraPhlAn in Galaxy. For more information on this tool, refer to the GraPhlAn tutorial. If you're using this tutorial from the bioBakery VM (locally or on the cloud), GraPhlAn is already installed! Otherwise, you can install GraPhlAn and other bioBakery tools automatically with Conda:
conda install -c biobakery graphlan
conda install -b bioconda export2graphlan
This will install GraPhlAn, export2graphlan, and all of its dependencies.
Alternatively, you can manually install them from source by downloading GraPhlAn and downloading export2graphlan and then install the GraPhlAn dependencies (numpy, pandas, biopython, scipy, and matplotlib).
Step 1: Create the GraPhlAn input files
GraPhlAn requires two inputs: (i) a tree structure to represent and (ii) graphical annotation options for the tree.
Run the following command to generate the two input files for GraPhlAn (the tree and annotation files) providing the abundance table (merged_abundance_table.txt) created in the prior tutorial steps reformatted to remove the version header and the NCBI taxon id in the second column.
tail -n +2 merged_abundance_table.txt | cut -f1,3- > merged_abundance_table_reformatted.txt
export2graphlan.py --skip_rows 1 -i merged_abundance_table_reformatted.txt --tree merged_abundance.tree.txt --annotation merged_abundance.annot.txt --most_abundant 100 --abundance_threshold 1 --least_biomarkers 10 --annotations 5,6 --external_annotations 7 --min_clade_size 1
The command above has options to skip rows 1 and 2 (headers), select the top 100
most abundance clades to highlight, set a minimum abundance threshold
for clades to be annotated, extract a minimum of 10 biomarkers, select
taxonomic levels 5 and 6 to be annotated within the tree, select
taxonomic level 7 to be used in the external legend, and set the minimum
size of clades annotated as biomarkers to 1. The output files created
are merged_abundance.tree.txt
and merged_abundance.annot.txt
.
- What are the contents and structure of the "tree" file?
- What are the contents and structure of the "annot" file?
Step 2: Create a cladogram
Run the following commands to generate the cladogram providing the tree ( merged_abundance.tree.txt ) and its annotation ( merged_abundance.annot.txt ) files from the prior step :
graphlan_annotate.py --annot merged_abundance.annot.txt merged_abundance.tree.txt merged_abundance.xml
graphlan.py --dpi 300 merged_abundance.xml merged_abundance.png --external_legends
The first command creates an xml file from the tree and annotation inputs. The second command creates the image, sets the image resolution to 300 and requests external legends.
The first few lines of the xml file are:
<phyloxml xmlns="http://www.phyloxml.org" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.phyloxml.org http://www.phyloxml.org/1.10/phyloxml.xsd">
<phylogeny rooted="true">
<clade>
<clade>
<name>k__Bacteria</name>
<branch_length>1.0</branch_length>
<property applies_to="clade" datatype="xsd:string" id_ref="clade_marker_size" ref="A:1">10.0</property>
<clade>
The generated cladogram (merged_abundance.png) is:
The annotation (merged_abundance_annot.png ) should be:
And the legend (merged_abundance_legend.png ) is:
As above, if you generated these images on your local computer, open
them by simply double clicking. If you're using a server with only a
terminal interface, transfer the file locally first using a tool like
scp
. If you're using a server with a graphical interface, you can open
the file using a command like see merged_abundance.png
).
- What is the PhyloXML format? Why might it be used in this context?
- Why is it often particularly useful to plot circular, rather than linear, cladograms?
- What other types of annotations might be useful on such a tree (either different graphical formats, or different types of data to take advantage of them)?
- HUMAnN 2.0
- HUMAnN 3.0
- MetaPhlAn 2.0
- MetaPhlAn 3.0
- MetaPhlAn 4.0
- MetaPhlAn 4.1
- PhyloPhlAn 3
- PICRUSt 2.0
- ShortBRED
- PPANINI
- StrainPhlAn 3.0
- StrainPhlAn 4.0
- MelonnPan
- WAAFLE
- MetaWIBELE
- MACARRoN
- FUGAsseM
- HAllA
- HAllA Legacy
- ARepA
- CCREPE
- LEfSe
- MaAsLin 2.0
- MaAsLin 3.0
- MMUPHin
- microPITA
- SparseDOSSA
- SparseDOSSA2
- BAnOCC
- anpan
- MTXmodel
- MTX model 3
- PARATHAA