Cydrasil 1.5: A comprehensive phylogenetic tree of cyanobacterial 16S rRNA gene sequences
Daniel Roush, Ana Giraldo-Silva, Vanessa M. C. Fernandes, Nathali Maria Machado de Lima, Corey Nelson, Sam McClintock, Sergio Velasco Ayuso, Kevin Klicki, Blake Dirks, Watson Arantes, Kira Sorochkina, and Ferran Garcia-Pichel
¹School of Life Sciences, Arizona State University, 85282 Tempe, Arizona, USA
²Center for Fundamental and Applied Microbiomics, Biodesign Institute, Arizona State University, 85282 Tempe, Arizona, USA
Interactive Reference Tree
Cydrasil is a sequence database, alignment, and phylogenetic tree containing 1494 cyanobacterial 16S rRNA gene sequences downloaded from NCBI and IMG, intended for the phylogenetic placement of sequence variants (sOTUs) from 16S rRNA gene amplicon studies. The reference alignment was generated using SSU-ALIGN [1] with default parameters. This aligner uses a profile-based strategy in which each target sequence is aligned independently to a covariance model that incorporates the secondary structure of the 16S rRNA molecule. The alignment was then masked using ssu-mask, based on the automatically computed alignment confidence values (posterior probabilities). A maximum-likelihood phylogenetic tree was then generated using the RAxML-HPC2 Workflow on XSEDE (RAxML 8.2.12) [2] on the CIPRES Science Gateway [3]. The ML + thorough bootstrap workflow was used with the following modified parameters: 1000 bootstrap replicates (-N 1000) and the GTRGAMMA model. All other parameters were left at their default values.
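The masking step can be pictured with a small Python sketch (Python 3.9+). It assumes the per-residue posterior probability (PP) strings have already been extracted from the SSU-ALIGN output, and it applies a simplified, hypothetical rule: keep a column only if the mean PP of its aligned residues reaches a threshold. The function names and the 95% default are illustrative assumptions; ssu-mask's real criteria and defaults may differ, so its own documentation remains the authority.

```python
# Illustrative column masking by per-residue posterior probability (PP) codes.
# ASSUMPTION: simplified rule (mean PP of aligned residues >= threshold);
# ssu-mask's actual criteria and defaults may differ.

# SSU-ALIGN/Infernal-style PP codes: '*' means PP >= 0.95, a digit d means
# roughly d/10 +/- 0.05 ('0' means PP < 0.05), and '.' marks a gap.
# Each code is mapped to the lower bound of its range in whole percent,
# so that all comparisons use exact integer arithmetic.
PP_LOWER_BOUND_PCT = {str(d): max(0, 10 * d - 5) for d in range(10)}
PP_LOWER_BOUND_PCT["*"] = 95


def columns_to_keep(pp_rows: list[str], threshold_pct: int = 95) -> list[int]:
    """Return indices of columns whose mean PP lower bound is >= threshold_pct."""
    if not pp_rows:
        return []
    width = len(pp_rows[0])
    if any(len(row) != width for row in pp_rows):
        raise ValueError("all PP rows must have the same length")
    kept = []
    for col in range(width):
        # Gap characters ('.') are not in the table and are skipped.
        values = [PP_LOWER_BOUND_PCT[row[col]] for row in pp_rows
                  if row[col] in PP_LOWER_BOUND_PCT]
        # mean >= threshold is rewritten as sum >= threshold * count;
        # all-gap columns (no values) are dropped.
        if values and sum(values) >= threshold_pct * len(values):
            kept.append(col)
    return kept


def apply_mask(aligned_seq: str, kept: list[int]) -> str:
    """Keep only the selected alignment columns of one aligned sequence."""
    return "".join(aligned_seq[i] for i in kept)


if __name__ == "__main__":
    pp = ["**9.", "*8..", "**.."]  # toy PP strings for three sequences
    print(columns_to_keep(pp, threshold_pct=90))  # [0]
    print(apply_mask("ACG-", [0, 2]))             # AG
```

Integer percentages are used instead of floating-point probabilities so that threshold comparisons are exact rather than subject to rounding.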
Alignment of query sequences to reference alignment
Note: there are known issues compiling PaPaRa on newer Macs. We plan to move away from PaPaRa in mid-October 2019.
A FASTA file containing the sequences of interest (typically the representative sequence file exported from QIIME 1 or QIIME 2) is aligned to the reference alignment. We use PaPaRa [4] for this step, and the repository provides the reference alignment in PHYLIP (.phy) format for use with PaPaRa. See the PaPaRa installation instructions and GitHub repository.
-t, Reference tree
-s, Reference alignment in phylip (.phy) format
-q, Fasta file containing sequences to be aligned to the reference alignment
-n, Name of output alignment
-r, Prevent PaPaRa from adding gaps in the reference alignment
papara -t cydrasil-rc1-bipartitions-tree-1000.nwk -s cydrasil-rc1-alignment-phylip.phy -q ref-seqs.fasta -r -n combined-aln
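Because the same sequence names travel through PaPaRa, RAxML, and iTOL, it may help to screen the query FASTA headers before aligning. The Python sketch below (Python 3.9+, hypothetical helper check_query_ids) flags empty, duplicated, or reference-clashing IDs and IDs containing characters that commonly break Newick-based tools. The forbidden-character set is an assumption, not a rule documented by PaPaRa or RAxML; reference_names would be the taxon names listed in cydrasil-rc1-alignment-phylip.phy.

```python
# Pre-flight screen of query FASTA headers before PaPaRa/RAxML (sketch).
# ASSUMPTION: FORBIDDEN is a conservative guess at characters that break
# Newick-based tools; it is not taken from PaPaRa or RAxML documentation.
FORBIDDEN = set(" \t:;,()[]'")


def check_query_ids(fasta_text: str, reference_names: set[str]) -> list[str]:
    """Return human-readable problems found in the query FASTA headers."""
    problems = []
    seen = set()
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence lines are not checked here
        name = line[1:].strip()
        if not name:
            problems.append("empty FASTA header")
            continue
        bad = sorted(set(name) & FORBIDDEN)
        if bad:
            problems.append(f"{name!r}: contains forbidden characters {bad}")
        if name in seen:
            problems.append(f"{name!r}: duplicated query ID")
        if name in reference_names:
            problems.append(f"{name!r}: clashes with a reference sequence name")
        seen.add(name)
    return problems


if __name__ == "__main__":
    queries = ">sOTU_1\nACGT\n>sOTU 2\nACGT\n>sOTU_1\nAC\n>RefA\nAC\n"
    for problem in check_query_ids(queries, reference_names={"RefA"}):
        print(problem)
```

An empty result means no problems were detected under these assumed rules; it does not guarantee that PaPaRa will accept the file.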
The combined alignment produced by PaPaRa (papara_alignment.combined-aln) is then used as input (argument -s) for the RAxML 8 Evolutionary Placement Algorithm (EPA) [2].
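A quick check of the combined alignment before running RAxML can catch truncated or malformed output. The Python sketch below (Python 3.9+) assumes PaPaRa wrote sequential, relaxed PHYLIP, that is, an "ntaxa nchar" header followed by one "name sequence" line per taxon; read_phylip and query_names are hypothetical helper names.

```python
# Sanity check of the combined PHYLIP alignment before RAxML (sketch).
# ASSUMPTION: sequential relaxed PHYLIP with one "name sequence" line per
# taxon after an "ntaxa nchar" header; interleaved PHYLIP is not handled.


def read_phylip(text: str) -> dict[str, str]:
    """Parse and validate the alignment; return {taxon name: aligned sequence}."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty alignment")
    ntaxa, nchar = (int(x) for x in lines[0].split()[:2])
    records: dict[str, str] = {}
    for line in lines[1:]:
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"no sequence found on line: {line!r}")
        name, seq = parts[0], "".join(parts[1].split())  # drop internal spaces
        if name in records:
            raise ValueError(f"duplicate taxon name: {name}")
        records[name] = seq
    if len(records) != ntaxa:
        raise ValueError(f"header declares {ntaxa} taxa, found {len(records)}")
    for name, seq in records.items():
        if len(seq) != nchar:
            raise ValueError(f"{name}: {len(seq)} columns, header declares {nchar}")
    return records


def query_names(records: dict[str, str], reference_names: set[str]) -> list[str]:
    """Names present in the combined alignment but not in the reference."""
    return [name for name in records if name not in reference_names]
```

For example, read_phylip(open("papara_alignment.combined-aln").read()) would raise a ValueError if the taxon count or any sequence length disagrees with the header, and query_names would then list the newly added sequences.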
Input RAxML Arguments
-f, RAxML algorithm selection (v, the Evolutionary Placement Algorithm)
-T, Number of threads for multithreading (PTHREADS builds of RAxML only, e.g. raxmlHPC-PTHREADS)
-t, Reference tree file name (cydrasil-rc1-bipartitions-tree-1000.nwk)
-s, Combined alignment from PaPaRa (papara_alignment.combined-aln)
-m, Substitution model (GTRGAMMA)
-n, Name for output
Output: phylogenetic tree with placements in .jplace format (RAxML_portableTree.SeqPlacements.jplace)
raxmlHPC -f v -s papara_alignment.combined-aln -t cydrasil-rc1-bipartitions-tree-1000.nwk -m GTRGAMMA -n SeqPlacements
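Since the .jplace output is a JSON file, placements can also be summarized outside iTOL. The Python sketch below (Python 3.9+) assumes the standard jplace layout, with a "fields" list naming the columns of each placement row and per-query "p" rows whose names sit under "n" (or "nm" with multiplicities), and reports, for each query, the edge with the highest likelihood weight ratio (LWR). best_placements is a hypothetical helper, not part of RAxML.

```python
# Best placement per query from a RAxML .jplace file (sketch).
import json


def _names(pquery: dict) -> list[str]:
    """jplace stores names under "n" or, with multiplicities, under "nm"."""
    names = pquery.get("n") or [entry[0] for entry in pquery.get("nm", [])]
    return [names] if isinstance(names, str) else list(names)


def best_placements(jplace_text: str) -> dict[str, tuple[int, float]]:
    """Map each query name to (edge_num, like_weight_ratio) of its top placement."""
    data = json.loads(jplace_text)
    fields = data["fields"]  # column names for each row in "p"
    edge_i = fields.index("edge_num")
    lwr_i = fields.index("like_weight_ratio")
    best: dict[str, tuple[int, float]] = {}
    for pquery in data["placements"]:
        # Choose the candidate edge with the highest likelihood weight ratio.
        top = max(pquery["p"], key=lambda row: row[lwr_i])
        for name in _names(pquery):
            best[name] = (int(top[edge_i]), float(top[lwr_i]))
    return best
```

For example, best_placements(open("RAxML_portableTree.SeqPlacements.jplace").read()) returns a dictionary of the form {query name: (edge, LWR)}; edge numbers refer to the {n} labels in the jplace tree string.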
To visualize the new phylogenetic tree containing the sequences of interest, upload the placement file (RAxML_portableTree.SeqPlacements.jplace) to iTOL (https://itol.embl.de/) [5]. Placements act as a dataset within iTOL and can be toggled on and off. Nodes carrying sequences of interest are marked with red spheres; clicking a node shows a breakdown of the sequence IDs placed there and their corresponding confidence values.
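The per-node breakdown described above can be approximated offline by grouping every candidate placement by edge. In the Python sketch below (Python 3.9+), placements_by_edge and its min_lwr cutoff are hypothetical; this is not how iTOL builds its display, and jplace edge numbers do not necessarily match iTOL internal node IDs such as I1884.

```python
# Group candidate placements by tree edge (sketch).
import json
from collections import defaultdict


def placements_by_edge(jplace_text: str,
                       min_lwr: float = 0.0) -> dict[int, list[tuple[str, float]]]:
    """Return {edge_num: [(query name, like_weight_ratio), ...]}, best LWR first."""
    data = json.loads(jplace_text)
    fields = data["fields"]
    edge_i, lwr_i = fields.index("edge_num"), fields.index("like_weight_ratio")
    by_edge = defaultdict(list)
    for pquery in data["placements"]:
        # jplace stores names under "n" or, with multiplicities, under "nm".
        names = pquery.get("n") or [entry[0] for entry in pquery.get("nm", [])]
        if isinstance(names, str):
            names = [names]
        for row in pquery["p"]:
            lwr = float(row[lwr_i])
            if lwr < min_lwr:
                continue  # drop weakly supported candidate edges
            for name in names:
                by_edge[int(row[edge_i])].append((name, lwr))
    # Most confident queries first within each edge.
    return {edge: sorted(hits, key=lambda hit: hit[1], reverse=True)
            for edge, hits in by_edge.items()}
```

Raising min_lwr (for example to 0.5) keeps only the better-supported candidate edges, which can make crowded branches easier to inspect.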
Steps to visualize the tree in iTOL:
Create a user account on the iTOL server.
Drag the .jplace file into an iTOL project.
Root the tree: right-click node I1884 and click “re-root tree here”.
From the Controls box, set the parameters as follows:
- Display mode: Normal
- Parameters: 0 degree rotation
- Invert: No
- Slanted: No
- Branch lengths: Use
- Labels: At tips. NOTE: You can turn labels off if the tree slows down your computer.
- Label rotation: On
- Label alignment: Left
ADVANCED: No changes. If you are having trouble reading your results, you can use the scaling factors to separate leaves.
- Turn on phylogenetic placements
- Use “Show query form” to search for the placement of individual query sequences
- Enter a query sequence ID and use the “highlight option” to display red spheres indicating the phylogenetic placement of that query sequence.
References
1. Nawrocki E. 2009. Structural RNA Homology Search and Alignment Using Covariance Models. Washington University School of Medicine.
2. Stamatakis A. 2014. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–1313.
3. Miller MA, Pfeiffer W, Schwartz T. 2010. Creating the CIPRES Science Gateway for inference of large phylogenetic trees. In: 2010 Gateway Computing Environments Workshop (GCE 2010).
4. Berger SA, Stamatakis A. 2011. Aligning short reads to reference alignments and trees. Bioinformatics 27:2068–2075.
5. Letunic I, Bork P. 2016. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res 44:W242–W245.