# 03. Phylogeny

Author: Willem Fuetterer


In this Jupyter Notebook a phylogenetic tree is constructed from the filtered representative sequences.

**Exercise overview:**<br>
[1. Setup](#setup)<br>
[2. Phylogeny](#phylogeny)<br>





<a id='setup'></a>

## 1. Setup

In [2]:
# importing all required packages & notebook extensions at the start of the notebook
import os
import biom
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import qiime2 as q2
from qiime2 import Visualization

%matplotlib inline

In [1]:
# defining location of data
raw_data_dir = "../data/raw"
data_dir = "../data/processed"
vis_dir  = "../results"

<a id='phylogeny'></a>

## 2. Phylogeny

##### Verify type of input data

In [3]:
! qiime tools peek $data_dir/rep-seqs-filtered.qza

UUID:        250a008e-72a3-4f0b-8969-d82ee0631683
Type:        FeatureData[Sequence]
Data format: DNASequencesDirectoryFormat


##### Multiple sequence alignment of sequences

In [15]:
! qiime alignment mafft \
    --i-sequences $data_dir/rep-seqs-filtered.qza \
    --o-alignment $data_dir/aligned-rep-seqs.qza

Saved FeatureData[AlignedSequence] to: ../data/processed/aligned-rep-seqs.qza
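MAFFT places homologous positions of all representative sequences into common columns and inserts gap characters (`-`) where sequences differ. As a much simplified illustration of what "aligned" means, the sketch below implements classic pairwise global alignment (Needleman-Wunsch) with hypothetical scores of +1 for a match, -1 for a mismatch and -1 per gap. It is not how MAFFT works internally (MAFFT uses progressive strategies over many sequences at once); it only shows the idea for two sequences.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def pair_score(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )
    # trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


print(needleman_wunsch("GATTACA", "GATACA"))
```

For `GATTACA` versus `GATACA` the best score is 5: six matching positions and one gap.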

##### Removing the ambiguously aligned regions from the alignment

In [16]:
! qiime alignment mask \
    --i-alignment $data_dir/aligned-rep-seqs.qza \
    --o-masked-alignment $data_dir/masked-aligned-rep-seqs.qza

Saved FeatureData[AlignedSequence] to: ../data/processed/masked-aligned-rep-seqs.qza
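`qiime alignment mask` filters alignment columns rather than sequences. The plugin documents two thresholds for this: `--p-max-gap-frequency` (the largest fraction of gap characters a column may contain) and `--p-min-conservation` (the smallest relative frequency that at least one non-gap character must reach). The defaults are documented as 1.0 and 0.4, which is worth confirming with `qiime alignment mask --help` for the installed version. The sketch below is a simplified, pure-Python reading of those two criteria on a hypothetical toy alignment. It is not the plugin's implementation, which may for example treat degenerate nucleotide codes differently.

```python
from collections import Counter

GAP_CHARACTERS = "-."  # assumption: both '-' and '.' count as gaps


def mask_alignment(alignment, max_gap_frequency=1.0, min_conservation=0.4):
    """Drop columns that are too gappy or too variable (simplified sketch)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same (aligned) length")
    n_seqs = len(alignment)
    keep = []
    for col in range(length):
        column = [seq[col] for seq in alignment]
        gap_frequency = sum(c in GAP_CHARACTERS for c in column) / n_seqs
        residues = Counter(c for c in column if c not in GAP_CHARACTERS)
        # relative frequency of the most common non-gap character (0 for an all-gap column)
        conservation = max(residues.values()) / n_seqs if residues else 0.0
        if gap_frequency <= max_gap_frequency and conservation >= min_conservation:
            keep.append(col)
    return ["".join(seq[col] for col in keep) for seq in alignment]


# hypothetical toy alignment of four sequences
toy_alignment = ["ACG-T", "ACGAC", "AT--A", "AGG-G"]
print(mask_alignment(toy_alignment, max_gap_frequency=0.5, min_conservation=0.5))
```

Here the fourth column (`-`, `A`, `-`, `-`) is removed because three of its four entries are gaps, and the last column (`T`, `C`, `A`, `G`) because no character reaches 50%, leaving `['ACG', 'ACG', 'AT-', 'AGG']`.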

##### Construction of the phylogenetic tree using FastTree (for exploratory analysis)

Because of its speed, FastTree was used for exploratory analyses. Since this speed comes at the cost of accuracy, the final results were not generated with this tree [2].

In [17]:
! qiime phylogeny fasttree \
    --i-alignment $data_dir/masked-aligned-rep-seqs.qza \
    --o-tree $data_dir/fasttree-tree.qza

! qiime phylogeny midpoint-root \
    --i-tree $data_dir/fasttree-tree.qza \
    --o-rooted-tree $data_dir/fasttree-tree-rooted.qza

Saved Phylogeny[Unrooted] to: ../data/processed/fasttree-tree.qza
Saved Phylogeny[Rooted] to: ../data/processed/fasttree-tree-rooted.qza
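Midpoint rooting places the root halfway along the longest path between any two tips of the unrooted tree, which is a common choice when no outgroup is available. The sketch below locates that point on a small, hypothetical tree stored as a weighted adjacency dictionary (all node names and branch lengths are made up). It uses the standard two-sweep search for the longest path in a tree and only reports where the root falls. Re-orienting the actual tree around that point is what `qiime phylogeny midpoint-root` does on the artifact.

```python
def distances_from(tree, start):
    """Return {node: (distance from start, previous node on the path)}."""
    info = {start: (0.0, None)}
    stack = [start]
    while stack:
        node = stack.pop()
        dist = info[node][0]
        for neighbour, length in tree[node].items():
            if neighbour not in info:  # a tree has no cycles, so each node is reached once
                info[neighbour] = (dist + length, node)
                stack.append(neighbour)
    return info


def find_midpoint(tree):
    """Return (near, far, offset): the root lies on the branch near-far, offset from near."""
    tips = [node for node, neighbours in tree.items() if len(neighbours) == 1]
    # sweep 1: the tip farthest from an arbitrary tip ends a longest path
    info = distances_from(tree, tips[0])
    end_a = max(tips, key=lambda tip: info[tip][0])
    # sweep 2: the tip farthest from end_a is the other end of that path
    info = distances_from(tree, end_a)
    end_b = max(tips, key=lambda tip: info[tip][0])
    half = info[end_b][0] / 2
    # walk from end_b back towards end_a until the halfway distance is passed
    node = end_b
    while info[node][1] is not None:
        parent = info[node][1]
        if info[parent][0] <= half:
            return parent, node, half - info[parent][0]
        node = parent
    return node, node, 0.0  # degenerate case: no path to split


# hypothetical unrooted tree: tips A-D, internal nodes X and Y
example_tree = {
    "A": {"X": 1.0},
    "B": {"X": 2.0},
    "X": {"A": 1.0, "B": 2.0, "Y": 1.0},
    "Y": {"X": 1.0, "C": 2.0, "D": 6.0},
    "C": {"Y": 2.0},
    "D": {"Y": 6.0},
}
print(find_midpoint(example_tree))
```

The longest tip-to-tip path here is B to D (length 9), so the root falls on the D-Y branch, 4.5 from D.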

##### Inspect format of result

In [4]:
! qiime tools peek $data_dir/fasttree-tree-rooted.qza

[32mUUID[0m:        54dbac30-b904-41cf-bdc2-9ac608bc6561
[32mType[0m:        Phylogeny[Rooted]
[32mData format[0m: NewickDirectoryFormat


##### Visualizing phylogenetic tree

In [9]:
# exporting the rooted tree for visualization with FigTree
! qiime tools export \
    --input-path $data_dir/fasttree-tree-rooted.qza \
    --output-path $data_dir/fasttree-tree-rooted-exported

Exported ../data/processed/fasttree-tree-rooted.qza as NewickDirectoryFormat to directory ../data/processed/fasttree-tree-rooted-exported
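The exported directory should contain the tree as a plain-text Newick file. The file name `tree.nwk` used below is an assumption about the export, so adjust the path if your directory looks different. Before loading the tree into FigTree, a quick sanity check is to count its tips, which should match the number of representative sequences. The sketch below does this with a minimal hand-written Newick tokenizer that assumes unquoted labels; for anything beyond a quick check, a dedicated library such as scikit-bio's `TreeNode.read` is the more robust choice.

```python
import re
from pathlib import Path


def newick_tip_names(newick):
    """Return the leaf labels of a Newick string (simplified: unquoted labels only)."""
    tokens = re.findall(r"[(),:;]|[^(),:;]+", newick)
    tips = []
    previous = None
    for token in tokens:
        if token in {"(", ")", ",", ":", ";"}:
            previous = token
            continue
        name = token.strip()
        if not name:
            continue  # whitespace between structural characters
        # a label right after "(" or "," (or at the very start) names a leaf;
        # after ")" it is an internal node label, after ":" a branch length
        if previous in (None, "(", ","):
            tips.append(name)
        previous = "label"
    return tips


# assumption: the exported directory contains the tree as tree.nwk
tree_file = Path("../data/processed/fasttree-tree-rooted-exported/tree.nwk")
if tree_file.exists():
    print(len(newick_tip_names(tree_file.read_text())), "tips")
```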

##### Construction of the phylogenetic tree using RAxML with bootstrapping (for results)

RAxML was used to generate the results because of its high accuracy: it consistently outperforms FastTree in the topological accuracy of the trees it produces. Furthermore, bootstrapping provides support values for the nodes of the phylogenetic tree by resampling the original dataset, thereby also accounting for the sampling error inherent in biological data. This yields a more robust phylogenetic tree whose statistical reliability can be assessed [3][4].
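The nonparametric bootstrap resamples alignment columns (sites) with replacement to create pseudo-replicate alignments of the original length. A tree is inferred from each replicate, and the support of a node is the percentage of replicate trees that contain the same split of the tips. The sketch below shows only these two bookkeeping steps, using a hypothetical toy alignment and hypothetical replicate splits. RAxML's rapid bootstrap performs the resampling internally (controlled above by `--p-bootstrap-replicates` and `--p-rapid-bootstrap-seed`) together with a fast tree search on every replicate, which is not reproduced here.

```python
import random


def bootstrap_replicates(alignment, n_replicates, seed=None):
    """Yield pseudo-replicate alignments made by resampling columns with replacement."""
    rng = random.Random(seed)
    length = len(alignment[0])
    for _ in range(n_replicates):
        # draw `length` column indices; some columns repeat, others are left out
        columns = [rng.randrange(length) for _ in range(length)]
        yield ["".join(seq[c] for c in columns) for seq in alignment]


def support(split, replicate_splits):
    """Percentage of replicate trees whose set of splits contains `split`."""
    hits = sum(split in splits for splits in replicate_splits)
    return 100.0 * hits / len(replicate_splits)


# hypothetical toy alignment of three aligned sequences
toy_alignment = ["ACGTAC", "ACGTTC", "TCGAAC"]
for replicate in bootstrap_replicates(toy_alignment, n_replicates=3, seed=9384):
    print(replicate)

# hypothetical splits from four replicate trees; each split is written as the set of
# tips on one side (simplified: a real implementation must canonicalize the two sides)
replicate_splits = [
    {frozenset({"A", "B"})},
    {frozenset({"A", "B"})},
    {frozenset({"A", "C"})},
    {frozenset({"A", "B"})},
]
print(support(frozenset({"A", "B"}), replicate_splits))
```

The split {A, B} occurs in three of the four hypothetical replicate trees, so its support is 75.0. With the 10 replicates requested in the command below, every support value is necessarily a multiple of 10%.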

In [None]:
! qiime phylogeny raxml-rapid-bootstrap \
    --i-alignment $data_dir/masked-aligned-rep-seqs.qza \
    --p-seed 1723 \
    --p-rapid-bootstrap-seed 9384 \
    --p-bootstrap-replicates 10 \
    --p-substitution-model GTRCAT \
    --p-n-threads 4 \
    --o-tree $data_dir/raxml-cat-bootstrap-tree.qza

##### Rooting the tree

In [None]:
! qiime phylogeny midpoint-root \
    --i-tree $data_dir/raxml-cat-bootstrap-tree.qza \
    --o-rooted-tree $data_dir/raxml-cat-bootstrap-tree-rooted.qza

##### Inspect format of result

In [5]:
! qiime tools peek $data_dir/raxml-cat-bootstrap-tree-rooted.qza

[32mUUID[0m:        5a279bd1-ab17-4106-9cab-730758b0bb6d
[32mType[0m:        Phylogeny[Rooted]
[32mData format[0m: NewickDirectoryFormat


##### Visualizing phylogenetic tree

In [None]:
# exporting the rooted tree for visualization with iTOL
! qiime tools export \
    --input-path $data_dir/raxml-cat-bootstrap-tree-rooted.qza \
    --output-path $data_dir/raxml-cat-bootstrap-tree-rooted-exported

Exported ../data/processed/raxml-cat-bootstrap-tree-rooted.qza as NewickDirectoryFormat to directory ../data/processed/raxml-cat-bootstrap-tree-rooted-exported
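In the exported Newick file the bootstrap support values are expected to appear as internal node labels, i.e. the text written directly after a closing parenthesis, such as `95` in `(A:0.1,B:0.2)95:0.05`. The sketch below collects these labels to give a quick overview of support before the tree is loaded into iTOL. It assumes unquoted, numeric internal labels, and the file name `tree.nwk` is an assumption about the export.

```python
import re
import statistics
from pathlib import Path


def internal_node_supports(newick):
    """Return numeric internal node labels (text right after ')') as floats."""
    # a label after ")" runs until the next ':', ',', '(', ')', ';' or whitespace
    labels = re.findall(r"\)\s*([^:,();\s]+)", newick)
    supports = []
    for label in labels:
        try:
            supports.append(float(label))
        except ValueError:
            pass  # skip non-numeric internal labels
    return supports


# assumption: the exported directory contains the tree as tree.nwk
tree_file = Path("../data/processed/raxml-cat-bootstrap-tree-rooted-exported/tree.nwk")
if tree_file.exists():
    supports = internal_node_supports(tree_file.read_text())
    if supports:
        print(f"{len(supports)} labelled nodes, median support {statistics.median(supports)}")
```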

## Bibliography

[1] Baldauf SL. Phylogeny for the faint of heart: a tutorial. Trends in Genetics. 2003;19(6):345-351. doi:10.1016/S0168-9525(03)00112-4

[2] Price MN, Dehal PS, Arkin AP. FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments. PLoS ONE. 2010;5(3):e9490. doi:10.1371/journal.pone.0009490

[3] Stamatakis A. Using RAxML to Infer Phylogenies. Current Protocols in Bioinformatics. 2015;51(1). doi:10.1002/0471250953.bi0614s51

[4] Kozlov AM, Darriba D, Flouri T, Morel B, Stamatakis A. RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference. Wren J, ed. Bioinformatics. 2019;35(21):4453-4455. doi:10.1093/bioinformatics/btz305
