# 03. Phylogeny

Author: Willem Fuetterer


In this Jupyter notebook, phylogenetic trees are constructed from the filtered representative sequences.

**Exercise overview:**<br>
[1. Setup](#setup)<br>
[2. Phylogeny](#phylogeny)<br>





<a id='setup'></a>

## 1. Setup

In [1]:
# importing all required packages & notebook extensions at the start of the notebook
import os
import biom
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import qiime2 as q2
from qiime2 import Visualization

%matplotlib inline

In [1]:
# defining location of data
raw_data_dir = "../data/raw"
data_dir = "../data/processed"
vis_dir  = "../results"

<a id='phylogeny'></a>

## 2. Phylogeny

##### Verify type of input data

In [3]:
! qiime tools peek $data_dir/rep-seqs-filtered.qza

UUID:        250a008e-72a3-4f0b-8969-d82ee0631683
Type:        FeatureData[Sequence]
Data format: DNASequencesDirectoryFormat


##### Multiple sequence alignment of sequences

In [15]:
! qiime alignment mafft \
    --i-sequences $data_dir/rep-seqs-filtered.qza \
    --o-alignment $data_dir/aligned-rep-seqs.qza

Saved FeatureData[AlignedSequence] to: ../data/processed/aligned-rep-seqs.qza

##### Removing the ambiguously aligned regions from the alignment

In [16]:
! qiime alignment mask \
    --i-alignment $data_dir/aligned-rep-seqs.qza \
    --o-masked-alignment $data_dir/masked-aligned-rep-seqs.qza

Saved FeatureData[AlignedSequence] to: ../data/processed/masked-aligned-rep-seqs.qza
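
To make the masking step more concrete, here is a minimal sketch of column filtering on a toy alignment. It is not the plugin's implementation: the function name `mask_alignment` and the example sequences are hypothetical, and the two thresholds are only assumed to correspond to the `--p-max-gap-frequency` and `--p-min-conservation` parameters of `qiime alignment mask`. Conservation is simplified here to the relative frequency of the most common non-gap character in a column.

```python
from collections import Counter


def mask_alignment(seqs, max_gap_frequency=1.0, min_conservation=0.4):
    """Keep only alignment columns that pass both thresholds (simplified sketch)."""
    n_seqs = len(seqs)
    kept_columns = []
    for col in range(len(seqs[0])):
        column = [seq[col] for seq in seqs]
        # Fraction of sequences that have a gap at this position
        gap_frequency = column.count("-") / n_seqs
        # Fraction of sequences sharing the most common non-gap character
        non_gap_counts = Counter(c for c in column if c != "-")
        conservation = max(non_gap_counts.values()) / n_seqs if non_gap_counts else 0.0
        if gap_frequency <= max_gap_frequency and conservation >= min_conservation:
            kept_columns.append(col)
    return ["".join(seq[i] for i in kept_columns) for seq in seqs]


# Hypothetical toy alignment: columns 2 and 3 are poorly conserved
toy_alignment = ["ACG-A", "ACT-T", "ACA-T", "AGC-T", "AT-AT"]
masked = mask_alignment(toy_alignment)
print(masked)
```

In this toy case the third and fourth columns are dropped because no single base reaches the conservation threshold, which is the kind of ambiguously aligned position the masking step is meant to remove.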

##### Construction of the phylogenetic tree using FastTree

In [17]:
! qiime phylogeny fasttree \
    --i-alignment $data_dir/masked-aligned-rep-seqs.qza \
    --o-tree $data_dir/fasttree-tree.qza

! qiime phylogeny midpoint-root \
    --i-tree $data_dir/fasttree-tree.qza \
    --o-rooted-tree $data_dir/fasttree-tree-rooted.qza

Saved Phylogeny[Unrooted] to: ../data/processed/fasttree-tree.qza
Saved Phylogeny[Rooted] to: ../data/processed/fasttree-tree-rooted.qza
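
Midpoint rooting places the root halfway along the longest tip-to-tip path of the unrooted tree. The sketch below illustrates that idea on a hypothetical toy tree. The edge list, branch lengths, and function names are assumptions made for illustration, and this is not how `qiime phylogeny midpoint-root` is implemented internally.

```python
from itertools import combinations

# Hypothetical unrooted toy tree: (node, node, branch length)
EDGES = [
    ("A", "n1", 1.0),
    ("B", "n1", 2.0),
    ("n1", "n2", 1.0),
    ("C", "n2", 5.0),
    ("D", "n2", 1.0),
]


def build_adjacency(edges):
    """Turn the edge list into an undirected adjacency mapping."""
    adjacency = {}
    for u, v, length in edges:
        adjacency.setdefault(u, []).append((v, length))
        adjacency.setdefault(v, []).append((u, length))
    return adjacency


def path_between(adjacency, start, goal):
    """Return the unique tree path as a list of (node, cumulative distance)."""
    stack = [(start, None, [(start, 0.0)])]
    while stack:
        node, parent, path = stack.pop()
        if node == goal:
            return path
        for neighbor, length in adjacency[node]:
            if neighbor != parent:
                step = (neighbor, path[-1][1] + length)
                stack.append((neighbor, node, path + [step]))
    raise ValueError(f"no path between {start} and {goal}")


def find_midpoint(edges):
    """Locate the midpoint of the longest tip-to-tip path.

    Returns (node_before, node_after, offset_from_node_before, path_length).
    """
    adjacency = build_adjacency(edges)
    tips = [node for node, nbrs in adjacency.items() if len(nbrs) == 1]
    longest = max(
        (path_between(adjacency, a, b) for a, b in combinations(tips, 2)),
        key=lambda path: path[-1][1],
    )
    half = longest[-1][1] / 2
    for (u, dist_u), (v, dist_v) in zip(longest, longest[1:]):
        if dist_u <= half <= dist_v:
            return u, v, half - dist_u, longest[-1][1]


midpoint = find_midpoint(EDGES)
print(midpoint)
```

For this toy tree the most distant tips are B and C (path length 8.0), so the root would sit on the branch between `n2` and `C`, 1.0 away from `n2`.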

##### Inspect format of result

In [4]:
! qiime tools peek $data_dir/fasttree-tree-rooted.qza

UUID:        54dbac30-b904-41cf-bdc2-9ac608bc6561
Type:        Phylogeny[Rooted]
Data format: NewickDirectoryFormat


##### Bootstrapping

In [None]:
! qiime phylogeny raxml-rapid-bootstrap \
    --i-alignment $data_dir/masked-aligned-rep-seqs.qza \
    --p-seed 1723 \
    --p-rapid-bootstrap-seed 9384 \
    --p-bootstrap-replicates 10 \
    --p-substitution-model GTRCAT \
    --p-n-threads 4 \
    --o-tree $data_dir/raxml-cat-bootstrap-tree.qza
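
Bootstrap support rests on re-estimating the tree from alignments whose columns have been resampled with replacement. The sketch below shows only this classic resampling step on a hypothetical toy alignment. RAxML's rapid bootstrap uses its own heuristics on top of this idea, so the function `bootstrap_alignments` is an illustration rather than what the command above runs, and reusing the seed value 9384 here does not reproduce RAxML's random stream.

```python
import random


def bootstrap_alignments(seqs, replicates, seed):
    """Yield alignments whose columns are resampled with replacement."""
    rng = random.Random(seed)  # a fixed seed makes the replicates reproducible
    n_cols = len(seqs[0])
    for _ in range(replicates):
        # Draw as many column indices as the alignment has columns
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        yield ["".join(seq[i] for i in cols) for seq in seqs]


# Hypothetical toy alignment with three sequences and six columns
toy_alignment = ["ACGTAC", "ACGTTC", "ATGTAC"]
replicates = list(bootstrap_alignments(toy_alignment, replicates=10, seed=9384))
print(len(replicates), "replicates, first one:", replicates[0])
```

Each replicate has the same dimensions as the original alignment. A tree would be inferred from every replicate, and the fraction of replicate trees containing a given clade gives that clade's bootstrap support.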

##### Rooting the tree

In [None]:
! qiime phylogeny midpoint-root \
    --i-tree $data_dir/raxml-cat-bootstrap-tree.qza \
    --o-rooted-tree $data_dir/raxml-cat-bootstrap-tree-rooted.qza

##### Inspect format of result

In [None]:
! qiime tools peek $data_dir/raxml-cat-bootstrap-tree-rooted.qza

## Bibliography

[1] Price MN, Dehal PS, Arkin AP. FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments. PLoS ONE. 2010;5(3):e9490. doi:10.1371/journal.pone.0009490

[2] Baldauf SL. Phylogeny for the faint of heart: a tutorial. Trends in Genetics. 2003;19(6):345-351. doi:10.1016/S0168-9525(03)00112-4
