# Phylogeny

In [3]:
from qiime2 import Visualization
import os
import pandas as pd

# Data directory (the outputs of this notebook go to its 'phylogeny' subfolder)
data_dir = '../data'
os.makedirs(f'{data_dir}/phylogeny', exist_ok=True)

### 1. Phylogeny _de novo_

We start by using the `mafft` action to obtain a multiple sequence alignment of our sequences:

In [3]:
# Multiple Alignment

# the original input was --i-sequences $data_dir/rep-seqs-filtered.qza \

! qiime alignment mafft \
    --i-sequences $data_dir/denoising/dada2_rep_set.qza \
    --o-alignment $data_dir/phylogeny/aligned-seqs.qza

Saved FeatureData[AlignedSequence] to: ../data/phylogeny/aligned-seqs.qza

Then, we mask (remove) the ambiguously aligned regions of the alignment, since these add noise and can lower the quality of the reconstructed phylogeny:
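
To make the idea of masking concrete, here is a minimal pure-Python sketch that drops alignment columns which are too gappy or too variable. It is an illustration under assumptions, not the code that `qiime alignment mask` runs: the helper `mask_columns`, the toy alignment, and the thresholds are hypothetical, and conservation is simplified to the share of sequences carrying the most common non-gap character.

```python
from collections import Counter

def mask_columns(alignment, max_gap_frequency=1.0, min_conservation=0.4):
    """Keep only columns passing both thresholds (hypothetical helper)."""
    n_seqs = len(alignment)
    n_cols = len(next(iter(alignment.values())))
    keep = []
    for i in range(n_cols):
        column = [seq[i] for seq in alignment.values()]
        gap_frequency = column.count('-') / n_seqs
        residues = [c for c in column if c != '-']
        # Simplified conservation: share of all sequences that carry
        # the most common non-gap character in this column
        if residues:
            conservation = Counter(residues).most_common(1)[0][1] / n_seqs
        else:
            conservation = 0.0
        if gap_frequency <= max_gap_frequency and conservation >= min_conservation:
            keep.append(i)
    return {name: ''.join(seq[i] for i in keep) for name, seq in alignment.items()}

# Toy alignment (made up): columns 3 and 6 are highly variable, column 5 is mostly gaps
toy_alignment = {
    'seq1': 'ACGT-A',
    'seq2': 'ACTT-C',
    'seq3': 'ACAT-G',
    'seq4': 'ACCTTT',
}
masked = mask_columns(toy_alignment, max_gap_frequency=0.5, min_conservation=0.5)
print(masked)  # {'seq1': 'ACT', 'seq2': 'ACT', 'seq3': 'ACT', 'seq4': 'ACT'}
```

Only the three well-conserved, gap-free columns survive, which is the kind of signal we want to pass on to tree building.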

In [4]:
# Alignment Masking (removing regions w/ ambiguous alignments)

! qiime alignment mask \
    --i-alignment $data_dir/phylogeny/aligned-seqs.qza \
    --o-masked-alignment $data_dir/phylogeny/masked-aligned-seqs.qza

Saved FeatureData[AlignedSequence] to: ../data/phylogeny/masked-aligned-seqs.qza

Finally, we build the de novo tree with FastTree, chosen for its speed, root the unrooted tree at the midpoint of the longest tip-to-tip distance, and visualize it.
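
To illustrate what midpoint rooting means, the following sketch finds the longest tip-to-tip path in a toy unrooted tree and reports where its midpoint falls. Everything in it (the toy tree, its branch lengths, and the helpers `farthest` and `midpoint`) is hypothetical and independent of how QIIME 2 implements `midpoint-root`.

```python
def farthest(tree, start):
    """Return (node, distance, path) for the node farthest from `start`."""
    best = (start, 0.0, [start])
    stack = [(start, 0.0, [start])]
    while stack:
        node, dist, path = stack.pop()
        if dist > best[1]:
            best = (node, dist, path)
        for neighbor, length in tree[node].items():
            # In a tree, not stepping back to the previous node is enough to avoid loops
            if len(path) < 2 or neighbor != path[-2]:
                stack.append((neighbor, dist + length, path + [neighbor]))
    return best

def midpoint(tree):
    """Return (edge, offset from the edge's first node, longest tip-to-tip distance)."""
    any_node = next(iter(tree))
    tip_a, _, _ = farthest(tree, any_node)       # one end of the longest path
    _, longest, path = farthest(tree, tip_a)     # the other end and the path between them
    half, walked = longest / 2, 0.0
    for u, v in zip(path, path[1:]):
        length = tree[u][v]
        if walked + length >= half:
            return (u, v), half - walked, longest
        walked += length

# Toy unrooted tree (made up): tips A-D, internal nodes X and Y
edges = [('A', 'X', 1.0), ('B', 'X', 2.0), ('X', 'Y', 1.0),
         ('C', 'Y', 5.0), ('D', 'Y', 1.0)]
tree = {}
for u, v, length in edges:
    tree.setdefault(u, {})[v] = length
    tree.setdefault(v, {})[u] = length

edge, offset, longest = midpoint(tree)
print(edge, offset, longest)  # ('C', 'Y') 4.0 8.0
```

The longest path runs between tips B and C (length 8.0), so the root is placed on the branch leading to C, 4.0 units away from that tip.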

In [5]:
# DE NOVO Tree Construction

! qiime phylogeny fasttree \
    --i-alignment $data_dir/phylogeny/masked-aligned-seqs.qza \
    --o-tree $data_dir/phylogeny/fasttree.qza

! qiime phylogeny midpoint-root \
    --i-tree $data_dir/phylogeny/fasttree.qza \
    --o-rooted-tree $data_dir/phylogeny/fasttree-rooted.qza

Saved Phylogeny[Unrooted] to: ../data/phylogeny/fasttree.qza
Saved Phylogeny[Rooted] to: ../data/phylogeny/fasttree-rooted.qza

In [6]:
# DE NOVO Tree Visualisation 

! qiime empress tree-plot \
    --i-tree $data_dir/phylogeny/fasttree-rooted.qza \
    --m-feature-metadata-file $data_dir/taxonomy/taxonomy.qza \
    --o-visualization $data_dir/phylogeny/fasttree-rooted.qzv

Saved Visualization to: ../data/phylogeny/fasttree-rooted.qzv

In [12]:
Visualization.load(f'{data_dir}/phylogeny/fasttree-rooted.qzv')

### 2. Bootstrapping

We use bootstrapping to assess the robustness of the branch splits. Here RAxML runs 100 rapid-bootstrap replicates under the GTRCAT substitution model.
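
As a rough illustration of the principle behind bootstrapping, the sketch below resamples alignment columns with replacement and expresses support for a split as the percentage of replicate trees that contain it. The helpers `bootstrap_replicate` and `split_support`, the toy alignment, and the 87/100 count are all hypothetical; the seed merely mirrors the value passed to RAxML and will not reproduce RAxML's resampling.

```python
import random

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement (hypothetical helper)."""
    n_cols = len(next(iter(alignment.values())))
    # Draw n_cols column indices with replacement:
    # some columns appear several times, others not at all
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: ''.join(seq[i] for i in picks) for name, seq in alignment.items()}

def split_support(replicate_has_split):
    """Percentage of replicate trees that contain a given split."""
    return 100.0 * sum(replicate_has_split) / len(replicate_has_split)

# Toy alignment (made up)
toy_alignment = {'seq1': 'ACGTAC', 'seq2': 'ACGTTC', 'seq3': 'TCGAAC'}

rng = random.Random(9384)
replicates = [bootstrap_replicate(toy_alignment, rng) for _ in range(100)]
print(len(replicates))  # 100

# Made-up example: a split recovered in 87 of the 100 replicate trees
print(split_support([True] * 87 + [False] * 13))  # 87.0
```

In the real analysis a tree is inferred from every replicate, and splits with low support should be interpreted with caution.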

In [3]:
# BOOTSTRAPPING (long step)

! qiime phylogeny raxml-rapid-bootstrap \
    --i-alignment $data_dir/phylogeny/masked-aligned-seqs.qza \
    --p-seed 1723 \
    --p-rapid-bootstrap-seed 9384 \
    --p-bootstrap-replicates 100 \
    --p-substitution-model GTRCAT \
    --p-n-threads 3 \
    --o-tree $data_dir/phylogeny/raxml-cat-bootstrap-tree.qza

Saved Phylogeny[Unrooted] to: Alien_data/raxml-cat-bootstrap-tree.qza

Again, we root the tree and visualize it.

In [9]:
! qiime phylogeny midpoint-root \
    --i-tree $data_dir/phylogeny/raxml-cat-bootstrap-tree.qza \
    --o-rooted-tree $data_dir/phylogeny/raxml-cat-bootstrap-rooted.qza

! qiime empress tree-plot \
    --i-tree $data_dir/phylogeny/raxml-cat-bootstrap-rooted.qza \
    --m-feature-metadata-file $data_dir/taxonomy/taxonomy.qza \
    --o-visualization $data_dir/phylogeny/raxml-cat-bootstrap-rooted.qzv

Saved Phylogeny[Rooted] to: ../data/phylogeny/raxml-cat-bootstrap-rooted.qza
Saved Visualization to: ../data/phylogeny/raxml-cat-bootstrap-rooted.qzv

In [13]:
Visualization.load(f'{data_dir}/phylogeny/raxml-cat-bootstrap-rooted.qzv')

### 3. Fragment Insert Tree

Finally, we use fragment insertion: instead of building a tree from scratch, we place our sequences into an existing reference tree.

We first download the reference database:

In [14]:
# Get reference database (Greengenes)
! wget -nv -O $data_dir/phylogeny/sepp-refs-gg-13-8.qza https://data.qiime2.org/2021.4/common/sepp-refs-gg-13-8.qza

2022-12-16 22:15:53 URL:https://s3-us-west-2.amazonaws.com/qiime2-data/2021.4/common/sepp-refs-gg-13-8.qza [50161069/50161069] -> "../data/phylogeny/sepp-refs-gg-13-8.qza" [1]
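
Because the insertion step is slow, it can be worth confirming that the reference file arrived complete before starting it. The sketch below compares the file size on disk with the size reported in the `wget` log; the helper `is_complete_download` is hypothetical, and a matching size is only a quick sanity check, not proof of integrity.

```python
import os

# Size in bytes reported by the wget log for sepp-refs-gg-13-8.qza
EXPECTED_BYTES = 50161069

def is_complete_download(path, expected_bytes=EXPECTED_BYTES):
    """Return True if `path` exists and has exactly the expected size (hypothetical helper)."""
    return os.path.isfile(path) and os.path.getsize(path) == expected_bytes
```

In this notebook it could be called as `is_complete_download(f'{data_dir}/phylogeny/sepp-refs-gg-13-8.qza')`.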


Then, we use `fragment-insertion sepp` to generate a phylogenetic tree from the same input sequences, and visualize the result:

In [None]:
# (slow step)
# FRAG IN Construct Tree

#original input --i-representative-sequences $data_dir/rep-seqs-filtered.qza \

! qiime fragment-insertion sepp \
    --i-representative-sequences $data_dir/denoising/dada2_rep_set.qza \
    --i-reference-database $data_dir/phylogeny/sepp-refs-gg-13-8.qza \
    --p-threads 2 \
    --o-tree $data_dir/phylogeny/sepp-tree.qza \
    --o-placements $data_dir/phylogeny/sepp-tree-placements.qza

In [4]:
# FRAG IN Visualize Tree
! qiime empress tree-plot \
    --i-tree $data_dir/phylogeny/sepp-tree.qza \
    --m-feature-metadata-file $data_dir/taxonomy/taxonomy.qza \
    --o-visualization $data_dir/phylogeny/sepp-tree.qzv

Saved Visualization to: ../data/phylogeny/sepp-tree.qzv

In [6]:
Visualization.load(f'{data_dir}/phylogeny/sepp-tree.qzv')