## Create a Report-Ready Phylogenetic Tree from Newick
### Example Notebook

#### Summary
Using the Bio.Phylo module, we import a phylogenetic tree in Newick format, find the sample (leaf) farthest from the root, colour the path to it red, and plot the tree with matplotlib. The root-to-sample distance is the sum of the branch lengths along that path, measured in nucleotide substitutions per site.
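
Written out in our own notation (not the source's), the distance of a leaf $\ell$ from the root sums the branch length $b_c$ of every clade $c$ on the path down to it, and the highlighted sample $\ell^*$ is the leaf that maximises it:

$$
d(\ell) = \sum_{c \,\in\, \mathrm{path}(\mathrm{root},\,\ell)} b_c, \qquad \ell^* = \arg\max_{\ell} d(\ell)
$$

Since each $b_c$ is in substitutions per site, so is $d(\ell)$.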

#### Data
Example data: 
*tree.nhx*
* Sequenced _Vibrio cholerae_ strains from the [2010 Haitian outbreak](https://wwwnc.cdc.gov/eid/article/17/11/11-0794_article#tnF1)

Modify `TREEFILE` below to use a different data set.

#### Imports and Parameters
<!---
Note: the parameter below is what needs to be changed to run this notebook on other data sets.
-->

<!---
Run !ls data/ in a code cell to list the available files.
-->

```python
# If importing Bio fails with "ModuleNotFoundError: No module named 'Bio'",
# uncomment and run this line in a code cell to install Biopython:
# !pip install biopython
```


```python
# Import statements and data
from Bio import Phylo
from Bio.Phylo.PhyloXML import Phylogeny
import matplotlib.pyplot as plt

# Parameter for the user to change (e.g. to HCtreefull.nhx):
# path to a phylogenetic tree in Newick format
TREEFILE = "data/tree.nhx"

# Quick preview of the raw Newick text (uncomment to use)
# print("Preview Newick Tree\n")
# with open(TREEFILE) as f:
#     print(f.read())
```

```python
# Read in the tree
tree = Phylo.read(TREEFILE, "newick")

# Convert to PhyloXML, which can store per-clade attributes (such as branch
# colour) that plain Newick cannot; one conversion step is enough
tree = Phylogeny.from_tree(tree)

print("Number of leaves in tree:", len(tree.get_terminals()), "\n")
print(tree)
```

```python
# tree.depths() maps every clade to its distance from the root
# (the sum of branch lengths along the path down to it)
depths = tree.depths()

# Among the leaves, find the one farthest from the root
farthest_clade = max(tree.get_terminals(), key=depths.get)
farthestleaf = farthest_clade.name
longestpath = depths[farthest_clade]
```
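
To make the depth calculation concrete without the data file, here is a minimal pure-Python sketch (not Biopython's implementation) on a hypothetical four-sample tree with made-up branch lengths, stored as nested `(name, branch_length, children)` tuples. It accumulates branch lengths from the root down, which is the quantity `tree.depths()` reports for each leaf.

```python
# Hypothetical toy tree as nested (name, branch_length, children) tuples.
# Branch lengths are made-up values in substitutions per site;
# internal nodes are unnamed (None).
TOY_TREE = (None, 0.0, [
    (None, 0.05, [("A", 0.1, []), ("B", 0.2, [])]),
    (None, 0.2, [("C", 0.3, []), ("D", 0.1, [])]),
])


def leaf_depths(node, depth=0.0):
    """Return {leaf_name: summed branch length from the root} for the toy tree."""
    name, branch_length, children = node
    depth += branch_length
    if not children:  # a leaf: its depth is the accumulated path length
        return {name: depth}
    result = {}
    for child in children:
        result.update(leaf_depths(child, depth))
    return result


toy_depths = leaf_depths(TOY_TREE)
toy_farthest = max(toy_depths, key=toy_depths.get)
print(toy_farthest, round(toy_depths[toy_farthest], 3))  # C 0.5
```

Leaf C sits at 0.2 + 0.3 = 0.5 substitutions per site, while its own terminal branch is only 0.3; that is why the root-to-leaf distance has to come from the accumulated depth rather than from the leaf's `branch_length`.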

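```python
# Colour the tree: gray everywhere, red along the root-to-leaf path.
# Phylo.draw passes a clade's colour down to uncoloured descendants,
# so every clade is coloured explicitly.
for clade in tree.find_clades():
    clade.color = "gray"
for clade in tree.get_path(farthestleaf):
    clade.color = "red"
```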
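
A quick way to check which clades `get_path` returns (and therefore which branches turn red) is to try it on a small tree read from a string with `io.StringIO`. The tree below is hypothetical and its branch lengths are made up for illustration.

```python
from io import StringIO

from Bio import Phylo

# Hypothetical four-sample tree; branch lengths are made-up values
toy = Phylo.read(StringIO("((A:0.1,B:0.2):0.05,(C:0.3,D:0.1):0.2);"), "newick")

# get_path excludes the root clade and ends at the target leaf
path = toy.get_path("C")
print(len(path), [clade.branch_length for clade in path])  # 2 [0.2, 0.3]

# Summing branch lengths along the path reproduces the leaf's root-to-leaf depth
print(round(sum(c.branch_length for c in path), 3))  # 0.5

# Colour the path red, as in the notebook cell
for clade in path:
    clade.color = "red"
```

Because the root is not in the returned list, it keeps whatever colour it was given separately.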

```python
# Notebook "magic" command (see the handouts): display matplotlib figures inline
%matplotlib inline
tree.rooted = True

# Draw with matplotlib at a size suitable for a report
plt.rcParams["figure.figsize"] = (15, 15)
Phylo.draw(tree)
```
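
For a report you will usually want the figure saved to a file as well as shown inline. The sketch below assumes a hypothetical output name `tree_report.png` and uses a toy tree so it runs on its own: `Phylo.draw` can draw onto an existing matplotlib `axes` with `do_show=False`, after which the figure is saved with `savefig`. To apply it to the notebook's tree, pass `tree` instead of `toy`.

```python
from io import StringIO
from pathlib import Path

import matplotlib.pyplot as plt
from Bio import Phylo

# Hypothetical toy tree so this cell runs without the data file
toy = Phylo.read(StringIO("((A:0.1,B:0.2):0.05,(C:0.3,D:0.1):0.2);"), "newick")

fig, ax = plt.subplots(figsize=(8, 6))
Phylo.draw(toy, axes=ax, do_show=False)  # draw onto our axes without calling plt.show()
ax.set_xlabel("Substitutions per site")  # explicit x-axis label with units

out = Path("tree_report.png")  # hypothetical output file name
fig.savefig(out, dpi=300, bbox_inches="tight")
plt.close(fig)  # free the figure once it is on disk
```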

```python
# Print out the result
print(f"The longest root-to-leaf path is {longestpath} nucleotide substitutions per site,")
print(f"and the sample at its end is: {farthestleaf}")
```