### Exercise: Working with the newick tree structure

This exercise uses the Python programming language and a simple package called `toytree`, which can read and write newick-formatted data files and draw tree visualizations.

```python
import toytree
```

### 1. We can define a tree by writing a newick string
The text below describes a tree in *newick* format. When researchers work with phylogenetic trees as their data, this is the kind of data they are handling: plain text. Each pair of parentheses groups a clade, commas separate the members of that clade, and the whole string ends with a semicolon. The format can contain just the relationships, written as nested parentheses like below, or it can carry additional information such as branch lengths, which we'll see later.

```python
tree1 = "(gibbon, (orangutan, (gorilla, (chimp, human))));"
```
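To make the nested-parentheses idea concrete, here is a minimal sketch in plain Python (no toytree) that parses a topology-only newick string like `tree1` into nested tuples. `parse_newick` is a hypothetical helper written for illustration, not part of toytree; it assumes no branch lengths, no internal node labels, and tip names without spaces or quotes.

```python
def parse_newick(newick):
    """Parse a topology-only newick string into nested tuples of tip names.

    Assumes no branch lengths, no internal node labels, and tip names
    without spaces, quotes, commas, or parentheses.
    """
    s = newick.strip().rstrip(";").replace(" ", "")
    pos = 0  # index of the next unread character in s

    def parse_node():
        nonlocal pos
        if s[pos] == "(":          # an internal node (a clade)
            pos += 1               # skip "("
            children = [parse_node()]
            while s[pos] == ",":   # commas separate sibling subtrees
                pos += 1
                children.append(parse_node())
            pos += 1               # skip the matching ")"
            return tuple(children)
        start = pos                # otherwise, a tip: read its name
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]

    return parse_node()


tree1 = "(gibbon, (orangutan, (gorilla, (chimp, human))));"
print(parse_newick(tree1))
# ('gibbon', ('orangutan', ('gorilla', ('chimp', 'human'))))
```

Each pair of parentheses becomes one tuple, i.e., one clade, so the nesting of the tuples mirrors the nesting of the clades.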

### 2. Draw a tree
This is a visualization of the tree structure defined above. The `ts='s'` argument passed to `draw()` selects one of toytree's preset tree styles.

```python
toytree.tree(tree1).draw(ts='s');
```

### 3. Rotating nodes
Rotating a node simply swaps the order in which its descendant lineages are drawn, which changes the order of the tip labels. It does not change the evolutionary relationships. Below we rotate node 6, which changes the order of human, chimp, and gorilla, but you can see that the common ancestor of human and chimp is still node 5, and their common ancestor with gorilla is still node 6.

```python
toytree.tree(tree1).rotate_node(6).draw(ts='s');
```
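The effect of a rotation can be sketched without toytree by representing the tree as nested tuples. In the sketch below, `rotate` and `tip_order` are hypothetical helpers, and a node is identified by its clade (the tuple itself) rather than by toytree's numeric node index. Reversing the children of the gorilla/chimp/human node changes only the order in which the tips are listed.

```python
def tip_order(node):
    """List tip names in the order they are drawn (first child first)."""
    if isinstance(node, str):
        return [node]
    order = []
    for child in node:
        order.extend(tip_order(child))
    return order


def rotate(node, target):
    """Return a copy of the tree with the children of `target` in reverse order.

    Only the target node's children are swapped; everything below them is kept as-is.
    """
    if isinstance(node, str):
        return node
    if node == target:
        return tuple(reversed(node))
    return tuple(rotate(child, target) for child in node)


tree = ("gibbon", ("orangutan", ("gorilla", ("chimp", "human"))))
rotated = rotate(tree, ("gorilla", ("chimp", "human")))

print(tip_order(tree))     # ['gibbon', 'orangutan', 'gorilla', 'chimp', 'human']
print(tip_order(rotated))  # ['gibbon', 'orangutan', 'chimp', 'human', 'gorilla']
```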

### Rotating nodes changes the visualization, but not the tree structure.
This is because the tree structure represents a hierarchy of relationships (clades nested within clades). Below we rotate several different nodes, one at a time, and print the resulting newick string each time. As you can see, it does not change.

```python
toytree.tree(tree1).rotate_node(6).write(fmt=9)
```

```
'(gibbon,(orangutan,(gorilla,(chimp,human))));'
```

```python
toytree.tree(tree1).rotate_node(7).write(fmt=9)
```

```
'(gibbon,(orangutan,(gorilla,(chimp,human))));'
```

```python
toytree.tree(tree1).rotate_node(8).write(fmt=9)
```

```
'(gibbon,(orangutan,(gorilla,(chimp,human))));'
```
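One way to check that two trees describe the same relationships regardless of drawing order is to compare their sets of clades, where each clade is the set of tips descending from one internal node. The sketch below uses a hypothetical `clades` helper on nested tuples (not a toytree function). Because sets ignore order, a rotated tree yields exactly the same clades, while a genuinely different topology does not.

```python
def clades(tree):
    """Return the set of clades in a nested-tuple tree.

    Each clade is a frozenset of the tip names below one internal node,
    so the order in which children are listed (or drawn) is ignored.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    walk(tree)
    return found


original = ("gibbon", ("orangutan", ("gorilla", ("chimp", "human"))))
rotated = ("gibbon", ("orangutan", (("human", "chimp"), "gorilla")))
different = ("gibbon", ("gorilla", ("orangutan", ("chimp", "human"))))

print(clades(original) == clades(rotated))    # True: same relationships
print(clades(original) == clades(different))  # False: orangutan and gorilla swapped places
```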

### Edge length information (Divergence times, i.e., ages of clades)
Additional information, such as the ages of clades, is easy to include in newick. Each branch (edge) length is written as a colon followed by a number, placed right after the tip name or closing parenthesis of the node that the branch leads to. Below we use a different tree style for plotting (`ts='n'`), since this style shows differences in branch lengths.

```python
tree2 = "(gibbon:3, (orangutan:2, (gorilla:1, (chimp:0.25, human:0.25):0.75):1):1);"
toytree.tree(tree2).draw(ts='n', scalebar=True);
```
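Because each edge length sits right after the node it leads to, summing the lengths along the path from the root gives each tip's distance from the root. The hypothetical `root_to_tip` function below sketches this in plain Python; it is not toytree's API. It assumes tip names without spaces, quotes, or the characters `,():`, and treats a missing edge length as 0. For the clade-age tree above, every tip comes out at the same distance, 3, from the root.

```python
def root_to_tip(newick):
    """Return {tip name: summed edge length from the root to that tip}.

    Assumes tip names without spaces, quotes, or the characters ,():
    and treats a missing edge length as 0.
    """
    s = newick.strip().rstrip(";").replace(" ", "")
    pos = 0  # index of the next unread character in s

    def read_token():
        # Read a tip name, node label, or number up to the next delimiter.
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        return s[start:pos]

    def parse_node():
        # Return {tip: distance from this node's parent} for the subtree at pos.
        nonlocal pos
        if s[pos] == "(":
            pos += 1                      # skip "("
            dists = parse_node()
            while s[pos] == ",":
                pos += 1                  # skip ","
                dists.update(parse_node())
            pos += 1                      # skip ")"
            read_token()                  # ignore an optional internal node label
        else:
            dists = {read_token(): 0.0}   # a tip is at distance 0 from itself
        edge = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1                      # skip ":"
            edge = float(read_token())    # length of the edge above this node
        return {tip: d + edge for tip, d in dists.items()}

    return parse_node()


tree2 = "(gibbon:3, (orangutan:2, (gorilla:1, (chimp:0.25, human:0.25):0.75):1):1);"
print(root_to_tip(tree2))
# {'gibbon': 3.0, 'orangutan': 3.0, 'gorilla': 3.0, 'chimp': 3.0, 'human': 3.0}
```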

### What do edge lengths mean?
Sometimes they represent the ages of clades, in which case all of the extant tips of the tree should align at time=0, as in the tree above. Alternatively, the edge lengths of a tree can represent the amount of change observed between samples. For example, below we could imagine that the edge lengths represent the amount of DNA sequence divergence between samples. If one lineage has a higher rate of substitution than another, then the tips will not necessarily align at zero. Such a tree is informative about rates of change, but not so much about time. Converting rate estimates into divergence-time estimates will be covered later in class.

```python
tree2 = "(gibbon:0.03, (orangutan:0.02, (gorilla:0.01, (chimp:0.0075, human:0.0025):0.0075):0.001):0.001);"
toytree.tree(tree2).draw(ts='n', scalebar=True);
```
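To see that the tips of this tree do not line up, we can add up the edge lengths from the root to each tip. In the sketch below, the path lengths were copied by hand from the newick string above (outermost edge first), and `is_ultrametric` is a hypothetical helper that checks whether all root-to-tip distances are equal within a small tolerance.

```python
# Edge lengths on the path from the root to each tip, copied by hand from
# the newick string above (outermost edge first).
paths = {
    "gibbon": [0.03],
    "orangutan": [0.001, 0.02],
    "gorilla": [0.001, 0.001, 0.01],
    "chimp": [0.001, 0.001, 0.0075, 0.0075],
    "human": [0.001, 0.001, 0.0075, 0.0025],
}

# Total amount of change accumulated between the root and each tip.
root_to_tip = {tip: sum(edges) for tip, edges in paths.items()}


def is_ultrametric(distances, tol=1e-9):
    """True if every tip is (within tol) the same distance from the root."""
    values = list(distances.values())
    return max(values) - min(values) <= tol


for tip, dist in root_to_tip.items():
    print(f"{tip:10s} {dist:.4f}")
print("ultrametric:", is_ultrametric(root_to_tip))  # ultrametric: False
```

Human and chimp share the path up to their common ancestor (0.001 + 0.001 + 0.0075 = 0.0095), so their different totals (0.017 versus 0.012) come entirely from their own edges, 0.0075 versus 0.0025. In this imagined scenario, the chimp lineage accumulated three times as much change since the split.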

### Challenge: 
Write a newick string describing the relationships of six imaginary taxa and plot it. Next, try adding branch lengths to the tree. Hint: just like in the code above, store the newick string as a variable (e.g., `tree1`), load it with `toytree.tree()`, and then call `.draw()` on the result.
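Before plotting a hand-written tree, it can help to catch the most common typos. The hypothetical `check_newick` helper below (plain Python, not part of toytree) only checks that the string ends with `;` and that its parentheses are balanced, and it counts tips by counting commas: every comma separates two sibling subtrees, so a tree has one more tip than it has commas. It assumes no commas inside tip names and will not catch every possible error, so `toytree.tree()` remains the real test.

```python
def check_newick(newick):
    """Basic sanity checks for a hand-written newick string; returns the tip count.

    Raises ValueError if the string does not end with ';' or if its
    parentheses are unbalanced. Assumes no commas inside tip names.
    """
    s = newick.strip()
    if not s.endswith(";"):
        raise ValueError("a newick string must end with ';'")
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("found a ')' without a matching '('")
    if depth != 0:
        raise ValueError(f"{depth} '(' left unclosed")
    # Every comma separates two sibling subtrees, so tips = commas + 1.
    return s.count(",") + 1


print(check_newick("(gibbon, (orangutan, (gorilla, (chimp, human))));"))  # 5

try:
    check_newick("(gibbon, (orangutan, (gorilla, (chimp, human)));")
except ValueError as err:
    print("problem:", err)  # problem: 1 '(' left unclosed
```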