# Introduction to toytree

### Learning objectives
This notebook provides an introduction to working with phylogenetic trees in Python. By the end of this notebook you will:

1. Be familiar with the `toytree` Python package.
2. Know how to load and plot phylogenetic trees from newick files.
3. Understand that phylogenetic trees are represented in Python as `ToyTree` objects, which support visualization, modification, analysis, and comparison.

### Introduction to working with trees in `toytree`

`toytree` is a Python tree plotting library designed for use inside Jupyter notebooks. In fact, this entire tutorial was created using notebooks and assumes that you are following along in a notebook of your own. To begin, we will import `toytree` and the plotting library it is built on, `toyplot`, as well as `numpy` for generating some numerical data.

In [None]:
import toytree       # a tree plotting library
import toyplot       # a general plotting library
import numpy as np   # numerical library

In [None]:
print("toytree", toytree.__version__)
print("toyplot", toyplot.__version__)
print("numpy", np.__version__)

### Load and draw your first tree
The main class in toytree is `ToyTree`, which provides plotting functionality as well as many useful functions and attributes for returning information and statistics about trees. As we'll see below, you can generate a `ToyTree` object in many ways, but generally it is done by reading in a newick-formatted string of text. The example below shows the simplest way to load a `ToyTree`, which is to use the `toytree.tree()` convenience function to parse a file, URL, or string.

In [None]:
# load a toytree from a newick file from a public URL
tre = toytree.tree("https://eaton-lab.org/data/Cyathophora.tre")

In [None]:
# root and draw the tree (more details on this later)
rtre = tre.root("prz", regex=True)
rtre.draw(tip_labels_align=True);

### Newick, Nexus, and other tree file formats 
Trees can be flexibly loaded from a range of text formats, including newick, nexus, and various types of extended newick (NHX) format. All of these formats use the basic newick format as a way of representing a phylogenetic tree in text, where relationships are stored as sets of nested parentheses. (We will discuss more details of newick format later.)
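
To make the idea of nested parentheses concrete, here is a minimal sketch in plain Python, independent of toytree, that walks a newick string and records how many parentheses enclose each tip. The function name `tip_depths` and the tokenizing regular expression are assumptions made for illustration; this is not how toytree parses trees, and it ignores quoted labels and NHX comments.

```python
import re

def tip_depths(newick):
    """Map each tip label to the number of parentheses that enclose it."""
    depths = {}
    depth = 0
    previous = None
    # split into parentheses, commas, and runs of label/length text
    for token in re.findall(r"[(),]|[^(),;]+", newick):
        if token == "(":
            depth += 1
        elif token == ")":
            depth -= 1
        elif token != "," and previous in ("(", ","):
            # text right after "(" or "," is a tip; text after ")" is an internal label
            name = token.split(":")[0].strip()
            depths[name] = depth
        previous = token
    return depths

example = "((a:1,b:1)90:3,(c:3,(d:1,e:1)100:2)100:1)100;"
print(tip_depths(example))
```

Tips that share the same innermost pair of parentheses, such as `d` and `e` here, are each other's closest relatives in the tree.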

Below are two examples of newick format, each representing the same tree topology and branch lengths, but with different types of metadata stored as internal node labels. The first stores integer values, which are usually a type of *support* measurement; the second stores string values, which are likely to be internal name labels.

In [None]:
# newick with edge-lengths & int support values
newick1 = "((a:1,b:1)90:3,(c:3,(d:1, e:1)100:2)100:1)100;"

# newick with edge-lengths & string node-labels
newick2 = "((a:1,b:1)A:3,(c:3,(d:1, e:1)B:2)C:1)root;"

`toytree.tree` automatically detects whether to store the internal node labels as 'support' or 'name' attributes. In the rare case that the labels are something else, this can be indicated to the function as an option. The function returns a `ToyTree` class object, which we save as a variable and then plot.
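
As a rough illustration of what such detection could look like, the plain-Python sketch below pulls the labels that follow each closing parenthesis and checks whether they all parse as numbers. The helper names `internal_labels` and `guess_feature` are hypothetical, and this is only an assumption about the general idea, not toytree's actual logic.

```python
import re

def internal_labels(newick):
    """Return the labels that directly follow a closing parenthesis."""
    return re.findall(r"\)([^:,();]+)", newick)

def guess_feature(labels):
    """Guess 'support' if every label is numeric, else 'name'."""
    if not labels:
        return None
    try:
        for label in labels:
            float(label)
    except ValueError:
        return "name"
    return "support"

with_support = "((a:1,b:1)90:3,(c:3,(d:1,e:1)100:2)100:1)100;"
with_names = "((a:1,b:1)A:3,(c:3,(d:1,e:1)B:2)C:1)root;"

print(internal_labels(with_support), guess_feature(internal_labels(with_support)))
print(internal_labels(with_names), guess_feature(internal_labels(with_names)))
```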

In [None]:
# parse newick, loading internal labels as support values
tre1 = toytree.tree(newick1)
tre1.draw(node_labels="support", node_sizes=25);

In [None]:
# parse newick, loading internal labels as name strings
tre2 = toytree.tree(newick2)
tre2.draw(node_labels="name", node_sizes=25);

### Tree functions and attributes

The `toytree` package is designed to be user-friendly and to be used interactively. One aspect of this design is to make it easy to learn about objects using tab-completion in a Jupyter notebook, which shows all of the functions and attributes that can be accessed from a particular object. Let's try it with a ToyTree object.

Start by typing in the cell below the name of one of our tree variables from above (`rtre`) followed by a dot (`rtre.`) and then pressing `<tab>`. You should see a small window pop-up listing the many attributes and functions available.

In [None]:
# try out tab-completion on a ToyTree object here.


### Accessing tree information
Many of the attributes and functions of ToyTrees are used to access information about the tree itself, such as how many tips or nodes it has, whether it is rooted, which tips are descended from which nodes, and what their names are. A few examples are shown below.


In [None]:
rtre.ntips

In [None]:
rtre.nnodes

In [None]:
tre.is_rooted(), rtre.is_rooted()

In [None]:
rtre.get_tip_labels()

In [None]:
rtre.get_edges()

### Drawing trees: basics

When you call `.draw()` on a tree it returns **three** objects: a `Canvas`, a `Cartesian` axes object, and a `Mark`. This follows the design principle of the `toyplot` plotting library on which toytree is based. The Canvas describes the plot space, and the Cartesian axes define how to project points onto that space. One Canvas can hold multiple Cartesian axes, and each Cartesian object can hold multiple Marks. This will be demonstrated more later.

As you will see below, I end many toytree drawing commands with a semicolon (`;`). This simply hides the printed text showing the objects that were returned. The Canvas will automatically render below the cell even if you do not save the returned Canvas as a variable. Below I do not use a semicolon, and so the three returned objects are shown as text (e.g., `<toyplot.canvas.Canvas...>`) and the plot is displayed.

In [None]:
rtre.draw()

In [None]:
# the semicolon hides the printed text of the returned Canvas, Cartesian, and Mark objects
rtre.draw();

In [None]:
# or, we can store them as variables (this allows more editing on them later)
canvas, axes, mark = rtre.draw()

### Drawing trees: styles
There are countless ways to style ToyTree drawings. `toytree` also provides a number of pre-built `tree_style` types (normal, dark, coalescent, multitree), and users can create their own style dictionaries that can be easily reused. Below are some examples.

In [None]:
# drawing with pre-built `tree_style`s (you can also use `ts` as a shortcut)
rtre.draw(tree_style='n');  # normal-style
rtre.draw(tree_style='d');  # dark-style
rtre.draw(ts='o');          # umlaut-style

In [None]:
# define a custom style dictionary
mystyle = {
    "layout": 'd',
    "edge_type": 'p',
    "edge_style": {
        "stroke": toytree.color.COLORS1[2],
        "stroke-width": 2.5,
    },
    "tip_labels_align": True, 
    "tip_labels_colors": toytree.color.COLORS2[0],
    "tip_labels_style": {
        "font-size": "10px"
    },
    "node_labels": False,
    "node_sizes": 8,
    "node_colors": toytree.color.COLORS1[2],
}

In [None]:
# use your custom style dictionary in one or more tree drawings
rtre.draw(height=400, **mystyle);
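
Because a style is just a dictionary expanded into keyword arguments with `**`, variants can be derived from a base style using ordinary dictionary unpacking. The sketch below uses plain Python only; the names `base_style`, `big_nodes`, and `thin_edges`, and the color value, are illustrative assumptions rather than anything defined by toytree.

```python
# a small base style using the same keys as the custom dictionary above
base_style = {
    "tip_labels_align": True,
    "node_sizes": 8,
    "edge_style": {"stroke": "black", "stroke-width": 2.5},
}

# override one top-level key; the base dictionary is left unchanged
big_nodes = {**base_style, "node_sizes": 12}

# nested dictionaries are replaced whole, so merge the inner one explicitly
thin_edges = {
    **base_style,
    "edge_style": {**base_style["edge_style"], "stroke-width": 1},
}

print(big_nodes["node_sizes"], base_style["node_sizes"])
print(thin_edges["edge_style"])
```

Either variant could then be passed to a drawing call in the same way as the original dictionary, for example `rtre.draw(**big_nodes)`.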

### Node data

As we saw briefly before, ToyTrees can store additional data (which we term *features*) on Nodes of a tree. These features can be parsed directly from the input data (e.g., a newick file), as in the earlier example where we loaded either 'support' or 'name' data on internal nodes. We can also create and add *any* arbitrary data to Nodes on a tree, and then use those data for visualization or analyses.



In [None]:
# data associated with tree `rtre`, loaded from the newick string
rtre.get_node_data()

We can add a new feature to this tree using the `set_node_data` function, which can assign specific values to individual Nodes or a single value as a default to all Nodes. Let's create a feature called 'color' that will take string values. 




In [None]:
# returns a copy of the tree with new data added to Nodes
color_tree = rtre.set_node_data(
    feature="color", 
    mapping={i: 'green' for i in (0, 1, 13, 24)},
    default="red",
)

In [None]:
# see that 'color' is now present in the tree data.
color_tree.get_node_data()

In [None]:
# you can fetch just the color data by passing the feature name to .get_node_data()
color_tree.get_node_data("color")

In [None]:
# and you can pass this as an argument to .draw()
color_tree.draw(
    node_sizes=8,
    node_colors=color_tree.get_node_data("color"),
    node_mask=False,
);

This design in `toytree`, in which data are assigned to Nodes on a tree and then extracted from the tree to enter as arguments when plotting, is much less error-prone than simply entering a list of colors or other values. With a bare list you may be uncertain about the order in which values should be entered, e.g., from root to tips or from tips to root (we'll cover much more on traversal orders later). Storing data on Nodes removes that ambiguity.
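
The ordering problem can be sketched in plain Python on a tiny hand-built tree. The `children` dictionary and the `preorder` and `postorder` helpers are illustrative assumptions, not part of the toytree API. The same positional list of colors lands on different nodes depending on the traversal order, whereas a mapping keyed by node is unambiguous.

```python
# a tiny tree: each node name maps to the list of its children
children = {"root": ["A", "c"], "A": ["a", "b"], "a": [], "b": [], "c": []}

def preorder(node):
    """Visit a node before its descendants (root to tips)."""
    yield node
    for child in children[node]:
        yield from preorder(child)

def postorder(node):
    """Visit a node after its descendants (tips to root)."""
    for child in children[node]:
        yield from postorder(child)
    yield node

# a positional list: which node is meant to be green?
colors = ["green", "red", "red", "red", "red"]
by_preorder = dict(zip(preorder("root"), colors))
by_postorder = dict(zip(postorder("root"), colors))
print(by_preorder)
print(by_postorder)

# a mapping keyed by node, with a default, does not depend on any order
mapping = {"root": "green"}
by_mapping = {node: mapping.get(node, "red") for node in children}
print(by_mapping)
```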


### Subpackages

In addition to visualization, `toytree` has many other uses for working with trees. Many of these functions are located in subpackages such as `toytree.distance`, `toytree.rtree`, and `toytree.mod`. These are briefly demonstrated below.


In [None]:
# distance: various measures of distances between Nodes or Trees
toytree.distance.get_treedist_rf(rtre, tre)
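
For readers unfamiliar with the Robinson-Foulds (RF) distance, the following is a minimal plain-Python sketch of the unrooted version: it counts the non-trivial bipartitions of the tips that appear in one tree but not the other. Trees are written here as nested tuples of tip names, and the helper names `clades`, `bipartitions`, and `rf_distance` are assumptions for illustration. This is not toytree's implementation, and its options or normalization may differ.

```python
def clades(tree, found):
    """Return the tips below `tree`, recording each internal clade in `found`."""
    if isinstance(tree, str):
        return frozenset([tree])
    tips = frozenset().union(*(clades(child, found) for child in tree))
    found.add(tips)
    return tips

def bipartitions(tree):
    """Return the set of non-trivial tip bipartitions implied by the tree."""
    found = set()
    all_tips = clades(tree, found)
    splits = set()
    for clade in found:
        other = all_tips - clade
        # skip splits with a single tip (or nothing) on one side
        if len(clade) > 1 and len(other) > 1:
            splits.add(frozenset([clade, other]))
    return splits

def rf_distance(tree1, tree2):
    """Count the bipartitions found in exactly one of the two trees."""
    return len(bipartitions(tree1) ^ bipartitions(tree2))

tree_x = ((("a", "b"), "c"), ("d", "e"))
tree_y = ((("a", "c"), "b"), ("d", "e"))
print(rf_distance(tree_x, tree_x), rf_distance(tree_x, tree_y))
```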

In [None]:
# rtree: generate random trees under a number of generative methods
random_tree = toytree.rtree.bdtree(ntips=10)
random_tree.draw();

In [None]:
# mod: modify tree relationships or features -- this scales the root height to 1000
random_tree2 = random_tree.mod.edges_scale_to_root_height(1000)
random_tree2.draw(scale_bar=True);
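
Conceptually, scaling a tree to a target root height amounts to multiplying every edge length by the same factor. The plain-Python sketch below shows this on a small hand-built tree stored as `parent` and `length` dictionaries; these structures and the function names are illustrative assumptions and not how toytree stores or modifies trees.

```python
# each node maps to its parent; the root has no entry
parent = {"a": "A", "b": "A", "A": "root",
          "d": "B", "e": "B", "c": "C", "B": "C", "C": "root"}
# length of the edge above each node
length = {"a": 1, "b": 1, "A": 3, "c": 3, "d": 1, "e": 1, "B": 2, "C": 1}

def root_distance(node, lengths):
    """Sum the edge lengths on the path from `node` up to the root."""
    total = 0.0
    while node in parent:
        total += lengths[node]
        node = parent[node]
    return total

def tree_height(lengths):
    """Return the largest root-to-node distance in the tree."""
    return max(root_distance(node, lengths) for node in lengths)

def scale_to_height(lengths, target):
    """Return new edge lengths scaled so the tree height equals `target`."""
    factor = target / tree_height(lengths)
    return {node: dist * factor for node, dist in lengths.items()}

scaled = scale_to_height(length, 1000)
print(tree_height(length), tree_height(scaled))
```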

### Conclusion

This was a very brief introduction to the `toytree` Python package. See the complete documentation for further details on the methods shown here, as well as many more. 