# Object-oriented programming and the ETE3 package

Author: Greg Wray  
2025-APR-08

## A very brief introduction to object-oriented programming

Many modern programming languages incorporate an object-oriented paradigm (**OOP**), including Python, R, Java, and C++. A central concept in OOP is treating data and functions as **objects**. Rather than memorizing the computer science definition, a useful way to think about an object is that data values are associated with specific properties and uses. 

For example, `my_var = [5, 2]` stores more than two numbers in memory. Conceptually, the list object records its data type (list), its length, and a reference to each item in order; each item is itself an object that records its own data type and value. (The exact memory layout is an implementation detail of the Python interpreter.) This information is bundled into a single object with a consistent internal organization. Behind the scenes, Python understands that organization: it knows where to find the type information and the items, and it only allows operations consistent with the properties of an integer (in this case). For example, it knows that these values cannot be divided by zero and that they can be added to another number but not to a string. In addition, Python uses the length value to check indexes and to determine how many times to iterate, for instance in a list comprehension.

A list is a fairly straightforward object, but the real strength of objects comes with the ability to create a **class**: a definition of a customized object that can include multiple data types and data structures, associated meta-data, rules of use, and dedicated functions and methods. When you create a data object using a class definition, it is called an **instance** of the class. A class definition always includes a constructor function to create instances, and typically also includes additional methods and functions designed to work specifically with instances of that class. For example, `DataFrame` is a class that defines certain properties; when you call the `DataFrame()` constructor function it creates an instance with access to all the attributes, functions, and methods that this class provides. You could produce a data frame using other approaches, but it would not have access to these attributes, functions, and methods; you would need to write code yourself to find (for example) its dimensions.
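
As a minimal sketch of the class and instance ideas described above, the hypothetical class below (`DnaSequence` is invented for illustration and is not part of any library) bundles a data field with a method that only makes sense for that kind of data.

```python
class DnaSequence:
    """A hypothetical class that bundles a DNA string with related methods."""

    def __init__(self, name, bases):
        # the constructor runs whenever a new instance is created
        self.name = name
        self.bases = bases.upper()

    def gc_content(self):
        # fraction of bases that are G or C
        gc = sum(1 for base in self.bases if base in "GC")
        return gc / len(self.bases)


seq = DnaSequence("example", "atgcgc")   # create an instance of the class
print(type(seq).__name__)                # DnaSequence
print(seq.gc_content())                  # method defined by the class
```

A plain string holding the same bases would not have a `gc_content()` method; you would need to write that calculation separately each time.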

Less obviously, functions are also objects in most OOP languages. In computer science terminology, functions are first-class entities. This means, for example, that a function can be passed as input to another function (or even to itself), which can be very useful. 
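
The short sketch below illustrates what "first-class" means in practice. The names `apply_to_each` and `square` are made up for this example; `len` and `abs` are Python built-ins.

```python
def apply_to_each(func, values):
    # func is an ordinary argument that happens to be a function
    return [func(v) for v in values]


def square(x):
    return x * x


print(apply_to_each(square, [1, 2, 3]))      # [1, 4, 9]
print(apply_to_each(len, ["Mus", "Bos"]))    # [3, 3]

# functions can also be stored in variables and data structures
operations = {"square": square, "absolute": abs}
print(operations["absolute"](-5))            # 5
```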

A key OOP concept is **inheritance**, which is the ability to create new classes that incorporate the features of an existing class (data fields, attributes, and methods) while adding new capabilities. For example, consider the situation where you create a class to represent a phylogenetic tree, but later want to create a class that adds branch lengths. Inheritance allows you to keep the existing class, while creating a new one with the original features but adding a new data field and associated methods to work with it.
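
The tree-with-branch-lengths scenario can be sketched in a few lines. Both classes here (`SimpleTree` and `LengthTree`) are hypothetical and far simpler than anything in ETE3; they are meant only to show the mechanics of inheritance.

```python
class SimpleTree:
    """Hypothetical base class: a node with a name and a list of children."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def leaf_names(self):
        # a node without children is a leaf
        if not self.children:
            return [self.name]
        names = []
        for child in self.children:
            names.extend(child.leaf_names())
        return names


class LengthTree(SimpleTree):
    """Hypothetical sub-class that adds a branch length to every node."""

    def __init__(self, name, length=1.0, children=None):
        super().__init__(name, children)   # reuse the parent constructor
        self.length = length               # new data field

    def total_length(self):
        # new method: sum of branch lengths in this sub-tree
        return self.length + sum(c.total_length() for c in self.children)


tree = LengthTree("root", 0.0, [
    LengthTree("A", 4.0),
    LengthTree("X", 0.5, [LengthTree("B", 1.0), LengthTree("C", 2.0)]),
])
print(tree.leaf_names())     # inherited method: ['A', 'B', 'C']
print(tree.total_length())   # new method: 7.5
```

Note that `SimpleTree` itself is unchanged, so any code written for it keeps working.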

OOP languages also use data **abstraction**. This means that only certain parts of the data within an object are visible to functions (also known as data hiding). This helps reduce errors, as it prevents functions from accessing information that is not relevant but that the programmer might mistakenly try to use. In addition, OOP uses **encapsulation** to hide the details of how data is organized and manipulated within a class from the broader programming environment. For example, consider the inheritance situation described in the previous paragraph. Creating a new sub-class involves writing a new class definition and adding new methods. Encapsulation means that these changes won't break existing code that uses the original class.
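
A small sketch of encapsulation follows, using two hypothetical classes (`LeafCounter` and `LeafCounterV2`). Note that Python does not strictly enforce data hiding; the leading underscore is only a convention that signals "internal, do not rely on this".

```python
class LeafCounter:
    """Hypothetical class whose internal storage is hidden behind a method."""

    def __init__(self, names):
        # leading underscore: a Python convention for internal data
        self._names = list(names)

    def count(self):
        return len(self._names)


class LeafCounterV2(LeafCounter):
    """Same public interface, different internal storage (a set)."""

    def __init__(self, names):
        self._names = set(names)   # duplicates are now dropped internally


def report(counter):
    # existing code relies only on the public method count()
    return f"{counter.count()} leaves"


print(report(LeafCounter(["A", "B", "C"])))     # 3 leaves
print(report(LeafCounterV2(["A", "B", "B"])))   # 2 leaves
```

Because `report()` never touches `_names` directly, it works with either class even though the internal organization differs.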

## Set-up for hands-on session

Run these code blocks to install the ete3 library and then load it for use. 

In [None]:
pip install ete3

In [None]:
import numpy as np
import ete3
from ete3 import *

## The ETE3 library

ETE is short for *Environment for Tree Exploration*; ETE3 is version 3 of the toolkit. The ETE3 library provides classes and functions for estimating, representing, manipulating, analyzing, and displaying trees and associated data. Access the website [here](https://etetoolkit.org).

This notebook provides a very basic introduction to ETE3. The goals are: (1) to show how ETE3 works with tree topology and associated data and (2) to demonstrate the value of object-oriented programming paradigms for solving real biological data analysis problems. There are many capabilities of ETE3 that this notebook does *not* cover, the two most important areas being tree reconstruction and data modeling and analysis.  

**Motivation.** Trees are used to represent a wide range of information in biology. The most common application of trees in biology is to represent *phylogenetic history*, including relationships among taxa at any level, trait evolution, ancestral state reconstruction, the history of gene duplications and losses, and more. Even if you are not an evolutionary biologist, trees can be very useful, as they are frequently used to represent *overall similarity* in just about anything of interest to a biologist, including morphologies, ecosystems, experimental treatments, and much more. 

**Tree vocabulary.** Trees consist of a set of nodes connected by branches. Each node can have 0 or more child nodes (by convention "below" the node) and at most 1 parent node (by convention "above" it). Nodes with 0 child nodes are called leaf nodes, while nodes with 1 or more child nodes are internal nodes. A tree may or may not include a root node; when present, it is the only node with 0 parent nodes; all other nodes have exactly 1 parent.

In [None]:
# why does the ETE3 logo have a silhouette from the 1982 movie E.T.?
t = Tree("(A,(B,(E,D)));")
t.phonehome()

### ETE3 data structures

The class that ETE3 uses to represent trees is `TreeNode`, which is also available under the shorter name `Tree`. Because every node is essentially a tree in its own right, a single class serves both purposes; however, it's more explicit to use `Tree` when referring to the entire tree, and to think of a node as representing a sub-tree. In keeping with the OOP paradigm, this class defines several data fields, in this case the topology of the tree, leaf node names, branch lengths, and node support, as well as attributes such as the number of nodes and the maximum and minimum depth of the tree. It also allows data of almost any kind to be attached to nodes, including single values, trait matrices, images, and DNA or protein sequences.

ETE3 also provides several classes that use the OOP concept of inheritance: they have the properties of the `Tree` base class, but include additional attributes and methods to work with specific kinds of evolutionary trees. The class `PhyloTree` represents phylogenies of taxa, with nodes explicitly considered to be ancestors. The class `EvolTree` represents phylogenies of sequences rather than taxa, while the related `SeqGroup` class stores the sequences themselves.

ETE3 uses the OOP principles of abstraction and encapsulation to create data objects that are designed to work mostly or entirely behind the scenes. For example, the `.children` attribute of a node holds a list of the nodes directly below it; however, the items in the list are not node names but node objects, defined by a class whose internals most programmers never need to work with directly.

### Creating trees
ETE3 provides several ways to create a `Tree` data object.

**Create a tree from scratch.**  It is possible to build a tree entirely by hand. Although this is uncommon, it can be very useful for learning how to manipulate the topology and metadata of a `Tree` object.

In [None]:
# build a tree node by node
t = Tree()                      # create an empty tree
A = t.add_child(name='A')       # adds a node at the root
B = t.add_child(name='B')       # adds a node at the root
C = A.add_child(name='C')       # adds a node as daughter to existing node A
D = C.add_sister(name='D')      # adds node as sister to existing node C
E = A.add_child(name='E')       # adds another daughter to existing node A (trifurcations are allowed)
print(t)

In [None]:
# check the data type of your tree
type(t)

When displaying a tree using `print()`, the default is not to show names of internal nodes. This behavior can be modified as shown in the example below.

In [None]:
# display names of internal branches
print(t.get_ascii(show_internal=True))

**Create a random tree.** ETE3 provides `populate()` as a mechanism to quickly produce trees with random topology. This can be useful for permutation tests or just to generate a tree to practice with.

In [None]:
# create an empty tree and then add 9 random leaves with names
t = Tree()
leaf_names = ['Mus', 'Bos', 'Cervus', 'Phoca', 'Micropterus', 'Physeter', 'Homo', 'Panthera', 'Felis']
t.populate(9, names_library = leaf_names)
print(t)

**Create a tree from Newick definition string.** In most cases, `Tree` instances are created using the **Newick** format. This is one of the most widely used formats for representing trees. Newick format is simple and flexible. It uses parentheses and commas separating leaf names to represent topology; a semi-colon indicates the end of the specification. Optionally, Newick format can also include names for internal nodes, as well as lengths and support for branches leading to each node; the last two fields are always present for every node in an instance of the `Tree` class, with default values of 1.0.

To create small trees, you can type your own string or copy+paste an existing one. 

In [None]:
# create a tree using a simple Newick definition string
t = Tree('(A,(B,(E,D)));')
print(t)

In [None]:
# create a tree with branch lengths using a Newick definition string
t = Tree('(A:4,(B:1,(E:1,D:3):0.5):0.5);')
print(t)

Note that displaying a tree using `print()` does not show branch lengths, even when they are specified to be different from 1.0. To show branch lengths, see **Viewing trees**, below.

To create a tree with additional attributes, we need to tell the `Tree` constructor function to accept a different input format. In this case, the optional argument `format=1` indicates a standard Newick tree with named internal nodes. For a full list of Newick tree formats that ETE3 accepts, see the documentation.

In [None]:
# create a tree with named internal nodes
t = Tree('((((H,K)D,(F,I)G)B,E)A,((L,(N,Q)O)J,(P,S)M)C);', format=1)
print(t.get_ascii(show_internal=True))

**Create a tree from a file.** For larger trees, typing or copy+pasting strings quickly becomes impractical. The `Tree` constructor function can take a file name and path as an argument. The input file must be a plain-text file that contains a single string in Newick format. To save a `Tree` data object as a file, ETE3 provides the `write()` method. The ability to read and write Newick formats provides a way to save and share `Tree` instances as plain text files.

In [181]:
# create a Tree object from a file in Newick format
t1 = Tree('newick_example.nw')
print(t1)


      /-6669.DappuP312785
     |
     |         /-7739.JGI126010
     |      /-|
     |     |   \-7739.JGI126021
     |     |
     |   /-|      /-45351.NEMVEDRAFT_v1g217973-PA
   /-|  |  |   /-|
  |  |  |  |  |  |   /-6085.XP_002167371
  |  |  |  |  |   \-|
  |  |  |   \-|      \-10228.TriadP54105
  |  |  |     |
  |  |  |     |   /-7668.SPU_016633tr
  |  |  |      \-|
  |   \-|         \-7668.SPU_013365tr
  |     |
  |     |      /-51511.ENSCSAVP00000011400
  |     |   /-|
  |     |  |   \-7719.ENSCINP00000035803
  |     |  |
  |     |  |   /-7757.ENSPMAP00000006833
  |     |  |  |
  |      \-|  |      /-7955.ENSDARP00000103018
  |        |  |     |
  |        |  |   /-|   /-8049.ENSGMOP00000003903
  |        |  |  |  |  |
  |        |  |  |  |  |      /-8128.ENSONIP00000009923
  |        |  |  |   \-|   /-|
  |         \-|  |     |  |  |   /-69293.ENSGACP00000005927
  |           |  |     |  |   \-|
  |           |  |     |  |     |   /-99883.ENSTNIP00000008530
  |           |  |   

In [None]:
# create a small tree object and then save it as a plain text file
t = Tree("(A:1,(B:1,(E:1,D:3):0.5):0.5);")
t.write(outfile = 'my_little_tree.nw')
print(t)

In [None]:
# read back your file into a Tree object and display
t = Tree('my_little_tree.nw')
print(t)

### Creating pointers to nodes

To work with specific nodes, we first need to create a **pointer** that tells ETE3 what position in the tree to refer to or operate on. This seems like a redundant step, since the node already has a label, but it is useful for two reasons. First, it allows abstraction (e.g., assignment of different nodes to the same identifier during program execution). Second, a pointer is not just a string, but an instance of a class with attributes and methods. Once we define a pointer, we can extract information about the node, its parent node, and the sub-tree that it defines. We can also use the pointer to delete the node or sub-tree or use it as an insertion point for adding a node (see Manipulating tree structure, below).

In [None]:
# create a tree with named internal nodes
t5 = Tree('((((H,K)D,(F,I)G)B,E)A,((L,(N,Q)O)J,(P,S)M)C);', format=1)
print(t5.get_ascii(show_internal=True))

In [None]:
# create pointers to specific nodes
D = t5.search_nodes(name='D') [0]
B = t5.search_nodes(name='B') [0]
J = t5.search_nodes(name='J') [0]
print(D, '\n', B, '\n', J)

### Getting information about trees and nodes

ETE3 provides several useful attributes and methods to extract information about trees and nodes. Most of these apply equally to an entire tree and to any node within it. For instance:
 * `.name` returns the label of the node
 * `.describe()` prints a brief summary of the sub-tree
 * `.up` returns the parent node (`None` for the root)
 * `.children` returns a list of daughter nodes (empty if none)
 * `.dist` returns the distance to the parent node (default = 1.0)
 * `.support` returns a value representing support for the node (default = 1.0)

The examples below just scratch the surface of the methods available. 

In [None]:
# create a Tree object from a Newick string to work with
t5 = Tree('((((H,K)D,(F,I)G)B,E)A,((L,(N,Q)O)J,(P,S)M)C);', format=1)
print(t5.get_ascii(show_internal=True))

In [None]:
# create pointers for all internal nodes
D = t5.search_nodes(name='D') [0]
G = t5.search_nodes(name='G') [0]
B = t5.search_nodes(name='B') [0]
A = t5.search_nodes(name='A') [0]
M = t5.search_nodes(name='M') [0]
O = t5.search_nodes(name='O') [0]
J = t5.search_nodes(name='J') [0]
C = t5.search_nodes(name='C') [0]

In [None]:
# test for root
t5.is_root()

In [None]:
# test for root
D.is_root()

In [None]:
# return the parent node
D.up

In [None]:
# return a list of direct child nodes
J.children [:]

In [None]:
# return summary information about a node and its sub-tree
J.describe()

In [None]:
# return summary information about the entire tree
t5.describe()

### Copying trees

As with mutable objects generally in Python, assigning an existing instance of `Tree` to a new variable name creates an alias that points to the original object. Use the `.copy()` method to create a true copy. Everything in standard Newick format will be copied (topology, leaf node names, branch lengths, and node support). If additional data are attached to nodes, however, they may not be copied using the default setting. To ensure that *all* attached data are faithfully copied, set the optional `method` parameter to `'deepcopy'` (takes longer, but ensures complete copying).

In [None]:
# create a tree to work with
t5 = Tree('example.nw')

In [None]:
# create a copy, then check that the labels, branch lengths, and node support were copied 
duplicated_tree = t5.copy()
duplicated_tree.write(outfile = 'duplicated_tree.nw')

In [None]:
# create a deep copy of a tree (not needed in this case)
duplicated_tree = t5.copy(method='deepcopy')

### Manipulating tree structure

ETE3 provides several methods for manipulating the structure of an existing tree. It is possible to **delete** a single node with or without removing its child nodes. It is also possible to **detach** a leaf or internal node with its child nodes for further use. Finally, it is possible to **add** new nodes at any position. Adding a detached node is analogous to a cut-and-paste operation. 

Note that the operations in the examples below occur *in place*; to preserve the original tree, first make a copy with `.copy()` and work on the copy.

In [None]:
# create a tree to work with
t5 = Tree('((((H,K)D,(F,I)G)B,E)A,((L,(N,Q)O)J,(P,S)M)C);', format=1)
D = t5.search_nodes(name='D') [0]
B = t5.search_nodes(name='B') [0]
J = t5.search_nodes(name='J') [0]
print(t5.get_ascii(show_internal=True))

**Deleting single nodes.** The `delete()` method removes a single node. If the node is a leaf, it simply disappears; if it is an internal node, its child nodes are reconnected to the deleted node's parent.

In [None]:
# delete node D
D.delete()
print(t5.get_ascii(show_internal=True))

**Deleting sub-trees.** The `detach()` method removes a node together with all of its descendants and returns the detached sub-tree, so it can be assigned to an identifier for later use.

In [None]:
# regenerate the tree and pointers
t5 = Tree('((((H,K)D,(F,I)G)B,E)A,((L,(N,Q)O)J,(P,S)M)C);', format=1)
D = t5.search_nodes(name='D') [0]
B = t5.search_nodes(name='B') [0]
J = t5.search_nodes(name='J') [0]
print(t5.get_ascii(show_internal=True))

In [None]:
# delete a sub-tree and assign to an identifier for later use
my_subtree = J.detach()
print(t5.get_ascii(show_internal=True))

In [None]:
# take a look at the deleted sub-tree
print(my_subtree.get_ascii(show_internal=True))

**Add a detached node to a tree.**  Any existing node object can be added to a tree. The default position is the root, but you can use a node pointer to indicate another site of attachment.

In [None]:
# re-attach a deleted node at the root
t5.add_child(my_subtree)
print(t5.get_ascii(show_internal=True))

In [None]:
# move the sub-tree to a specified location (detach it first so it has only one parent)
my_subtree = B.add_child(my_subtree.detach())
print(t5.get_ascii(show_internal=True))

**Attach a new node.** Adding a new node is similar, but requires that you provide a name. 

In [None]:
# regenerate the original tree and add a single node at the root 
t5 = Tree('((((H,K)D,(F,I)G)B,E)A,((L,(N,Q)O)J,(P,S)M)C);', format=1)
t5.add_child(name='zebra')
print(t5.get_ascii(show_internal=True))

In [None]:
# to add a single node at a specified location, first create a pointer to it
B = t5.search_nodes(name='B') [0]
okapi = B.add_child(name='okapi')
print(t5.get_ascii(show_internal=True))

### Viewing trees

ETE3 provides tools for rendering trees. The interface is somewhat clunky but it gets the job done.

To use these features, you first need to generate a style definition object using the `TreeStyle()` constructor function. Once created, the definition can be updated. To plot the tree, use `.show()` and pass the style object.

In [None]:
# generate a tree
t7 = Tree('((((H,K)D,(F,I)G)B,E)A,((L,(N,Q)O)J,(P,S)M)C);', format=1)

In [None]:
# generate a style definition, then plot
ts = TreeStyle()
ts.show_leaf_name = True
ts.mode = "r"                 # rectangular plot
t7.show(tree_style=ts)

In [None]:
# plot a larger tree 
t7 = Tree('example.nw')
t7.show(tree_style=ts)

In [None]:
# plot again in circular form
ts = TreeStyle()
ts.show_leaf_name = True
ts.mode = "c"                 # circular plot
ts.arc_start = 270            # start plotting at 12 o'clock
ts.arc_span = 90              # spread branches over 90 degrees
t7.show(tree_style=ts)