# Enum Subpackage Documentation

## Bipartitions

   """Generator of bipartitions (Nodes on either side of edges).

    Bipartitions represent the splits in a tree. Many algorithms compare
    tips (or internal Nodes) on either side of each split to compute
    metrics on trees. This function aims to provide a flexible and fast
    framework for yielding bipartitions in various formats.

    Notes
    -----
    - Bipartitions are generated in Node idx traversal order.
    - Bipartitions are formatted as a tuple of two items, each of
    which is referred to as a partition.
    - The order of partitions, e.g. (part1, part2) can be toggled using
    the argument `sort`.
    - The type used to represent a partition can be toggled using the
    argument `type`. Common formats are `set` or `tuple`.

    Parameters
    ----------
    feature: str
        Feature to return to represent Nodes on either side of a
        bipartition. Default is "name". None will return Node objects.
        Any other Node feature, such as "idx" can also be used. Note
        the feature arg does not affect the order in which partitions
        or bipartitions are returned/sorted (see `sort` argument below).
    include_singleton_partitions: bool
        If True then singleton splits (e.g., (A | B,C,D)) are included
        in the result. By default these are excluded since it is
        implicit that one exists for every tip Node in a tree.
    include_internal_nodes: bool
        Default is to only show tip Nodes on either side of a
        bipartition, but internal Nodes can be included as well. In
        this case the results are easier to interpret if the returned
        values have unique features (e.g., feature=None or 'idx').
    type: Callable
        The type of collection used to represent a partition. Default
        is `set` to return a tuple of sets, but another useful option
        is `tuple`, which returns a tuple of tuples. The latter
        collection can be converted into a set of bipartitions.
    sort: bool
        If False, bipartitions are returned in (child, parent) order
        given the topology and rooting in Node idx order traversal. If
        sort=True, bipartitions are instead always sorted first by len,
        e.g., (fewer, longer) and if the same len, then next by the
        lowest alphanumeric tip name, e.g., ({'a', 'b'}, {'c', 'd'}).
        If the requested partition `type` is sortable (i.e., not a set)
        then items within a partition are also consistently sorted.


### Example - Expressing bipartitions in dataframes

In this example, a `ToyTree` is parsed from a simple Newick string and split into bipartitions, which are then collected into a list and printed.

In [None]:
import toytree

In [None]:
#draw initial tree
tree = toytree.tree("(a,b,((c,d)CD,(e,f)EF)X)AB;")
tree.draw()

(<toyplot.canvas.Canvas at 0x22f43df4740>,
 <toyplot.coordinates.Cartesian at 0x22f440e73b0>,
 <toytree.drawing.src.mark_toytree.ToyTreeMark at 0x22f43926e70>)

In [None]:
# collect the bipartitions into a list and print them
bipartitions = tree.enum.iter_bipartitions(feature=None)
bipartitions_list = list(bipartitions)
print(bipartitions_list)

[({<Node(idx=2, name='c')>, <Node(idx=3, name='d')>}, {<Node(idx=0, name='a')>, <Node(idx=5, name='f')>, <Node(idx=1, name='b')>, <Node(idx=4, name='e')>}), ({<Node(idx=5, name='f')>, <Node(idx=4, name='e')>}, {<Node(idx=0, name='a')>, <Node(idx=2, name='c')>, <Node(idx=1, name='b')>, <Node(idx=3, name='d')>}), ({<Node(idx=5, name='f')>, <Node(idx=2, name='c')>, <Node(idx=3, name='d')>, <Node(idx=4, name='e')>}, {<Node(idx=0, name='a')>, <Node(idx=1, name='b')>})]
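The effect of `sort` can be illustrated with a minimal pure-Python sketch. This is not toytree's implementation: it assumes a tree written as nested tuples of tip names (a hypothetical stand-in for a `ToyTree`), and the helper names `tip_sets` and `bipartitions` are illustrative only.

```python
def tip_sets(tree):
    """Return (tips under `tree`, list of tip sets for every internal subtree)."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    tips, clades = frozenset(), []
    for child in tree:
        child_tips, child_clades = tip_sets(child)
        tips |= child_tips
        clades.extend(child_clades)
        if not isinstance(child, str):
            clades.append(child_tips)
    return tips, clades


def bipartitions(tree, sort=True):
    """List non-singleton bipartitions as tuples of two name-sorted tuples."""
    all_tips, clades = tip_sets(tree)
    seen, result = set(), []
    for below in clades:
        above = all_tips - below
        key = frozenset([below, above])
        # skip singleton splits and splits already reported
        if len(below) < 2 or len(above) < 2 or key in seen:
            continue
        seen.add(key)
        parts = [tuple(sorted(below)), tuple(sorted(above))]
        if sort:
            # shorter partition first, ties broken by lowest tip name
            parts.sort(key=lambda part: (len(part), part))
        result.append(tuple(parts))
    return result


# same topology as "(a,b,((c,d)CD,(e,f)EF)X)AB;"
nested = ("a", "b", (("c", "d"), ("e", "f")))
for bipart in bipartitions(nested, sort=True):
    print(bipart)
```

With `sort=False` the sketch keeps the (child, parent) order, so the last split is reported with the four-tip clade first.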


## Quartets

Generators to sample quartets of tips from a tree.

The primary function `iter_quartets` can be used as a generator to
yield quartet subtrees from a larger tree. This function is quite
fast and includes options for sorting the output, and transforming
its format to return Node objects, names, or any arbitrary feature
of Nodes. See examples.

Methods
-------
Get fast unordered sets of all combinations of 4 tip Nodes in a tree
>>> tree.enum._iter_unresolved_quartet_sets()   # {0, 1, 2, 3}, ...

Get name-ordered tuples of Nodes for each quartet induced by bipartitions in a tree.
>>> tree.enum.iter_quartets()                   # ((0, 1), (2, 3)), ...

See Also
--------
Get number of quartets induced by the splits in a tree.
>>> tree.enum.get_n_quartets()                  # 5

Format
------
Quartets represent a sample from a bipartition or quadripartition
where there is a split, e.g. `AB|CD`, separating two sets of items.
The order of the items within each partition of the quartet is not
often of interest, but it is nice to have a consistent sort option in
case it is useful.

Supported:
- ({'a', 'b'}, {'c', 'd'})  # type=set, collapse=False; sort affects order of p1,p2
- (('a', 'b'), ('c', 'd'))  # type=tuple, collapse=False; sort affects order of p1,p2 and within each p
- ('a', 'b', 'c', 'd')      # type=tuple, collapse=True; same as above, imagine middle split is still there.

Not supported:
- ({'a', 'b', 'c', 'd'})    # type=set, collapse=True; split info lost.
"""

=============================================================


Generator to yield quartets induced by edges in a tree.

    This yields all quartets (4-sample subtrees) that exist within
    a larger tree. The set of possible quartets is not affected by
    tree rooting, but is affected by collapsed edges (polytomies),
    which reduce the number of quartets.

    Quartets are returned as Tuple[Node, Node, Node, Node], or Tuple
    of the requested features of Nodes, where e.g. ('a', 'b', 'c', 'd')
    implies the quartet `ab|cd`. The order in which quartets are
    yielded depends on the topology and rooting, and is in Node idx
    traversal order, where the first two Nodes are below the edge, and
    the second two above. This can be changed to a consistent name
    sorted order for each split partition using `sort=True`.

    Parameters
    ----------
    feature: str
        Feature used to represent Nodes on either side of bipartitions.
        Default is "name". None will return Node objects. Other Node
        features can be used but be aware if using quartets to compare
        among trees that 'idx' changes depending on topology, and other
        features may not be unique among Nodes.
    type: Callable
        The type of collection used to represent a partition. Default
        is `set` to return a tuple of sets, but another useful option
        is `tuple`, which returns a tuple of tuples.
    sort: bool
        If False, quartets are returned with Nodes spanning edges as
        (below, below, above, above) in idx traversal order given the
        topology and rooting. If sort=True, partitions are instead
        always sorted alphanumerically within and between partitions.
    collapse: bool
        If True then quartets are returned as a single tuple, e.g.,
        (0, 1, 2, 3), else they are returned as a tuple of tuples,
        e.g., ((0, 1), (2, 3)). In either case, the induced split is
        implied to occur in the middle, e.g., 0,1 vs 2,3. Collapse arg
        cannot be combined with type=set.
    quadripartition: bool
        If True then quartets are only returned that are induced by
        quadripartite splits in the tree. This is a subset of the
        quartets induced by bipartitions, since the tip Nodes must come
        from four different clades from each edge/split.
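As a rough illustration of how quartets are induced by bipartitions, the sketch below pairs every two tips from one side of a split with every two tips from the other side. It is a simplified stand-in rather than toytree's code: the input is a hard-coded list of bipartitions, and the function name `iter_quartets_sketch` and its `collapse` handling are assumptions made for this example.

```python
from itertools import combinations


def iter_quartets_sketch(bipartitions, collapse=False):
    """Yield each unique quartet induced by a list of (part1, part2) splits."""
    seen = set()
    for part1, part2 in bipartitions:
        for pair1 in combinations(sorted(part1), 2):
            for pair2 in combinations(sorted(part2), 2):
                # sort between partitions so ab|cd and cd|ab are the same quartet
                quartet = tuple(sorted([pair1, pair2]))
                if quartet in seen:
                    continue
                seen.add(quartet)
                yield quartet[0] + quartet[1] if collapse else quartet


# non-singleton bipartitions of the tree "(a,b,((c,d)CD,(e,f)EF)X)AB;"
biparts = [
    (("c", "d"), ("a", "b", "e", "f")),
    (("e", "f"), ("a", "b", "c", "d")),
    (("a", "b"), ("c", "d", "e", "f")),
]
quartets = list(iter_quartets_sketch(biparts))
print(len(quartets), quartets[0])
```

Because this example tree is fully resolved, every combination of four tips appears exactly once; a polytomy would remove the quartets that no split resolves.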


## Quadripartitions




# Mod subpackage documentation

## Merging nodes

`merge_nodes()` takes a user-supplied merge method and selection method and uses them to discard at least one tip and one internal Node while keeping one child Node. The remaining child Node inherits its parent's distance.

The `merge_method=` argument has two kinds of possible inputs:  
1. A function that returns `True` if a node _should_ be merged
2. A feature name/value for which a Node will be merged if _all_ descendant leaves share the _same_ feature name/value  

The `selection_method=` argument takes a function that returns a _single_ `Node` from a collection of Nodes. By default, this uses the `min()` function, returning the lowest indexed Node. 
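The interplay of the two arguments can be sketched in plain Python. This is a hypothetical illustration only: it uses nested tuples of leaf names instead of a `ToyTree`, ignores edge lengths, and the function `merge_clades` is an invented name, not part of toytree.

```python
def leaf_names(tree):
    """Return the leaf names under a nested-tuple tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaf_names(child)]


def merge_clades(tree, merge_method, selection_method=min):
    """Replace every clade accepted by merge_method with one selected leaf."""
    if isinstance(tree, str):
        return tree
    if merge_method(tree):
        # keep a single representative chosen by the selection method
        return selection_method(leaf_names(tree))
    return tuple(
        merge_clades(child, merge_method, selection_method) for child in tree
    )


def same_name(clade):
    """Merge rule: all descendant leaves share the same name."""
    return len(set(leaf_names(clade))) == 1


nested = (("r0", "r1"), ("r1", "r1"), "r2")
merged = merge_clades(nested, same_name)
print(merged)
```

Only the clade whose leaves are all named `r1` is collapsed; the mixed clade `("r0", "r1")` is left untouched.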

In [None]:
import toytree  

tree = toytree.rtree.unittree(5, seed=123)
tree1 = tree.mod.add_internal_node_and_child("r1", name="r1")
# merge nodes with identical leaf names.
tree2 = tree1.mod.merge_nodes("name")
# more verbose example to do the same
merge_method = lambda x: len(set(x.iter_leaf_names())) == 1
tree2 = tree1.mod.merge_nodes(merge_method)
toytree.mtree([tree1, tree2]).draw();

AttributeError: 'TreeModAPI' object has no attribute 'merge_nodes'

# Distance Subpackage Documentation

# Distance & Dissimilarity Functions  




The toytree _.distance_ subpackage has two main purposes: (1) to provide the user with efficient methods to measure or describe paths between nodes in a tree, and (2) to provide many methods of describing dissimilarities between two trees. All dissimilarity metrics currently implemented are quantified by quartet and bipartition differences, which are explained in  [tree distances](/toytree/tree-distance/). 




## Node-level distances  

The functions provided to study node-level distances are generally provided as `get_` and `iter_` functions. `get_` functions return paths or distances as tuples, dictionaries, or matrices while `iter_` functions are iterable generators. All currently implemented node-level distance functions are shown with examples below.  

Distances can be described either as `patristic distance` (the default), which is the sum of the lengths of the edges on the shortest path between two nodes, or as `topological distance`, which is simply the number of edges separating two nodes. For topological distance, use `topology_only=True`.
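The difference between the two distance types can be shown with a small stand-alone sketch. It does not use toytree: the tree is a hypothetical child-to-parent dictionary, and `node_path` and `node_distance` are illustrative names that only mimic the idea of the functions described in this section.

```python
# child -> (parent, edge length); the root has no entry
PARENTS = {
    "a": ("x", 1.0),
    "b": ("x", 2.0),
    "x": ("root", 0.5),
    "c": ("root", 3.0),
}


def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = [node]
    while path[-1] in PARENTS:
        path.append(PARENTS[path[-1]][0])
    return path


def node_path(node1, node2):
    """Return the nodes connecting node1 and node2, including both ends."""
    up1, up2 = path_to_root(node1), path_to_root(node2)
    on_path2 = set(up2)
    # the common ancestor is the first node on path 1 that is also on path 2
    mrca = next(node for node in up1 if node in on_path2)
    return up1[: up1.index(mrca) + 1] + up2[: up2.index(mrca)][::-1]


def node_distance(node1, node2, topology_only=False):
    """Sum edge lengths (patristic) or count edges (topological) on the path."""
    total = 0
    path = node_path(node1, node2)
    for first, second in zip(path, path[1:]):
        # the edge belongs to whichever of the two nodes is the child
        child = first if PARENTS.get(first, (None,))[0] == second else second
        total += 1 if topology_only else PARENTS[child][1]
    return total


print(node_path("a", "c"))
print(node_distance("a", "c"), node_distance("a", "c", topology_only=True))
```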

### Node Paths

In [1]:
import toytree

#generate random topology with 16 tips
tree = toytree.rtree.rtree(ntips=16)

#draw to show all internal nodes
tree.draw(ts='s', tip_labels=False, node_labels='idx');

`get_node_path` returns a tuple of the Nodes connecting two queried Nodes of a tree (including the Nodes at both ends).

In [2]:
toytree.distance.get_node_path(tree, 15, 0)

(<Node(idx=15, name='r15')>,
 <Node(idx=29)>,
 <Node(idx=30)>,
 <Node(idx=28)>,
 <Node(idx=23)>,
 <Node(idx=21)>,
 <Node(idx=18)>,
 <Node(idx=17)>,
 <Node(idx=16)>,
 <Node(idx=0, name='r0')>)

And `iter_node_path` is the iterative generator version.

In [3]:
from toytree.distance import iter_node_path

for node in iter_node_path(tree, 15, 0):
    print(node.idx)

15
29
30
28
23
21
18
17
16
0


### Node Distances

In [4]:
#Newick string generated in R with phylomaker_v2
newick = "(((Sambucus_nigra:112.340729,(Arctostaphylos_viscida:1.761115,Arctostaphylos_patula:1.761115):110.579613)mrcaott248ott650:11.393508,((Lupinus_sparsiflorus:112.701196,(((Ceanothus_leucodermis:4.464401,Ceanothus_cuneatus:4.464401):46.93409,(Frangula_rubra:10.957388,Rhamnus_ilicifolia:10.957388):40.441103):59.749516,(Quercus_douglasii:11.776698,Quercus_wislizeni:11.776699):99.371309)mrcaott371ott2511:1.553188)mrcaott371ott579:5.877408,Aesculus_californica:118.578604)mrcaott2ott96:5.155633)Pentapetalae:201.315791,Pinus_sabiniana:325.050028)Spermatophyta;"

#generate ToyTree from Newick string
tree = toytree.tree(newick)

tree.draw('s');


Yi Jin, Hong Qian (2022). V.PhyloMaker2: An updated and enlarged R package that can generate very large phylogenies for vascular plants. Plant Diversity, 44(4), 335-339. ISSN 2468-2659. https://doi.org/10.1016/j.pld.2022.05.005

`get_node_distance` returns the patristic distance (the sum of the edge lengths along the shortest path) between two Nodes on a ToyTree.

In [5]:
toytree.distance.get_node_distance(tree, 15, 17)

199.561928

In [6]:
toytree.distance.get_node_distance(tree, 15, 17, topology_only=True)

3

`get_node_distance_matrix` returns the pairwise distance matrix for every node in the tree. The user can also use `get_internal_node_distance_matrix` and `get_tip_distance_matrix` for more specific distance matrices.  

A matrix is returned as a np.ndarray with rows and columns ordered by Node int idx labels, or as a pd.DataFrame (`df=True`) with row and column names as str Node names for leaf Nodes and idx labels for internal Nodes.

In [9]:
tree.distance.get_internal_node_distance_matrix(df= True, topology_only=True)

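|    | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 |
|----|----|----|----|----|----|----|----|----|----|----|----|
| 12 | 0 | 1 | 7 | 7 | 6 | 6 | 5 | 4 | 3 | 2 | 3 |
| 13 | 1 | 0 | 6 | 6 | 5 | 5 | 4 | 3 | 2 | 1 | 2 |
| 14 | 7 | 6 | 0 | 2 | 1 | 3 | 2 | 3 | 4 | 5 | 6 |
| 15 | 7 | 6 | 2 | 0 | 1 | 3 | 2 | 3 | 4 | 5 | 6 |
| 16 | 6 | 5 | 1 | 1 | 0 | 2 | 1 | 2 | 3 | 4 | 5 |
| 17 | 6 | 5 | 3 | 3 | 2 | 0 | 1 | 2 | 3 | 4 | 5 |
| 18 | 5 | 4 | 2 | 2 | 1 | 1 | 0 | 1 | 2 | 3 | 4 |
| 19 | 4 | 3 | 3 | 3 | 2 | 2 | 1 | 0 | 1 | 2 | 3 |
| 20 | 3 | 2 | 4 | 4 | 3 | 3 | 2 | 1 | 0 | 1 | 2 |
| 21 | 2 | 1 | 5 | 5 | 4 | 4 | 3 | 2 | 1 | 0 | 1 |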


`get_descendant_dists` returns a dictionary with {Node: dist} pairs of all descendants relative to a queried node. Without a queried node, all descendants/distances are relative to the root node. Values are generated in "preorder" traversal order (left then right).

An iterable generator `iter_descendant_dists` is also provided.
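The preorder accumulation of distances can be sketched without toytree as follows. The tree here is a hypothetical parent-to-children dictionary, and `iter_descendant_dists_sketch` is an illustrative name, not the library function.

```python
# parent -> list of (child, edge length), children listed left to right
CHILDREN = {
    "root": [("x", 0.5), ("c", 3.0)],
    "x": [("a", 1.0), ("b", 2.0)],
}


def iter_descendant_dists_sketch(node, dist=0.0):
    """Yield (node, distance from the start node) in preorder."""
    yield node, dist
    for child, length in CHILDREN.get(node, []):
        # visit each child subtree fully before moving to the next sibling
        yield from iter_descendant_dists_sketch(child, dist + length)


dists = dict(iter_descendant_dists_sketch("root"))
print(dists)
```

Wrapping the generator in `dict()` gives the dictionary form, and since Python dictionaries preserve insertion order the keys stay in preorder.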

In [11]:
tree.distance.get_descendant_dists(18)

{<Node(idx=18, name='mrcaott371ott2511')>: 0,
 <Node(idx=16)>: 59.749516,
 <Node(idx=14)>: 106.683606,
 <Node(idx=4, name='Ceanothus_leucodermis')>: 111.14800699999999,
 <Node(idx=5, name='Ceanothus_cuneatus')>: 111.14800699999999,
 <Node(idx=15)>: 100.190619,
 <Node(idx=6, name='Frangula_rubra')>: 111.14800699999999,
 <Node(idx=7, name='Rhamnus_ilicifolia')>: 111.14800699999999,
 <Node(idx=17)>: 99.371309,
 <Node(idx=8, name='Quercus_douglasii')>: 111.14800699999999,
 <Node(idx=9, name='Quercus_wislizeni')>: 111.148008}

`get_farthest_node` returns the farthest Node from a selected Node and `get_farthest_node_distance` returns the distance between the two.
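One way to picture the farthest-node search is a traversal that spreads outward from the selected node over an undirected view of the tree. The sketch below assumes a hypothetical edge list and an invented helper `farthest_node`; it illustrates the idea and is not toytree's implementation.

```python
# undirected edges of a small tree: (node, node, edge length)
EDGES = [
    ("root", "x", 0.5),
    ("root", "c", 3.0),
    ("x", "a", 1.0),
    ("x", "b", 2.0),
]


def farthest_node(start):
    """Return (node, distance) for the node farthest from `start`."""
    adjacency = {}
    for node1, node2, length in EDGES:
        adjacency.setdefault(node1, []).append((node2, length))
        adjacency.setdefault(node2, []).append((node1, length))
    # a tree has exactly one path between any two nodes, so a single
    # traversal from `start` gives every distance
    dists = {start: 0.0}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor, length in adjacency[node]:
            if neighbor not in dists:
                dists[neighbor] = dists[node] + length
                stack.append(neighbor)
    farthest = max(dists, key=dists.get)
    return farthest, dists[farthest]


print(farthest_node("a"))
```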

In [12]:
node = tree.distance.get_farthest_node(11)
dist = tree.distance.get_farthest_node_distance(11)
print(node, dist)

<Node(idx=0, name='Sambucus_nigra')> 650.100056


## Tree-level dissimilarities  

This set of functions computes tree similarity metrics based on quartets. All tree similarity metrics revolve around the following terms:

`Q` = Total possible quartets  
`S` = Resolved in the same way between the two trees  
`D` = Resolved differently between the two trees  
`R1` = Unresolved in tree 1, resolved in tree 2  
`R2` = Unresolved in tree 2, resolved in tree 1  
`U` = Unresolved in both trees  
`N` = S + D + R1 + R2 + U

In [14]:
import toytree

t1 = toytree.rtree.rtree(ntips=6, seed=123)
t2 = toytree.rtree.rtree(ntips=6, seed=321)

t1.draw('s');
t2.draw('s');

To quantify the difference between two trees, these methods decompose each tree into a set of bipartitions or quartets and measure the differences between the sets. For a quick overview of the available distance scores, use `get_treedist_quartets`, which reports every tree distance based on the following quartet counts:

- $Q$ = Total possible quartets
- $S$ = Resolved in the same way between the two trees
- $D$ = Resolved differently between the two trees
- $R1$ = Unresolved in tree 1, resolved in tree 2
- $R2$ = Unresolved in tree 2, resolved in tree 1
- $U$ = Unresolved in both trees
- $N = S + D + R1 + R2 + U$

The function takes the arguments `(tree1, tree2, similarity=False)`. When `similarity=True`, scores are reported as similarities (1 - distance) instead of distances.

Using these metrics, `get_treedist_quartets` also shows a list of calculated scores. Descriptions of these scores can be found in the paper below:  

_Estabrook GF, McMorris FR, Meacham CA (1985). “Comparison of undirected
  phylogenetic trees based on subtrees of four evolutionary units.”
  Systematic Zoology, 34(2), 193--200. doi:10.2307/2413326 ._
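The quartet counts defined above can be tallied by hand for small trees. The following sketch assumes each tree is given as a list of its non-singleton bipartitions; the helper names `resolved_quartets` and `quartet_counts` are hypothetical and the code is a simplified illustration, not the routine behind `get_treedist_quartets`.

```python
from itertools import combinations


def resolved_quartets(bipartitions):
    """Map each resolved set of four tips to its split, e.g. abcd -> ab|cd."""
    quartets = {}
    for part1, part2 in bipartitions:
        for pair1 in combinations(part1, 2):
            for pair2 in combinations(part2, 2):
                split = frozenset([frozenset(pair1), frozenset(pair2)])
                quartets[frozenset(pair1 + pair2)] = split
    return quartets


def quartet_counts(tips, biparts1, biparts2):
    """Count Q, S, D, R1, R2, U and N for two trees on the same tips."""
    q1, q2 = resolved_quartets(biparts1), resolved_quartets(biparts2)
    counts = {"Q": 0, "S": 0, "D": 0, "R1": 0, "R2": 0, "U": 0}
    for four in combinations(sorted(tips), 4):
        key = frozenset(four)
        counts["Q"] += 1
        if key in q1 and key in q2:
            counts["S" if q1[key] == q2[key] else "D"] += 1
        elif key in q2:
            counts["R1"] += 1  # unresolved in tree 1, resolved in tree 2
        elif key in q1:
            counts["R2"] += 1  # unresolved in tree 2, resolved in tree 1
        else:
            counts["U"] += 1
    counts["N"] = sum(counts[k] for k in ("S", "D", "R1", "R2", "U"))
    return counts


# two five-tip trees: ((a,b),c,(d,e)) and ((a,c),b,(d,e))
tree1_biparts = [(("a", "b"), ("c", "d", "e")), (("d", "e"), ("a", "b", "c"))]
tree2_biparts = [(("a", "c"), ("b", "d", "e")), (("d", "e"), ("a", "b", "c"))]
counts = quartet_counts("abcde", tree1_biparts, tree2_biparts)
print(counts)
```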


In [3]:
import toytree

tree1 = toytree.rtree.rtree(ntips=10, seed=123)
tree2 = toytree.rtree.rtree(ntips=10, seed=321)
tree1.draw('s')
tree2.draw('s')
toytree.distance.get_treedist_quartets(tree1, tree2)


Q                              210.000000
S                              107.000000
D                              103.000000
U                                0.000000
R1                               0.000000
R2                               0.000000
N                              210.000000
do_not_conflict                  0.490476
explicitly_agree                 0.490476
strict_joint_assertions          0.490476
semistrict_joint_assertions      0.490476
steel_and_penny                  0.490476
symmetric_difference             0.490476
symmetric_divergence             0.019048
similarity_to_reference          0.490476
marczewski_steinhaus             0.658147
dtype: float64

Note: For reference, these two trees will be used for the rest of this notebook's examples.


### Robinson-Foulds distance  

The Robinson-Foulds (RF) distance counts the bipartitions induced by one tree but not the other. In other words, it is the size of the symmetric difference between the bipartition sets of the two trees. The normalized* score divides this count by the total number of bipartitions in both sets. __Larger values indicate that the two trees are more different.__  

*To show the normalized score, use `normalize=True`
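The definition translates directly into set operations. The sketch below is a minimal illustration under the assumption that each tree is supplied as a list of its non-singleton bipartitions; `canonical` and `rf_distance` are hypothetical helper names rather than toytree functions.

```python
def canonical(bipartitions):
    """Represent each split as an unordered pair of unordered tip sets."""
    return {frozenset([frozenset(p1), frozenset(p2)]) for p1, p2 in bipartitions}


def rf_distance(biparts1, biparts2, normalize=False):
    """Count splits found in only one tree, optionally scaled to [0, 1]."""
    set1, set2 = canonical(biparts1), canonical(biparts2)
    diff = len(set1 ^ set2)  # symmetric difference
    if normalize:
        return diff / (len(set1) + len(set2))
    return diff


# two five-tip trees: ((a,b),c,(d,e)) and ((a,c),b,(d,e))
tree1_biparts = [(("a", "b"), ("c", "d", "e")), (("d", "e"), ("a", "b", "c"))]
tree2_biparts = [(("a", "c"), ("b", "d", "e")), (("d", "e"), ("a", "b", "c"))]
print(rf_distance(tree1_biparts, tree2_biparts))
print(rf_distance(tree1_biparts, tree2_biparts, normalize=True))
```

The two trees share the split `de|abc` and differ in one split each, so two of the four splits are unmatched.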

In [9]:
normalized = toytree.distance.get_treedist_rf(tree1, tree2, normalize=True)
default = toytree.distance.get_treedist_rf(tree1, tree2)
print(' normalized: ', normalized, '\n','default: ',default)

 normalized:  0.8571428571428571 
 default:  12


### Information-corrected Robinson-Foulds distance  

The information-corrected Robinson-Foulds distance (RFI) measures the sum of the `phylogenetic information` of edges that are different between two trees. `Information` is derived from the __probability that a randomly sampled binary tree of the same size contains the split.__ Splits that contain less information (e.g., a cherry vs. a deep split) are more likely to arise by chance, and thus contribute less to the metric.  

`normalize=True` normalizes the score relative to the sum of phylogenetic information present in both subtrees.
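To give a feel for why a cherry carries less information than a deep split, the sketch below computes the information of a single split. It assumes the commonly used formulation in which trees are sampled uniformly from all unrooted binary trees and information is measured as the negative base-2 logarithm of the split's probability; toytree's exact calculation may differ, and `split_information` is a hypothetical helper.

```python
from math import log2


def double_factorial(k):
    """Return k!! = k * (k - 2) * (k - 4) * ..., with values below 2 giving 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def split_information(size_a, size_b):
    """Information (in bits) of a split with size_a tips on one side and size_b on the other."""
    ntips = size_a + size_b
    # fraction of unrooted binary trees on ntips tips that contain the split
    prob = (
        double_factorial(2 * size_a - 3)
        * double_factorial(2 * size_b - 3)
        / double_factorial(2 * ntips - 5)
    )
    return -log2(prob)


# in a 10-tip tree: a cherry (2 vs 8 tips) and a deep split (5 vs 5 tips)
print(split_information(2, 8), split_information(5, 5))
```

Under this assumption the cherry is contained in 1 of every 15 random trees, while the even split is far rarer and therefore scores higher.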



In [10]:
normalized = toytree.distance.get_treedist_rfi(tree1, tree2, normalize=True)
default = toytree.distance.get_treedist_rfi(tree1, tree2)
print(' normalized: ', normalized, '\n','default: ',default)

 normalized:  0.8944865320126851 
 default:  66.2410417642415


### Generalized Robinson-Foulds Matching Split distance  



In [14]:
normalized = toytree.distance.get_treedist_rfg_ms(tree1, tree2, normalize=True)
default = toytree.distance.get_treedist_rfg_ms(tree1, tree2, normalize=False)
print(' normalized: ', normalized, '\n','default: ',default)

⚠️ toytree | treedist_utils:get_trees_matching_split_dist | no normalization method for matching split distance.


 normalized:  15.0 
 default:  15.0


In [None]:

# toytree.distance.get_treedist_rfg_mci(tree1, tree2)
# toytree.distance.get_treedist_rfg_ms(tree1, tree2)
# toytree.distance.get_treedist_rfg_msi(tree1, tree2)
# toytree.distance.get_treedist_rfg_spi(tree1, tree2)

# Annotate subpackage Documentation

The `.annotate` subpackage offers a cleaner, more readable way to edit a tree than modifying each argument in the `.draw()` method. The functions provided in this subpackage work by adding marks on top of the most recent `canvas` created by Toytree. They can be accessed directly from a tree object (e.g., `tree.annotate.{function}()`), but some functions require additional arguments to specify axes, styles, etc.  

Since this subpackage contains very simple modifications that can be quickly added on top of existing trees, we encourage you to share with us any functions you make that may fit in this subpackage! These can be shared via GitHub at https://github.com/eaton-lab/toytree/discussions

## Node and edge marks/labels 

- add_edge_labels
- add_edge_markers
- add_node_markers
- add_node_labels
- add_tip_markers
- add_node_bars

## Axes annotations

- add_axes_box_outline
- add_axes_scale_bar

## Data/graph annotations

- add_node_pie_charts
- add_edge_pie_charts