# BMI/CS 576 Fall 2019 - HW3
The objective of this homework is to practice phylogenetic tree reconstruction algorithms. Specifically, you will gain practice with the following techniques:

* Neighbor-joining
* Branch and bound
* Unweighted and weighted parsimony

## HW policies
Before starting this homework, please read over the [homework policies](https://canvas.wisc.edu/courses/167969/pages/hw-policies) for this course.  In particular, note that homeworks are to be completed *individually*.

You are welcome to use any code from the weekly notebooks in your solutions to the HW.

## PROBLEM 1: Neighbor-joining (20 points)

Construct a tree from these distances using the neighbor-joining algorithm. Show your updated distance matrix after each merge and give branch lengths for the final tree.

|       | A | B | C | D | E |
|-------|---|---|---|---|---|
| **A** |   | 7 | 13|  9| 10| 
| **B** |   |   |  8|  4|  5|
| **C** |   |   |   |  6|  9|
| **D** |   |   |   |   |  5|
| **E** |   |   |   |   |  &nbsp; |

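For reference, the quantities computed in each neighbor-joining iteration can be written as below. This is a sketch of one common formulation, chosen to match the solution code that follows; the symbols $L$ (the current set of active nodes), $k$ (the new node created by merging $i$ and $j$), and $m$ (any other node in $L$) are notation introduced here for illustration.

$$
r_i = \frac{1}{|L| - 2} \sum_{m \in L,\, m \neq i} d_{im},
\qquad
D_{ij} = d_{ij} - r_i - r_j
$$

$$
d_{ik} = \tfrac{1}{2}\left(d_{ij} + r_i - r_j\right),
\qquad
d_{jk} = \tfrac{1}{2}\left(d_{ij} + r_j - r_i\right),
\qquad
d_{mk} = \tfrac{1}{2}\left(d_{im} + d_{jm} - d_{ij}\right)
$$

For the table above, the first iteration gives $r_A = (7 + 13 + 9 + 10)/3 = 13$ and $r_B = (7 + 8 + 4 + 5)/3 = 8$, so $D_{AB} = 7 - 13 - 8 = -14$. Merging $A$ and $B$ then yields $d_{A,AB} = (7 + 13 - 8)/2 = 6$ and $d_{B,AB} = (7 + 8 - 13)/2 = 1$.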

In [1]:
import toytree, itertools

def print_matrix(d):
    print('', end="\t")
    for i in L:
        print(i, end="\t")
    print()
    for i in L:
        print(i, end="\t")
        for j in L:
            v = d.get((i, j), d.get((j, i))) if i != j else 0
            print(round(v, 2), end="\t")
        print()

d = {
    ('A', 'A'): 0, ('A', 'B'): 7, ('A', 'C'): 13, ('A', 'D'): 9, ('A', 'E'): 10, 
    ('B', 'A'): 7, ('B', 'B'): 0, ('B', 'C'): 8, ('B', 'D'): 4, ('B', 'E'): 5, 
    ('C', 'A'): 13, ('C', 'B'): 8, ('C', 'C'): 0, ('C', 'D'): 6, ('C', 'E'): 9, 
    ('D', 'A'): 9, ('D', 'B'): 4, ('D', 'C'): 6, ('D', 'D'): 0, ('D', 'E'): 5, 
    ('E', 'A'): 10, ('E', 'B'): 5, ('E', 'C'): 9, ('E', 'D'): 5, ('E', 'E'): 0
}

L = sorted({e1 for (e1, e2) in d})

for iteration in range(1, len(L) - 1):
    print(f"======================== Iteration {iteration} ========================")
    r = {e : sum(d[(e, k)] for k in L if e != k) / (len(L) - 2) for e in L}
    D = {(i, j) : d[(i, j)] - r[i] - r[j] for (i, j) in itertools.combinations(L, r=2)}
    (i, j) = min(D, key=D.get)
    k = i + j
    
    d[(i, k)] = (d[(i, j)] + r[i] - r[j]) / 2
    d[(j, k)] = (d[(i, j)] + r[j] - r[i]) / 2
    d.update({(m, k) : (d[(i, m)] + d[(j, m)] - d[(i, j)]) / 2 for m in L if m not in (i, j)})
    d.update({tuple(reversed(e)) : d[e] for e in d})
    
    print("Actual Distance: ")
    print_matrix(d)
    
    print()
    print("r: ")
    for e in r:
        print(f"r({e}) = {str(round(r[e], 2))}") 
        
    print()
    print("Corrected Distance: ")
    print_matrix(D)
    
    print()
    print(f"Merge {i} and {j} to a new node {k}")
    print(f"d({i}, {k}) = {d[(i, k)]}")
    print(f"d({j}, {k}) = {d[(j, k)]}")
    
    L = list([e for e in L if e not in (i, j)] + [k])
    print()
    
tree = "(E:3.0000,(A:6.0000,B:1.0000):1.0000,(C:5.0000,D:1.0000):1.0000);"

print("======================== Final Tree ========================")
print(f"The final tree is {tree}")
_ = toytree.tree(tree).draw(use_edge_lengths=True, scalebar=True)

Actual Distance: 
	A	B	C	D	E	
A	0	7	13	9	10	
B	7	0	8	4	5	
C	13	8	0	6	9	
D	9	4	6	0	5	
E	10	5	9	5	0	

r: 
r(A) = 13.0
r(B) = 8.0
r(C) = 12.0
r(D) = 8.0
r(E) = 9.67

Corrected Distance: 
	A	B	C	D	E	
A	0	-14.0	-12.0	-12.0	-12.67	
B	-14.0	0	-12.0	-12.0	-12.67	
C	-12.0	-12.0	0	-14.0	-12.67	
D	-12.0	-12.0	-14.0	0	-12.67	
E	-12.67	-12.67	-12.67	-12.67	0	

Merge A and B to a new node AB
d(A, AB) = 6.0
d(B, AB) = 1.0

Actual Distance: 
	C	D	E	AB	
C	0	6	9	7.0	
D	6	0	5	3.0	
E	9	5	0	4.0	
AB	7.0	3.0	4.0	0	

r: 
r(C) = 11.0
r(D) = 7.0
r(E) = 9.0
r(AB) = 7.0

Corrected Distance: 
	C	D	E	AB	
C	0	-12.0	-11.0	-11.0	
D	-12.0	0	-11.0	-11.0	
E	-11.0	-11.0	0	-12.0	
AB	-11.0	-11.0	-12.0	0	

Merge C and D to a new node CD
d(C, CD) = 5.0
d(D, CD) = 1.0

Actual Distance: 
	E	AB	CD	
E	0	4.0	4.0	
AB	4.0	0	2.0	
CD	4.0	2.0	0	

r: 
r(E) = 8.0
r(AB) = 6.0
r(CD) = 6.0

Corrected Distance: 
	E	AB	CD	
E	0	-10.0	-10.0	
AB	-10.0	0	-10.0	
CD	-10.0	-10.0	0	

Merge E and AB to a new node EAB
d(E, EAB) = 3.0
d(AB, EAB) = 1.0
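
One way to sanity-check the final tree is to confirm that the path length between every pair of leaves reproduces the input distance. The sketch below hard-codes the tree `(E:3,(A:6,B:1):1,(C:5,D:1):1);` in a small ad hoc encoding; the names `pendant`, `internal`, and `path_length` are hypothetical helpers introduced only for this check, not part of the assignment code.

```python
from itertools import combinations

# Input distances from the Problem 1 table (upper triangle only).
observed = {
    ('A', 'B'): 7, ('A', 'C'): 13, ('A', 'D'): 9, ('A', 'E'): 10,
    ('B', 'C'): 8, ('B', 'D'): 4, ('B', 'E'): 5,
    ('C', 'D'): 6, ('C', 'E'): 9,
    ('D', 'E'): 5,
}

# Ad hoc encoding of the final tree (an assumption of this sketch):
# each leaf maps to (its group, its pendant branch length), and each group
# maps to the length of the internal branch joining it to the central node.
pendant = {'A': ('AB', 6.0), 'B': ('AB', 1.0),
           'C': ('CD', 5.0), 'D': ('CD', 1.0),
           'E': ('E', 3.0)}
internal = {'AB': 1.0, 'CD': 1.0, 'E': 0.0}

def path_length(x, y):
    """Sum of branch lengths on the path between leaves x and y."""
    (gx, bx), (gy, by) = pendant[x], pendant[y]
    if gx == gy:
        # Both leaves hang off the same internal node.
        return bx + by
    # Otherwise the path passes through the central node.
    return bx + internal[gx] + internal[gy] + by

tree_distances = {(x, y): path_length(x, y) for x, y in combinations('ABCDE', 2)}
mismatches = {pair for pair in observed if tree_distances[pair] != observed[pair]}
print(mismatches or "All pairwise path lengths match the input distances")
```

If the check passes for all ten pairs, the input distances are additive on this tree.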



## Trees for Problems 2 and 3
![parsimony_trees](parsimony_trees.png)

## PROBLEM 2: Unweighted parsimony (20 POINTS)

Suppose we are given five DNA sequences (1, 2, 3, 4, and 5), each of which is one base long. The figure above gives two possible trees relating these five sequences.

**(a)** For each of the two trees, find the minimal cost of the tree using Fitch’s algorithm (unweighted parsimony). Show the intermediate computations of the algorithm.

**(b)** For each of the two trees, determine an assignment of ancestral bases that achieves the minimal cost that you found in (a).

**(c)** Which tree would be preferred when using unweighted parsimony?

In [2]:
import toytree

newicks = ["(((C, G), G), (T, T));", "((G, G), (T, (C, T)));"]

def fitch_score_and_min_cost_states(newick):
    tree = toytree.tree(newick)
    R = {}
    num_changes = 0
    for node in tree.treenode.traverse("postorder"):
        if node.is_leaf():
            R[node.name] = {node.name}
        else:
            left_states, right_states = [R[child.name] for child in node.children]
            states_intersection = left_states & right_states
            if states_intersection:
                R[node.name] = states_intersection
            else:
                R[node.name] = left_states | right_states
                num_changes += 1
    for node in tree.treenode.traverse():
        node.add_feature('R', "{"+ ",".join(R[node.name]) + "}")
    return tree, num_changes, R


def fitch_ancestral_states(newick, R):
    tree = toytree.tree(newick)
    r = {} # a dictionary mapping node names to character states
    for node in tree.treenode.traverse("preorder"):
        if node.is_root():
            r[node.name] = sorted(R[node.name])[0] # use the lexicographically smallest element
        else:
            parent = node.up
            if r[parent.name] in R[node.name]:
                r[node.name] = r[parent.name]
            else:
                node.add_feature('color', toytree.colors[1])
                r[node.name] = sorted(R[node.name])[0]
    for node in tree.treenode.traverse():
        node.add_feature('r', r[node.name])
    return tree, r

def draw(tree, node_labels='name'):
    tree.draw(
        node_colors=[c or toytree.colors[0] for c in tree.get_node_values("color", show_root=True, show_tips=True)],
        node_labels=tree.get_node_values(node_labels, show_root=True, show_tips=True), 
        tree_style="c", 
        node_markers="r2.5x2",
        scalebar=False,
        use_edge_lengths=False,
        node_labels_style={"font-size": "16px"}
    )
    
def fitch_algorithm(newick):    
    tree, num_changes, R = fitch_score_and_min_cost_states(newick)
    draw(tree, node_labels='R')
    
    tree, r = fitch_ancestral_states(newick, R)
    draw(tree, node_labels='r')

    print(f"The minimum cost of the tree '{newick}' is {num_changes}.")

In [3]:
fitch_algorithm(newicks[0])

The minimum cost of the tree '(((C, G), G), (T, T));' is 2.


In [4]:
fitch_algorithm(newicks[1])

The minimum cost of the tree '((G, G), (T, (C, T)));' is 2.
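
As a cross-check that does not depend on `toytree`, the bottom-up pass of Fitch's algorithm can be sketched on nested tuples. The tuple encoding and the function name `fitch` are assumptions made for this illustration; the leaf bases are taken from the two Newick strings above.

```python
def fitch(tree):
    """Return (state set, number of changes) for a nested-tuple binary tree.

    Leaves are single-character strings; internal nodes are 2-tuples.
    """
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    left_states, left_cost = fitch(left)
    right_states, right_cost = fitch(right)
    common = left_states & right_states
    if common:
        # A shared state exists, so no change is needed at this node.
        return common, left_cost + right_cost
    # No shared state: take the union and count one change.
    return left_states | right_states, left_cost + right_cost + 1

tree1 = ((('C', 'G'), 'G'), ('T', 'T'))
tree2 = (('G', 'G'), ('T', ('C', 'T')))

for t in (tree1, tree2):
    states, cost = fitch(t)
    print(sorted(states), cost)
```

Both trees end with the root set `{G, T}` and a cost of 2, in agreement with the outputs above.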


In [5]:
print("(c): Both trees are equally preferred by unweighted parsimony")

(c): Both trees are equally preferred by unweighted parsimony


## PROBLEM 3: Weighted parsimony (20 POINTS)

Suppose we are given the same five sequences as in Problem 2 and the same two possible trees. Substitution costs for weighted parsimony are given in the matrix below:

![weighted_parsimony_weights](weighted_parsimony_weights.png)

**(a)** For each of the two trees, find the minimal cost of the tree using the weighted parsimony algorithm. Show the intermediate computations of the algorithm.

**(b)** For each of the two trees, determine an assignment of ancestral bases that achieves the minimal cost that you found in (a).

**(c)** Which tree would be preferred when using weighted parsimony with the given costs?


In [45]:
S = {
    ('A', 'A'): 0, ('A', 'C'): 2, ('A', 'G'): 1, ('A', 'T'): 2, 
    ('C', 'A'): 2, ('C', 'C'): 0, ('C', 'G'): 2, ('C', 'T'): 1, 
    ('G', 'A'): 1, ('G', 'C'): 2, ('G', 'G'): 0, ('G', 'T'): 2, 
    ('T', 'A'): 2, ('T', 'C'): 1, ('T', 'G'): 2, ('T', 'T'): 0, 
}

chars = 'ACGT'

INF = float('inf')

def argmin(g):
    return min(enumerate(g), key=lambda e: e[1])

def weighted_parsimony(newick):
    tree = toytree.tree(newick)
    R = {}
    arrows = {}
    for node in tree.treenode.traverse("postorder"):
        if node.is_leaf():
            for a in chars:
                R[(str(node.idx), a)] = INF if a != node.name else 0
                print(f"R_{node.idx}({a})  = {R[(str(node.idx), a)]}", end=";\t")
            print("\n")
        else:
            j, k = [str(child.idx) for child in node.children]
            for a in chars:
                l_index, l_val = argmin(R[(j, b)] + S[(a, b)] for b in chars)
                r_index, r_val = argmin(R[(k, c)] + S[(a, c)] for c in chars)
                arrows[(str(node.idx), a)] = [chars[l_index], chars[r_index]]
                R[(str(node.idx), a)] = l_val + r_val
                print((f"R_{node.idx}({a})  = " + 
                       "min{" + ", ".join(f"R_{j}({b})+S({a},{b})" for b in chars) + "} + \n\t  " +
                       "min{" + ", ".join(f"R_{k}({c})+S({a},{c})" for c in chars) + "} \n\t" +
                       "= min{" + ", ".join(f"{R[(j, b)] + S[(a, b)]}" for b in chars) + "} + " +
                       "min{" + ", ".join(f"{R[(k, c)] + S[(a, c)]}" for c in chars) + "}" +
                       f" = {l_val} + {r_val} = {R[(str(node.idx), a)]}"))
            print()
    
    index, value = argmin(R[(str(node.idx), a)] for a in chars)
    print(f"=================Final score for this tree is {value} =================")
    
    for node in tree.treenode.traverse("preorder"):
        if node.is_leaf():
            continue
        
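        if node.is_root():
            # The root takes the state with the minimum total cost.
            index, value = argmin(R[(str(node.idx), a)] for a in chars)
            node.name = chars[index]

        # Every other internal node keeps the state its parent's arrows assigned
        # (stored in node.name), so the traceback follows one consistent path.
        cl, cr = arrows[(str(node.idx), node.name)]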
        l, r = node.children
        l.name = cl
        r.name = cr
        
    draw(tree)

In [46]:
print("The following tree is used for indexing vertices")
draw(toytree.tree(newicks[0]), node_labels='idx')

The following tree is used for indexing vertices


In [47]:
weighted_parsimony(newicks[0])

R_4(A)  = inf;	R_4(C)  = inf;	R_4(G)  = inf;	R_4(T)  = 0;	

R_3(A)  = inf;	R_3(C)  = inf;	R_3(G)  = inf;	R_3(T)  = 0;	

R_7(A)  = min{R_4(A)+S(A,A), R_4(C)+S(A,C), R_4(G)+S(A,G), R_4(T)+S(A,T)} + 
	  min{R_3(A)+S(A,A), R_3(C)+S(A,C), R_3(G)+S(A,G), R_3(T)+S(A,T)} 
	= min{inf, inf, inf, 2} + min{inf, inf, inf, 2} = 2 + 2 = 4
R_7(C)  = min{R_4(A)+S(C,A), R_4(C)+S(C,C), R_4(G)+S(C,G), R_4(T)+S(C,T)} + 
	  min{R_3(A)+S(C,A), R_3(C)+S(C,C), R_3(G)+S(C,G), R_3(T)+S(C,T)} 
	= min{inf, inf, inf, 1} + min{inf, inf, inf, 1} = 1 + 1 = 2
R_7(G)  = min{R_4(A)+S(G,A), R_4(C)+S(G,C), R_4(G)+S(G,G), R_4(T)+S(G,T)} + 
	  min{R_3(A)+S(G,A), R_3(C)+S(G,C), R_3(G)+S(G,G), R_3(T)+S(G,T)} 
	= min{inf, inf, inf, 2} + min{inf, inf, inf, 2} = 2 + 2 = 4
R_7(T)  = min{R_4(A)+S(T,A), R_4(C)+S(T,C), R_4(G)+S(T,G), R_4(T)+S(T,T)} + 
	  min{R_3(A)+S(T,A), R_3(C)+S(T,C), R_3(G)+S(T,G), R_3(T)+S(T,T)} 
	= min{inf, inf, inf, 0} + min{inf, inf, inf, 0} = 0 + 0 = 0

R_2(A)  = inf;	R_2(C)  = inf;	R_2(G)  = 0;	R_2(T)  = in

In [48]:
weighted_parsimony(newicks[1])

R_4(A)  = inf;	R_4(C)  = inf;	R_4(G)  = 0;	R_4(T)  = inf;	

R_3(A)  = inf;	R_3(C)  = inf;	R_3(G)  = 0;	R_3(T)  = inf;	

R_7(A)  = min{R_4(A)+S(A,A), R_4(C)+S(A,C), R_4(G)+S(A,G), R_4(T)+S(A,T)} + 
	  min{R_3(A)+S(A,A), R_3(C)+S(A,C), R_3(G)+S(A,G), R_3(T)+S(A,T)} 
	= min{inf, inf, 1, inf} + min{inf, inf, 1, inf} = 1 + 1 = 2
R_7(C)  = min{R_4(A)+S(C,A), R_4(C)+S(C,C), R_4(G)+S(C,G), R_4(T)+S(C,T)} + 
	  min{R_3(A)+S(C,A), R_3(C)+S(C,C), R_3(G)+S(C,G), R_3(T)+S(C,T)} 
	= min{inf, inf, 2, inf} + min{inf, inf, 2, inf} = 2 + 2 = 4
R_7(G)  = min{R_4(A)+S(G,A), R_4(C)+S(G,C), R_4(G)+S(G,G), R_4(T)+S(G,T)} + 
	  min{R_3(A)+S(G,A), R_3(C)+S(G,C), R_3(G)+S(G,G), R_3(T)+S(G,T)} 
	= min{inf, inf, 0, inf} + min{inf, inf, 0, inf} = 0 + 0 = 0
R_7(T)  = min{R_4(A)+S(T,A), R_4(C)+S(T,C), R_4(G)+S(T,G), R_4(T)+S(T,T)} + 
	  min{R_3(A)+S(T,A), R_3(C)+S(T,C), R_3(G)+S(T,G), R_3(T)+S(T,T)} 
	= min{inf, inf, 2, inf} + min{inf, inf, 2, inf} = 2 + 2 = 4

R_2(A)  = inf;	R_2(C)  = inf;	R_2(G)  = inf;	R_2(T)  = 

In [10]:
print("(c) The second tree is preferred using weighted parsimony since it has a lower cost")

(c) The second tree is preferred using weighted parsimony since it has a lower cost
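
Because the printed traces above are cut off before the final scores, a compact recomputation may help. The sketch below runs the weighted parsimony (Sankoff) recurrence on nested tuples. It assumes the same costs as the `S` dictionary above (0 for identity, 1 for A↔G or C↔T, 2 otherwise); the names `cost`, `sankoff`, and the tuple encoding are hypothetical and introduced only for this check.

```python
INF = float('inf')
BASES = 'ACGT'
PURINES = {'A', 'G'}

def cost(a, b):
    """Substitution cost: 0 if equal, 1 within purines or within pyrimidines, else 2."""
    if a == b:
        return 0
    return 1 if (a in PURINES) == (b in PURINES) else 2

def sankoff(tree):
    """Return {base: minimum cost of the subtree given that base at its root}."""
    if isinstance(tree, str):
        # A leaf costs 0 for its observed base and infinity for any other base.
        return {a: (0 if a == tree else INF) for a in BASES}
    left, right = (sankoff(child) for child in tree)
    return {
        a: min(left[b] + cost(a, b) for b in BASES)
           + min(right[c] + cost(a, c) for c in BASES)
        for a in BASES
    }

tree1 = ((('C', 'G'), 'G'), ('T', 'T'))
tree2 = (('G', 'G'), ('T', ('C', 'T')))

score1 = min(sankoff(tree1).values())
score2 = min(sankoff(tree2).values())
print(score1, score2)
```

Under these assumptions the first tree scores 4 and the second scores 3, which is consistent with the answer to (c).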


## PROBLEM 4: Branch and bound (40 POINTS)

Implement the basic branch and bound algorithm (page 7 of the "Searching through tree space" lecture slides) for finding an optimal tree with an unweighted parsimony objective function.  Your implementation should be broken down into three functions, `best_tree_branch_and_bound`, `branch`, and `bound`, which you are to implement below.

Some implementation specifications for each function:
### `best_tree_branch_and_bound`

* Use the functions in the `heapq` module ([Heap queue algorithm](https://docs.python.org/3/library/heapq.html)) to maintain the queue (Q) efficiently.
* Q should be sorted by the lower bound of the trees, with ties broken by lexicographical ordering of the trees' Newick strings.
* The algorithm should begin with an unrooted tree consisting of the first three names in `sequence_names`, and add leaves in the order in which they appear in the `sequence_names` list.

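One simple way to get the required ordering is to push `(lower_bound, newick)` tuples onto the heap, since Python compares tuples element by element. The snippet below is a minimal sketch of that idea; the bound values are made up for illustration and do not come from any alignment.

```python
import heapq

queue = []
# (lower bound, Newick string) pairs; the bounds are illustrative values only.
candidates = [
    (5, '(D,PB,(PA,C1));'),
    (3, '(PA,PB,(D,C1));'),
    (5, '(D,PA,(PB,C1));'),
]
for bound_value, newick in candidates:
    heapq.heappush(queue, (bound_value, newick))

# Pop everything: the lowest bound comes first, and equal bounds are
# ordered by the Newick string.
order = [heapq.heappop(queue) for _ in range(len(queue))]
for item in order:
    print(item)
```

The two trees with bound 5 come out in lexicographical order of their Newick strings, with no custom comparison function needed.
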
### `branch`

* This function should call the `add_leaf` function (provided) to grow an unrooted tree with the next leaf in all possible ways

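A quick count shows how many trees `branch` should return. An unrooted binary tree with n leaves has 2n - 3 edges, and the new leaf can be attached to any one of them. The helper names below are hypothetical and serve only to illustrate the counting.

```python
def num_edges_unrooted(n_leaves):
    """Number of edges in an unrooted binary tree with n_leaves >= 3 leaves."""
    return 2 * n_leaves - 3

def num_unrooted_trees(n_leaves):
    """Number of unrooted binary topologies on n_leaves labeled leaves.

    Starting from the single 3-leaf tree, each added leaf can go on any
    existing edge, so the counts multiply: 1 * 3 * 5 * ... * (2n - 5).
    """
    count = 1
    for k in range(3, n_leaves):
        count *= num_edges_unrooted(k)
    return count

for n in range(3, 8):
    print(n, num_edges_unrooted(n), num_unrooted_trees(n))
```

So branching on a 3-, 4-, or 5-leaf tree yields 3, 5, or 7 new trees respectively, and the search space grows quickly, which is why pruning with a bound matters.
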
### `bound`

* You will likely want to take advantage of your work in the day 16 notebook for this function
* To convert an unrooted tree to a rooted tree (such that you can use your parsimony scoring code), it is recommended that you call the provided `root` function

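Rooting the tree at an arbitrary edge is safe here because the unweighted parsimony score does not depend on where the root is placed. The sketch below illustrates this on a toy example with one site per leaf; the nested-tuple encoding and the name `fitch_cost` are assumptions made for this illustration.

```python
def fitch_cost(tree):
    """Return (state set, change count) for a nested-tuple rooted binary tree."""
    if isinstance(tree, str):
        return {tree}, 0
    (left_states, left_cost) = fitch_cost(tree[0])
    (right_states, right_cost) = fitch_cost(tree[1])
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1

# Three rootings of the same unrooted four-leaf tree with leaf states G, T | G, T.
rootings = [
    ('G', ('T', ('G', 'T'))),    # root on the edge to the first leaf
    (('G', 'T'), ('G', 'T')),    # root on the internal edge
    ('T', ('G', ('G', 'T'))),    # root on the edge to the second leaf
]
costs = [fitch_cost(t)[1] for t in rootings]
print(costs)
```

All three rootings give the same number of changes, so any fixed rooting convention can be used when scoring.
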
In [11]:
# Code for PROBLEM 4
# You are welcome to develop your code as a separate Python module
# and import it here if that is more convenient for you.

import heapq
import toytree

def best_tree_branch_and_bound(alignment, sequence_names, branch, bound):
    """Computes an optimal (lowest scoring) tree using a branch and bound algorithm.
    
    Args:
        alignment: a list of strings corresponding to the rows of a multiple alignment.
        sequence_names: a list of the names of the sequences in the same order as the
                        rows of the multiple alignment.
        branch: a function that grows a partial tree in multiple ways
        bound: a function that computes the lower bound of a partial tree
    Returns:
        A tuple (score, newick_string) where newick_string is a Newick formatted string
        representing the optimal tree (unrooted) and score is its score.
    """
    h = []
    def push(t):
        heapq.heappush(h, (bound(t, alignment, sequence_names), t))
    
    push(f"({','.join(sequence_names[:3])});")
    
    while True:
        score, tree = heapq.heappop(h)
        ntips = toytree.tree(tree).ntips
        if ntips == len(sequence_names):
            return (score, tree)
        for t in branch(tree, sequence_names[ntips]):
            push(t)
    
def branch(newick_tree, next_leaf_name):
    """Grows a partial unrooted tree by adding the next leaf in all possible ways.
    
    Args:
        newick_tree: a partial unrooted tree as a Newick string
        next_leaf_name: the name of the next leaf to add to the tree
    Returns:
        A list of Newick strings representing all possible ways in which to add the next leaf.
    """
    return [add_leaf(newick_tree, next_leaf_name, i) for i in range(1, toytree.tree(newick_tree).nnodes)]

def bound(newick_tree, alignment, sequence_names):
    """Computes a lower bound for the unweighted parsimony score of a full tree that can be 
    grown from a given partial tree.
    
    Args:
        newick_tree: a partial unrooted tree as a Newick string
        alignment: a list of strings corresponding to the rows of a multiple alignment.
        sequence_names: a list of the names of the sequences in the same order as the
                        rows of the multiple alignment.
    Returns:
        The lower bound as an integer.
    """
    tree = toytree.tree(root(newick_tree))
    return score_tree_parsimony(tree, alignment, sequence_names)

def add_leaf(newick_tree, leaf_name, edge_index):
    """Adds a new leaf on to the specified edge in an unrooted tree.
    
    Args:
        newick_tree: a partial unrooted tree as a Newick string
        leaf_name: the name of the next leaf to add to the tree
        edge_index: the index (from 1 to the number of edges) of the edge on
                    which to add the leaf.  Edges are ordered by the order
                    in which their child node is encountered in a preorder
                    traversal of the tree.
    Returns:
        A newick string representing the tree with the added leaf.
    """
    new_tree = toytree.tree(newick_tree)
    if len(new_tree.treenode.children) != 3:
        raise ValueError("Tree does not look unrooted: " +  newick_tree)
    for i, node in enumerate(new_tree.treenode.traverse("preorder")):
        if i == edge_index:
            break
    parent = node.up
    node.detach()
    new_internal_node = parent.add_child()
    new_internal_node.add_child(node)
    new_internal_node.add_child(name=leaf_name)
    return new_tree.treenode.write(format=9)

def root(newick_tree):
    """Converts an unrooted tree into a rooted tree.
    (useful for scoring an unrooted tree with parsimony).
    
    Args:
        newick_tree: an unrooted tree as a Newick string
    Returns:
        A rooted version of the tree as a newick string.
    """
    unrooted_tree = toytree.tree(newick_tree)
    if len(unrooted_tree.treenode.children) != 3:
        raise ValueError("Tree does not look unrooted: " +  newick_tree)
    unrooted_tree_root = unrooted_tree.treenode
    first_child = unrooted_tree.treenode.children[0]
    first_child.detach()
    rooted_tree_root = toytree.TreeNode.TreeNode()
    rooted_tree_root.add_child(first_child)
    rooted_tree_root.add_child(unrooted_tree_root)
    return rooted_tree_root.write(format=9)

In [12]:
def fitch_score_and_min_cost_states(tree, leaf_states):
    """Runs the first stage of Fitch's algorithm for
       the given tree and character states as the leaves.
    
    Args:
        tree: a toytree tree.
        leaf_states: a dictionary mapping leaf names to characters.  
    Returns:
        A two-element tuple, where the first element is the minimum
        cost of the tree (minimum number of changes required to explain
        the leaf data) and second element is a dictionary mapping the
        node names to sets of possible states at the nodes (the R values
        in the algorithm)
    """
    R = {}
    num_changes = 0
    for node in tree.treenode.traverse("postorder"):
        if node.is_leaf():
            R[node.name] = {leaf_states[node.name]}
        else:
            left_states, right_states = [R[child.name] for child in node.children]
            states_intersection = left_states & right_states
            if states_intersection:
                R[node.name] = states_intersection
            else:
                R[node.name] = left_states | right_states
                num_changes += 1
    return num_changes, R

def alignment_leaf_states_list(alignment, sequence_names):
    """Returns a list of dictionaries, where each dictionary corresponds to the leaf states
    for a column of the alignment."""
    return [dict(zip(sequence_names, column)) for column in zip(*alignment)]

def score_tree_parsimony(tree, alignment, sequence_names):
    """Computes the parsimony score for a given tree and alignment.
    
    Args:
        tree: a toytree tree object
        alignment: a list of strings corresponding to the rows of a multiple alignment.
        sequence_names: a list of the names of the sequences in the same order as the
                        rows of the multiple alignment.
    Returns:
        The parsimony score (a number)
    """
    columns = alignment_leaf_states_list(alignment, sequence_names)
    fitch_results = [fitch_score_and_min_cost_states(tree, column) for column in columns]
    column_scores, column_Rs = zip(*fitch_results)
    return sum(column_scores)
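
To make the column handling concrete, here is how the `zip(*alignment)` idiom in `alignment_leaf_states_list` turns rows into per-column dictionaries. The three-sequence alignment and the names `s1`, `s2`, `s3` are toy data invented for this example.

```python
# Toy alignment (hypothetical data): three sequences, three columns.
alignment = ["ACG", "ACT", "GCT"]
names = ["s1", "s2", "s3"]

# zip(*alignment) transposes rows into columns; each column is then paired
# with the sequence names to give one leaf-state dictionary per site.
columns = [dict(zip(names, column)) for column in zip(*alignment)]
for column in columns:
    print(column)
```

Each dictionary can be passed to Fitch's algorithm as the leaf states for one site, and the per-site scores are summed to get the score of the whole alignment.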

## Tests for PROBLEM 4

### Data sets for testing

In [13]:
import fasta
def read_names_and_alignments_from_fasta(filename):
    return zip(*fasta.read_sequences_from_fasta_file(filename))

v3_sequence_names, v3_alignment = read_names_and_alignments_from_fasta("v3_alignment.fasta")

v3_big_sequence_names, v3_big_alignment = read_names_and_alignments_from_fasta("v3_big_alignment.fasta")

medium_num_seqs = 7
v3_medium_alignment = v3_big_alignment[:medium_num_seqs]
v3_medium_sequence_names = v3_big_sequence_names[:medium_num_seqs]

### Tests

In [14]:
# tests for branch3
assert sorted(branch('(D,PA,PB);', 'C1')) == ['(D,PA,(PB,C1));', 
                                              '(D,PB,(PA,C1));', 
                                              '(PA,PB,(D,C1));']
print("SUCCESS: branch3 passed all tests")

SUCCESS: branch3 passed all tests


In [15]:
# tests for branch4
assert sorted(branch('(D,PA,(PB,C1));', 'C2')) == ['(D,(PB,C1),(PA,C2));',
                                                   '(D,PA,((PB,C1),C2));',
                                                   '(D,PA,(C1,(PB,C2)));',
                                                   '(D,PA,(PB,(C1,C2)));',
                                                   '(PA,(PB,C1),(D,C2));']
print("SUCCESS: branch4 passed all tests")

SUCCESS: branch4 passed all tests


In [16]:
# tests for branch5
assert sorted(branch('(D,(PB,C1),(PA,C2));', 'C3')) == ['((PB,C1),(PA,C2),(D,C3));',
                                                        '(D,(C1,(PB,C3)),(PA,C2));',
                                                        '(D,(PA,C2),((PB,C1),C3));',
                                                        '(D,(PB,(C1,C3)),(PA,C2));',
                                                        '(D,(PB,C1),((PA,C2),C3));',
                                                        '(D,(PB,C1),(C2,(PA,C3)));',
                                                        '(D,(PB,C1),(PA,(C2,C3)));']
print("SUCCESS: branch5 passed all tests")

SUCCESS: branch5 passed all tests


In [17]:
# tests for bound_one_column
v3_alignment_column0 = [s[0] for s in v3_alignment]
v3_alignment_column3 = [s[3] for s in v3_alignment]
v3_alignment_column29 = [s[29] for s in v3_alignment]
assert bound('(D,PA,PB);', v3_alignment_column0, v3_sequence_names) == 0
assert bound('(D,PA,PB);', v3_alignment_column3, v3_sequence_names) == 1
assert bound('((C1,C2),(PA,PB),D);', v3_alignment_column29, v3_sequence_names) == 1
print("SUCCESS: bound_one_column passed all tests")

SUCCESS: bound_one_column passed all tests


In [18]:
# tests for bound_alignment
assert bound('(D,PA,PB);', v3_alignment, v3_sequence_names) == 25
assert bound('((D, C1),PA,PB);', v3_alignment, v3_sequence_names) == 48
assert bound('((C1,PA),(C2,PB),D);', v3_alignment, v3_sequence_names) == 64
print("SUCCESS: bound_alignment passed all tests")

SUCCESS: bound_alignment passed all tests


In [19]:
class CallCounter:
    def __init__(self, func):
        self._func = func
        self._num_calls = 0
    def __call__(self, *args, **kwds):
        self._num_calls += 1
        return self._func(*args, **kwds)
    def num_calls(self):
        return self._num_calls
    def reset(self):
        self._num_calls = 0

In [20]:
# tests for branch_and_bound_v3
counting_bound = CallCounter(bound)
assert best_tree_branch_and_bound(v3_alignment, v3_sequence_names, branch, counting_bound) == (58, '(D,PA,(PB,(C1,C2)));')
assert counting_bound.num_calls() == 19
print("SUCCESS: branch_and_bound_v3 passed all tests")

SUCCESS: branch_and_bound_v3 passed all tests


In [21]:
# tests for branch_and_bound_v3_medium
counting_bound = CallCounter(bound)
assert best_tree_branch_and_bound(v3_medium_alignment, v3_medium_sequence_names, branch, counting_bound) == (75, '(C09,(C35,(PD1,PD3)),(PA5,(PB6,PB8)));')
assert counting_bound.num_calls() == 133
print("SUCCESS: branch_and_bound_v3_medium passed all tests")

SUCCESS: branch_and_bound_v3_medium passed all tests


In [33]:
# tests for branch_and_bound_v3_big
###
### AUTOGRADER TEST - DO NOT REMOVE
###
