Skip to content

Computing the likelihood of a tree

ddarriba edited this page Mar 14, 2016 · 13 revisions

This page contains the following sections

  • Computing Conditional Likelihood Vectors (CLV)
  • Computing the likelihood at a rooted tree
  • Computing the likelihood at an unrooted tree

Computing Conditional Likelihood Vectors (CLV)

Let's use an example for computing the likelihood of a rooted tree:

rooted tree

We assume that lineages evolve independently under the same stochastic process.

For every single site, the likelihood of the tree is the product of the likelihoods of base substitution through each branch in the the tree, times the prior probability of the state at the root node (i.e., the stationary frequency of the state).

likelihood complete

However, in practice we do not know the states at the inner nodes of the tree. Hence the likelihood will be the sum over all possible state assignments in those unknown positions.

In a recursive way, we fill the CLV at each node:

CLV node

Given an inner node u, with children v and w, we compute the conditional likelihood at a specific site for the rate r and state xi as:

CLV node

At the tip nodes, the states are known, and therefore the recursion ends. The CLV at the tip nodes are filled as explained in Setting CLVs at tips from sequences and maps

In the example above, the post-order traversal gives us the following order of operations: {0,1,5,2,6,3,4,7,8}. As long as we compute the CLVs in that order, all the terms in the CLV are already available. Since the CLVs at the tips are filled in advance, we apply the CLV formula to the inner nodes in the following order: {5,6,7,8}

In libpll, we have to create the set of operations according to the post-order traversal:

{0,1,5},{5,2,6},{3,4,7},{6,7,8}, where each triplet contains the following values: {child1_clv, child2_clv, parent_clv}. Also in the operations we need to specify the matrix indices corresponding to the branches connecting the children with the nodes:

operations[0].child1_clv_index    = 0;
operations[0].child2_clv_index    = 1;
operations[0].parent_clv_index    = 5;
operations[0].child1_matrix_index = 0;
operations[0].child2_matrix_index = 1;

operations[1].child1_clv_index    = 5;
operations[1].child2_clv_index    = 2;
operations[1].parent_clv_index    = 6;
operations[1].child1_matrix_index = 2;
operations[1].child2_matrix_index = 3;

...

Given that matrix 0 was computed for the branch l0,5 and matrix 1 for l1,5, and so on.

Computing all the CLVs requires as many operations as inner nodes. Hence, 4 in our example. Finally, we update the CLVs following the order we have defined using the post-order traversal:

pll_update_partials(partition, 
                    operations, 
                    4);

Computing the likelihood of a rooted tree

lk rooted

If the tree for which we are computing the likelihood contains a physical root, computing the final likelihood score is quite simple once we have the CLVs of the root node.

For each site and rate, the likelihood is the dot product of the conditional likelihood vector and the stationary frequencies. The per-site likelihood is the average of the per-rate likelihoods, and the final likelihood of the product of the per-site likelihoods.

LK formula

Usually this likelihood score is an extremely low value that is not feasible to compute directly. For that reason, we work with the sum of the per-site log likelihoods instead.

In pll, once we have computed the CLVs for every node, we simply call the function for computing the rooted tree log likelihood in the root node (number 8). We need to specify also the index of the likelihood scaler in that node, and the index of the parameters:

logl = pll_compute_root_loglikelihood(partition,
                                      8,
                                      scaler_index,
                                      parameters_index);

Computing the likelihood of an unrooted tree

unrooted tree

If we do not have a physical root node, the likelihood is computed at a branch b instead (e.g., the branch connecting nodes 6 and 7), defined as the virtual root. The branch splits the tree into two rooted subtrees, where the root nodes are the edges of b. The CLVs must be computed for the two subtrees (rooted at nodes 6 and 7). The order of operations in our example will be {0,1,5,2,6} and {3,4,7} respectively.

lk unrooted

The likelihood at the virtual root is then computed as:

unrooted tree LK