Skip to content

3. Computation

Alberto edited this page Nov 24, 2020 · 6 revisions

3. Computation

This section explains computations performed in WFES in detail.

The main feature of WFES is to compute roes of the fundamental matrix of the Wright-Fisher model. Many properties of interest can be derived from the fundamental matrix. We first describe the calculation applied in WFES Single for a standard WF model.

3.1 Wright-Fisher model

The Wright-Fisher model describes a single bi-allelic locus in a population of fixed size. We denote a as the ancestral allele, and A as the derived, or focal allele. The organisms are diploid, so the total number of chromosomes in a population size N is 2N. Given i copies of the derived allele a at time t, the probability of having j copies in the next generation is:

Equation

Above, ψi is the binomial sampling probability for the number of individuals in the next generation. In the simple case of no mutation or selection, ψi only depends on the current number of copies, ψi = 1/2N. One way to parameterise the model with mutation and selection is:

Equation

Above, ω.. is the selection coefficient for a matricular genotype, μA->a is the backward mutation rate, μa->A is the forward mutation rate. Variables p and q are allele frequencies of A and a respectively: p=i/2N, q=1-p. The denominator is the average fitness of the population.

Equation 2 can be parametrized in an arbitrary manner. We follow Kimura [1964] and assign the following selection coefficients to the genotypes:

Genotype Fitness
AA 1 + s
Aa 1 + sh
aa 1

Above h ∈ [1, 0] is the dominance coefficient, and s ∈ [1, 0] is the selection coefficient. With the above formulation, (2) simplifies to:

Equation

3.2 Fundamental matrix calculation

Equation 1 yields a discrete finite-state Markov chain, with time scale in Wright-Fisher generatios. State i = 0 corresponds to extinction of A, and i = 2N is fixation of A. The model has 2N + 1 states, where i = 0 and i = 2N are absorbing, and the rest are transient. The transition probability matrix P is _(2N + 1) x (2N + 1). The transition probability matrix can be re-ordered ti group the transient-to-transient entries (Q) and transient-to-absorbing (R) entries:

Equation

With two absorbing states, 0 is a 2 x 2 matrix of zeros, I2 is a 2 x 2 identity matrix, and Q is a (2N - 1) x (2N - 1) matrix.

For any absorbing Markov chain, there exists a fundamental matrix N:

Equation

Each entry of Nij is the expected number of generations spent with j copies, given that we started with i copies. Knowing the entries of N allows to write down many useful absorption properties of the Markov chain. For example, the probability of absorbing in state k, conditional on starting with i copies is found as the (i, k)th entry of:

Equation

We can use B to find the expected number of generations in state j, conditional on starting in i and absorbing in k:

Equation

The conditional time to absorption in state k is then:

Equation

And times to extinction and fixation are:

Equation

Those properties are calculated in Wfes Single in absorption mode.

3.3 Solving sparse systems

Solving for the entire matrix N is expensive for large population sizes. However, since Ni,j expresses number of generations spent in a state j conditional on starting in i, we can simplify the calculation by explicitly conditioning on i. For example if we assume that allele A starts with one copy (i = 1), then only the first row, Ni is of interest. We can generalise this, by assuming a finite forward mutation rate v. In this case, for small 4 Nv < 1, there is a non-zero probability that with i ≤ 1. However, this probability vanishes quickly with increasing i, and we then only require several rows of N. See more details in integration.

For a starting number of alleles i, we find the ith row of N:

Equation

where Ii,· is the _ith column of a _(2N - 1) x (2N - 1) identity matrix.

This system can be solved by LU decomposition of (I - Q)T. Once the decomposition is known, we can solve for different right-hand sides of the equation, such as when i ≥ 1.

To find matrix B, we solve:

Equation

where B·,0 is the column of B corresponding to i = 0 extinction. Since we have two absorbing states, we can compute:

Equation

The LU decomposition and solution is performed with MKL PARDISO routines. Parameters and settings for the MKL PARDISO calls can be found in the source code.

3.4 Entire fundamental matrix

If the entire fundamental matrix is required, it can be calculated with WFES Single in the fundamental mode. See Output N and Output V options in WFES Single usage.

3.5 Fixation only

The calculations as stated in the previous section applies to the absorbing mode of WFES Single, where both extinction and fixation states are absorbing. Other possible mode for the computation is fixation, where only the fixation state (i = 2N) is absorbing, and the extinction state (i = 0) is transient. In this case, matrix Q in Equation 4 is 2N x 2N.

If the extinction state is transient, it can be entered and left many times without terminating the Markov chain. This mode makes it easy to calculate Tb-fix - time between fixations - the total time it takes for a new allele to reach fixation (with the possibility of several extinctions along the way). More details of this calculation an applications can be found in [de Koning and de Sanctis, 2018].

The time between fixations, Tb-fix is calculated in a similar way as Tfix for the model with two absorbing states (eq. [6], [7], [8]). However, since there is only one absorbing state, no re-conditioning is required (eq. [7]). Then the Tb-fix is simply:

Equation

An advantage of this calculation is that we can safely assume that allele a starts in i = 0 copies. Then the integration over the starting number of copies is not necessary, since it is explicitly included in the model as transitions from i = 0 to i > 0 copies. This the means that we only need to find a single 0th row of the fundamental matrix.

3.6 Calculating the variance of the time spent in each state is of interest. It can be found as:

Equation

where Ndg is the matrix containing the diagonal of N, and Nsq is N element-wise squared.

If the Output V option is used in the fundamental mode of WFES Single, the entire Nvar matrix will be calculated.

In the fixation mode, the standard deviation of Tb-fix is calculated from the first row of N in Equation 13:

Equation

where N1 and N2 are found by solving:

Equation

3.7 Equilibrium and allele frequencies

The equilibrium distribution of allele frequencies is one of the key properties of the Wright-Fisher model. We use the method described by Paige, Styan and Watcher [1975] to solve for the equilibrium distribution (see also Harrod and Plemmons [1984]) of a non-absorbing Markoc chain. The equilibrium distribution of the Markov chain is defined as vector π, such as πP = π. This can be expressed in matrix form as:

Equation

where Π is a n x n matrix with π in each row. This can be re-written as:

Equation

We also have the constraint that Equation, which can be enforced by setting the last columns of (P-In) and 0n, to en = (1,1,...,1)T. We use the notation r(a) to denote that we set the last column of a to en.

Equation

We only require a single row of Π. Therefore, we can solve for any row Π·,x:

Equation

This equation is solved with the LU decomposition approach.

Note that the matrix P is a 2N + 1 matrix, since the absorbing states are included. This means that we require that forward and backward mutation rates are non-zero. In case where μA->a = 0 or μA->a = 0, the matrix P becomes absorbing, and the equilibrium distribution does not exist.

This calculation is performed by WFES Single in the equilibrium mode. See WFES Single usage for more details.

3.8 Allele age

For details on the allele age calculation, the user is directed to [de Sanctis, Krukov, de Koning, 2017]. Briefly, the paper describes a method to find moments of the allele age distribution given an observed number of copies in the WF model. The moments are calculated in an approach similar to those described above.

The calculation is performed by WFES Single in the allele age mode. The observed number of copies iset via the observed copies (x) parameter. See WFES Single usage for more details.

3.9 Switching models

WFES Switching implements an extended time-heterogeneous Wright-Fisher model. The classical WF model describes a single population of constant size. However, this assumption is rarely met in nature. Likewise, classical WF assumes that the rest of the parameters (selection, mutation) are time-invariant. In this section we describe an extension to the Wright-Fisher model with time-variable parameters. We combine a finite set of WF models in a joint Markov-modulated switching process. The switching process assigns a probability of switching between WF component models with different parameters. Each WF component model can have its own population size, selection coefficient, and mutation rate. Further, WFES Sweep combines absorbing and non-absorbing models.

Let _W1,...,Wn represent a finite list of distinct Wright-Fisher components with its own parameter set θi. Each component is a full Wright-Fisher Markov model. We also have transition probabilities rx->y of switching from Wx to Wy at any time. Each component _Wi has a transition probability matrix P(i), which is written in canonical form as Equation 4.

Equation

We want to describe the join process of switching between W1,...,Wn. We write the canonical form of the switching process transition probability matrix as:

Equation

The Γx,y matrix defines a matrix of switching from WF component _Wx to WF component _Wy. The dimensions of the matrix are (2Nx - 1) x (2Ny - 1) if both _Wx and _Wy have two absorbing states (or 2Nx x 2Ny if Wx and Wy each have one absorbing state).

The entries of Γx,y(i,·) are defined as Wright-Fisher transition probabilities given current i state is in process _Wx and next state (j) is in process Wy:

Equation

where αx,y is a switching rate between Wx and Wy.

This formulation essentially matches the frequencies of allele A between different component models. For example if Nx = 100 and Ny = 200 then ix = 10 would correspond to iy = 20. Note that Γx,y(i,·)_ describes a full Wright-Fisher generation, so we are transforming the entire distribution from Wx into Wy.

We additionally use a parameter _p1,...,pn denoting the probability of starting in each of the components W1,...,Wn This parameter is most relevant with αx,y imposing a non-reversible switching process.

The calculations on this extended model are similar to those for the simple Wright-Fisher model, since we are still dealing with a finite state absorbing Markov chain. The main difference is that we now deal with extinction and fixation events in each of the component model. To calculate the overall statistics of a model, we weight each component with the probability of starting within each component, pi. Consider the probability of fixation (probability of extinction is analogous):

Equation

where 0i is the 0th state of the ith component.

These calculations are implemented in WFES Switching, with absorption and fixation modes. Matrix parameter α is controlled by switching (r) parameter. Detailed usage information is found in Section 6.