3. Computation

This section explains computations performed in WFES in detail.

The main feature of WFES is to compute roes of the fundamental matrix of the Wright-Fisher model. Many properties of interest can be derived from the fundamental matrix. We first describe the calculation applied in WFES Single for a standard WF model.

3.1 Wright-Fisher model

The Wright-Fisher model describes a single bi-allelic locus in a population of fixed size. We denote a as the ancestral allele, and A as the derived, or focal allele. The organisms are diploid, so the total number of chromosomes in a population size N is 2N. Given i copies of the derived allele a at time t, the probability of having j copies in the next generation is:

Equation

Above, ψ_i is the binomial sampling probability for the number of individuals in the next generation. In the simple case of no mutation or selection, ψ_i only depends on the current number of copies, ψ_i = 1/2N. One way to parameterise the model with mutation and selection is:

Equation

Above, ω.. is the selection coefficient for a matricular genotype, μ_A->a is the backward mutation rate, μ_a->A is the forward mutation rate. Variables p and q are allele frequencies of A and a respectively: p=i/2N, q=1-p. The denominator is the average fitness of the population.

Equation 2 can be parametrized in an arbitrary manner. We follow Kimura [1964] and assign the following selection coefficients to the genotypes:

Genotype	Fitness
AA	1 + s
Aa	1 + sh
aa	1

Above h ∈ [1, 0] is the dominance coefficient, and s ∈ [1, 0] is the selection coefficient. With the above formulation, (2) simplifies to:

Equation

3.2 Fundamental matrix calculation

Equation 1 yields a discrete finite-state Markov chain, with time scale in Wright-Fisher generatios. State i = 0 corresponds to extinction of A, and i = 2N is fixation of A. The model has 2N + 1 states, where i = 0 and i = 2N are absorbing, and the rest are transient. The transition probability matrix P is _(2N + 1) x (2N + 1). The transition probability matrix can be re-ordered ti group the transient-to-transient entries (Q) and transient-to-absorbing (R) entries:

Equation

With two absorbing states, 0 is a 2 x 2 matrix of zeros, I₂ is a 2 x 2 identity matrix, and Q is a (2N - 1) x (2N - 1) matrix.

For any absorbing Markov chain, there exists a fundamental matrix N:

Equation

Each entry of N_ij is the expected number of generations spent with j copies, given that we started with i copies. Knowing the entries of N allows to write down many useful absorption properties of the Markov chain. For example, the probability of absorbing in state k, conditional on starting with i copies is found as the (i, k)^th entry of:

Equation

We can use B to find the expected number of generations in state j, conditional on starting in i and absorbing in k:

Equation

The conditional time to absorption in state k is then:

Equation

And times to extinction and fixation are:

Equation

Those properties are calculated in Wfes Single in absorption mode.

3.3 Solving sparse systems

Solving for the entire matrix N is expensive for large population sizes. However, since N_i,j expresses number of generations spent in a state j conditional on starting in i, we can simplify the calculation by explicitly conditioning on i. For example if we assume that allele A starts with one copy (i = 1), then only the first row, N_i is of interest. We can generalise this, by assuming a finite forward mutation rate v. In this case, for small 4 Nv < 1, there is a non-zero probability that with i ≤ 1. However, this probability vanishes quickly with increasing i, and we then only require several rows of N. See more details in integration.

For a starting number of alleles i, we find the i^th row of N:

Equation

where I_i,· is the _i^th column of a _(2N - 1) x (2N - 1) identity matrix.

This system can be solved by LU decomposition of (I - Q)^T. Once the decomposition is known, we can solve for different right-hand sides of the equation, such as when i ≥ 1.

To find matrix B, we solve:

Equation

where B_·,0 is the column of B corresponding to i = 0 extinction. Since we have two absorbing states, we can compute:

Equation

The LU decomposition and solution is performed with MKL PARDISO routines. Parameters and settings for the MKL PARDISO calls can be found in the source code.

3.4 Entire fundamental matrix

If the entire fundamental matrix is required, it can be calculated with WFES Single in the fundamental mode. See Output N and Output V options in WFES Single usage.

3.5 Fixation only

The calculations as stated in the previous section applies to the absorbing mode of WFES Single, where both extinction and fixation states are absorbing. Other possible mode for the computation is fixation, where only the fixation state (i = 2N) is absorbing, and the extinction state (i = 0) is transient. In this case, matrix Q in Equation 4 is 2N x 2N.

If the extinction state is transient, it can be entered and left many times without terminating the Markov chain. This mode makes it easy to calculate T_b-fix - time between fixations - the total time it takes for a new allele to reach fixation (with the possibility of several extinctions along the way). More details of this calculation an applications can be found in [de Koning and de Sanctis, 2018].

The time between fixations, T_b-fix is calculated in a similar way as T_fix for the model with two absorbing states (eq. [6], [7], [8]). However, since there is only one absorbing state, no re-conditioning is required (eq. [7]). Then the T_b-fix is simply:

Equation

An advantage of this calculation is that we can safely assume that allele a starts in i = 0 copies. Then the integration over the starting number of copies is not necessary, since it is explicitly included in the model as transitions from i = 0 to i > 0 copies. This the means that we only need to find a single 0^th row of the fundamental matrix.

3.6 Calculating the variance of the time spent in each state is of interest. It can be found as:

Equation

where N_dg is the matrix containing the diagonal of N, and N_sq is N element-wise squared.

If the Output V option is used in the fundamental mode of WFES Single, the entire N_var matrix will be calculated.

In the fixation mode, the standard deviation of T_b-fix is calculated from the first row of N in Equation 13:

Equation

where N₁ and N₂ are found by solving:

Equation

3.7 Equilibrium and allele frequencies

The equilibrium distribution of allele frequencies is one of the key properties of the Wright-Fisher model. We use the method described by Paige, Styan and Watcher [1975] to solve for the equilibrium distribution (see also Harrod and Plemmons [1984]) of a non-absorbing Markoc chain. The equilibrium distribution of the Markov chain is defined as vector π, such as πP = π. This can be expressed in matrix form as:

Equation

where Π is a n x n matrix with π in each row. This can be re-written as:

Equation

We also have the constraint that Equation , which can be enforced by setting the last columns of (P-I_n) and 0_n, to e_n = (1,1,...,1)^T. We use the notation r(a) to denote that we set the last column of a to e_n.

Equation

We only require a single row of Π. Therefore, we can solve for any row Π_·,x:

Equation

This equation is solved with the LU decomposition approach.

Note that the matrix P is a 2N + 1 matrix, since the absorbing states are included. This means that we require that forward and backward mutation rates are non-zero. In case where μ_A->a = 0 or μ_A->a = 0, the matrix P becomes absorbing, and the equilibrium distribution does not exist.

This calculation is performed by WFES Single in the equilibrium mode. See WFES Single usage for more details.

3.8 Allele age

For details on the allele age calculation, the user is directed to [de Sanctis, Krukov, de Koning, 2017]. Briefly, the paper describes a method to find moments of the allele age distribution given an observed number of copies in the WF model. The moments are calculated in an approach similar to those described above.

The calculation is performed by WFES Single in the allele age mode. The observed number of copies iset via the observed copies (x) parameter. See WFES Single usage for more details.

3.9 Switching models

WFES Switching implements an extended time-heterogeneous Wright-Fisher model. The classical WF model describes a single population of constant size. However, this assumption is rarely met in nature. Likewise, classical WF assumes that the rest of the parameters (selection, mutation) are time-invariant. In this section we describe an extension to the Wright-Fisher model with time-variable parameters. We combine a finite set of WF models in a joint Markov-modulated switching process. The switching process assigns a probability of switching between WF component models with different parameters. Each WF component model can have its own population size, selection coefficient, and mutation rate. Further, WFES Sweep combines absorbing and non-absorbing models.

Let _W₁,...,W_n represent a finite list of distinct Wright-Fisher components with its own parameter set θ_i. Each component is a full Wright-Fisher Markov model. We also have transition probabilities r_x->y of switching from W_x to W_y at any time. Each component _W_i has a transition probability matrix P_(i), which is written in canonical form as Equation 4.

Equation

We want to describe the join process of switching between W₁,...,W_n. We write the canonical form of the switching process transition probability matrix as:

Equation

The Γ_x,y matrix defines a matrix of switching from WF component _W_x to WF component _W_y. The dimensions of the matrix are (2N_x - 1) x (2N_y - 1) if both _W_x and _W_y have two absorbing states (or 2N_x x 2N_y if W_x and W_y each have one absorbing state).

The entries of Γ_x,y(i,·) are defined as Wright-Fisher transition probabilities given current i state is in process _W_x and next state (j) is in process W_y:

Equation

where α_x,y is a switching rate between W_x and W_y.

This formulation essentially matches the frequencies of allele A between different component models. For example if N_x = 100 and N_y = 200 then i_x = 10 would correspond to i_y = 20. Note that Γ_x,y(i,·)_ describes a full Wright-Fisher generation, so we are transforming the entire distribution from W_x into W_y.

We additionally use a parameter _p₁,...,p_n denoting the probability of starting in each of the components W₁,...,W_n This parameter is most relevant with α_x,y imposing a non-reversible switching process.

The calculations on this extended model are similar to those for the simple Wright-Fisher model, since we are still dealing with a finite state absorbing Markov chain. The main difference is that we now deal with extinction and fixation events in each of the component model. To calculate the overall statistics of a model, we weight each component with the probability of starting within each component, p_i. Consider the probability of fixation (probability of extinction is analogous):

Equation

where 0_i is the 0^th state of the i^th component.

These calculations are implemented in WFES Switching, with absorption and fixation modes. Matrix parameter α is controlled by switching (r) parameter. Detailed usage information is found in Section 6.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

3. Computation

3. Computation

3.1 Wright-Fisher model

3.2 Fundamental matrix calculation

3.3 Solving sparse systems

3.4 Entire fundamental matrix

3.5 Fixation only

3.6 Calculating the variance of the time spent in each state is of interest. It can be found as:

3.7 Equilibrium and allele frequencies

3.8 Allele age

3.9 Switching models

Clone this wiki locally