2. Applications

Alberto edited this page Nov 24, 2020 · 10 revisions

The package consists of several applications, each implementing a different type of Wright-Fisher (WF) model. Each application has a set of inputs that specify the model parameters and the calculations to be performed. The applications available in WFES are:

  • WFES Single: Standard Wright-Fisher model of a single population
  • WFES Sweep: Switching model with two parameter regimes. The first model is non-absorbing and the second is fixation only.
  • WFES Sequential: Similar to WFES Switching, but it switches between parameter regimes sequentially.
  • WFES Switching: Time-heterogeneous extension of the Wright-Fisher model, known as the Markov-modulated Wright-Fisher model. This model switches between a given set of parameter regimes at a given switching rate.
  • WFAF-S: Calculate an approximate allele frequency spectrum by leveraging the Markov-modulated Wright-Fisher model.
  • WFAF-D: Calculate the expected allele frequency distribution for a given piece-wise demographic history.
  • Phase Type: Calculate the distribution of time to substitution. This is the fixation-only absorbing analogue of Time Dist.
  • Time Dist.: Iteratively calculates the distribution of time to fixation or extinction. Time Dist. Skip mode is analogous to Phase Type Dist., but excludes the mutation time.

The following subsections give a brief explanation of each application and its modes.

2.1 WFES Single

WFES Single implements the standard Wright-Fisher model of a single population. It has the following modes:

  • Absorption mode assumes that absorption is possible at extinction and fixation boundaries. The calculations assume that the population starts with one or more copies of the allele (see Section 4.1 - Integration for details). This mode calculates the following outputs:
    • Values:
      • Pext: Probability of extinction.
      • Pfix: Probability of fixation.
      • Tabs: Expected number of generations before absorption.
      • Tabs-std: Standard deviation of Tabs.
      • Text: Expected number of generations before extinction.
      • Text-std: Standard deviation of Text.
      • Next: //TODO
      • Tfix: Expected number of generations before fixation.
      • Tfix-std: Standard deviation of Tfix.
      • Res: Save the calculated values in a CSV file. This output is checked by default.
    • Matrices:
      • Q: Transition probability matrix of transient-transient states. This output is not produced by default. You must check the Output Q option.
      • R: Transition probability matrix of transient-absorbing states. This output is not produced by default. You must check the Output R option.
      • N: Fundamental matrix of the WF model. In this mode, only required rows are calculated. This output is not produced by default. You must check the Output N option.
      • B: Matrix result of multiplying N and R. This matrix is useful for obtaining absorption properties of the model. In this mode, only required rows are calculated. This output is not produced by default. You must check the Output B option.
      • Next: //TODO
      • Nfix: //TODO
      • I: Initial probability distribution of alleles. This output is not produced by default. You must check the Output I option.
  • Fixation mode assumes that the extinction boundary is transient, and only the fixation boundary is absorbing. The calculations assume that the population starts with zero copies of the allele. This mode calculates the following outputs:
    • Values:
      • Tb-fix: Expected number of generations between two fixation events.
      • Tb-fix-std: Standard deviation of Tb-fix.
      • Rate: Rate of substitutions (1/Tb-fix).
      • Res: Save the calculated values in a CSV file. This output is checked by default.
    • Matrices:
      • Q: Transition probability matrix of transient-transient states. This output is not produced by default. You must check the Output Q option.
      • R: Transition probability matrix of transient-absorbing states. This output is not produced by default. You must check the Output R option.
      • N: Fundamental matrix of the WF model. In this mode, only required rows are calculated. This output is not produced by default. You must check the Output N option.
      • B: Matrix result of multiplying N and R. This matrix is useful for obtaining absorption properties of the model. In this mode, only required rows are calculated. This output is not produced by default. You must check the Output B option.
      • I: Initial probability distribution of alleles. This output is not produced by default. You must check the Output I option.
  • Establishment mode calculates establishment properties. //TODO review. This mode calculates the following outputs:
    • Values:
      • Fest: Frequency of establishment.
      • Pest: Probability of establishment.
      • Tseg: Expected number of generations before segregation.
      • Tseg-std: Standard deviation of Tseg.
      • Tseg-ext: Expected number of generations before segregation (Extinction).
      • Tseg-ext-std: Standard deviation of Tseg-ext.
      • Tseg-fix: Expected number of generations before segregation (Fixation).
      • Tseg-fix-std: Standard deviation of Tseg-fix.
      • Test: Number of generations before establishment.
      • Test-std: Standard deviation of Test.
      • Res: Save the calculated values in a CSV file. This output is checked by default.
    • Matrices:
      • Q: Transition probability matrix of transient-transient states. This output is not produced by default. You must check the Output Q option.
      • R: Transition probability matrix of transient-absorbing states. This output is not produced by default. You must check the Output R option.
      • I: Initial probability distribution of alleles. This output is not produced by default. You must check the Output I option.
  • Fundamental mode calculates the entire fundamental matrix of the Wright-Fisher model. There is no assumption about the starting number of alleles. Note that this mode is slow for large matrices (N > 1000). This mode calculates the following outputs:
    • Matrices:
      • Q: Transition probability matrix of transient-transient states. This output is not produced by default. You must check the Output Q option.
      • R: Transition probability matrix of transient-absorbing states. This output is not produced by default. You must check the Output R option.
      • N: Fundamental matrix of the WF model. In this mode, the full matrix is calculated (this is the reason why this mode is slow). This output is not produced by default. You must check the Output N option.
      • V: Variance of the fundamental matrix. This output is not produced by default. You must check the Output V option.
      • I: Initial probability distribution of alleles. This output is not produced by default. You must check the Output I option.
  • Non Absorbing mode builds a non-absorbing matrix of the WF model. This mode calculates the following outputs:
    • Matrices:
      • Q: Transition probability matrix of transient-transient states. This output is not produced by default. You must check the Output Q option.
      • I: Initial probability distribution of alleles. This output is not produced by default. You must check the Output I option.
  • Equilibrium mode calculates the equilibrium distribution of allele frequencies. Both boundaries are non-absorbing (this is required for an equilibrium distribution to exist). This mode calculates the following outputs:
    • Values:
      • Efreq-mut: //TODO
      • Efreq-wt: //TODO
      • Res: Save the calculated values in a CSV file. This output is checked by default.
    • Matrices:
      • E: Equilibrium distribution of allele frequencies. This output is not produced by default. You must check the Output E option.
      • I: Initial probability distribution of alleles. This output is not produced by default. You must check the Output I option.
  • Allele Age mode calculates moments of the allele age given a current allele frequency. Both extinction and fixation boundaries are absorbing. The calculations assume that the population starts with one or more copies of the allele (see Section 4.1 - Integration for details). This mode calculates the following outputs:
    • Values:
      • E(A): Expectation of the allele age.
      • S(A): Standard deviation of the allele age.
      • Res: Save the calculated values in a CSV file. This output is checked by default.
    • Matrices:
      • Q: Transition probability matrix of transient-transient states. This output is not produced by default. You must check the Output Q option.
      • R: Transition probability matrix of transient-absorbing states. This output is not produced by default. You must check the Output R option.
      • I: Initial probability distribution of alleles. This output is not produced by default. You must check the Output I option.
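
The relationship between the Absorption mode outputs (Q, R, N, B, Pext, Pfix, Tabs) can be illustrated on a very small model. The following is only a sketch, not WFES code: it assumes a neutral model without mutation, and the helper names (`wf_matrix`, `solve`, `absorption_summary`) are hypothetical. It uses plain Gaussian elimination rather than whatever solver WFES uses internally, and it computes only the row of N that corresponds to the starting number of copies.

```python
from math import comb


def wf_matrix(two_n):
    """Full (2N+1) x (2N+1) neutral Wright-Fisher matrix (assumption: no mutation)."""
    return [[comb(two_n, j) * (i / two_n) ** j * (1 - i / two_n) ** (two_n - j)
             for j in range(two_n + 1)]
            for i in range(two_n + 1)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def absorption_summary(two_n, start=1):
    """Return (Pext, Pfix, Tabs) for a population starting with `start` copies."""
    p = wf_matrix(two_n)
    transient = range(1, two_n)                         # states 1 .. 2N-1
    q = [[p[i][j] for j in transient] for i in transient]
    r = [[p[i][0], p[i][two_n]] for i in transient]     # columns: extinction, fixation
    k = len(q)
    # Row `start` of N = (I - Q)^-1 is the solution of (I - Q)^T n = e_start.
    a_t = [[(1.0 if i == j else 0.0) - q[j][i] for j in range(k)] for i in range(k)]
    e = [1.0 if i == start - 1 else 0.0 for i in range(k)]
    n_row = solve(a_t, e)
    p_ext = sum(n_row[i] * r[i][0] for i in range(k))   # row of B = N R, first column
    p_fix = sum(n_row[i] * r[i][1] for i in range(k))   # row of B = N R, second column
    t_abs = sum(n_row)                                  # expected generations to absorption
    return p_ext, p_fix, t_abs


p_ext, p_fix, t_abs = absorption_summary(two_n=10, start=1)
print(p_ext, p_fix, t_abs)
```

For a neutral allele starting from a single copy, Pfix should be close to 1/(2N), which gives a quick sanity check of the sketch.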

2.2 WFES Sweep

WFES Sweep implements a type of switching model with two parameter regimes. The first model is non-absorbing (both extinction and fixation boundaries are transient), and the second model is fixation-only. This is a model of standing genetic variation with pre-adaptive and adaptive components. There is currently just one mode.

  • Fixation mode assumes that the extinction boundary is transient, and only the fixation boundary is absorbing. This mode calculates the following outputs:
    • Values:
      • Tb-fix: Expected number of generations between two fixation events.
      • Rate: Rate of substitutions (1/Tb-fix).
      • Res: Save the calculated values in a CSV file. This output is checked by default.
    • Matrices:
      • Q: Transition probability matrix of transient-transient states. This output is not produced by default. You must check the Output Q option.
      • R: Transition probability matrix of transient-absorbing states. This output is not produced by default. You must check the Output R option.
      • N: Fundamental matrix of the WF model. In this mode, only required rows are calculated. This output is not produced by default. You must check the Output N option.
      • B: Matrix result of multiplying N and R. This matrix is useful for obtaining absorption properties of the model. In this mode, only required rows are calculated. This output is not produced by default. You must check the Output B option.
      • I: Initial probability distribution of alleles. This output is not produced by default. You must check the Output I option.

2.3 WFES Sequential

WFES Sequential implements calculations that switch over a set of Wright-Fisher models sequentially. There is only one mode, in which fixation and extinction are both absorbing.

  • Values:
    • Pext: Probability of extinction.
    • Pfix: Probability of fixation.
    • Ptmo: //TODO
    • Text: Expected number of generations before extinction.
    • Text-std: Standard deviation of Text.
    • Tfix: Expected number of generations before fixation.
    • Tfix-std: Standard deviation of Tfix.
    • Ttmo: //TODO
    • Ttmo-std: Standard deviation of Ttmo.
    • Res: Save the calculated values in a CSV file.
  • Matrices:
    • Q: Transition probability matrix of transient-transient states. This output is not produced by default. You must check the Output Q option.
    • R: Transition probability matrix of transient-absorbing states. This output is not produced by default. You must check the Output R option.
    • N: Fundamental matrix of the WF model. In this mode, only required rows are calculated. This output is not produced by default. You must check the Output N option.
    • B: Matrix result of multiplying N and R. This matrix is useful for obtaining absorption properties of the model. In this mode, only required rows are calculated. This output is not produced by default. You must check the Output B option.
    • Next: //TODO
    • Nfix: //TODO
    • Ntmo: //TODO

2.4 WFES Switching

WFES Switching implements a time-heterogeneous extension of the Wright-Fisher model (known as the Markov-modulated Wright-Fisher model). It is possible to switch between different parameter regimes, for example different population sizes, selection parameters, or mutation rates. We refer to each parameter regime as a "component". For example, an absorbing model of oscillating population sizes (N1 = 1000, N2 = 2000) has two components (one for each population size) and (2N1 - 1) + (2N2 - 1) = 1999 + 3999 = 5998 states. The switching between components is parameterised with the initial probability distribution (p) and the rate of switching from one component to the next (r). It has the following modes:

  • Absorption mode assumes that absorption is possible at extinction and fixation boundaries. This mode calculates the following outputs:
    • Values:
      • Pext: Probability of extinction.
      • Pfix: Probability of fixation.
      • Text: Expected number of generations before extinction.
      • Text-std: Standard deviation of Text.
      • Tfix: Expected number of generations before fixation.
      • Tfix-std: Standard deviation of Tfix.
      • Res: Save the calculated values in a CSV file. This output is checked by default.
    • Matrices:
      • Q: Transition probability matrix of transient-transient states. This output is not produced by default. You must check the Output Q option.
      • R: Transition probability matrix of transient-absorbing states. This output is not produced by default. You must check the Output R option.
      • N: Fundamental matrix of the WF model. In this mode, only required rows are calculated. This output is not produced by default. You must check the Output N option.
      • B: Matrix result of multiplying N and R. This matrix is useful for obtaining absorption properties of the model. In this mode, only required rows are calculated. This output is not produced by default. You must check the Output B option.
      • Next: //TODO
      • Nfix: //TODO
  • Fixation mode assumes that the extinction boundary is transient, and only the fixation boundary is absorbing. This mode calculates the following outputs:
    • Values:
      • Tb-fix: Expected number of generations between two fixation events.
      • Rate: Rate of substitutions (1/Tb-fix).
      • Res: Save the calculated values in a CSV file. This output is checked by default.
    • Matrices:
      • Q: Transition probability matrix of transient-transient states. This output is not produced by default. You must check the Output Q option.
      • R: Transition probability matrix of transient-absorbing states. This output is not produced by default. You must check the Output R option.
      • N: Fundamental matrix of the WF model. In this mode, only required rows are calculated. This output is not produced by default. You must check the Output N option.
      • B: Matrix result of multiplying N and R. This matrix is useful for obtaining absorption properties of the model. In this mode, only required rows are calculated. This output is not produced by default. You must check the Output B option.
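
The state count for the oscillating-population example in the introduction of this section can be checked with a few lines of code. This is a sketch only: the function name `component_layout` is hypothetical, and the idea that the components are stored one after another, each starting at a fixed offset, is an assumption made for illustration and not a description of the WFES internals.

```python
def component_layout(pop_sizes):
    """Return (offsets, total) for an absorbing model with one block per component.

    Each component with population size N contributes its 2N - 1 transient
    states (1 .. 2N-1 copies of the allele).
    """
    offsets = []
    total = 0
    for n in pop_sizes:
        offsets.append(total)   # assumed index of the first state of this component
        total += 2 * n - 1
    return offsets, total


offsets, total = component_layout([1000, 2000])
print(offsets, total)  # [0, 1999] 5998
```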

2.5 WFAF-S

WFAF-S (Wright-Fisher Allele Frequency - Stochastic) calculates an approximate allele frequency spectrum by leveraging the Markov-modulated Wright-Fisher model. The demography is described by a non-reversible Markov chain, where the transition probabilities are inversely proportional to the expected times in each epoch. The details of implementation are in stochastic approximation.

  • Matrices:
    • Q: Transition probability matrix of transient-transient states. This output is not produced by default. You must check the Output Q option.
    • N: Fundamental matrix of the WF model. In this mode, only required rows are calculated. This output is not produced by default. You must check the Output N option.
    • B: Matrix result of multiplying N and R. This matrix is useful for obtaining absorption properties of the model. In this mode, only required rows are calculated. This output is not produced by default. You must check the Output B option.
    • Dist: Allele frequency distribution calculated by WFAF-S. This output is checked by default and cannot be unchecked.

2.6 WFAF-D

WFAF-D (Wright-Fisher Allele Frequency - Deterministic) calculates the expected allele frequency distribution for a given piece-wise demographic history. It uses an equilibrium distribution to initiate the calculation, and then iterates forward in time by fast matrix-vector multiplications. It is also possible to start from a given allele frequency distribution. The details of implementation are in allele frequency calculation.

  • Matrices:
    • Dist: Allele frequency distribution calculated by WFAF-D. This output is checked by default and cannot be unchecked.
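
The idea of starting from an equilibrium distribution and iterating forward through a piece-wise demographic history can be sketched as follows. This is an illustration under stated assumptions, not the WFAF-D implementation: the function names are hypothetical, the model is neutral with two-way mutation (rates `u` and `v`), the equilibrium is approximated by repeated multiplication, and a change in population size is handled by sampling the new generation of size 2N' from the frequencies of the old one (a rectangular transition matrix).

```python
from math import comb


def transition(two_n_from, two_n_to, u, v):
    """WF matrix from a generation of size two_n_from to one of size two_n_to."""
    rows = []
    for i in range(two_n_from + 1):
        x = i / two_n_from
        p = x * (1 - v) + (1 - x) * u      # allele frequency after two-way mutation
        rows.append([comb(two_n_to, j) * p ** j * (1 - p) ** (two_n_to - j)
                     for j in range(two_n_to + 1)])
    return rows


def step(dist, matrix):
    """One generation forward: row vector times matrix."""
    return [sum(dist[i] * matrix[i][j] for i in range(len(dist)))
            for j in range(len(matrix[0]))]


def equilibrium(two_n, u, v, iterations=2000):
    """Approximate the equilibrium distribution by repeated multiplication."""
    m = transition(two_n, two_n, u, v)
    dist = [1.0 / (two_n + 1)] * (two_n + 1)
    for _ in range(iterations):
        dist = step(dist, m)
    return dist


def allele_frequency_distribution(epochs, u, v):
    """epochs: list of (two_n, generations), oldest epoch first."""
    two_n = epochs[0][0]
    dist = equilibrium(two_n, u, v)        # start at equilibrium in the oldest epoch
    for new_two_n, generations in epochs:
        if generations == 0:
            continue
        if new_two_n != two_n:
            # The first generation of the epoch is sampled at the new size.
            dist = step(dist, transition(two_n, new_two_n, u, v))
            two_n = new_two_n
            generations -= 1
        m = transition(two_n, two_n, u, v)
        for _ in range(generations):
            dist = step(dist, m)
    return dist


dist = allele_frequency_distribution([(10, 0), (20, 5), (6, 3)], u=0.01, v=0.01)
print(len(dist), sum(dist))
```

The returned vector has one entry per allele count in the most recent epoch (0 to 2N copies) and sums to 1 up to rounding error.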

2.7 Phase Type

Phase Type consists of two executables that calculate the distribution of time to substitution and the moments of that distribution. It has the following modes:

  • Phase Type Dist. calculates the distribution of time to substitution. It is the fixation-only absorbing analogue of Time Dist. (the WFES Single Fixation mode). The details of implementation are in distribution calculations. This mode calculates the following outputs:
    • Matrices:
      • Q: Transition probability matrix of transient-transient states. This output is not produced by default. You must check the Output Q option.
      • R: Transition probability matrix of transient-absorbing states. This output is not produced by default. You must check the Output R option.
      • P: Distribution of time to substitution. It outputs a CSV containing probability distributions over time. This output is checked by default and cannot be unchecked. The CSV file is organised as follows:
        • First column: Instant of time (generations).
        • Second column: Probability of substitution in each generation.
        • Third column: Cumulative probability of substitution.
  • Phase Type Moments calculates the moments of the distribution of time to substitution. It implements the calculation described in Dayar, 2005 (algorithm 1), which yields an arbitrary number of moments of the phase-type distribution. This mode calculates the following outputs:
    • Values:
      • Mean: Mean of the moments.
      • Std: Standard deviation of the moments.
    • Matrices:
      • Q: Transition probability matrix of transient-transient states. This output is not produced by default. You must check the Output Q option.
      • R: Transition probability matrix of transient-absorbing states. This output is not produced by default. You must check the Output R option.
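
To make the Mean and Std outputs more concrete, the sketch below computes the first two moments of the time to substitution for a tiny fixation-only model. It does not implement the algorithm from Dayar, 2005. Instead it sums the survival function P(T > t), using the identities E[T] = Σ P(T > t) and E[T²] = Σ (2t + 1) P(T > t). The function names, the one-way mutation rate `u`, and the truncation tolerance are assumptions made for illustration.

```python
from math import comb, sqrt


def fixation_only_q(two_n, u):
    """Transient block Q over states 0 .. 2N-1; state 2N (fixation) is absorbing.

    One-way mutation at rate u keeps the extinction boundary transient.
    """
    q = []
    for i in range(two_n):
        x = i / two_n
        p = x + u * (1 - x)                 # allele frequency after mutation
        q.append([comb(two_n, j) * p ** j * (1 - p) ** (two_n - j)
                  for j in range(two_n)])
    return q


def substitution_time_moments(two_n, u, tol=1e-13, max_steps=1_000_000):
    """Return (mean, std) of the time to substitution, starting from zero copies."""
    q = fixation_only_q(two_n, u)
    v = [1.0] + [0.0] * (two_n - 1)         # all mass on the zero-copy state
    mean = 0.0
    second = 0.0
    for t in range(max_steps):
        survival = sum(v)                   # P(T > t)
        if survival < tol:
            break
        mean += survival
        second += (2 * t + 1) * survival
        v = [sum(v[i] * q[i][j] for i in range(two_n)) for j in range(two_n)]
    std = sqrt(max(second - mean * mean, 0.0))
    return mean, std


mean, std = substitution_time_moments(two_n=10, u=0.01)
print(mean, std, 1.0 / mean)
```

The last printed value, 1/mean, corresponds to the Rate output of the Fixation modes (1/Tb-fix) under the assumptions of this sketch.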

2.8 Time Dist.

Time Dist. consists of four executables that calculate the distribution of time to different events, such as fixation, extinction, or substitution. It has the following modes:

  • Time Dist. iteratively calculates the distribution of time to fixation or extinction. The details of implementation are in distribution calculations. This mode calculates the following outputs:
    • Matrices:
      • Q: Transition probability matrix of transient-transient states. This output is not produced by default. You must check the Output Q option.
      • R: Transition probability matrix of transient-absorbing states. This output is not produced by default. You must check the Output R option.
      • P: Distribution of time to fixation and extinction. It outputs a CSV containing probability distributions over time. This output is checked by default and cannot be unchecked. The CSV file is organised as follows:
        • First column: Instant of time (generations).
        • Second column: Probability of extinction in each generation.
        • Third column: Probability of fixation in each generation.
        • Fourth column: Probability of absorption (either fixation or extinction) in each generation.
        • Fifth column: Cumulative probability of absorption.
  • Time Dist. SGV calculates the distribution of time to substitution under a model of standing genetic variation. This is a combination of Phase Type Dist. calculation with the WFES Sweep model. The details of implementation are in distribution calculations. This mode calculates the following outputs:
    • Matrices:
      • Q: Transition probability matrix of transient-transient states. This output is not produced by default. You must check the Output Q option.
      • R: Transition probability matrix of transient-absorbing states. This output is not produced by default. You must check the Output R option.
      • P: Distribution of time to substitution. It outputs a CSV containing probability distributions over time. This output is checked by default and cannot be unchecked. The CSV file is organised as follows:
        • First column: Instant of time (generations).
        • Second column: Probability of substitution in each generation.
        • Third column: Cumulative probability of substitution.
  • Time Dist. Skip is the analogue of Phase Type Dist., but excludes the mutation time. The details of implementation are in distribution calculations. This mode calculates the following outputs:
    • Matrices:
      • Q: Transition probability matrix of transient-transient states. This output is not produced by default. You must check the Output Q option.
      • R: Transition probability matrix of transient-absorbing states. This output is not produced by default. You must check the Output R option.
      • P: Distribution of time to substitution. It outputs a CSV containing probability distributions over time. This output is checked by default and cannot be unchecked. The CSV file is organised as follows:
        • First column: Instant of time (generations).
        • Second column: Probability of substitution in each generation.
        • Third column: Cumulative probability of substitution.
  • Time Dist. Dual //TODO. This mode calculates the following outputs:
    • Matrices:
      • Q: Transition probability matrix of transient-transient states. This output is not produced by default. You must check the Output Q option.
      • R: Transition probability matrix of transient-absorbing states. This output is not produced by default. You must check the Output R option.
      • P: //TODO. This output is checked by default and cannot be unchecked. The CSV file is organised as follows:
        • First column: Instant of time (generations).
        • Second column: Probability of extinction in each generation.
        • Third column: Probability of fixation in each generation.
        • Fourth column: Probability of absorption (either fixation or extinction) in each generation.
        • Fifth column: Cumulative probability of absorption.
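
The five-column layout of the Time Dist. output can be reproduced on a toy model by iterating the allele count distribution one generation at a time. The sketch below is not the WFES implementation: it assumes a neutral model without mutation and with both boundaries absorbing, and the function name `time_dist_rows` and its parameters are hypothetical.

```python
from math import comb


def time_dist_rows(two_n, start=1, generations=50):
    """Return rows (t, p_ext, p_fix, p_abs, cumulative) for t = 1 .. generations."""
    # Neutral Wright-Fisher transition matrix (assumption: no mutation, no selection).
    p = [[comb(two_n, j) * (i / two_n) ** j * (1 - i / two_n) ** (two_n - j)
          for j in range(two_n + 1)]
         for i in range(two_n + 1)]
    v = [0.0] * (two_n + 1)
    v[start] = 1.0                          # population starts with `start` copies
    rows = []
    cumulative = 0.0
    transient = range(1, two_n)
    for t in range(1, generations + 1):
        # Mass that moves from the transient states into each boundary at time t.
        p_ext = sum(v[i] * p[i][0] for i in transient)
        p_fix = sum(v[i] * p[i][two_n] for i in transient)
        cumulative += p_ext + p_fix
        rows.append((t, p_ext, p_fix, p_ext + p_fix, cumulative))
        # Keep only the mass that is still segregating.
        nxt = [0.0] * (two_n + 1)
        for j in transient:
            nxt[j] = sum(v[i] * p[i][j] for i in transient)
        v = nxt
    return rows


for row in time_dist_rows(two_n=10, start=1, generations=5):
    print(",".join(str(value) for value in row))
```

Each printed line follows the column order listed above: generation, probability of extinction, probability of fixation, probability of absorption, and cumulative probability of absorption.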