Generating Sequences

AADavin edited this page May 6, 2018 · 19 revisions

Basic mode (S)

With this mode, it is possible to simulate an alignment (nucleotides, proteins or codons) for each gene that evolves inside the species tree. Sequences are simulated along the branches of the species tree, under the assumption that the branch lengths of the gene trees output by the G mode are measured in units of time.

Advanced modes

Mode Su - User control of substitution rates

Different lineages of the species tree may evolve at different paces, and different gene families may also evolve at different speeds. To model this, two multipliers can be set: the species substitution rate multiplier and the gene family substitution rate multiplier.

The number of substitutions taking place in a branch is computed by

Branch length (time) * substitution_rate_multiplier_species * substitution_rate_multiplier_gene_family
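
As a rough illustration of the formula above, the following Python sketch multiplies a branch length by the two multipliers. The function name and the numeric values are hypothetical and are not taken from Zombi's source code.

```python
def substitution_scaled_length(branch_length, species_multiplier, gene_family_multiplier):
    """Apply the formula above: branch length (time) times both rate multipliers."""
    if branch_length < 0:
        raise ValueError("branch length must be non-negative")
    return branch_length * species_multiplier * gene_family_multiplier


# Hypothetical values: a branch of length 0.5 in a lineage evolving twice as
# fast as the baseline, for a gene family evolving at half the baseline speed.
print(substitution_scaled_length(0.5, 2.0, 0.5))  # prints 0.5
```

With both multipliers equal to 1, the result is simply the branch length in time units.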

To run this mode, first prepare the rate files:

python RateCustomizer.py S SequenceParameters.tsv ExperimentFolder

This will create two files:

  • GT_SubsitutionRates.tsv
  • ST_SubsitutionRates.tsv

Then, you can launch the simulation of sequences by running

python Zombi.py Su SequenceParameters.tsv ExperimentFolder
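
The two steps must run in this order, since the Su mode depends on the rate files created by the first command. A minimal Python sketch that chains them could look like the following. The helper names `su_commands` and `run_su` are hypothetical, and the sketch assumes that it is run from the folder containing `RateCustomizer.py` and `Zombi.py`.

```python
import subprocess


def su_commands(parameters_file, experiment_folder):
    """Build the two command lines shown above, in the order they must run."""
    return [
        ["python", "RateCustomizer.py", "S", parameters_file, experiment_folder],
        ["python", "Zombi.py", "Su", parameters_file, experiment_folder],
    ]


def run_su(parameters_file, experiment_folder):
    """Run both steps and stop at the first one that fails."""
    for command in su_commands(parameters_file, experiment_folder):
        # check=True raises CalledProcessError on a non-zero exit status
        subprocess.run(command, check=True)
```

Calling `run_su("SequenceParameters.tsv", "ExperimentFolder")` would then run both commands one after the other.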

Output

Sequences: A folder with two FASTA files per gene of the species tree. Each FASTA alignment contains the simulated sequences obtained at the leaves of the tree, not at the internal nodes.
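
To inspect one of these alignments, a small FASTA reader is enough. The sketch below is a generic parser, not part of Zombi. The function names are hypothetical, and the check that all sequences have the same length is an assumption about what an alignment should look like.

```python
def read_fasta(path):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    sequences = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A header line starts a new record
                name = line[1:]
                sequences[name] = []
            elif name is not None:
                # Sequence lines may be wrapped over several lines
                sequences[name].append(line)
    return {n: "".join(parts) for n, parts in sequences.items()}


def is_aligned(sequences):
    """Return True if all sequences share the same length."""
    return len({len(seq) for seq in sequences.values()}) <= 1
```

Pass the path of any file in the Sequences folder to `read_fasta`, then use `is_aligned` on the result as a quick sanity check.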

Parameters

SEQUENCE: codon, amino-acid or nucleotide

VERBOSE: If 1, it prints the name of the gene family being computed
