Generating Sequences
With this method, it is possible to simulate an alignment for each gene that evolves inside the species tree. The method can simulate non-coding DNA sequences, coding DNA sequences and amino acid sequences.
Sequences are simulated along the branches of the species tree, assuming that the branch lengths of the gene trees output by mode G are measured in units of time.
Lineages in the species tree can evolve at different paces, and gene families can also evolve at different speeds. To model this, two multipliers can be included: the species substitution rate multiplier and the gene family substitution rate multiplier.
The number of substitutions taking place in a branch is computed as:
Branch length (time) * substitution_rate_multiplier_species * substitution_rate_multiplier_gene_family
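As a minimal sketch of the product above (not Zombi's own code), the calculation for a single branch could look like this. The function name, argument names and the numeric values are hypothetical and chosen only for illustration.

```python
def branch_substitutions(branch_length, species_multiplier=1.0,
                         gene_family_multiplier=1.0):
    """Scale a branch length (in time units) by the two rate multipliers.

    Hypothetical helper: it only mirrors the formula given in the text.
    """
    return branch_length * species_multiplier * gene_family_multiplier


# Illustrative values: a branch of length 0.5 in a lineage evolving twice
# as fast as the baseline, for a gene family evolving 1.5 times as fast.
scaled = branch_substitutions(0.5, species_multiplier=2.0,
                              gene_family_multiplier=1.5)
print(scaled)  # 1.5
```

With both multipliers left at 1.0, the result is simply the branch length in time units.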
To run this, first you need to prepare the rates files:
python RateCustomizer.py S SequenceParameters.tsv ExperimentFolder
This will create two files:
- GT_SubsitutionRates.tsv
- ST_SubsitutionRates.tsv
Then, you can launch the simulation of sequences by running
python Zombi.py Su SequenceParameters.tsv ExperimentFolder
Sequences: A folder with one FASTA file per gene of the species tree. Each FASTA alignment contains the simulated sequences obtained at the leaves of the tree, not at the internal nodes.
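To inspect one of these alignments, a small FASTA reader is enough. The sketch below assumes standard FASTA layout (a `>` header line followed by one or more sequence lines); the sequence names and residues in the example string are made up and are not real Zombi output.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequences may span several lines
    return {n: "".join(chunks) for n, chunks in records.items()}


# Hypothetical alignment with two leaf sequences.
example = ">n1_1\nACGT\nAC\n>n2_1\nACGA\nAC\n"
alignment = parse_fasta(example)
for name, seq in alignment.items():
    print(name, len(seq))
```

To read a file from the Sequences folder, pass the file's contents (for example from `open(path).read()`) to `parse_fasta`.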