Adrian Fritz edited this page Jul 14, 2022 · 11 revisions

If more genomes are requested than are available, artificial strains will be generated using sgEvolver. In the configuration file, every community has the options genomes_total and genomes_real. The difference between the two is the number of artificial strains that will be generated.
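
The relationship between the two options can be sketched in a few lines of Python. This is only an illustration: it assumes an INI-style configuration with a section named community0, and the helper count_artificial_strains is a hypothetical name, not part of the tool.

```python
import configparser

# Assumed, minimal configuration fragment; the section name "community0"
# is an assumption made for this illustration.
EXAMPLE_CONFIG = """
[community0]
genomes_total = 75
genomes_real = 30
"""


def count_artificial_strains(config_text, section="community0"):
    """Return genomes_total - genomes_real for one community section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    community = parser[section]
    genomes_total = community.getint("genomes_total")
    genomes_real = community.getint("genomes_real")
    if genomes_total < genomes_real:
        raise ValueError("genomes_total must not be smaller than genomes_real")
    return genomes_total - genomes_real


print(count_artificial_strains(EXAMPLE_CONFIG))  # prints 45
```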

Amount of artificial strains

The aim is to have up to ~10 artificial strains of a genome, with the amount drawn from a geometric distribution (p=0.3). Amounts are drawn for randomly chosen genomes until the given maximum number of genomes of a sample is reached. In the rare case that a value greater than 9 is drawn, it is lowered to 9.
This way, the total number of strains of a genome can be up to 10, the original strain included.
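
The drawing procedure described above could look roughly like the following sketch. It is not the tool's actual implementation: the function names, the way genomes are picked, and the trimming of the last draw so that the total is hit exactly are all assumptions made for illustration.

```python
import random


def draw_geometric(p, rng):
    """Count Bernoulli(p) trials up to and including the first success (1, 2, ...)."""
    k = 1
    while rng.random() >= p:
        k += 1
    return k


def draw_strain_amounts(genomes_total, genomes_real, p=0.3, max_artificial=9, seed=None):
    """Return one strain count per real genome, original strain included.

    Hypothetical sketch: every genome starts with 1 (its original strain);
    random genomes then receive geometrically drawn artificial strains,
    capped at max_artificial, until genomes_total is reached.
    """
    if genomes_total < genomes_real:
        raise ValueError("genomes_total must be at least genomes_real")
    if genomes_total > genomes_real * (1 + max_artificial):
        raise ValueError("not enough real genomes to reach genomes_total")
    rng = random.Random(seed)
    amounts = [1] * genomes_real
    remaining = genomes_total - genomes_real
    while remaining > 0:
        # Only genomes that still have room below the cap can be chosen.
        candidates = [i for i, a in enumerate(amounts) if a < 1 + max_artificial]
        i = rng.choice(candidates)
        drawn = min(draw_geometric(p, rng), max_artificial)
        # Assumption: trim the draw so neither the cap nor the total is exceeded.
        drawn = min(drawn, 1 + max_artificial - amounts[i], remaining)
        amounts[i] += drawn
        remaining -= drawn
    return sorted(amounts)


print(draw_strain_amounts(genomes_total=75, genomes_real=30, seed=1))
```

In the returned list, a value of 1 means that only the original strain is used for that genome.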

Simulation of artificial strains

To generate artificial strains, the tool sgEvolver, developed by Aaron Darling, is used. For any given strain, artificial strains are simulated based on a distance tree in Newick format, which is the file given in the strain_simulation_template option of the configuration file. The number of leaves determines the number of strains simulated. The distance to the root determines how strongly an artificial strain will be evolved. The default template simulates 40 artificial strains for each given strain.
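
To make the role of the tree more concrete, the following sketch parses a small Newick string, counts its leaves, and sums the branch lengths from the root to each leaf. The three-leaf tree is a made-up example, not the default template, and the parser handles only simple Newick without quoted labels or comments. How sgEvolver itself consumes the tree is not shown here.

```python
import re

SEPARATORS = {"(", ")", ",", ";"}


def parse_newick(text):
    """Parse a simple Newick string into nested (name, length, children) tuples."""
    text = re.sub(r"\s+", "", text)
    tokens = re.findall(r"[(),;]|[^(),;]+", text)
    pos = 0

    def node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1  # consume "("
            children.append(node())
            while tokens[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            pos += 1  # consume ")"
        name, length = "", 0.0
        if tokens[pos] not in SEPARATORS:
            # A label looks like "name:length"; both parts are optional.
            name, _, dist = tokens[pos].partition(":")
            length = float(dist) if dist else 0.0
            pos += 1
        return name, length, children

    return node()


def leaf_distances(tree, depth=0.0):
    """Map each leaf name to its summed branch length from the root."""
    name, length, children = tree
    depth += length
    if not children:
        return {name: depth}
    distances = {}
    for child in children:
        distances.update(leaf_distances(child, depth))
    return distances


# Hypothetical tree with three leaves, so three strains would be simulated.
example_tree = "((A:0.1,B:0.2):0.05,C:0.3);"
distances = leaf_distances(parse_newick(example_tree))
print(len(distances))  # prints 3
```

In this made-up tree, leaf C lies farthest from the root and would therefore correspond to the most strongly evolved strain.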
A GFF-formatted file with annotated genes is required so that gene regions can be handled differently from other regions. For the strain simulation to work, the genome_to_id.tsv and genome_to_gff.tsv files have to contain absolute paths to the genome and GFF files.
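
A quick check for the absolute-path requirement might look like the sketch below. It assumes that each mapping file has two tab-separated columns, an identifier followed by a path. The helper name relative_path_entries and the example paths are hypothetical.

```python
import csv
import io
import os


def relative_path_entries(tsv_text):
    """Return (identifier, path) pairs whose path is not absolute.

    Assumes two tab-separated columns per line: identifier, then path.
    """
    offenders = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2:
            continue  # skip empty or malformed lines
        identifier, path = row[0], row[1]
        if not os.path.isabs(path):
            offenders.append((identifier, path))
    return offenders


# Hypothetical mapping content; the second entry uses a relative path.
example = "genome1\t/data/genomes/genome1.fasta\ngenome2\tgenomes/genome2.fasta\n"
for identifier, path in relative_path_entries(example):
    print(f"{identifier}: path is not absolute: {path}")
```

Running such a check on both mapping files before starting a simulation can catch relative paths early.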

Example of drawn amounts

genomes_total=75
genomes_real=30

[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
 2, 2, 2, 2, 2, 2, 
 3, 3, 3, 3, 
 4, 4, 
 5, 5, 
 6, 
 7, 7]  

In this example, a total of 30 genomes would be used, and the drawn amounts add up to 75 strains. 17 of the genomes will have simulated strains made from them.
The 13 entries of '1' mean that 13 genomes will have no strains simulated.
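
The numbers in this example can be checked with a short script. It assumes, as the example suggests, that each drawn amount counts the original strain. The function name summarize is hypothetical.

```python
from collections import Counter

# Drawn amounts from the example above, one entry per real genome.
drawn_amounts = [1] * 13 + [2] * 6 + [3] * 4 + [4] * 2 + [5] * 2 + [6] + [7] * 2


def summarize(amounts):
    """Summarize a list of per-genome strain counts (original strain included)."""
    counts = Counter(amounts)
    return {
        "genomes_real": len(amounts),
        "genomes_total": sum(amounts),
        "genomes_without_artificial": counts[1],
        "genomes_with_artificial": len(amounts) - counts[1],
        "artificial_strains": sum(a - 1 for a in amounts),
    }


for key, value in summarize(drawn_amounts).items():
    print(f"{key}: {value}")
```

The summary reports 30 real genomes, 75 strains in total, 13 genomes without and 17 with artificial strains, and 45 artificial strains overall (75 - 30).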