Adrian Fritz edited this page Jun 15, 2017 · 5 revisions

When genomes are selected, several methods are used to maximize diversity. The metadata of each genome must therefore include its novelty category and its OTU.

A novelty category reflects how closely a genome appears to be related to known whole genomes in reference databases. An equal number of genomes is drawn from each category.
If a category has too few genomes to draw from, the shortfall is made up by drawing equally more from the remaining categories.
Within a category, genomes are drawn at random, unless too many fall within the same OTU.
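
The per-category allocation described above can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the function name `allocate_per_category`, the category labels in the usage example, and the handling of remainders are all assumptions made for the example.

```python
def allocate_per_category(available, total):
    """Split `total` draws across novelty categories as evenly as possible.

    available: dict mapping a category name to the number of genomes it holds.
    Returns a dict mapping each category to the number of genomes to draw.
    (Hypothetical helper, written only to illustrate the rule above.)
    """
    quota = {category: 0 for category in available}
    # Never try to draw more genomes than exist in total.
    remaining = min(total, sum(available.values()))
    # Categories that can still provide genomes; sorted for reproducibility.
    open_categories = sorted(c for c in available if available[c] > 0)

    while remaining > 0 and open_categories:
        # Equal share of what is still needed, at least one per category.
        share = max(1, remaining // len(open_categories))
        for category in list(open_categories):
            if remaining == 0:
                break
            take = min(share, available[category] - quota[category], remaining)
            quota[category] += take
            remaining -= take
            # An exhausted category drops out; the rest absorb the shortfall.
            if quota[category] == available[category]:
                open_categories.remove(category)
    return quota


if __name__ == "__main__":
    # Category names are placeholders, not necessarily the tool's labels.
    pool = {"new_species": 2, "new_genus": 10, "new_family": 10}
    print(allocate_per_category(pool, 12))
```

In this example the category with only two genomes contributes both of them, and the ten draws it could not supply are split evenly between the two remaining categories.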

Closely related genomes are grouped into OTUs. Drawing too many closely related genomes is avoided by setting a maximum, max_strains_per_otu, in the configuration file. The maximum is exceeded only if no other genomes are available. In that case, genomes are drawn at random from the pool of previously excluded genomes.
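
The OTU cap and its fallback can be sketched roughly like this. It is an illustrative sketch under stated assumptions, not the tool's code: the function `draw_with_otu_cap` and the `(genome_id, otu_id)` tuple layout are hypothetical, and only the parameter name max_strains_per_otu comes from the configuration described above.

```python
import random
from collections import Counter


def draw_with_otu_cap(genomes, n, max_strains_per_otu, rng=None):
    """Draw up to `n` genomes at random, limiting how many share one OTU.

    genomes: list of (genome_id, otu_id) tuples (assumed layout).
    Returns a list of selected genome ids.
    """
    rng = rng or random.Random()
    shuffled = list(genomes)
    rng.shuffle(shuffled)  # random order makes the pass below a random draw

    per_otu = Counter()
    selected, excluded = [], []
    for genome_id, otu_id in shuffled:
        if len(selected) == n:
            break
        if per_otu[otu_id] < max_strains_per_otu:
            per_otu[otu_id] += 1
            selected.append(genome_id)
        else:
            # OTU is already at its maximum; keep the genome as a fallback.
            excluded.append(genome_id)

    # Exceed the maximum only if nothing else is available: top up at
    # random from the previously excluded genomes.
    shortfall = n - len(selected)
    if shortfall > 0:
        selected.extend(rng.sample(excluded, min(shortfall, len(excluded))))
    return selected


if __name__ == "__main__":
    pool = [("g1", "otu1"), ("g2", "otu1"), ("g3", "otu1"), ("g4", "otu2")]
    print(draw_with_otu_cap(pool, 2, max_strains_per_otu=1))
```

With a cap of one and two genomes requested, the draw always contains g4 plus one randomly chosen member of otu1. Requesting three genomes forces the fallback, which adds a second otu1 genome from the excluded pool.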

Strains

To increase the chance of getting closely related strains, genomes within an OTU are drawn at random until the given OTU maximum is reached. Alternatively, multiple strains of a species can be simulated artificially using sgEvolver.