Explicitly defining synonym species

Donovan Parks edited this page Sep 26, 2021 · 1 revision

In general, GTDB species are defined by a set of quantitative rules that are implemented in the GTDB Species Cluster Toolkit. Deviations from these rules should be well justified. The only current instance is the decision to treat Shigella species as synonyms of E. coli starting in GTDB R207. The rationale for this is described here.

Allowing species to become synonyms of another species requires the following changes:

  1. All GTDB species clusters that will be impacted by the change need to be explicitly disbanded. In the case of Shigella species becoming synonyms of E. coli, this includes E. flexneri, E. dysenteriae, E. coli_C, and E. coli_D. Note that it is insufficient to disband only E. flexneri and E. dysenteriae, since the current rules for updating species clusters aim to retain any previously defined clusters. As such, if E. coli_C and E. coli_D aren't explicitly disbanded, they will be retained in the next release. Species clusters are disbanded by adding them to the gtdb_disband_cluster.tsv ledger.

  2. Any species with 1 or more type strain genomes that are to become synonyms need to be explicitly indicated in the initialization method of update_select_reps.py. This ensures these genomes are not treated as type strains of the species and thus are not used to form new GTDB species clusters. If additional species are going to be handled as exceptions to the GTDB species clustering rules, this information should perhaps be moved into a ledger.
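
The two steps above can be sketched roughly as follows. This is only an illustration and not the toolkit's actual code: the ledger layout (tab-separated, species name in the first column, `#` for comment lines), the helper names `read_disbanded`, `missing_from_ledger`, and `treat_as_type_strain`, and the contents of `FORCED_SYNONYMS` are all assumptions made for the example.

```python
import csv
import io

# Clusters named in step 1 that must all appear in the disband ledger.
REQUIRED_DISBANDED = ["E. flexneri", "E. dysenteriae", "E. coli_C", "E. coli_D"]

# Hypothetical stand-in for the species listed in update_select_reps.py (step 2).
FORCED_SYNONYMS = {"E. flexneri", "E. dysenteriae"}


def read_disbanded(ledger_file):
    """Return species names from a ledger (assumed: TSV, name in first column)."""
    disbanded = set()
    for row in csv.reader(ledger_file, delimiter="\t"):
        # Skip blank lines and '#' comment/header lines (assumed convention).
        if not row or row[0].startswith("#"):
            continue
        disbanded.add(row[0].strip())
    return disbanded


def missing_from_ledger(required, ledger_file):
    """List required clusters that are not yet present in the ledger."""
    return sorted(set(required) - read_disbanded(ledger_file))


def treat_as_type_strain(species, is_type_strain, forced_synonyms=FORCED_SYNONYMS):
    """Ignore type strain status for species that are forced to be synonyms."""
    return is_type_strain and species not in forced_synonyms


if __name__ == "__main__":
    # In-memory ledger that lists only two of the four required clusters.
    ledger = io.StringIO("# species\tnote\nE. flexneri\tShigella\nE. dysenteriae\n")
    print(missing_from_ledger(REQUIRED_DISBANDED, ledger))
```

A check like `missing_from_ledger` makes the pitfall from step 1 visible: a ledger listing only E. flexneri and E. dysenteriae would still leave E. coli_C and E. coli_D to be retained in the next release.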