
7. premade prepTG dbs

Rauf Salamzade edited this page Nov 28, 2023 · 4 revisions

We provide premade databases for 18 bacterial taxa (mostly at the genus level). These databases are not all-inclusive, although fai and zol can certainly handle searches across 5,000+ genomes, as we showed in the manuscript. Instead, each premade database contains only distinct representative genomes. These were selected with our tool skDER, using its greedy clustering approach, to sufficiently sample the known pangenome space of each taxon. Limiting the databases to representatives keeps them relatively small, which helps download speeds (downloads are currently not very fast as is).
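To illustrate the idea behind selecting representatives, here is a simplified greedy dereplication sketch. It is not skDER's code. It assumes that pairwise ANI values (in percent) have already been computed. The function name, the per-genome quality scores, and the 99% ANI cutoff are hypothetical. skDER's actual greedy mode differs in details such as how genomes are ranked and which similarity criteria it applies.

```python
from typing import Dict, List, Tuple


def greedy_dereplicate(
    genomes: List[str],
    quality: Dict[str, float],
    ani: Dict[Tuple[str, str], float],
    ani_cutoff: float = 99.0,  # hypothetical cutoff, not necessarily skDER's default
) -> List[str]:
    """Hypothetical greedy dereplication: keep a genome unless it is at least
    `ani_cutoff` percent identical to an already selected representative."""
    # Visit higher-quality genomes first so they become representatives.
    ordered = sorted(genomes, key=lambda g: quality[g], reverse=True)
    representatives: List[str] = []
    for genome in ordered:
        # ANI pairs may be stored in either order; missing pairs count as dissimilar.
        redundant = any(
            max(ani.get((genome, rep), 0.0), ani.get((rep, genome), 0.0)) >= ani_cutoff
            for rep in representatives
        )
        if not redundant:
            representatives.append(genome)
    return representatives


if __name__ == "__main__":
    # Toy example: gA and gB are near-identical, so only the higher-quality gB is kept.
    genomes = ["gA", "gB", "gC", "gD"]
    quality = {"gA": 0.9, "gB": 0.95, "gC": 0.5, "gD": 0.8}
    ani = {("gA", "gB"): 99.5, ("gC", "gD"): 97.0, ("gA", "gC"): 95.0}
    print(greedy_dereplicate(genomes, quality, ani))  # ['gB', 'gD', 'gC']
```

Ranking genomes before the greedy pass determines which member of each near-identical group is retained. Lowering the cutoff merges more genomes and yields a smaller database.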

The databases are stored on Zenodo and cover the ESKAPE genera, BGC-rich taxa, and other commonly studied genera. They also include GToTree-based phylogenies, which can be passed to the -st argument of fai to generate phylogenetic heatmaps showing the presence of query gene clusters.

- Acinetobacter - 1,643 rep genomes (17.8% of 9,221 total genomes considered)
- Bacillales - 3,150 rep genomes (35.9% of 8,766 total genomes considered)
- Corynebacterium - 726 rep genomes (43.0% of 1,688 total genomes considered)
- Cutibacterium - 27 rep genomes (5.4% of 502 total genomes considered)
- Enterobacter - 878 rep genomes (19.9% of 4,408 total genomes considered)
- Enterococcus - 937 rep genomes (14.6% of 6,426 total genomes considered)
- Escherichia - 2,436 rep genomes (7.1% of 34,358 total genomes considered)
- Klebsiella - 1,022 rep genomes (5.6% of 18,145 total genomes considered)
- Lactobacillus - 541 rep genomes (30.9% of 1,747 total genomes considered)
- Listeria - 353 rep genomes (6.9% of 5,062 total genomes considered)
- Micromonospora - 211 rep genomes (73.3% of 288 total genomes considered)
- Mycobacterium - 744 rep genomes (6.9% of 10,657 total genomes considered)
- Neisseria - 414 rep genomes (12.8% of 3,235 total genomes considered)
- Pseudomonas - 2,666 rep genomes (18.9% of 14,066 total genomes considered)
- Salmonella - 308 rep genomes (2.2% of 14,109 total genomes considered)
- Staphylococcus - 496 rep genomes (2.5% of 19,627 total genomes considered)
- Streptococcus - 2,452 rep genomes (13.3% of 18,492 total genomes considered)
- Streptomyces - 1,555 rep genomes (57.7% of 2,697 total genomes considered)
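The percentage given for each taxon is the number of representative genomes divided by the total number of genomes considered. The short sketch below shows that arithmetic for a few entries from the list, rounded to one decimal place. The helper name `percent_retained` is hypothetical and is not part of fai, zol, or skDER.

```python
# Representative and total genome counts copied from a few entries in the list above.
COUNTS = {
    "Acinetobacter": (1643, 9221),
    "Cutibacterium": (27, 502),
    "Escherichia": (2436, 34358),
    "Streptomyces": (1555, 2697),
}


def percent_retained(representatives: int, total: int) -> float:
    """Hypothetical helper: percent of considered genomes kept as representatives."""
    return round(100.0 * representatives / total, 1)


if __name__ == "__main__":
    for taxon, (reps, total) in COUNTS.items():
        print(f"{taxon}: {reps:,} of {total:,} genomes ({percent_retained(reps, total)}%)")
```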