Teal Furnholm edited this page Nov 24, 2020 · 6 revisions

Welcome to the Universal-Taxonomy-Database wiki!

  • Note: Between curating the taxonomy for all 2.2+ million named organisms and curating all 529+ million sequenced genes (https://github.com/TealFurnholm/Meta-NGS_Reference_Database) with all available functional info, and identifying each gene's common ancestor and all genome contaminants, I've developed some... snark. You may notice it in my writing. Some shaming of the scientific community needs to be done, ideally by people higher up than myself.

Why Curate A Universal Taxonomy Database?

Computers are dumb when it comes to pattern recognition. We see "E. coli K112" and "E. coli K 112" and "E. Coli K112" and know they are the same organism. A computer does not. Not recognizing these three names as the same, it will give a gene/read/OTU a lowest common ancestor of "Escherichia", which means a loss of sensitivity. Unfortunately, current public databases are rife with this and many other issues. My effort to correct organism taxonomy will increase the sensitivity of metagenome and metatranscriptome community phylogenies. By curating all organisms together to conform to various naming conventions, one can make sure there are no conflicts or synonyms, and analyze the complex mixture of viruses, fungi, protists, bacteria, archaea, and host as a single comparable, graphable unit.
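To make the sensitivity loss concrete, here is a minimal Python sketch, not the curation code used for this database. The `normalize_name` rules (lowercase, collapse whitespace, join a letter run to a following number) and the two-level genus/name lineage are hypothetical simplifications chosen only to cover the three spellings above.

```python
import re


def normalize_name(name: str) -> str:
    """Return a comparison key that ignores case and spacing variants.

    Illustrative rules only; real curation needs many more cases.
    """
    # Lowercase and collapse any run of whitespace to a single space.
    key = re.sub(r"\s+", " ", name.strip().lower())
    # Join a letter run to a following number: "k 112" -> "k112".
    key = re.sub(r"(?<=[a-z]) (?=\d)", "", key)
    return key


def lowest_common_ancestor(lineages):
    """Return the longest prefix shared by all root-to-leaf lineages."""
    lca = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        lca.append(ranks[0])
    return lca


names = ["E. coli K112", "E. coli K 112", "E. Coli K112"]

# Simplified lineages: genus first, then the organism name as written.
raw_lineages = [["Escherichia", n] for n in names]
clean_lineages = [["Escherichia", normalize_name(n)] for n in names]

raw_lca = lowest_common_ancestor(raw_lineages)      # stops at the genus
clean_lca = lowest_common_ancestor(clean_lineages)  # reaches the strain

print(raw_lca)
print(clean_lca)
```

With the raw names, the three hits only agree at the genus; once the names are normalized, they agree all the way down to the strain.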

Public databases are a mess.

Their naming is a mess. Their organization is a mess. Taxonomy, Genes, or Genomes, all messes. They don't bother to give strains their own unique ID #, even though it is the easiest thing in the world to do, so now all strains of E. coli, including the harmless ones in your poop and the ones that make you bleed from the kidneys, have the same identification #. That is a mess. The "Rules of Naming" page provides a semi-detailed list of problems I've discovered and fixed in the taxonomy. Whether you agree with my approach or not, organism names change all the time because, as we sequence more organisms, phylogenetic lineages become more refined.

Outcome: The phylogenetic tree of all named species

Tree of Life: Genus-level

The phylogenies for all 1.7 million named organisms taken from NCBI and JGI were standardized to 7 primary ranks (Kingdom, Phylum, Class, Order, Family, Genus, Species). Due to computational limitations of graphing software, the tree was limited to the genus level (it does not visually include species). Line color indicates the presence and number of sequenced species/strains in a lower rank, while node color indicates the total named species in the rank. The cut-away shows the tree structure in greater detail, including the carnivorous genera Ursus (bears), Canis (dogs), and felines such as Panthera (panthers) or Felis (cats).

* The latest version of the software now goes to the strain level and represents 2.2+ million organisms.
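The standardization to 7 primary ranks and the genus-level cut described above can be sketched as follows. This is a rough illustration under stated assumptions, not the actual pipeline: the `(rank, name)` input format, the function names `to_primary_ranks` and `genus_level`, and the hand-written bear lineage are all hypothetical, and a missing rank is simply left as `None`.

```python
PRIMARY_RANKS = ("kingdom", "phylum", "class", "order",
                 "family", "genus", "species")


def to_primary_ranks(lineage):
    """Keep only the 7 primary ranks, in order; missing ranks become None.

    `lineage` is assumed to be a list of (rank, name) pairs, which may
    include intermediate ranks such as "subphylum" or "suborder".
    """
    by_rank = {rank.lower(): name for rank, name in lineage}
    return [by_rank.get(rank) for rank in PRIMARY_RANKS]


def genus_level(primary_lineage):
    """Drop the species rank, as done for the genus-level tree."""
    return primary_lineage[:6]


# Hand-written example lineage with extra intermediate ranks.
brown_bear = [
    ("kingdom", "Metazoa"),
    ("phylum", "Chordata"),
    ("subphylum", "Craniata"),
    ("class", "Mammalia"),
    ("order", "Carnivora"),
    ("suborder", "Caniformia"),
    ("family", "Ursidae"),
    ("genus", "Ursus"),
    ("species", "Ursus arctos"),
]

standardized = to_primary_ranks(brown_bear)
print(standardized)
print(genus_level(standardized))
```

Giving every organism the same 7 slots is what lets viruses, bacteria, and hosts sit in one comparable tree; how gaps in a lineage are actually filled is covered by the naming rules, not by this sketch.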