
Implementation

jaclew edited this page Jun 25, 2024 · 7 revisions

FlexTaxD

Overview

FlexTaxD is an open-source taxonomy database maintenance tool implemented in Python 3. Taxonomic information is stored efficiently in an SQL database, in which FlexTaxD saves only the minimal information (without duplicates) required to represent the taxonomic relationships. In addition, FlexTaxD allows unused taxonomy nodes to be removed from the database, which makes downstream steps more efficient. The core functionality (database maintenance) depends only on the Python standard library, while the downstream options require third-party software. FlexTaxD is implemented in a modular fashion that allows easy integration with future taxonomy formats and classification tools. The source code of FlexTaxD is publicly available as a GitHub repository, whose wiki section includes an example walk-through, descriptions of source formats, script parameters, and other information.
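The storage idea above (each name stored once, relationships stored separately, unused nodes removable) can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions: the two-table schema (`nodes`, `tree`) and the helper names `get_or_add_node`, `add_lineage` and `prune` are hypothetical and do not describe FlexTaxD's actual database layout or API.

```python
import sqlite3

# Hypothetical schema (not FlexTaxD's): every name is stored once in `nodes`,
# and each child -> parent relationship is stored once in `tree`.
SCHEMA = """
CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE tree (
    child  INTEGER PRIMARY KEY REFERENCES nodes(id),
    parent INTEGER NOT NULL REFERENCES nodes(id)
);
"""


def get_or_add_node(con, name):
    """Return the id of `name`, inserting it only if it is not stored yet."""
    con.execute("INSERT OR IGNORE INTO nodes(name) VALUES (?)", (name,))
    return con.execute("SELECT id FROM nodes WHERE name = ?", (name,)).fetchone()[0]


def add_lineage(con, lineage):
    """Store a root-to-leaf lineage, e.g. ["root", "Bacteria", "Bacillus"]."""
    ids = [get_or_add_node(con, name) for name in lineage]
    # Link consecutive names; a child that already has a parent keeps it.
    for parent, child in zip(ids, ids[1:]):
        con.execute("INSERT OR IGNORE INTO tree(child, parent) VALUES (?, ?)",
                    (child, parent))


def prune(con, used_names):
    """Delete every node that is neither in `used_names` nor an ancestor of one."""
    parent_of = dict(con.execute("SELECT child, parent FROM tree"))
    id_of = dict(con.execute("SELECT name, id FROM nodes"))
    keep = set()
    for name in used_names:
        node = id_of.get(name)
        # Walk towards the root, stopping early at already-kept ancestors.
        while node is not None and node not in keep:
            keep.add(node)
            node = parent_of.get(node)
    drop = [(node,) for node in id_of.values() if node not in keep]
    con.executemany("DELETE FROM tree WHERE child = ?", drop)
    con.executemany("DELETE FROM nodes WHERE id = ?", drop)


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    add_lineage(con, ["root", "Bacteria", "Proteobacteria", "Escherichia"])
    add_lineage(con, ["root", "Bacteria", "Firmicutes", "Bacillus"])
    print(con.execute("SELECT COUNT(*) FROM nodes").fetchone()[0])  # 6
    prune(con, ["Escherichia"])
    print(sorted(name for (name,) in con.execute("SELECT name FROM nodes")))
    # ['Bacteria', 'Escherichia', 'Proteobacteria', 'root']
```

Because `tree.child` is the primary key in this sketch, each stored node has exactly one parent, so identical names coming from different lineages collapse into a single node. This is the same trade-off behind the homonym behaviour described in the next section.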

Homonyms/synonyms

FlexTaxD takes a minimal approach to storing information in the database: if two taxa have identical names (and no pre-defined unique taxonomic ID), they are stored under the same ID and name. This ambiguity is resolved on visualisation or print. On visualisation, a prompt asks the user to specify the parent, while on print FlexTaxD ensures that all identical nodes are given unique IDs. If the parameter --dump_descriptions is used, unique IDs are not needed and the tree is printed correctly.
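One way to give identical names unique IDs at print time is to treat each distinct root-to-node path as a separate node. The sketch below assumes lineages are available as lists of names from the root down; the function name `unique_ids` and the `_<n>` suffix scheme are hypothetical and are not FlexTaxD's output format. The example uses a real homonym: Bacillus is both a bacterial genus and a stick-insect genus.

```python
def unique_ids(lineages):
    """Return (label, parent_label) rows in which every label is unique.

    A node is identified by its full path from the root, so the same name
    reached through different parents yields different labels. The first
    path keeps the plain name; later ones get a hypothetical "_<n>" suffix.
    """
    label_of = {}  # path tuple -> printed label
    uses = {}      # name -> number of distinct paths already labelled
    rows = []
    for lineage in lineages:
        for depth in range(len(lineage)):
            path = tuple(lineage[: depth + 1])
            if path in label_of:
                continue  # this node was already printed via another lineage
            name = path[-1]
            n = uses.get(name, 0)
            uses[name] = n + 1
            label_of[path] = name if n == 0 else f"{name}_{n}"
            # The root points to itself, as in NCBI-style node lists.
            parent = label_of[path[:-1]] if depth else label_of[path]
            rows.append((label_of[path], parent))
    return rows


if __name__ == "__main__":
    lineages = [["root", "Bacteria", "Bacillus"],    # bacterial genus
                ["root", "Eukaryota", "Bacillus"]]   # stick-insect genus
    for label, parent in unique_ids(lineages):
        print(f"{label}\t{parent}")
```

Run as shown, the second Bacillus is printed as `Bacillus_1` under `Eukaryota`, while the first keeps its plain name under `Bacteria`.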

Dependencies

The core functionality of FlexTaxD (adding and merging taxonomic databases) does not require any third-party dependencies. However, visualisation and building downstream metagenomic classification databases require additional libraries.

Visualisation

Module dependencies for visualisation

biopython
matplotlib
inquirer
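Because the visualisation modules are optional, a script can check whether they are importable before offering visualisation. This is a minimal sketch using `importlib.util.find_spec`, which locates a top-level module without importing it; the names `VISUALISATION_MODULES` and `missing_modules` are hypothetical, and this is not necessarily how FlexTaxD performs the check. Note that biopython is imported as `Bio`.

```python
import importlib.util

# Distribution name -> top-level import name for the modules listed above.
VISUALISATION_MODULES = {
    "biopython": "Bio",
    "matplotlib": "matplotlib",
    "inquirer": "inquirer",
}


def missing_modules(required):
    """Return the distribution names whose import name cannot be found."""
    return [dist for dist, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_modules(VISUALISATION_MODULES)
    if missing:
        print("Visualisation unavailable; install:", " ".join(missing))
    else:
        print("All visualisation dependencies found")
```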

Modules

Most of FlexTaxD is implemented as modules. A recipe for adding custom taxonomic input modules may be added in the future.
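A modular input layer is often built as a registry that maps a format name to a reader function. The sketch below illustrates that general pattern only; the decorator `register_format`, the `READERS` dictionary, the `load` helper and both input formats are hypothetical and are not FlexTaxD's module interface. Each reader yields `(child, parent)` name pairs, so supporting a new format means adding one decorated function.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Edge = Tuple[str, str]  # (child name, parent name)
Reader = Callable[[Iterable[str]], Iterator[Edge]]

# Hypothetical registry: format name -> reader function.
READERS: Dict[str, Reader] = {}


def register_format(name: str) -> Callable[[Reader], Reader]:
    """Decorator that registers a reader function under a format name."""
    def decorator(func: Reader) -> Reader:
        READERS[name] = func
        return func
    return decorator


@register_format("tsv")
def read_tsv(lines: Iterable[str]) -> Iterator[Edge]:
    """Hypothetical format: one 'child<TAB>parent' pair per line."""
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) >= 2:
            yield row[0], row[1]


@register_format("lineage")
def read_lineage(lines: Iterable[str]) -> Iterator[Edge]:
    """Hypothetical format: one ';'-separated root-to-leaf lineage per line."""
    for line in lines:
        names = [n.strip() for n in line.split(";") if n.strip()]
        for parent, child in zip(names, names[1:]):
            yield child, parent


def load(fmt: str, lines: Iterable[str]) -> List[Edge]:
    """Parse `lines` with the reader registered for `fmt`."""
    try:
        reader = READERS[fmt]
    except KeyError:
        raise ValueError(f"unknown format {fmt!r}; known: {sorted(READERS)}") from None
    return list(reader(lines))


if __name__ == "__main__":
    print(load("lineage", io.StringIO("root;Bacteria;Bacillus\n")))
    # [('Bacteria', 'root'), ('Bacillus', 'Bacteria')]
```

Keeping every reader's output in the same `(child, parent)` shape means the database code never needs to know which source format the edges came from.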