Freya Arthen edited this page Feb 20, 2024 · 7 revisions

taXaminer - examine the taxonomic diversity in genome assemblies. Designed to detect and differentiate contamination and horizontal gene transfer.

taXaminer combines a reference-free and an alignment-based approach to detect and differentiate contamination and horizontal gene transfer in genome assemblies. It describes the gene set with a total of 16 intrinsic features, among them read coverage, sequence composition, gene length and the size of the scaffold the gene is annotated on (see details here). To identify genes that deviate from the average, a Principal Component Analysis (PCA) is applied to these features, so that genes with similar feature profiles cluster together. The taxonomic assignment aims to identify the true taxon of origin of each gene. It is based on the genes' protein sequences, which reduces the need for the exact source organism to be present in the reference database.
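To illustrate the idea behind the reference-free part, here is a minimal sketch in Python (NumPy), assuming a per-gene feature matrix with one row per gene and one column per feature is already available. It standardizes the features, projects the genes onto the leading principal components, and scores each gene by its distance from the centre of the PC space, where the "average" gene lies. This is not taXaminer's implementation: the function name `pca_outlier_scores`, the default of three components and the distance-based score are hypothetical choices made for illustration only.

```python
import numpy as np


def pca_outlier_scores(features: np.ndarray, n_components: int = 3) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: project per-gene feature vectors onto their
    leading principal components and score how far each gene lies from
    the average gene.

    features: array of shape (n_genes, n_features), e.g. coverage,
    sequence composition, gene length, scaffold size, ...
    Returns (coords, scores): PC coordinates of each gene and its
    Euclidean distance from the origin of the PC space.
    """
    # Standardize each feature to mean 0 and unit variance so that features
    # on very different scales (bp vs. fractions) contribute comparably.
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # constant features would otherwise divide by zero
    z = (features - mean) / std

    # Principal directions via SVD of the centred, standardized matrix;
    # rows of vt are ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    coords = z @ vt[:n_components].T

    # The origin of the PC space corresponds to the average gene, so a large
    # distance marks a gene whose feature profile deviates from the rest.
    scores = np.linalg.norm(coords, axis=1)
    return coords, scores


if __name__ == "__main__":
    # Synthetic data (not real taXaminer output): 200 typical genes and
    # 5 genes whose 16 features are all shifted upwards.
    rng = np.random.default_rng(0)
    typical = rng.normal(0.0, 1.0, size=(200, 16))
    shifted = rng.normal(8.0, 1.0, size=(5, 16))
    coords, scores = pca_outlier_scores(np.vstack([typical, shifted]))
    # The five highest-scoring genes should be the shifted ones (200-204).
    print(sorted(np.argsort(scores)[-5:].tolist()))
```

Standardizing before the PCA is the key design choice here: without it, features with large absolute values such as gene or scaffold length would dominate the principal components.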

The results can be interactively explored in the accompanying dashboard.

The Quick start guide helps you set up and run your first analysis.


Commands overview

```shell
# clone the repository
git clone https://github.com/BIONF/taXaminer.git
# install the package
pip install ./taXaminer
# set up additional dependencies with conda and the database (NCBI nr)
taxaminer.setup --conda --db nr -d /path/to/database/directory/
# run taXaminer with the config file 'config.yml'
taxaminer.run config.yml
```