Freya Hubert edited this page Mar 4, 2022 · 7 revisions

taXaminer enables the interactive exploration of taxonomic footprints in gene sets. The specific goal is to detect and differentiate contamination and horizontal gene transfer.

Besides the taxonomic assignment of genes, taXaminer uses a total of 16 further indicators to facilitate this. Among these indicators are read coverage, sequence composition, gene length and the position of the gene within its scaffold (see details here). To identify genes that deviate from the majority of the gene set, a principal component analysis (PCA) is used, as it condenses the data to fewer dimensions. Genes with similar values for certain variables are thereby clustered together, making deviations visible. The results can be interactively examined in a 3D scatterplot, where the dot position represents a combination of coverage, sequence composition and spatial information provided by the PCA, and the color shows the taxonomic assignment.
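To illustrate how a PCA condenses several per-gene indicators into three plot coordinates, here is a minimal sketch using NumPy. It is not taXaminer's own implementation: the function name `pca_project`, the three example variables and the simulated values are all assumptions made for illustration.

```python
import numpy as np


def pca_project(features, n_components=3):
    """Project the rows of `features` (genes x variables) onto principal components.

    Hypothetical helper for illustration only, not part of taXaminer.
    """
    x = np.asarray(features, dtype=float)
    # Standardise every variable so that indicators with different units
    # (e.g. coverage vs. gene length) contribute on a comparable scale.
    std = x.std(axis=0)
    std[std == 0] = 1.0  # constant columns: avoid division by zero
    z = (x - x.mean(axis=0)) / std
    # Singular value decomposition: the rows of `vt` are the principal axes,
    # ordered by decreasing explained variance.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    coords = z @ vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return coords, explained[:n_components]


# Simulated toy data (assumption): columns are coverage, GC content, gene length.
rng = np.random.default_rng(0)
host = rng.normal([30.0, 0.40, 1500.0], [3.0, 0.02, 300.0], size=(50, 3))
contaminant = rng.normal([5.0, 0.60, 900.0], [1.0, 0.02, 200.0], size=(5, 3))
genes = np.vstack([host, contaminant])

# Each gene gets three coordinates that could serve as its dot position.
coords, explained = pca_project(genes, n_components=3)
```

In this toy setting, genes whose coverage and composition differ from the majority end up far from the main cluster along the first component, which is the kind of deviation the 3D scatterplot makes visible.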

The Quick start guide gives you all the information you need to start your first analysis.


Commands overview

# clone the repository
git clone https://github.com/fdhubert/taXaminer.git
# setup of conda environment 'taxaminer'
./setup_conda.sh
# running taXaminer locally with config file 'config.yml'
python taxaminer.py config.yml
# running taXaminer on cluster with config file 'config.yml'
sbatch taxaminer.slurm 'config.yml'
# running taXaminer on cluster with 'seedfile.txt' containing paths to config files
sbatch taxaminer.slurm -s seedfile.txt
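
For batch runs on a cluster, a seed file can be generated from a directory of config files. The following is a minimal sketch that assumes the seed file lists one config path per line; that layout, the helper name `write_seedfile` and the `*.yml` pattern are assumptions, so check them against the taXaminer documentation before relying on it.

```python
from pathlib import Path


def write_seedfile(config_dir, seedfile="seedfile.txt", pattern="*.yml"):
    """Write the absolute path of every matching config file to `seedfile`.

    Hypothetical helper; assumes a one-path-per-line seed file format.
    """
    configs = sorted(Path(config_dir).glob(pattern))
    lines = [f"{path.resolve()}\n" for path in configs]
    Path(seedfile).write_text("".join(lines))
    return configs


# Example usage (the directory name is a placeholder):
# write_seedfile("configs/", "seedfile.txt")
```

The resulting file could then be passed to `sbatch taxaminer.slurm -s seedfile.txt` as shown above.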