JolTax

A vectorized taxonomy library for Python.

JolTax is a tool for working with large taxonomies like NCBI or GTDB. It stores the entire tree in contiguous NumPy arrays, enabling fast (like a jolt) traversals, clade queries, and mass annotation of datasets using Polars.

Key Features

Search: Exact and approximate matching to find TaxIDs from strings.
Clade Queries: Instantly identify all descendants of a TaxID.
Annotate: Instantly return a Polars dataframe with the complete canonical taxonomy (for any number of TaxIDs), for easy annotation of your own datasets.
Batch Processing: Get Lowest Common Ancestor (LCA) and node-to-node distances for thousands of TaxID pairs at once.
Array-Based Core: Uses NumPy operations for property lookups and tree traversals.
Pre-build: Build and save (cache) your taxonomies for instant loading later.

If you prefer an interactive experience for building and exploring taxonomies, a command-line interface is also available: JolTax-CLI.

Installation

From Bioconda (Recommended)

conda install -c bioconda joltax

From PyPI

pip install joltax

Requires: numpy, polars, rapidfuzz.

Quick Start

from joltax import JolTree

# Build from NCBI DMP files (dir where names.dmp and nodes.dmp are)
tree = JolTree('/path/to/ncbi/taxonomy/')

# Save a binary cache in dir "taxonomy_cache" for instant loading later
tree.save('taxonomy_cache')

# Load the cache
tree = JolTree.load('taxonomy_cache')

# Find a TaxID by name (fuzzy=False by default)
results = tree.search_name('Escherchia', fuzzy=True)

# Annotate a list of TaxIDs with their full canonical rank lineages
# Returns a Polars DataFrame with columns prefixed by 't_' (e.g., t_phylum, t_genus)
df = tree.annotate([9606, 562])

# Batch LCA calculation
lcas = tree.get_lca_batch(ids1, ids2)

Documentation

For a detailed API reference and a step-by-step guide, see USAGE.md.

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
assets		assets
conda		conda
joltax		joltax
tests		tests
.gitignore		.gitignore
GEMINI.md		GEMINI.md
LICENSE		LICENSE
README.md		README.md
USAGE.md		USAGE.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

JolTax

Key Features

Installation

From Bioconda (Recommended)

From PyPI

Quick Start

Documentation

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

JolTax

Key Features

Installation

From Bioconda (Recommended)

From PyPI

Quick Start

Documentation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages