Fast and flexible GTDB querying from the command line, built in Rust.
xgt is a command-line tool for querying the Genome Taxonomy Database (GTDB) directly from your terminal or scripts. It covers the core GTDB REST API, genome cards, metadata, taxonomic history, taxon lineages, and search, and adds features designed for real research workflows: batch input from files or stdin, automatic pagination, retry logic, flexible output formats (JSON, CSV, TSV), and per-item file splitting for large datasets.
It is written in Rust for speed, portability, and a single self-contained binary with no runtime dependencies.
Download the binary for your platform from the releases page and place it somewhere on your $PATH.
| Platform | Download |
|---|---|
| Linux x86_64 | xgt-vX.X.X-x86_64-unknown-linux-gnu.tar.xz |
| Linux aarch64 | xgt-vX.X.X-aarch64-unknown-linux-gnu.tar.xz |
| macOS Apple Silicon | xgt-vX.X.X-aarch64-apple-darwin.tar.xz |
| macOS Intel | xgt-vX.X.X-x86_64-apple-darwin.tar.xz |
| Windows x86_64 | xgt-vX.X.X-x86_64-pc-windows-msvc.zip |
SHA-256 checksums are provided alongside each archive.
Requires Rust 1.85 or later (install via rustup).
git clone https://github.com/Ebedthan/xgt.git
cd xgt
cargo build --release
# binary is at target/release/xgt
cp target/release/xgt ~/.local/bin/# Search all genomes assigned to a genus
xgt search g__Escherichia
# Get the full card for a genome
xgt genome GCA_000005845.2
# Get the taxonomic classification of a taxon
xgt taxon g__Escherichia
# Compare a genome's taxonomy between two GTDB releases
xgt diff GCA_000005845.2 --from R214 --to R220xgt search [OPTIONS] [QUERY]
Searches the GTDB for genomes matching a query string. The query can be a
taxon name, accession, organism name, or any field depending on --field.
Options
| Flag | Short | Description |
|---|---|---|
--field STR |
-F |
Search field: all (default), acc, org, gtdb, ncbi |
--word |
-w |
Match whole words only |
--rep |
-r |
Restrict to GTDB species representatives |
--type |
-t |
Restrict to NCBI type material |
--id |
-i |
Print only genome accessions |
--count |
-c |
Print only the count of matched genomes |
--file FILE |
-f |
Read queries from FILE, one per line; use - for stdin |
--out FILE |
-o |
Write output to FILE instead of stdout |
--outfmt STR |
-O |
Output format: csv (default), tsv, json |
--insecure |
-k |
Disable SSL certificate verification |
Examples
# Search for a genus, output as JSON
xgt search g__Escherichia -O json
# Search from a file, output TSV to a file
xgt search -f genera.txt -O tsv -o results.tsv
# Read from stdin, count results per query
xgt search -f - --count
# Restrict to species representatives, print accessions only
xgt search g__Bacillus --rep --id
# Pipe to downstream tools
xgt search g__Salmonella --id | wc -lThe search automatically paginates through all results regardless of dataset size. Queries on large genera like Escherichia return complete results without truncation.
xgt genome [OPTIONS] [ACCESSION]
Fetches data for one or more genome accessions. By default returns the
full genome card (taxonomy, assembly statistics, CheckM quality, NCBI
metadata). Use --metadata for a lightweight metadata-only response,
or --history for the full taxonomic history across all GTDB releases.
Options
| Flag | Short | Description |
|---|---|---|
--metadata |
-m |
Retrieve genome metadata instead of full card |
--history |
-H |
Retrieve taxonomic history across all releases |
--file FILE |
-f |
Read accessions from FILE, one per line; use - for stdin |
--out FILE |
-o |
Write output to FILE instead of stdout |
--outfmt STR |
-O |
Output format: json (default), csv, tsv |
--insecure |
-k |
Disable SSL certificate verification |
Examples
# Full genome card
xgt genome GCA_000005845.2
# Lightweight metadata in CSV
xgt genome GCA_000005845.2 --metadata -O csv
# Taxonomic history across all GTDB releases
xgt genome GCA_000005845.2 --history
# Batch: process 200 accessions from a file, write CSV
xgt genome -f accessions.txt -O csv -o results.csv
# Batch: read accessions from stdin
cat accessions.txt | xgt genome -f -
# Batch: write one JSON file per accession
xgt genome -f accessions.txt --split --split-dir genome_cards/When processing a file of accessions, a progress bar is shown on stderr so that stdout output can be safely piped to downstream tools.
xgt taxon [OPTIONS] [NAME]
Retrieves information about a GTDB taxon. The taxon name must use the
standard rank prefix format (e.g. g__Escherichia, s__Escherichia coli).
Valid prefixes are d__, p__, c__, o__, f__, g__, s__.
Options
| Flag | Short | Description |
|---|---|---|
--search |
-s |
Search for a taxon name in the current release |
--all |
Search for a taxon name across all releases | |
--genomes |
-g |
List genomes assigned to the taxon |
--reps |
-r |
With --genomes, return species representatives only |
--word |
-w |
Match whole words only |
--file FILE |
-f |
Read taxon names from FILE, one per line; use - for stdin |
--out FILE |
-o |
Write output to FILE instead of stdout |
--outfmt STR |
-O |
Output format: json (default), csv, tsv |
--insecure |
-k |
Disable SSL certificate verification |
Examples
# Full taxonomic lineage for a genus
xgt taxon g__Escherichia
# Search for a taxon name in the current release
xgt taxon --search g__Escherichia
# Search across all GTDB releases
xgt taxon --all g__Escherichia
# List all genomes in a taxon
xgt taxon g__Escherichia --genomes
# List only species representatives
xgt taxon g__Escherichia --genomes --reps
# Output genome list as CSV
xgt taxon g__Escherichia --genomes -O csv -o genomes.csvxgt diff [OPTIONS] [ACCESSION] --from RELEASE
Shows how the taxonomic classification of a genome changed between two
GTDB releases. Requires --from; if --to is omitted, the latest
available release for the genome is used. Release identifiers use the
format R<number> (e.g. R214, R220, R226).
Options
| Flag | Short | Description |
|---|---|---|
--from RELEASE |
Source release to compare from (required) | |
--to RELEASE |
Target release to compare to (default: latest) | |
--file FILE |
-f |
Read accessions from FILE, one per line; use - for stdin |
--out FILE |
-o |
Write output to FILE instead of stdout |
--outfmt STR |
-O |
Output format: json (default), csv, tsv |
--split |
-s |
Write one file per accession |
--split-dir DIR |
Directory for per-item files (requires --split) |
|
--insecure |
-k |
Disable SSL certificate verification |
Examples
# Compare a single genome between two releases
xgt diff GCA_000005845.2 --from R214 --to R220
# Compare against the latest release
xgt diff GCA_000005845.2 --from R207
# Batch comparison, CSV output
xgt diff -f accessions.txt --from R214 --to R220 -O csv -o changes.csv
# One diff file per accession
xgt diff -f accessions.txt --from R214 --to R220 --split --split-dir diffs/Output (JSON)
{
"query": "GCA_000005845.2",
"from_release": "R214",
"to_release": "R220",
"changed": true,
"changes": [
{
"rank": "species",
"from": "Escherichia coli",
"to": "Escherichia coli_D"
}
],
"from_taxonomy": {
"release": "R214",
"domain": "d__Bacteria",
"phylum": "p__Pseudomonadota",
"class": "c__Gammaproteobacteria",
"order": "o__Enterobacterales",
"family": "f__Enterobacteriaceae",
"genus": "g__Escherichia",
"species": "s__Escherichia coli"
},
"to_taxonomy": { "..." : "..." }
}# Generate an accession list from another tool and pipe directly
grep "Escherichia" my_study_metadata.tsv | cut -f1 | xgt genome -f - -O csv# Get all accessions in a genus, then fetch their cards
xgt search g__Rhizobium --id > accessions.txt
xgt genome -f accessions.txt -O csv -o rhizobium_cards.csv# Find which genomes in your study changed between R214 and R220
xgt diff -f study_accessions.txt --from R214 --to R220 \
-O csv -o taxonomy_changes.csv
# Filter to only changed genomes
awk -F',' '$4 == "true"' taxonomy_changes.csv# One JSON file per genome in a results directory
xgt genome -f accessions.txt --split --split-dir results/All subcommands support --outfmt csv|tsv|json (except search, which
defaults to csv; all others default to json).
CSV and TSV output is suitable for loading directly into R, Python, or spreadsheet tools. JSON output preserves all nested fields and is recommended when the full genome card is needed.
All output goes to stdout by default. Use --out FILE to write to a
file, or --split to write one file per query when processing batches.
Generate a completion script for your shell:
# Bash
xgt completions bash > ~/.local/share/bash-completion/completions/xgt
# Zsh (ensure ~/.zfunc is in your fpath)
xgt completions zsh > ~/.zfunc/_xgt
# Fish
xgt completions fish > ~/.config/fish/completions/xgt.fishFound a bug or want to request a feature? Open an issue.
Please include:
- OS and architecture
- xgt version (
xgt --version) - The command you ran and the output or error you received
Dual-licensed under the MIT License and the Apache 2.0 License. You may use either at your option.
- Minimum Rust version: 1.85
- Follows Semantic Versioning
- Contributions welcome! Please open an issue before submitting large changes