v0.1.0 — scikit-bio for DuckDB/SQL

@rustyconover released this 01 Jul 19:54

First release of vgi-scikit-bio, a VGI worker that exposes scikit-bio to
DuckDB/SQL. It provides about 90 functions across 5 schemas:

  • sequence — GC content, reverse complement, complement, transcription,
    translation (incl. six-frame), validation, sequence distances, k-mer & residue
    composition
  • alignment — global/local pairwise alignment (scores and aligned strings)
  • diversity — the full alpha-diversity metric family (aggregates),
    beta-diversity distance matrices, phylogenetic Faith's PD & UniFrac, and
    rarefaction
  • stats — PCA/CA/PCoA ordination, PERMANOVA/ANOSIM/Mantel tests,
    CLR/ILR/ALR (and inverse) compositional transforms, and ANCOM /
    Dirichlet-multinomial differential abundance
  • tree — neighbour joining / UPGMA / minimum evolution, Newick inspection,
    and tree comparison (Robinson–Foulds, cophenetic)

Runs over stdio (DuckDB spawns it) or HTTP; a multi-arch container image is
published to ghcr.io/query-farm/vgi-scikit-bio. MIT licensed.

INSTALL vgi FROM community; LOAD vgi;
ATTACH 'skbio' (TYPE vgi, LOCATION 'vgi-scikit-bio');
SELECT skbio.sequence.gc_content('ATGCGGATTACAGG');
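
As a sanity check on the example query, here is a minimal pure-Python sketch of the quantity it is expected to compute. It assumes gc_content returns the fraction of bases that are G or C, which is how scikit-bio defines GC content for unambiguous DNA. The helper below is hypothetical and is not part of the worker.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C (case-insensitive).

    Hypothetical reference helper, not the worker's implementation.
    """
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    # Count G and C bases, then divide by the sequence length.
    return sum(base in "GC" for base in seq) / len(seq)


# Same input as the SQL example: 7 of the 14 bases are G or C.
print(gc_content("ATGCGGATTACAGG"))  # prints 0.5
```

If the worker follows the same definition, the SELECT above should return 0.5 for this input.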