v1.3.0: gnomAD Exome Frequency Cache

dial481 released this 08 Jun 05:12
· 3 commits to main since this release

Added

  • gnomAD population allele frequencies. New GnomadAnnotator enriches
    report annotations with population frequency context from gnomAD v4.1 exomes
    (~16M variants, 730K individuals). Pre-built cache downloaded from HuggingFace
    via db update. Frequency column in terminal, HTML, and JSON reports.
    --no-gnomad flag to skip.
  • CPIC fallback for PharmGKB. db update succeeds when CPIC API is
    unreachable — reuses cached allele function data. Recovery auto-triggers on
    next successful check.
  • Graceful db update. Individual annotator download failures print an error
    and continue to remaining annotators instead of aborting the entire update.
  • scripts/build_gnomad_cache.py: streaming VCF build script. Streams ~120GB of
    gnomAD exome VCFs over HTTPS without saving them to disk (or reads local
    files with --local-dir) and outputs a ~6GB SQLite database (~3GB gzipped).
  • JSON report schema_version bumped to "2" (added allele_frequency field).
    Diff engine accepts both v1 and v2 baselines.
  • gnomAD ODbL v1.0 attribution in HTML and JSON reports.
  • CI workflow (.github/workflows/ci.yml) — lint + test on push/PR to main.

Fixed

  • Offline claim in README corrected: analysis runs offline by default with opt-out
    freshness check, not opt-in network access.
  • __del__ partial-init crash on PharmGKB constructor failure.
  • .gitignore updated for GWAS Catalog test data.
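
For readers unfamiliar with the __del__ partial-init problem: Python still calls `__del__` on an object whose `__init__` raised, so a destructor that touches attributes assigned late in the constructor can fail with `AttributeError`. The sketch below shows one common guard. `CachedAnnotator` and its `fail` parameter are hypothetical and exist only to simulate the failure; this is not the PharmGKB class from the project.

```python
import sqlite3


class CachedAnnotator:
    """Hypothetical annotator that owns a SQLite connection."""

    def __init__(self, db_path: str, fail: bool = False) -> None:
        if fail:
            # Simulate a constructor failure before self._conn is assigned.
            raise RuntimeError("cache unavailable")
        self._conn = sqlite3.connect(db_path)

    def close(self) -> None:
        # getattr guards against a partially initialised instance,
        # where _conn was never set.
        conn = getattr(self, "_conn", None)
        if conn is not None:
            conn.close()
            self._conn = None

    def __del__(self) -> None:
        self.close()


try:
    CachedAnnotator(":memory:", fail=True)
except RuntimeError:
    pass  # __del__ runs on the half-built object without raising

annotator = CachedAnnotator(":memory:")
annotator.close()
annotator.close()  # closing twice is safe
```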

Changed

  • Pre-push hook reduced to version-tag check only (CI runs the full suite).

Technical

  • Composite primary key (chrom, pos, ref, alt) on gnomad_frequencies
    preserves multi-allelic sites (rsID-only PK silently dropped ~20% of records).
  • Coordinate columns indexed for future AlphaMissense/CADD integration.
  • MAX(af) GROUP BY rsid in lookup queries handles multiple rows per rsID.
  • 951 tests, 93%+ coverage.
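
A minimal sketch of the composite key and the MAX(af) lookup, using Python's built-in sqlite3 module. The table name, the key columns, and the MAX(af) GROUP BY rsid pattern come from the notes above. The column types, the index name, the `lookup_af` helper, and the sample rows are assumptions made up for illustration; the rsIDs and frequencies are not real gnomAD data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE gnomad_frequencies (
        chrom TEXT    NOT NULL,
        pos   INTEGER NOT NULL,
        ref   TEXT    NOT NULL,
        alt   TEXT    NOT NULL,
        rsid  TEXT,
        af    REAL,
        PRIMARY KEY (chrom, pos, ref, alt)
    )
    """
)
# Index name is an assumption; it keeps rsID lookups from scanning the table.
conn.execute("CREATE INDEX idx_gnomad_rsid ON gnomad_frequencies (rsid)")

# Invented rows: the first two share an rsID (a multi-allelic site).
# An rsID-only primary key could hold only one of them.
rows = [
    ("1", 1000, "A", "G", "rs_example_1", 0.01),
    ("1", 1000, "A", "T", "rs_example_1", 0.20),
    ("2", 2000, "C", "T", "rs_example_2", 0.05),
]
conn.executemany("INSERT INTO gnomad_frequencies VALUES (?, ?, ?, ?, ?, ?)", rows)


def lookup_af(conn: sqlite3.Connection, rsids: list) -> dict:
    """Return {rsid: highest allele frequency} for the given rsIDs."""
    if not rsids:
        return {}
    placeholders = ",".join("?" for _ in rsids)
    query = (
        "SELECT rsid, MAX(af) FROM gnomad_frequencies "
        f"WHERE rsid IN ({placeholders}) GROUP BY rsid"
    )
    return dict(conn.execute(query, list(rsids)))


result = lookup_af(conn, ["rs_example_1", "rs_example_2"])
row_count = conn.execute("SELECT COUNT(*) FROM gnomad_frequencies").fetchone()[0]
```

Both alleles at the shared site are stored, and the lookup collapses them to a single frequency per rsID by taking the larger value.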