
Releases: aayush4vedi/drift-spark

v0.5.0

11 Jun 02:48


First public release of Drift — a Spark-native embedding lifecycle library.

Added

  • embed() — declarative batch embedding with cross-run dedup (MD5 hash scoped to (model, sink)), batching, exponential backoff, idempotent point IDs, and per-run cost tracking. shadow_mode=True runs with deterministic mock vectors and no API key.
  • watch() — incremental CDC refresh over the Delta Change Data Feed; handles insert / update_postimage / delete and writes the version watermark back to the ledger.
  • migrate() — model upgrades via dual-write and drift-adapter (Orthogonal Procrustes) strategies, with an ARR ≥ 0.97 quality gate (AdapterQualityError).
  • Ledger — SQLite lineage ledger (~/.drift/ledger.db) with cost_by_model(), provenance(), and recent_runs().
  • drift CLI with embed, watch, migrate, and status commands.
  • Qdrant and pgvector sinks (pgvector write-only for now).
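A minimal sketch of the cross-run dedup and idempotent point IDs described for embed(), in plain Python. The helper names (dedup_key, point_id, plan_batch), the NUL-separated hash input, and the use of UUIDv5 for point IDs are assumptions for illustration. The notes only state that the hash is MD5 scoped to (model, sink) and that point IDs are idempotent.

```python
import hashlib
import uuid
from typing import Iterable

# Hypothetical helpers; Drift's actual hash layout and ID scheme may differ.


def dedup_key(text: str, model: str, sink: str) -> str:
    """MD5 of the content, scoped to (model, sink).

    The same text embedded with another model, or written to another sink,
    gets a different key and is therefore not skipped.
    """
    h = hashlib.md5()
    for part in (model, sink, text):
        h.update(part.encode("utf-8"))
        h.update(b"\x00")  # separator so adjacent fields cannot run together
    return h.hexdigest()


def point_id(source_id: str, model: str) -> str:
    """Deterministic UUIDv5: re-running the same row targets the same point."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{model}:{source_id}"))


def plan_batch(rows: Iterable[tuple[str, str]], model: str, sink: str,
               seen_keys: set[str]) -> list[tuple[str, str, str]]:
    """Return (point_id, text, dedup_key) for rows not embedded in earlier runs.

    seen_keys holds the scoped hashes recorded by previous runs.
    """
    todo = []
    for source_id, text in rows:
        key = dedup_key(text, model, sink)
        if key in seen_keys:
            continue  # already embedded for this (model, sink)
        todo.append((point_id(source_id, model), text, key))
    return todo
```

In this sketch, switching model or sink changes every key and forces a full re-embed, while re-running the same job skips rows whose keys were already recorded and upserts the rest under stable IDs.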
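Batching with exponential backoff, as listed for embed(), might look like the following. Drift's actual retry policy is not stated in these notes; the helper names and the defaults (0.5 s base delay, 30 s cap, 5 retries, jitter between 50% and 100% of the delay) are assumptions.

```python
import random
import time
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def batched(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def with_backoff(call: Callable[[], R], *, max_retries: int = 5,
                 base: float = 0.5, cap: float = 30.0,
                 retry_on: tuple[type[BaseException], ...] = (Exception,),
                 sleep: Callable[[float], None] = time.sleep) -> R:
    """Run `call`, retrying on `retry_on` with capped, jittered exponential delays.

    Attempt n (0-based) waits up to min(cap, base * 2**n) seconds before retrying;
    the last failure is re-raised once max_retries is exhausted.
    """
    for attempt in range(max_retries + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_retries:
                raise
            delay = min(cap, base * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out retries
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the policy testable without real waiting; a caller would wrap each provider call, e.g. `with_backoff(lambda: embed_batch(batch))` for each `batch` from `batched(texts, 64)`, where `embed_batch` is a hypothetical provider function.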
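watch() reads the Delta Change Data Feed, whose rows carry `_change_type` (insert, update_preimage, update_postimage, delete) and `_commit_version` columns; in Spark they are read with the Delta reader's `readChangeFeed` and `startingVersion` options. The sketch below works on already-collected rows as plain dicts so it runs without Spark. The `id`/`text` column names and the plan_cdc helper are hypothetical. It collapses changes newer than the stored watermark into one final action per key and returns the new watermark to write back.

```python
from typing import Any, Iterable


def plan_cdc(changes: Iterable[dict[str, Any]], watermark: int,
             key_col: str = "id", text_col: str = "text"):
    """Turn CDF rows into (upserts, deletes, new_watermark).

    Rows at or below `watermark` were handled by an earlier run and are skipped.
    Later commits win, so an insert followed by a delete ends as a delete.
    """
    last: dict[Any, tuple[str, Any]] = {}
    new_watermark = watermark
    for row in sorted(changes, key=lambda r: r["_commit_version"]):
        version = row["_commit_version"]
        if version <= watermark:
            continue
        new_watermark = max(new_watermark, version)
        kind = row["_change_type"]
        if kind in ("insert", "update_postimage"):
            last[row[key_col]] = ("upsert", row[text_col])  # (re-)embed new text
        elif kind == "delete":
            last[row[key_col]] = ("delete", None)  # remove point from the sink
        # update_preimage holds the old values; nothing to write for it
    upserts = {k: v for k, (action, v) in last.items() if action == "upsert"}
    deletes = sorted(k for k, (action, _) in last.items() if action == "delete")
    return upserts, deletes, new_watermark
```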
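The drift-adapter strategy fits an Orthogonal Procrustes map: given paired embeddings X (one model) and Y (the other) of the same dimension, the orthogonal W minimising ||XW - Y||_F is U Vᵀ, where XᵀY = U Σ Vᵀ. The notes do not define ARR or say which direction Drift maps, so the gate below uses a stand-in metric (the share of adapted vectors whose nearest target by cosine is their own counterpart) against the 0.97 threshold. AdapterQualityError here is a local class, not imported from Drift.

```python
import numpy as np


class AdapterQualityError(Exception):
    """Local stand-in for Drift's error; not imported from Drift."""


def fit_procrustes(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Orthogonal W minimising ||src @ W - dst||_F for paired rows.

    Closed form: if src.T @ dst = U S Vt, then W = U @ Vt.
    Assumes both matrices are n x d with the same d.
    """
    u, _, vt = np.linalg.svd(src.T @ dst)
    return u @ vt


def _unit_rows(m: np.ndarray) -> np.ndarray:
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def self_retrieval_rate(adapted: np.ndarray, target: np.ndarray) -> float:
    """Hypothetical stand-in metric: share of rows whose nearest target row
    (by cosine similarity) is their own counterpart."""
    sims = _unit_rows(adapted) @ _unit_rows(target).T
    return float(np.mean(np.argmax(sims, axis=1) == np.arange(len(target))))


def fit_gated_adapter(src: np.ndarray, dst: np.ndarray,
                      threshold: float = 0.97) -> np.ndarray:
    """Fit the adapter and refuse it if the stand-in metric is below threshold."""
    w = fit_procrustes(src, dst)
    score = self_retrieval_rate(src @ w, dst)
    if score < threshold:
        raise AdapterQualityError(f"retrieval rate {score:.3f} < {threshold}")
    return w
```

Constraining W to be orthogonal preserves norms and angles, so cosine rankings within the mapped space are unchanged; only the alignment between the two spaces is learned.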
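The Ledger's cost_by_model() and recent_runs() are aggregate queries over a SQLite file. The schema and free functions below are hypothetical, chosen only to show the kind of query involved, and use an in-memory database instead of ~/.drift/ledger.db.

```python
import sqlite3

# Hypothetical schema; Drift's actual ledger tables are not documented here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    run_id        TEXT PRIMARY KEY,
    model         TEXT NOT NULL,
    rows_embedded INTEGER NOT NULL,
    cost_usd      REAL NOT NULL,
    started_at    TEXT NOT NULL  -- ISO-8601, so text order is time order
)
"""


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_run(conn: sqlite3.Connection, run_id: str, model: str,
               rows_embedded: int, cost_usd: float, started_at: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO runs VALUES (?, ?, ?, ?, ?)",
            (run_id, model, rows_embedded, cost_usd, started_at),
        )


def cost_by_model(conn: sqlite3.Connection) -> dict[str, float]:
    rows = conn.execute(
        "SELECT model, SUM(cost_usd) FROM runs GROUP BY model ORDER BY model"
    )
    return {model: total for model, total in rows}


def recent_runs(conn: sqlite3.Connection, limit: int = 10) -> list[tuple]:
    return conn.execute(
        "SELECT run_id, model, rows_embedded, cost_usd, started_at "
        "FROM runs ORDER BY started_at DESC LIMIT ?",
        (limit,),
    ).fetchall()
```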