feat(etl): Wave 1 — usage_events + marts schema + base classes + backfill orchestrator #72

Merged: 0bserver07 merged 1 commit into main from feat/etl-foundation on May 5, 2026

@0bserver07 (Owner)

Summary

  • Wave 1 of the ETL refactor (docs/specs/etl-architecture.md): adds the canonical usage_events fact table, 5 marts (daily_mart, session_mart, project_mart, provider_day_mart, model_day_mart), mart_watermark, and the indexes the spec calls out. Additive migration — existing tables untouched, all routes keep working.
  • New stackunderflow.etl package: Normalizer + MartBuilder ABCs, last-wins registries, watermark helpers (get_watermark / set_watermark / refresh_all_marts), and the backfill(conn, *, force=False) -> BackfillReport orchestrator skeleton. Empty registries until Wave 2 lands, so the orchestrator is a no-op for now — but the contract (BackfillReport field-set, refresh_all_marts return type, registry overwrite semantics) is pinned by 39 new tests so Waves 2 + 3 can dispatch in parallel.
  • New CLI: stackunderflow etl backfill [--force]. Smoke-tested against the maintainer's real ~1.9 GB store: applied v006_etl_layer.sql cleanly (user_version 5 → 6, all 7 new tables created), printed events_inserted=0, marts_refreshed=(none registered), existing stackunderflow status still reports the same totals.
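The registry and orchestrator contract described above can be sketched roughly as follows. This is a minimal illustration, not the code in stackunderflow.etl: the method names (`normalize`, `refresh`), the `provider` / `mart` keys, the `register_*` helpers, and the `dict[str, int]` return type of `refresh_all_marts` are all assumptions. Only `Normalizer`, `MartBuilder`, `BackfillReport`, `refresh_all_marts`, the `backfill(conn, *, force=False)` signature, and the last-wins behaviour come from this PR.

```python
from __future__ import annotations

import sqlite3
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


class Normalizer(ABC):
    """Turns one provider's raw records into usage_events rows (assumed shape)."""

    provider: str  # assumed registry key

    @abstractmethod
    def normalize(self, conn: sqlite3.Connection) -> int:
        """Insert normalized events and return the number of rows written."""


class MartBuilder(ABC):
    """Rebuilds one mart from usage_events (assumed shape)."""

    mart: str  # assumed registry key

    @abstractmethod
    def refresh(self, conn: sqlite3.Connection) -> int:
        """Refresh the mart and return the number of rows touched."""


# Last-wins registries: registering the same key again replaces the earlier entry.
NORMALIZERS: dict[str, Normalizer] = {}
MART_BUILDERS: dict[str, MartBuilder] = {}


def register_normalizer(normalizer: Normalizer) -> None:
    NORMALIZERS[normalizer.provider] = normalizer


def register_mart_builder(builder: MartBuilder) -> None:
    MART_BUILDERS[builder.mart] = builder


@dataclass
class BackfillReport:
    # Field names mirror the CLI output quoted above; the types are assumptions.
    events_inserted: int = 0
    marts_refreshed: dict[str, int] = field(default_factory=dict)


def refresh_all_marts(conn: sqlite3.Connection) -> dict[str, int]:
    """Refresh every registered mart; an empty registry yields an empty dict."""
    return {name: builder.refresh(conn) for name, builder in MART_BUILDERS.items()}


def backfill(conn: sqlite3.Connection, *, force: bool = False) -> BackfillReport:
    """Run every registered normalizer, then refresh every registered mart.

    `force` is accepted to match the signature but is unused in this sketch.
    """
    report = BackfillReport()
    for normalizer in NORMALIZERS.values():
        report.events_inserted += normalizer.normalize(conn)
    report.marts_refreshed = refresh_all_marts(conn)
    return report
```

With both registries empty, `backfill` returns a report with `events_inserted == 0` and an empty `marts_refreshed`, which matches the no-op behaviour reported in the smoke test.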

Spec edits

  • Migration renumbered v004 → v006. The spec was written before v004 (synthetic-models cleanup) and v005 (cursor-workspace redistribute) shipped, so the spec's v004_etl_layer.sql collides with an existing file. The migration file is v006_etl_layer.sql; spec doc updated with a numbering note.
  • Status line updated to "Wave 1 landed."
  • Wave-dependency diagram now marks Wave 1 as complete and references migration v006.
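For readers unfamiliar with SQLite's `user_version` gating, the 5 → 6 step might look something like the sketch below. The SQL here is a stand-in, since the real contents of v006_etl_layer.sql are not shown in this PR; the column lists and the `apply_v006` helper are hypothetical.

```python
import sqlite3

TARGET_VERSION = 6

# Stand-in for v006_etl_layer.sql. Column lists are illustrative only.
V006_SQL = """
CREATE TABLE IF NOT EXISTS usage_events (
    id INTEGER PRIMARY KEY,
    provider TEXT NOT NULL,
    ts TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS mart_watermark (
    mart TEXT PRIMARY KEY,
    last_event_id INTEGER NOT NULL
);
"""


def current_version(conn: sqlite3.Connection) -> int:
    """Read the schema version stored in the database header."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def apply_v006(conn: sqlite3.Connection) -> bool:
    """Apply the additive migration once; return True if it ran."""
    if current_version(conn) >= TARGET_VERSION:
        return False
    # Only CREATE statements: existing tables are never altered or dropped.
    conn.executescript(V006_SQL)
    # PRAGMA does not accept bound parameters, so the constant is inlined.
    conn.execute(f"PRAGMA user_version = {TARGET_VERSION}")
    conn.commit()
    return True
```

Because the script only creates new tables, running it against a store at version 5 leaves existing rows untouched, and a second call is a no-op.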

Test plan

  • pytest tests/ -q: 1375 passed, 2 skipped (was 1373 + 2 pre-Wave-1; +39 new ETL tests, -2 hard-coded user_version == 5 literals updated to schema.CURRENT_VERSION).
  • ruff check stackunderflow/etl/ — clean.
  • Smoke test against ~/.stackunderflow/store.db: events_inserted=0, marts_refreshed={} as expected.
  • No version bump (still v0.6.1).
  • Migration is additive — verified existing routes (stackunderflow status) still serve the same totals after migration.
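As an illustration of the kind of round-trip the new watermark tests pin, here is a minimal sketch of `get_watermark` / `set_watermark` against a `mart_watermark` table. The column names, the integer watermark, and the default of 0 for an unseen mart are assumptions; the actual helpers may differ.

```python
import sqlite3


def ensure_watermark_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema; the real mart_watermark columns are not shown in the PR.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mart_watermark ("
        "mart TEXT PRIMARY KEY, last_event_id INTEGER NOT NULL)"
    )


def get_watermark(conn: sqlite3.Connection, mart: str) -> int:
    """Return the stored watermark for a mart, or 0 if none was ever set."""
    row = conn.execute(
        "SELECT last_event_id FROM mart_watermark WHERE mart = ?", (mart,)
    ).fetchone()
    return row[0] if row else 0


def set_watermark(conn: sqlite3.Connection, mart: str, value: int) -> None:
    """Store the watermark, replacing any earlier value for the same mart."""
    conn.execute(
        "INSERT OR REPLACE INTO mart_watermark (mart, last_event_id) VALUES (?, ?)",
        (mart, value),
    )
    conn.commit()


def roundtrip_check() -> None:
    """Set, read back, overwrite, read back again."""
    conn = sqlite3.connect(":memory:")
    ensure_watermark_table(conn)
    assert get_watermark(conn, "daily_mart") == 0
    set_watermark(conn, "daily_mart", 42)
    assert get_watermark(conn, "daily_mart") == 42
    set_watermark(conn, "daily_mart", 100)
    assert get_watermark(conn, "daily_mart") == 100
```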

What's next

  • Wave 2 (parallel, 3 agents): claude/codex/cursor/cline normalizers, 5 mart builders, watchfiles-based filesystem watcher.
  • Wave 3 (parallel, 6 routes): swap cost-data / dashboard-data / projects / compare / optimize / yield from in-process aggregation to mart SELECTs.

…fill orchestrator

Wires the three-layer ETL foundation from docs/specs/etl-architecture.md.
Migration v006 adds usage_events + 5 marts + mart_watermark (additive,
existing tables untouched). New stackunderflow.etl package: Normalizer
+ MartBuilder ABCs with last-wins registries, watermark helpers, and a
backfill orchestrator skeleton. New CLI: stackunderflow etl backfill —
no-op until Wave 2 fills in the per-provider normalizers and mart
builders, but the contract (BackfillReport shape, refresh_all_marts
return type, registry semantics) is locked so Waves 2 and 3 can dispatch
in parallel against it.

Renumbered the migration from the spec's v004 to v006 (v004/v005 were
taken by the synthetic-models cleanup and cursor-workspace redistribute
that shipped between the spec and Wave 1) — the spec doc is updated to
match.

39 new tests (v006 migration shape, registry overwrite semantics,
watermark round-trip, backfill orchestrator + force=True). 1375 backend
tests pass total.
@0bserver07 merged commit cc0ca93 into main on May 5, 2026
9 checks passed
0bserver07 added a commit that referenced this pull request May 6, 2026
Bumps to 0.7.0. Consolidates the [Unreleased] CHANGELOG entries from
the 11 ETL PRs (#72, #73, #74, #75, #76, #79, #81, #80, #78, #77, #82)
into a single [0.7.0] section.

New: docs/HANDOFF.md — state-of-the-codebase walkthrough for incoming
agents. Architecture map, recent history, key gotchas, what's left,
files-to-read-first.

End-state on the maintainer's real store:
  150,337 usage_events
  Marts populated and watermarks in sync
  Dashboard cold-load 2.5s → <50ms warm
  Watcher 155ms end-to-end source-file-write → dashboard-data-fresh

1598 backend tests passing, 2 skipped, 11 deselected (slow suite).
Frontend typecheck + build clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0bserver07 added a commit that referenced this pull request May 20, 2026
…fill orchestrator (#72)

0bserver07 added a commit that referenced this pull request May 20, 2026
Bumps to 0.7.0. Consolidates the [Unreleased] CHANGELOG entries from
the 11 ETL PRs (#72, #73, #74, #75, #76, #79, #81, #80, #78, #77, #82)
into a single [0.7.0] section.

@0bserver07 deleted the feat/etl-foundation branch on May 20, 2026 03:02