feat(etl): Wave 1 - usage_events + marts schema + base classes + backfill orchestrator #72
Merged
…fill orchestrator Wires the three-layer ETL foundation from docs/specs/etl-architecture.md. Migration v006 adds usage_events + 5 marts + mart_watermark (additive, existing tables untouched). New stackunderflow.etl package: Normalizer + MartBuilder ABCs with last-wins registries, watermark helpers, and a backfill orchestrator skeleton. New CLI: stackunderflow etl backfill — no-op until Wave 2 fills in the per-provider normalizers and mart builders, but the contract (BackfillReport shape, refresh_all_marts return type, registry semantics) is locked so Waves 2 and 3 can dispatch in parallel against it. Renumbered the migration from the spec's v004 to v006 (v004/v005 were taken by the synthetic-models cleanup and cursor-workspace redistribute that shipped between the spec and Wave 1) — the spec doc is updated to match. 39 new tests (v006 migration shape, registry overwrite semantics, watermark round-trip, backfill orchestrator + force=True). 1375 backend tests pass total.
0bserver07
added a commit
that referenced
this pull request
May 6, 2026
Bumps to 0.7.0. Consolidates the [Unreleased] CHANGELOG entries from the 11 ETL PRs (#72, #73, #74, #75, #76, #79, #81, #80, #78, #77, #82) into a single [0.7.0] section.

New: docs/HANDOFF.md, a state-of-the-codebase walkthrough for incoming agents. Architecture map, recent history, key gotchas, what's left, files-to-read-first.

End-state on the maintainer's real store:
- 150,337 usage_events
- Marts populated and watermarks in sync
- Dashboard cold-load 2.5s → <50ms warm
- Watcher 155ms end-to-end source-file-write → dashboard-data-fresh

1598 backend tests passing, 2 skipped, 11 deselected (slow suite). Frontend typecheck + build clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
0bserver07
added a commit
that referenced
this pull request
May 20, 2026
…fill orchestrator (#72) Wires the three-layer ETL foundation from docs/specs/etl-architecture.md. Migration v006 adds usage_events + 5 marts + mart_watermark (additive, existing tables untouched). New stackunderflow.etl package: Normalizer + MartBuilder ABCs with last-wins registries, watermark helpers, and a backfill orchestrator skeleton. New CLI: stackunderflow etl backfill — no-op until Wave 2 fills in the per-provider normalizers and mart builders, but the contract (BackfillReport shape, refresh_all_marts return type, registry semantics) is locked so Waves 2 and 3 can dispatch in parallel against it. Renumbered the migration from the spec's v004 to v006 (v004/v005 were taken by the synthetic-models cleanup and cursor-workspace redistribute that shipped between the spec and Wave 1) — the spec doc is updated to match. 39 new tests (v006 migration shape, registry overwrite semantics, watermark round-trip, backfill orchestrator + force=True). 1375 backend tests pass total.
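The watermark round-trip and the no-op orchestrator can be sketched against an in-memory SQLite database. The function names (`get_watermark`, `set_watermark`, `refresh_all_marts`, `backfill(conn, *, force=False)`) and the two report fields come from the PR text. Everything else is an assumption made for illustration: the `mart_watermark` column names, the builder callable signature, and `refresh_all_marts` returning a dict.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class BackfillReport:
    events_inserted: int = 0
    marts_refreshed: dict[str, int] = field(default_factory=dict)


# Hypothetical builder shape: (conn, since_event_id) -> new high-water event id.
MartBuilderFn = Callable[[sqlite3.Connection, int], int]
MART_BUILDERS: dict[str, MartBuilderFn] = {}  # empty, as in Wave 1


def get_watermark(conn: sqlite3.Connection, mart: str) -> int:
    row = conn.execute(
        "SELECT last_event_id FROM mart_watermark WHERE mart_name = ?", (mart,)
    ).fetchone()
    return row[0] if row else 0  # no row yet means "nothing processed"


def set_watermark(conn: sqlite3.Connection, mart: str, last_event_id: int) -> None:
    # The primary key on mart_name makes this an insert-or-overwrite.
    conn.execute(
        "INSERT OR REPLACE INTO mart_watermark (mart_name, last_event_id) VALUES (?, ?)",
        (mart, last_event_id),
    )


def refresh_all_marts(conn: sqlite3.Connection, *, force: bool = False) -> dict[str, int]:
    refreshed: dict[str, int] = {}
    for name, build in MART_BUILDERS.items():
        since = 0 if force else get_watermark(conn, name)  # force rebuilds from zero
        new_mark = build(conn, since)
        set_watermark(conn, name, new_mark)
        refreshed[name] = new_mark
    return refreshed


def backfill(conn: sqlite3.Connection, *, force: bool = False) -> BackfillReport:
    # No normalizers exist in this sketch, so no events are inserted.
    report = BackfillReport(
        events_inserted=0, marts_refreshed=refresh_all_marts(conn, force=force)
    )
    conn.commit()
    return report


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mart_watermark ("
    "mart_name TEXT PRIMARY KEY, last_event_id INTEGER NOT NULL)"  # assumed columns
)
empty_report = backfill(conn)  # empty registry, so this is a no-op
set_watermark(conn, "daily_mart", 42)
```

With an empty registry, `backfill` returns a report with `events_inserted=0` and an empty `marts_refreshed`, which matches the no-op behaviour the description promises for Wave 1.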
Summary
- `docs/specs/etl-architecture.md`): adds the canonical `usage_events` fact table, 5 marts (`daily_mart`, `session_mart`, `project_mart`, `provider_day_mart`, `model_day_mart`), `mart_watermark`, and the indexes the spec calls out. Additive migration: existing tables untouched, all routes keep working.
- `stackunderflow.etl` package: `Normalizer` + `MartBuilder` ABCs, last-wins registries, watermark helpers (`get_watermark` / `set_watermark` / `refresh_all_marts`), and the `backfill(conn, *, force=False) -> BackfillReport` orchestrator skeleton. Empty registries until Wave 2 lands, so the orchestrator is a no-op for now, but the contract (BackfillReport field-set, `refresh_all_marts` return type, registry overwrite semantics) is pinned by 39 new tests so Waves 2 + 3 can dispatch in parallel.
- `stackunderflow etl backfill [--force]`. Smoke-tested against the maintainer's real ~1.9 GB store: applied `v006_etl_layer.sql` cleanly (`user_version` 5 → 6, all 7 new tables created), printed `events_inserted=0, marts_refreshed=(none registered)`, existing `stackunderflow status` still reports the same totals.

Spec edits
- `v004_etl_layer.sql` collides with an existing file. The migration file is `v006_etl_layer.sql`; spec doc updated with a numbering note.

Test plan
- `pytest tests/ -q`: 1375 passed, 2 skipped (was 1373 + 2 pre-Wave-1; +39 new ETL tests, -2 hard-coded `user_version == 5` literals updated to `schema.CURRENT_VERSION`).
- `ruff check stackunderflow/etl/`: clean.
- `~/.stackunderflow/store.db`: `events_inserted=0, marts_refreshed={}` as expected.
- `stackunderflow status`) still serve the same totals after migration.

What's next