feat(etl): Wave 2B — 5 mart builders (daily, session, project, provider_day, model_day) by 0bserver07 · Pull Request #74 · 0bserver07/StackUnderflow

0bserver07 · 2026-05-05T04:28:58Z

Summary

Wave 2B of the three-layer ETL refactor (see docs/specs/etl-architecture.md). Ships five MartBuilder subclasses under stackunderflow/etl/marts/ — indexed read-side rollups derived from usage_events:

DailyMartBuilder — (day, project_id, provider, model, speed) rollup
SessionMartBuilder — one row per session with lifetime aggregates + is_one_shot flag + primary_model
ProjectMartBuilder — one row per project with lifetime totals
ProviderDayMartBuilder — (day, provider) rollup for the by-provider chart
ModelDayMartBuilder — (day, model, speed) rollup for compare-across-agents

Each builder is watermarked (mart_watermark.last_event_id per mart, independently tracked) and idempotent. Two refresh patterns:

Additive marts (daily, provider_day, model_day): INSERT ... ON CONFLICT DO UPDATE adds incremental events into existing rows. After the additive upsert, COUNT(DISTINCT session_id) columns are recomputed for affected keys via a follow-up UPDATE — DISTINCT counts don't sum across refresh windows without double-counting sessions that span the boundary.
Per-entity marts (session, project): INSERT OR REPLACE over a subquery that re-aggregates from scratch for affected session_id / project_id values. New events invalidate prior per-entity aggregates, so the row gets rewritten in full.
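The additive pattern above can be sketched with the standard-library sqlite3 module. This is a minimal illustration, not the PR's code: the table name provider_day_mart, its columns, and the module-level refresh() helper are assumptions made for the example, and the watermark is simply returned instead of being stored in mart_watermark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema (assumption): the real tables have more columns.
conn.executescript("""
CREATE TABLE usage_events (
    id INTEGER PRIMARY KEY, day TEXT, provider TEXT, session_id TEXT, cost REAL
);
CREATE TABLE provider_day_mart (
    day TEXT, provider TEXT,
    event_count INTEGER, cost REAL, session_count INTEGER,
    PRIMARY KEY (day, provider)
);
""")


def refresh(conn, since_event_id):
    """Fold events with id > since_event_id into the mart; return the new watermark."""
    # Step 1: additive upsert. Sums and counts from the new window are added
    # onto whatever the existing row already holds.
    conn.execute(
        """
        INSERT INTO provider_day_mart (day, provider, event_count, cost, session_count)
        SELECT day, provider, COUNT(*), SUM(cost), 0
        FROM usage_events
        WHERE id > ?
        GROUP BY day, provider
        ON CONFLICT(day, provider) DO UPDATE SET
            event_count = event_count + excluded.event_count,
            cost = cost + excluded.cost
        """,
        (since_event_id,),
    )
    # Step 2: DISTINCT counts are not additive, so recompute them from all
    # events, but only for the (day, provider) keys touched by this window.
    conn.execute(
        """
        UPDATE provider_day_mart
        SET session_count = (
            SELECT COUNT(DISTINCT e.session_id)
            FROM usage_events e
            WHERE e.day = provider_day_mart.day
              AND e.provider = provider_day_mart.provider
        )
        WHERE EXISTS (
            SELECT 1 FROM usage_events n
            WHERE n.id > ?
              AND n.day = provider_day_mart.day
              AND n.provider = provider_day_mart.provider
        )
        """,
        (since_event_id,),
    )
    row = conn.execute(
        "SELECT COALESCE(MAX(id), ?) FROM usage_events", (since_event_id,)
    ).fetchone()
    return row[0]


insert = "INSERT INTO usage_events VALUES (?, ?, ?, ?, ?)"
# Window 1: session s1 starts.
conn.executemany(insert, [
    (1, "2026-05-01", "provider_a", "s1", 0.5),
    (2, "2026-05-01", "provider_a", "s1", 0.25),
])
wm = refresh(conn, 0)
# Window 2: s1 continues across the refresh boundary and s2 appears.
conn.executemany(insert, [
    (3, "2026-05-01", "provider_a", "s1", 0.25),
    (4, "2026-05-01", "provider_a", "s2", 1.0),
])
wm = refresh(conn, wm)
row = conn.execute(
    "SELECT event_count, cost, session_count FROM provider_day_mart"
).fetchone()
```

Session s1 has events in both windows. Adding per-window distinct counts would report three sessions; the follow-up UPDATE reports two.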

rebuild_from_scratch() runs DELETE FROM <mart> followed by refresh(conn, since_event_id=0): it clears the mart and backfills it in full, ending in the same final state as an incremental run.
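A sketch of that equivalence for a per-entity mart, assuming SQLite and a hypothetical, cut-down session_mart table. The free functions below stand in for the MartBuilder methods; their bodies are illustrative assumptions, not the PR's implementation.

```python
import sqlite3

# Hypothetical, simplified schema (assumption).
SCHEMA = """
CREATE TABLE usage_events (id INTEGER PRIMARY KEY, session_id TEXT, cost REAL);
CREATE TABLE session_mart (
    session_id TEXT PRIMARY KEY, event_count INTEGER, cost REAL
);
"""


def refresh(conn, since_event_id=0):
    """Rewrite the mart row of every session that has an event newer than the watermark."""
    # The subquery finds affected sessions; the outer SELECT re-aggregates each
    # one from ALL of its events, and INSERT OR REPLACE overwrites the old row.
    conn.execute(
        """
        INSERT OR REPLACE INTO session_mart (session_id, event_count, cost)
        SELECT session_id, COUNT(*), SUM(cost)
        FROM usage_events
        WHERE session_id IN (
            SELECT session_id FROM usage_events WHERE id > ?
        )
        GROUP BY session_id
        """,
        (since_event_id,),
    )
    return conn.execute("SELECT COALESCE(MAX(id), 0) FROM usage_events").fetchone()[0]


def rebuild_from_scratch(conn):
    """Clear the mart, then backfill it from the first event."""
    conn.execute("DELETE FROM session_mart")
    return refresh(conn, since_event_id=0)


def snapshot(conn):
    return conn.execute("SELECT * FROM session_mart ORDER BY session_id").fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
insert = "INSERT INTO usage_events VALUES (?, ?, ?)"

# Two incremental windows; session s1 receives events in both.
conn.executemany(insert, [(1, "s1", 0.5), (2, "s2", 0.25)])
wm = refresh(conn, 0)
conn.executemany(insert, [(3, "s1", 0.25), (4, "s3", 1.0)])
wm = refresh(conn, wm)
incremental = snapshot(conn)

# One-shot rebuild over the same events.
rebuild_from_scratch(conn)
rebuilt = snapshot(conn)
```

Both snapshots hold the same rows, which is the property the two-window-incremental test checks.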

The Wave 1 foundation pieces (Normalizer/MartBuilder ABCs, marts/normalize registries, watermark helpers, v006_etl_layer.sql schema migration) are scaffolded in this branch so the marts can be exercised end-to-end before Wave 1 lands. They mirror the spec contract; Wave 1's PR will reconcile any textual overlap when it merges first.

Notes

No version bump — still v0.6.1.
Migration file is v006_etl_layer.sql (not v004_* as the spec text says): two migrations (v004 synthetic-models cleanup, v005 cursor-workspace redistribute) shipped between the spec being written and this PR.
Marts only depend on the usage_events table contents — no provider-specific logic, no import from stackunderflow.etl.normalize.
Test count: 1341 → 1374 (+33 new tests, 2 skipped unchanged).

Test plan

Per-mart unit tests — tests/stackunderflow/etl/marts/test_<mart>.py (8 daily, 4 provider_day, 4 model_day, 7 session, 5 project)
Integration test — 100 synthetic events spanning 3 days × 2 providers × 3 models, full pipeline + cost conservation + watermark contract + two-window-incremental matches one-shot rebuild
pytest tests/ -q clean: 1374 passed, 2 skipped
ruff check stackunderflow/etl/ clean
ruff format stackunderflow/etl/ --check clean
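The cost-conservation check named in the integration test can be sketched as below. The helper names, the daily_mart table, and the synthetic cost values are assumptions for illustration; the actual test in the repository may be structured differently.

```python
import math
import sqlite3


def total(conn, table, column="cost"):
    # Table and column names are interpolated directly. That is acceptable here
    # only because they are hard-coded test constants, never user input.
    query = f"SELECT COALESCE(SUM({column}), 0.0) FROM {table}"
    return conn.execute(query).fetchone()[0]


def cost_is_conserved(conn, mart_table):
    """True when the mart's summed cost matches the summed cost of usage_events."""
    return math.isclose(
        total(conn, "usage_events"), total(conn, mart_table),
        rel_tol=1e-9, abs_tol=1e-12,
    )


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema (assumption).
conn.executescript("""
CREATE TABLE usage_events (id INTEGER PRIMARY KEY, day TEXT, cost REAL);
CREATE TABLE daily_mart (day TEXT PRIMARY KEY, cost REAL);
""")
# 100 synthetic events spread over 3 days.
conn.executemany(
    "INSERT INTO usage_events VALUES (?, ?, ?)",
    [(i, f"2026-05-0{1 + i % 3}", 0.01 * i) for i in range(1, 101)],
)
conn.execute(
    "INSERT INTO daily_mart SELECT day, SUM(cost) FROM usage_events GROUP BY day"
)
conserved = cost_is_conserved(conn, "daily_mart")

# Corrupt one mart row to confirm that the check detects drift.
conn.execute("UPDATE daily_mart SET cost = cost + 1.0 WHERE day = '2026-05-01'")
broken = cost_is_conserved(conn, "daily_mart")
```

A tolerance-based comparison is used because floating-point sums grouped differently are not guaranteed to match bit for bit.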

…er_day, model_day)

Wave 2B of the three-layer ETL pipeline (see docs/specs/etl-architecture.md). Five MartBuilder subclasses under stackunderflow/etl/marts/, each watermarked and idempotent.

Additive marts (daily, provider_day, model_day) refresh via INSERT ... ON CONFLICT DO UPDATE so re-runs after partial failure self-heal. Their COUNT(DISTINCT session_id) columns are recomputed for affected keys after the additive upsert, because DISTINCT counts don't sum across refresh windows without double-counting sessions that span the boundary.

Per-entity marts (session, project) use INSERT OR REPLACE over a re-aggregated subquery so a new event for an existing session/project invalidates the prior aggregate and recomputes from all events.

rebuild_from_scratch() drops + repopulates each mart from scratch for the --rebuild path. Same final state as a clean incremental run.

Foundation pieces (Normalizer/MartBuilder ABCs, marts/normalize registries, watermark helpers, v006_etl_layer.sql migration) are scaffolded here so the marts can be exercised end-to-end before Wave 1 lands. They mirror the spec contract exactly; Wave 1's PR will reconcile any textual overlap when it merges.

33 new tests in tests/stackunderflow/etl/marts/ (8 daily, 4 provider_day, 4 model_day, 7 session, 5 project, 5 integration). Total suite: 1341 → 1374 passing. No version bump (still v0.6.1).

Bumps to 0.7.0. Consolidates the [Unreleased] CHANGELOG entries from the 11 ETL PRs (#72, #73, #74, #75, #76, #79, #81, #80, #78, #77, #82) into a single [0.7.0] section.

New: docs/HANDOFF.md, a state-of-the-codebase walkthrough for incoming agents. Architecture map, recent history, key gotchas, what's left, files-to-read-first.

End-state on the maintainer's real store:

150,337 usage_events
Marts populated and watermarks in sync
Dashboard cold-load 2.5s → <50ms warm
Watcher 155ms end-to-end source-file-write → dashboard-data-fresh

1598 backend tests passing, 2 skipped, 11 deselected (slow suite). Frontend typecheck + build clean.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

0bserver07 force-pushed the feat/etl-marts branch from 872198d to 38716c1 Compare May 5, 2026 04:35

0bserver07 merged commit cc4ce29 into main May 5, 2026
7 of 9 checks passed

0bserver07 deleted the feat/etl-marts branch May 5, 2026 04:36

0bserver07 merged 1 commit into main from feat/etl-marts
