feat: multi-provider foundation (SessionRef extension + provider pricers) by 0bserver07 · Pull Request #20 · 0bserver07/StackUnderflow

0bserver07 · 2026-04-30T22:19:30Z

Summary

Implements Steps 1 + 2 of docs/specs/multi-provider/spec.md — the prerequisite work so Wave 2 (Cursor + Cline adapters) can land without further refactoring of the adapter contract or cost layer. No new providers in this PR.

§1 — Adapter contract extension

SessionRef gains optional source_kind: Literal["file", "database"] = "file" and source_hint: dict[str, Any] | None = None. JSONL adapters need zero changes.
ingest_log migrated from file_path PRIMARY KEY to (id INTEGER PRIMARY KEY, file_path, session_id TEXT, storage_kind TEXT CHECK(...), mtime, size, processed_offset NULLABLE, last_rowid NULLABLE, last_ingest_ts) plus two partial unique indexes — one for file-mode rows (session_id IS NULL), one for database-mode rows. Existing rows survive with session_id=NULL, storage_kind='file', last_rowid=NULL. Migration is via CREATE … INSERT SELECT … DROP … RENAME inside one transaction; idempotent because apply() reads PRAGMA user_version and skips already-applied migrations.
run_ingest() branches on ref.source_kind: file-mode resumes via processed_offset keyed on WHERE session_id IS NULL; database-mode resumes via last_rowid keyed on (file_path, session_id).
Writer stores max(record.seq) for both kinds — for database mode that's a rowid, for file mode it's the byte offset of the last consumed line. Adapters skip records with seq <= since_offset so the storage-aware contract test (test_read_since_offset_is_storage_aware, spec §1.4) passes for both providers.
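The two partial unique indexes matter because SQLite treats NULL values as distinct inside a UNIQUE index, so a plain unique index on (file_path, session_id) would still accept duplicate file-mode rows. A minimal sketch of that behaviour follows, assuming a trimmed-down ingest_log; the index names, the sample paths, and the `try_insert` helper are hypothetical and not taken from the PR.

```python
import sqlite3

# Simplified, hypothetical ingest_log: only the columns needed to show
# the uniqueness rules. Index names are made up for this sketch.
SCHEMA = """
CREATE TABLE ingest_log (
    id INTEGER PRIMARY KEY,
    file_path TEXT NOT NULL,
    session_id TEXT,
    storage_kind TEXT NOT NULL,
    processed_offset INTEGER,
    last_rowid INTEGER
);
-- File-mode rows: at most one row per file_path when session_id IS NULL.
CREATE UNIQUE INDEX ux_ingest_file
    ON ingest_log (file_path) WHERE session_id IS NULL;
-- Database-mode rows: at most one row per (file_path, session_id).
CREATE UNIQUE INDEX ux_ingest_db
    ON ingest_log (file_path, session_id) WHERE session_id IS NOT NULL;
"""


def try_insert(conn, file_path, session_id, storage_kind):
    """Return True if the row was accepted, False on a uniqueness violation."""
    try:
        conn.execute(
            "INSERT INTO ingest_log (file_path, session_id, storage_kind) "
            "VALUES (?, ?, ?)",
            (file_path, session_id, storage_kind),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

results = [
    try_insert(conn, "a.jsonl", None, "file"),          # first file-mode row
    try_insert(conn, "a.jsonl", None, "file"),          # duplicate file-mode row
    try_insert(conn, "state.vscdb", "s1", "database"),  # first session in a db
    try_insert(conn, "state.vscdb", "s2", "database"),  # second session, same db
    try_insert(conn, "state.vscdb", "s1", "database"),  # duplicate session
]
print(results)  # [True, False, True, True, False]
```

The same database file can hold many sessions, which is why database-mode rows are keyed on the pair rather than on file_path alone.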

§2 — Provider pricer scaffold

New stackunderflow/infra/providers/ package: ProviderPricer ABC plus AnthropicPricer and OpenAIPricer extracted from the old infra/costs.py heuristic. get_pricer(provider) returns a singleton, falls back to Anthropic for unknown providers.
infra/costs.compute_cost(tokens, model, provider="anthropic") is now a thin shim — the default keeps every existing call site working unchanged. Aggregator collectors take a provider= constructor arg resolved once from ds.records[0].provider; records carry it through from RawEntry/TaggedEntry populated from projects.provider in build_enriched_dataset.
Codex normalization moved out of adapters/codex.py into OpenAIPricer.normalize_tokens. The adapter now delegates to the pricer instead of inlining the cached-input subtraction + reasoning fold. A parametrised regression test (tests/stackunderflow/infra/providers/test_codex_cost_equivalence.py) proves the move is cost-neutral by computing the same (model, raw-token) bundle with both the legacy convention (caller pre-normalises) and the new one (pricer normalises) — they match for 5 fixtures including a 100%-cached-input edge case.
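The registry and shim described above could look roughly like the sketch below. Only the names ProviderPricer, AnthropicPricer, OpenAIPricer, get_pricer, compute_cost, and normalize_tokens come from this PR; the method signatures, the `_FlatRatePricer` helper, and the flat per-million rates are placeholders invented for illustration, not real prices or the actual implementation.

```python
from abc import ABC, abstractmethod


class ProviderPricer(ABC):
    """Assumed interface: normalise raw usage, then price it."""

    @abstractmethod
    def normalize_tokens(self, tokens):
        ...

    @abstractmethod
    def compute(self, tokens, model):
        ...


class _FlatRatePricer(ProviderPricer):
    # Hypothetical helper: one flat placeholder rate per million tokens.
    rate_per_million = 0.0

    def normalize_tokens(self, tokens):
        return dict(tokens)

    def compute(self, tokens, model):
        return sum(tokens.values()) * self.rate_per_million / 1_000_000


class AnthropicPricer(_FlatRatePricer):
    rate_per_million = 1.0  # placeholder, not a real price


class OpenAIPricer(_FlatRatePricer):
    rate_per_million = 2.0  # placeholder, not a real price


# One shared instance per provider name.
_PRICERS = {"anthropic": AnthropicPricer(), "openai": OpenAIPricer()}


def get_pricer(provider):
    """Return the singleton for `provider`, falling back to Anthropic."""
    return _PRICERS.get(provider, _PRICERS["anthropic"])


def compute_cost(tokens, model, provider="anthropic"):
    """Thin shim: normalise with the provider's pricer, then price."""
    pricer = get_pricer(provider)
    return pricer.compute(pricer.normalize_tokens(tokens), model)
```

Because `provider` defaults to "anthropic", a call such as `compute_cost(tokens, model)` behaves the same as before the pricer split, which is the property the existing call sites rely on.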

Conservative choice (not in spec)

The spec wanted adapters/codex.py to emit raw OpenAI shape into Record. The 4-slot Record dataclass (input_tokens, output_tokens, cache_create_tokens, cache_read_tokens) doesn't have a clean place for reasoning_output_tokens to live without re-purposing cache_create_tokens (which would corrupt downstream cache stats). I chose the smallest-diff path: the adapter still emits 4-slot canonical shape into the DB, but the flattening logic now physically lives in OpenAIPricer.normalize_tokens — the adapter calls into the pricer instead of inlining the math. The cost-equivalence regression test guarantees the dollar number doesn't change. A future PR can switch the adapter to raw shape once Record grows a fifth slot or the encoding question gets resolved.
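To make the flattening concrete, here is a rough sketch of folding a raw OpenAI-style usage dict into the 4-slot shape. The raw key names (`cached_input_tokens`, `reasoning_output_tokens`) and the assumption that raw `output_tokens` excludes reasoning tokens are illustrative guesses; the real OpenAIPricer.normalize_tokens may differ.

```python
def normalize_tokens(raw):
    """Flatten an assumed raw usage dict into the 4-slot canonical shape.

    Assumptions for this sketch: raw input_tokens includes cached input,
    and raw output_tokens does not include reasoning tokens.
    """
    total_in = raw.get("input_tokens", 0)
    cached = raw.get("cached_input_tokens", 0)
    return {
        # Cached input is subtracted so it is not counted twice.
        "input_tokens": max(total_in - cached, 0),
        # Reasoning tokens are folded into output.
        "output_tokens": raw.get("output_tokens", 0)
        + raw.get("reasoning_output_tokens", 0),
        # Left at zero so cache-creation stats are not repurposed.
        "cache_create_tokens": 0,
        "cache_read_tokens": cached,
    }


# 100%-cached-input edge case: every input token was served from cache.
flat = normalize_tokens(
    {
        "input_tokens": 1000,
        "cached_input_tokens": 1000,
        "output_tokens": 50,
        "reasoning_output_tokens": 20,
    }
)
print(flat)
```

Keeping `cache_create_tokens` at zero rather than storing reasoning tokens there mirrors the concern above about corrupting downstream cache stats.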

Test plan

pytest tests/ -q — 484 passed, 2 skipped (baseline was 444 + 2; +40 covering the new pricers, the contract test, the v002 migration, and the cost-equivalence regression).
pytest tests/stackunderflow/adapters/contract.py tests/stackunderflow/adapters/test_claude.py tests/stackunderflow/adapters/test_codex.py -q — 29 passed.
cd stackunderflow-ui && npm run typecheck — clean.
cd stackunderflow-ui && npm run build — clean.
Smoke-ingest against a local Codex project to verify the v002 migration applies cleanly to a real ~/.stackunderflow/store.db.

🤖 Generated with Claude Code

…ers)

Implements steps 1+2 of docs/specs/multi-provider/spec.md, the prerequisite work for Wave 2 (Cursor + Cline adapters). No new providers in this PR.

§1 adapter contract:
- SessionRef gains optional source_kind ("file" | "database") and source_hint dict; defaults keep existing JSONL adapters working unchanged.
- ingest_log migrated from `file_path PRIMARY KEY` to a new shape with session_id, storage_kind, last_rowid columns plus two partial unique indexes (one for file-mode session_id IS NULL rows, one for db-mode session_id IS NOT NULL rows) so SQLite's NULL-distinct-in-UNIQUE behaviour doesn't let duplicate file rows in. Existing rows survive with session_id=NULL, storage_kind='file', last_rowid=NULL.
- run_ingest branches on ref.source_kind to compute resume offset; writer stores max(seq) for both kinds (rowid for db, byte offset of last line for file). Adapters now skip records with seq <= since_offset so the storage-aware contract test from spec §1.4 passes for both Claude and Codex fixtures.

§2 provider pricers:
- New infra/providers/ package: ProviderPricer ABC + AnthropicPricer + OpenAIPricer, registered by name with singleton instances and an Anthropic fallback for unknown providers.
- infra/costs.compute_cost is now a thin shim: pricer.normalize_tokens(...) → pricer.compute(...). Default provider="anthropic" keeps every existing call site (~10 in stats/aggregator.py, plus routes/cost, store/queries) working.
- Codex's cached-input subtraction moved out of adapters/codex.py and into OpenAIPricer.normalize_tokens. Adapter now delegates to the pricer instead of inlining the subtract+fold-reasoning logic. A parametrised regression test (test_codex_cost_equivalence.py) proves the move is cost-neutral against the legacy normalisation.
- Aggregator collectors take provider= per project; the value is resolved once from ds.records[0].provider (records carry it through from RawEntry/TaggedEntry, populated from projects.provider in build_enriched_dataset).

Tests: 484 passing, 2 skipped (was 444 + 2 baseline; +40 covering the new pricers, the contract test, the migration, and the regression fixture). Frontend typecheck + build clean.

Conservative-equivalence note for the PR body: the Codex adapter still emits 4-slot canonical token shape into Record (input_tokens excludes cached, output_tokens includes reasoning) so the SQLite messages table and per-message cache stats see the same values as before. The flattening logic moved to OpenAIPricer.normalize_tokens; the adapter calls into the pricer rather than inlining the math, satisfying "normalization lives in the pricer." A future change can switch the adapter to emit raw shape and have the pricer flatten on read.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…UI polish (#25) Wave 2 of the multi-provider initiative. Extends the adapter contract to support SQLite-backed and vscdb sources alongside the existing JSONL flow, ships beta adapters for Cursor and Cline (both opt-in via env var), and adds provider chips + estimated-cost markers to the dashboard. Bundles PRs #20, #21, #22, #23, #24. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

0bserver07 and others added 2 commits April 30, 2026 18:18

fix: ruff E501 in aggregator trends entry

2644bf0

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

0bserver07 merged commit 637d965 into main Apr 30, 2026
9 checks passed

0bserver07 mentioned this pull request Apr 30, 2026

release: 0.4.0 — multi-provider foundation, Cursor + Cline adapters, UI polish #25

Merged

4 tasks

0bserver07 merged 2 commits into main from wave2/foundation
