Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **Prescriptive cost (Optimize surface)** — `/api/optimize/prescriptions` turns waste findings into actions: model-routing recommendations built from per-model spend + v026 reasoning-token attribution (with dollar deltas priced through `compute_cost`, never invented), and a CLAUDE.md slimming preview (`POST /api/optimize/claudemd-preview`) that emits a unified diff + per-rule savings. Preview-only by construction — the server never writes user files (enforced by an AST source-scan test).
- **Backup state capture** — `backup create` now copies the four critical `~/.stackunderflow` artifacts (`store.db`, `search_index.db`, `qa_pairs.db`, `tags.json`) into `<backup>/stackunderflow-state/` via the SQLite online-backup API. Previously a fresh backup failed its own `backup verify` and a restore silently lost search/Q&A/tags.
- **Context-window replay (issue #96, beta)** — reconstruct what the model "saw" at a point in a session. `stackunderflow context-replay <session> --at <seq>` / `GET /api/context-replay/{session}?at=<seq>` return the ordered message sequence that had accumulated as the model's context up to that turn — each turn's role, a content preview, an estimated token footprint, its tool calls, and a running token total. Read-only + advisory (an unknown session is empty-but-valid, never an error); same-project fencing and a per-session read-through cache like `/api/forks`; `--json` emits the shared `stackunderflow.memory/1` agent-output envelope. A new beta **Context Replay** dashboard tab scrubs the `seq` cutoff and watches the token total grow. MVP semantics = "the session's message sequence up to `seq`" (harness-side context eviction is a later refinement).
- **Multi-device sync (issue #100, opt-in)** — sync your cost/usage **aggregates** (never raw transcripts) across machines through your own S3-compatible bucket, client-side-encrypted with `age` (zero-knowledge; the bucket sees only ciphertext). `stackunderflow sync init/push/pull/status` (import-guarded `[sync]` extra). Push writes only encrypted mart shards under this device's prefix (two-phase manifest commit, skip-if-unchanged); `pull` merges peers into a unified view at `GET /api/sync/overview?scope=all-devices` — aggregates union at the stable `(provider, slug)` grain, sessions deduped by id. Default-off and byte-identical when unconfigured; additive `v028`/`v029` migrations.
- **Proactive nudges (issue #97, opt-in, default-off)** — before a Bash command that has failed in a cluster of your past sessions (or after a recurring error signature), a hook injects a one-line, locally-derived heads-up ("`npm install` has failed in 3 recent sessions here — mostly Timeout"; "this error recurred in 4 sessions; the ones that moved past it ran X next"). A governance layer (per-session dedupe/cap, cross-session cooldown, dismiss-driven adaptive quieting; state in a file-locked JSON, never the DB) prevents nudge fatigue; hooks stay fast, LLM-free, silent-on-failure, and never block a tool. A "What almost bit me" panel in the Health tab surfaces recent would-have-fired nudges with dismiss controls.

### Fixed

Expand Down
48 changes: 44 additions & 4 deletions docs/specs/active-surfacing.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,13 @@

*Design spec for issue #97. Product design owned by the maintainer — this is a spec, not an implementation. No code, no schema migration, no version edits.*

> **Status:** Phase 0 (governance retrofit), Phase 1 (command-cluster nudge), and
> **Phase 2 (error-signature foresight + "What almost bit me" dashboard panel)**
> have shipped (`stackunderflow/hooks/proactive.py`, `hooks/handlers.py`+`templates.py`,
> `routes/patterns.py`, `stackunderflow-ui/.../CodingHealthTab.tsx`). Phase 3
> (prompt-similarity) remains speculative / embeddings-gated. See §9 for the
> per-phase "as built" notes.

## 0. Corrections to the issue body (read first)

The issue was written before campaign #5/#6 shipped. Three parts of its implementation plan should change, and one framing is now inaccurate:
Expand All @@ -27,10 +34,10 @@ Everything else in the issue (default-off, opt-in flag, no schema, "just a nudge
| File-about-to-be-edited has failure/revert history | ✅ | `recall.py`, `inject._pre_tool_use_context` | Retrofit governance only |
| Prompt lexically matches a past decision | ✅ | `inject._user_prompt_context` | — |
| Project digest at session start | ✅ | `inject._session_start_context` | — |
| **Command about to run is in a failure cluster** | | `patterns.command_clusters` computed, surfaced nowhere live | **NEWMVP** |
| **Recurring error signature + what fixed it last time** | | `patterns.error_signatures` + `resolution_hints`, surfaced nowhere live | **NEW — Phase 2** |
| **Command about to run is in a failure cluster** | | `proactive.command_cluster_block` (PreToolUse/Bash) | **SHIPPEDPhase 1** |
| **Recurring error signature + what fixed it last time** | | `proactive.error_signature_block` / `build_posttool_nudge` (PostToolUse/Bash) | **SHIPPED — Phase 2** |
| **Prompt *semantically* similar to a `failed` session** | ❌ | needs embeddings (Spec 10) | **Speculative — Phase 3** |
| **Any anti-fatigue governance** (cap, dedupe, snooze, adaptive quieting) | | nothing exists | **NEW — MVP, the crux** |
| **Any anti-fatigue governance** (cap, dedupe, snooze, adaptive quieting) | | `proactive.should_surface` / `admit` + `proactive_state.json` | **SHIPPED — MVP, the crux** |

**Verdict, stated plainly:** the file-edit nudge is done. The load-bearing new value in #97 is **(a) command-cluster triggers, (b) a governance layer the shipped hooks lack entirely, and (c) an optional dashboard surface for phrasing + tuning.** If we skip (b) and just add more triggers, we make the shipped feature *worse* (more noise, no throttle). Governance is not a "hard part" bullet at the end — it is the reason to do this spec.

Expand Down Expand Up @@ -183,9 +190,42 @@ Wrap the *existing* `recall.py` output in the governance layer (§4): per-sessio
**Phase 1 — Command-cluster nudge (the MVP new value).**
Precompute/cache `command_clusters`; add the PreToolUse/Bash command-head lookup + template renderer to the recall path, under governance. Ship template-only. This is the "smallest genuinely-useful *new* nudge."

**Phase 2 — Error-signature foresight + dashboard panel.**
**Phase 2 — Error-signature foresight + dashboard panel. ✅ SHIPPED (campaign #8).**
New PostToolUse/Bash nudge using `error_signatures` + `resolution_hints`. Add the "What almost bit me" section to `CodingHealthTab` with dismiss controls writing governance state. Optional Tier-2 LLM phrasing behind `proactive_llm_phrasing`.

*As built:*
- **Hook:** new id `stackunderflow-posttool-nudge` (PostToolUse, matcher `Bash`),
installed alongside recall/inject by `hooks install --inject`, dispatched via
`handlers.run` → `proactive.build_posttool_nudge`. It extracts the errored
`tool_response` body (`proactive._error_body_from_response` — stderr/error/
`is_error` content only; a clean result is silent), normalises it with
`patterns._normalise_signature` **reused verbatim** for signature-key parity,
looks it up O(1) in the precomputed cache (`refresh_signal_cache` now also
emits `error_signatures`), and fires only when `session_count >= 2` **and**
`resolution_hints` is non-empty. Rides the *same* governance layer as Phase 1
(`should_surface`/`admit`, dedupe/cap/cooldown/adaptive-quieting). Emits only
`hookSpecificOutput.additionalContext` — a PostToolUse hook can never block
the tool (it already ran); never a `decision`/deny; error/timeout → empty,
exit 0.
- **Type gate:** `error-signature` is a first-class type in `_KNOWN_TYPES`, so it
is governed by the `proactive_types` allowlist like the others. The shipped
`settings.py` default (`command-cluster,file-risk`) does **not** include it yet
(that default + the `--proactive` install flag are maintainer-owned), so until
the maintainer widens the default, enabling it is `proactive_enabled=1` **plus**
adding `error-signature` to `proactive_types` (or `STACKUNDERFLOW_PROACTIVE_TYPES`).
- **Dashboard:** `CodingHealthTab` gained a "What almost bit me" panel listing the
would-have-fired nudges (command-cluster + file-risk + error-signature) from the
existing `/api/patterns` report, each with **Dismiss** (fingerprint scope) and
**Don't show again** (type scope) controls. Both call the new
`POST /api/patterns/dismiss`, which computes the fingerprint with the *same*
`proactive.make_signal` Tier-1 uses and calls `proactive.record_dismissal` —
so a dashboard dismiss lands on the exact governance key the in-session gate
reads (round-trip verified). The endpoint writes only `proactive_state.json`,
never the store.
- **Deferred:** Tier-2 LLM phrasing (`proactive_llm_phrasing`) — the template
renderer is the shipped floor, as spec'd; LLM polish stays a later, opt-in,
dashboard-only add.

**Phase 3 — Prompt-similarity (only if embeddings ship and Phases 1–2 earn trust).**
Semantic match vs. `failed` sessions on UserPromptSubmit. Highest annoyance risk; build last, guard hardest, or defer indefinitely.

Expand Down
12 changes: 8 additions & 4 deletions docs/specs/multi-device-sync.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Multi-Device Sync — opt-in, client-side-encrypted, bring-your-own bucket

**Status:** Design spec (issue #100, Spec 28 — `needs-design`, `size-xl`, wave-6). No code yet.
**Status:** Phases 1–2 shipped (issue #100, Spec 28 — `size-xl`, wave-6). Phase 1 (one-way encrypted backup, schema v028) and Phase 2 (two-way multi-device read, schema v029) are implemented under `stackunderflow/sync/` + `routes/sync.py`; Phases 3–4 remain design.
**Audience:** maintainer; anyone implementing cross-device sync.
**Scope:** aggregate one person's several machines (laptop + work box + dev container) into one analytics view, by pushing **client-side-encrypted aggregates** to the user's **own** S3-compatible bucket. Zero-knowledge: the bucket stores ciphertext only, no StackUnderflow-hosted service exists, and **raw transcripts never leave the machine**.
**Unblocks the issue's gate:** issue #100 stays `needs-design` "until `docs/specs/sync-protocol-v1.md` lands." This document is that spec (filename `docs/specs/multi-device-sync.md`).
Expand Down Expand Up @@ -309,11 +309,15 @@ Each phase is independently useful and shippable.

**Phase 0 — design.** This document. Satisfies the issue's `needs-design` gate.

**Phase 1 — MVP: one-way, encrypted backup-to-bucket.**
**Phase 1 — MVP: one-way, encrypted backup-to-bucket. (SHIPPED — schema v028.)**
`sync init`, `sync push`, `sync status`. Encrypt the Overview/Cost-core marts → the user's own prefix. No pull, no merge. Delivers an **off-site, zero-knowledge encrypted backup of your aggregates** and exercises the whole stack — keys, `age`, `ObjectStore`, canonical serialization, outbox, manifest commit — end to end at minimum risk. New module `stackunderflow/sync/`: `keys.py`, `cipher.py`, `bucket.py`, `serialize.py`, `runner.py`; the `sync` Click group beside `backup` in `cli.py`; the additive migration (`sync_identity`, `sync_outbox`).

**Phase 2 — two-way: multi-device read.**
`sync pull` + `sync/merge.py` union overlay + `sync_cursors` + `sync_remote_devices` + `<mart>_remote` tables + the `?scope=all-devices` read path and `GET /api/sync/status`. This is the issue's headline goal: laptop + work + dev-container in one analytics view.
**Phase 2 — two-way: multi-device read. (SHIPPED — schema v029.)**
`sync pull` + `sync/merge.py` union overlay + `sync_cursors` + `sync_remote_devices` + the five `<mart>_remote` landing tables + the `?scope=all-devices` read path and `routes/sync.py` (`GET /api/sync/status` + `GET /api/sync/overview`). This is the issue's headline goal: laptop + work + dev-container in one analytics view.

`sync pull` LISTs every *other* device's prefix (skipping our own), fetches + decrypts each `manifest.age`, enforces the monotonic-generation replay guard (§3.4), and downloads only shards whose content-hash moved since the last pull — **idempotent: an unchanged peer downloads nothing** (only the tiny per-device manifest, the commit point, is re-read). Each shard's plaintext hash is re-verified before it REPLACE-lands into `<mart>_remote` (month-scoped, so re-ingesting one month never wipes a device's others) and its `sync_cursors` row advances. Pull is strictly **read-only against the bucket** — it never PUTs to any prefix — and never writes `usage_events` / `price_book` / transcripts. `sync/merge.py` then overlays `local (JOIN projects for slug) UNION ALL <mart>_remote`, SUMming at the stable `(provider, slug, …)` grain, and dedups `session_mart` by the globally-unique `session_id` (deterministic local-then-lowest-device tiebreak) into a `merge_warnings` counter (§5.3).

**Read surface — default-off, byte-identical.** The merged view is opt-in behind an explicit **`?scope=all-devices`**; the default `this-device` scope runs no union at all, so the existing dashboard path is unchanged and off the mart `<100ms` fast-path. `GET /api/sync/overview?scope=all-devices` returns the merged totals / per-day trend / per-project / per-provider-day / per-device breakdown + `merge_warnings`; `GET /api/sync/status` reports local config, known peers, and whether cross-device data is available. CLI: `stackunderflow sync pull` (add `--json` for a scriptable envelope); the merged dashboard read is then live at `/api/sync/overview?scope=all-devices`.

**Phase 3 — daemon, pruning, hardening.**
`sync auto --enable` — a daemon-thread continuous push modeled on `etl/watcher.py` (`watchfiles`, debounce) under a single-instance lock modeled on `etl/lock.py`. Retention: prune shards older than `--keep-months` **after** merge confirmation; GC orphan shards not referenced by the current manifest. Optional opaque-manifest layout (§4.4).
Expand Down
5 changes: 3 additions & 2 deletions docs/specs/session-schema-v1.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Session Schema v1 — open exchange format for AI coding sessions

**Status:** v1 (pinned to `schema_version = 28`).
**Status:** v1 (pinned to `schema_version = 29`).
**Audience:** anyone writing a tool that wants to read from, or write to, the StackUnderflow store without reverse-engineering the SQL.
**Scope:** the local SQLite schema at `~/.stackunderflow/store.db`. This document is the source of truth for the on-disk shape; the migrations under `stackunderflow/store/migrations/` (`.sql` DDL and `.py` data migrations) are the reference implementation.

Expand All @@ -20,7 +20,7 @@ The schema described here is **additive-only**. Any future column requires a new

## Schema version

Pin to `schema_version = 28`. The current migration set is:
Pin to `schema_version = 29`. The current migration set is:

| version | file | what it adds |
|---|---|---|
Expand Down Expand Up @@ -51,6 +51,7 @@ Pin to `schema_version = 28`. The current migration set is:
| 26 | `v026_reasoning_tokens.sql` | adds `usage_events.reasoning_tokens` (reasoning/"thinking" attribution — an additive-metadata SUBSET of `output_tokens`, never priced; `DEFAULT 0`) |
| 27 | `v027_worktree_of.sql` | adds `projects.worktree_of` (nullable parent-project slug for git-worktree fragment projects; NULL = normal project — campaign #8) |
| 28 | `v028_sync_identity_outbox.sql` | `sync_identity` + `sync_outbox` (opt-in multi-device sync — device identity + per-shard push watermark; #100 Phase 1) |
| 29 | `v029_sync_pull_landing.sql` | `sync_cursors` + `sync_remote_devices` + five `<mart>_remote` landing tables (opt-in multi-device sync — pull watermarks + remote aggregate landing for the cross-device merge; #100 Phase 2) |

Version 15 was reserved during planning and never created — the sequence skips from 14 to 16 by design. The migration runner keys on the leading `vNNN`, so the gap is harmless.

Expand Down
Loading
Loading