Spec 25: fork mode — replay session up to message K, swap model, regenerate

## Goal
Build the "what-if" tree: replay a session up to message K, swap the model (or system prompt, or tools), regenerate from there, and persist the fork as a child session linked to its parent. This lets you answer questions like "would Sonnet have done this in fewer turns?" empirically.

## Why now
The comparative benchmark (Spec 26) needs this primitive. Separately, this is the killer "blackbox" feature: replay plus fork is what makes a flight recorder useful.

## Schema
**v021** — `session_forks`:

```sql
CREATE TABLE session_forks (
  fork_session_id TEXT PRIMARY KEY,    -- the new session id
  parent_session_id TEXT NOT NULL,
  forked_at_message_seq INTEGER NOT NULL,
  forked_at_ts TEXT NOT NULL,
  fork_reason TEXT NOT NULL,            -- 'manual' | 'benchmark' | 'what-if'
  swap_model TEXT,                      -- new model id (NULL = unchanged)
  swap_system_prompt TEXT,              -- new system prompt (NULL = unchanged)
  swap_tools_json TEXT,                 -- new tool list (NULL = unchanged)
  status TEXT NOT NULL,                 -- 'running' | 'complete' | 'failed' | 'cancelled'
  created_ts TEXT NOT NULL,
  completed_ts TEXT,
  raw_outcome_json TEXT
);
CREATE INDEX idx_sf_parent ON session_forks(parent_session_id);
```

Plus add `is_fork BOOLEAN` and `parent_session_id TEXT` to `sessions` (additive).
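
A minimal sketch of how the v021 migration could be applied. It assumes the store is SQLite (the TEXT timestamp columns suggest this, but the spec does not state it) and that a `sessions` table already exists. The helper name `apply_v021` is hypothetical; the project's real migration runner may look different.

```python
import sqlite3

# DDL copied from the schema above, plus the two additive columns on `sessions`.
V021_SQL = """
CREATE TABLE session_forks (
  fork_session_id TEXT PRIMARY KEY,
  parent_session_id TEXT NOT NULL,
  forked_at_message_seq INTEGER NOT NULL,
  forked_at_ts TEXT NOT NULL,
  fork_reason TEXT NOT NULL,
  swap_model TEXT,
  swap_system_prompt TEXT,
  swap_tools_json TEXT,
  status TEXT NOT NULL,
  created_ts TEXT NOT NULL,
  completed_ts TEXT,
  raw_outcome_json TEXT
);
CREATE INDEX idx_sf_parent ON session_forks(parent_session_id);
ALTER TABLE sessions ADD COLUMN is_fork BOOLEAN;
ALTER TABLE sessions ADD COLUMN parent_session_id TEXT;
"""


def apply_v021(conn: sqlite3.Connection) -> None:
    """Apply the v021 schema (hypothetical helper, assumes SQLite)."""
    conn.executescript(V021_SQL)
```

Both `ALTER TABLE` statements only add nullable columns, so existing `sessions` rows stay valid and read back `NULL` for the new fields.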

## User-visible surface
- **CLI**: `stackunderflow fork session <id> --at-message <seq> --swap-model <new> [--swap-system-prompt FILE] [--reason what-if] [--allow-cloud]`.
- **CLI**: `stackunderflow fork list <parent_session_id>`.
- **API**: `POST /api/playback/{id}/fork` with body `{at_message_seq, swap_model?, swap_system_prompt?, reason}` returns the new fork session id; `GET /api/playback/{id}/forks` lists children.
- **Meta-agent tool**: `fork_session(id, at_message_seq, swap_model)`.
- **UI**: extend Playback tab with a "Fork from here" button on each event.
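
The body of `POST /api/playback/{id}/fork` could be validated along these lines. This is a sketch only: `ForkRequest` and `parse_fork_request` are hypothetical names, and the rules (a non-negative integer seq, a reason drawn from the values in the schema comment) are assumptions, not requirements stated elsewhere in this spec.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Values taken from the fork_reason comment in the schema.
ALLOWED_REASONS = {"manual", "benchmark", "what-if"}


@dataclass(frozen=True)
class ForkRequest:
    at_message_seq: int
    reason: str
    swap_model: Optional[str] = None          # None = keep the parent's model
    swap_system_prompt: Optional[str] = None  # None = keep the parent's prompt


def parse_fork_request(body: Mapping[str, Any]) -> ForkRequest:
    """Validate a decoded JSON body and return a typed request."""
    seq = body.get("at_message_seq")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(seq, int) or isinstance(seq, bool) or seq < 0:
        raise ValueError("at_message_seq must be a non-negative integer")
    reason = body.get("reason")
    if reason not in ALLOWED_REASONS:
        raise ValueError(f"reason must be one of {sorted(ALLOWED_REASONS)}")
    return ForkRequest(
        at_message_seq=seq,
        reason=reason,
        swap_model=body.get("swap_model"),
        swap_system_prompt=body.get("swap_system_prompt"),
    )
```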

## Implementation plan
1. v021 migration.
2. New service `stackunderflow/services/session_fork.py`:
   - `fork(conn, parent_id, at_seq, *, swap_model, swap_system_prompt, swap_tools, reason)` — uses Spec 24 (`reconstruct_context_at`) to rebuild the model context, calls Ollama (or queues for cloud-API depending on swap_model), persists assistant turns into a new session.
   - Background-task runner (similar to backfill_jobs.py pattern) — fork can take minutes; don't block the API.
3. CLI + API + meta-agent.
4. UI button.
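
The core of step 2 might look roughly like the sketch below. Several things here are assumptions rather than part of the spec: the signature of `reconstruct_context_at` (Spec 24) is guessed as `(conn, session_id, seq)`, the model call is injected as a hypothetical `generate` callable so that no Ollama API is invented, and the generated turn is stored in `raw_outcome_json` as a stand-in for writing real messages into the new session. A real implementation would run this body inside the background-task runner rather than inline.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone
from typing import Any, Callable, Optional


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def fork(
    conn: sqlite3.Connection,
    parent_id: str,
    at_seq: int,
    *,
    swap_model: Optional[str] = None,
    swap_system_prompt: Optional[str] = None,
    swap_tools: Optional[list] = None,
    reason: str = "manual",
    # Assumed shape of the Spec 24 helper; injected to keep the sketch testable.
    reconstruct_context_at: Callable[[sqlite3.Connection, str, int], list],
    # Hypothetical model-call hook: (model, system_prompt, tools, context) -> turn.
    generate: Callable[..., Any],
) -> str:
    fork_id = uuid.uuid4().hex
    now = _now()
    tools_json = json.dumps(swap_tools) if swap_tools is not None else None
    # Record the fork as 'running' before any slow work starts.
    conn.execute(
        "INSERT INTO session_forks (fork_session_id, parent_session_id,"
        " forked_at_message_seq, forked_at_ts, fork_reason, swap_model,"
        " swap_system_prompt, swap_tools_json, status, created_ts)"
        " VALUES (?, ?, ?, ?, ?, ?, ?, ?, 'running', ?)",
        (fork_id, parent_id, at_seq, now, reason, swap_model,
         swap_system_prompt, tools_json, now),
    )
    conn.commit()
    try:
        context = reconstruct_context_at(conn, parent_id, at_seq)
        turn = generate(swap_model, swap_system_prompt, swap_tools, context)
        status, outcome = "complete", {"assistant_turns": [turn]}
    except Exception as exc:  # any failure marks the fork as failed
        status, outcome = "failed", {"error": str(exc)}
    conn.execute(
        "UPDATE session_forks SET status = ?, completed_ts = ?,"
        " raw_outcome_json = ? WHERE fork_session_id = ?",
        (status, _now(), json.dumps(outcome), fork_id),
    )
    conn.commit()
    return fork_id
```

Writing the `running` row first means an API handler can return the fork id immediately and let clients poll `status`.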

## Tests
- Simple fork: 3-message parent, fork at msg 2, swap model → assert child session exists with parent link, msg 1 + msg 2 copied verbatim, msg 3+ are new.
- Mocked LLM: assert the LLM was called with the reconstructed context (per Spec 24).
- Cancellation: fork running, user cancels → status='cancelled', no zombie data.
- Idempotency: forking the same parent at the same seq with the same swap_model returns existing fork (don't duplicate).
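
The idempotency check could be a lookup that runs before any new fork is created, sketched below. `find_existing_fork` is a hypothetical helper, SQLite is assumed, and treating `failed` and `cancelled` forks as not reusable is this sketch's assumption; the spec only says that an existing fork is returned.

```python
import sqlite3
from typing import Optional


def find_existing_fork(
    conn: sqlite3.Connection,
    parent_id: str,
    at_seq: int,
    swap_model: Optional[str],
) -> Optional[str]:
    """Return the id of a reusable fork for this (parent, seq, model), if any."""
    row = conn.execute(
        """
        SELECT fork_session_id FROM session_forks
        WHERE parent_session_id = ?
          AND forked_at_message_seq = ?
          AND swap_model IS ?              -- IS also matches NULL (model unchanged)
          AND status IN ('running', 'complete')
        ORDER BY created_ts
        LIMIT 1
        """,
        (parent_id, at_seq, swap_model),
    ).fetchone()
    return row[0] if row else None
```

Using `IS` rather than `=` matters because `swap_model` is `NULL` when the model is unchanged, and `NULL = NULL` never matches in SQL.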

## Hard parts
- LLM call orchestration. Forks against Ollama are simple (use the meta-agent route's machinery). Forks against cloud APIs need credentials + cost-cap (default refuse cloud; require explicit `--allow-cloud` + budget cap).
- Tool execution during fork. If the original session called `Bash`, what does the fork do? v1 answer: don't actually execute tools — record the model's tool-call intent only. v2 could execute in a sandboxed dir. Document this constraint loudly.
- Context cost. Reconstructing 200K tokens of context per fork costs real money when the fork targets a cloud API. Surface the estimated cost before forking; refuse if it exceeds a threshold.
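
The cloud opt-in and cost threshold described above could be combined into one pre-flight check, sketched here. All names (`ForkRefused`, `estimate_cost_usd`, `check_fork_allowed`) are hypothetical, the per-token price is a caller-supplied parameter rather than a real price, and the estimate covers input tokens only, so it is a lower bound on what a fork would actually cost.

```python
from typing import Optional


class ForkRefused(Exception):
    """Raised when a fork is not allowed to start."""


def estimate_cost_usd(context_tokens: int, usd_per_million_input_tokens: float) -> float:
    # Input-side cost only; generated tokens are not included in this sketch.
    return context_tokens / 1_000_000 * usd_per_million_input_tokens


def check_fork_allowed(
    *,
    is_cloud_model: bool,
    context_tokens: int,
    usd_per_million_input_tokens: float = 0.0,
    allow_cloud: bool = False,
    budget_cap_usd: Optional[float] = None,
) -> float:
    """Return the estimated cost in USD, or raise ForkRefused."""
    if not is_cloud_model:
        return 0.0  # local Ollama forks carry no API cost
    if not allow_cloud:
        raise ForkRefused("cloud forks require an explicit --allow-cloud")
    if budget_cap_usd is None:
        raise ForkRefused("cloud forks require a budget cap")
    cost = estimate_cost_usd(context_tokens, usd_per_million_input_tokens)
    if cost > budget_cap_usd:
        raise ForkRefused(f"estimated ${cost:.2f} exceeds cap ${budget_cap_usd:.2f}")
    return cost
```

Running the check before the `session_forks` row is inserted keeps refused forks out of the table entirely.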

## Out of scope
- Sandboxed tool execution during fork (defer).
- Cloud-API forks by default (opt-in only, with budget cap).
- "Fork tree" visualization (defer to v2).

## Dependencies
- **Blocked by**: Spec 24 (context-window replay).
- **Consumed by**: Spec 26 (comparative benchmark).

## Estimated effort
**Size XL** — single agent, ~3-4 hr. Background-job orchestration + LLM-loop machinery is the bulk.

## Hard rules
- DO NOT touch versions / CHANGELOG headings.
- Pre-assigned schema slot: **v021**.
- Branch: `feat/session-fork-mode` off main.
- Default to LOCAL Ollama only; cloud requires explicit opt-in flag.
