Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
60 changes: 60 additions & 0 deletions .codex/ledger/2026-05-06-MEMORY-sqlite-migration-drift.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
Goal (incl. success criteria):

- Fix global SQLite migration identity drift that prevents seamless `./bin/agh daemon stop/start`.
- Success: canonical migration registry preserves already-recorded versions 17-20, observed-history DB upgrades to v22, guardrail lesson/instructions land, focused tests and `make verify` pass or blockers are reported with evidence.

Constraints/Assumptions:

- Use root-cause fix only; do not weaken migration integrity checks and do not manually edit the live `~/.agh/agh.db`.
- Persist accepted Plan Mode plan under `.codex/plans/` before execution.
- Conversation in BR-PT; code/docs/artifacts in English.
- Use RTK for shell commands and avoid destructive git commands.

Key decisions:

- Restore registry order to observed DB history: v17 task orchestration profile, v18 review gate, v19 notification cursors, v20 bridge task subscriptions, v21 network conversation containers, v22 memory v2 events.
- No one-pass repair unless evidence appears for DBs created by the inverse broken order.
- Add durable guardrails in `docs/_memory/lessons`, root `AGENTS.md`/`CLAUDE.md`, and internal `AGENTS.md`/`CLAUDE.md`.

State:

- Registry, tests, lesson, and instruction guardrails are patched. Focused Go tests, isolated daemon restart proof, and full `make verify` passed.

Done:

- Confirmed live DB records v17-v20 as task/bridge migrations.
- Confirmed current code expects network migration at v17 and shifts v17-v20 to v18-v21.
- Accepted plan produced in chat.
- Persisted accepted plan in `.codex/plans/2026-05-06-sqlite-migration-append-only.md`.
- Restored `internal/store/globaldb.globalSchemaMigrations` append-only order: v17 task orchestration profile, v18 review gate, v19 notification cursors, v20 bridge subscriptions, v21 network conversations, v22 memory events.
- Added migration identity/order contract helpers and observed-history upgrade coverage for the real v17-v20 prefix.
- Added lesson `docs/_memory/lessons/L-021-schema-migration-identity-is-append-only.md` and updated lessons index.
- Added guardrails to root/internal `AGENTS.md` and `CLAUDE.md`.
- Ran `gofmt` on touched Go files.
- Focused verification passed: `go test ./internal/store ./internal/store/globaldb -count=1 -race`.
- Attempted `scripts/check-test-conventions.py`, but the helper script is absent in this repository.
- Built local `./bin/agh` with the fixed registry (`make build` exit 0; Vite chunk-size warning emitted).
- Isolated daemon proof passed against a SQLite backup of `/Users/pedronauck/.agh/agh.db`: `AGH_HOME=/tmp/agh-migration-proof.HvgC9J`, HTTP `127.0.0.1:63350`, socket `/tmp/agh-migration-proof.HvgC9J/daemon.sock`; first start appended v21 `rebuild_network_conversation_containers` and v22 `memv2_memory_events`, stop/start repeated without integrity mismatch.
- Full verification passed: `make verify` exit 0. Output included the existing Vite chunk-size warning and macOS linker warning from `golangci-lint`; no command failed.

Now:

- Prepare final report with verification evidence.

Next:

- None.

Open questions (UNCONFIRMED if needed):

- None.

Working set (files/ids/commands):

- `.codex/plans/2026-05-06-sqlite-migration-append-only.md`
- `.codex/ledger/2026-05-06-MEMORY-sqlite-migration-drift.md`
- `internal/store/globaldb/global_db.go`
- `internal/store/globaldb/global_db*_test.go`
- `docs/_memory/lessons/L-021-schema-migration-identity-is-append-only.md`
- `docs/_memory/lessons/README.md`
- `AGENTS.md`, `CLAUDE.md`, `internal/AGENTS.md`, `internal/CLAUDE.md`
50 changes: 50 additions & 0 deletions .codex/plans/2026-05-06-sqlite-migration-append-only.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
# Fix Global SQLite Migration Order Drift And Add Guardrails

## Summary

- Root cause confirmed: the daemon correctly refuses to start because `globalSchemaMigrations` changed the identity of an already-applied migration version.
- The observed `/Users/pedronauck/.agh/agh.db` records `17=add_task_orchestration_profile_schema`, `18=add_task_review_gate_schema`, `19=add_notification_cursors`, `20=add_bridge_task_subscriptions`.
- Current code incorrectly expects `17=rebuild_network_conversation_containers` and shifted the previously recorded task/bridge migrations to `18..21`.
- The fix restores append-only migration identity, keeps strict integrity mismatch failures, adds regression coverage for this exact history, and documents the rule in durable project memory plus active agent instructions.

## Key Changes

- Restore the canonical global migration order in `internal/store/globaldb/global_db.go`:
- `17 add_task_orchestration_profile_schema`
- `18 add_task_review_gate_schema`
- `19 add_notification_cursors`
- `20 add_bridge_task_subscriptions`
- `21 rebuild_network_conversation_containers`
- `22 memv2_memory_events`
- Update network conversation migration tests to use `networkConversationMigrationVersion = 21` and seed legacy network DBs from the corrected pre-network history.
- Add a regression test that seeds a DB matching the observed local history through migration `20`, with legacy `network_timeline_log.interaction_id`, then opens it through `OpenGlobalDB` and asserts no integrity mismatch, network migration v21, memory migration v22, intact task/bridge schema, and idempotent reopen.
- Add an append-only registry contract test for the known global migration sequence, emphasizing versions `17..22`.
- Preserve strict integrity behavior in `store.RunMigrations`; do not accept arbitrary mismatches or edit `schema_migrations` in place.
- Do not add one-pass repair unless real DBs are found with the broken inverse sequence.

## Documentation Guardrails

- Add `docs/_memory/lessons/L-021-schema-migration-identity-is-append-only.md`.
- Update `docs/_memory/lessons/README.md` with `L-021`.
- Update root `AGENTS.md` and `CLAUDE.md` under `### Schema Migrations` with the append-only registry rule.
- Update `internal/AGENTS.md` and `internal/CLAUDE.md` with an `internal/store` migration invariant.

## Public Interfaces / Data Contract

- No HTTP, UDS, CLI, OpenAPI, web, or config contract changes.
- The internal data contract is made explicit: global SQLite migration numbers, names, and checksums are immutable once applied anywhere meaningful.
- Fresh DB final schema remains the same. Existing DBs with the observed `17..20` history upgrade by applying only missing migrations `21` and `22`.

## Test Plan

- Run `go test ./internal/store ./internal/store/globaldb -count=1 -race`.
- Run an isolated daemon upgrade proof using a temp copy of `/Users/pedronauck/.agh/agh.db`.
- Verify lesson and instruction guardrails landed in the intended files.
- Run `make verify`.

## Assumptions

- The selected implementation scope is registry and tests, not generic migration-runner redesign.
- The observed local DB history is valid and must be preserved.
- The live `/Users/pedronauck/.agh/agh.db` will not be manually mutated during validation.
- Persistent artifacts are written in English.
Original file line number Diff line number Diff line change
@@ -0,0 +1,203 @@
# ACP SDK v0.6.3 to v0.12.2 Breaking-Change Audit

Task: provider-model-catalog Task 06

This audit was produced before migrating AGH code from `github.com/coder/acp-go-sdk` `v0.6.3` to `v0.12.2`.

## Sources Checked

- Current AGH usage under `internal/acp`, `internal/session`, and API contract conversion code.
- Local module cache for `github.com/coder/acp-go-sdk@v0.6.3`.
- Local module cache for `github.com/coder/acp-go-sdk@v0.12.2`.
- Zed ACP references under `.resources/zed/crates/agent_ui/src/config_options.rs`, `.resources/zed/crates/acp_thread/src/connection.rs`, and `.resources/zed/crates/agent_servers/src/acp.rs`.
- Harnss ACP config cache/set reference under `.resources/harnss/src/types/window.d.ts`.

## AGH ACP Symbols Currently Used

AGH production and test code currently uses these ACP SDK symbols directly:

- `acpsdk.Agent`, `acpsdk.AgentSideConnection`, `acpsdk.NewAgentSideConnection`
- `acpsdk.ClientCapabilities`, `acpsdk.AgentCapabilities`, `acpsdk.InitializeRequest`, `acpsdk.InitializeResponse`
- `acpsdk.FileSystemCapability`
- `acpsdk.NewSessionRequest`, `acpsdk.NewSessionResponse`
- `acpsdk.LoadSessionResponse`
- `acpsdk.SessionId`
- `acpsdk.SessionModeId`, `acpsdk.SessionModeState`, `acpsdk.AvailableSessionMode`
- `acpsdk.SessionModelState`, `acpsdk.ModelInfo`, `acpsdk.ModelId`
- `acpsdk.SetSessionModeRequest`, `acpsdk.SetSessionModeResponse`
- `acpsdk.SetSessionModelRequest`, `acpsdk.SetSessionModelResponse`
- `acpsdk.CancelNotification`
- `acpsdk.PromptRequest`, `acpsdk.PromptResponse`, `acpsdk.PromptStopReason`, `acpsdk.PromptResponseStopReasonEndTurn`
- `acpsdk.SessionNotification`, `acpsdk.SessionUpdate`
- `acpsdk.RequestError`
- `acpsdk.RequestPermissionRequest`, `acpsdk.RequestPermissionToolCall`
- `acpsdk.KillTerminalCommandRequest`, `acpsdk.KillTerminalCommandResponse`
- `acpsdk.ContentBlock`, `acpsdk.ContentBlockText`
- `acpsdk.AgentMethodSessionLoad`, `acpsdk.AgentMethodSessionPrompt`, `acpsdk.AgentMethodSessionCancel`
- `acpsdk.AgentMethodSessionSetMode`, `acpsdk.AgentMethodSessionSetModel`
- `acpsdk.ClientMethodSessionUpdate`

AGH also intentionally uses a local `wireLoadSessionRequest` wrapper because AGH needs to preserve the existing `additional_dirs` wire field used by its ACP integration.

## Changed Symbols and Required AGH Impact

### Session Creation and Loading

`NewSessionResponse` and `LoadSessionResponse` now include:

- `ConfigOptions []SessionConfigOption json:"configOptions,omitempty"`
- `Meta map[string]any json:"meta,omitempty"` instead of an unconstrained `any` meta field.

AGH impact:

- `captureCaps` must accept and store `ConfigOptions` from both `session/new` and `session/load`.
- Resume/load paths must capture config options exactly like new-session paths.
- Existing mode/model capture remains valid, but config options take precedence for active session model/reasoning controls.

### Config Option Types

`v0.12.2` introduces these wire types:

- `SessionConfigOption`
- `SessionConfigOptionSelect`
- `SessionConfigOptionBoolean`
- `SessionConfigId`
- `SessionConfigValueId`
- `SessionConfigSelectOptions`
- `SessionConfigSelectOptionsUngrouped`
- `SessionConfigSelectOptionsGrouped`
- `SessionConfigSelectOption`
- `SessionConfigOptionCategory`
- `SessionConfigOptionUpdate`

AGH impact:

- Add AGH-owned session config option state rather than leaking SDK union types into public session state.
- Convert only known option shapes needed by AGH consumers. Select options are required for model/reasoning changes; boolean options should be preserved for contract visibility but not used for model/reasoning selection.
- Flatten grouped and ungrouped select values into a stable payload while preserving each option ID, label, current value, and valid values.

### Config Option Mutation Method

`v0.12.2` adds:

- `AgentMethodSessionSetConfigOption = "session/set_config_option"`
- `SetSessionConfigOptionRequest`
- `SetSessionConfigOptionResponse`
- `SetSessionConfigOptionValueId`
- `SetSessionConfigOptionBoolean`
- `ClientSideConnection.SetSessionConfigOption`

AGH impact:

- Model changes must prefer `session/set_config_option` when a conservative model config option exists and contains the requested value.
- Reasoning effort must prefer `session/set_config_option` when a conservative reasoning config option exists and contains the requested value.
- AGH should update active session config option state from the response's returned `configOptions`.

### Legacy Model Mutation

`AgentMethodSessionSetModel` remains on the wire as `session/set_model`, but the request/response symbols changed:

- Removed or renamed: `SetSessionModelRequest`
- Removed or renamed: `SetSessionModelResponse`
- New names: `UnstableSetSessionModelRequest`, `UnstableSetSessionModelResponse`
- The agent-side interface moved this handler to `AgentExperimental.UnstableSetSessionModel`.

AGH impact:

- Production fallback must use `UnstableSetSessionModelRequest` and `UnstableSetSessionModelResponse`.
- ACP test helper agents must implement `UnstableSetSessionModel` instead of `SetSessionModel`.
- Fallback must only run when config options are absent and legacy `SessionModelState.AvailableModels` advertises the requested model.

### Session Update Notifications

`SessionUpdate` now includes:

- `ConfigOptionUpdate *SessionConfigOptionUpdate`
- `SessionInfoUpdate`
- Existing `AvailableCommandsUpdate`, `CurrentModeUpdate`, and `UsageUpdate` remain.

AGH impact:

- `config_option_update` notifications must mutate the active process/session config option state even when no prompt event is currently being emitted.
- Notification translation can continue producing a system event, but state capture must not depend on prompt activity.

### Session Model and Mode State

`SessionModeState`, `AvailableSessionMode`, `SessionModelState`, and `ModelInfo` remain structurally compatible for AGH's existing needs, with meta fields now represented as `map[string]any`.

AGH impact:

- Existing mode capture and `session/set_mode` behavior can remain.
- Existing model list capture can remain as the legacy fallback surface.

### Client Capabilities

`ClientCapabilities.Fs` remains, but the filesystem capability type was renamed:

- Removed or renamed: `FileSystemCapability`
- New name: `FileSystemCapabilities`

AGH impact:

- Initialization must construct `acpsdk.FileSystemCapabilities`.

### Prompt Metadata

`PromptRequest.Meta` now uses `map[string]any`.

AGH impact:

- AGH's structured `PromptMeta` must be converted to a map before being assigned to the SDK prompt request.
- The conversion must avoid ignored marshal/unmarshal errors.

### Terminal Kill Request Types

The client-side terminal kill symbols were renamed:

- Removed or renamed: `KillTerminalCommandRequest`
- Removed or renamed: `KillTerminalCommandResponse`
- New names: `KillTerminalRequest`, `KillTerminalResponse`

AGH impact:

- The AGH terminal kill handler signature and return values must use the new symbols.

### Permission Tool Call Type

`RequestPermissionRequest.ToolCall` is now the existing `ToolCallUpdate` type instead of a dedicated `RequestPermissionToolCall` type.

AGH impact:

- Permission display helpers must accept `acpsdk.ToolCallUpdate`.

### Cancellation and Request Errors

`CancelNotification` and `RequestError` remain available. `v0.12.2` adds `NewRequestCancelled` and uses a cancellation error code constant.

AGH impact:

- Existing request error handling should continue compiling unless code referenced removed field names.
- No AGH production code currently depends on renamed cancellation fields in the audited ACP paths.

## Conservative Matching Rules for AGH

Model config option detection:

- Prefer exact config option ID `model`.
- Allow only explicitly documented local fixture IDs after `model`; do not match display names or categories.
- Send only values present in the select option's advertised values.

Reasoning config option detection:

- Prefer exact config option ID `reasoning_effort`.
- Then allow exact `effort`.
- Send only values present in the select option's advertised values.
- Never derive valid reasoning levels from catalog metadata such as `supports_reasoning`.

## Required Verification After Migration

- Focused ACP tests for `session/new`, `session/load`, `config_option_update`, set-config model, set-config reasoning, legacy fallback, and no invented reasoning levels.
- Focused session manager tests for start option propagation and legacy resume behavior.
- Contract tests proving a named `SessionConfigOptionPayload` is exposed.
- `make codegen` and `make codegen-check` if the public API contract changes.
- Full `make verify` before completion and again after the local commit.
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
# BUG-NNN: <short-title>

**Severity:** Critical | High | Medium | Low
**Priority:** P0 | P1 | P2 | P3
**Type:** Functional | UI | Performance | Security | Data | Crash
**Status:** Open
**Discovered During:** TC-FUNC-NNN | TC-INT-NNN | TC-PERF-NNN | TC-SEC-NNN | TC-UI-NNN | TC-REG-NNN | TC-SCEN-NNN
**Reporter:** <agent / operator>
**Created:** YYYY-MM-DD
**Last Updated:** YYYY-MM-DD

## Environment

- **Build:** <commit SHA>
- **OS:** <darwin | linux>
- **Browser:** <chromium-version> (only for UI bugs)
- **URL / Endpoint:** <route or surface>
- **Bootstrap manifest:** <path inside qa/lab/>
- **Lab root / runtime home / ports:** <from manifest>
- **Live provider/LLM:** <provider-backed evidence, exact blocked boundary, or "not in scope">

## Summary

<One-paragraph observable failure description.>

## Behavioral Impact

- **Operator/User Goal:** <goal blocked or degraded>
- **Agent Behavior:** <expected vs actual agent behavior, if applicable>
- **Business Outcome:** <outcome blocked, degraded, or at risk>
- **Cross-Surface State:** <CLI/API/Web/runtime mismatch, or "none">

## Reproduction

```bash
# Verbatim commands (paths from bootstrap manifest)
```

Observed before fix:

- <observable result line by line>

## Expected

<Correct behavior, with TechSpec / ADR reference.>

## Root Cause

<Source-of-failure analysis, file paths and line numbers when known.>

## Fix

<Production change description.>

## Verification

- <narrow reproduction rerun command>
- <regression test added: file path>
- <broader gate rerun: `make verify`, focused suite, etc.>

## Impact

- **Users Affected:** <all / subset / specific role>
- **Frequency:** <always / sometimes / rarely>
- **Workaround:** <describe or "none">

## Related

- Test Case: <TC-ID>
- TechSpec Invariant: <SI-N>
- ADR: <ADR-NN>
- Logs / artifacts: <paths>
Loading
Loading