Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
The table of contents is too big for display.
Diff view
Diff view
  •  
  •  
  •  
62 changes: 62 additions & 0 deletions .codex/ledger/2026-05-06-MEMORY-memory-llm-mode.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
Goal (incl. success criteria):

- Verify and fix the config validation finding: `memory.controller.mode = "llm"` must be rejected when `memory.controller.llm.enabled = false`.
- Success means the issue is proven against current code, fixed minimally, covered by focused tests, and validated.

Constraints/Assumptions:

- Always prefix shell commands with `rtk`.
- Do not run destructive git commands (`git restore`, `git checkout`, `git reset`, `git clean`, `git rm`) without explicit user permission.
- Conversation in Brazilian Portuguese; code/tests/artifacts in English.
- Go runtime work requires `internal/CLAUDE.md`, `agh-code-guidelines`, `golang-pro`, and bug/test skills.
- `make verify` is the required final project gate unless explicitly blocked by unrelated pre-existing failures.

Key decisions:

- Treat the CodeRabbit item as a still-unverified bug until current `internal/config` code and tests prove it.
- Keep the fix localized to config validation unless investigation shows another validator path.
- The nearby `1656-1675` block is `MemoryControllerLLMConfig.Validate`, not a second `MemoryControllerConfig` validator; the guard belongs in the parent validator because it needs both `Mode` and `LLM.Enabled`.

State:

- Started 2026-05-06 in `/Users/pedronauck/Dev/compozy/agh3`.
- Loaded RTK, `internal/CLAUDE.md`, required Go/test/debugging/no-workaround skills, and relevant mem-v2 ledgers.
- Confirmed current code reproduces the issue: focused test returned `Validate() error = nil` for `mode="llm"` and `LLM.Enabled=false`.
- Complete; final gate passed after all code changes.

Done:

- Scanned existing ledger files for cross-agent awareness.
- Read relevant prior Memory v2 ledgers: `mem-v2`, `memory-contract`, `memv2-real-qa`.
- Added regression coverage in `internal/config/memory_v2_config_test.go`.
- Patched `MemoryControllerConfig.Validate` to reject `mode="llm"` when `LLM.Enabled=false`.
- Verified red/green behavior with the focused regression.
- `go test ./internal/config -count=1` passed: 581 tests.
- `go test -race ./internal/config -count=1` passed: 581 tests.
- `make lint` passed with `0 issues`.
- `git diff --check` passed.
- `make verify` passed with exit code 0.

Now:

- Final response.

Next:

- None.

Open questions (UNCONFIRMED if needed):

- None.

Working set (files/ids/commands):

- `.codex/ledger/2026-05-06-MEMORY-memory-llm-mode.md`
- `internal/config/config.go`
- `internal/config/memory_v2_config_test.go`
- `rtk go test ./internal/config -run TestMemoryV2ConfigValidationRejectsInvalidValues/Should_reject_llm_controller_mode_with_disabled_llm -count=1`
- `rtk go test ./internal/config -count=1`
- `rtk go test -race ./internal/config -count=1`
- `rtk make lint`
- `rtk git diff --check`
- `rtk make verify`
148 changes: 148 additions & 0 deletions .codex/ledger/2026-05-06-MEMORY-memv2-real-qa.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,148 @@
Goal (incl. success criteria):

- Run a release-grade real-scenario QA pass for `.compozy/tasks/mem-v2`.
- Use `real-scenario-qa` and the QA bootstrap flow, with real CLI/API/Web/runtime evidence.
- Per latest user instruction, cover both `consumer-saas-growth` and `northstar-pay` playbooks for a faithful QA spread.
- Latest user requirement: live LLM validation must use Claude Code as ACP because agent memory behavior is critical.
- Delegate the QA report authoring portion to Claude Code Opus through `compozy`, per user request.
- Success means the auditor verdict, runtime observations, provider boundary, and bootstrap continuation block are recorded; real AGH defects are fixed with verification if found.
- New user correction: do not stop at reporting QA errors; fix the `$qa-execution` errors/root-cause runtime defects and re-verify.

Constraints/Assumptions:

- Conversation in Brazilian Portuguese; artifacts in English.
- Always use `/Users/pedronauck/.codex/RTK.md`; shell commands must be prefixed with `rtk`.
- Never run destructive git commands (`git restore`, `git checkout`, `git reset`, `git clean`, `git rm`) without explicit user permission.
- After operator kickoff, do not send further prompts to agents under test; stalls are bugs.
- Claude Code ACP must be attempted as a real provider-backed path. If unavailable/uncredentialed, record exact boundary in `provider-attempt.json` and verdict cannot be live-provider PASS.
- `make verify` is the required final gate after code changes; if no code changes, record applicable QA verification instead.
- New independent QA pass should use a fresh lab unless continuing a known active manifest.

Key decisions:

- Treat this as a new independent QA pass for `mem-v2` unless a same-loop manifest is discovered.
- Use `compozy exec --ide claude --model opus --reasoning-effort xhigh` for the report portion, not for directing runtime agents under test.
- Selected playbook `consumer-saas-growth` because the selection matrix maps persistence/web read-model changes to that playbook.
- Added `northstar-pay` after user explicitly requested using both; real-scenario-qa runs exactly one playbook per lab, so this session will maintain two isolated labs and consolidate evidence.
- Claude Code ACP is now a blocking evidence lane for this QA run, not optional smoke coverage.
- Root-cause fix must keep `ClaimNextRun(criteria)` as the only autonomous claim primitive; scheduler/daemon may start or wake role sessions but must not claim runs directly.
- Pool-owned task runs must be claimable only by the matching agent name (`owner.kind=pool`, `owner.ref=<agent>`), and scheduler wake targeting must use the same owner match.
- Enqueued pool-owned runs need daemon role-session activation so queued work can progress even when only the lead/operator session exists.

State:

- Goal created for Codex Loop: `memv2-real-qa`.
- Both selected labs have been bootstrapped and launched against the real `claude` ACP provider.
- Consumer/Lumen completed a long multi-agent Claude ACP run with deliverables written to the lab workspace.
- Northstar completed only the lead Claude ACP prompt and explicitly reported that the other agents had not started; treat as a real QA failure/stall, not an operator prompt gap.
- QA artifacts, bug reports, strict audit reports, Claude Opus report, and verification reports have been written for both labs.
- Repository `make verify` passed with exit code 0 after the QA run.
- Follow-up fix work is active. Need investigate and fix runtime defects before claiming completion.
- Root-cause investigation confirmed two production gaps: `internal/store/globaldb.ClaimNextRun` ignores `tasks.owner_kind/owner_ref`, and `internal/scheduler.isEligibleSession` ignores task ownership.
- Root-cause investigation also confirmed that task-run enqueue has coordinator recovery but no role-agent activation path for `owner.kind=pool` tasks, so Northstar queued work never materializes matching sessions.
- Production fix and focused regressions are implemented locally.
- Full `rtk make verify` passed after the patch.
- Focused post-fix QA evidence is recorded in both QA labs. Northstar post-fix daemon recovery activated 9 live Claude ACP sessions, recovered queued pool-owned work into matching role sessions, and proved manual pool-run claim rejection. Lumen evidence was updated to show the pre-fix human/local-user pool-claim defect and cross-link the post-fix guard.
- Both isolated QA daemons were stopped after evidence capture.
- Strict scenario audits were refreshed and still FAIL for full playbook completion (`northstar-pay`: C10/C11/C16/C17; `consumer-saas-growth`: C17 review-cycle shortfall). These are residual full-scenario completion blockers, not failures of the focused owner/session regression fix.

Done:

- Read RTK instructions.
- Created this ledger.
- Loaded initial `real-scenario-qa`, `compozy`, `codex-loop`, and `agh-qa-bootstrap` skill instructions from visible prompt / disk.
- Loaded `qa-execution` and `qa-report` skill instructions.
- Read relevant prior ledgers: `mem-v2-tasks`, `qa-workflow-debug`, and `qa-skill-hardening`.
- Confirmed `.compozy/tasks/mem-v2` has 26 task files and existing QA plan/cases.
- Confirmed current `compozy` version is `0.2.1`; `compozy tasks validate --name mem-v2 --format json` passes with `scanned: 26`.
- Noted `_tasks.md` still marks tasks as `pending`, but current code/search surfaces show Memory v2 implementation symbols and OpenAPI routes exist.
- Validated `consumer-saas-growth` playbook successfully: 7 agents, 4 channels, 11 open tasks.
- Bootstrapped fresh lab with `REUSED_LAB=false`.
- Verified bootstrap has no `UNFILLED` charter placeholders, 11 open tasks, 7 agent files, playbook JSON, knowledge files, and scenario contract minimums.
- Confirmed local Claude Code is installed and authenticated; do not record private account details in artifacts.
- Read provider docs and L-016: Claude Code is built-in provider `claude`, harness `acp`, command `npx -y @agentclientprotocol/claude-agent-acp@latest`, auth mode `native_cli`, default model `claude-sonnet-4-6`, and `home_policy=operator` should preserve the operator HOME.
- Validated `northstar-pay` playbook successfully: 11 agents, 10 channels, 12 open tasks.
- Bootstrapped fresh `northstar-pay` lab with `REUSED_LAB=false`, no `UNFILLED` charter placeholders, 12 open tasks, 11 agent files, playbook JSON, knowledge files, and scenario contract minimums.
- Materialized executable AGH `AGENT.md` definitions for both labs under `.agh/agents/`, using provider `claude`, model `claude-sonnet-4-6`, and `approve-all` permissions.
- Registered both workspaces in their isolated AGH homes and verified agent lists.
- First Claude ACP session attempt exposed a real environment boundary: the daemon could not resolve `npx` because its inherited `PATH` lacked the mise Node install path.
- Restarted both daemons with the Node/mise bin directory in `PATH`; subsequent `agh session new` calls succeeded with provider `claude` and ACP capabilities.
- Consumer kickoff posted to session `sess-941ffe8afb6ed7b3` / ACP session `ffa05fc9-f126-4a1c-99a2-3e85c900e2d7`; output captured in `operator-kickoff.jsonl` with empty stderr.
- Consumer seeded 11 task runs and observed many live Claude ACP sessions, including role sessions plus memory extractor sessions; daemon status showed active sessions in the 7-10 range and total sessions growing to roughly 80.
- Northstar kickoff posted to session `sess-1f67dfd3489e27cf` / ACP session `7647f3c3-492f-4787-a3eb-d2452e10462d`; output captured in `operator-kickoff.jsonl` with empty stderr.
- Northstar seeded 12 task runs but observed only the lead session completing; the lead wrote one decision file and reported all other deliverables unstarted because other agents were not online.
- Recorded structured CLI/API/Web/runtime/provider evidence in both `journey-log.jsonl` files.
- Filled both `provider-attempt.json` files with live Claude ACP proof and the initial daemon PATH boundary.
- Ran `observe-runtime.py` in a short post-run window for both labs; both reported stalls.
- Filed Lumen `BUG-001-task-state-not-closed-after-agent-output.md`.
- Filed Northstar `BUG-001-multi-agent-startup-stalls-after-lead.md`.
- Ran `compozy exec --ide claude --model opus --reasoning-effort xhigh --format json` for the report-only analysis; saved output to `opus-report.md`.
- Ran `make verify`; exit code 0, log copied to both QA roots.
- Wrote `verification-report.md` for both labs.
- Ran strict `audit-qa-evidence.py` for both labs after report update.
- Read `qa-execution`, `systematic-debugging`, `no-workarounds`, Go/test skills, `internal/CLAUDE.md`, RTK, current ledger, and relevant scheduler/claim/execution-boundary ledgers.
- Inspected scheduler selection, global DB claim SQL, task manager claim/lease behavior, daemon task session bridge, hooks observer path, coordinator runtime observer pattern, and CLI/native claim surfaces.
- Added owner-aware `ClaimNextRun` SQL filtering, scheduler owner/session matching, daemon task role session activation/recovery, explicit pool-claim rejection, and task session bridge owner/channel binding.
- Added regressions in `internal/store/globaldb`, `internal/scheduler`, `internal/daemon`, and `internal/task`.
- Verification passed: focused tests for owner claim, owner wake, explicit pool claim rejection, task role activation, task session bridge; broader `go test ./internal/task ./internal/store/globaldb ./internal/scheduler ./internal/daemon -count=1`; scheduler integration `go test -tags integration ./internal/scheduler -count=1`; `git diff --check`.
- First `rtk make verify` found a lint `unparam` in `internal/daemon/task_role_runtime.go`; fixed by removing the unnecessary return value.
- `rtk make lint` passed after the lint fix.
- Second `rtk make verify` passed with exit code 0.
- Reused the Northstar lab to validate boot recovery from the exact queued-work failure state.
- Saved post-fix Northstar evidence: `post-fix-daemon-status.json`, `post-fix-session-list.json`, `post-fix-role-session-summary.json`, `post-fix-task-list.json`, `post-fix-task-summary.json`, `post-fix-run-owner-query.txt`, `post-fix-explicit-claim-guard.txt`, `post-fix-explicit-claim-state.txt`, and `post-fix-daemon-stop.json`.
- Confirmed `agh task run claim run-2a648c626fa633aa -o json` exits 1 with permission denied for a pool-owned frontend task and leaves the run queued.
- Saved Lumen pre-fix ownership evidence in `pre-fix-human-pool-claims.tsv`, showing five pool-owned runs claimed by `human/local-user` without `session_id`.
- Updated both bug reports and verification reports with confirmed root causes, fix details, focused post-fix evidence, refreshed strict audit outcomes, and stopped-daemon bootstrap status.
- Refreshed strict QA audits after report updates. Northstar still fails full scenario audit with 6 blockers; Lumen still fails full scenario audit with 1 C17 blocker.
- Stopped the Northstar post-fix daemon and the original Consumer/Lumen daemon; subsequent status checks reported `stopped`.

Now:

- Final sanity checks: git status/diff summary, test evidence recap, and no running QA daemons.

Next:

- Produce the final response with changed files, verification evidence, residual full-scenario audit blockers, and QA bootstrap paths.

Open questions (UNCONFIRMED if needed):

- None currently blocking.

Working set (files/ids/commands):

- `.codex/ledger/2026-05-06-MEMORY-memv2-real-qa.md`
- `.compozy/tasks/mem-v2/`
- `.compozy/tasks/mem-v2/qa/test-plans/memory-v2-test-plan.md`
- `.compozy/tasks/mem-v2/qa/test-cases/TC-SCEN-001.md`
- `.compozy/tasks/mem-v2/qa/test-cases/TC-SCEN-002.md`
- `.agents/skills/real-scenario-qa/`
- `.agents/skills/agh-qa-bootstrap/`
- `rtk compozy tasks validate --name mem-v2 --format json`
- Lab root: `/Users/pedronauck/dev/qa-labs/agh-memv2-real-qa-20260506-051216-337090-lab`
- Manifest: `/Users/pedronauck/dev/qa-labs/agh-memv2-real-qa-20260506-051216-337090-lab/qa-artifacts/qa/bootstrap-manifest.json`
- QA output: `/Users/pedronauck/dev/qa-labs/agh-memv2-real-qa-20260506-051216-337090-lab/qa-artifacts`
- Runtime home: `/var/folders/7x/xg204hnd04b81fczcxvjlhzr0000gn/T/aghqa-223e473847ee/runtime`
- Base URL: `http://127.0.0.1:50106`
- Playbook: `consumer-saas-growth`
- Consumer lead session: `sess-941ffe8afb6ed7b3`
- Consumer ACP session: `ffa05fc9-f126-4a1c-99a2-3e85c900e2d7`
- Consumer verification report: `/Users/pedronauck/dev/qa-labs/agh-memv2-real-qa-20260506-051216-337090-lab/qa-artifacts/qa/verification-report.md`
- Consumer audit report: `/Users/pedronauck/dev/qa-labs/agh-memv2-real-qa-20260506-051216-337090-lab/qa-artifacts/qa/qa-audit-report.json`
- Consumer audit verdict: FAIL, blocker `C17 review_cycles_min 2 < required 3`
- Consumer/Lumen post-fix focused regression verdict: PASS for confirmed owner/claim defect; strict full scenario audit still FAIL with blocker `C17 review_cycles_min 2 < required 3`.
- Northstar lab root: `/Users/pedronauck/dev/qa-labs/agh-memv2-real-qa-northstar-20260506-051548-387796-lab`
- Northstar manifest: `/Users/pedronauck/dev/qa-labs/agh-memv2-real-qa-northstar-20260506-051548-387796-lab/qa-artifacts/qa/bootstrap-manifest.json`
- Northstar QA output: `/Users/pedronauck/dev/qa-labs/agh-memv2-real-qa-northstar-20260506-051548-387796-lab/qa-artifacts`
- Northstar runtime home: `/var/folders/7x/xg204hnd04b81fczcxvjlhzr0000gn/T/aghqa-b4ca55d36007/runtime`
- Northstar base URL: `http://127.0.0.1:51661`
- Northstar playbook: `northstar-pay`
- Northstar lead session: `sess-1f67dfd3489e27cf`
- Northstar ACP session: `7647f3c3-492f-4787-a3eb-d2452e10462d`
- Northstar verification report: `/Users/pedronauck/dev/qa-labs/agh-memv2-real-qa-northstar-20260506-051548-387796-lab/qa-artifacts/qa/verification-report.md`
- Northstar audit report: `/Users/pedronauck/dev/qa-labs/agh-memv2-real-qa-northstar-20260506-051548-387796-lab/qa-artifacts/qa/qa-audit-report.json`
- Northstar audit verdict: FAIL, blockers `C10`, `C11`, `C16`, `C17`
- Northstar post-fix focused regression verdict: PASS for BUG-001; strict full scenario audit still FAIL with blockers `C10`, `C11`, `C16`, `C17`.
- Claude Opus report: `/Users/pedronauck/dev/qa-labs/agh-memv2-real-qa-20260506-051216-337090-lab/qa-artifacts/qa/opus-report.md`
- Make verify log: `/Users/pedronauck/dev/qa-labs/agh-memv2-real-qa-20260506-051216-337090-lab/qa-artifacts/qa/make-verify.log`
- Code under investigation/fix: `internal/store/globaldb/global_db_task_claim.go`, `internal/scheduler/scheduler.go`, `internal/daemon/task_runtime.go`, planned new daemon role-session runtime/tests.
- Added/modified code: `internal/daemon/task_role_runtime.go`, `internal/daemon/task_role_runtime_test.go`, `internal/daemon/boot.go`, `internal/daemon/task_runtime.go`, `internal/daemon/task_runtime_test.go`, `internal/scheduler/scheduler.go`, `internal/scheduler/scheduler_test.go`, `internal/store/globaldb/global_db_task_claim.go`, `internal/store/globaldb/global_db_task_claim_test.go`, `internal/task/manager.go`, `internal/task/manager_test.go`.
Loading
Loading