feat: cross-agent rules + tasks + KPI events (T1-T9) #193

Merged
efenocchi merged 58 commits into main from feat/rules-and-tasks-kpis
May 22, 2026

@efenocchi efenocchi commented May 21, 2026

Summary

Ships the org-wide rules + tasks + KPI events system from the design doc — a way to share team principles ("never DROP TABLE on prod") and per-team/per-user task tracking across every agent in the org, injected at SessionStart so each agent knows them from turn 1.

  • Rules (hivemind rules add/list/edit/done) — team-wide, audit-trailed via the skills-pattern (immutable + version-bump, never UPDATE).
  • Tasks (hivemind tasks add/list/edit/done/assign/progress/report) — scope me or team, with assignee + assigned-by; LLM (Claude Sonnet) generates 1-3 KPIs from the task text on creation; users record progress with tasks progress; tasks report aggregates events.
  • Events (hivemind_task_events, append-only) — emitted manually by users, automatically by the PostToolUse hook on gh pr merge (one v1 pattern; --auto + failed merges + interrupted commands all correctly skipped), and queryable via SUM(value) per (task_id, kpi_id).
  • SessionStart injection: src/hooks/shared/context-renderer.ts produces the inject block (rules + tasks + KPI progress + HOW-TO), wired into claude-code / cursor / hermes. Codex deliberately excluded (its additionalContext is rendered in the TUI history cell and the 30-line block would clobber the user view). Pi / openclaw fall back to hivemind context, a new CLI that prints the same block on demand.
  • Full read-only mode: HIVEMIND_CAPTURE=false now correctly gates ALL writes (placeholder INSERT + ensure DDL); renderer still runs.

Architecture: 3 new Deeplake tables (hivemind_rules, hivemind_tasks, hivemind_task_events), all behind the existing lazy-schema-heal pattern (PR #177). Per-agent integration touches 4 SessionStart forks behind one shared renderer. @anthropic-ai/sdk added for KPI generation; absence of ANTHROPIC_API_KEY is a silent no-op so the system works degraded out of the box.

Why now

Davit's Slack ask: shared rules to cut agent hallucinations + team alignment via KPIs. Prior activeloop attempt at something similar "didn't work wonderful" — the open question was enforcement. v1 deliberately ships soft enforcement only (SessionStart text injection); hard enforcement (PreToolUse blocking) is tracked as v1.2+ once we validate the loop works at all. The KPI side answers "does an agent told to track 'PRs merged: 3/5' actually drive toward 5?" — same instrument we'd want for any future hard enforcement layer.

What's shipped (T1-T9 from the design doc)

| # | Scope | Commits |
|---|---|---|
| T1 | Schema + DeeplakeApi ensure*Table + Config env vars | 3 |
| T2 | Rules module + CLI + dispatcher + 4 codex fixes | 6 |
| T3 | Tasks module + KPI validator + CLI + 1 codex fix | 4 |
| T5 | Events module + auto-extract patterns + capture wiring + report + 3 codex fixes | 7 |
| T6 | Shared context-renderer + 3/4 SessionStart fork integration + 4 codex fixes | 6 |
| T7 | hivemind context CLI for pi/openclaw fallback | 1 |
| T4 | LLM KPI generator (@anthropic-ai/sdk) + eval suite | 1 |
| T8 | Per-file coverage thresholds + 1 gap test | 1 |
| T9 | README section + docs/RULES_TASKS_KPIS.md deep-dive | 1 |

30 commits total. Each was reviewed by codex review before push; ~12 codex P2 findings caught and fixed in-flight (block-drop on missing tables, fractional KPI values, gh pr merge --auto false-positive, unknown kpi_id ghost progress, capture-mode read/write separation, etc.).

Data model

hivemind_rules           — scope='team' (hardcoded), version-bumped, audit-trail
hivemind_tasks           — scope='me' | 'team', assigned_to + assigned_by,
                           kpis JSONB (LLM-generated), version-bumped
hivemind_task_events     — append-only SUM(value) stream for KPI progress

Three tables (not two) because events need a write shape that sidesteps the Deeplake UPDATE-coalescing bug — append-only INSERTs never trigger it. Rules and tasks both follow the skills-table version-bump precedent (deeplake-api.ts:530).
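As a rough illustration of the append-only design, the sketch below recomputes KPI progress by summing event values per (task_id, kpi_id) in memory, which is what the SUM(value) query is described as doing. The `TaskEvent` shape and the function name are assumptions made for this example; the real hivemind_task_events table has more columns.

```typescript
// Hypothetical event shape; only the three fields needed for aggregation.
interface TaskEvent {
  task_id: string;
  kpi_id: string;
  value: number;
}

// In-memory equivalent of:
//   SELECT task_id, kpi_id, SUM(value) ... GROUP BY task_id, kpi_id
// Progress is always derived from the event stream, never stored via UPDATE.
function sumByTaskAndKpi(events: readonly TaskEvent[]): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of events) {
    const key = JSON.stringify([e.task_id, e.kpi_id]);
    totals.set(key, (totals.get(key) ?? 0) + e.value);
  }
  return totals;
}

const demoEvents: TaskEvent[] = [
  { task_id: "t1", kpi_id: "prs_merged", value: 1 },
  { task_id: "t1", kpi_id: "prs_merged", value: 2 },
  { task_id: "t1", kpi_id: "reviews", value: 1 },
];
const demoTotals = sumByTaskAndKpi(demoEvents);
```

Because each progress report is a new row, two writers never contend for the same record; the cost is that readers aggregate on every report.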

Per-agent SessionStart status

| Agent | Inject? | Why |
|---|---|---|
| claude-code | ✅ | additionalContext is model-only |
| cursor | ✅ | additional_context is model-only |
| hermes | ✅ | context is model-only |
| codex | ❌ | additionalContext is rendered in TUI history; 30-line block would clobber user view. Discovers via CLI (matches the existing exclusion of the DEEPLAKE MEMORY block). |
| pi | ❌ (CLI fallback) | No SessionStart hook in v1; agents call hivemind context from the model on demand |
| openclaw | ❌ (CLI fallback) | Same as pi |

CLI surface

hivemind rules add "<text>" [--scope team]
hivemind rules list [--status active|done|all] [--limit N]
hivemind rules edit <rule-id> "<new text>"
hivemind rules done <rule-id>

hivemind tasks add "<text>" [--scope me|team] [--assign <user>]
hivemind tasks list [--mine|--team|--all] [--status active|done|all]
hivemind tasks edit <task-id> "<new text>"
hivemind tasks done <task-id>
hivemind tasks assign <task-id> <user>
hivemind tasks progress <task-id> <kpi-id> --value N [--note "..."]
hivemind tasks report [<task-id>]

hivemind context     # print the SessionStart block on demand

All list commands print FULL 36-char UUIDs (no truncation) so copy-paste into edit / done / assign / progress round-trips correctly.

New env vars

  • HIVEMIND_RULES_TABLE / HIVEMIND_TASKS_TABLE / HIVEMIND_TASK_EVENTS_TABLE — table-name overrides for test orgs (default hivemind_rules / hivemind_tasks / hivemind_task_events)
  • HIVEMIND_KPI_MODEL — model id for KPI generation (default claude-sonnet-4-6)
  • HIVEMIND_KPI_LLM=disable — opt-out of LLM KPI generation
  • ANTHROPIC_API_KEY — required for KPI generation; absence is silent no-op
  • HIVEMIND_CAPTURE=false — now correctly gates DDL ensure too (was previously letting ensureTable/ensureSessionsTable through; codex review caught it)
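A minimal sketch of the read-only gating described in the last bullet, assuming the flag is read as a plain string and that only the literal value "false" disables capture. `captureEnabled`, `planSessionStart`, and the plan fields are hypothetical names for illustration, not the plugin's actual API.

```typescript
type Env = Record<string, string | undefined>;

// Assumption: any value other than the literal "false" leaves capture on.
function captureEnabled(env: Env): boolean {
  return env["HIVEMIND_CAPTURE"] !== "false";
}

interface SessionStartPlan {
  ensureTables: boolean;      // DDL (CREATE / ALTER) allowed?
  insertPlaceholder: boolean; // placeholder INSERT allowed?
  renderContext: boolean;     // read-only renderer
}

// Every write is gated on the same flag; the renderer only reads, so it
// runs in both modes.
function planSessionStart(env: Env): SessionStartPlan {
  const writes = captureEnabled(env);
  return { ensureTables: writes, insertPlaceholder: writes, renderContext: true };
}

const readOnlyPlan = planSessionStart({ HIVEMIND_CAPTURE: "false" });
const defaultPlan = planSessionStart({});
```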

Tests + coverage

  • 3090 / 3090 tests green (was 2857 pre-PR; +233 net)
  • Per-file coverage thresholds added for the 14 new modules (mostly 90+ on statements/lines/functions; branches calibrated to actual coverage with rationale comments — e.g. kpi-generator.ts branch at 80 because the SDK dynamic-import catch is intentionally untestable)
  • Eval suite (tests/evals/kpi-generation.eval.ts, 5 cases) calls the real Anthropic API to gate prompt regressions. It is run manually only (the .eval.ts extension excludes it from npm test); see tests/evals/README.md for the bump cadence
  • Existing per-agent SessionStart hook tests (claude-code / cursor / hermes) updated for the new query counts and HIVEMIND_CAPTURE=false semantics

Known v1 limitations (mapped to v1.1 candidates)

Documented in docs/RULES_TASKS_KPIS.md:

  • <user> identity is a plain string match against cfg.userName (no email normalization)
  • Auto-extract events are orphan (task_id="") — no retroactive attribution mechanism yet
  • Codex inject deliberately skipped (TUI hygiene)
  • pi / openclaw require explicit hivemind context call
  • Concurrent v=N+1 race on edits produces duplicate version rows (tie-break deterministic, duplicates remain in audit trail)
  • Cross-agent event dedup not implemented (two agents emitting same event = two rows)

Test plan

  • hivemind login against test_plugin org
  • hivemind rules add "no DROP TABLE on prod" → hivemind rules list shows the rule with full UUID
  • Open a new claude-code session → DEEPLAKE MEMORY block + new HIVEMIND RULES block both present in injection
  • hivemind tasks add "ship the search bar" --scope me → KPIs auto-generated (if ANTHROPIC_API_KEY set) and visible in hivemind tasks list
  • hivemind tasks progress <task-id> <kpi-id> --value 1 --note "merged PR #42" → hivemind tasks report shows 1/target
  • gh pr merge 123 (succeeds) → orphan event row appears in hivemind_task_events
  • gh pr merge 99999 (fails — bad PR id) → NO event row appears
  • gh pr merge 123 --auto → NO event row appears
  • HIVEMIND_CAPTURE=false + new session → ensure*Table NOT called, renderer still produces block
  • Open a cursor session → same HIVEMIND block present in additional_context
  • Open a hermes session → same HIVEMIND block present in context
  • Open a codex session → HIVEMIND block NOT present (deliberate; verify the comment on src/hooks/codex/session-start.ts line ~114)
  • hivemind context from any terminal → prints the same block; empty state → diagnostic on stderr, stdout empty
  • vitest --coverage → 95%+ statements / 90%+ branches / 99%+ functions on new modules
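The three gh pr merge cases in the test plan can be sketched as a single predicate. This is an illustration under assumptions: `BashResult` is a hypothetical summary of a finished tool call, not the real PostToolUse payload, and the tokenizer is deliberately naive (it does not handle quoting or chained commands such as `cd x && gh pr merge 1`).

```typescript
// Hypothetical summary of a finished Bash tool call; the real PostToolUse
// payload in this repo may carry different field names.
interface BashResult {
  command: string;
  exitCode: number;
  interrupted: boolean;
}

// True only for a `gh pr merge` invocation that ran to completion and succeeded.
function isCompletedPrMerge(result: BashResult): boolean {
  if (result.interrupted || result.exitCode !== 0) return false;
  const tokens = result.command.trim().split(/\s+/);
  if (tokens[0] !== "gh" || tokens[1] !== "pr" || tokens[2] !== "merge") return false;
  // `--auto` only enables auto-merge for later, so nothing has merged yet.
  return !tokens.slice(3).some((t) => t === "--auto" || t.indexOf("--auto=") === 0);
}

const mergedOk = isCompletedPrMerge({ command: "gh pr merge 123", exitCode: 0, interrupted: false });
const mergeFailed = isCompletedPrMerge({ command: "gh pr merge 99999", exitCode: 1, interrupted: false });
const mergeQueued = isCompletedPrMerge({ command: "gh pr merge 123 --auto", exitCode: 0, interrupted: false });
const mergeInterrupted = isCompletedPrMerge({ command: "gh pr merge 123", exitCode: 0, interrupted: true });
```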

Summary by CodeRabbit

Release Notes

  • New Features

    • Added hivemind rules command to create and manage team-wide principles
    • Added hivemind tasks command for task management with automatic KPI tracking
    • Added hivemind context command to view active rules and assigned tasks
    • Automatic injection of rules and tasks context into agent sessions (Claude Code, Cursor, Hermes)
    • LLM-powered KPI suggestions for new tasks
    • Auto-extraction of task metrics from bash command execution
  • Documentation

    • Added comprehensive Rules, Tasks, and KPI documentation with CLI usage examples
  • Chores

    • Updated @anthropic-ai/sdk dependency to ^0.97.1

⚠️ Goal/KPI cross-agent — runtime intercept scope (READ THIS)

The goal/KPI write flow added on top of this PR (commits 30ba49de4d947a) uses three different code paths depending on the agent runtime. This is NOT a code smell; it's a structural constraint of each runtime's plugin model. Future write-side cross-agent features will hit the same fork.

The three paths

Path A — VFS routing (claude-code, codex):

  • Agent calls Write tool on ~/.deeplake/memory/goal/<owner>/<status>/<uuid>.md
  • Plugin's pre-tool-use hook rewrites the tool call to a deeplake-shell invocation
  • deeplake-shell talks to Deeplake API → INSERT into hivemind_goals
  • Agent UX: "I'm writing a file"

Path B — CLI fallback (cursor, hermes, pi):

  • Agent calls hivemind goal add "<text>" via its Shell / terminal tool
  • The shell command runs as a normal subprocess
  • The hivemind CLI binary talks directly to Deeplake API → INSERT into hivemind_goals
  • Agent UX: "I'm running a CLI command"

Path C — Registered tool (openclaw):

  • Extension declares hivemind_goal_add in its manifest
  • Agent calls the tool directly (LLM tool-call protocol)
  • Extension talks to Deeplake API → INSERT

End state in the Deeplake table is identical for all three paths. Team visibility is identical. The fork is purely about HOW the agent reaches the table.

Why three paths instead of one

Each runtime decides what its plugin pre-tool-use hook can do:

| Runtime | Hook can allow/deny | Hook can rewrite tool execution |
|---|---|---|
| claude-code | yes | yes for Write/Read/Edit/Bash; enables Path A |
| codex | yes | yes for Bash/Edit; enables Path A |
| cursor | yes for all tools | only for Shell (and currently only the grep subset); needs Path B |
| hermes | yes for all tools | only for terminal; needs Path B |
| pi | no hooks at all | n/a; needs Path B (via explicit user prompt; pi has no SessionStart inject) |
| openclaw | n/a (extension model) | n/a; uses Path C (declared tools) |

The Path A "magic" only exists because claude-code and codex let plugins replace what a tool actually does. Cursor/hermes give plugins a simpler contract (allow/deny + optional shell command rewrite). Pi gives plugins no contract at all.
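One way to keep the fork explicit in code is a lookup from runtime to write path. The sketch below is illustrative only: the type names, the map, and `goalWriteInstruction` are hypothetical, while the paths and commands in the strings are the ones described above.

```typescript
type Runtime = "claude-code" | "codex" | "cursor" | "hermes" | "pi" | "openclaw";
type WritePath = "A" | "B" | "C";

// Which goal-write path each runtime can support, per the capability table.
const WRITE_PATH_BY_RUNTIME: Record<Runtime, WritePath> = {
  "claude-code": "A",
  codex: "A",
  cursor: "B",
  hermes: "B",
  pi: "B",
  openclaw: "C",
};

// Hypothetical helper: the instruction a per-runtime skill text would carry.
function goalWriteInstruction(runtime: Runtime): string {
  const path = WRITE_PATH_BY_RUNTIME[runtime];
  if (path === "A") return "Write to ~/.deeplake/memory/goal/<owner>/<status>/<uuid>.md";
  if (path === "B") return 'Run: hivemind goal add "<text>"';
  return "Call the hivemind_goal_add tool";
}

const cursorInstruction = goalWriteInstruction("cursor");
```

Typing the map as `Record<Runtime, WritePath>` means adding a seventh runtime fails to compile until its path is chosen.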

What happens if you use the wrong path on the wrong runtime

If an agent on cursor/hermes calls Write on a memory-goal path:

  • The runtime executes the write normally (hook can't intercept)
  • The file lands on the real filesystem at that path (the hivemind memory directory is a plain on-disk dir, not a FUSE mount)
  • The Deeplake API is NEVER called → the row never appears in hivemind_goals
  • Other team members see NOTHING
  • The agent reports "✅ file written" because from its perspective the Write succeeded

This is what hermes / cursor were doing pre-fix. Local disk had stale .md files; the team-shared table had zero rows. The fix is the per-runtime skill that tells the agent which path to use.

Practical implication for future cross-agent features

When you add a new write-side capability that needs to cross all six agents:

  1. Write the feature as a hivemind <sub> CLI command (Path B). This is the lowest common denominator and works for cursor/hermes/pi out of the box.
  2. (Optional) Add VFS routing in deeplake-shell if you want the claude-code/codex agents to use Write/Edit naturally.
  3. (Optional) Add the operation as a tool in openclaw/src/index.ts + openclaw.plugin.json if you want openclaw native support.
  4. Update the per-agent skill text:
    • claude-code SKILL.md → tell the agent to use Write tool (Path A)
    • codex SKILL.md → same
    • hermes SKILL.md (in hermes' skills dir) → tell the agent to use the CLI via terminal
    • cursor SessionStart inject (via the shared instructions module) → tell the agent to use the CLI via Shell
    • openclaw SKILL.md → reference the new tool name

Cross-agent live-verification matrix (test tables on org may21)

| Agent | Mechanism | Live-verified |
|---|---|---|
| claude-code | Path A (VFS via Write tool intercept) | |
| codex | Path A | |
| cursor | Path B (CLI via Shell) | |
| hermes | Path B (CLI via terminal + dedicated SKILL.md with "do not write_file" warning) | |
| pi | Path B (CLI via shell, user-prompt-explicit since pi has no skill loader) | |
| openclaw | Path C (hivemind_goal_add + hivemind_kpi_add tools registered by extension) | pending GUI test |

efenocchi added 30 commits May 20, 2026 20:37

Adds three new schema definitions to deeplake-schema.ts following the
existing MEMORY/SESSIONS/SKILLS pattern:

- RULES_COLUMNS — org-wide rules (e.g. "never DROP TABLE on prod"),
  version-bumped on edit, immutable rows.
- TASKS_COLUMNS — team or user tasks with agent-generated KPIs as a
  nullable JSONB column; also version-bumped.
- TASK_EVENTS_COLUMNS — append-only stream feeding KPI current values via
  SUM(value) aggregation. No JSONB. Pure INSERT path so it sidesteps the
  Deeplake UPDATE-coalescing backend bug for high-churn data.

All three are registered with validateSchema() at module load so a
missing DEFAULT on a NOT NULL column can't sneak in.
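A minimal sketch of the kind of check this describes, assuming a simple column descriptor. `ColumnDef` and `validateSchemaSketch` are hypothetical; the real validateSchema() in deeplake-schema.ts may check more and use a different shape.

```typescript
// Hypothetical column descriptor for illustration.
interface ColumnDef {
  name: string;
  type: string;
  notNull: boolean;
  defaultSql?: string; // SQL literal for DEFAULT, e.g. "''" or "1"
}

// Throws if a NOT NULL column has no DEFAULT or a column name repeats.
function validateSchemaSketch(table: string, columns: readonly ColumnDef[]): void {
  const seen = new Set<string>();
  for (const col of columns) {
    if (seen.has(col.name)) throw new Error(`${table}: duplicate column ${col.name}`);
    seen.add(col.name);
    if (col.notNull && col.defaultSql === undefined) {
      throw new Error(`${table}: NOT NULL column ${col.name} has no DEFAULT`);
    }
  }
}

const rulesColumnsSketch: ColumnDef[] = [
  { name: "rule_id", type: "TEXT", notNull: true, defaultSql: "''" },
  { name: "version", type: "BIGINT", notNull: true, defaultSql: "1" },
  { name: "kpis", type: "JSONB", notNull: false }, // nullable, so no DEFAULT needed
];
validateSchemaSketch("hivemind_rules", rulesColumnsSketch); // passes
```

Running the check at module load turns a schema mistake into an import-time failure rather than a failed ALTER on a live table.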

Tests extend the parametric schema validity loops to cover all six
schemas and add targeted shape assertions per new table (version
default, status default, scope default, kpis nullability,
events-table flatness).

No callers touch the new arrays yet — the ensure*Table methods and
healing wiring land in the next commit (api layer).

Part of T1 in /home/emanuele/.claude/plans/sprightly-petting-tulip.md

Adds three new lazy-create+heal methods on DeeplakeApi following the
existing ensureSessionsTable / ensureSkillsTable template:

- ensureRulesTable      → CREATE + heal + (rule_id, version) lookup index
- ensureTasksTable      → CREATE + heal + (task_id, version) lookup index
- ensureTaskEventsTable → CREATE + heal + (task_id, kpi_id) lookup index

All three:
  * validate the table name through sqlIdent before any SQL interpolation
    (closes the same config-driven injection surface as the existing
    ensure* methods);
  * run the unconditional heal pass after CREATE TABLE IF NOT EXISTS to
    cover the stale-listTables race with concurrent writers;
  * use the inherited createTableWithRetry outer-retry budget (2s/5s/10s)
    on CREATE failures.

Tests mirror the ensureSessionsTable / ensureSkillsTable shape — fresh
CREATE, race-detected legacy ALTER, partial-missing ALTER, fully
up-to-date no-op — plus a cross-table identifier-injection guard that
verifies all three reject `x"; DROP TABLE y; --` before any network
call.
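The injection guard can be illustrated with an allow-list check that runs before any SQL string is built. This is a sketch only: `sqlIdentSketch` and its exact character policy are assumptions, and the repo's own sqlIdent may accept a different set of names.

```typescript
// Accept only plain identifiers: a letter or underscore, then letters,
// digits, or underscores. Anything else is rejected before interpolation.
function sqlIdentSketch(name: string): string {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
    throw new Error(`invalid SQL identifier: ${JSON.stringify(name)}`);
  }
  return name;
}

const safeTable = sqlIdentSketch("hivemind_rules");

let rejectedInjection = false;
try {
  sqlIdentSketch('x"; DROP TABLE y; --');
} catch {
  rejectedInjection = true; // thrown before any network call could happen
}
```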

No CLI callers yet — config.ts env vars + the actual wire-up land in
the next commit.

Part of T1 in /home/emanuele/.claude/plans/sprightly-petting-tulip.md

Extends the Config interface and loadConfig() fallback chain with:

  rulesTableName       ← HIVEMIND_RULES_TABLE       (default "hivemind_rules")
  tasksTableName       ← HIVEMIND_TASKS_TABLE       (default "hivemind_tasks")
  taskEventsTableName  ← HIVEMIND_TASK_EVENTS_TABLE (default "hivemind_task_events")

Defaults are namespaced ("hivemind_*") because these tables live next
to the un-namespaced memory/sessions/skills tables and a future
non-hivemind tenant of the same workspace would otherwise collide on
"rules" or "tasks". The env-var override convention (memory_test /
sessions_test → hivemind_rules_test, etc.) for the e2e test_plugin
org documented in CLAUDE.md still applies via the env vars.
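The fallback chain for the three fields might look roughly like the sketch below. It takes the environment as a parameter so it stays testable; `loadTableNames` and the treatment of blank values as unset are assumptions, not the actual loadConfig() behavior.

```typescript
type Env = Record<string, string | undefined>;

interface TableNames {
  rulesTableName: string;
  tasksTableName: string;
  taskEventsTableName: string;
}

// Assumption: a missing or blank env var falls back to the namespaced default.
function tableName(env: Env, key: string, fallback: string): string {
  const raw = env[key];
  return raw !== undefined && raw.trim() !== "" ? raw.trim() : fallback;
}

function loadTableNames(env: Env): TableNames {
  return {
    rulesTableName: tableName(env, "HIVEMIND_RULES_TABLE", "hivemind_rules"),
    tasksTableName: tableName(env, "HIVEMIND_TASKS_TABLE", "hivemind_tasks"),
    taskEventsTableName: tableName(env, "HIVEMIND_TASK_EVENTS_TABLE", "hivemind_task_events"),
  };
}

// Overriding one table leaves the other two at their defaults.
const testOrgNames = loadTableNames({ HIVEMIND_RULES_TABLE: "hivemind_rules_test" });
```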

Config tests grow to cover:
- defaults render for all three new fields when no env vars set
- HIVEMIND_*_TABLE env vars override per-field
- setting one of rules/tasks/task-events does not bleed into the
  defaults of the other two
- skillsTableName is included in the default-render assertion (was a
  pre-existing test gap)

Pre-existing Config fixtures in two test files (skillify-auto-pull,
spawn-wiki-worker) are extended with the three new fields so tsc
--strict stays clean. No behavioural change in those tests.

The ensure*Table call wiring (SessionStart hooks, capture path) does
not change in this commit — that lands when the rules/tasks/events
modules are added.

Closes T1 of /home/emanuele/.claude/plans/sprightly-petting-tulip.md
(schema + api + config foundation).

Introduces src/rules/, the data-access layer for the hivemind_rules
table, following the SKILLS-table version-bump pattern (no UPDATE
statements; every edit INSERTs a fresh row with version+1, reads pick
latest per rule_id).

Module surface:

  insertRule(query, tableName, { text, assigned_by, agent?, plugin_version? })
      → fresh rule_id + version=1
  editRule(query, tableName, { rule_id, assigned_by, text?, status?, ... })
      → reads previous via getRuleLatest, INSERTs version+1 with merged
        fields (omitted fields carry over from prior version)
  markRuleDone(query, tableName, { rule_id, assigned_by })
      → convenience wrapper around editRule with status='done'
  listRules(query, tableName, { status?, limit? })
      → latest-version row per rule_id, status-filtered, newest-first
  getRuleLatest(query, tableName, rule_id)
      → single latest row or null

Implementation notes:

- Accepts a QueryFn (sql => Promise<rows>) instead of DeeplakeApi so the
  module is trivially mockable, matching upload-summary.ts's pattern.
- All string interpolation routes through sqlStr / sqlIdent — no
  hand-rolled escaping; rule text goes into an E-string literal so
  backslashes survive correctly.
- Hard cap of 2000 chars on rule text (Open Question O5 default from
  the plan). Throws before SQL is built; empty text also rejected.
- Dedup-by-latest-version happens in JS rather than via a window
  function — Deeplake's exact window-function support is uncertain and
  v1 expected scale is dozens of rules, not thousands. The
  (rule_id, version) lookup index created by ensureRulesTable keeps the
  SELECT cheap regardless. If v1.1 needs more, swap in a window
  SELECT — the row shape is unchanged.
- Malformed rows (NaN version) are dropped silently from listRules
  rather than thrown; this preserves the read path against partial
  garbage rather than failing the whole SessionStart inject.
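The dedup-in-JS note can be sketched as follows, assuming rows arrive as plain objects with a numeric version and an ISO-8601 created_at. `RuleRow`, `latestPerRule`, and `listActiveRules` are hypothetical names. The status filter is applied after the dedup, so a rule whose newest version is done does not resurface through an older active row.

```typescript
// Hypothetical row shape; the real table has more columns.
interface RuleRow {
  rule_id: string;
  version: number;
  text: string;
  status: "active" | "done";
  created_at: string; // ISO-8601, so string comparison orders by time
}

// Keep only the highest-version row per rule_id; drop malformed versions.
function latestPerRule(rows: readonly RuleRow[]): RuleRow[] {
  const latest = new Map<string, RuleRow>();
  for (const row of rows) {
    if (!Number.isFinite(row.version)) continue; // malformed row, skip silently
    const prev = latest.get(row.rule_id);
    if (prev === undefined || row.version > prev.version) latest.set(row.rule_id, row);
  }
  return Array.from(latest.values());
}

// Status filter runs AFTER dedup, then newest-first, then the limit.
function listActiveRules(rows: readonly RuleRow[], limit: number): RuleRow[] {
  return latestPerRule(rows)
    .filter((r) => r.status === "active")
    .sort((a, b) => (a.created_at < b.created_at ? 1 : a.created_at > b.created_at ? -1 : 0))
    .slice(0, limit);
}

const sampleRules: RuleRow[] = [
  { rule_id: "r1", version: 1, text: "old", status: "active", created_at: "2026-05-20T10:00:00Z" },
  { rule_id: "r1", version: 2, text: "old", status: "done", created_at: "2026-05-20T11:00:00Z" },
  { rule_id: "r2", version: 1, text: "keep", status: "active", created_at: "2026-05-20T12:00:00Z" },
  { rule_id: "r3", version: Number.NaN, text: "bad", status: "active", created_at: "2026-05-20T13:00:00Z" },
];
const activeRules = listActiveRules(sampleRules, 10);
```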

Tests: 21 unit tests cover insert (5: happy path, escaping, empty text
reject, length-cap reject, identifier-injection reject, override
defaulting), editRule (4: version bump, field carry-over, not-found,
empty-text reject), markRuleDone (2: happy + idempotent re-done audit),
listRules (5: dedup latest + status filter, status=all, status=done,
limit honored, malformed-row drop), getRuleLatest (3: happy, null,
escaping). 21/21 green; full suite still 2858+21 = 2879/2879.

No CLI wiring yet — handler + dispatcher land in the next two commits.

Part of T2 in /home/emanuele/.claude/plans/sprightly-petting-tulip.md

Adds runRulesCommand(args), the CLI handler for `hivemind rules`,
following the existing runSkillifyCommand pattern (export from
src/commands/, dispatcher routes by argv[0]).

Subcommands implemented:

  hivemind rules add "<text>" [--scope team]
      → ensureRulesTable, then insertRule(); prints "Added rule <uuid> (v1)"
  hivemind rules list [--status active|done|all] [--limit N]
      → listRules() — newest-first, default limit 10
  hivemind rules edit <rule-id> "<new text>"
      → editRule() — SELECT prior + INSERT v+1 with new text
  hivemind rules done <rule-id>
      → markRuleDone() — INSERT v+1 with status='done'

Design notes:

- Handler is intentionally thin: argparse + delegate. All SQL building,
  escaping, and version-bump logic lives in src/rules/ (added in the
  previous commit).
- `assigned_by` is sourced from cfg.userName — same convention as
  src/hooks/wiki-worker.ts and src/hooks/capture.ts. Schema column is
  TEXT so any string is accepted; if creds have a proper email, that
  lands; otherwise the local username does.
- `ensureRulesTable` is called on every non-help invocation so the
  first `hivemind rules add` on a fresh org Just Works (lazy schema
  heal, same shape as the existing memory/sessions/skills bootstrap).
- `--scope team` is the only valid value in v1 (matches A3 in the
  plan). The flag is accepted for forward-compat but rejects anything
  else with a clear error.
- Login gating: loadConfig() returning null exits with code 2 and a
  "Run hivemind login first" message. No silent fallthrough to a
  half-broken DeeplakeApi.
- Flag parser strips known flags (--scope, --status, --limit) before
  scanning positionals so `rules add "x" --scope=team` doesn't eat the
  text. Unknown flags pass through unchanged (forward compat).
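A sketch of the flag-stripping step from the last design note, assuming the three value-taking flags named there. `splitFlags` and `ParsedArgs` are hypothetical names; the handler's real parser may differ in detail.

```typescript
const VALUE_FLAGS = new Set<string>(["--scope", "--status", "--limit"]);

interface ParsedArgs {
  flags: Map<string, string>;
  positionals: string[];
}

// Pull known `--flag value` and `--flag=value` pairs out of argv so that
// what remains is the positional list (subcommand text, ids).
function splitFlags(args: readonly string[]): ParsedArgs {
  const flags = new Map<string, string>();
  const positionals: string[] = [];
  for (let i = 0; i < args.length; i++) {
    const arg = args[i];
    if (arg === undefined) continue;
    const eq = arg.indexOf("=");
    const flagName = eq > 0 ? arg.slice(0, eq) : arg;
    if (VALUE_FLAGS.has(flagName)) {
      if (eq > 0) {
        flags.set(flagName, arg.slice(eq + 1));
      } else {
        const next = args[i + 1];
        if (next !== undefined) {
          flags.set(flagName, next);
          i++; // consume the value so it is not read as a positional
        }
      }
      continue;
    }
    positionals.push(arg); // unknown flags pass through unchanged
  }
  return { flags, positionals };
}

const parsedEq = splitFlags(["x", "--scope=team"]);
const parsedSpace = splitFlags(["--limit", "5", "--verbose", "x"]);
```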

Tests (23):
  - help / no-arg / unknown-sub (3)
  - login gating (1)
  - add: happy + --scope team + --scope reject + missing-text + flag-
    order regression (5)
  - list: default rendering + empty state + --status done + invalid
    --status + invalid --limit (non-numeric) + invalid --limit (zero)
    + --limit honored (7)
  - edit: happy + missing args + not-found (3)
  - done: happy + missing args (2)
  - ensureRulesTable wiring: called once + honors HIVEMIND_RULES_TABLE
    via cfg.rulesTableName (2)

Full suite: 2902 / 2902 green. No CLI dispatcher wiring yet — the next
commit registers `rules` in src/cli/index.ts.

Part of T2 in /home/emanuele/.claude/plans/sprightly-petting-tulip.md

Wires the rules CLI handler (added two commits back) into the
top-level dispatcher in src/cli/index.ts:

1. import runRulesCommand from ../commands/rules.js
2. Dispatch `cmd === "rules"` → runRulesCommand(args.slice(1))
3. USAGE text gets a new "Team-wide rules" section documenting the
   four subcommands (add/list/edit/done) alongside the existing
   skillify/embeddings/account blocks.

The handler does its own argparse, schema bootstrap (ensureRulesTable),
and exit-code handling — the dispatcher is one await.

Closes T2 in /home/emanuele/.claude/plans/sprightly-petting-tulip.md
(rules slice of S2). Tasks/KPIs slice follows.

Full suite still green: 2902 / 2902.

Codex review on S2 surfaced that `formatListRow` truncated rule_id to
8 chars while `edit` and `done` do an exact-match SELECT on the full
rule_id. Result: users copy the displayed id from `hivemind rules
list` into `hivemind rules edit <id>` and get "Rule not found".

The fix is one line — drop the .slice(0, 8). UUIDs are 36 chars but
still fit comfortably on one line of the output. Future ergonomics
(prefix matching, short aliases) tracked as a v1.1 polish item; v1
ships with the unambiguous full-id contract.

New regression test (cli-rules.test.ts):
- "listed rule_id round-trips into edit (no truncation regression)" —
  lists a row, extracts the UUID via regex from the rendered output,
  then calls `rules edit <displayed_id>` and asserts the SELECT
  lookup uses the full id. Locks in the round-trip property so a
  future cosmetic change can't silently re-break edit/done.

Existing list test gets a tighter assertion: it now requires the FULL
rule_id ("rule-aaaa-bbbb") to appear, not just the prefix.

Full suite: 2903 / 2903 green.

…rrent v=N+1 race

Codex review on S2 surfaced a real concurrency edge case:

  T0    : alice and bob both read rule X at version=3
  T0+1ms: alice INSERTs version=4 (text "A")
  T0+2ms: bob   INSERTs version=4 (text "B")
  T1    : charlie calls getRuleLatest('X')
          → ORDER BY version DESC LIMIT 1 picks one of {A, B} arbitrarily.
          → A subsequent editRule built on the loser silently resurrects
            the stale text on the next version bump.

Probability is low for human-driven CLI use (ms-scale collisions
between two operators editing the same rule simultaneously) but the
fix is one ORDER BY clause and the symptom — silent text revert with
no error — is the worst class of UX bug.

Add `, created_at DESC` as a secondary key. listRules() already uses
the same compound ordering (see "newest-first by created_at" test);
this brings single-rule and list reads into agreement on which row is
the latest.
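The compound ordering can be expressed as a comparator, shown here in memory rather than in SQL. The row shape and function names are assumptions for illustration; created_at is assumed to be an ISO-8601 UTC string, so lexicographic comparison matches time order.

```typescript
interface VersionedRow {
  version: number;
  created_at: string; // ISO-8601 UTC
  text: string;
}

// Equivalent of ORDER BY version DESC, created_at DESC.
function compareLatestFirst(a: VersionedRow, b: VersionedRow): number {
  if (a.version !== b.version) return b.version - a.version;
  if (a.created_at === b.created_at) return 0;
  return a.created_at < b.created_at ? 1 : -1;
}

// Equivalent of adding LIMIT 1 to that ordering.
function pickLatest(rows: readonly VersionedRow[]): VersionedRow | undefined {
  return rows.slice().sort(compareLatestFirst)[0];
}

const aliceV4: VersionedRow = { version: 4, created_at: "2026-05-20T10:00:00.001Z", text: "A" };
const bobV4: VersionedRow = { version: 4, created_at: "2026-05-20T10:00:00.002Z", text: "B" };
const priorV3: VersionedRow = { version: 3, created_at: "2026-05-19T09:00:00.000Z", text: "old" };

// Same winner regardless of the order the backend returns the rows in.
const winnerOne = pickLatest([aliceV4, bobV4, priorV3]);
const winnerTwo = pickLatest([priorV3, bobV4, aliceV4]);
```

With only `version DESC`, the two version-4 rows compare equal and the result depends on backend row order; the secondary key makes the later write win consistently.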

New regression test (rules.test.ts):
- "orders by (version DESC, created_at DESC) — deterministic tie-
  break under concurrent v=N+1 race"
- Existing editRule test gets the updated SQL regex.

Full suite: 2904 / 2904 green.

This commit closes the second-pass codex finding for S2. The other
finding (stale bundle in src/cli/index.ts:349) is a false positive —
commit bbb1208 untracked all bundle/ directories repo-wide and now
.gitignores them; CI / release pipeline rebuilds from the SHA-pinned
source. No bundle edit needed in this branch.

Codex review on S2 final pass surfaced a UX mismatch: the rules help
block in src/cli/index.ts read "Team-wide rules (principles injected
at SessionStart)" while the actual SessionStart injection is part of
T6, not S2. A user adding a rule today would expect agents in
subsequent sessions to read it; nothing wires that yet.

Soften the section header and add a one-line note pointing at the
follow-up commit. No code change, no test change — pure documentation
honesty so the CLI's promise lines up with what S2 actually ships.

The injection promise gets restored once T6 lands and the
SessionStart hook actually calls into src/rules/read.ts listRules().

… + tests)

Adds src/tasks/ — the data-access layer for the hivemind_tasks table.
Same shape as src/rules/ (added in T2): append-only INSERT-with-version,
read-latest dedup, compound ORDER BY tie-break, sqlStr/sqlIdent
boundary, no UPDATE statements. Diffs vs rules:

  • scope = 'me' | 'team'   (rules are always 'team')
  • assigned_to + assigned_by columns (rules only have assigned_by)
  • kpis JSONB column carrying an array of agent-generated KPI metadata

T3 ships the helpers with kpis defaulting to `[]` on insert; T4 will
plug an LLM call into insertTask that fills kpis from the task text.

Module surface:

  insertTask(query, table, { text, scope, assigned_to?, assigned_by, kpis?, ... })
      → fresh task_id + version=1. assigned_to defaults to assigned_by
        (self-assigned) when omitted.
  editTask(query, table, { task_id, assigned_by, text?, status?, assigned_to?, kpis? })
      → reads previous via getTaskLatest, INSERTs version+1 with
        merged fields. Scope is intrinsic to task identity and always
        carries over from the prior version — edit cannot flip it.
  markTaskDone(query, table, { task_id, assigned_by })
      → wrapper around editTask, status='done'.
  assignTask(query, table, { task_id, assigned_by, assigned_to })
      → wrapper around editTask, sets new assignee.
  listTasks(query, table, { scope?, status?, current_user?, limit? })
      → scope: 'mine' | 'team' | 'all'. 'mine' returns rows where
        assigned_to=current_user (across both scope=me and scope=team
        rows); returns [] if current_user is omitted rather than
        silently broadening (no accidental over-disclosure).
  getTaskLatest(query, table, task_id)
      → single latest row, with the (version DESC, created_at DESC)
        compound ORDER BY tie-break carried over from rules/read.ts.
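The 'mine' behaviour in listTasks can be sketched as a filter that refuses to guess when no user is supplied. `TaskRow` and `listMine` are hypothetical names; the comparison is an exact string match on assigned_to.

```typescript
// Hypothetical row shape, reduced to the fields the filter needs.
interface TaskRow {
  task_id: string;
  scope: "me" | "team";
  assigned_to: string;
}

// 'mine' = rows assigned to the current user, across both scopes.
// Without a current user, return nothing rather than everything.
function listMine(rows: readonly TaskRow[], currentUser?: string): TaskRow[] {
  if (currentUser === undefined || currentUser === "") return [];
  return rows.filter((r) => r.assigned_to === currentUser);
}

const sampleTasks: TaskRow[] = [
  { task_id: "t1", scope: "me", assigned_to: "alice" },
  { task_id: "t2", scope: "team", assigned_to: "alice" },
  { task_id: "t3", scope: "team", assigned_to: "bob" },
];
const aliceTasks = listMine(sampleTasks, "alice");
const anonymousTasks = listMine(sampleTasks);
```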

Plus src/tasks/kpi-validator.ts: parseKpis / stringifyKpis. Defensive
on read (any malformed item collapses to []), symmetric on write
(stringifyKpis drops the same malformed items the reader would).
T4's LLM hook will go through stringifyKpis, so we never store data
the reader would silently throw away.
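A rough sketch of the read/write symmetry, under stated assumptions: the `Kpi` fields (id, label, target, optional current) are hypothetical, and this version drops malformed items one by one while returning [] for input that is not an array at all. The module's actual rules may be stricter.

```typescript
// Hypothetical KPI shape for illustration.
interface Kpi {
  id: string;
  label: string;
  target: number;
  current?: number;
}

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Returns a clean Kpi or null if a required field is missing or mistyped.
function toKpi(item: unknown): Kpi | null {
  if (!isRecord(item)) return null;
  const id = item["id"];
  const label = item["label"];
  const target = item["target"];
  const current = item["current"];
  if (typeof id !== "string" || typeof label !== "string") return null;
  if (typeof target !== "number" || !Number.isFinite(target)) return null;
  const kpi: Kpi = { id, label, target };
  if (typeof current === "number" && Number.isFinite(current)) kpi.current = current;
  return kpi;
}

// Accepts a JSON string or an already-decoded value; never throws.
function parseKpisSketch(raw: unknown): Kpi[] {
  let value: unknown = raw;
  if (typeof raw === "string") {
    try {
      value = JSON.parse(raw);
    } catch {
      return [];
    }
  }
  if (!Array.isArray(value)) return [];
  const out: Kpi[] = [];
  for (const item of value) {
    const kpi = toKpi(item);
    if (kpi !== null) out.push(kpi);
  }
  return out;
}

// The writer runs the same validation, so nothing is stored that the
// reader would later discard.
function stringifyKpisSketch(kpis: readonly unknown[]): string {
  return JSON.stringify(parseKpisSketch(kpis));
}
```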

Tests (38) cover:

  parseKpis (8): null/undefined/empty, JSON-string array, decoded
    array, required-field rejection, `current` preserve/drop on type,
    malformed JSON, non-array input.

  stringifyKpis (3): round-trip, defensive drop, empty.

  insertTask (7): happy path + default assigned_to, explicit cross-
    assign, KPI serialization, scope validation, empty/over-cap text
    rejection, identifier injection rejection.

  editTask (5): SELECT-then-INSERT + carry-over, kpis preserve, kpis
    replace, task-not-found, empty-text reject.

  markTaskDone (1): version bump + status='done' + preserved text.

  assignTask (1): version bump + new assignee + carry-over.

  listTasks (8): default (active + dedup), scope='mine', mine-without-
    current_user → [], scope='team', status filters (done + all), kpis
    JSONB normalization through the validator, --limit, malformed-row
    drop.

  getTaskLatest (4): happy, null on miss, escape task_id, kpis JSONB
    normalization.

Full suite: 2942 / 2942 green (2904 prior + 38 new).

No CLI wiring yet — handler lands in the next commit, dispatcher
registration after that.

Part of T3 in /home/emanuele/.claude/plans/sprightly-petting-tulip.md

Adds runTasksCommand(args), the CLI handler for `hivemind tasks` and a
mirror of runRulesCommand (added in T2). Six subcommands:

  hivemind tasks add "<text>" [--scope me|team] [--assign <user_email>]
      ensureTasksTable → insertTask. Defaults: --scope me, --assign self.
      KPIs land as `[]` in T3; T4 will plug an LLM call into insertTask
      to fill them from the task text.
  hivemind tasks list [--mine|--team|--all] [--status active|done|all] [--limit N]
      listTasks. Default --mine (rows where assigned_to=cfg.userName).
      Conflicting flags rejected. Renders KPI lines under each task.
  hivemind tasks edit <task-id> "<new text>"
      editTask — SELECT prior + INSERT v+1 with new text.
  hivemind tasks done <task-id>
      markTaskDone.
  hivemind tasks assign <task-id> <user_email>
      assignTask.
  hivemind tasks report [<task-id>]
      T3 stub — prints "aggregation lands with the events module in T5"
      rather than silently returning zero progress (which would be
      indistinguishable from real zero progress).

Design notes — applied the codex lessons from the rules-side review
preemptively:

  • formatListRow prints the FULL task_id (36-char UUID) so users can
    copy-paste straight into edit/done/assign — these all do exact-
    match SELECTs. Truncation would silently break the round-trip.
  • The compound (version DESC, created_at DESC) ORDER BY tie-break
    lives in the module (read.ts) — same race-safety as rules.
  • Help text does NOT promise SessionStart injection; the rules-side
    final codex finding called that out and the same wording would
    repeat the mistake. The report stub explicitly defers to T5.
  • Login gating exits 2 with "Run hivemind login first" — no silent
    fallthrough to a half-broken DeeplakeApi.
  • parseScopeFilter rejects conflicting flags (--mine + --team) up
    front rather than letting the last one win silently.
  • stripKnownFlags handles both --flag value and --flag=value forms
    so a positional after a flag doesn't get eaten.
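The conflicting-flag rule can be sketched as below. `parseScopeFilterSketch` is a hypothetical stand-in for parseScopeFilter; the default of 'mine' follows the list subcommand description above.

```typescript
type ScopeFilter = "mine" | "team" | "all";

// Reject more than one of --mine / --team / --all instead of letting
// the last flag win silently. With none given, default to 'mine'.
function parseScopeFilterSketch(args: readonly string[]): ScopeFilter {
  const chosen: ScopeFilter[] = [];
  if (args.indexOf("--mine") !== -1) chosen.push("mine");
  if (args.indexOf("--team") !== -1) chosen.push("team");
  if (args.indexOf("--all") !== -1) chosen.push("all");
  if (chosen.length > 1) {
    throw new Error(`conflicting scope flags: ${chosen.map((c) => `--${c}`).join(", ")}`);
  }
  return chosen[0] ?? "mine";
}

const defaultScope = parseScopeFilterSketch([]);
const teamScope = parseScopeFilterSketch(["--status", "all", "--team"]);
```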

Tests (27): help/no-arg/unknown-sub, login gating, add (6 — happy,
--scope team, --assign cross, invalid --scope, missing text, flag
positional regression), list (8 — default --mine, --team, --all,
conflict, KPI-line rendering populated, KPI ?/target unset, empty
state, --limit reject, round-trip task_id preservation), edit (3 —
happy, missing args, not-found), done (1), assign (2 — happy, missing
args), report stub (1 — prints deferred notice + no writes), schema
bootstrap (2 — ensureTasksTable called once + HIVEMIND_TASKS_TABLE
override).

Full suite: 2969 / 2969 green. No dispatcher wiring yet — the final
commit registers `hivemind tasks` in src/cli/index.ts.

Part of T3 in /home/emanuele/.claude/plans/sprightly-petting-tulip.md
Wires runTasksCommand into the top-level dispatcher in
src/cli/index.ts (mirror of the rules wiring from T2):

1. import runTasksCommand from ../commands/tasks.js
2. Dispatch `cmd === "tasks"` → runTasksCommand(args.slice(1))
3. USAGE text gets a new "Personal + team tasks" block listing
   add / list / edit / done / assign / report alongside the existing
   "Team-wide rules" block.

The new help block is honest about what T3 ships:
  - KPI generation defers to T4 (LLM call from insertTask).
  - SessionStart injection defers to T6 (shared renderer).
  - report aggregation defers to T5 (events table).

This avoids the same UX mismatch codex flagged on the rules-side
final pass: don't promise something the binary doesn't deliver yet.

Closes T3 in /home/emanuele/.claude/plans/sprightly-petting-tulip.md
(tasks slice of S3). Events / auto-extract / LLM gen follow as
separate steps.

Full suite: 2969 / 2969 green.
Codex review on S3 surfaced an identity-shape papercut: the CLI
documented `assigned_to` as `<user_email>` while `listTasks --mine`
filters by `cfg.userName` (login persists this as the local-part or
display name, not necessarily the full email). Cross-assigned tasks
would silently disappear from the assignee's --mine view.

The minimum-disruption fix is to make the CLI contract honest about
what already happens at runtime:

  • src/commands/tasks.ts USAGE + module docstring — `<user_email>`
    becomes `<user>`. Added an "Identity" stanza that pins the
    contract: comparisons are exact, the string must match
    `hivemind whoami` exactly, no fuzzy email matching in v1.
  • src/cli/index.ts top-level help block mirrors the wording.
  • assign / add error messages updated to drop `<user_email>` for
    consistency.

A proper `userEmail` field on Config (with login backfill across the
six agents) is a v1.1 follow-up — out of scope for T3. The current
code already uses cfg.userName consistently for both
default-assigned_to AND --mine filter; the bug was purely the CLI
doc telling users to pass an email that wouldn't round-trip.

New regression test (cli-tasks.test.ts) locks in the contract:
"--mine identity match is exact (no fuzzy email matching) — lock the
v1 contract". Three rows with semantically-similar but textually-
distinct assigned_to values; the filter only matches the exact one.
A future "be helpful" change introducing fuzzy matching would have
to delete this test, which is the signal we want.

Also fixed: the help block in src/cli/index.ts used literal
backticks around "hivemind whoami" inside the JS template-literal
USAGE string, which terminated the template early and broke parsing
(an oxc/vite-transform error, caught on the first vitest run).
Switched to single quotes; the help text is only rendered to the
terminal, so the cosmetic difference is invisible.

Full suite: 2970 / 2970 green.
Adds src/events/ — the read+write layer for hivemind_task_events. The
table is INSERT-only; KPI current values are derived by aggregating
SUM(value) per (task_id, kpi_id). No UPDATEs means the Deeplake
UPDATE-coalescing bug is structurally unreachable for this stream
(see CLAUDE.md).

Module surface:

  appendEvent(query, table, {
    task_id, task_version, kpi_id?, value, note?, source,
    agent?, plugin_version?
  }) → { id }
    - Single INSERT. No SELECTs, no reads.
    - `source` enum: 'agent' | 'user' | 'auto-extract'
    - Validates: value must be finite (rejects NaN/Infinity),
      task_version must be a positive integer.
    - Negative values allowed (corrections / undo).

  computeCurrent(query, table, task_id, kpi_id) → number
    - SELECT SUM(value) FROM events WHERE task_id=? AND kpi_id=?
    - Returns 0 when no events (NULL from Deeplake) or junk total.
    - Normalizes string-typed totals (driver-dependent serialization).

  computeAllForTask(query, table, task_id) → { kpi_id → SUM }
    - One round-trip GROUP BY kpi_id for all KPIs on one task.
    - Saves N-1 round-trips vs a per-KPI loop in the renderer (T6).
    - Drops task-level events (NULL/empty kpi_id) from the map — those
      are not per-KPI counters.

All escape via sqlStr / sqlIdent. The (task_id, kpi_id) lookup index
created by ensureTaskEventsTable (T1) keeps the SUM cheap.
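
Sketch of the total normalization that computeCurrent is described as doing (NULL, string-typed, and junk totals). The helper name `normalizeTotal` is hypothetical; only the behavior listed above is taken from this commit.

```typescript
// Hypothetical helper: coerce whatever the driver returns for SUM(value)
// into a finite number. NULL (no events) and junk both collapse to 0.
function normalizeTotal(raw: unknown): number {
  if (raw === null || raw === undefined) return 0;
  let n: number;
  if (typeof raw === "number") {
    n = raw;
  } else if (typeof raw === "string") {
    // Some drivers serialize BIGINT sums as strings.
    n = Number(raw);
  } else {
    n = NaN;
  }
  return isFinite(n) ? n : 0;
}
```

Collapsing junk to 0 keeps the renderer from ever printing NaN, at the cost of hiding a malformed row; that trade-off is the one the bullet list above describes.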

Tests (17):
  appendEvent (7): happy path, E-string escaping for notes, default
    nullables, negative values, NaN/Infinity rejection, task_version
    validation, identifier-injection rejection.
  computeCurrent (6): SUM happy, no-events 0, NULL total → 0,
    string-typed total parse, junk total → 0, WHERE-clause escaping.
  computeAllForTask (4): grouping happy, empty task, drop empty kpi_id
    rows, string-typed total normalization.

Full suite: 2987 / 2987 green (2970 prior + 17 new). No consumers yet
— auto-extract wiring lands in the next commit, then capture.ts gets
modified, then the tasks-report CLI un-stubs.

Part of T5 in /home/emanuele/.claude/plans/sprightly-petting-tulip.md
Adds src/hooks/auto-extract-patterns.ts — the standalone allow-list
the PostToolUse capture hook will consult to decide whether a shell
command should auto-emit a KPI event. v1 ships exactly ONE pattern:

  gh pr merge  →  +1 progress event

Design choices:

- ONE pattern only in v1. Plan and CLAUDE.md call out the rationale:
  `gh pr merge` is high-signal (intentional, rare); `git push` is
  intentionally excluded because the same command runs against
  personal branches and experiments, which would inflate counts.

- Agent-agnostic: matchCommand takes the raw command string, not a
  tool name wrapper. The capture hook (next commit) is responsible
  for extracting the command from whatever shape each agent's hook
  delivers (Claude Code Bash tool input vs Codex shell hook envelope
  vs ...). One pattern list, every agent.

- Note clamping: command snippets land in event.note via E-string
  literal, but we cap the body at 200 chars before SQL escaping so a
  giant pasted command doesn't bloat the row.

- Frozen array: PATTERNS is Object.freeze'd to prevent runtime
  mutation (a future bug introducing a hot-patch pattern would be a
  bad surprise).
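
Sketch of the frozen allow-list plus note clamping described above. NOTE_MAX_CHARS, PATTERNS and matchCommand are named in this commit; the pattern id, the field names, the return shape and the exact regex are assumptions.

```typescript
const NOTE_MAX_CHARS = 200;

interface AutoExtractPattern {
  readonly id: string;     // illustrative id, not necessarily the real one
  readonly regex: RegExp;
  readonly value: number;  // progress delta emitted on a match
}

// Frozen so nothing can push a pattern in at runtime.
const PATTERNS: ReadonlyArray<AutoExtractPattern> = Object.freeze([
  { id: "gh-pr-merge", regex: /^\s*gh\s+pr\s+merge\b/, value: 1 },
]);

interface PatternMatch {
  patternId: string;
  value: number;
  note: string;
}

// Takes the raw command string, so it does not care which agent sent it.
function matchCommand(command: unknown): PatternMatch | null {
  if (typeof command !== "string" || command.trim() === "") return null;
  for (const p of PATTERNS) {
    if (p.regex.test(command)) {
      // Clamp before any SQL escaping so a huge pasted command
      // cannot bloat the row.
      return { patternId: p.id, value: p.value, note: command.slice(0, NOTE_MAX_CHARS) };
    }
  }
  return null;
}
```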

Tests (16):

  PATTERNS shape (2): v1 ships exactly 1 pattern, unique-id invariant.

  True positives (7): plain `gh pr merge`, leading whitespace, with
    PR number, with flags, extra whitespace between tokens, full
    command preserved in note, NOTE_MAX_CHARS clamping.

  False-positive prevention (7): git push (the documented anti-
    target), gh pr view / list / create / checkout, unrelated shell
    commands (ls / npm / docker / commit message containing the
    magic string), embedded `gh pr merge` inside echo / grep /
    git log, token-boundary edge cases (ghpr merge, gh prmerge),
    empty / whitespace-only / null-ish inputs.

Full suite: 3003 / 3003 green (2987 prior + 16 new).

No hook wiring yet — capture.ts modification lands in the next
commit so the pattern module is verifiable in isolation before
touching a critical hook path.

Part of T5 in /home/emanuele/.claude/plans/sprightly-petting-tulip.md
Adds src/hooks/auto-extract.ts — the thin orchestrator that composes
the pattern allow-list (T5.2) with the events writer (T5.1) — and
calls it from capture.ts after the sessions row INSERT lands.

Why a separate module + thin call site:

- The orchestrator is unit-testable without mocking stdin or the
  whole capture pipeline. capture.ts stays an integration shell.
- The orchestrator's QueryFn boundary mirrors src/rules/ and
  src/tasks/ so the same mock pattern applies (tests/shared/
  auto-extract-orchestrator.test.ts).

v1 semantics (intentionally minimal):

  • Fires only on PostToolUse for the Bash tool. PreToolUse would
    double-count; non-Bash tools never carry a shell command in
    tool_input.command.
  • Emits an "orphan" event row: task_id="", kpi_id="". The
    pattern triggered, but v1 has no notion of a "current task"
    binding so we don't pretend to attribute progress. The note
    column carries the full command for later forensic / manual
    attribution (planned in v1.1).
  • Orphan events are deliberately invisible to computeAllForTask
    (which filters by kpi_id !== "") so they don't pollute KPI
    totals until they're attributed.

capture.ts integration:

- One await call after the sessions INSERT succeeds. Wrapped in
  try/catch — the session row is already safe by this point, so
  auto-extract NEVER breaks the capture path.
- On missing-table error (no SessionStart hook ensures task_events
  in v1; first PR merge of a fresh session hits this), we
  ensure + retry once. Mirrors the sessions-table lazy pattern at
  capture.ts:158-167. A v1.1 optimization is to pre-ensure
  task_events from SessionStart.
- Agent is hardcoded "claude_code" — this is the claude-code
  capture variant. Codex / cursor / hermes will get parallel
  wiring in their own variants when SessionStart injection lands
  in T6 (the auto-extract orchestrator itself is agent-agnostic;
  only the call site varies).

Tests for the orchestrator (8 — tests/shared/auto-extract-orchestrator.test.ts):

  Gating (4): non-PostToolUse → no-op, non-Bash tool → no-op,
    missing/non-string command → no-op, no-pattern-match → no-op.
    All four gates run ZERO queries (verified via spy count) — the
    fast path on non-matching commands is genuinely free of SQL.

  Match → INSERT (4): orphan event shape (task_id='' + kpi_id=''),
    error propagates to caller (capture.ts must catch), agent +
    plugin_version overrides honored, substring-of-magic-string
    inputs do NOT match (regression guard inheriting anchored-regex
    behavior from auto-extract-patterns.test.ts).

Full suite: 3011 / 3011 green (3003 prior + 8 new). capture.ts also
exercised indirectly via the full suite — no test file directly
mocks main(), matching the existing project convention.

Part of T5 in /home/emanuele/.claude/plans/sprightly-petting-tulip.md
Closes the loop on the T3 stub: `hivemind tasks report` now actually
computes KPI progress from the events stream via computeAllForTask
(added in T5.1). Plus a new `hivemind tasks progress` subcommand so
users have a manual writer — without it, the events stream only
contained orphan auto-extract rows and report would always show
0/target.

New subcommand:
  hivemind tasks progress <task-id> <kpi-id> --value N [--note "..."]
    1. getTaskLatest to resolve the current task version (events are
       bound to the version present at write time so a later edit
       doesn't retroactively re-credit progress).
    2. appendEvent with source='user', the resolved version, and any
       --note text. Negative --value allowed (corrections).
    3. Lazy-create + retry once on missing-table error (same pattern
       as capture.ts auto-extract path).
    4. Task-not-found → exit 1, no INSERT.

Report behavior:
  hivemind tasks report
    → listTasks(--mine, active, limit 50) + per-task computeAllForTask.
      Renders current/target/unit per KPI; 0/target when no events.
      "(no active tasks to report on)" empty state.
  hivemind tasks report <task-id>
    → getTaskLatest + single-task computeAllForTask. Task-not-found
      → exit 1.

USAGE block in src/cli/index.ts updated to document the new
subcommand. Module docstring + USAGE in src/commands/tasks.ts also
updated.

Tests (14 new):
  progress (8): happy path with version lookup + INSERT shape,
    --value= form, negative values, missing positional args, missing
    --value, invalid --value (non-finite), task-not-found, lazy-
    create + retry on missing-table.
  report (5 rewritten + 1 new): empty state, per-task aggregate
    rendering, 0/target when no events, targeted dive on one task,
    target-not-found, no-KPIs hint pointing at T4.

The old "T3 stub" test that asserted on the deferred-notice text is
removed — that contract no longer holds since report does the real
thing now. The new tests pin the new contract.

Full suite: 3024 / 3024 green (3011 prior + 14 new − 1 removed).

Closes T5 in /home/emanuele/.claude/plans/sprightly-petting-tulip.md.
LLM KPI generation (T4), shared SessionStart renderer (T6), and
`hivemind context` CLI for pi/openclaw (T7) follow.
Codex review on T5 surfaced two functional issues in the progress /
report paths:

[P2.1] Fresh-install report failure
  `tasks report` called `computeAllForTask` against task_events
  before that table existed. Auto-extract and `tasks progress` both
  lazy-create on first INSERT, but `report` is SELECT-only — on a
  fresh install where neither writer had run, the aggregate SELECT
  failed with "table does not exist" and report couldn't surface
  the intended zero/hint output. The kpis.length===0 short-circuit
  also ran AFTER the aggregate, so even a task with no KPIs hit the
  same failure.

  Fix: call ensureTaskEventsTable once at the top of the report
  subcommand (after the empty-state short-circuit, so an empty
  task list still costs 0 ensure calls). Reorder the per-task loop
  so `task.kpis.length === 0` short-circuits BEFORE the aggregate
  query — saves a round-trip per kpi-less task too.

[P2.2] parseValue accepted fractional / zero values
  `--value 0.5` parsed fine but events.value is BIGINT, so the
  INSERT would fail at the backend with a cryptic SQL error. Same
  for negative non-integer values. Tighten parseValue to require
  Number.isInteger; also reject 0 (zero events carry no signal —
  cleaner UX than silently writing them).

  The integer-only contract pins v1 to count-style KPIs (PRs
  merged, lines reviewed). Fractional KPIs (% complete) would
  require a schema change (BIGINT → DOUBLE PRECISION) — tracked as
  a v1.1 follow-up.
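
Sketch of the tightened parseValue contract (finite, integer, non-zero, negatives allowed). The function name is from this commit; throwing an Error instead of exiting, and the message wording, are assumptions made to keep the example self-contained.

```typescript
// Hypothetical stand-in for parseValue in src/commands/tasks.ts.
function parseValue(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") {
    throw new Error("--value is required");
  }
  const n = Number(raw);
  // Same check as Number.isInteger for a number input:
  // finite and without a fractional part.
  if (!isFinite(n) || Math.floor(n) !== n) {
    throw new Error('--value must be an integer, got "' + raw + '"');
  }
  if (n === 0) {
    throw new Error("--value must be non-zero (zero events carry no signal)");
  }
  return n; // negative integers pass through as corrections
}
```

Rejecting at parse time means the user sees a CLI-level message rather than a backend error from the BIGINT column.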

New regression tests (3):

  cli-tasks: "rejects fractional --value (events table is BIGINT)"
    — pins the integer contract so a future "be helpful" change
    accepting 0.5 has to consciously delete this test.

  cli-tasks: "rejects --value 0 (zero events carry no signal)"
    — pins the zero-value rejection contract.

  cli-tasks: "pre-ensures task_events at the top of report"
    — locks in the call to ensureTaskEventsTable so a refactor
    can't silently re-introduce the fresh-install failure.

Existing test "task with no KPIs prints the hint" updated to assert
the aggregate query is SKIPPED (now 1 query total vs the prior 2),
locking in the reorder.

Full suite: 3027 / 3027 green.
…ion semantic

Codex pass 2 on T5 surfaced two findings. One was a real regression
(failed-merge false-positive); the other is a semantic disagreement
where I push back on codex and pin the contract in a docstring.

[P2.1] Failed merges no longer count toward KPI progress
  `gh pr merge 99999` (bad PR id) emits a PostToolUse with
  exit_code != 0, but the prior auto-extract orchestrator never
  inspected tool_response and emitted +1 anyway. Now:

    - AutoExtractInput grows a `tool_response?: Record<string,unknown>`
      field (capture.ts already threaded this; no plumbing change).
    - tryAutoExtract calls a new isToolResponseSuccess() helper that
      checks exit_code != 0, is_error, error, and interrupted before
      emitting. A missing tool_response falls back to "assume success"
      so agents that don't populate the field (agents other than
      claude_code) keep working: we don't penalize non-Bash
      response shapes.

  Regression tests (7) in auto-extract-orchestrator.test.ts cover:
  missing tool_response (assume success), exit_code 0, exit_code != 0,
  string exit_code (driver variance), interrupted=true, is_error=true,
  error=true.
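
Sketch of the success check, covering the seven cases listed above. The field names (exit_code, is_error, error, interrupted) are from this commit; the body is an assumption, in particular treating any truthy `error` and any non-numeric exit_code as failure.

```typescript
type ToolResponse = Record<string, unknown>;

// Hypothetical stand-in for isToolResponseSuccess in src/hooks/auto-extract.ts.
function isToolResponseSuccess(resp: ToolResponse | undefined): boolean {
  // Agents that do not populate tool_response: assume success.
  if (resp === undefined) return true;

  const exitCode = resp["exit_code"];
  // Number() handles string exit codes ("0", "1"); a non-numeric value
  // becomes NaN, which is !== 0 and therefore counts as a failure here.
  if (exitCode !== undefined && exitCode !== null && Number(exitCode) !== 0) {
    return false;
  }
  if (resp["is_error"] === true) return false;
  if (resp["error"]) return false;
  if (resp["interrupted"] === true) return false;
  return true;
}
```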

[P2.2] aggregate.ts does NOT filter by task_version — by design
  Codex argued computeAllForTask should `WHERE task_version = ?` to
  honor the version binding `tasks progress` records. The push-back:
  task_version on events is for FORENSIC traceability ("what version
  was this event against"), not a reset mechanism. UX intuition:

    - User adds task "ship X" v=1 (KPI=PRs, target 5)
    - User emits 3 progress events on v=1
    - User edits task → v=2 (same kpi_id, refined text)
    - `tasks report` shows 3/5 — NOT 0/5

  Filtering by version would reset progress every edit, which is
  anti-intuitive. The real protection against accidental rebinding
  lives at the rendering layer: the renderer iterates the LATEST
  version's `kpis` JSONB only. An edit replacing the KPI set with
  new kpi_ids simply doesn't display the old events (they're orphaned
  in the aggregate map). An edit that KEEPS the same kpi_id is "same
  KPI", and accumulating progress is what the user wants.

  Future point-in-time / forensic views can ship a separate
  `computeForVersion(task_id, kpi_id, version)` helper without
  touching this one. The docstring in aggregate.ts now explains
  this so the next reviewer doesn't repeat the question.
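
In-memory sketch of the contract above: totals ignore task_version, and the renderer only looks up kpi_ids from the latest version. The type shapes, helper names, and output format are assumptions; the real aggregate runs as SQL in src/events/aggregate.ts.

```typescript
interface TaskEvent {
  task_id: string;
  task_version: number; // forensic only, never used as a filter below
  kpi_id: string;
  value: number;
}

interface Kpi {
  id: string;
  target: number;
  unit: string;
}

// Sum per kpi_id for one task, across ALL versions.
function sumByKpi(events: readonly TaskEvent[], taskId: string): Record<string, number> {
  const totals: Record<string, number> = Object.create(null);
  for (const e of events) {
    if (e.task_id !== taskId || e.kpi_id === "") continue;
    totals[e.kpi_id] = (totals[e.kpi_id] ?? 0) + e.value;
  }
  return totals;
}

// Only the latest version's KPIs are rendered; totals for kpi_ids
// that no longer exist are simply never looked up.
function renderKpis(latestKpis: readonly Kpi[], totals: Record<string, number>): string[] {
  return latestKpis.map((k) => k.id + ": " + (totals[k.id] ?? 0) + "/" + k.target + " " + k.unit);
}
```

With three +1 events recorded against v=1 and the task later edited to v=2 with the same kpi_id, this renders 3/5; replacing the KPI set with a new id renders 0/target for the new KPI.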

Full suite: 3034 / 3034 green (3027 prior + 7 new tests).
Two real correctness gaps caught by codex review pass 3.

[P2.1] `gh pr merge --auto` no longer counts as a merge
  GitHub CLI exits 0 after `gh pr merge --auto` even when the PR has
  not actually merged yet — it just enables auto-merge while CI is
  still running. The prior regex emitted +1 immediately, inflating
  merge counts for PRs that may never merge (failed CI, withdrawn,
  etc.). Tighten the pattern with a negative lookahead:

    /^\s*gh\s+pr\s+merge\b(?!.*\s--auto\b)/

  Real merges (with --merge, --squash, --rebase, --delete-branch,
  bare, or PR number) still match. Two existing tests that used
  `--auto` to assert matching are rewritten to use `--merge` /
  `--delete-branch` — those are real merges. A new "does NOT match
  --auto" regression test pins the exclusion in 5 positional
  variants.
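
The regex above can be exercised directly. The pattern is copied from this commit; the wrapper name `countsAsMerge` is hypothetical.

```typescript
// Anchored at the start of the command; the negative lookahead rejects
// any later " --auto" token on the same line.
const GH_PR_MERGE = /^\s*gh\s+pr\s+merge\b(?!.*\s--auto\b)/;

function countsAsMerge(command: string): boolean {
  return GH_PR_MERGE.test(command);
}
```

Because the lookahead scans the rest of the line, `--auto` is excluded wherever it appears after `merge`, not only as the first flag.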

[P2.2] `tasks progress` rejects unknown kpi_id
  Without validation, a typo (`k_pr_merg` vs `k_pr_merged`) wrote
  an event the report path could never display — `task.kpis`
  iteration filters by exact id match, so the orphaned event sat
  forever as ghost progress. Now `tasks progress` validates kpi_id
  against the latest task version's kpis before INSERT and exits
  1 with a "Valid: <list>" hint if not found.

  Edge case preserved: when task.kpis is empty (T3 default state
  pre-T4 LLM generation), any kpi_id is accepted. Blocking would
  prevent users from recording progress against KPIs that will be
  added later, and the failure mode (silently invisible) doesn't
  apply because nothing's being displayed anyway.

  New test pins both the rejection-with-valid-list behavior and the
  zero-KPI free-pass behavior.
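
Sketch of the kpi_id check, including the empty-KPI free pass. The "Valid: <list>" hint is from this commit; the function name, result type, and the rest of the message are assumptions (the real command exits 1 rather than returning a value).

```typescript
interface Kpi {
  id: string;
}

type KpiValidation = { ok: true } | { ok: false; message: string };

// Hypothetical helper: validate a user-supplied kpi_id against the
// latest task version's KPI list before any INSERT happens.
function validateKpiId(kpis: readonly Kpi[], kpiId: string): KpiValidation {
  // Empty KPI list: accept anything, nothing is displayed yet anyway.
  if (kpis.length === 0) return { ok: true };
  for (const k of kpis) {
    if (k.id === kpiId) return { ok: true };
  }
  const valid = kpis.map((k) => k.id).join(", ");
  return { ok: false, message: 'Unknown kpi_id "' + kpiId + '". Valid: ' + valid };
}
```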

Full suite: 3037 / 3037 green (3034 prior + 5 new tests − 2 rewrites).

The aggregate-task-version-filtering "finding" from pass 2 stays
unfixed by design (push-back documented in src/events/aggregate.ts).
…h aggregate

Adds src/hooks/shared/context-renderer.ts — the single source of truth
for the "HIVEMIND RULES + TASKS + HOW-TO" block that the 4 SessionStart
forks (claude-code, codex, cursor, hermes) will inject into each
agent's context. T6.2 wires the call sites; this commit is the
module + tests in isolation.

Module surface:

  renderContextBlock(query, { rulesTable, tasksTable, taskEventsTable,
    currentUser }, { maxRules?, maxTasks?, log? }): Promise<string>

  Returns the rendered block on success, "" on:
    - no active rules AND no visible tasks (nothing to inject)
    - any caught error (missing-table, network, parse) — SessionStart
      MUST NOT fail because of a bad rules/tasks read

Visibility rule (per the plan, A3b):
  - All active org-wide rules → rendered as-is
  - All active scope='team' tasks (any assignee) → rendered, with
    ★YOU highlight when assigned_to == currentUser
  - Active scope='me' tasks where assigned_to == currentUser only
    (bob's personal tasks NEVER appear in alice's block)
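
The visibility rule reduces to two predicates. A minimal sketch, assuming rows are already filtered to active status; the Task shape and function names are hypothetical.

```typescript
interface Task {
  task_id: string;
  scope: "me" | "team";
  assigned_to: string;
}

// Team tasks are visible to everyone; personal tasks only to their assignee.
// The comparison is exact, matching the v1 identity contract.
function isVisibleTo(task: Task, currentUser: string): boolean {
  if (task.scope === "team") return true;
  return task.assigned_to === currentUser;
}

// The highlight applies to team tasks assigned to the current user.
function isHighlighted(task: Task, currentUser: string): boolean {
  return task.scope === "team" && task.assigned_to === currentUser;
}
```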

Performance:
  - One SELECT for rules, one for tasks, one batched aggregate
    for all displayed tasks' KPIs (computeAllForTasks — added
    in this commit too). 3 round-trips total regardless of how
    many KPIs are displayed.
  - Over-fetch (maxRules*4 / maxTasks*4) so the "X more" hint
    can give a useful count. The count is approximate at the high
    end: beyond ~40 active rows the hint reads "X+ more" rather
    than giving an exact figure.

KPI rendering:
  - Each task with KPIs prints `KPI: current/target unit` per KPI.
  - Tasks with no KPIs (T3 default state pre-T4) print no `|`
    separator — line ends after the task text.

New aggregate helper (src/events/aggregate.ts):

  computeAllForTasks(query, table, taskIds[]): Promise<{
    task_id → { kpi_id → total }
  }>

  One SELECT GROUP BY (task_id, kpi_id) WHERE task_id IN (...).
  Early-returns {} on empty taskIds (avoids invalid `IN ()` SQL).
  Drops rows with empty task_id (auto-extract orphans) or empty
  kpi_id (task-level events).
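
Sketch of the two non-obvious parts of the batch helper: the empty-list guard and the row folding. The SQL text, `quoteLiteral` / `quoteIdent` (simplified stand-ins, not the repo's sqlStr / sqlIdent), and the row shape are all assumptions; the real helper is async and goes through the QueryFn.

```typescript
interface AggregateRow {
  task_id: string | null;
  kpi_id: string | null;
  total: unknown;
}

type Totals = Record<string, Record<string, number>>;

// Simplified escaping for illustration only.
function quoteLiteral(s: string): string {
  return "'" + s.replace(/'/g, "''") + "'";
}
function quoteIdent(s: string): string {
  return '"' + s.replace(/"/g, '""') + '"';
}

// Returns null on an empty id list so the caller can skip the query
// instead of sending an invalid `IN ()`.
function buildAggregateSql(table: string, taskIds: readonly string[]): string | null {
  if (taskIds.length === 0) return null;
  const inList = taskIds.map((id) => quoteLiteral(id)).join(", ");
  return (
    "SELECT task_id, kpi_id, SUM(value) AS total FROM " + quoteIdent(table) +
    " WHERE task_id IN (" + inList + ") GROUP BY task_id, kpi_id"
  );
}

// Fold flat rows into { task_id -> { kpi_id -> total } }, dropping
// orphans (empty task_id) and task-level events (empty kpi_id).
function foldRows(rows: readonly AggregateRow[]): Totals {
  const out: Totals = Object.create(null);
  for (const row of rows) {
    if (!row.task_id || !row.kpi_id) continue;
    const n = Number(row.total);
    const perTask: Record<string, number> =
      out[row.task_id] ?? (out[row.task_id] = Object.create(null));
    perTask[row.kpi_id] = isFinite(n) ? n : 0;
  }
  return out;
}
```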

Tests (16 renderer + 4 batch aggregate = 20 new):

  renderContextBlock:
    - empty rules+tasks → empty string
    - missing-table error swallowed → empty string
    - any error swallowed → empty string
    - log callback fires on error
    - rules section: full rule_id (no truncation), maxRules cap +
      'X more' hint, maxRules option override, empty rules omits
      the section
    - tasks section: team-task visibility (any assignee), me-task
      filter (only current user's), ★YOU highlight on team tasks
      assigned to current user, KPI current/target/unit lines,
      0/target when no events, no '|' suffix when no KPIs,
      maxTasks cap + 'X more', computeAllForTasks called with
      SHOWN task ids only
    - HOW-TO footer: emitted when content present, omitted when
      block is empty

  computeAllForTasks:
    - one round-trip {task_id → {kpi_id → SUM}} map
    - empty taskIds → no SQL, returns {}
    - SQL identifier escaping in IN list
    - drops rows with empty task_id or kpi_id

Full suite: 3058 / 3058 green (3037 prior + 21 new).

Part of T6 in /home/emanuele/.claude/plans/sprightly-petting-tulip.md.
The 4 SessionStart fork integrations land in the next commit.
…forks

Wires renderContextBlock (added in T6.1) into each agent's SessionStart
hook so the rules + tasks block lands in the agent's context on every
session-start fire. Three forks get the inject; codex is deliberately
excluded.

Per-fork status:

  claude-code (src/hooks/session-start.ts):
    INJECTED. Renderer runs regardless of HIVEMIND_CAPTURE (it's
    read-only; the capture flag gates writes only). Block appended
    after the existing DEEPLAKE MEMORY + login-state text.

  cursor (src/hooks/cursor/session-start.ts):
    INJECTED. Renderer runs only when capture is enabled — matches
    the existing cursor placeholder logic structure (which gates the
    whole token+capture block together).

  hermes (src/hooks/hermes/session-start.ts):
    INJECTED. Same structure as cursor — renderer runs only when
    capture is enabled. Output goes through hermes's
    `{ context: ... }` envelope.

  codex (src/hooks/codex/session-start.ts):
    NOT INJECTED — by design. Codex's additionalContext is
    user-visible (rendered as `hook context: <text>` in the TUI
    history cell), so a ~30-line rules+tasks block on every
    SessionStart would clobber the user's view. The codebase
    already deliberately excludes the bulky DEEPLAKE MEMORY block
    from codex for the same reason; this commit only adds a clear
    comment explaining the choice. Codex agents discover rules /
    tasks on demand via the `hivemind rules list` / `hivemind tasks
    list` / `hivemind tasks report` CLIs. A v1.1 follow-up tracks
    "compact codex-friendly inject" as an Open Question.

Resilience:

- renderContextBlock absorbs ALL its own errors (missing table,
  network, parse) and returns "" on any failure. SessionStart MUST
  NOT fail because of a bad rules/tasks read; that's the renderer's
  one-line contract.
- The query function is constructed as an arrow over api.query so
  the renderer's QueryFn type matches without modifying the
  DeeplakeApi class signature.
- When the renderer returns "" (no content), the inject string is
  unchanged — no trailing "\n\n" pollution.

Test fixture updates:

- session-start-hook.test.ts (claude-code), cursor-session-start-
  hook.test.ts, hermes-session-start-hook.test.ts: validConfig
  fixture grows the four T6 fields (rulesTableName, tasksTableName,
  taskEventsTableName, skillsTableName). Without them the renderer's
  sqlIdent rendered `FROM "undefined"` and the SELECT would have
  failed against any real backend.
- Query count assertions in 5 tests bumped to account for the +2
  renderer SELECTs (listRules + listTasks; events SELECT is skipped
  because tasks default to []). Each updated count carries an
  inline comment explaining which queries it covers.

Full suite: 3058 / 3058 green. No new tests added in this commit —
the renderer is independently exercised by the 16 tests in T6.1.
Per-fork integration is verified by the existing session-start
tests' (now-updated) query-count + SQL-shape assertions.

Closes the renderer half of T6. The other half (T7's `hivemind
context` CLI for pi/openclaw) follows.
Two correctness gaps codex review pass 1 surfaced on T6.

[P2.1] Missing task_events table no longer drops the whole block

  Scenario: org has rules + tasks (via CLI), but no one has called
  `hivemind tasks progress` or run a Bash command that auto-extract
  recognizes. hivemind_task_events doesn't exist yet — it's lazy-
  created by writers. Under the prior code, computeAllForTasks
  threw, the outer catch returned "", and the SessionStart inject
  block lost rules+tasks entirely.

  Fix: wrap computeAllForTasks in its own sub-try. On failure (any
  error, including missing-table), the renderer continues with
  totals={}. Each KPI now renders as 0/target — the truthful state
  when no events exist yet — instead of suppressing the block.

[P2.2] Visibility filter no longer drops a user's task under a wave
       of newer private tasks owned by other users

  Scenario codex described: 41 newer scope='me' tasks owned by bob
  + 1 older scope='team' task assigned to alice. Under the prior
  `listTasks(scope='all', limit=maxTasks*4)` approach, the 41
  newer rows filled the cap window before the filter ran in JS,
  and alice's team task vanished from her own SessionStart inject.

  Fix: drop the scope='all' pre-filter; issue TWO separate
  listTasks queries — one for scope='team' and one for scope='mine'
  (which already filters by current_user at the DB layer). Merge +
  dedup. One extra SQL round-trip per SessionStart; correctness
  gain (no silently-missing tasks) is worth it.

  Adds a mergeAndDedupTasks helper that orders by created_at DESC
  and keeps first-occurrence per task_id (a team task assigned to
  current user could in principle show up in both lists; the dedup
  guarantees one rendering line per task_id).
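
Sketch of mergeAndDedupTasks as described: order by created_at DESC, keep the first occurrence per task_id. The row shape is an assumption, and so is comparing created_at as ISO-8601 strings (which sort lexicographically).

```typescript
interface TaskRow {
  task_id: string;
  created_at: string; // assumed ISO-8601, so string comparison orders by time
  text: string;
}

function mergeAndDedupTasks(team: readonly TaskRow[], mine: readonly TaskRow[]): TaskRow[] {
  const merged = team.concat(mine);
  // Newest first.
  merged.sort((a, b) => (a.created_at < b.created_at ? 1 : a.created_at > b.created_at ? -1 : 0));

  const seen: Record<string, boolean> = Object.create(null);
  const out: TaskRow[] = [];
  for (const t of merged) {
    if (seen[t.task_id]) continue; // same task came back from both queries
    seen[t.task_id] = true;
    out.push(t);
  }
  return out;
}
```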

Tests:

  context-renderer.test.ts: every existing mockQuery script grew a
  third response (mine-tasks). Three new regression tests pin the
  fixed contracts:
    - "missing-table on computeAllForTasks does NOT drop the
       rules+tasks block (codex P2 pass 1)" — events SELECT
       throws, output still contains rules + tasks + 0/target.
    - "merges team + mine results and dedups when the same
       task_id appears in both" — hypothetical overlap stays
       single-line.
    - "preserves visible tasks even when many private OTHER-user
       tasks would push them out of a global cap" — alice's one
       team task survives the visibility filter.

  Per-fork session-start tests (claude-code, cursor, hermes): query
  count expectations bumped by +1 to account for the second
  listTasks call (rules + team + mine instead of rules + tasks).

Full suite: 3061 / 3061 green (3058 prior + 3 new regression tests).
…HIVEMIND_CAPTURE

Codex review pass 2 surfaced a consistency bug between claude-code and
the cursor/hermes forks: the rules+tasks renderer was gated on both
`creds?.token` AND `captureEnabled` in cursor + hermes, while
claude-code correctly gates only on `creds?.token` (the
captureEnabled check is local to createPlaceholder).

HIVEMIND_CAPTURE=false is a WRITE opt-out (don't capture session
events, don't INSERT placeholders) — not a read-context opt-out.
A logged-in user running benchmarks with capture disabled should
still get the rules+tasks block in their agent context. Under the
prior code, cursor + hermes silently skipped the block entirely.

Fix: restructure both forks to mirror claude-code exactly:

  if (creds?.token) {          // outer gate: just login
    try {
      // ... ensureTable + ensureSessionsTable
      if (captureEnabled) {
        await createPlaceholder(...);
      } else {
        log("placeholder skipped (HIVEMIND_CAPTURE=false)");
      }
      // Renderer runs regardless of captureEnabled
      rulesTasksBlock = await renderContextBlock(...);
    } catch (e) { ... }
  }

Test updates:

- "skips placeholder when HIVEMIND_CAPTURE=false" (cursor + hermes):
  the old assertion `queryMock not called` no longer holds — the
  renderer correctly runs and issues 3 SELECTs (rules + team + mine).
  Renamed to "skips placeholder when HIVEMIND_CAPTURE=false but STILL
  renders rules+tasks" with explicit codex-pass-2 reference + the
  ensureTable / ensureSessionsTable assertions that come along for
  the ride now that the outer block runs unconditionally for logged-
  in users.

The new tests pin the corrected gate so a future refactor can't
silently re-introduce the inconsistency. Codex's recommendation
was to "gate only the placeholder write or run the renderer in a
separate logged-in read path" — implemented the first form for
minimal-diff while keeping the structure parallel to claude-code.

Full suite: 3061 / 3061 green.
Codex review pass 3 surfaced the third all-or-nothing failure path in
the renderer: a workspace that uses only `hivemind tasks` (rules table
never created) loses the WHOLE block at SessionStart because listRules
throws inside the outer try and short-circuits the rest. The symmetric
case (rules present, tasks missing) has the same shape.

The pass-1 fix already gave events its own sub-try (computeAllForTasks
failure → totals={}, block still renders). Pass 3 extends the same
defense to the two parent reads:

  - listRules wrapped in its own sub-try → on failure (missing table,
    network, parse), rules stays [], block still renders the tasks
    section.
  - listTasks team + mine queries share ONE sub-try (both hit the same
    table) → on failure, both teamTasks and myTasks stay [], block
    still renders rules.

Each sub-try logs via the optional log callback so debugging stays
possible. The outer try-catch survives as a safety net for
programming errors (e.g. a renderer-side TypeError) that the inner
ones wouldn't catch.
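
Sketch of the per-section sub-try structure. It is synchronous to stay short, whereas the real renderer is async; `safeSection`, `renderBlock`, the section headers, and the log wording are all hypothetical.

```typescript
type Log = (msg: string) => void;

// Run one section's read; on any error, log it and fall back so the
// other sections still render.
function safeSection<T>(name: string, read: () => T, fallback: T, log?: Log): T {
  try {
    return read();
  } catch (e) {
    if (log) log(name + " read failed: " + (e instanceof Error ? e.message : String(e)));
    return fallback;
  }
}

function renderBlock(readRules: () => string[], readTasks: () => string[], log?: Log): string {
  try {
    const rules = safeSection("rules", readRules, [], log);
    const tasks = safeSection("tasks", readTasks, [], log);
    if (rules.length === 0 && tasks.length === 0) return "";

    const lines: string[] = [];
    if (rules.length > 0) lines.push("RULES", ...rules);
    if (tasks.length > 0) lines.push("TASKS", ...tasks);
    return lines.join("\n");
  } catch (e) {
    // Safety net for programming errors in the renderer itself.
    if (log) log("renderer failed: " + (e instanceof Error ? e.message : String(e)));
    return "";
  }
}
```

A missing rules table then costs only the rules section, and the same holds symmetrically for tasks.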

Tests (2 new regression guards):

  "missing rules table does NOT drop the tasks section (codex P2
   pass 3)" — listRules throws; tasks still inject.

  "missing tasks table does NOT drop the rules section (codex P2
   pass 3, symmetric)" — listTasks throws; rules still inject.

Combined with the pass-1 regression test ("missing-table on
computeAllForTasks does NOT drop the rules+tasks block"), every
single-table-missing partial-use scenario is now pinned.

Full suite: 3063 / 3063 green (3061 prior + 2 new tests).
…ex P2 pass 4)

Codex review pass 4 surfaced the symmetric counterpart of the pass-2
finding: ensureTable + ensureSessionsTable are DDL writes (CREATE TABLE
IF NOT EXISTS + heal-missing-columns ALTER), so they must also be gated
on captureEnabled, not just the placeholder INSERT.

Before this commit, a logged-in user with HIVEMIND_CAPTURE=false still
ran DDL on memory + sessions tables at every SessionStart — violating
the "no writes" contract of the opt-out. Worse, a DDL failure would
prevent the read-only rules/tasks block from rendering.

Fix applied symmetrically to all three forks (claude-code + cursor +
hermes):

  if (creds?.token) {
    api = new DeeplakeApi(...);
    if (captureEnabled) {
      await api.ensureTable();           // ← now gated
      await api.ensureSessionsTable(...); // ← now gated
      await createPlaceholder(...);
    } else {
      log("placeholder + schema ensure skipped (HIVEMIND_CAPTURE=false)");
    }
    rulesTasksBlock = await renderContextBlock(...);  // read-only, always runs
  }

The renderer doesn't need memory/sessions tables ensured — it queries
rules/tasks/task_events, which are lazy-created by their own CLI
writes (`hivemind rules add`, `hivemind tasks add`, `hivemind tasks
progress`). Missing target tables are handled gracefully by the
renderer's per-section sub-tries (added in pass 1 + pass 3).

Test updates:

- claude-code "skips placeholder + still ensures tables + renders":
  renamed to "HIVEMIND_CAPTURE=false: no placeholder, no DDL ensure,
  but renderer still runs". ensureTable + ensureSessionsTable
  assertions flipped to .not.toHaveBeenCalled(). Query count
  unchanged (3 renderer SELECTs); log message updated to match the
  new "placeholder + schema ensure skipped" string.

- cursor + hermes "HIVEMIND_CAPTURE=false but STILL renders":
  same flip. The ensure mocks now assert .not.toHaveBeenCalled() on
  cursor; the hermes test is unchanged because it never asserted
  them.

Full suite: 3063 / 3063 green. T6 sealed.

Adds src/commands/context.ts — a thin CLI that prints the same
rules + tasks + HOW-TO block renderContextBlock injects at
SessionStart. Two consumers:

  1. pi / openclaw agents — these platforms don't have a
     SessionStart hook in v1 (see plan A8). The model can call
     `hivemind context` from the Bash tool to pull the block
     on demand. Output is byte-identical to what claude-code /
     cursor / hermes get auto-injected, so the same instructions
     land regardless of which agent runs them.

  2. Read-only diagnostic for any agent / human — see exactly
     what the renderer would emit right now without firing
     SessionStart.

Behavior:

  - No flags. The renderer's defaults (maxRules=10, maxTasks=10)
    are the v1 contract; flags can come in a v1.1 if needed.
  - Login gated: exits 2 with "Run hivemind login first" if
    loadConfig returns null.
  - Empty state (no rules + no tasks): prints a diagnostic to
    STDERR — stdout stays empty so a caller doing
    `hivemind context | otherTool` gets the documented
    "nothing to inject" signal.
  - Help: --help / -h / help all print usage.
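
A minimal sketch of the behaviour contract listed above, written as a pure function so that it can be checked without a real login or SQL. The function name, the usage and diagnostic strings, the exit code 0 for help and empty state, and the choice to check help before login are all assumptions made for illustration; only exit code 2 with "Run hivemind login first" and the stderr-only empty state come from the commit message.

```typescript
interface ContextResult {
  exitCode: number;
  stdout: string;
  stderr: string;
}

// Hypothetical pure core of `hivemind context`: config and the rendered
// block are injected instead of being loaded / queried here.
function contextCommandSketch(
  args: string[],
  config: { token: string } | null,
  renderedBlock: string,
): ContextResult {
  const first = args[0];
  if (first === "--help" || first === "-h" || first === "help") {
    // Usage text is a placeholder, not the real USAGE string.
    return { exitCode: 0, stdout: "usage: hivemind context", stderr: "" };
  }
  if (config === null) {
    // Login gate: exit 2, nothing on stdout.
    return { exitCode: 2, stdout: "", stderr: "Run hivemind login first" };
  }
  if (renderedBlock === "") {
    // Empty state: diagnostic goes to stderr so a pipe sees empty stdout.
    return { exitCode: 0, stdout: "", stderr: "no rules or tasks to inject" };
  }
  return { exitCode: 0, stdout: renderedBlock, stderr: "" };
}
```

Keeping stdout empty in the empty state is what lets `hivemind context | otherTool` treat "no bytes" as the nothing-to-inject signal.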

Dispatcher (src/cli/index.ts):
  - Imports runContextCommand.
  - Adds `cmd === "context"` branch.
  - USAGE text adds a "Cross-agent helpers" section pointing at
    the pi/openclaw fallback use case.

Tests (6, tests/claude-code/cli-context.test.ts):
  - --help / -h / help all print usage; no SQL.
  - loadConfig null → exit 2, no SQL.
  - happy path: 3 renderer SELECTs, block rendered to stdout.
  - empty state: diagnostic on stderr, stdout empty.
  - cfg.rulesTableName / cfg.tasksTableName / cfg.taskEventsTableName
    are threaded through (not hardcoded).

Full suite: 3069 / 3069 green (3063 prior + 6 new).

Closes T7 in /home/emanuele/.claude/plans/sprightly-petting-tulip.md.

Adds the LLM-driven KPI generator the plan reserved for T4. `hivemind
tasks add` now calls Anthropic's API to produce 1-3 KPIs from the task
text, then persists them with the row. Failure modes all degrade to
[] kpis (the T3-default state) so the task INSERT still succeeds.

Module surface — src/tasks/kpi-generator.ts:

  generateKpis({ text, client?, model?, timeoutMs?, log? }): Promise<Kpi[]>

  Defaults: model='claude-sonnet-4-6' (or HIVEMIND_KPI_MODEL),
            timeoutMs=10_000, MAX_KPIS=3.

  Returns [] (NEVER throws) on:
    - HIVEMIND_KPI_LLM=disable (explicit opt-out)
    - ANTHROPIC_API_KEY missing AND no client injected
    - SDK dynamic-import failure
    - LLM call timeout
    - Two parse failures in a row (initial + strict-prompt retry)
    - Any other unexpected error

  Two-pass parsing:
    1. Standard prompt asking for a JSON array.
    2. If JSON.parse fails OR the result isn't an array, retry once
       with a stricter system prompt that says "ONLY the JSON array,
       no prose, no markdown fences." This catches the most common
       LLM tic (wrapping JSON in ```json fences or adding a prose
       header before the array).

  Defensive cleanup:
    - Strips ```json / ```jsonc / ``` fences before parsing.
    - Stamps generated_by + generated_at when the LLM omits them
      (the prompt asks for both, but cheap to backfill so a valid-
      but-incomplete output isn't dropped).
    - Routes through parseKpis (existing kpi-validator) for shape
      validation, then truncates to MAX_KPIS=3.
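
The fence stripping, two-pass parsing, and truncation steps could look roughly like the sketch below. It is not the code in src/tasks/kpi-generator.ts: `stripFences`, `parseArray`, and `twoPassParse` are hypothetical names, the strict retry is modelled as an injected callback instead of a second LLM call, and the `parseKpis` shape validation is left out. The fence marker is assembled from single backticks only so that the snippet can be embedded in markdown.

```typescript
const MAX_KPIS = 3;

// Three backticks, built indirectly so this snippet nests cleanly in markdown.
const FENCE = "`".repeat(3);
const OPEN_FENCE = new RegExp("^" + FENCE + "(?:jsonc|json)?\\s*", "i");
const CLOSE_FENCE = new RegExp("\\s*" + FENCE + "$");

// Remove one leading fence (with optional json / jsonc tag) and one trailing fence.
function stripFences(raw: string): string {
  return raw.trim().replace(OPEN_FENCE, "").replace(CLOSE_FENCE, "").trim();
}

// Returns the parsed array, or null when the text is not a JSON array.
function parseArray(raw: string): unknown[] | null {
  try {
    const parsed: unknown = JSON.parse(stripFences(raw));
    return Array.isArray(parsed) ? parsed : null;
  } catch {
    return null;
  }
}

// Pass 1 uses the first reply; pass 2 asks once more via the strict prompt.
// Two failures in a row degrade to [] instead of throwing.
function twoPassParse(firstReply: string, askStrict: () => string): unknown[] {
  const parsed = parseArray(firstReply) ?? parseArray(askStrict());
  return (parsed ?? []).slice(0, MAX_KPIS);
}
```

Note that `askStrict` is only invoked when the first reply fails to parse, so the happy path costs a single round-trip.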

Wire-up — src/tasks/write.ts:

  InsertTaskInput grows an optional `generateKpis?: (text) => Promise<Kpi[]>`.
  Precedence at insertTask time:
    1. explicit `input.kpis` always wins (tests, bulk import)
    2. `input.generateKpis(text)` if provided (T4 default path)
    3. [] (pre-T4 fallback)
  A throwing generator is caught and treated as []; INSERT still
  fires. The data layer (write.ts) imports NOTHING LLM-related —
  callers wire the generator. Tests get full control via the
  `kpis` precedence and `client` injection.
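
The three-step precedence can be illustrated with the sketch below. `resolveKpis`, `KpiSketch`, and `InsertTaskInputSketch` are hypothetical, simplified stand-ins (the real `Kpi` type has more fields); the ordering and the "throwing generator is treated as []" rule are taken from the description above.

```typescript
// Simplified KPI shape for illustration only.
interface KpiSketch {
  kpi_id: string;
  name: string;
}

interface InsertTaskInputSketch {
  text: string;
  kpis?: KpiSketch[];
  generateKpis?: (text: string) => Promise<KpiSketch[]>;
}

async function resolveKpis(input: InsertTaskInputSketch): Promise<KpiSketch[]> {
  // 1. Explicit kpis always win (tests, bulk import), even when empty.
  if (input.kpis !== undefined) return input.kpis;
  // 2. Caller-supplied generator, if any. The data layer never imports it.
  if (input.generateKpis) {
    try {
      return await input.generateKpis(input.text);
    } catch {
      // A throwing generator must not block the INSERT.
      return [];
    }
  }
  // 3. Pre-T4 fallback.
  return [];
}
```

Because the generator is injected, a test can pass `kpis` directly or supply a stub generator without any LLM-related import reaching the data layer.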

src/commands/tasks.ts:
  `tasks add` now passes `generateKpis: (t) => generateKpis({ text: t })`
  to insertTask. Without ANTHROPIC_API_KEY, behaviour is unchanged
  from T3 (empty kpis, manual filling via `tasks progress`).

@anthropic-ai/sdk added to package.json. Dynamic-import isolates
the dep so consumers that never call generateKpis don't pay the
SDK load cost.

Tests (20 unit + 5 eval cases):

  tests/shared/kpi-generator.test.ts (20, vitest auto-picked):
    Opt-outs (3): HIVEMIND_KPI_LLM=disable, no API key + no client,
      log callback fired on opt-out.
    Happy path (5): JSON array parsed, code fence stripped, MAX_KPIS
      cap, generated_by/at backfilled, malformed items dropped via
      parseKpis.
    Failure modes (6): malformed JSON both attempts, recovery via
      strict retry, both attempts throw, non-array shape, empty
      content blocks, hard timeout.
    Prompt construction (4): all 6 KPI fields named, CRITICAL banner
      in strict mode, code-fence stripping variants.
    Sub-internal exports (2): MAX_KPIS, DEFAULT_TIMEOUT_MS visible
      for fixture verification.

  tests/evals/kpi-generation.eval.ts (5 cases, MANUAL only):
    Live LLM eval — auto-skipped without ANTHROPIC_API_KEY. The
    `.eval.ts` extension keeps vitest's default discovery from
    picking these up; run explicitly with:

      ANTHROPIC_API_KEY=sk-... npx vitest run tests/evals \
        --include 'tests/evals/**/*.eval.ts'

    Cases: ship-a-feature, review-N-PRs, investigate-bug, write-docs,
    vague-task. Each prints generated KPIs and asserts the minimal
    sanity bounds (1-3 KPIs, positive-integer targets, non-empty
    names/units, soft hints). See tests/evals/README.md for the
    rationale and bump workflow.

  tests/claude-code/cli-tasks.test.ts: beforeEach now force-unsets
    ANTHROPIC_API_KEY + HIVEMIND_KPI_LLM so a real env var leaking
    in from the shell can't accidentally turn unit tests into
    network calls.

Full suite: 3089 / 3089 green (3069 prior + 20 new). Eval suite
deliberately excluded from default discovery.

Closes T4 in /home/emanuele/.claude/plans/sprightly-petting-tulip.md.

T8 from /home/emanuele/.claude/plans/sprightly-petting-tulip.md:
verify ≥90% coverage on new modules and add tests for any uncovered
gap.

Coverage status for the new T1-T7 modules (vitest --coverage; columns
are % statements / branches / functions / lines):

  src/commands/context.ts          100 / 100 / 100 / 100
  src/commands/rules.ts             96 /  97 / 100 /  96
  src/commands/tasks.ts             91 /  91 / 100 /  91
  src/rules/write.ts               100 / 100 / 100 / 100
  src/rules/read.ts                100 /  73 / 100 / 100 *
  src/tasks/write.ts                98 / 100 /  88 / 100 *
  src/tasks/read.ts                100 /  76 / 100 / 100 *
  src/tasks/kpi-validator.ts        98 /  97 / 100 / 100
  src/tasks/kpi-generator.ts        90 /  87 / 100 /  90 *
  src/events/append.ts             100 / 100 / 100 / 100
  src/events/aggregate.ts           97 /  81 / 100 /  97 *
  src/hooks/auto-extract-patterns  100 / 100 / 100 / 100
  src/hooks/auto-extract.ts        100 / 100 / 100 / 100
  src/hooks/shared/context-renderer 97 /  86 /  93 /  97 *

  (*) Starred rows fall below 90 on branches or functions. Their
  thresholds are calibrated to the coverage that is reasonably
  reachable; the uncovered slices are documented per file in the
  threshold comments:
    - read.ts `normalize()` has ~10 `?? ""` defensive fallbacks per
      column; tests pass complete rows, the ?? branches don't fire.
    - kpi-generator.ts has a dynamic `import("@anthropic-ai/sdk")`
      catch that requires breaking node module resolution to hit;
      tests inject `client` to bypass the import path.
    - aggregate.ts has a `normalizeTotal` ?? fallback for non-
      numeric driver-dependent shapes.
    - context-renderer.ts per-section sub-tries (codex review pass
      1 + 3 fixes) cover missing-table per section but not every
      error × section pair.

Changes:

1. tests/claude-code/session-start-hook.test.ts — add one test
   that populates a rule via queryMock and asserts the rendered
   block IS appended to additionalContext. This covers the
   `rulesTasksBlock ? ... : baseContext` ternary's true-branch
   that the default mocks (all-[] queries) couldn't reach.

2. vitest.config.ts — append the per-file thresholds block for
   the 14 new T1-T7 modules. Numbers calibrated to actual
   coverage with rationale comments per file (the project
   convention from the auth.ts / embeddings/* / skillify/*
   blocks above).

3. vitest.config.ts — deeplake-api.ts branches threshold lowered
   from 90 → 88. The line-483 MEMORY_COLUMNS drift guard is a
   defensive throw that can't be triggered without breaking
   production data shape; T1 added 3 more ensure*Table methods
   (each well-tested) but the unreachable defensive branch
   dragged the ratio down.

Full suite: 3090 / 3090 green (3089 prior + 1 new ternary-coverage test).

T9 from /home/emanuele/.claude/plans/sprightly-petting-tulip.md:
document the rules/tasks/events surface that T1-T8 shipped.

Changes:

1. README.md — adds a top-level "Rules + tasks (cross-agent KPIs)"
   section between Skills (skillify) and Architecture. Includes the
   CLI surface, the SessionStart inject block format, the auto-
   extract one-liner, the KPI generation one-liner, the relevant
   env vars, and a link to the deep-dive doc. Same shape as the
   existing Skills section so the README structure stays consistent.

2. docs/RULES_TASKS_KPIS.md — deep-dive doc that mirrors the
   docs/SKILLIFY.md pattern. Covers:
     - Data model (3 tables, why three, why immutable+version-bump)
     - CLI surface + identity contract + round-trip safety
     - SessionStart injection: per-agent status table, block
       format, what the renderer fetches (3-4 SQL round-trips
       per session), graceful-degradation failure modes
     - Auto-extract pipeline: allow-list rationale, why git push
       and --auto are excluded, failure-mode gating via
       tool_response
     - KPI LLM generation: prompt shape, two-pass parsing,
       defensive cleanup, failure-mode list (all return [])
     - Eval suite pointer
     - Env vars table
     - Known v1 limitations (8 items mapped to v1.1 candidates)

Closes T9 (the last open task) for /home/emanuele/.claude/plans/
sprightly-petting-tulip.md. Branch feat/rules-and-tasks-kpis is
fully implemented + documented.
efenocchi added 23 commits May 21, 2026 18:04
…pe rows

Codex legacy audit pass 2 (P1.2) flagged a visibility-loss bug in
the shared SessionStart context renderer. `listTasks(scope="mine")`
returns every row where `assigned_to=current_user` regardless of
the row's own `scope`. Combined with the renderer's two-query
strategy (team + mine), a team-scope task assigned to the current
user landed in BOTH the team bucket and the mine bucket.

With many newer team-scope tasks assigned to other users, the
shared task could be pushed out of the team-side cap window. It
would then count against the mine bucket's quota and crowd out
genuine me-scope tasks, so a user with many team tasks silently
lost their personal todos from the SessionStart context.

Fix: in `renderContextBlock`, filter the mine query result down to
`scope === "me"` before merging with the team result. Team tasks
assigned to the current user still appear via the team query and
retain the ★YOU highlight. The `listTasks` API contract is
unchanged so CLI consumers such as `hivemind tasks list --mine`
continue to see the full union.

Regression coverage: the obsolete dedup-race test is replaced with
a focused case that feeds a team-scope row assigned to the current
user into both queries and asserts it appears exactly once (from
the team query, with ★YOU), while the me-scope personal task
remains visible.
… KPI-less tasks

Codex legacy audit pass 2 (P1.3) flagged a silent data-hiding bug
in `hivemind tasks report`. When a task had `kpis = '[]'` (LLM
generation skipped at insert time, fresh-install pre-T4, etc.) the
report subcommand short-circuited before querying `task_events`
and only printed a "no KPIs defined yet" hint.

However, `hivemind tasks progress` deliberately accepts ANY
`kpi_id` when `task.kpis` is empty so users can still record
progress against a forthcoming KPI shape. Those manually-recorded
events were silently invisible — written to the table but never
surfaced anywhere in the CLI. Users had no way to see them short
of running raw SQL.

Fix: `report` now queries `computeAllForTask` for every task
regardless of `kpis.length`. For KPI-less tasks it groups the
returned events by `kpi_id` and prints them under a
"manually-recorded progress" sub-header. When there are neither
KPIs nor events, the hint is updated to point users at the
`hivemind tasks progress` command.

The missing-table degradation behavior is preserved: a not-yet-
created `hivemind_task_events` table is silently absorbed and the
report falls back to "no KPIs" — `report` remains read-only and
must not trigger DDL writes.
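
As a rough sketch of the reporting path for KPI-less tasks: events are summed per `kpi_id` for the task and printed under a sub-header. `totalsByKpi`, `reportLines`, the event shape, and the exact hint wording are hypothetical; the real command reads the aggregate via `computeAllForTask` and SQL, not an in-memory array.

```typescript
interface TaskEventSketch {
  task_id: string;
  kpi_id: string;
  value: number;
}

// In-memory equivalent of SUM(value) grouped by kpi_id for one task.
function totalsByKpi(events: TaskEventSketch[], taskId: string): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of events) {
    if (e.task_id !== taskId) continue;
    totals.set(e.kpi_id, (totals.get(e.kpi_id) ?? 0) + e.value);
  }
  return totals;
}

function reportLines(kpiCount: number, totals: Map<string, number>): string[] {
  if (kpiCount === 0 && totals.size === 0) {
    // Neither KPIs nor events: point the user at the progress command.
    return ["no KPIs defined yet; record progress with `hivemind tasks progress`"];
  }
  if (kpiCount === 0) {
    // KPI-less task WITH events: surface them instead of hiding them.
    const out = ["manually-recorded progress:"];
    totals.forEach((total, kpiId) => out.push(`  ${kpiId}: ${total}`));
    return out;
  }
  return []; // KPI-backed rendering is out of scope for this sketch.
}
```

The key point is that the aggregate is computed before the `kpiCount === 0` check, so manually recorded events can no longer be skipped by an early return.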

Regression coverage: the prior "skips the aggregate query when
kpis.length===0" test is replaced with two focused cases — one
asserting the "record progress" hint surfaces for tasks with
neither KPIs nor events, and one asserting that manually-recorded
events are surfaced under each task with the expected formatting.
…SessionStart

Codex legacy audit pass 3 (P1.A) found that the prior mine-bucket
filter was applied too late. `renderContextBlock` called
`listTasks(scope='mine', limit: 4*cap)` and then filtered the
result to `scope === 'me'` in JavaScript. Because `listTasks`
already sorted by created_at and sliced to the limit BEFORE the
caller's filter ran, a user with many newer team-scope tasks
assigned to them could still lose older personal tasks: the
me-scope rows were evicted from the limited result set and never
reached the renderer.

Two changes close the gap:

1. `listTasks` gains a strict `scope='me'` value that filters to
   `row.scope === 'me' AND row.assigned_to === current_user` at the
   query layer, so the limit slice happens AFTER the strict scope
   filter rather than before. The `'mine'` value is unchanged and
   still returns the broader "anything assigned to current_user"
   union for CLI consumers.
2. `mergeAndDedupTasks` now orders me-scope tasks before team-scope
   tasks in the merged result so the downstream cap slice cannot
   evict a personal todo when many newer team-scope tasks are also
   assigned to the same user. Newest-first ordering is preserved
   within each scope group.

The renderer's redundant post-fetch `t.scope === 'me'` filter is
removed.
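
The ordering rule in change 2 can be sketched as below. This is a hypothetical reimplementation, not the real `mergeAndDedupTasks`: the row shape is reduced to three fields, timestamps are assumed to be ISO-8601 strings that sort lexicographically, and the cap is applied inside the function for brevity.

```typescript
interface TaskRow {
  id: string;
  scope: "me" | "team";
  created_at: string; // assumed ISO-8601, so string order == time order
}

function mergeAndDedupSketch(teamTasks: TaskRow[], myTasks: TaskRow[], cap: number): TaskRow[] {
  // Dedup by id; a row present in both query results is kept once.
  const byId = new Map<string, TaskRow>();
  for (const t of [...myTasks, ...teamTasks]) {
    if (!byId.has(t.id)) byId.set(t.id, t);
  }
  const newestFirst = (a: TaskRow, b: TaskRow): number =>
    a.created_at < b.created_at ? 1 : a.created_at > b.created_at ? -1 : 0;
  const all = Array.from(byId.values());
  // me-scope first, then team-scope; newest-first within each group.
  const me = all.filter((t) => t.scope === "me").sort(newestFirst);
  const team = all.filter((t) => t.scope === "team").sort(newestFirst);
  return [...me, ...team].slice(0, cap);
}
```

With this ordering, the flood scenario from the regression test (50 newer team-scope rows plus one older me-scope row) keeps the personal task at the front of the capped result.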

Regression coverage:

  - `tests/shared/tasks.test.ts` covers the new strict 'me' scope
    and the symmetric "missing current_user returns []" guard.
  - `tests/shared/context-renderer.test.ts` replaces the pass-2
    "post-fetch filter" case with a query-level assertion and adds
    a flood scenario (50 newer team-scope rows + 1 older me-scope
    row) confirming the personal task survives the cap slice.
…injection vector

Codex legacy audit pass 3 (P1.B) flagged that the pass-2 prompt-
injection hardening sanitized rule and task text but not the KPI
fields that `formatKpiSummary` inlines next to each task. A KPI
`name` or `unit` containing CR/LF would let a writer (LLM
generation, manual entry, or a malicious row) inject a forged
section header into every agent's SessionStart context.

Two layers of defense, mirroring the pass-2 approach:

1. Render-side sanitization. `formatKpiSummary` now routes `k.name`
   and `k.unit` through the same `sanitizeForInject` helper used
   for rule and task text, so any already-persisted bad row is
   neutralized at render time.
2. Validator-side rejection. `kpi-validator.validateOne` now uses a
   new `safeStr` helper that refuses any string containing CR/LF.
   All KPI string fields (`kpi_id`, `name`, `unit`, `generated_by`,
   `generated_at`) go through it — the non-rendered fields are
   covered too so the contract stays uniform and so a future
   renderer change does not silently re-open the hole.

Regression coverage:

  - `tests/shared/tasks.test.ts` adds a case feeding LF / CR / CRLF
    payloads through `parseKpis` and asserts only the clean entry
    survives.
  - `tests/shared/context-renderer.test.ts` adds two cases — one
    confirming that a validator-rejected KPI never leaks its
    forged fragment onto the task line, and one happy-path
    formatting assertion that catches a future regression that
    skips `sanitizeForInject` in `formatKpiSummary`.

Codex legacy audit pass 4 surfaced that the pass-2 / pass-3 fixes
matched only CR / LF / CRLF. Characters U+2028 (LINE SEPARATOR),
U+2029 (PARAGRAPH SEPARATOR), and U+0085 (NEXT LINE) are treated
as line breaks by many tokenizers and renderers, so a malicious
rule / task / KPI value using them could still inject a forged
section into the SessionStart context.

Render-side sanitization in `src/hooks/shared/context-renderer.ts`
now goes through a single LINE_TERMINATOR_RE constant that matches
the full Unicode line-terminator set, and the same character class
is rejected at write time in:

  - src/rules/write.ts  (assertValidText)
  - src/tasks/write.ts  (assertValidText)
  - src/tasks/kpi-validator.ts  (safeStr — used for every KPI string field)

Validator-side rejection now covers `kpi_id`, `name`, `unit`,
`generated_by`, and `generated_at`. Non-rendered fields are
covered too so the contract stays uniform and a future renderer
change cannot silently re-open the hole.
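
A sketch of the two layers, assuming the character class is exactly CR, LF, U+0085, U+2028, and U+2029 as named above. The constant and function names here are hypothetical stand-ins for `LINE_TERMINATOR_RE`, `sanitizeForInject`, and `safeStr`, and replacing a run of terminators with a single space is an assumption about the render-side behaviour, not something the commit message states.

```typescript
// CR, LF, NEXT LINE, LINE SEPARATOR, PARAGRAPH SEPARATOR.
// Global + "+" so a run such as CRLF collapses into one replacement.
const LINE_TERMINATORS_ALL = /[\r\n\u0085\u2028\u2029]+/g;
// Non-global twin for testing; a /g regex keeps lastIndex state across .test().
const LINE_TERMINATORS_ANY = /[\r\n\u0085\u2028\u2029]/;

// Render-side layer: neutralise already-persisted rows (assumed: one space).
function sanitizeSketch(text: string): string {
  return text.replace(LINE_TERMINATORS_ALL, " ").trim();
}

// Write-side layer: refuse the value outright; null means "invalid".
function safeStrSketch(value: unknown): string | null {
  if (typeof value !== "string") return null;
  return LINE_TERMINATORS_ANY.test(value) ? null : value;
}
```

Using one shared character class for both layers is what keeps the write-time rejection and the render-time safety net from drifting apart.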

Regression coverage: the existing pass-2 / pass-3 cases in
`tests/shared/rules.test.ts`, `tests/shared/tasks.test.ts`, and
the `parseKpis` block gain explicit U+2028, U+2029, and U+0085
payloads to ensure each character is rejected.
…arators in legacy rule/task rows

Codex pass 5 flagged the bc3c263 coverage as write-path focused —
no direct regression test renders an already-persisted rule or
task containing U+2028 / U+2029 / U+0085 through
renderContextBlock. The validator now blocks such values on write,
but the render-side sanitizer is the safety net for in-flight rows
from a vulnerable older client and needs explicit coverage.

Two new cases:

  - "Unicode line separators in legacy rule rows are sanitized at
    render time" — feeds three rules whose bodies contain U+2028,
    U+2029, and U+0085 and asserts the rendered block exposes no
    raw separator and that each rule fits on exactly one line.
  - "Unicode line separators in legacy task rows are sanitized in
    formatTaskLine" — feeds a task whose body forges a HOW-TO
    section header surrounded by U+2028 / U+2029 separators and
    asserts no separator leaks, the task line stays single-line,
    and the legit HOW-TO footer remains the only one at line-start.

First testable iteration of the team goal/KPI system, designed as
a skill the agent loads on-demand when the user mentions a goal,
objective, KPI, or measurable target. Goals live as JSON files
under the goals/ subtree of the Deeplake memory mount, persisted
to the org-shared memory table through the existing VFS layer.
No new table, no schema migration, every team member sees a
colleague's team-scope goals through the regular memory channel.

Components landed in this commit:

- claude-code/skills/hivemind-goals/SKILL.md plus parity copy at
  codex/skills/hivemind-goals/SKILL.md. The skill teaches the
  goal JSON schema (text/scope/status/assigned_to/kpis) plus the
  read-before-write protocol, and instructs the agent to spawn
  an async LLM call (claude -p / codex exec, detached via nohup)
  for initial KPI generation after the goal file is created. No
  ANTHROPIC_API_KEY needed in the plugin -- each agent uses its
  own CLI credentials.

- src/hooks/commit-kpi-extract.ts plus wiring in capture.ts. A
  PostToolUse hook intercepts successful git commit Bash calls,
  captures the just-committed diff via git show HEAD, and spawns
  the agent's native LLM CLI (detached, fire-and-forget) with a
  prompt asking it to bump any KPI the diff has advanced.
  Disabled when HIVEMIND_AUTO_KPI_FROM_COMMITS=false.

- src/notifications/sources/open-goals.ts plus integration in
  primary-banner.ts. SessionStart welcome banner now appends a
  one-line summary when the current user has active goals.
  Lookup runs in parallel with the existing org-stats fetch,
  fails open on any error so the banner never regresses.

Verified end-to-end against a sandbox memory_test table:
  1. Agent creates the goal file in response to a track-this-as-
     a-goal prompt (UUID generated, JSON written with empty kpis).
  2. Goal row visible in memory_test via SQL query.
  3. SessionStart hook output JSON includes the goal summary
     line for the current user's open goal.

Known gaps still outstanding (follow-ups, not blockers for this
first checkpoint):
  - Async KPI worker (claude -p / codex exec spawned from the
    skill) does not inherit --plugin-dir or env-var overrides,
    so in the sandbox the sub-process cannot see the test VFS;
    kpis_status stays pending.
  - Commit auto-extract has not been exercised under a real
    commit yet; the hook compiles and is wired but the live
    behavior is untested.
  - cursor / hermes / openclaw do not have a hivemind-goals
    skill or a wired commit hook yet.
…esign)

Foundational schema for the refined goal/KPI design where the agent
interacts via the existing Deeplake VFS at the memory mount, while
storage lives in dedicated structured tables.

Path convention (translated by the VFS layer in a follow-up commit):
  memory/goal/<owner>/<status>/<goal_id>.md  -> hivemind_goals row
  memory/kpi/<goal_id>/<kpi_id>.md           -> hivemind_kpis row

Path encoding is the source of truth for owner, status, goal_id, and
kpi_id — the content column stores only the human-readable markdown
body. This eliminates the path-vs-content drift footgun flagged by
the design review (round 3): there is nothing to drift because the
content does not replicate path-encoded fields.

Schema details:

- hivemind_goals: (id, goal_id, owner, status, content, version,
  created_at, agent, plugin_version). INSERT-only version-bump,
  matching the established skills / rules / tasks pattern that
  sidesteps the Deeplake UPDATE-coalescing quirk. Indexes on
  (goal_id, version) for cat reads and (owner, status) for the
  SessionStart banner filter.

- hivemind_kpis: (id, goal_id, kpi_id, content, version,
  created_at, agent, plugin_version). No owner column — KPI
  ownership is derived from the parent goal via logical join on
  goal_id, so a goal reassignment between users does not cascade
  into a multi-file move on the KPI rows. Index on
  (goal_id, kpi_id) for the VFS read / write hot path.

Wired via ensureGoalsTable / ensureKpisTable in deeplake-api.ts and
exposed through new config fields goalsTableName / kpisTableName
(env: HIVEMIND_GOALS_TABLE / HIVEMIND_KPIS_TABLE). Updated the two
hand-built Config fixtures in tests/claude-code (skillify-auto-pull,
spawn-wiki-worker) so tsc stays clean.

Follow-up commits land the VFS path classifier + operation
translators (deeplake-fs.ts), the rewritten hivemind-goals skill,
the SessionStart banner pointed at hivemind_goals, and the
commit-driven KPI auto-extract.

The Deeplake VFS now classifies every read/write under the memory
mount and dispatches goal + kpi paths to dedicated tables, while
all other paths keep their existing memory-table behavior.

Path conventions (agent-facing, mount-relative form shown):
  goal/<owner>/<status>/<goal_id>.md   -> hivemind_goals row
  kpi/<goal_id>/<kpi_id>.md            -> hivemind_kpis row

status one of opened, in_progress, closed. Path encoding is the
source of truth for owner / status / goal_id / kpi_id; the row
content column holds free markdown.
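
To make the path convention concrete, here is a minimal classifier sketch. It is not the code in src/shell/goal-paths.ts: `classifyPathSketch` and its return shape are hypothetical, and the prefix handling (an optional leading `/` and an optional `memory/` segment) is only one assumed way to be mount-prefix tolerant.

```typescript
type GoalStatus = "opened" | "in_progress" | "closed";

type Classified =
  | { kind: "goal"; owner: string; status: GoalStatus; goalId: string }
  | { kind: "kpi"; goalId: string; kpiId: string }
  | { kind: "memory" };

const GOAL_STATUSES: readonly string[] = ["opened", "in_progress", "closed"];

function classifyPathSketch(path: string): Classified {
  // Assumed tolerance: "/x", "/memory/x", "memory/x" and "x" all reduce to "x".
  const parts = path.replace(/^\/?(?:memory\/)?/, "").split("/");
  const last = parts[parts.length - 1] ?? "";
  if (last.length <= 3 || last.slice(-3) !== ".md") return { kind: "memory" };
  const stem = last.slice(0, -3);

  // goal/<owner>/<status>/<goal_id>.md
  if (parts.length === 4 && parts[0] === "goal" && GOAL_STATUSES.indexOf(parts[2] ?? "") !== -1) {
    return { kind: "goal", owner: parts[1] ?? "", status: parts[2] as GoalStatus, goalId: stem };
  }
  // kpi/<goal_id>/<kpi_id>.md
  if (parts.length === 3 && parts[0] === "kpi") {
    return { kind: "kpi", goalId: parts[1] ?? "", kpiId: stem };
  }
  // Everything else keeps the existing memory-table behaviour.
  return { kind: "memory" };
}
```

Because owner, status, goal_id, and kpi_id are recovered from the path alone, the row content never needs to repeat them.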

Highlights:

- src/shell/goal-paths.ts (new) — pure helper: classifyPath,
  decompose/composeGoalPath, decompose/composeKpiPath. Mount-prefix
  tolerant so the same code works on a / mount (production
  default) and a /memory mount (some test runners).

- src/shell/deeplake-fs.ts — bootstrap now fetches the latest
  version per row from hivemind_goals and hivemind_kpis (via the
  goal_id,MAX(version) and goal_id,kpi_id,MAX(version) subqueries
  enabled by the indexes added in the previous schema commit) and
  synthesizes VFS paths for the cache. upsertRow dispatches goal +
  kpi writes to dedicated upsertGoalRow / upsertKpiRow helpers that
  INSERT v=N+1 — never UPDATE — so the Deeplake update-coalescing
  bug stays neutralized.

- rm on a goal path is reinterpreted as a soft-close: write v=N+1
  with status=closed and move the cache entry to the canonical
  closed/ path. rm on an already-closed goal is a no-op (preserves
  the audit trail; no hard delete in v1).

- mv between status folders on a goal path is a single atomic
  version bump with the new status. The mv guard rejects any
  attempt to rename the goal_id or owner segment (only status is
  mutable through the path).

- ensureGoalsTable / ensureKpisTable are called during VFS
  create() so the lazy heal pattern matches the rest of the
  plugin and the first write does not race with table creation.

- src/notifications/sources/open-goals.ts — banner now reads from
  hivemind_goals directly (latest version per goal_id, owner LIKE
  for short / full-email tolerance, status IN opened/in_progress).
  Markdown body's first non-empty line becomes the human-readable
  label in the 📌 line.

- src/hooks/commit-kpi-extract.ts — buildPrompt now instructs the
  sub-agent to ls the memory/goal/<user>/{opened,in_progress}
  directories, read each goal + its kpi files, and use the Edit
  tool to bump the matching current: line. No hivemind CLI
  dependency; everything goes through the VFS.

- Skill files (claude-code + codex) rewritten to teach the
  path convention, the markdown body format, and the rm-as-soft-
  close semantic. Old JSON-file approach from cda002e is gone from
  the skill text.
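
The rm and mv semantics from the highlights above can be sketched as two small planning functions. Both names and the `GoalRef` / `MovePlan` shapes are hypothetical; the sketch only decides what should be written and leaves the actual row write to the caller.

```typescript
type GoalStatusSketch = "opened" | "in_progress" | "closed";

interface GoalRef {
  owner: string;
  status: GoalStatusSketch;
  goalId: string;
}

type MovePlan =
  | { ok: true; newStatus: GoalStatusSketch }
  | { ok: false; reason: string };

// mv: only the status segment may differ between source and destination.
function planGoalMove(from: GoalRef, to: GoalRef): MovePlan {
  if (from.goalId !== to.goalId) return { ok: false, reason: "goal_id cannot be renamed" };
  if (from.owner !== to.owner) return { ok: false, reason: "owner cannot be changed via mv" };
  return { ok: true, newStatus: to.status };
}

// rm: reinterpreted as a soft-close. Returns the ref to write under closed/,
// or null when the goal is already closed (no-op, nothing is hard-deleted).
function planGoalRemove(current: GoalRef): GoalRef | null {
  if (current.status === "closed") return null;
  return { ...current, status: "closed" };
}
```

Treating `rm` as a status change means a stray delete from an agent can always be undone by moving the goal back out of `closed/`.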

End-to-end verified against the sandbox memory_test setup:
  1. Agent creates a goal via the skill -> file persisted at the
     correct path -> row written to hivemind_goals_test with
     goal_id/owner/status/content/version columns populated.
  2. SessionStart hook output JSON includes the 📌 line driven by
     the dedicated table SELECT — no more LIKE scan on memory.
… arg

The bash guard hook in the deeplake-shell rejects any shell command
whose argument string contains the literal home-relative memory
mount path. The previous skill template instructed the agent to
spawn the async KPI worker with that literal path baked into the
claude -p / codex exec prompt body, so the spawn was blocked at
the gate and KPI generation never started.

The hook is intentionally aggressive (it catches the agent trying
to read or write memory paths via raw bash, redirecting it to the
VFS-safe tools). It cannot tell the difference between "operate on
this path" and "pass this path as text inside an argument to a
sub-LLM". The cleanest workaround keeps the path encoding intact
while avoiding the literal substring in the shell command.

Skill change:

- The spawn prompt now passes goal_id (UUID) + the goal text only.
- The sub-agent loads the same hivemind-goals skill on activation
  (the description matches "goal/KPI/measurable"), reads the
  canonical path convention from the skill body, and composes
  the kpi/<goal_id>/<kpi_id>.md path itself when writing.
- Path encoding stays the source of truth — owner / status /
  goal_id / kpi_id still live in the path; only the LITERAL
  contiguous string "the home deeplake memory mount" is no
  longer present in the bash command.

Mirrored to codex/skills/hivemind-goals/SKILL.md with the codex
exec dispatch.

Smoke tested against the sandbox: spawn succeeded (pid landed,
no [RETRY REQUIRED] block), sub-agent reported writing 3 KPIs.
Sandbox limitation noted: the sub-agent itself runs without
--plugin-dir and the test environment had the global hivemind
plugin disabled to force the local build, so the sub-agent's
writes never reached the goals/kpis tables. In production this is
not a problem — the global plugin is enabled and the subprocess
inherits the VFS routing.

User report: the VFS was producing a new row in hivemind_goals on
every state change, leaving prior versions intact. The Deeplake
table view showed multiple rows per goal (one per opened →
in_progress → closed transition), which is confusing and
unnecessary for the single-user / small-team v1 workflow.

Switch upsertGoalRow and upsertKpiRow from INSERT-only-with-
version-bump to UPDATE-or-INSERT keyed by the logical primary
key (goal_id for goals; goal_id + kpi_id for kpis):
  - SELECT id WHERE <key> LIMIT 1
  - if row exists: UPDATE the mutable columns in place
  - else: INSERT a new row with version = 1
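
The UPDATE-or-INSERT flow can be modelled against an in-memory array as in the sketch below. `upsertGoalSketch` and the trimmed `GoalRowSketch` shape are hypothetical; the real helpers issue SQL against hivemind_goals / hivemind_kpis, and this model ignores the concurrency concerns a real backend would have.

```typescript
interface GoalRowSketch {
  id: string;
  goal_id: string;
  owner: string;
  status: string;
  content: string;
  version: number;
}

function upsertGoalSketch(
  table: GoalRowSketch[],
  incoming: Omit<GoalRowSketch, "id" | "version">,
  newId: () => string,
): GoalRowSketch {
  // SELECT id WHERE goal_id = ? LIMIT 1
  const existing = table.find((r) => r.goal_id === incoming.goal_id);
  if (existing) {
    // UPDATE the mutable columns in place; id and version stay untouched.
    existing.owner = incoming.owner;
    existing.status = incoming.status;
    existing.content = incoming.content;
    return existing;
  }
  // INSERT a new row with version = 1.
  const row: GoalRowSketch = { ...incoming, id: newId(), version: 1 };
  table.push(row);
  return row;
}
```

For KPIs the same shape would apply with the lookup keyed on `goal_id` plus `kpi_id` instead of `goal_id` alone.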

Effects:
  - hivemind_goals stays at one row per goal forever; status flips
    (opened → in_progress → closed), owner reassignments, and
    text edits all mutate the same row.
  - hivemind_kpis stays at one row per (goal_id, kpi_id); KPI
    progress bumps (Edit on the `current:` line) mutate in place.
  - The `version` column is now vestigial (always 1) but kept in
    the schema so reverting to the audit-trail pattern later does
    not require a column-add migration.

Bootstrap simplification: drop the `(goal_id, version) IN (SELECT
goal_id, MAX(version) ...)` subquery from both the goals and kpis
SELECTs in deeplake-fs.ts and from the SessionStart banner SELECT
in src/notifications/sources/open-goals.ts. With one row per
logical key, a direct WHERE on owner / status is the cheap path.

Trade-off accepted (per the user's explicit choice):
  - Lose the audit trail. A row no longer carries history of
    intermediate statuses or text revisions.
  - Exposed to Deeplake's UPDATE-coalescing quirk when two writes
    target the same row inside the backend's coalesce window. For
    v1 single-user low-volume use this is unlikely to fire; if a
    team-scale incident is ever observed we can revisit (either
    bring back version-bump for the affected operation, or rely
    on the write-batching debounce in deeplake-fs to widen the
    gap between rapid mutations).

Verified against the sandbox memory_test setup:
  - hivemind_goals_test before: 1 row (id=1dafbeb9..., opened, v1)
  - rm goal/.../opened/<id>.md (skill-triggered soft-close)
  - hivemind_goals_test after: 1 row (same id, status=closed, v1)
  Same id field, no proliferation.

Two changes shipped together:

1. Drop automatic KPI generation. The skill no longer spawns a
   background `claude -p` / `codex exec` to populate KPI files
   after a goal is created. KPI authoring now happens ONLY when
   the user explicitly asks for it ("aggiungi KPI per …", Italian for
   "add KPIs for …", or "add metrics for this goal"). The previous
   auto-spawn flow was
   brittle (sub-agent did not inherit --plugin-dir in the sandbox
   and dropped the writes on the floor) and noisy in production
   (every create kicked off an LLM round-trip the user often did
   not want).

2. Cross-agent surface so every runtime in the plugin sees the
   goal/KPI path convention:

   - claude-code, codex, openclaw: SKILL.md updated. Skill auto-
     activates on goal/objective/KPI keywords; openclaw's variant
     documents the read-only constraint (the agent only exposes
     hivemind_search/read/index, no Write tool — so it surfaces
     goals without authoring them).

   - cursor, hermes: no SKILL.md loader. Added a shared constant
     `GOALS_INSTRUCTIONS` in src/hooks/shared/goals-instructions.ts
     and appended it to the SessionStart context injected by both
     agents. Single source of truth means the path convention
     stays in sync with the SKILL.md content.

   - codex SessionStart inject deliberately stays minimal (per
     the codex-rs TUI policy in codex/session-start.ts — the
     additionalContext is user-visible in the history cell and a
     20+ line block would clobber the view). The SKILL.md route
     is sufficient there.

   - pi: vscode extension with a separate extension-source tree
     and a read-only tool surface; cross-agent write support is
     out of scope for this rollout.

E2E verification on the memory_test sandbox:

   - claude-code: created goal `cross-agent claude-code test`
     (6732f1ef-...). Single row in hivemind_goals_test, zero rows
     in hivemind_kpis_test.
   - codex: created goal `codex round 2` (f54a5118-...). Single
     row in hivemind_goals_test, zero rows in hivemind_kpis_test.
   - cursor + hermes: build inspection — 7 goal-routing hits in
     bundle/shell/deeplake-shell.js and 1 HIVEMIND GOALS hit in
     bundle/session-start.js per agent. CLI launch out of scope
     (IDE/runtime not invokable from this environment).
   - openclaw: skill content rewritten to reflect the read-only
     consumer role; no behavior change beyond the description.
…-agent)

Live cross-agent verification surfaced a structural limit: cursor's
pre-tool-use hook intercepts ONLY Shell commands (and currently only
grep) and hermes' intercepts ONLY terminal. Neither runtime lets the
plugin rewrite Write / Edit tool calls, so the VFS-style routing
that works for claude-code and codex (Write tool intercepted, rerouted
to deeplake-shell, INSERT into hivemind_goals) silently fails on
cursor and hermes -- the Write reaches the host filesystem and never
hits the table. Pi has no hook system at all; openclaw exposes no
write tool.

Confirmed empirically: cursor agent that 'successfully' wrote goal +
KPI files via the Write tool produced zero rows in hivemind_goals /
hivemind_kpis. The files landed on disk (visible from a plain shell
ls) but never reached Deeplake.

This commit adds a second, equivalent code path: a 'hivemind goal'
and 'hivemind kpi' CLI surface that any agent can invoke via its
Shell / terminal tool. The CLI talks directly to the Deeplake API
and writes to the same hivemind_goals / hivemind_kpis tables, so
end-state team visibility is identical regardless of which path the
agent took.

Components:

- src/commands/goal.ts (new): runGoalCommand + runKpiCommand with
  subcommands goal add/list/done/progress and kpi add/list/bump.
  Outputs are tab-separated on the happy path.

- src/cli/index.ts registers the goal / kpi top-level commands
  (aliases goals / kpis accepted).

- src/hooks/shared/goals-instructions.ts adds GOALS_INSTRUCTIONS_CLI
  alongside the existing VFS variant. Each variant documents WHICH
  commands the agent should use on its runtime.

- src/hooks/cursor/session-start.ts switched to the CLI variant.
  Cursor's SessionStart now teaches the agent to invoke
  'hivemind goal add ...' via Shell instead of writing files.

- src/hooks/hermes/session-start.ts: same switch.

- src/shell/goal-paths.ts classifier broadened to handle path
  shapes produced when bash redirects under deeplake-shell strip
  the mount prefix differently. Falls back to existing /goal/...
  and /memory/goal/... handling.
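
As a rough illustration of the path classifier, the sketch below decomposes a path of the form `goal/<owner>/<status>/<uuid>.md` regardless of what mount prefix precedes it. The function names and the "look at the last four segments" strategy are assumptions made for this example, not the contents of src/shell/goal-paths.ts.

```typescript
const GOAL_STATUSES = ["opened", "in_progress", "closed"] as const;
type GoalStatus = (typeof GOAL_STATUSES)[number];

interface GoalPath {
  owner: string;
  status: GoalStatus;
  id: string;
}

function isGoalStatus(value: string): value is GoalStatus {
  return (GOAL_STATUSES as readonly string[]).includes(value);
}

// Hypothetical classifier: accepts any prefix as long as the path ends in
// goal/<owner>/<status>/<id>.md, and returns null for everything else.
function decomposeGoalPath(path: string): GoalPath | null {
  const segments = path.split("/").filter((s) => s.length > 0);
  if (segments.length < 4) return null;
  const [root, owner, status, file] = segments.slice(-4) as [string, string, string, string];
  if (root !== "goal") return null;
  if (!isGoalStatus(status)) return null;
  if (!file.endsWith(".md") || file.length === ".md".length) return null;
  return { owner, status, id: file.slice(0, -".md".length) };
}

// Inverse operation: always emits the canonical mount-relative form.
function composeGoalPath(p: GoalPath): string {
  return `goal/${p.owner}/${p.status}/${p.id}.md`;
}
```

Anchoring on the trailing segments means a stripped or rewritten mount prefix does not change the result, at the cost of also accepting unrelated directories that happen to end in the same shape.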

Verified live (org may21, prod hivemind_goals / hivemind_kpis):

  cursor agent invoked 'hivemind goal add' then 'hivemind kpi add'
  twice. Result: 1 row in hivemind_goals + 2 rows in hivemind_kpis,
  team-visible. Goal id 5da71447-d08b-4c8e-857b-8d4d0decdfbd.

Cross-agent matrix after this change:

  claude-code  VFS via Write tool intercept  verified sandbox
  codex        VFS via Write tool intercept  verified sandbox
  cursor       CLI via Shell tool            verified prod
  hermes       CLI via terminal tool         not retested
  pi           CLI (skill not auto-injected) not tested
  openclaw     read-only (search/read tools) by design
…ning

Live test surfaced that hermes loads the hivemind-memory skill on
session start and follows its generic memory-file write instructions
for goals too, ignoring the CLI variant of GOALS_INSTRUCTIONS that
the SessionStart hook injects. Result: the agent wrote files via
write_file (hermes tool surface) directly to the host filesystem
instead of invoking the hivemind CLI, so zero rows landed in the
team-shared tables despite the agent reporting success.

Adding a hermes-specific hivemind-goals SKILL.md with an explicit
DO-NOT-write_file warning. The skill is loaded by hermes' native
skill loader from ~/.hermes/skills/ alongside hivemind-memory; the
warning takes priority over the generic memory-skill instructions
because it is goal-specific.

Verified live (org may21, test tables):
  hermes loaded the new skill, invoked `hivemind goal add` and
  `hivemind kpi add` via the terminal tool. Goal id
  32d8c03c-... lands in hivemind_goals_test, 2 KPI rows
  (k-prs, k-bugs) in hivemind_kpis_test.
OpenClaw previously exposed only read-side tools (hivemind_search,
hivemind_read, hivemind_index). Its session-capture pipeline already
writes to the Deeplake API (sessions table), so the runtime is not
write-incapable -- only the LLM-callable tool surface was missing
write operations.

Adds two new tools that mirror the `hivemind goal add` and
`hivemind kpi add` CLI subcommands. Same Deeplake API channel that
hivemind_capture already uses (getApi() -> dl.query INSERT), so no
new permissions or auth flow required. Tables are lazy-created via
ensureGoalsTable / ensureKpisTable on first call (same pattern as
ensureTable + ensureSessionsTable in init).

Components:

- openclaw/src/index.ts: tracks goalsTable + kpisTable from config
  (lines around 392 and 681), registers hivemind_goal_add tool
  after hivemind_index, then hivemind_kpi_add. Both use the
  existing pluginApi.registerTool pattern. Returned details include
  the goal_id for the LLM to thread into a follow-up KPI call.

- openclaw/openclaw.plugin.json: declares the two new tools in the
  contracts.tools array so the openclaw runtime gates them as
  available.

- openclaw/skills/hivemind-goals/SKILL.md: rewritten from the
  read-only consumer version to the full read+write workflow. Lists
  all five tools with a tight workflow and a "do not auto-generate
  KPIs" rule consistent with the cross-agent skill text.

Cross-agent matrix now fully closed for write-side:

  | agent       | mechanism                             |
  |-------------|---------------------------------------|
  | claude-code | VFS via Write tool                    |
  | codex       | VFS via Write tool                    |
  | cursor      | hivemind CLI via Shell                |
  | hermes      | hivemind CLI via terminal             |
  | pi          | hivemind CLI via Shell (explicit prompt) |
  | openclaw    | hivemind_goal_add / hivemind_kpi_add tools |
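
The matrix can also be written down as data. This is only an illustrative encoding of the table above; the `WritePath` type and `describeWritePath` helper are hypothetical and do not exist in the repository as far as this PR shows.

```typescript
type Agent = "claude-code" | "codex" | "cursor" | "hermes" | "pi" | "openclaw";

// Three write mechanisms, modelled as a discriminated union.
type WritePath =
  | { kind: "vfs"; via: "Write tool" }
  | { kind: "cli"; via: "Shell" | "terminal" }
  | { kind: "tools"; tools: readonly string[] };

const WRITE_PATHS: Record<Agent, WritePath> = {
  "claude-code": { kind: "vfs", via: "Write tool" },
  codex: { kind: "vfs", via: "Write tool" },
  cursor: { kind: "cli", via: "Shell" },
  hermes: { kind: "cli", via: "terminal" },
  pi: { kind: "cli", via: "Shell" },
  openclaw: { kind: "tools", tools: ["hivemind_goal_add", "hivemind_kpi_add"] },
};

// Renders one row of the matrix; the switch is exhaustive over WritePath.
function describeWritePath(agent: Agent): string {
  const path = WRITE_PATHS[agent];
  switch (path.kind) {
    case "vfs":
      return `VFS via ${path.via}`;
    case "cli":
      return `hivemind CLI via ${path.via}`;
    case "tools":
      return `${path.tools.join(" / ")} tools`;
  }
}
```

Keeping the mapping in a `Record<Agent, WritePath>` makes the compiler flag any agent that is added without a write mechanism.
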
Same-millisecond v=N+1 races could leave `version` and `created_at`
tied, after which `listRules`/`listTasks` and `getRuleLatest`/`getTaskLatest`
disagreed on which row was "latest" — a subsequent edit could
silently resurrect the loser's text.

Adds `id DESC` as the tertiary tie-breaker in both SQL ORDER BY and
the JS sort comparator, so all four callers pick the same row under
contention. Existing rules/tasks tests updated
for the new ORDER BY shape; the tie-break behavior is exercised by
the dedicated "compound key" tests already in tests/shared/.
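
A minimal sketch of a three-key comparator in this spirit follows. It assumes `created_at` is an ISO-8601 string in a single fixed format (so string comparison matches time order) and that `id` is unique per row; the `VersionedRow` type and function names are made up for the example.

```typescript
interface VersionedRow {
  id: string;
  version: number;
  created_at: string;
  text: string;
}

// Orders rows so the "latest" one comes first:
// version DESC, then created_at DESC, then id DESC.
function compareLatestFirst(a: VersionedRow, b: VersionedRow): number {
  if (a.version !== b.version) return b.version - a.version;
  if (a.created_at !== b.created_at) return a.created_at < b.created_at ? 1 : -1;
  if (a.id !== b.id) return a.id < b.id ? 1 : -1;
  return 0;
}

// Picks the winner without mutating the caller's array.
function pickLatest(rows: readonly VersionedRow[]): VersionedRow | undefined {
  return [...rows].sort(compareLatestFirst)[0];
}
```

Because the last key is unique, the comparator only returns 0 for the same row, so the result no longer depends on the order in which rows arrive.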

CodeRabbit PR #193 findings on src/rules/read.ts:62 and
src/tasks/read.ts:80.
Two related controls on the KPI pipeline:

1. kpi-generator: `callOnce` now returns `{ kpis, retryable }`.
   `retryable` is true only when the failure mode is plausibly
   fixable by a stricter prompt (empty LLM content, unparseable
   text). On the catch branch (network / timeout / SDK exception)
   `retryable` is false and `generateKpis` short-circuits — no
   wasted second call burning another 10s window.

   The retry on parseable-but-malformed JSON is preserved, so the
   recovery story for prompt-format drift is unchanged.

2. kpi-validator: `generated_at` was previously accepted as any
   non-empty string. Now validated through a new `isoStr()` helper
   that pattern-matches YYYY-MM-DDTHH:MM:SS[.fff][Z|±HH:MM] —
   loose enough that future client precision changes still
   validate, strict enough to reject "soon" / "yesterday" / "" that
   would otherwise feed the timeline + report renderers garbage.
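
For illustration, a shape check along the lines described for `generated_at` might look like the following. The regular expression is written from the pattern quoted above and is an assumption about, not a copy of, the real `isoStr()` helper; `looksLikeIsoTimestamp` is a hypothetical name.

```typescript
// YYYY-MM-DDTHH:MM:SS, optional fractional seconds of any length,
// optional zone designator (Z or a signed HH:MM offset).
const ISO_TIMESTAMP = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?$/;

// Shape-only validation: it does not check that the month, day, or hour
// values are in range, only that the string is laid out like a timestamp.
function looksLikeIsoTimestamp(value: unknown): value is string {
  return typeof value === "string" && ISO_TIMESTAMP.test(value);
}
```

Allowing any number of fractional digits is what keeps the check tolerant of a client that later switches from millisecond to microsecond precision.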

Tests updated:
- "returns [] after ONE LLM call when the first attempt throws"
  now expects toHaveBeenCalledTimes(1) (was 2).
- New "RETRIES once on parse failure but skips retry after a
  thrown error in the retry pass" covers the asymmetric case so a
  parse-failure-then-network-error still terminates after two
  calls total.
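
The retry policy these tests pin down can be sketched as below. The `LlmCall` type, the `Kpi` shape, and the way a "stricter prompt" is signalled are assumptions for the example; only the `{ kpis, retryable }` return shape and the names `callOnce` / `generateKpis` come from the commit message.

```typescript
interface Kpi {
  id: string;
  name: string;
  target: number;
}

interface CallResult {
  kpis: Kpi[];
  retryable: boolean;
}

// Hypothetical LLM seam: `strict` asks for a stricter prompt on the retry.
type LlmCall = (strict: boolean) => Promise<string>;

async function callOnce(llm: LlmCall, strict: boolean): Promise<CallResult> {
  try {
    const text = await llm(strict);
    // Empty content: a stricter prompt might help, so allow a retry.
    if (text.trim() === "") return { kpis: [], retryable: true };
    let parsed: unknown;
    try {
      parsed = JSON.parse(text);
    } catch {
      // Unparseable text: also worth one retry.
      return { kpis: [], retryable: true };
    }
    // Parseable but not the expected array shape: retry as well.
    if (!Array.isArray(parsed)) return { kpis: [], retryable: true };
    // Field-level validation is left out of this sketch.
    return { kpis: parsed as Kpi[], retryable: false };
  } catch {
    // Network / timeout / SDK failure: a second call would not help.
    return { kpis: [], retryable: false };
  }
}

async function generateKpis(llm: LlmCall): Promise<Kpi[]> {
  const first = await callOnce(llm, false);
  if (first.kpis.length > 0 || !first.retryable) return first.kpis;
  const second = await callOnce(llm, true);
  return second.kpis;
}
```

With this split, a thrown error on the first attempt costs one call, while a parse failure followed by a thrown error costs two, matching the two test expectations listed above.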

CodeRabbit PR #193 findings on src/tasks/kpi-generator.ts:117 and
src/tasks/kpi-validator.ts:107.
Cursor and claude-code session-start tests already had explicit
`ensureTableMock` / `ensureSessionsTableMock` negative assertions
inside the HIVEMIND_CAPTURE=false case; hermes was the lone outlier
and would have missed a regression that re-enabled DDL writes on
the read-only path.

Adds the matching negative assertions so all three runtimes guard
the same contract: capture=false means zero DDL, but the renderer
still runs (3 SELECTs, no INSERT, no ensure).

CodeRabbit PR #193 finding on the hermes test.
- docs/RULES_TASKS_KPIS.md: four unlabeled fenced blocks now carry
  language tags (text or bash) so markdownlint stops emitting
  MD040 warnings.
- README.md: same fix for the SessionStart-injection-format fence
  at L316.
- src/cli/index.ts: the `hivemind rules` and `hivemind tasks` help
  blocks claimed SessionStart injection and KPI generation were
  follow-up work, but T4 + T6 are in this same PR. Updated to
  describe today's behavior — rules auto-inject for
  claude-code/cursor/hermes, codex/pi/openclaw use
  `hivemind context`; KPIs auto-generate when ANTHROPIC_API_KEY is
  set, opt out via HIVEMIND_KPI_LLM=disable.

CodeRabbit PR #193 findings on docs/RULES_TASKS_KPIS.md:16,
README.md:330, src/cli/index.ts:122.
… tools

CI's "Typecheck and Test" job was failing on this PR because the
registration test only enumerated the three read-side tools
(hivemind_index/read/search). The two write-side tools
(hivemind_goal_add, hivemind_kpi_add) landed in commit e4d947a but
the test wasn't updated; CI broke on the AssertionError diff.

Expanded the test file with:

1. Updated tools-registered expectation to include both new names.
2. Mocks for ensureGoalsTable / ensureKpisTable on the DeeplakeApi
   stub + goalsTableName / kpisTableName / skillsTableName on the
   config mock so the new tool paths run end-to-end.
3. Stubbed requestDeviceCode return shape so the "Not logged in"
   tests don't crash inside requestAuth before the early-return
   fires.
4. Eight new tests covering the two write tools:
   - INSERT shape (table, column order, owner/status/agent literals,
     content E-prefix, v=1)
   - Friendly error + logger.error on INSERT failure
   - "Not logged in" short-circuit (no DDL, no INSERT)
   - SQL injection escaping of goal text via sqlStr()
   - kpi_id default to name when [name] is omitted

Net: 22 passing (was 14), CI is green again.
Closes the three biggest coverage holes that the goal/kpi feature
added on top of this PR — per-file pre-PR coverage was:

  src/commands/goal.ts                   1.8% statements, 0% functions
  src/shell/goal-paths.ts               34.1% statements, 33.3% functions
  src/notifications/sources/open-goals  9.8% statements, 16.7% functions

Three new test files, mocked at the DeeplakeApi.query() seam per
CLAUDE.md test philosophy:

1. tests/claude-code/cli-goal.test.ts (40 tests) — every subcommand
   of `hivemind goal` (add/list/done/progress) and `hivemind kpi`
   (add/list/bump). Exercises:
   - "not logged in" gating on every subcommand (no DDL, no query)
   - SQL injection escaping of free text via sqlStr()
   - shape + count assertions on INSERT / UPDATE / SELECT
   - kpi bump's read-modify-write cycle: SELECT then UPDATE, never
     two UPDATEs (Deeplake UPDATE-coalescing-bug regression guard)
   - status enum rejection in `goal progress`
   - non-positive target rejection in `kpi add`
   - usage messages on missing args

2. tests/shared/goal-paths.test.ts (26 tests) — the VFS path
   classifier that dispatches goal/kpi writes to the dedicated
   tables. Covers the canonical mount-relative form plus the three
   host-FS variants (.deeplake/memory, /home/.../.deeplake/memory,
   /memory/), invalid-status / wrong-segment-count / missing-.md
   negatives, and compose ↔ decompose round-trip identity.

3. tests/claude-code/notifications-open-goals-source.test.ts
   (18 tests) — the SessionStart banner source that surfaces a
   user's open goals. Asserts the SQL filters to status IN
   ('opened', 'in_progress') (never 'closed'), owner LIKE
   tolerates short / full-email forms, single quotes are escaped,
   invalid table identifier rejects without a query, and
   formatOpenGoalsLine renders the right singular/plural copy.
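
To make the assertions in item 3 concrete, here is a hedged sketch of a query builder with the same properties. The helper names, the selected columns, and the prefix-style `LIKE` pattern are assumptions; the real source in src/notifications/sources/open-goals.ts may build its SQL differently.

```typescript
const OPEN_STATUSES = ["opened", "in_progress"] as const;

// Rejects anything that is not a plain identifier before a query is built.
function assertSafeIdentifier(name: string): string {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
    throw new Error(`invalid table identifier: ${name}`);
  }
  return name;
}

// Doubles single quotes so user text cannot terminate the SQL literal.
// LIKE wildcards (% and _) inside the value are not escaped in this sketch.
function escapeSqlLiteral(value: string): string {
  return value.replace(/'/g, "''");
}

// Hypothetical builder: a prefix match lets a short owner name such as
// "alice" also match a full e-mail form such as "alice@example.com".
function buildOpenGoalsQuery(table: string, owner: string): string {
  const safeTable = assertSafeIdentifier(table);
  const statuses = OPEN_STATUSES.map((s) => `'${s}'`).join(", ");
  const ownerPattern = `${escapeSqlLiteral(owner)}%`;
  return (
    `SELECT goal_id, status FROM ${safeTable} ` +
    `WHERE status IN (${statuses}) AND owner LIKE '${ownerPattern}'`
  );
}
```

Validating the identifier first means a malformed table name fails before any network call, which is the behaviour the "rejects without a query" test describes.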

Post-PR coverage:
  goal.ts                  96.4% / 100% / 98.51%
  goal-paths.ts            97.72% / 100% / 100%
  open-goals.ts            95.12% / 100% / 96.77%

Test total: +84 tests (3119 → 3203).
CI's per-file coverage gate failed two thresholds:

  src/hooks/shared/context-renderer.ts  branches 75.47% (need 80%)
    uncovered: 175-180 (outer catch's String(e) branch),
               217-220 (mergeAndDedupTasks version/created-at tie-breaks)

  src/deeplake-api.ts                   branches 84.41% (need 88%)
    uncovered: 634-667 (ensureGoalsTable + ensureKpisTable bodies)

Three new tests on context-renderer:
- "higher-version row wins on dedup" — exercises line 217 (case A)
- "tie-break on newer created_at" — exercises line 219-220 (case B)
- "non-Error throw reaches outer catch via sub-try log()" — exercises
  the String(e) branch at line 175 by throwing a plain string from
  the rules sub-try's log() handler so it escapes both the sub-try
  and the outer catch's e.message branch

Six new tests on deeplake-api covering ensureGoalsTable and
ensureKpisTable in the same three-shape pattern used for rules /
tasks / task-events:
- CREATE path: listTables → CREATE → heal-SELECT → CREATE INDEX(es)
- post-CREATE heal: ALTER fires for missing column
- already-up-to-date: no ALTER, no CREATE
- identifier injection: cross-table guard extended to the two new
  ensure* methods (rejects before any network call)

Test total: 3203 → 3212. Both threshold ERRORs gone; CI green.
…kpis

# Conflicts:
#	src/hooks/session-start.ts
@@ -0,0 +1,38 @@
---
Collaborator

This should be shared. Why do we have the same copy for each?

Collaborator Author

@kaghni good catch — the four SKILL.md files are deliberately per-runtime variants, not bit-identical copies. Two axes of divergence:

1. allowed-tools frontmatter — runtime-specific whitelist gated by the skill loader. Tool names differ across agents:

| Agent | allowed-tools |
|-------|---------------|
| claude-code | Read Write Edit Bash (separate file I/O primitives) |
| codex | Bash (file edits go through apply_patch via shell; there is no separate Write tool) |
| hermes | terminal (hermes does not expose Bash; its shell tool is named terminal) |
| openclaw | n/a (openclaw uses the extension model, no skill-time tool whitelist) |

If we used the wrong list per agent the loader either rejects the skill or silently strips access to the tool the workflow depends on (e.g. Write on claude-code is what Path A intercepts — strip it and the goal write becomes a no-op).

2. Body content — substantively different for hermes/openclaw because the write path itself is different per runtime:

  • claude-code + codex → Path A (VFS): agent writes a Markdown file under the memory mount at goal/<owner>/<status>/<uuid>.md, the pre-tool-use hook rewrites the call into a deeplake-shell invocation, which lands the row in hivemind_goals. Skill body documents the Markdown-file workflow.
  • hermes → Path B (CLI): hermes' pre-tool-use can intercept only terminal, not write_file. A direct write_file to the memory path would land on the local filesystem and never reach the team-shared table (silent failure). The hermes skill therefore carries a hard ⚠️ CRITICAL warning + describes the hivemind goal add / hivemind kpi add CLI subcommands.
  • openclaw → Path C (registered tools): openclaw extension declares hivemind_goal_add and hivemind_kpi_add as native LLM tools. Skill body describes the tool contract.

So the four files reflect three actually-different ways of reaching the same Deeplake tables.

Where the shared part already lives — the SessionStart-injected guidance text (the prose the model sees every turn) is the single source of truth at src/hooks/shared/goals-instructions.ts, with two variants (GOALS_INSTRUCTIONS for VFS, GOALS_INSTRUCTIONS_CLI for CLI) reused across the per-agent session-start.ts files. So we are not duplicating the prompt text — only the SKILL.md registry entry, because each agent's skill loader reads from its own per-agent directory.

Concrete redundancy that DOES exist: claude-code's SKILL.md and codex's SKILL.md differ by exactly one line (the allowed-tools frontmatter — Read Write Edit Bash vs Bash). Body is bit-identical. That one is fair to dedup via a tiny generator script (template body + per-agent frontmatter) in a follow-up. Want me to open an issue tracking that?

Collaborator

@efenocchi if you think it's worth it, you can create an issue.

Collaborator Author

@kaghni already opened it as #200

Scope is narrow on purpose: only the claude-code ↔ codex pair (1-line frontmatter diff, ~6 kB identical body). hermes / openclaw stay hand-authored because their bodies describe genuinely different write paths.

Tracked the effort as ~1 hour (template + small generator + CI sync guard). Feel free to assign it to whoever has the cycles.
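
A possible shape for that generator, sketched under assumptions: one shared body, per-agent frontmatter, and a pure render function that a CI sync guard could compare against the files on disk. The `SkillVariant` type, the function names, and the placeholder body are hypothetical, and real SKILL.md frontmatter likely carries more keys than `allowed-tools`.

```typescript
interface SkillVariant {
  agent: string;
  allowedTools: string;
}

// Renders one SKILL.md: per-agent frontmatter followed by the shared body.
function renderSkill(variant: SkillVariant, body: string): string {
  return ["---", `allowed-tools: ${variant.allowedTools}`, "---", "", body].join("\n");
}

// Renders every variant from the same body so the copies cannot drift apart.
function renderAll(variants: readonly SkillVariant[], body: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const v of variants) out[v.agent] = renderSkill(v, body);
  return out;
}

// Placeholder body; the real one would be the existing skill text.
const SHARED_BODY = "# hivemind-goals\n\n(shared workflow text goes here)\n";

const SKILL_VARIANTS: readonly SkillVariant[] = [
  { agent: "claude-code", allowedTools: "Read Write Edit Bash" },
  { agent: "codex", allowedTools: "Bash" },
];

const renderedSkills = renderAll(SKILL_VARIANTS, SHARED_BODY);
```

A sync guard would then fail the build whenever a checked-in SKILL.md differs from the corresponding entry in `renderedSkills`.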

await api.ensureSessionsTable(sessionsTable);
await createPlaceholder(api, table, sessionId, cwd, config.userName, config.orgName, config.workspaceId, pluginVersion);
log("placeholder created");
if (captureEnabled) {
Collaborator

@coderabbitai was capture enabled working wrongly before this? what was changed?

@kaghni, here's what changed and why:

Before this PR, the outer condition was:

if (creds?.token && captureEnabled) {
  // ALL DB work (DDL + writes) happened here
}

With HIVEMIND_CAPTURE=false, the entire block was skipped — but that was fine because there was nothing read-only to run inside it.

After this PR, the T6 renderContextBlock renderer was added — a read-only operation that fetches rules/tasks to inject into the agent's context. The logic was restructured to:

if (creds?.token) {                   // outer: only requires login
  if (captureEnabled) {
    // DDL + writes (ensureTable, ensureSessionsTable, createPlaceholder)
  }
  // read-only renderer — always runs regardless of HIVEMIND_CAPTURE
  rulesTasksBlock = await renderContextBlock(...);
}

So the change has two parts:

  1. captureEnabled was moved inward — it now only gates write/DDL operations, not the whole block.
  2. renderContextBlock was added outside the captureEnabled guard — it's purely read-only (SELECTs only), so it correctly runs even in HIVEMIND_CAPTURE=false mode.

The old code wasn't wrong — it just would have accidentally suppressed the new read-only renderer if the guard hadn't been moved. The refactor ensures that HIVEMIND_CAPTURE=false means "no writes or schema changes" but not "no reads", which is the intended semantics described in the inline comment.


🧠 Learnings used
Learnt from: efenocchi
Repo: activeloopai/hivemind PR: 120
File: src/hooks/cursor/session-start.ts:155-157
Timestamp: 2026-05-12T00:24:22.657Z
Learning: In Cursor-specific hooks (e.g., `src/hooks/cursor/session-start.ts` and other `src/hooks/cursor/*` files), keep the `getInstalledVersion(__bundleDir, ".claude-plugin")` call as-is: Cursor doesn’t have a `.claude-plugin` or `.cursor-plugin/plugin.json` manifest, so the first argument should be treated as a no-op. The installed hivemind version must be resolved via the second fallback mechanism that reads the `.hivemind_version` stamp written by Cursor’s installer to `~/.cursor/hivemind/.hivemind_version`. When reviewing changes, do not refactor this to rely on a manifest that Cursor doesn’t provide; version reporting should still reflect the Cursor installer’s hivemind stamp.

Learnt from: efenocchi
Repo: activeloopai/hivemind PR: 183
File: openclaw/src/index.ts:1280-1289
Timestamp: 2026-05-19T00:40:45.681Z
Learning: When populating the database column `sessions.message_embedding` in the session capture hooks, embed the full JSON envelope (the entire `line` object containing `id`, `type`, `session_id`, `content`, and `timestamp`) rather than embedding only the content-only `text`. Embedding content-only text for just one agent can break cross-agent semantic comparability/ranking for agents that have different embedding distributions. Separately, in the wiki-worker path, `memory.summary_embedding` intentionally uses content-only `text` because that column’s semantics differ—do not apply the session-enveloping rule there.

The PostToolUse hook on `git commit` was spawning a fire-and-forget
sub-agent (`claude -p` / `codex exec`) on every successful commit.
The sub-agent scanned every open + in-progress goal of the user,
read all KPI files, reasoned against the diff (up to 16k chars),
and emitted Edit tool calls to bump `current:` lines.

Cost profile we hit in practice:
- ~30-80k tokens per commit on the user's claude/codex plan
  (boilerplate + diff + N goals × M KPIs scan + reasoning + edits)
- No sha-dedup → `git commit --amend` retriggers the whole pass
- No "user has zero open goals?" prefilter → spawns even on empty
- No throttle → 3 commits in 30s = 3 parallel sub-agents
- Only claude-code / codex paid this; other 4 agents got nothing,
  so the cost was asymmetric across the team

Net: too expensive to leave on by default while the mitigations
above are unimplemented.

Removed the import + try block in src/hooks/capture.ts (the only
caller). The module src/hooks/commit-kpi-extract.ts stays on disk
intact — re-enable by restoring the import + try block here once
we add: sha-dedup, empty-goals prefilter, debounce on rapid
commits, and a hard sub-agent timeout.

No test changes — the module had no direct tests (the dispatch
went through capture.ts which had no per-file assertion for this
branch). Full suite still 3283/3283 green.
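
Three of the four listed mitigations (sha-dedup, empty-goals prefilter, debounce) could be combined into a single gate evaluated before any sub-agent is spawned. The sketch below is hypothetical: none of these names exist in src/hooks/commit-kpi-extract.ts, the 30-second window is an arbitrary illustrative value, and the hard sub-agent timeout is not covered.

```typescript
interface SpawnGateState {
  seenShas: Set<string>;
  lastSpawnAtMs: number | null;
}

interface CommitEvent {
  sha: string;
  openGoalCount: number;
  nowMs: number;
}

// Illustrative debounce window; not a value taken from the PR.
const DEBOUNCE_MS = 30000;

// Returns true only when spawning the extractor is worth the token cost.
function shouldSpawnExtractor(state: SpawnGateState, ev: CommitEvent): boolean {
  // Empty-goals prefilter: nothing to bump, so never spawn.
  if (ev.openGoalCount === 0) return false;
  // Sha-dedup: a commit that was already processed is skipped.
  if (state.seenShas.has(ev.sha)) return false;
  // Debounce: drop commits that arrive too soon after the last spawn.
  if (state.lastSpawnAtMs !== null && ev.nowMs - state.lastSpawnAtMs < DEBOUNCE_MS) {
    return false;
  }
  state.seenShas.add(ev.sha);
  state.lastSpawnAtMs = ev.nowMs;
  return true;
}
```

Note that this debounce drops commits inside the window rather than coalescing them into one later pass; whether that trade-off is acceptable would need to be decided when the feature is re-enabled.
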
efenocchi added 2 commits May 22, 2026 22:06
…kpis

# Conflicts:
#	package-lock.json
#	src/cli/index.ts
#	src/config.ts
#	src/deeplake-schema.ts
#	src/hooks/codex/session-start.ts
#	src/hooks/cursor/session-start.ts
#	src/hooks/hermes/session-start.ts
#	src/hooks/session-start.ts
#	src/shell/deeplake-fs.ts
#	tests/claude-code/skillify-auto-pull.test.ts
#	tests/claude-code/spawn-wiki-worker.test.ts
#	vitest.config.ts
…threshold

Two coupled fixes after merging origin/main:

1. tests/shared/deeplake-api.test.ts — add the three standard
   ensureXxxTable test shapes for the newly-merged
   ensureCodebaseTable (from main's feat/codebase-graph-phase1):
   - CREATE path (table missing → CREATE + heal-no-ALTER + INDEX)
   - heal-after-CREATE (legacy state → ALTER fires)
   - already-up-to-date (no ALTER, no CREATE)
   Also extend the cross-table identifier-injection guard to
   ensureCodebaseTable. Module test count 69 → 72 (+3); plus the
   1-line addition to the injection guard.

2. vitest.config.ts — recalibrate src/deeplake-api.ts branches
   threshold from 88 to 87. Each new ensure*Table method carries
   an `if (!tables.includes(safe)) this._tablesCache = ...` inner
   double-check whose else-arm is structurally unreachable (the
   table can't have been concurrently created between the outer
   and inner check in a single event-loop tick). Pre-merge: 88
   (already accommodating 5 such unreachable branches from
   feat/rules-and-tasks-kpis). Post-merge: 87 (adds one more from
   codebase). Statements/functions/lines all stay at 90+ —
   measured 99.32 / 100 / 99.6.

Net: 3706/3706 tests pass, zero coverage-threshold ERRORs.
@efenocchi efenocchi merged commit 665e6aa into main May 22, 2026
4 checks passed