SkillOpt foundation: skill attribution + weekly auto-fire scaffold#230
Conversation
Foundation for measuring and improving org-shared skills. - skills-active.ts: record which org-shared skills (+versions) were in context per session at SessionStart, with a deterministic A/B bucket. The label that makes skill value measurable (with/without skill X, or v1-vs-v2). Org skills only; default-on (opt-out HIVEMIND_SKILL_ATTRIBUTION=0). - skillopt-trigger.ts + skillopt-worker.ts: weekly, user-side, SessionStart-triggered, detached, non-blocking auto-fire of the SkillOpt loop using the user's own agent (no org key). Trigger is fully tested; the worker's loop body (detect->optimize->gate->publish) is a documented scaffold pending deployed attribution + a rollout sandbox. Opt-out HIVEMIND_SKILLOPT_DISABLED=1. - session-start.ts: wire both in (non-blocking, swallowed), mirroring the skillify worker pattern. - tests: 20 (11 attribution shape/branches/negative-patterns + 9 auto-fire throttle/spawn/guards).
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds session-start skill attribution (enumerates org skills, computes session bucket, builds a /skills_active JSON payload and INSERT SQL) and a persisted weekly SkillOpt trigger that stamps lastRun and spawns a detached worker; integrates both into SessionStart and includes tests and a build entry. ChangesSkillOpt Attribution and Trigger
Sequence Diagram(s)sequenceDiagram
participant SessionStart
participant Skillify
participant FileSystem
participant DB
participant WorkerSpawner
SessionStart->>Skillify: listActiveOrgSkills(sessionId)
Skillify->>FileSystem: read SKILL.md, manifest
Skillify->>DB: buildSkillsActiveInsert(...) -> INSERT
SessionStart->>Skillify: maybeFireSkillOpt()
Skillify->>FileSystem: loadState() / saveState()
Skillify->>WorkerSpawner: spawn detached skillopt-worker.js (if fired)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~40 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/skillify/skillopt-trigger.ts`:
- Around line 68-79: maybeFireSkillOpt currently does loadState → shouldFire →
saveState → spawnWorker without any cross-process locking which allows races;
wrap the critical section (loadState, shouldFire check, saveState, spawnWorker)
in a cross-process lock to make the weekly decision atomic. Add dependency hooks
(e.g., deps.lockAcquire/deps.lockRelease or reuse an existing locker) and
acquire the lock before calling loadState, re-check shouldFire after acquiring
the lock (to handle races), perform saveState and spawnWorker only if still
allowed, then release the lock; if the lock cannot be acquired within a short
timeout, return { fired: false, reason: "locked" } to avoid duplicate spawns.
Ensure the function references are maybeFireSkillOpt, loadState, shouldFire,
saveState, and spawnWorker so reviewers can find the change.
- Around line 22-23: STATE_FILE currently hardcodes the state path using
os.homedir(), which ignores HIVEMIND_STATE_DIR; change this to derive the state
directory from the shared resolver by calling getStateDir() and use
path.join(getStateDir(), "skillopt", "state.json") for STATE_FILE, and
add/import getStateDir from the shared state resolver module so the code honors
HIVEMIND_STATE_DIR (including empty/whitespace semantics) for test/custom
setups.
In `@src/skillify/skills-active.ts`:
- Line 62: The current sort on "out" uses only a.name.localeCompare(b.name),
causing nondeterministic order when names tie; update the comparator in the sort
call that returns out to use a secondary key (e.g., compare a.author (or
a.namespace/id) when a.name === b.name) so ties are resolved deterministically
and attribution payloads serialize consistently; locate the sort expression
"out.sort((a, b) => a.name.localeCompare(b.name))" and replace it with a
comparator that first compares name, then author (or another unique field) as
the tiebreaker.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: d43c7c33-2615-4c68-8f6a-306535177eb8
📒 Files selected for processing (6)
src/hooks/session-start.tssrc/skillify/skillopt-trigger.tssrc/skillify/skillopt-worker.tssrc/skillify/skills-active.tstests/shared/skillopt-trigger.test.tstests/shared/skills-active.test.ts
codex flagged that the SessionStart trigger spawns skillopt-worker.js by resolving it relative to its own bundle dir (import.meta.url), but the worker was never added to esbuild.config.mjs. In a packaged install that path (claude-code/bundle/skillopt-worker.js) would not exist, so the weekly fire would stamp lastRun, launch node on a missing file, exit MODULE_NOT_FOUND, and suppress retries for a week. Add the worker as a Claude Code bundle entry (only the CC session-start hook calls maybeFireSkillOpt, so only this bundle needs it). Verified: build now emits 15 CC bundles incl. claude-code/bundle/skillopt-worker.js, and the bundled worker runs to exit 0 under the recursion-guard env.
codex flagged that the skills_active attribution row was inserted under the canonical /sessions/ path (buildSessionPath). The summary / raw-transcript readers select WHERE path LIKE '/sessions/%', so they would treat the attribution row as transcript content: for an otherwise-empty session, SessionEnd would summarize only this JSON instead of skipping; for a normal session it could pollute the summary with the active-skill list. Add buildSkillsActivePath() which namespaces attribution rows under /skills_active/ (never /sessions/ or /summaries/), and switch session-start.ts to it. Measurement queries select by message content (type = "skills_active"), not by path, so the distinct prefix is free. New unit test asserts the path is not under /sessions/; 22/22 shared tests green.
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/shared/skills-active.test.ts (1)
94-94: ⚡ Quick winAlign test data with the real session path format.
The
sessionPathin the test uses/sessions/..., but the actual code (persession-start.ts:239) passes the result ofbuildSkillsActivePath(), which returns/skills_active/.... WhilebuildSkillsActiveInsertdoesn't validate the path format and the test correctly verifies INSERT generation, using a realistic/skills_active/path would make the test data match production usage and avoid confusion for future maintainers.♻️ Proposed fix to use realistic path
- sessionPath: "/sessions/kamo/kamo_activeloop_hivemind_S1.jsonl", + sessionPath: "/skills_active/kamo/kamo_activeloop_default_S1.json", filename: "kamo_activeloop_hivemind_S1.jsonl",🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/shared/skills-active.test.ts` at line 94, Update the test data to use the real session path format returned by buildSkillsActivePath instead of "/sessions/..."; specifically change the sessionPath string in the test used by buildSkillsActiveInsert to a "/skills_active/..." style path so the test matches production usage (reference: buildSkillsActivePath, buildSkillsActiveInsert, and the sessionPath test value).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/shared/skills-active.test.ts`:
- Line 94: Update the test data to use the real session path format returned by
buildSkillsActivePath instead of "/sessions/..."; specifically change the
sessionPath string in the test used by buildSkillsActiveInsert to a
"/skills_active/..." style path so the test matches production usage (reference:
buildSkillsActivePath, buildSkillsActiveInsert, and the sessionPath test value).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 4b0cd5ad-2cb9-43d4-9405-fce83075cf84
📒 Files selected for processing (4)
esbuild.config.mjssrc/hooks/session-start.tssrc/skillify/skills-active.tstests/shared/skills-active.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- src/hooks/session-start.ts
- src/skillify/skills-active.ts
…nches (CI) CI "Typecheck and Test" failed: the skill-attribution INSERT added one query call mid-sequence in session-start.ts, which broke 5 existing session-start-hook.test.ts assertions (call counts + the positionally-mocked rules/goals renderer responses shifted by one). The attribution feature's own commit never updated these tests. - Update the 5 placeholder-branching tests to account for the attribution INSERT at call-index 2, asserting its shape (contains "skills_active"), not just the bumped count. - Mock src/skillify/skillopt-trigger.js at the boundary so maybeFireSkillOpt no longer really spawns a detached worker + writes a state file during unit tests (a real side effect), and so both ternary branches + the swallow-catch are covered. - Add tests: HIVEMIND_SKILL_ATTRIBUTION=0 skips the write (negative assertion), a failed attribution write is swallowed (content-based mock, order-robust), the SkillOpt fired branch logs, and a throwing trigger is swallowed. - Reset HIVEMIND_SKILL_ATTRIBUTION in runHook (matching HIVEMIND_WIKI_WORKER / HIVEMIND_CAPTURE) so the =0 test can't leak into later tests. - Replace the dead `pop() ?? ""` branch with a branchless slice for the attribution filename. session-start.ts branch coverage 93.54% (≥90 threshold); skills-active.ts 100%; 107/107 session-start + shared tests green.
|
@coderabbitai review |
✅ Action performedReview finished.
|
Coverage ReportScope: files changed in this PR. Enforced threshold: 90% per metric (per file via
File Coverage — 18 files changed
Generated for commit b4d5d72. |
…rt (coderabbit)
Three CodeRabbit findings on the SkillOpt foundation files:
1. (Major, skillopt-trigger.ts) Hardcoded ~/.deeplake/state/skillopt path
ignored HIVEMIND_STATE_DIR and was captured at import time (a module-level
const grabs the real home before any test isolation). Route the state file
through the shared getStateDir() resolver, computed lazily via a stateFile()
helper so HIVEMIND_STATE_DIR / HOME swaps take effect.
2. (Major, skillopt-trigger.ts) The weekly fire decision (load→shouldFire→
save→spawn) had no cross-process lock, so two SessionStart hooks racing at
the week boundary could both spawn workers (doubling user-side cost once the
worker does real LLM work). Wrap the fire in the repo's atomic
tryAcquireWorkerLock("skillopt-weekly") / releaseWorkerLock — openSync(wx)
lets exactly one process win; the loser bails with reason "locked". The lock
self-heals after maxAgeMs, well under the weekly cadence. Injectable via
FireDeps.tryLock/releaseLock; new tests cover the locked + release branches
without touching a real lock file. Also made saveState atomic (tmp+rename).
3. (Minor, skills-active.ts) listActiveOrgSkills sorted by name only, so two
org skills sharing a name (different authors) could serialize in
filesystem-entry order. Added author as a secondary sort key.
56/56 session-start + shared tests green; build emits the worker bundle.
|
@coderabbitai review |
✅ Action performedReview finished.
|
…ity, state path (codex) Three codex P2 findings on the second review pass, all attribution-data accuracy: 1. (session-start.ts) The skills_active snapshot ran BEFORE autoPullSkills(), so a skill pulled/upgraded during the same SessionStart was recorded as stale/absent while the session could already use it. Moved autoPullSkills() above the attribution snapshot (latency unchanged — it was already awaited in the same hook) so the row reflects post-pull state. 2. (skills-active.ts) listActiveOrgSkills identified org skills by splitting the `<name>--<author>` dirname — but that pattern is a legitimate local naming style (e.g. deploy--blue-green), so it fabricated authors and wrote false positives into every attribution row. Now consults the pull manifest (loadManifest) — the authoritative record of what skillify pull installed — for identity; only manifest-recorded dirs present on disk are counted. This is the same reason unpull uses the manifest instead of the dirname. 3. (skillopt-trigger.ts) skillopt state.json sat directly in getStateDir(), which `hivemind skillify status` enumerates as a tracked project for every top-level .json (minus a small allowlist), surfacing a bogus project with undefined fields. Moved it into a getStateDir()/skillopt/ subdir — still HIVEMIND_STATE_DIR-aware, but a subdir isn't matched by the .json filter. Tests: manifest-based listActiveOrgSkills cases incl. an explicit false-positive guard (local `--` dir absent from the manifest is excluded). 109/109 session-start + shared tests green; session-start.ts branch 93.5%.
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/skillify/skills-active.ts (1)
135-152:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse one UUID for both the row and the embedded message id.
Line 137 and Line 151 generate different UUIDs for the same logical
skills_activerecord. That makes every stored row internally inconsistent and will break any consumer that correlates the tableidwithmessage.id.Suggested fix
export function buildSkillsActiveInsert(a: SkillsActiveInsertArgs): string { + const id = crypto.randomUUID(); const entry = { - id: crypto.randomUUID(), + id, session_id: a.sessionId, cwd: a.cwd, hook_event_name: "SessionStart", timestamp: a.ts, type: "skills_active", @@ return ( `INSERT INTO "${a.sessionsTable}" (id, path, filename, message, message_embedding, author, size_bytes, project, description, agent, plugin_version, creation_date, last_update_date) ` + - `VALUES ('${crypto.randomUUID()}', '${sqlStr(a.sessionPath)}', '${sqlStr(a.filename)}', '${jsonForSql}'::jsonb, NULL, '${sqlStr(a.userName)}', ` + + `VALUES ('${id}', '${sqlStr(a.sessionPath)}', '${sqlStr(a.filename)}', '${jsonForSql}'::jsonb, NULL, '${sqlStr(a.userName)}', ` + `${Buffer.byteLength(line, "utf-8")}, '${sqlStr(a.projectName)}', 'skills_active', 'claude_code', '${sqlStr(a.pluginVersion)}', '${a.ts}', '${a.ts}')` ); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/skillify/skills-active.ts` around lines 135 - 152, The INSERT uses two different UUIDs: one created in the entry object (entry.id) and a new crypto.randomUUID() in the VALUES clause, causing inconsistent ids between the DB row and the embedded JSON message; fix buildSkillsActiveInsert by generating a single id variable (use the same id when building entry and when composing the VALUES string) and use the JSON-stringified entry (jsonForSql) so the row id and the embedded entry.id are identical (update references around entry, jsonForSql, and the VALUES clause to reuse that single id).
♻️ Duplicate comments (1)
src/skillify/skillopt-trigger.ts (1)
99-116:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRe-check the throttle after acquiring the lock.
loadState()andshouldFire()still run beforetryLock(). A second SessionStart can read the oldlastRun, block until the first process saves and releases, then acquire the lock and spawn anyway. The lock serializes spawns, but it still doesn't make the weekly decision atomic.Suggested fix
export function maybeFireSkillOpt(deps: FireDeps = {}): FireResult { const env = deps.env ?? process.env; // Default ON. Fires weekly for everyone; opt-out via HIVEMIND_SKILLOPT_DISABLED=1. if (env.HIVEMIND_SKILLOPT_DISABLED === "1") return { fired: false, reason: "disabled" }; if (env.HIVEMIND_SKILLOPT_WORKER === "1") return { fired: false, reason: "in-worker" }; // recursion guard const now = deps.now ?? Date.now(); - const state = deps.state ?? loadState(); - if (!shouldFire(state.lastRun, now)) return { fired: false, reason: "throttled" }; // Cross-process arbiter: two SessionStart hooks racing at the weekly boundary // could both pass the throttle and spawn duplicate workers (doubling user-side // cost once the worker does real LLM work). An atomic openSync(wx) worker-lock @@ const acquired = (deps.tryLock ?? (() => tryAcquireWorkerLock(LOCK_KEY)))(); if (!acquired) return { fired: false, reason: "locked" }; try { + const state = deps.state ?? loadState(); + if (!shouldFire(state.lastRun, now)) return { fired: false, reason: "throttled" }; (deps.save ?? saveState)({ ...state, lastRun: new Date(now).toISOString() }); (deps.spawnWorker ?? spawnWorker)(); } finally { (deps.releaseLock ?? (() => releaseWorkerLock(LOCK_KEY)))(); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/skillify/skillopt-trigger.ts` around lines 99 - 116, After acquiring the worker lock you must re-check the throttle using the latest state to make the decision atomic: call the same state loader (deps.state ?? loadState()) or deps.loadState to fetch current state and run shouldFire(latest.lastRun, now) again after the acquired lock (using deps.tryLock/tryAcquireWorkerLock and LOCK_KEY), and if it returns false release the lock (deps.releaseLock ?? releaseWorkerLock) and return {fired:false, reason:"throttled"}; only if the re-check passes should you call (deps.save ?? saveState) to update lastRun and then (deps.spawnWorker ?? spawnWorker).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/skillify/skills-active.ts`:
- Around line 135-152: The INSERT uses two different UUIDs: one created in the
entry object (entry.id) and a new crypto.randomUUID() in the VALUES clause,
causing inconsistent ids between the DB row and the embedded JSON message; fix
buildSkillsActiveInsert by generating a single id variable (use the same id when
building entry and when composing the VALUES string) and use the
JSON-stringified entry (jsonForSql) so the row id and the embedded entry.id are
identical (update references around entry, jsonForSql, and the VALUES clause to
reuse that single id).
---
Duplicate comments:
In `@src/skillify/skillopt-trigger.ts`:
- Around line 99-116: After acquiring the worker lock you must re-check the
throttle using the latest state to make the decision atomic: call the same state
loader (deps.state ?? loadState()) or deps.loadState to fetch current state and
run shouldFire(latest.lastRun, now) again after the acquired lock (using
deps.tryLock/tryAcquireWorkerLock and LOCK_KEY), and if it returns false release
the lock (deps.releaseLock ?? releaseWorkerLock) and return {fired:false,
reason:"throttled"}; only if the re-check passes should you call (deps.save ??
saveState) to update lastRun and then (deps.spawnWorker ?? spawnWorker).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: ef1ec6f4-4a85-4a8c-8108-fd06127ddcca
📒 Files selected for processing (5)
src/hooks/session-start.tssrc/skillify/skillopt-trigger.tssrc/skillify/skills-active.tstests/shared/skillopt-trigger.test.tstests/shared/skills-active.test.ts
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/shared/skillopt-trigger.test.ts
- tests/shared/skills-active.test.ts
- src/hooks/session-start.ts
|
@coderabbitai review |
✅ Action performedReview finished.
|
…odex) Two codex P2 findings on the third review pass, both follow-ons from the prior round's fixes: 1. (skillopt-trigger.ts) The cross-process worker lock still had a TOCTOU: both racing SessionStarts read `lastRun` BEFORE acquiring the lock, so the loser — after the winner stamped + released — would acquire the now-free lock and act on its stale snapshot, spawning a second worker. Now re-read state INSIDE the lock and re-check shouldFire before stamping/spawning; if a racing process already fired, bail with reason "throttled". Injectable via FireDeps.reload; new test exercises the race (pre-check eligible, in-lock state shows a fresh stamp → no spawn, lock still released). 2. (skills-active.ts / session-start.ts) Attribution only scanned the global ~/.claude/skills root, so org skills installed with `skillify pull --to project` (under <cwd>/.claude/skills) were omitted from the skills_active row. listActiveOrgSkills now takes a list of roots and dedups by skill identity; added skillRootsForCwd(cwd) = [global, <cwd>/.claude/skills], and session-start passes the session cwd. New tests cover multi-root scan + cross-root dedup. 29 shared + 28 session-start-hook tests green; build emits the worker bundle.
|
@coderabbitai review |
✅ Action performedReview finished.
|
Drop the `maybe` prefix: a function whose body is conditional (throttled / locked) should still be named for the action it performs, not its conditionality. `runWeeklySkillOpt` names the action (run the weekly SkillOpt fire on SessionStart); the returned FireResult reports whether it fired. Pure mechanical rename across the trigger, the SessionStart caller, and both test suites (the test double follows as runWeeklySkillOptMock). No behavior change.
|
@CodeRabbit when the weekly check is called? does weekly check mean that we check only the skills older than 7 days or do we check all of them? |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
fd -e ts skillopt-trigger src/skillify/ | xargs cat
echo "---"
fd -e ts skills-active src/skillify/ | xargs cat
echo "---"
fd -e ts skillopt-worker src/skillify/ | xargs catLength of output: 16030
1. When is the weekly check triggered?
// Fires if: no lastRun recorded, OR elapsed >= 7 days
return nowMs - last >= WEEK_MS; // WEEK_MS = 7 * 24 * 60 * 60 * 1000The last-run timestamp is stamped in There is also a cross-process lock ( 2. "Weekly check" = skills older than 7 days, or all of them?All of them. The 7-day interval governs how often the SkillOpt worker runs, not which skills are checked. When the worker loop is fully implemented (it is currently a scaffold that only logs and exits), it would evaluate all installed org-shared skills for deficiency on each run. 3. Where is the skills list saved?Two separate places:
4. Are skills not created by hivemind modified?No. const meta = pulled.get(e.name);
if (!meta) continue; // not pull-managed (local-only / unrelated) → excludedFurthermore, the current worker body does not modify anything at all — it is a scaffold that just logs 🧠 Learnings used |
A detached hook worker may not have claude on PATH (e.g. ~/.claude/local/claude);
spawn("claude") would ENOENT and the callers swallow it as no-change, so the weekly
worker silently produced no proposals in those layouts. Resolve the binary the same
way the rest of skillify does (findAgentBin('claude_code')).
runWeeklySkillOpt stamped lastRun before the worker checks creds, so firing at a logged-out SessionStart consumed the week — a user who logged in minutes later was skipped as throttled until next week. Added a canFire gate (default: creds present) that returns reason 'no-creds' before any stamp/spawn. Test covers it.
… (codex P2) Two follow-ons from round 6: - claude-model: --tools "" denies tool USE but claude -p still LOADS the user's MCP config; a broken/oversized MCP schema fails every judge/proposer call before JSON. Added --strict-mcp-config to ignore user MCP entirely. - trigger: the canFire gate used the credentials FILE, but the worker uses loadConfig() (which also accepts HIVEMIND_TOKEN/ORG_ID env creds). Mismatch skipped env-cred users forever / stamped on a token-only malformed file. Now both use loadConfig()?.token.
The weekly worker's lookback + detector thresholds are now overridable via env —
HIVEMIND_SKILLOPT_{LOOKBACK_DAYS,MIN_INVOCATIONS,FAILURE_RATE,FIRE_THRESHOLD} — with
the same defaults (30d, detector min-n, 0.4 rate, >=5 fire). A positive override
wins; non-numeric/<=0 falls back. Useful for tuning + smaller-data testing.
feat(skillopt): deficiency-detection + edit-proposer engine (stacked on #230)
…ocal disk
Detection is org-wide (live sessions query) but the cycle read the skill body
from local disk (readSkillBodyViaManifest), so a deficient org skill the worker's
machine doesn't have installed could never be improved — detection scope did not
match improvement scope.
Add readSkillBodyFromOrgTable: SELECT body, version FROM skills WHERE name/author
ORDER BY version DESC LIMIT 1 (latest wins, mirroring pull). The table's body
column is already frontmatter-less, which is exactly what the proposer consumes.
The cycle's readSkillBody dep is now async returning { body, version }; baseVersion
is threaded into the proposal so the upcoming publish step can land the edit as
version+1. The worker now reads from Deeplake; the local-disk manifest path is gone.
Tests: org-table reader (latest-version read + null/empty-body branches); existing
cycle tests updated to the new reader shape.
… in the org table The optimization loop is Deeplake-centric: read the body from the skills table (prior commit), improve it, and now publish the result straight back as a new version. No approval gate by design — detect → improve → publish. skill-org-publish.ts: - readCurrentSkillRow: fetch the FULL latest-version row so a republish preserves every field it isn't changing (parses the JSON contributors/source_sessions). - publishImprovedSkill: INSERT version+1 via insertSkillRow (append-only), with scope promoted to `team`, the triggering user + a `skillopt` marker appended to contributors (original author kept first, deduped), name/author unchanged. The skill keeps its original --author identity; the loop is recorded as a contributor, mirroring cross-author MERGE provenance (#118/#125). Concurrency: caller reads `current` fresh right before publish; a racing writer at worst yields two rows at one version — both preserved, readers take the latest. Tests: latest-version read + JSON parsing + legacy/empty-column tolerance; INSERT asserts version+1, scope=team, new body present / old body absent, contributor append+dedup, and the [author]-seed for legacy rows with no contributors.
…oposal-writing) The cycle's terminal step is now a DIRECT publish, not a review proposal. Rename the `writeProposal` dep to `publish` and await it; the worker's publish reads the full current row (readCurrentSkillRow) and lands the improved body as version+1 in the org skills table (scope→team, +collaborator/skillopt contributors), so teammates re-pull it. Drop writeProposalToDisk and the local proposals dir — there is no review/approval gate by design (detect → improve → publish). Tests: engine cycle updated to the publish dep; full skillopt sweep green (29).
…blish The E2E test surfaced a read-after-write disagreement: the cycle read the body once for the proposer, then the publisher read the row AGAIN — and under Deeplake eventual consistency the two reads could disagree, so the proposer would edit a skill the publisher then reported "not in table", wasting the LLM call and silently skipping the publish. Fix + simplification: readSkill now returns the FULL current row, and the cycle threads that SAME row to publish — one read, no second query, no disagreement. Drop the now-redundant readSkillBodyFromOrgTable (readCurrentSkillRow supersedes it). Tests updated to the readSkill (full row) + publish(current, rec) shape.
… own agent
The judge/proposer defaulted to claude (success-judge haiku, proposer sonnet), so a
codex/hermes/cursor/pi user with no local `claude` got nothing. Add agent-model.ts:
agentModel({agent, role}) dispatches to each agent's CLI in its safest NO-TOOLS mode
(claude `--tools ""`, codex `-s read-only --skip-git-repo-check`, hermes `-z` via
provider, cursor/pi `--print`) — the scorer feeds UNTRUSTED transcript text into the
prompt, so no tool use is allowed. Downstream parsers already extract JSON from
arbitrary text, so each agent only emits JSON in stdout — no per-agent output schema.
Model is per-agent+role with env overrides. detectScorerAgent resolves the host agent
(explicit HIVEMIND_SKILLOPT_AGENT override, then claude/codex env signatures, default
claude_code).
Worker scores on the detected agent. spawn injectable → per-agent argv + parsing
unit-tested with zero real CLI calls.
Tests: agent-model (11) — each agent's safe args, output parsing, env override,
non-zero exit, detection; full skillopt sweep green (57).
…der configurable Two issues the real codex/hermes CLI test surfaced: 1. findAgentBin's static path list misses nvm/volta/fnm installs — codex lives at ~/.nvm/.../bin/codex, so the worker ENOENT'd. The worker now resolves the agent bin via `command -v` (mirrors the wiki spawn helpers) and passes it to agentModel; undefined falls back to findAgentBin. agent-model.ts stays PATH-free for the openclaw bundle; the standalone worker can use `command -v`. 2. hermes provider was hardcoded to openrouter — wrong for a user on AWS Bedrock (or any other provider). Add HIVEMIND_SKILLOPT_<AGENT>_PROVIDER override. Verified: codex scores correctly through agentModel (read-only sandbox → clean JSON). hermes stays real-CLI-unverified (this machine has no hermes model configured). Tests: provider-override case (12 agent-model tests).
…ocument hermes Bedrock model ids hermes verified end-to-end on AWS Bedrock through agentModel (real verdict). The debug surfaced two robustness gaps: - hermes on a dead/legacy Bedrock model exits 0 with EMPTY stdout, swallowing the provider error to a dump file. agentModel now rejects on empty stdout so a misconfigured scorer surfaces loudly; callers already swallow it safely (judge stays conservative, proposer reports a failed call) instead of silently resolving "" into an unparseable verdict / no edits forever. - the hermes default model is openrouter-style and invalid on other providers; documented that Bedrock (etc.) users must set HIVEMIND_SKILLOPT_HERMES_PROVIDER + _MODEL — Bedrock needs an inference-profile id (us.anthropic.claude-haiku-4-5-...). Tests: empty-exit-0 rejection (13 agent-model tests).
…erence-profile id) pi verified end-to-end through agentModel (real verdict). Like hermes, pi's google/ gemini default needs a Google API key; a user on another provider (e.g. AWS Bedrock) must set HIVEMIND_SKILLOPT_PI_PROVIDER + _MODEL. cursor's dispatch flags are verified correct (--print --model --force --output-format text) — it was only auth-blocked. A wrong default now fails loudly via the exit-code / empty-output guards.
… not weekly The event trigger's core: replace the weekly time-throttle with accumulation. recordAnchored folds free-anchor-detected pushbacks into a per-skill count (deduped by sessionId#index so repeated UserPromptSubmit firings don't double-count); when a skill crosses the fire threshold (default 5, HIVEMIND_SKILLOPT_FIRE_COUNT) the worker fires for the org cycle and that skill's count resets. Cheap by design — the increment is a no-LLM pure state update; only the rare crossing spawns the worker. Tests: accumulate-below / fire-and-reset / dedup / independent skills / custom threshold / malformed-skip / tally-restart-after-fire (8).
…org-skill invocations #4b: anchoredOrgSkillsInSession reads one session's turns from Deeplake (capture normalises every agent into the sessions table, so multi-agent for free), locates each org-skill Skill invocation in a single pass, windows it, and runs the FREE heuristic anchor (no LLM) to find which skills the user pushed back on. Each anchored invocation gets a stable key sessionId#ordinal so the counter dedups it across repeated UserPromptSubmit/SessionEnd firings. Exports parseMessage/ParsedMsg from skill-invocations for reuse. Tests: anchored / satisfied / two-invocations-distinct-keys / org-skills-only / no-pre-invocation-misattribution / empty-session-id (6).
#4c-1: runEventTick is the cheap no-LLM step that runs (detached) on UserPromptSubmit/ SessionEnd — scan the session for pushed-back org skills (free anchor) -> fold into the per-skill counter -> report which skills crossed the threshold. The caller runs the paid org cycle only when something fired, so the per-turn path stays free. Replaces the weekly time-throttle: firing is driven by accumulated pushback, not the calendar. Adds atomic load/saveCounterState under the HIVEMIND_STATE_DIR-aware skillopt/counter.json. Tests: no-pushback no-op / record-below-threshold / cross-threshold-across-ticks / empty-session no-op (4); persistence round-trip + torn-file tolerance (2).
…cle only on fire #4c-2: when spawned with HIVEMIND_SKILLOPT_SESSION (the hook path), the worker runs the event tick first — scan that session for pushed-back org skills (free anchor), fold into the per-skill counter — and only proceeds to the paid org cycle when a skill crossed the fire threshold; otherwise it exits cheaply. Under event-firing the engine's >=N-deficient- SKILLS gate is redundant (the counter is the pattern signal), so fireThreshold defaults to 1 in event mode. A bare run (no session) stays a manual invocation with the engine default.
… improve if it failed Redesign to immediate + targeted (not accumulate-then-org-sweep). When an org skill X was invoked and the user reacts, judge X's window — the LLM is the only evaluator now, since a regex can't catch 'you fucked up again' — with the just-submitted reaction appended (it lags Deeplake, so the hook passes it in). If the judge says FAILED, read X's current body, propose a bounded edit, publish v+1 right then. meta-dedup stops re-publishing the same edit when the next reaction re-judges the same window. findLatestInvocation targets the invocation the user is reacting to. Reuses the existing judge / proposer / publisher / multi-agent scorer / windowing / meta. Tests: located-invocation, failed->improve+publish, ok->noop, not-org-skill, not-in-session, skill-absent, dedup (8).
…rg-wide cycle Per the redesign (immediate, K=3, LLM-judge-as-detector): an org skill call opens a K-message judgment window; the user's next reactions fire the worker, which judges that ONE skill and improves it on the first confirmed failure. No regex anchor, no per-skill counter, no org-wide all-skills scan, no session-end wait. - skillopt-trigger: markSkillPending (open a K=3 window on org-skill PreToolUse) + runEventTrigger(session, reaction) (fire per reaction, decrement the budget). Spawns the worker with SESSION/SKILL/REACTION/AGENT env. - skillopt-worker: targeted — reads the env inputs and calls improveSkillIfFailed. - hooks: PreToolUse -> markSkillPending; UserPromptSubmit (capture) -> runEventTrigger with the prompt as the reaction; SessionEnd skillopt call removed (fires on the reaction now). - remove the obsolete org-wide/accumulate machinery: skillopt-engine (runSkillOptCycle), skillopt-counter, skillopt-tick, skillopt-session-scan, deficiency-detector, skill-edit-gate, session-anchor (the dropped regex) + their tests. Reuses judge / proposer / publisher / multi-agent scorer / windowing / meta-dedup. 80 skillopt tests green; tsc clean.
Extract armSkillOptOnSkillUse (PreToolUse → open a window on an ORG Skill tool_use) and reactSkillOpt (UserPromptSubmit → fire the worker to judge the pending skill against the prompt, on the user's agent) into src/hooks/shared/skillopt-hook.ts, so each agent's hook wires SkillOpt with one swallowed call instead of copy-pasting. The base claude PreToolUse + capture hooks now use it. NOTE: per-agent (codex/hermes/cursor) trigger wiring is deliberately NOT done here — those agents don't surface org-skill invocation as a uniform 'Skill' tool_use (codex=command, hermes=terminal, cursor=Shell), so arming needs per-agent skill-invocation detection first. Tests: helper guards (Skill-only, disabled, no-prompt, internal-worker) — 6.
…odex P2) Codex flagged that markSkillPending/runEventTrigger load-modify-overwrite a single shared skillopt/state.json, so two concurrent sessions arming/reacting can clobber each other's pending entry (the later save drops the other session's). Store pending state PER SESSION (one small file each, sessionId sanitised to a safe filename) behind an injectable PendingStore — two sessions now touch different files, so neither can drop the other's window. Tests: a cross-session no-clobber case + the store-injected suite (trigger + hook, 16).
…ex P2) Codex flagged that concurrent workers can each read the same current row and publish version+1, creating duplicate version rows whose visible body is nondeterministic. - The K=3 reactions spawn up to K workers for the SAME skill (dominant case): wrap improveSkillIfFailed in a per-skill tryAcquireWorkerLock so only one reads-and-publishes at a time; the loser skips, and a later worker re-reads the now-improved skill (meta-dedup no-op). Removes the same-machine duplicate-version race. - readCurrentSkillRow now orders by version DESC, created_at DESC — so a cross-machine race that lands equal versions resolves the SAME row deterministically. Append-only history preserves every version; a matching pull-side tie-breaker is a foundation-level follow-up.
…nv, post-publish meta (coderabbit) - skillopt-hook: skip empty/whitespace reactions (don't spend the judgment budget / spawn a worker on a non-reaction). - agent-model: fail fast when a hermes/pi provider is overridden WITHOUT a model — the default model id is provider-specific, so a bare _PROVIDER=bedrock would silently send a wrong id. - agent-model: spawn the child with the injected env (not just global process.env), so a caller-scoped cred/config reaches it. - skillopt-improve: swallow a recordEdit (meta) failure AFTER a successful publish — reporting failure there would drop the dedup marker and invite a re-publish. - agent-model tests: provider override now paired with a model; add fail-fast + env-propagation cases.
The redesign removed the weekly runWeeklySkillOpt call from session-start (SkillOpt is event-driven now via PreToolUse/UserPromptSubmit), but session-start-hook.test.ts still asserted the old fired/swallowed branches and mocked the removed runWeeklySkillOpt export — failing CI's full test run. Remove the two obsolete tests + the orphaned mock.
…by tool_use_id (codex P2) Two data-integrity findings on the event path: - markSkillPending was a syntax-only name--author check, so a LOCAL skill named like an org skill could arm and get the shared org-table row of that name edited. Gate on the pull manifest (authoritative org source) — only arm for actually-pulled org skills. - the worker resolved the 'latest' invocation at query time, so a quick re-invocation of the same skill could make it judge the wrong window. Capture tool_use_id at PreToolUse, thread it arm -> pending -> worker, and findInvocation now pins THAT invocation (falling back to latest if it isn't captured yet). Tests: manifest-gate (local-shadow rejected), tool_use_id stored + passed + pinned-lookup.
…ts (efenocchi) @efenocchi flagged that the SkillOpt env-var string literals were duplicated between the trigger (which writes them onto the spawned worker's env) and the worker (which reads them back), plus the agent-model overrides — a typo in one place would silently break the hand-off. Introduce src/skillify/skillopt-env.ts as the single source of truth: SKILLOPT_ENV constants for the fixed names (DISABLED/WORKER/SESSION/SKILL/REACTION/TOOL_USE_ID/AGENT/JUDGE_WINDOW) and modelEnvNames()/providerEnvName() helpers for the dynamic per-agent <AGENT>_<ROLE>_MODEL / <AGENT>_PROVIDER overrides. All four call sites now reference it; only doc comments keep the literal names (intentional — they document the knobs for users). No behavior change. tsc clean (pre-existing tree-sitter errors only); 47 skillopt unit tests green.
…agent SkillOpt: Deeplake-centric direct-publish + multi-agent scorer + targeted event trigger
| const AGENT_CMD: Record<string, string> = { claude_code: "claude", codex: "codex", cursor: "cursor-agent", hermes: "hermes", pi: "pi" }; | ||
| function resolveAgentBin(agent: string): string | undefined { | ||
| try { | ||
| const p = execSync(`command -v ${AGENT_CMD[agent] ?? agent}`, { encoding: "utf8", stdio: ["ignore", "pipe", "ignore"] }).trim(); |
What this adds
The measurement foundation for SkillOpt (auto-improving org-shared skills), plus the auto-fire wiring.
1. Skill attribution (real, default-on) —
src/skillify/skills-active.ts+ SessionStart wiring.Records, per session, which org-shared skills (
<name>--<author>) were in context + their version + a deterministic A/B bucket, as askills_activerow in the sessions table. This is the prerequisite that makes skill value measurable at all — with-skill vs without, or v1 vs v2. Local-only skills excluded. Opt-out:HIVEMIND_SKILL_ATTRIBUTION=0. Embedding NULL (no daemon round-trip).2. Weekly auto-fire trigger + worker (mechanism real; loop body a scaffold) —
skillopt-trigger.ts,skillopt-worker.ts.At SessionStart, a once-a-week throttle spawns a detached, non-blocking worker that runs the SkillOpt loop using the user's own agent (
claude -p/codex→ no org API key, cost on the user; mirrors the skillify worker). StampslastRunbefore spawning (no hot-loop). Opt-out:HIVEMIND_SKILLOPT_DISABLED=1; recursion guardHIVEMIND_SKILLOPT_WORKER=1.Honest status — what's real vs not
Why merge now
Attribution is the prerequisite for everything downstream and is independently safe/useful (starts collecting the data the detector needs). The auto-fire scaffold establishes the wiring + opt-outs.
Tests
20 unit tests: attribution (shape+count, both branches, negative patterns — won't masquerade as a captured turn, exactly one INSERT, version parsing) and auto-fire (fires weekly/first-run/8d, throttles <7d, stamps-before-spawn, opt-out, recursion guard).
tsc --noEmitclean for the changed files.Not included
The throwaway research spike (kept out of tree); the deficiency detector; the productionized loop body; canary-publish + monitor/auto-revert. Those are follow-ups gated on this landing.
Summary by CodeRabbit