Skip to content

SkillOpt foundation: skill attribution + weekly auto-fire scaffold#230

Merged
kaghni merged 61 commits into
mainfrom
feat/skillopt-foundation
Jun 6, 2026
Merged

SkillOpt foundation: skill attribution + weekly auto-fire scaffold#230
kaghni merged 61 commits into
mainfrom
feat/skillopt-foundation

Conversation

@kaghni
Copy link
Copy Markdown
Collaborator

@kaghni kaghni commented Jun 4, 2026

What this adds

The measurement foundation for SkillOpt (auto-improving org-shared skills), plus the auto-fire wiring.

1. Skill attribution (real, default-on)src/skillify/skills-active.ts + SessionStart wiring.
Records, per session, which org-shared skills (<name>--<author>) were in context + their version + a deterministic A/B bucket, as a skills_active row in the sessions table. This is the prerequisite that makes skill value measurable at all — with-skill vs without, or v1 vs v2. Local-only skills excluded. Opt-out: HIVEMIND_SKILL_ATTRIBUTION=0. Embedding NULL (no daemon round-trip).

2. Weekly auto-fire trigger + worker (mechanism real; loop body a scaffold)skillopt-trigger.ts, skillopt-worker.ts.
At SessionStart, a once-a-week throttle spawns a detached, non-blocking worker that runs the SkillOpt loop using the user's own agent (claude -p/codex → no org API key, cost on the user; mirrors the skillify worker). Stamps lastRun before spawning (no hot-loop). Opt-out: HIVEMIND_SKILLOPT_DISABLED=1; recursion guard HIVEMIND_SKILLOPT_WORKER=1.

Honest status — what's real vs not

  • ✅ Attribution recorder: real, tested, ready to collect data on merge.
  • ✅ Auto-fire trigger/throttle/spawn: real, tested (deterministic).
  • ⚠️ The worker's loop body is a documented scaffold (logs + exits). The loop engine (rollout → optimizer → gate) was prototyped and validated out-of-tree, but wiring it into the worker needs two prerequisites: (a) this attribution deployed → real data for a deficiency detector (the "find a bad skill" trigger), and (b) a rollout sandbox. So on merge, the trigger fires weekly but the worker is a safe no-op until the body lands.

Why merge now

Attribution is the prerequisite for everything downstream and is independently safe/useful (starts collecting the data the detector needs). The auto-fire scaffold establishes the wiring + opt-outs.

Tests

20 unit tests: attribution (shape+count, both branches, negative patterns — won't masquerade as a captured turn, exactly one INSERT, version parsing) and auto-fire (fires weekly/first-run/8d, throttles <7d, stamps-before-spawn, opt-out, recursion guard). tsc --noEmit clean for the changed files.

Not included

The throwaway research spike (kept out of tree); the deficiency detector; the productionized loop body; canary-publish + monitor/auto-revert. Those are follow-ups gated on this landing.

Summary by CodeRabbit

  • New Features
    • Weekly SkillOpt auto-triggers at session start with cross-process locking, non-blocking detached execution, and swallowed spawn failures.
    • Session start now conditionally records active org-shared skills into a separate attribution namespace with deterministic A/B bucketing; attribution is env-gated and won’t fail session start if it errors.
  • Chores
    • Background worker artifact included in the build.
  • Tests
    • Added comprehensive tests for attribution, weekly throttle/lock behavior, opt-out and worker-guard cases.

Foundation for measuring and improving org-shared skills.

- skills-active.ts: record which org-shared skills (+versions) were in context per session at
  SessionStart, with a deterministic A/B bucket. The label that makes skill value measurable
  (with/without skill X, or v1-vs-v2). Org skills only; default-on (opt-out HIVEMIND_SKILL_ATTRIBUTION=0).
- skillopt-trigger.ts + skillopt-worker.ts: weekly, user-side, SessionStart-triggered, detached,
  non-blocking auto-fire of the SkillOpt loop using the user's own agent (no org key). Trigger is
  fully tested; the worker's loop body (detect->optimize->gate->publish) is a documented scaffold
  pending deployed attribution + a rollout sandbox. Opt-out HIVEMIND_SKILLOPT_DISABLED=1.
- session-start.ts: wire both in (non-blocking, swallowed), mirroring the skillify worker pattern.
- tests: 20 (11 attribution shape/branches/negative-patterns + 9 auto-fire throttle/spawn/guards).
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 4, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds session-start skill attribution (enumerates org skills, computes session bucket, builds a /skills_active JSON payload and INSERT SQL) and a persisted weekly SkillOpt trigger that stamps lastRun and spawns a detached worker; integrates both into SessionStart and includes tests and a build entry.

Changes

SkillOpt Attribution and Trigger

Layer / File(s) Summary
Skills active enumeration and SQL generation
src/skillify/skills-active.ts, tests/shared/skills-active.test.ts
Read org-shared skill directories and SKILL.md versions, compute deterministic session bucket, build a /skills_active/... path, and generate a single INSERT statement embedding a skills_active JSONB payload with skills, skills_count, and ab_bucket.
Weekly SkillOpt trigger, worker, tests, build
src/skillify/skillopt-trigger.ts, src/skillify/skillopt-worker.ts, tests/shared/skillopt-trigger.test.ts, esbuild.config.mjs
Persistent lastRun state under state dir, pure shouldFire predicate (defaults to WEEK_MS), maybeFireSkillOpt() decision API with optional cross-process lock, atomic state save, stamps lastRun before spawning, spawns detached Node worker with recursion guard, minimal worker scaffold, tests covering throttle/fire/disabled/recursion, and build hook to bundle the worker.
Session-start hook integration and tests
src/hooks/session-start.ts, tests/claude-code/session-start-hook.test.ts
Moves auto-pull earlier; when capture placeholder is created and HIVEMIND_SKILL_ATTRIBUTION !== "0", lists active skills, builds session path and INSERT SQL, runs api.query and logs/swallow errors; separately calls maybeFireSkillOpt() non-blocking and logs the fire/skip decision while swallowing errors.

Sequence Diagram(s)

sequenceDiagram
  participant SessionStart
  participant Skillify
  participant FileSystem
  participant DB
  participant WorkerSpawner
  SessionStart->>Skillify: listActiveOrgSkills(sessionId)
  Skillify->>FileSystem: read SKILL.md, manifest
  Skillify->>DB: buildSkillsActiveInsert(...) -> INSERT
  SessionStart->>Skillify: maybeFireSkillOpt()
  Skillify->>FileSystem: loadState() / saveState()
  Skillify->>WorkerSpawner: spawn detached skillopt-worker.js (if fired)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

  • activeloopai/hivemind#112: Modifies SessionStart hook wiring around skill auto-pull; related to auto-pull placement and skill attribution changes.

Suggested reviewers

  • efenocchi

Poem

I count the skills by moonlit track,
I stamp each session, then trot back,
Once a week I nudge a worker to wake,
It hums, it logs, then quiets for my sake,
🐰✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 58.82% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The description is comprehensive, covering objectives, implementation status, testing, and honest caveats. However, the Version Bump section of the template is not addressed—no version bump is indicated. Explicitly state whether a version bump is needed (add version bump to package.json if this is a release) or confirm no release is required for this change.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main changes: skill attribution infrastructure plus the weekly auto-fire trigger scaffold, which are the two core components of this PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/skillopt-foundation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot requested a review from efenocchi June 4, 2026 08:15
@kaghni
Copy link
Copy Markdown
Collaborator Author

kaghni commented Jun 4, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 4, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/skillify/skillopt-trigger.ts`:
- Around line 68-79: maybeFireSkillOpt currently does loadState → shouldFire →
saveState → spawnWorker without any cross-process locking which allows races;
wrap the critical section (loadState, shouldFire check, saveState, spawnWorker)
in a cross-process lock to make the weekly decision atomic. Add dependency hooks
(e.g., deps.lockAcquire/deps.lockRelease or reuse an existing locker) and
acquire the lock before calling loadState, re-check shouldFire after acquiring
the lock (to handle races), perform saveState and spawnWorker only if still
allowed, then release the lock; if the lock cannot be acquired within a short
timeout, return { fired: false, reason: "locked" } to avoid duplicate spawns.
Ensure the function references are maybeFireSkillOpt, loadState, shouldFire,
saveState, and spawnWorker so reviewers can find the change.
- Around line 22-23: STATE_FILE currently hardcodes the state path using
os.homedir(), which ignores HIVEMIND_STATE_DIR; change this to derive the state
directory from the shared resolver by calling getStateDir() and use
path.join(getStateDir(), "skillopt", "state.json") for STATE_FILE, and
add/import getStateDir from the shared state resolver module so the code honors
HIVEMIND_STATE_DIR (including empty/whitespace semantics) for test/custom
setups.

In `@src/skillify/skills-active.ts`:
- Line 62: The current sort on "out" uses only a.name.localeCompare(b.name),
causing nondeterministic order when names tie; update the comparator in the sort
call that returns out to use a secondary key (e.g., compare a.author (or
a.namespace/id) when a.name === b.name) so ties are resolved deterministically
and attribution payloads serialize consistently; locate the sort expression
"out.sort((a, b) => a.name.localeCompare(b.name))" and replace it with a
comparator that first compares name, then author (or another unique field) as
the tiebreaker.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d43c7c33-2615-4c68-8f6a-306535177eb8

📥 Commits

Reviewing files that changed from the base of the PR and between 6d63135 and 6488026.

📒 Files selected for processing (6)
  • src/hooks/session-start.ts
  • src/skillify/skillopt-trigger.ts
  • src/skillify/skillopt-worker.ts
  • src/skillify/skills-active.ts
  • tests/shared/skillopt-trigger.test.ts
  • tests/shared/skills-active.test.ts

Comment thread src/skillify/skillopt-trigger.ts Outdated
Comment thread src/skillify/skillopt-trigger.ts Outdated
Comment thread src/skillify/skills-active.ts Outdated
kaghni added 2 commits June 4, 2026 08:33
codex flagged that the SessionStart trigger spawns skillopt-worker.js by
resolving it relative to its own bundle dir (import.meta.url), but the worker
was never added to esbuild.config.mjs. In a packaged install that path
(claude-code/bundle/skillopt-worker.js) would not exist, so the weekly fire
would stamp lastRun, launch node on a missing file, exit MODULE_NOT_FOUND, and
suppress retries for a week.

Add the worker as a Claude Code bundle entry (only the CC session-start hook
calls maybeFireSkillOpt, so only this bundle needs it). Verified: build now
emits 15 CC bundles incl. claude-code/bundle/skillopt-worker.js, and the
bundled worker runs to exit 0 under the recursion-guard env.
codex flagged that the skills_active attribution row was inserted under the
canonical /sessions/ path (buildSessionPath). The summary / raw-transcript
readers select WHERE path LIKE '/sessions/%', so they would treat the
attribution row as transcript content: for an otherwise-empty session,
SessionEnd would summarize only this JSON instead of skipping; for a normal
session it could pollute the summary with the active-skill list.

Add buildSkillsActivePath() which namespaces attribution rows under
/skills_active/ (never /sessions/ or /summaries/), and switch session-start.ts
to it. Measurement queries select by message content (type = "skills_active"),
not by path, so the distinct prefix is free. New unit test asserts the path is
not under /sessions/; 22/22 shared tests green.
@kaghni
Copy link
Copy Markdown
Collaborator Author

kaghni commented Jun 4, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 4, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/shared/skills-active.test.ts (1)

94-94: ⚡ Quick win

Align test data with the real session path format.

The sessionPath in the test uses /sessions/..., but the actual code (per session-start.ts:239) passes the result of buildSkillsActivePath(), which returns /skills_active/.... While buildSkillsActiveInsert doesn't validate the path format and the test correctly verifies INSERT generation, using a realistic /skills_active/ path would make the test data match production usage and avoid confusion for future maintainers.

♻️ Proposed fix to use realistic path
-    sessionPath: "/sessions/kamo/kamo_activeloop_hivemind_S1.jsonl",
+    sessionPath: "/skills_active/kamo/kamo_activeloop_default_S1.json",
     filename: "kamo_activeloop_hivemind_S1.jsonl",
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/shared/skills-active.test.ts` at line 94, Update the test data to use
the real session path format returned by buildSkillsActivePath instead of
"/sessions/..."; specifically change the sessionPath string in the test used by
buildSkillsActiveInsert to a "/skills_active/..." style path so the test matches
production usage (reference: buildSkillsActivePath, buildSkillsActiveInsert, and
the sessionPath test value).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/shared/skills-active.test.ts`:
- Line 94: Update the test data to use the real session path format returned by
buildSkillsActivePath instead of "/sessions/..."; specifically change the
sessionPath string in the test used by buildSkillsActiveInsert to a
"/skills_active/..." style path so the test matches production usage (reference:
buildSkillsActivePath, buildSkillsActiveInsert, and the sessionPath test value).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 4b0cd5ad-2cb9-43d4-9405-fce83075cf84

📥 Commits

Reviewing files that changed from the base of the PR and between 6488026 and 1c79a5e.

📒 Files selected for processing (4)
  • esbuild.config.mjs
  • src/hooks/session-start.ts
  • src/skillify/skills-active.ts
  • tests/shared/skills-active.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/hooks/session-start.ts
  • src/skillify/skills-active.ts

…nches (CI)

CI "Typecheck and Test" failed: the skill-attribution INSERT added one query
call mid-sequence in session-start.ts, which broke 5 existing
session-start-hook.test.ts assertions (call counts + the positionally-mocked
rules/goals renderer responses shifted by one). The attribution feature's own
commit never updated these tests.

- Update the 5 placeholder-branching tests to account for the attribution
  INSERT at call-index 2, asserting its shape (contains "skills_active"), not
  just the bumped count.
- Mock src/skillify/skillopt-trigger.js at the boundary so maybeFireSkillOpt no
  longer really spawns a detached worker + writes a state file during unit
  tests (a real side effect), and so both ternary branches + the swallow-catch
  are covered.
- Add tests: HIVEMIND_SKILL_ATTRIBUTION=0 skips the write (negative assertion),
  a failed attribution write is swallowed (content-based mock, order-robust),
  the SkillOpt fired branch logs, and a throwing trigger is swallowed.
- Reset HIVEMIND_SKILL_ATTRIBUTION in runHook (matching HIVEMIND_WIKI_WORKER /
  HIVEMIND_CAPTURE) so the =0 test can't leak into later tests.
- Replace the dead `pop() ?? ""` branch with a branchless slice for the
  attribution filename.

session-start.ts branch coverage 93.54% (≥90 threshold); skills-active.ts 100%;
107/107 session-start + shared tests green.
@kaghni
Copy link
Copy Markdown
Collaborator Author

kaghni commented Jun 4, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 4, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

Coverage Report

Scope: files changed in this PR. Enforced threshold: 90% per metric (per file via vitest.config.ts).

Status Category Percentage Covered / Total
🟢 Lines 90.26% (🎯 90%) 751 / 832
🔴 Statements 85.59% (🎯 90%) 885 / 1034
🔴 Functions 80.58% (🎯 90%) 112 / 139
🔴 Branches 81.52% (🎯 90%) 688 / 844
File Coverage — 18 files changed
File Stmts Branches Functions Lines
src/hooks/capture.ts 🟢 97.5% 🔴 83.3% 🟢 100.0% 🟢 100.0%
src/hooks/pre-tool-use.ts 🟢 94.5% 🔴 88.5% 🟢 94.4% 🟢 94.2%
src/hooks/session-end.ts 🟢 97.7% 🟢 90.9% 🟢 100.0% 🟢 100.0%
src/hooks/session-start.ts 🟢 100.0% 🟢 96.3% 🟢 100.0% 🟢 100.0%
src/hooks/shared/skillopt-hook.ts 🟢 100.0% 🟢 100.0% 🟢 100.0% 🟢 100.0%
src/skillify/agent-model.ts 🟢 92.4% 🔴 88.1% 🔴 88.9% 🟢 100.0%
src/skillify/claude-model.ts 🔴 0.0% 🔴 0.0% 🔴 0.0% 🔴 0.0%
src/skillify/skill-edits.ts 🔴 81.3% 🔴 72.2% 🟢 100.0% 🟢 98.2%
src/skillify/skill-invocations.ts 🟢 92.1% 🔴 82.8% 🟢 100.0% 🟢 98.4%
src/skillify/skill-org-publish.ts 🟢 92.3% 🔴 78.1% 🟢 100.0% 🟢 100.0%
src/skillify/skill-proposer.ts 🟢 92.3% 🔴 86.2% 🔴 80.0% 🟢 96.8%
src/skillify/skill-publisher.ts 🟢 100.0% 🟢 100.0% 🟢 100.0% 🟢 100.0%
src/skillify/skillopt-env.ts 🟢 100.0% 🟢 100.0% 🟢 100.0% 🟢 100.0%
src/skillify/skillopt-improve.ts 🟢 95.9% 🔴 86.5% 🟢 100.0% 🟢 100.0%
src/skillify/skillopt-meta.ts 🟢 100.0% 🔴 88.2% 🟢 100.0% 🟢 100.0%
src/skillify/skillopt-trigger.ts 🔴 59.0% 🔴 66.7% 🔴 45.5% 🔴 61.9%
src/skillify/skillopt-worker.ts 🔴 0.0% 🔴 0.0% 🔴 0.0% 🔴 0.0%
src/skillify/success-judge.ts 🟢 96.2% 🔴 83.3% 🟢 100.0% 🟢 100.0%

Generated for commit b4d5d72.

…rt (coderabbit)

Three CodeRabbit findings on the SkillOpt foundation files:

1. (Major, skillopt-trigger.ts) Hardcoded ~/.deeplake/state/skillopt path
   ignored HIVEMIND_STATE_DIR and was captured at import time (a module-level
   const grabs the real home before any test isolation). Route the state file
   through the shared getStateDir() resolver, computed lazily via a stateFile()
   helper so HIVEMIND_STATE_DIR / HOME swaps take effect.

2. (Major, skillopt-trigger.ts) The weekly fire decision (load→shouldFire→
   save→spawn) had no cross-process lock, so two SessionStart hooks racing at
   the week boundary could both spawn workers (doubling user-side cost once the
   worker does real LLM work). Wrap the fire in the repo's atomic
   tryAcquireWorkerLock("skillopt-weekly") / releaseWorkerLock — openSync(wx)
   lets exactly one process win; the loser bails with reason "locked". The lock
   self-heals after maxAgeMs, well under the weekly cadence. Injectable via
   FireDeps.tryLock/releaseLock; new tests cover the locked + release branches
   without touching a real lock file. Also made saveState atomic (tmp+rename).

3. (Minor, skills-active.ts) listActiveOrgSkills sorted by name only, so two
   org skills sharing a name (different authors) could serialize in
   filesystem-entry order. Added author as a secondary sort key.

56/56 session-start + shared tests green; build emits the worker bundle.
@kaghni
Copy link
Copy Markdown
Collaborator Author

kaghni commented Jun 4, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 4, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

…ity, state path (codex)

Three codex P2 findings on the second review pass, all attribution-data accuracy:

1. (session-start.ts) The skills_active snapshot ran BEFORE autoPullSkills(), so
   a skill pulled/upgraded during the same SessionStart was recorded as
   stale/absent while the session could already use it. Moved autoPullSkills()
   above the attribution snapshot (latency unchanged — it was already awaited in
   the same hook) so the row reflects post-pull state.

2. (skills-active.ts) listActiveOrgSkills identified org skills by splitting the
   `<name>--<author>` dirname — but that pattern is a legitimate local naming
   style (e.g. deploy--blue-green), so it fabricated authors and wrote false
   positives into every attribution row. Now consults the pull manifest
   (loadManifest) — the authoritative record of what skillify pull installed —
   for identity; only manifest-recorded dirs present on disk are counted. This
   is the same reason unpull uses the manifest instead of the dirname.

3. (skillopt-trigger.ts) skillopt state.json sat directly in getStateDir(),
   which `hivemind skillify status` enumerates as a tracked project for every
   top-level .json (minus a small allowlist), surfacing a bogus project with
   undefined fields. Moved it into a getStateDir()/skillopt/ subdir — still
   HIVEMIND_STATE_DIR-aware, but a subdir isn't matched by the .json filter.

Tests: manifest-based listActiveOrgSkills cases incl. an explicit false-positive
guard (local `--` dir absent from the manifest is excluded). 109/109 session-start
+ shared tests green; session-start.ts branch 93.5%.
@kaghni
Copy link
Copy Markdown
Collaborator Author

kaghni commented Jun 4, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 4, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/skillify/skills-active.ts (1)

135-152: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use one UUID for both the row and the embedded message id.

Line 137 and Line 151 generate different UUIDs for the same logical skills_active record. That makes every stored row internally inconsistent and will break any consumer that correlates the table id with message.id.

Suggested fix
 export function buildSkillsActiveInsert(a: SkillsActiveInsertArgs): string {
+  const id = crypto.randomUUID();
   const entry = {
-    id: crypto.randomUUID(),
+    id,
     session_id: a.sessionId,
     cwd: a.cwd,
     hook_event_name: "SessionStart",
     timestamp: a.ts,
     type: "skills_active",
@@
   return (
     `INSERT INTO "${a.sessionsTable}" (id, path, filename, message, message_embedding, author, size_bytes, project, description, agent, plugin_version, creation_date, last_update_date) ` +
-    `VALUES ('${crypto.randomUUID()}', '${sqlStr(a.sessionPath)}', '${sqlStr(a.filename)}', '${jsonForSql}'::jsonb, NULL, '${sqlStr(a.userName)}', ` +
+    `VALUES ('${id}', '${sqlStr(a.sessionPath)}', '${sqlStr(a.filename)}', '${jsonForSql}'::jsonb, NULL, '${sqlStr(a.userName)}', ` +
     `${Buffer.byteLength(line, "utf-8")}, '${sqlStr(a.projectName)}', 'skills_active', 'claude_code', '${sqlStr(a.pluginVersion)}', '${a.ts}', '${a.ts}')`
   );
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/skillify/skills-active.ts` around lines 135 - 152, The INSERT uses two
different UUIDs: one created in the entry object (entry.id) and a new
crypto.randomUUID() in the VALUES clause, causing inconsistent ids between the
DB row and the embedded JSON message; fix buildSkillsActiveInsert by generating
a single id variable (use the same id when building entry and when composing the
VALUES string) and use the JSON-stringified entry (jsonForSql) so the row id and
the embedded entry.id are identical (update references around entry, jsonForSql,
and the VALUES clause to reuse that single id).
♻️ Duplicate comments (1)
src/skillify/skillopt-trigger.ts (1)

99-116: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Re-check the throttle after acquiring the lock.

loadState() and shouldFire() still run before tryLock(). A second SessionStart can read the old lastRun, block until the first process saves and releases, then acquire the lock and spawn anyway. The lock serializes spawns, but it still doesn't make the weekly decision atomic.

Suggested fix
 export function maybeFireSkillOpt(deps: FireDeps = {}): FireResult {
   const env = deps.env ?? process.env;
   // Default ON. Fires weekly for everyone; opt-out via HIVEMIND_SKILLOPT_DISABLED=1.
   if (env.HIVEMIND_SKILLOPT_DISABLED === "1") return { fired: false, reason: "disabled" };
   if (env.HIVEMIND_SKILLOPT_WORKER === "1") return { fired: false, reason: "in-worker" }; // recursion guard
   const now = deps.now ?? Date.now();
-  const state = deps.state ?? loadState();
-  if (!shouldFire(state.lastRun, now)) return { fired: false, reason: "throttled" };
 
   // Cross-process arbiter: two SessionStart hooks racing at the weekly boundary
   // could both pass the throttle and spawn duplicate workers (doubling user-side
   // cost once the worker does real LLM work). An atomic openSync(wx) worker-lock
@@
   const acquired = (deps.tryLock ?? (() => tryAcquireWorkerLock(LOCK_KEY)))();
   if (!acquired) return { fired: false, reason: "locked" };
   try {
+    const state = deps.state ?? loadState();
+    if (!shouldFire(state.lastRun, now)) return { fired: false, reason: "throttled" };
     (deps.save ?? saveState)({ ...state, lastRun: new Date(now).toISOString() });
     (deps.spawnWorker ?? spawnWorker)();
   } finally {
     (deps.releaseLock ?? (() => releaseWorkerLock(LOCK_KEY)))();
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/skillify/skillopt-trigger.ts` around lines 99 - 116, After acquiring the
worker lock you must re-check the throttle using the latest state to make the
decision atomic: call the same state loader (deps.state ?? loadState()) or
deps.loadState to fetch current state and run shouldFire(latest.lastRun, now)
again after the acquired lock (using deps.tryLock/tryAcquireWorkerLock and
LOCK_KEY), and if it returns false release the lock (deps.releaseLock ??
releaseWorkerLock) and return {fired:false, reason:"throttled"}; only if the
re-check passes should you call (deps.save ?? saveState) to update lastRun and
then (deps.spawnWorker ?? spawnWorker).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/skillify/skills-active.ts`:
- Around line 135-152: The INSERT uses two different UUIDs: one created in the
entry object (entry.id) and a new crypto.randomUUID() in the VALUES clause,
causing inconsistent ids between the DB row and the embedded JSON message; fix
buildSkillsActiveInsert by generating a single id variable (use the same id when
building entry and when composing the VALUES string) and use the
JSON-stringified entry (jsonForSql) so the row id and the embedded entry.id are
identical (update references around entry, jsonForSql, and the VALUES clause to
reuse that single id).

---

Duplicate comments:
In `@src/skillify/skillopt-trigger.ts`:
- Around line 99-116: After acquiring the worker lock you must re-check the
throttle using the latest state to make the decision atomic: call the same state
loader (deps.state ?? loadState()) or deps.loadState to fetch current state and
run shouldFire(latest.lastRun, now) again after the acquired lock (using
deps.tryLock/tryAcquireWorkerLock and LOCK_KEY), and if it returns false release
the lock (deps.releaseLock ?? releaseWorkerLock) and return {fired:false,
reason:"throttled"}; only if the re-check passes should you call (deps.save ??
saveState) to update lastRun and then (deps.spawnWorker ?? spawnWorker).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: ef1ec6f4-4a85-4a8c-8108-fd06127ddcca

📥 Commits

Reviewing files that changed from the base of the PR and between 1a8daa9 and ba9a7c3.

📒 Files selected for processing (5)
  • src/hooks/session-start.ts
  • src/skillify/skillopt-trigger.ts
  • src/skillify/skills-active.ts
  • tests/shared/skillopt-trigger.test.ts
  • tests/shared/skills-active.test.ts
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/shared/skillopt-trigger.test.ts
  • tests/shared/skills-active.test.ts
  • src/hooks/session-start.ts

@kaghni
Copy link
Copy Markdown
Collaborator Author

kaghni commented Jun 4, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 4, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

…odex)

Two codex P2 findings on the third review pass, both follow-ons from the prior
round's fixes:

1. (skillopt-trigger.ts) The cross-process worker lock still had a TOCTOU: both
   racing SessionStarts read `lastRun` BEFORE acquiring the lock, so the loser —
   after the winner stamped + released — would acquire the now-free lock and act
   on its stale snapshot, spawning a second worker. Now re-read state INSIDE the
   lock and re-check shouldFire before stamping/spawning; if a racing process
   already fired, bail with reason "throttled". Injectable via FireDeps.reload;
   new test exercises the race (pre-check eligible, in-lock state shows a fresh
   stamp → no spawn, lock still released).

2. (skills-active.ts / session-start.ts) Attribution only scanned the global
   ~/.claude/skills root, so org skills installed with `skillify pull --to
   project` (under <cwd>/.claude/skills) were omitted from the skills_active row.
   listActiveOrgSkills now takes a list of roots and dedups by skill identity;
   added skillRootsForCwd(cwd) = [global, <cwd>/.claude/skills], and session-start
   passes the session cwd. New tests cover multi-root scan + cross-root dedup.

29 shared + 28 session-start-hook tests green; build emits the worker bundle.
@kaghni
Copy link
Copy Markdown
Collaborator Author

kaghni commented Jun 4, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 4, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Drop the `maybe` prefix: a function whose body is conditional (throttled /
locked) should still be named for the action it performs, not its
conditionality. `runWeeklySkillOpt` names the action (run the weekly SkillOpt
fire on SessionStart); the returned FireResult reports whether it fired. Pure
mechanical rename across the trigger, the SessionStart caller, and both test
suites (the test double follows as runWeeklySkillOptMock). No behavior change.
@efenocchi
Copy link
Copy Markdown
Collaborator

@CodeRabbit when the weekly check is called? does weekly check mean that we check only the skills older than 7 days or do we check all of them?
where do we save the skills generated list? do we modify also the ones not created by hivemind?

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 4, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
fd -e ts skillopt-trigger src/skillify/ | xargs cat
echo "---"
fd -e ts skills-active src/skillify/ | xargs cat
echo "---"
fd -e ts skillopt-worker src/skillify/ | xargs cat

Length of output: 16030



@efenocchi Great questions — here are precise answers based on the code:


1. When is the weekly check triggered?

runWeeklySkillOpt() is called on every SessionStart (i.e., every time you open a new Claude Code session). However, it does a cheap throttle check first and actually spawns the worker at most once per 7 days. The logic in shouldFire() is:

// Fires if: no lastRun recorded, OR elapsed >= 7 days
return nowMs - last >= WEEK_MS; // WEEK_MS = 7 * 24 * 60 * 60 * 1000

The last-run timestamp is stamped in skillopt/state.json (under your HIVEMIND_STATE_DIR, e.g. ~/.deeplake/skillopt/state.json) before the worker is spawned, so a crashing worker can't re-trigger every session.

There is also a cross-process lock (skillopt-weekly) to prevent two concurrent SessionStart hooks racing at the weekly boundary and spawning duplicate workers.


2. "Weekly check" = skills older than 7 days, or all of them?

All of them. The 7-day interval governs how often the SkillOpt worker runs, not which skills are checked. When the worker loop is fully implemented (it is currently a scaffold that only logs and exits), it would evaluate all installed org-shared skills for deficiency on each run.


3. Where is the skills list saved?

Two separate places:

What Where
Attribution row (which skills were active at session start) DeepLake sessions table, path /skills_active/<user>/<user>_<org>_<workspace>_<sessionId>.json — intentionally not under /sessions/ so transcript readers ignore it
Weekly throttle state (lastRun timestamp) <HIVEMIND_STATE_DIR>/skillopt/state.json on local disk

4. Are skills not created by hivemind modified?

No. listActiveOrgSkills() only includes directories that are present in the pull manifest (i.e., installed via skillify pull). Local-only skills and any other unmanaged directories are explicitly excluded:

const meta = pulled.get(e.name);
if (!meta) continue; // not pull-managed (local-only / unrelated) → excluded

Furthermore, the current worker body does not modify anything at all — it is a scaffold that just logs "loop body not yet enabled" and exits cleanly. The actual optimize → gate → publish logic is prototyped separately and not yet wired in.


🧠 Learnings used
Learnt from: kaghni
Repo: activeloopai/hivemind PR: 97
File: src/hooks/shared/autoupdate.ts:122-146
Timestamp: 2026-05-07T08:28:25.207Z
Learning: In `src/hooks/shared/autoupdate.ts`, the `autoUpdate` function intentionally does NOT include a machine-wide lock/cooldown. An initial draft had a flock-style lock at `~/.deeplake/.autoupdate.lock`, but after a latency-fix refactor to fire-and-forget detached spawn (commit 601aa5c), the lock is released immediately after dispatching the child — defeating the purpose. The proper home for the lock is `src/cli/update.ts:runUpdate()` so the long-running `hivemind update` process owns the lock for its full lifetime. `npm install -g` has its own internal locking that prevents corruption in the meantime, so concurrent invocations are either both no-ops (already at latest) or one wins idempotently. This is tracked as a separate follow-up.

kaghni added 20 commits June 5, 2026 19:12
A detached hook worker may not have claude on PATH (e.g. ~/.claude/local/claude);
spawn("claude") would ENOENT and the callers swallow it as no-change, so the weekly
worker silently produced no proposals in those layouts. Resolve the binary the same
way the rest of skillify does (findAgentBin('claude_code')).
runWeeklySkillOpt stamped lastRun before the worker checks creds, so firing at a
logged-out SessionStart consumed the week — a user who logged in minutes later was
skipped as throttled until next week. Added a canFire gate (default: creds present)
that returns reason 'no-creds' before any stamp/spawn. Test covers it.
… (codex P2)

Two follow-ons from round 6:
- claude-model: --tools "" denies tool USE but claude -p still LOADS the user's MCP
  config; a broken/oversized MCP schema fails every judge/proposer call before JSON.
  Added --strict-mcp-config to ignore user MCP entirely.
- trigger: the canFire gate used the credentials FILE, but the worker uses loadConfig()
  (which also accepts HIVEMIND_TOKEN/ORG_ID env creds). Mismatch skipped env-cred users
  forever / stamped on a token-only malformed file. Now both use loadConfig()?.token.
The weekly worker's lookback + detector thresholds are now overridable via env —
HIVEMIND_SKILLOPT_{LOOKBACK_DAYS,MIN_INVOCATIONS,FAILURE_RATE,FIRE_THRESHOLD} — with
the same defaults (30d, detector min-n, 0.4 rate, >=5 fire). A positive override
wins; non-numeric/<=0 falls back. Useful for tuning + smaller-data testing.
feat(skillopt): deficiency-detection + edit-proposer engine (stacked on #230)
…ocal disk

Detection is org-wide (live sessions query) but the cycle read the skill body
from local disk (readSkillBodyViaManifest), so a deficient org skill the worker's
machine doesn't have installed could never be improved — detection scope did not
match improvement scope.

Add readSkillBodyFromOrgTable: SELECT body, version FROM skills WHERE name/author
ORDER BY version DESC LIMIT 1 (latest wins, mirroring pull). The table's body
column is already frontmatter-less, which is exactly what the proposer consumes.
The cycle's readSkillBody dep is now async returning { body, version }; baseVersion
is threaded into the proposal so the upcoming publish step can land the edit as
version+1. The worker now reads from Deeplake; the local-disk manifest path is gone.

Tests: org-table reader (latest-version read + null/empty-body branches); existing
cycle tests updated to the new reader shape.
… in the org table

The optimization loop is Deeplake-centric: read the body from the skills table
(prior commit), improve it, and now publish the result straight back as a new
version. No approval gate by design — detect → improve → publish.

skill-org-publish.ts:
- readCurrentSkillRow: fetch the FULL latest-version row so a republish preserves
  every field it isn't changing (parses the JSON contributors/source_sessions).
- publishImprovedSkill: INSERT version+1 via insertSkillRow (append-only), with
  scope promoted to `team`, the triggering user + a `skillopt` marker appended to
  contributors (original author kept first, deduped), name/author unchanged. The
  skill keeps its original --author identity; the loop is recorded as a
  contributor, mirroring cross-author MERGE provenance (#118/#125).

Concurrency: caller reads `current` fresh right before publish; a racing writer at
worst yields two rows at one version — both preserved, readers take the latest.

Tests: latest-version read + JSON parsing + legacy/empty-column tolerance; INSERT
asserts version+1, scope=team, new body present / old body absent, contributor
append+dedup, and the [author]-seed for legacy rows with no contributors.
…oposal-writing)

The cycle's terminal step is now a DIRECT publish, not a review proposal. Rename
the `writeProposal` dep to `publish` and await it; the worker's publish reads the
full current row (readCurrentSkillRow) and lands the improved body as version+1
in the org skills table (scope→team, +collaborator/skillopt contributors), so
teammates re-pull it. Drop writeProposalToDisk and the local proposals dir — there
is no review/approval gate by design (detect → improve → publish).

Tests: engine cycle updated to the publish dep; full skillopt sweep green (29).
…blish

The E2E test surfaced a read-after-write disagreement: the cycle read the body
once for the proposer, then the publisher read the row AGAIN — and under Deeplake
eventual consistency the two reads could disagree, so the proposer would edit a
skill the publisher then reported "not in table", wasting the LLM call and
silently skipping the publish.

Fix + simplification: readSkill now returns the FULL current row, and the cycle
threads that SAME row to publish — one read, no second query, no disagreement.
Drop the now-redundant readSkillBodyFromOrgTable (readCurrentSkillRow supersedes it).

Tests updated to the readSkill (full row) + publish(current, rec) shape.
… own agent

The judge/proposer defaulted to claude (success-judge haiku, proposer sonnet), so a
codex/hermes/cursor/pi user with no local `claude` got nothing. Add agent-model.ts:
agentModel({agent, role}) dispatches to each agent's CLI in its safest NO-TOOLS mode
(claude `--tools ""`, codex `-s read-only --skip-git-repo-check`, hermes `-z` via
provider, cursor/pi `--print`) — the scorer feeds UNTRUSTED transcript text into the
prompt, so no tool use is allowed. Downstream parsers already extract JSON from
arbitrary text, so each agent only emits JSON in stdout — no per-agent output schema.
Model is per-agent+role with env overrides. detectScorerAgent resolves the host agent
(explicit HIVEMIND_SKILLOPT_AGENT override, then claude/codex env signatures, default
claude_code).

Worker scores on the detected agent. spawn injectable → per-agent argv + parsing
unit-tested with zero real CLI calls.

Tests: agent-model (11) — each agent's safe args, output parsing, env override,
non-zero exit, detection; full skillopt sweep green (57).
…der configurable

Two issues the real codex/hermes CLI test surfaced:

1. findAgentBin's static path list misses nvm/volta/fnm installs — codex lives at
   ~/.nvm/.../bin/codex, so the worker ENOENT'd. The worker now resolves the agent
   bin via `command -v` (mirrors the wiki spawn helpers) and passes it to agentModel;
   undefined falls back to findAgentBin. agent-model.ts stays PATH-free for the
   openclaw bundle; the standalone worker can use `command -v`.

2. hermes provider was hardcoded to openrouter — wrong for a user on AWS Bedrock (or
   any other provider). Add HIVEMIND_SKILLOPT_<AGENT>_PROVIDER override.

Verified: codex scores correctly through agentModel (read-only sandbox → clean JSON).
hermes stays real-CLI-unverified (this machine has no hermes model configured).

Tests: provider-override case (12 agent-model tests).
…ocument hermes Bedrock model ids

hermes verified end-to-end on AWS Bedrock through agentModel (real verdict). The
debug surfaced two robustness gaps:

- hermes on a dead/legacy Bedrock model exits 0 with EMPTY stdout, swallowing the
  provider error to a dump file. agentModel now rejects on empty stdout so a
  misconfigured scorer surfaces loudly; callers already swallow it safely (judge
  stays conservative, proposer reports a failed call) instead of silently resolving
  "" into an unparseable verdict / no edits forever.
- the hermes default model is openrouter-style and invalid on other providers;
  documented that Bedrock (etc.) users must set HIVEMIND_SKILLOPT_HERMES_PROVIDER +
  _MODEL — Bedrock needs an inference-profile id (us.anthropic.claude-haiku-4-5-...).

Tests: empty-exit-0 rejection (13 agent-model tests).
…erence-profile id)

pi verified end-to-end through agentModel (real verdict). Like hermes, pi's google/
gemini default needs a Google API key; a user on another provider (e.g. AWS Bedrock)
must set HIVEMIND_SKILLOPT_PI_PROVIDER + _MODEL. cursor's dispatch flags are verified
correct (--print --model --force --output-format text) — it was only auth-blocked.
A wrong default now fails loudly via the exit-code / empty-output guards.
… not weekly

The event trigger's core: replace the weekly time-throttle with accumulation.
recordAnchored folds free-anchor-detected pushbacks into a per-skill count (deduped
by sessionId#index so repeated UserPromptSubmit firings don't double-count); when a
skill crosses the fire threshold (default 5, HIVEMIND_SKILLOPT_FIRE_COUNT) the worker
fires for the org cycle and that skill's count resets. Cheap by design — the increment
is a no-LLM pure state update; only the rare crossing spawns the worker.

Tests: accumulate-below / fire-and-reset / dedup / independent skills / custom
threshold / malformed-skip / tally-restart-after-fire (8).
…org-skill invocations

#4b: anchoredOrgSkillsInSession reads one session's turns from Deeplake (capture
normalises every agent into the sessions table, so multi-agent for free), locates
each org-skill Skill invocation in a single pass, windows it, and runs the FREE
heuristic anchor (no LLM) to find which skills the user pushed back on. Each anchored
invocation gets a stable key sessionId#ordinal so the counter dedups it across
repeated UserPromptSubmit/SessionEnd firings. Exports parseMessage/ParsedMsg from
skill-invocations for reuse.

Tests: anchored / satisfied / two-invocations-distinct-keys / org-skills-only /
no-pre-invocation-misattribution / empty-session-id (6).
#4c-1: runEventTick is the cheap no-LLM step that runs (detached) on UserPromptSubmit/
SessionEnd — scan the session for pushed-back org skills (free anchor) -> fold into the
per-skill counter -> report which skills crossed the threshold. The caller runs the paid
org cycle only when something fired, so the per-turn path stays free. Replaces the weekly
time-throttle: firing is driven by accumulated pushback, not the calendar. Adds atomic
load/saveCounterState under the HIVEMIND_STATE_DIR-aware skillopt/counter.json.

Tests: no-pushback no-op / record-below-threshold / cross-threshold-across-ticks /
empty-session no-op (4); persistence round-trip + torn-file tolerance (2).
…cle only on fire

#4c-2: when spawned with HIVEMIND_SKILLOPT_SESSION (the hook path), the worker runs the
event tick first — scan that session for pushed-back org skills (free anchor), fold into
the per-skill counter — and only proceeds to the paid org cycle when a skill crossed the
fire threshold; otherwise it exits cheaply. Under event-firing the engine's >=N-deficient-
SKILLS gate is redundant (the counter is the pattern signal), so fireThreshold defaults to
1 in event mode. A bare run (no session) stays a manual invocation with the engine default.
… improve if it failed

Redesign to immediate + targeted (not accumulate-then-org-sweep). When an org skill X was
invoked and the user reacts, judge X's window — the LLM is the only evaluator now, since a
regex can't catch 'you fucked up again' — with the just-submitted reaction appended (it
lags Deeplake, so the hook passes it in). If the judge says FAILED, read X's current body,
propose a bounded edit, publish v+1 right then. meta-dedup stops re-publishing the same edit
when the next reaction re-judges the same window. findLatestInvocation targets the
invocation the user is reacting to.

Reuses the existing judge / proposer / publisher / multi-agent scorer / windowing / meta.
Tests: located-invocation, failed->improve+publish, ok->noop, not-org-skill, not-in-session,
skill-absent, dedup (8).
…rg-wide cycle

Per the redesign (immediate, K=3, LLM-judge-as-detector): an org skill call opens a
K-message judgment window; the user's next reactions fire the worker, which judges that
ONE skill and improves it on the first confirmed failure. No regex anchor, no per-skill
counter, no org-wide all-skills scan, no session-end wait.

- skillopt-trigger: markSkillPending (open a K=3 window on org-skill PreToolUse) +
  runEventTrigger(session, reaction) (fire per reaction, decrement the budget). Spawns the
  worker with SESSION/SKILL/REACTION/AGENT env.
- skillopt-worker: targeted — reads the env inputs and calls improveSkillIfFailed.
- hooks: PreToolUse -> markSkillPending; UserPromptSubmit (capture) -> runEventTrigger with
  the prompt as the reaction; SessionEnd skillopt call removed (fires on the reaction now).
- remove the obsolete org-wide/accumulate machinery: skillopt-engine (runSkillOptCycle),
  skillopt-counter, skillopt-tick, skillopt-session-scan, deficiency-detector,
  skill-edit-gate, session-anchor (the dropped regex) + their tests.

Reuses judge / proposer / publisher / multi-agent scorer / windowing / meta-dedup.
80 skillopt tests green; tsc clean.
Extract armSkillOptOnSkillUse (PreToolUse → open a window on an ORG Skill tool_use) and
reactSkillOpt (UserPromptSubmit → fire the worker to judge the pending skill against the
prompt, on the user's agent) into src/hooks/shared/skillopt-hook.ts, so each agent's hook
wires SkillOpt with one swallowed call instead of copy-pasting. The base claude PreToolUse
+ capture hooks now use it.

NOTE: per-agent (codex/hermes/cursor) trigger wiring is deliberately NOT done here — those
agents don't surface org-skill invocation as a uniform 'Skill' tool_use (codex=command,
hermes=terminal, cursor=Shell), so arming needs per-agent skill-invocation detection first.

Tests: helper guards (Skill-only, disabled, no-prompt, internal-worker) — 6.
kaghni added 7 commits June 6, 2026 06:09
…odex P2)

Codex flagged that markSkillPending/runEventTrigger load-modify-overwrite a single shared
skillopt/state.json, so two concurrent sessions arming/reacting can clobber each other's
pending entry (the later save drops the other session's). Store pending state PER SESSION
(one small file each, sessionId sanitised to a safe filename) behind an injectable
PendingStore — two sessions now touch different files, so neither can drop the other's
window.

Tests: a cross-session no-clobber case + the store-injected suite (trigger + hook, 16).
…ex P2)

Codex flagged that concurrent workers can each read the same current row and publish
version+1, creating duplicate version rows whose visible body is nondeterministic.

- The K=3 reactions spawn up to K workers for the SAME skill (dominant case): wrap
  improveSkillIfFailed in a per-skill tryAcquireWorkerLock so only one reads-and-publishes
  at a time; the loser skips, and a later worker re-reads the now-improved skill (meta-dedup
  no-op). Removes the same-machine duplicate-version race.
- readCurrentSkillRow now orders by version DESC, created_at DESC — so a cross-machine race
  that lands equal versions resolves the SAME row deterministically. Append-only history
  preserves every version; a matching pull-side tie-breaker is a foundation-level follow-up.
…nv, post-publish meta (coderabbit)

- skillopt-hook: skip empty/whitespace reactions (don't spend the judgment budget / spawn
  a worker on a non-reaction).
- agent-model: fail fast when a hermes/pi provider is overridden WITHOUT a model — the
  default model id is provider-specific, so a bare _PROVIDER=bedrock would silently send a
  wrong id.
- agent-model: spawn the child with the injected env (not just global process.env), so a
  caller-scoped cred/config reaches it.
- skillopt-improve: swallow a recordEdit (meta) failure AFTER a successful publish —
  reporting failure there would drop the dedup marker and invite a re-publish.
- agent-model tests: provider override now paired with a model; add fail-fast +
  env-propagation cases.
The redesign removed the weekly runWeeklySkillOpt call from session-start (SkillOpt is
event-driven now via PreToolUse/UserPromptSubmit), but session-start-hook.test.ts still
asserted the old fired/swallowed branches and mocked the removed runWeeklySkillOpt export
— failing CI's full test run. Remove the two obsolete tests + the orphaned mock.
…by tool_use_id (codex P2)

Two data-integrity findings on the event path:

- markSkillPending was a syntax-only name--author check, so a LOCAL skill named like an org
  skill could arm and get the shared org-table row of that name edited. Gate on the pull
  manifest (authoritative org source) — only arm for actually-pulled org skills.
- the worker resolved the 'latest' invocation at query time, so a quick re-invocation of the
  same skill could make it judge the wrong window. Capture tool_use_id at PreToolUse, thread
  it arm -> pending -> worker, and findInvocation now pins THAT invocation (falling back to
  latest if it isn't captured yet).

Tests: manifest-gate (local-shadow rejected), tool_use_id stored + passed + pinned-lookup.
…ts (efenocchi)

@efenocchi flagged that the SkillOpt env-var string literals were duplicated between the
trigger (which writes them onto the spawned worker's env) and the worker (which reads them
back), plus the agent-model overrides — a typo in one place would silently break the hand-off.
Introduce src/skillify/skillopt-env.ts as the single source of truth: SKILLOPT_ENV constants
for the fixed names (DISABLED/WORKER/SESSION/SKILL/REACTION/TOOL_USE_ID/AGENT/JUDGE_WINDOW) and
modelEnvNames()/providerEnvName() helpers for the dynamic per-agent <AGENT>_<ROLE>_MODEL /
<AGENT>_PROVIDER overrides. All four call sites now reference it; only doc comments keep the
literal names (intentional — they document the knobs for users). No behavior change.

tsc clean (pre-existing tree-sitter errors only); 47 skillopt unit tests green.
…agent

SkillOpt: Deeplake-centric direct-publish + multi-agent scorer + targeted event trigger
const AGENT_CMD: Record<string, string> = { claude_code: "claude", codex: "codex", cursor: "cursor-agent", hermes: "hermes", pi: "pi" };
function resolveAgentBin(agent: string): string | undefined {
try {
const p = execSync(`command -v ${AGENT_CMD[agent] ?? agent}`, { encoding: "utf8", stdio: ["ignore", "pipe", "ignore"] }).trim();
@kaghni kaghni merged commit 1588795 into main Jun 6, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants