Skip to content

fix: stop autoupdate from deleting the running session's bundle#72

Merged
efenocchi merged 5 commits intomainfrom
fix/plugin-autoupdate-session-safety
Apr 23, 2026
Merged

fix: stop autoupdate from deleting the running session's bundle#72
efenocchi merged 5 commits intomainfrom
fix/plugin-autoupdate-session-safety

Conversation

@efenocchi
Copy link
Copy Markdown
Collaborator

@efenocchi efenocchi commented Apr 22, 2026

Summary

Fixes Stop hook error: Plugin directory does not exist: /…/.claude/plugins/cache/hivemind/hivemind/<old-version> errors that appeared for every post-SessionStart hook (Stop, PostToolUse, SessionEnd) once an auto-update happened mid-session.

  • Remove the destructive rmSync loop in src/hooks/session-start.ts that explicitly rm -rf'd every non-latest version directory from inside a running session (the actual source of the error — an empirical claude plugin update doesn't wipe adjacent versions on its own).
  • Wrap the claude plugin update execSync in both session-start.ts and session-start-setup.ts with a PID-scoped snapshot → restore-or-cleanup dance, so the live session keeps finding its bundle paths even if Claude's installer wipes the versioned directory under some scope.
  • Add a new SessionEnd hook plugin-cache-gc.ts that garbage-collects older version directories — keeps the current version plus the two next-newest (DEFAULT_KEEP_COUNT = 3, enough headroom for a session pinned through two further updates) and sweeps up stale .keep-<pid> snapshots from crashed SessionStart updates.
  • Add a || true guard to each scope in session-start-setup.ts's update command so it stops spuriously logging autoupdate failed when a scope doesn't have hivemind installed.

Why it happens

At session start the hook registry bakes the resolved CLAUDE_PLUGIN_ROOT (e.g. …/hivemind/0.6.38/) into every hook command. When the sync SessionStart hook runs claude plugin update and then iterates the cache directory removing every non-latest version — the currently-executing version among them — later hooks in the same session (Stop, PostToolUse, SessionEnd) all fail with ENOENT on their bundle path. Cleanup now only happens in SessionEnd, by which time the session is gone.

Architecture

  • src/utils/plugin-cache.ts (new): pure helpers — resolveVersionedPluginDir (dev-layout guard), snapshotPluginDir / restoreOrCleanup (PID-scoped backup/restore around the installer), readCurrentVersionFromManifest, planGc / executeGc (leaves non-semver entries and live-PID snapshots alone, bails if current version isn't on disk).
  • src/hooks/plugin-cache-gc.ts (new): thin runGc(bundleDir, opts) orchestrator wired as an async SessionEnd hook (15s timeout) alongside the existing summary hook — separate hook to keep the concern isolated and to run even when HIVEMIND_CAPTURE=false disables the summary flow.
  • Helpers are guarded so a local --plugin-dir dev run skips all snapshot/GC work and never touches a developer checkout.

Verification

  • Full test suite: 46 files / 967 tests pass under --coverage.
  • Coverage on the four touched files, ≥ 90 on every metric and locked in vitest.config.ts:
    • utils/plugin-cache.ts — stmts 93.45, branch 90.47, funcs 100, lines 96.66
    • hooks/plugin-cache-gc.ts — 100 / 100 / 100 / 100
    • hooks/session-start.ts — 98.82 / 90.47 / 100 / 98.76
    • hooks/session-start-setup.ts — 100 / 100 / 100 / 100
  • Real claude -p --plugin-dir ./claude-code with claude-code/.claude-plugin/plugin.json manually lowered to 0.6.30 (to force the update path) — hook-debug.log shows both autoupdate paths firing with autoupdate snapshot outcome: noop (dev-layout guard correct) and autoupdate succeeded, no more cache cleanup: removed old version lines, and plugin-cache-gc skipping correctly.
  • Subprocess integration test plugin-cache-gc-bundle.integration.test.ts runs the actual shipped plugin-cache-gc.js bundle against a fake HOME — validates keep-current-plus-two, preserves live-PID snapshots, bails without a manifest, and skips dev layouts.
  • Empirical check: a ghost 0.6.40-ghost/ directory survived a no-op claude plugin update — confirms the installer doesn't wipe adjacent versions on its own, so the destructive rmSync loop was responsible for the errors observed in production. (A real version-bump install could still wipe the directory on its own; the snapshot/restore dance handles that case too.)

Known limitation

Accepted tradeoff: a session pinned to version X through three further updates can still lose its bundle. Reference-counting per version would close this; deferred as out of scope. N = 3 gives two updates of headroom, which covers every real-world scenario observed so far.

Test plan

  • CI green (tests + coverage thresholds + PR coverage comment ≥ 90 on the four files).
  • Smoke test in a real install: run any Claude session, trigger an auto-update by bumping the published version beyond the installed one, confirm no Plugin directory does not exist errors in subsequent hooks.
  • Next SessionEnd cleans old versions down to 3 (observe in hook-debug.log: [plugin-cache-gc] gc kept=… deletedVersions=…).

@claude
Copy link
Copy Markdown

claude Bot commented Apr 22, 2026

Claude finished @efenocchi's task in 7m 24s —— View job


Review findings

  • restoreOrCleanup error swallowed silently (src/utils/plugin-cache.ts:96-97) — when renameSync throws after the installer deletes pluginDir, the function returns "noop" (same as the dev-layout path), so callers can't detect that the bundle dir is actually gone and the original ENOENT will resurface in subsequent hooks.
  • Concurrent-session GC gap (src/utils/plugin-cache.ts:154-158) — isPidAlive protects .keep-<pid> snapshots but not version directories themselves; a second session running the same old version has its bundle deleted when the first session exits and GC fires.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 22, 2026

Coverage Report

Scope: files changed in this PR. Enforced threshold: 90% per metric (per file via vitest.config.ts).

Status Category Percentage Covered / Total
🟢 Lines 98.31% (🎯 90%) 232 / 236
🟢 Statements 97.01% (🎯 90%) 260 / 268
🟢 Functions 100.00% (🎯 90%) 24 / 24
🟢 Branches 93.39% (🎯 90%) 113 / 121
File Coverage — 4 files changed
File Stmts Branches Functions Lines
src/hooks/plugin-cache-gc.ts 🟢 100.0% 🟢 100.0% 🟢 100.0% 🟢 100.0%
src/hooks/session-start-setup.ts 🟢 100.0% 🟢 100.0% 🟢 100.0% 🟢 100.0%
src/hooks/session-start.ts 🟢 98.8% 🟢 90.5% 🟢 100.0% 🟢 98.8%
src/utils/plugin-cache.ts 🟢 93.6% 🟢 90.5% 🟢 100.0% 🟢 96.7%

Generated for commit 2c92448.

Comment thread src/utils/plugin-cache.ts Outdated
Comment on lines +96 to +97
} catch {
return "noop";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When renameSync throws (e.g. EACCES, EXDEV), this catch returns "noop" — the same value as the dev-layout path at line 84 — so the caller cannot distinguish "not a versioned install" from "restore failed and pluginDir is still gone", silently reproducing the original ENOENT bug in all subsequent hooks.

Add a distinct return value or at minimum log before falling through, so the snapshot outcome is observable:

Suggested change
} catch {
return "noop";
} catch (e: any) {
// renameSync failed — pluginDir may still be absent; log and surface
// as a distinct outcome so callers can detect the broken state.
try { process.stderr.write(`[plugin-cache] restore failed: ${e?.message}\n`); } catch { /* ignore */ }
return "noop";
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 35d04cb. The catch now returns a distinct "restore-failed" outcome (new RestoreOutcome union) and writes a one-line diagnostic to stderr, so the existing log(\autoupdate snapshot outcome: ${outcome}`)` in both session-start hooks surfaces the real state instead of silently matching the dev-layout "noop" path.

Test rewritten to assert both the new return value and the stderr emission.

Comment thread src/utils/plugin-cache.ts
Comment on lines +154 to +158
if (currentVersion && versions.includes(currentVersion)) {
for (const v of versions) {
if (!keep.has(v)) deleteVersions.push(v);
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isPidAlive only guards .keep-<pid> snapshot directories, not the version directories themselves. If two sessions start on the same old version and Session A exits first, GC deletes that version while Session B is still using it — the original ENOENT reappears for a concurrent session.

Consider checking whether any live .keep-<pid> snapshot references a version directory before deleting it (i.e., if <ver>.keep-<alivePid> exists for a version, don't delete <ver>):

Suggested change
if (currentVersion && versions.includes(currentVersion)) {
for (const v of versions) {
if (!keep.has(v)) deleteVersions.push(v);
}
}
const deleteVersions: string[] = [];
if (currentVersion && versions.includes(currentVersion)) {
const liveSnapshotVersions = new Set(
snapshots.filter(s => { const m = s.match(KEEP_RE); return m && isAlive(Number(m[1])); })
.map(s => s.replace(/\.keep-\d+$/, "")),
);
for (const v of versions) {
if (!keep.has(v) && !liveSnapshotVersions.has(v)) deleteVersions.push(v);
}
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Walking through the scenario with concrete numbers: with DEFAULT_KEEP_COUNT = 3, planGc always keeps [current, current-1, current-2]. Two sessions on the same old version — e.g. both on 0.6.40 while current = 0.6.41 — look like:

sorted desc = [0.6.41, 0.6.40, 0.6.39, 0.6.38]
keep         = {0.6.41, 0.6.40, 0.6.39}
deleteVersions = [0.6.38]   ← 0.6.40 is NOT here

So a single update between the two sessions doesn't fail — the previous version is always in the keep-list. The bug only triggers when the pinned version falls off the keep-list entirely, which needs three further updates during the session's lifetime. That case is already called out in the PR description under "Known limitation" as an accepted tradeoff.

The proposed .keep-<alivePid> guard also doesn't cover that case. Snapshots are only created inside the autoupdate branch of a session — see snapshotPluginDir call sites in session-start.ts:166 and session-start-setup.ts:79. A long-running session that stays on its version without updating never creates a snapshot pointing at itself, so liveSnapshotVersions never contains the version that actually needs protecting. The guard's set would be empty for exactly the version at risk.

The real fix would be per-session reference counting (<versionDir>/.sessions/<pid>, checked in planGc). That's ~40–50 lines of helper + plumbing + tests, and rough consensus with the author of this PR was to defer it given how rare the actual failure mode is (a single Claude session staying alive through three plugin releases). Not closing this thread silently — happy to revisit if you have a scenario that still repros under N=3.

@efenocchi efenocchi requested a review from kaghni April 22, 2026 20:45
Introduces a pure-function helper used by later commits to make plugin
auto-updates safe for the session currently running, and to garbage-
collect old versioned-install directories.

Helpers:

- `resolveVersionedPluginDir(bundleDir)` — detects whether we are
  running from a real `~/.claude/plugins/cache/hivemind/hivemind/X.Y.Z/`
  layout. Returns null for a local `--plugin-dir` dev run so later
  callers skip all snapshot/GC side effects on a developer checkout.

- `snapshotPluginDir(pluginDir)` / `restoreOrCleanup(handle)` —
  copy the versioned plugin directory to `<pluginDir>.keep-<pid>`
  before the installer runs, then restore from the snapshot if the
  installer wiped the directory, otherwise remove the snapshot.
  PID-scoped names so concurrent sessions don't stomp each other.

- `readCurrentVersionFromManifest(manifestPath)` — parses the
  `installed_plugins.json` entry for `hivemind@hivemind` and returns
  the semver the installer considers current. Robust against missing
  / malformed / unexpected manifest shapes.

- `planGc` / `executeGc` — plan (never execute) which versioned
  directories to keep vs delete, keyed off the manifest version and a
  `keepCount` cap. Leaves non-semver and live-PID snapshot entries
  alone so the hook never deletes something it doesn't understand.
  `DEFAULT_KEEP_COUNT = 3` — current plus two predecessors, enough
  headroom for a session pinned through two further updates.

Unit tests cover every branch: semver compare, manifest parse failures,
snapshot overwrite + restore + cleanup, GC plan with missing current
version, dev-layout and live-PID guards, and executeGc error
tolerance.
The SessionStart auto-update path was producing errors like

    Stop hook error: Plugin directory does not exist:
    /home/<user>/.claude/plugins/cache/hivemind/hivemind/0.6.38
    (hivemind@hivemind — run /plugin to reinstall)

for every hook that fired after the update completed (Stop, PostToolUse,
SessionEnd). The live Claude session bakes `CLAUDE_PLUGIN_ROOT` into its
hook registry at startup, so once the versioned directory is gone the
remaining hooks in that session all fail with ENOENT.

Root cause: `src/hooks/session-start.ts` explicitly `rm -rf`'d every
non-latest version directory right after running `claude plugin
update`, from inside the session that may itself be executing from one
of those directories. An empirical `claude plugin update` with no-op
latest-version semantics leaves adjacent versions alone — the loop in
this hook was the sole destroyer.

Fix:

- Remove the destructive `readdirSync` + `rmSync` loop from
  session-start.ts entirely. Cleanup is deferred to the SessionEnd GC
  hook (added in a follow-up commit) which is safe across concurrent
  sessions.

- Wrap the `claude plugin update` execSync in both `session-start.ts`
  and `session-start-setup.ts` with `snapshotPluginDir` →
  `restoreOrCleanup`, so if Claude's installer *does* wipe the
  versioned directory (under some scope or future installer change),
  the live session still finds its bundle paths until it exits.
  The snapshot is PID-scoped to avoid collisions when multiple Claude
  sessions trigger an update concurrently.

- Add `|| true` to each scope command in
  `session-start-setup.ts` so a scope that doesn't have hivemind
  installed (most of them, in practice) doesn't make the whole chained
  command exit non-zero and spuriously log `autoupdate failed`. The
  sync hook already had this; the async hook didn't.

- Drop the now-unused `readdirSync` / `rmSync` imports from
  session-start.ts.

Tests: the two session-start-hook tests that previously asserted the
destructive behavior (`rmSync` called on old cache entries,
`readdirSync` error tolerance during cleanup) are rewritten to assert
the new contract — no rmSync against version directories happens
in-session, and the new snapshot-outcome log line is emitted. Full
suite (952 tests) green.

Verified end-to-end with `claude -p --plugin-dir ./claude-code` after
lowering `claude-code/.claude-plugin/plugin.json` to force the update
path: both hooks log `autoupdate snapshot outcome: noop` (correct for
the dev-layout guard) and `autoupdate succeeded`, with no `cache
cleanup: removed old version` lines.
With the destructive rmSync loop removed from session-start, something
still needs to prune old versioned directories so they don't accumulate
forever under ~/.claude/plugins/cache/hivemind/hivemind/. Doing this
cleanup on SessionEnd is safe — the session that just ended is no
longer holding any bundle paths hostage — and gives us a natural spot
to also sweep up stale `.keep-<pid>` snapshots from crashed updates.

Behavior:

- Keeps the current version (from installed_plugins.json) plus the
  next two newest by semver (DEFAULT_KEEP_COUNT = 3). Three gives
  enough headroom that a session pinned to version X through two
  further updates still finds its bundle when the next SessionEnd
  fires.

- Deletes `<pluginDir>.keep-<pid>` snapshots whose PID is no longer
  alive. Live-PID snapshots are left alone so an in-progress update
  in another session isn't corrupted.

- Bails without deleting anything if the manifest is missing,
  malformed, or the current version isn't on disk — we should never
  "clean up" into an unknown state.

- Guarded by `resolveVersionedPluginDir` so a local `--plugin-dir` dev
  run skips silently and never touches a developer checkout.

Registration: added as a second, `async` entry under SessionEnd in
`claude-code/hooks/hooks.json` with a 15s timeout. Runs alongside the
existing summary-spawning hook; keeps the two concerns separate so
`HIVEMIND_CAPTURE=false` environments still get GC even though they
skip the summary hook.

Build: `plugin-cache-gc` added to the esbuild entry list; the shipped
bundle is ~5KB.

Tests:

- `claude-code/tests/plugin-cache-bundles.test.ts` — static guard over
  the shipped JS + hooks.json, so the bundler can't silently drop the
  helpers or forget to wire the hook.

- `claude-code/tests/plugin-cache-gc-bundle.integration.test.ts` —
  spawns the real built `plugin-cache-gc.js` bundle as a node
  subprocess against a fake HOME. Exercises: keep-current-plus-two,
  preserves live-PID snapshots, bails without a manifest, and skips
  the dev layout. This is the closest we get to real session behavior
  without touching the user's actual plugin cache.
Brings the four files touched by this branch up to the 90% coverage
bar enforced by `vitest.config.ts` PR-scoped thresholds:

    utils/plugin-cache.ts          stmts=93.45  branch=90.47  funcs=100   lines=96.66
    hooks/plugin-cache-gc.ts       stmts=100    branch=100    funcs=100   lines=100
    hooks/session-start.ts         stmts=98.82  branch=90.47  funcs=100   lines=98.76
    hooks/session-start-setup.ts   stmts=100    branch=100    funcs=100   lines=100

Changes:

- `plugin-cache-gc.ts`: extracted `runGc(bundleDir, opts)` so tests can
  exercise the orchestration branches directly (default log / default
  manifest path / explicit overrides). The script-mode bootstrap at
  the bottom is guarded by `import.meta.url === pathToFileURL(argv[1])`
  so imports from tests never trigger it, and wrapped in a `c8 ignore`
  block since it's only reachable from a real `node plugin-cache-gc.js`
  invocation (already covered by the bundle integration test).

- `claude-code/tests/plugin-cache.test.ts`: three new tests close the
  remaining branches in the helper module:
    * `restoreOrCleanup` catch branch — `chmodSync` the parent to 0500
      so `renameSync` fails EACCES.
    * `readCurrentVersionFromManifest` return-null-after-loop — a
      manifest where every entry has a non-semver / missing version.
    * `executeGc` catch blocks — `chmodSync` to 0500 so both
      `rmSync` calls for deleteVersions and deleteSnapshots error out,
      asserting the errors are collected without throwing.

- `claude-code/tests/plugin-cache-gc.test.ts` (new): direct unit tests
  for `runGc`. Covers HIVEMIND_WIKI_WORKER guard, dev-layout skip,
  "nothing to gc", delete-older-versions, explicit keepCount override,
  dead-PID vs live-PID snapshot handling, and default opts.

- `claude-code/tests/session-start-setup-branches.test.ts` (new): mocks
  `version-check` and `plugin-cache` so we can hit branches the
  original source-level test can't reach against the real filesystem —
  userName fallback to "unknown", `if (current)` false branch when
  `getInstalledVersion` returns null, the `resolved ? snapshot : null`
  truthy branch that fires when a real versioned install is detected,
  the failure path where `execSync` throws and `restoreOrCleanup` is
  called from the catch, and the outer version-check catch when
  `getLatestVersion` rejects.

- `vitest.config.ts`: locks the four files at 90/90/90/90 so
  regressions surface in the PR coverage comment.

Full suite: 46 files / 967 tests passing under `--coverage`.
…utcome

Addresses the review comment on `plugin-cache.ts:97`. Before this
change, a thrown `renameSync` or `rmSync` inside `restoreOrCleanup`
was silently swallowed and the function returned `"noop"` — the same
value used for the dev-layout path. A genuine failure (pluginDir
wiped by the installer, rename back failed with EACCES/EBUSY) was
therefore indistinguishable from "nothing to do" in the caller's log
line, silently reproducing the original ENOENT bug in every hook
that fired afterwards.

Now:

- Distinct return value `"restore-failed"` (new `RestoreOutcome`
  union) is emitted when the try block throws. The existing caller
  pattern `log(\`autoupdate snapshot outcome: ${outcome}\`)` in both
  session-start.ts and session-start-setup.ts surfaces it without
  further changes.
- An explicit `process.stderr.write` before returning, guarded by an
  inner try/catch so a broken stderr can't mask the original failure.
  Stderr is observable even when the debug log sink is unavailable.

Test: the existing "fs error swallowed" case is rewritten to assert
the new outcome AND that the stderr warning is emitted, using a
`vi.spyOn(process.stderr, "write")` around the chmod-based
rename-fail trick.

All four touched files still ≥ 90 on every metric.
@efenocchi efenocchi force-pushed the fix/plugin-autoupdate-session-safety branch from 35d04cb to caf9709 Compare April 22, 2026 23:32
@efenocchi efenocchi merged commit 78e28c2 into main Apr 23, 2026
3 checks passed
kaghni added a commit that referenced this pull request Apr 23, 2026
Resolved conflict in openclaw/package.json and openclaw/openclaw.plugin.json
by keeping 0.6.47 (our side, already published to ClawHub). Main had
bumped to 0.6.46 via release.yml after PR #72 (plugin-autoupdate
session safety).

Brought in from main:
- plugin-cache helper (src/utils/plugin-cache.ts)
- SessionEnd GC hook (src/hooks/plugin-cache-gc.ts) for CC plugin
- Autoupdate session-safety fixes in src/hooks/session-start{,-setup}.ts
- New tests: plugin-cache-bundles, plugin-cache-gc, plugin-cache-gc-bundle
  integration — raises full suite to 1031/1031 (45 → 50 files).

Nothing in main's changes touches openclaw/; the merge is a pure
dependency bump from our perspective. openclaw/dist/index.js rebuilt
with the merged src/hooks/memory-path-utils.ts; scanner sweep still
reports zero forbidden patterns.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants