fix: stop autoupdate from deleting the running session's bundle#72
fix: stop autoupdate from deleting the running session's bundle#72
Conversation
|
Claude finished @efenocchi's task in 7m 24s —— View job Review findings
|
Coverage ReportScope: files changed in this PR. Enforced threshold: 90% per metric (per file via
File Coverage — 4 files changed
Generated for commit 2c92448. |
| } catch { | ||
| return "noop"; |
There was a problem hiding this comment.
When renameSync throws (e.g. EACCES, EXDEV), this catch returns "noop" — the same value as the dev-layout path at line 84 — so the caller cannot distinguish "not a versioned install" from "restore failed and pluginDir is still gone", silently reproducing the original ENOENT bug in all subsequent hooks.
Add a distinct return value or at minimum log before falling through, so the snapshot outcome is observable:
| } catch { | |
| return "noop"; | |
| } catch (e: any) { | |
| // renameSync failed — pluginDir may still be absent; log and surface | |
| // as a distinct outcome so callers can detect the broken state. | |
| try { process.stderr.write(`[plugin-cache] restore failed: ${e?.message}\n`); } catch { /* ignore */ } | |
| return "noop"; | |
| } |
There was a problem hiding this comment.
Fixed in 35d04cb. The catch now returns a distinct "restore-failed" outcome (new RestoreOutcome union) and writes a one-line diagnostic to stderr, so the existing log(\autoupdate snapshot outcome: ${outcome}`)` in both session-start hooks surfaces the real state instead of silently matching the dev-layout "noop" path.
Test rewritten to assert both the new return value and the stderr emission.
| if (currentVersion && versions.includes(currentVersion)) { | ||
| for (const v of versions) { | ||
| if (!keep.has(v)) deleteVersions.push(v); | ||
| } | ||
| } |
There was a problem hiding this comment.
isPidAlive only guards .keep-<pid> snapshot directories, not the version directories themselves. If two sessions start on the same old version and Session A exits first, GC deletes that version while Session B is still using it — the original ENOENT reappears for a concurrent session.
Consider checking whether any live .keep-<pid> snapshot references a version directory before deleting it (i.e., if <ver>.keep-<alivePid> exists for a version, don't delete <ver>):
| if (currentVersion && versions.includes(currentVersion)) { | |
| for (const v of versions) { | |
| if (!keep.has(v)) deleteVersions.push(v); | |
| } | |
| } | |
| const deleteVersions: string[] = []; | |
| if (currentVersion && versions.includes(currentVersion)) { | |
| const liveSnapshotVersions = new Set( | |
| snapshots.filter(s => { const m = s.match(KEEP_RE); return m && isAlive(Number(m[1])); }) | |
| .map(s => s.replace(/\.keep-\d+$/, "")), | |
| ); | |
| for (const v of versions) { | |
| if (!keep.has(v) && !liveSnapshotVersions.has(v)) deleteVersions.push(v); | |
| } | |
| } |
There was a problem hiding this comment.
Walking through the scenario with concrete numbers: with DEFAULT_KEEP_COUNT = 3, planGc always keeps [current, current-1, current-2]. Two sessions on the same old version — e.g. both on 0.6.40 while current = 0.6.41 — look like:
sorted desc = [0.6.41, 0.6.40, 0.6.39, 0.6.38]
keep = {0.6.41, 0.6.40, 0.6.39}
deleteVersions = [0.6.38] ← 0.6.40 is NOT here
So a single update between the two sessions doesn't fail — the previous version is always in the keep-list. The bug only triggers when the pinned version falls off the keep-list entirely, which needs three further updates during the session's lifetime. That case is already called out in the PR description under "Known limitation" as an accepted tradeoff.
The proposed .keep-<alivePid> guard also doesn't cover that case. Snapshots are only created inside the autoupdate branch of a session — see snapshotPluginDir call sites in session-start.ts:166 and session-start-setup.ts:79. A long-running session that stays on its version without updating never creates a snapshot pointing at itself, so liveSnapshotVersions never contains the version that actually needs protecting. The guard's set would be empty for exactly the version at risk.
The real fix would be per-session reference counting (<versionDir>/.sessions/<pid>, checked in planGc). That's ~40–50 lines of helper + plumbing + tests, and rough consensus with the author of this PR was to defer it given how rare the actual failure mode is (a single Claude session staying alive through three plugin releases). Not closing this thread silently — happy to revisit if you have a scenario that still repros under N=3.
Introduces a pure-function helper used by later commits to make plugin auto-updates safe for the session currently running, and to garbage- collect old versioned-install directories. Helpers: - `resolveVersionedPluginDir(bundleDir)` — detects whether we are running from a real `~/.claude/plugins/cache/hivemind/hivemind/X.Y.Z/` layout. Returns null for a local `--plugin-dir` dev run so later callers skip all snapshot/GC side effects on a developer checkout. - `snapshotPluginDir(pluginDir)` / `restoreOrCleanup(handle)` — copy the versioned plugin directory to `<pluginDir>.keep-<pid>` before the installer runs, then restore from the snapshot if the installer wiped the directory, otherwise remove the snapshot. PID-scoped names so concurrent sessions don't stomp each other. - `readCurrentVersionFromManifest(manifestPath)` — parses the `installed_plugins.json` entry for `hivemind@hivemind` and returns the semver the installer considers current. Robust against missing / malformed / unexpected manifest shapes. - `planGc` / `executeGc` — plan (never execute) which versioned directories to keep vs delete, keyed off the manifest version and a `keepCount` cap. Leaves non-semver and live-PID snapshot entries alone so the hook never deletes something it doesn't understand. `DEFAULT_KEEP_COUNT = 3` — current plus two predecessors, enough headroom for a session pinned through two further updates. Unit tests cover every branch: semver compare, manifest parse failures, snapshot overwrite + restore + cleanup, GC plan with missing current version, dev-layout and live-PID guards, and executeGc error tolerance.
The SessionStart auto-update path was producing errors like
Stop hook error: Plugin directory does not exist:
/home/<user>/.claude/plugins/cache/hivemind/hivemind/0.6.38
(hivemind@hivemind — run /plugin to reinstall)
for every hook that fired after the update completed (Stop, PostToolUse,
SessionEnd). The live Claude session bakes `CLAUDE_PLUGIN_ROOT` into its
hook registry at startup, so once the versioned directory is gone the
remaining hooks in that session all fail with ENOENT.
Root cause: `src/hooks/session-start.ts` explicitly `rm -rf`'d every
non-latest version directory right after running `claude plugin
update`, from inside the session that may itself be executing from one
of those directories. An empirical `claude plugin update` with no-op
latest-version semantics leaves adjacent versions alone — the loop in
this hook was the sole destroyer.
Fix:
- Remove the destructive `readdirSync` + `rmSync` loop from
session-start.ts entirely. Cleanup is deferred to the SessionEnd GC
hook (added in a follow-up commit) which is safe across concurrent
sessions.
- Wrap the `claude plugin update` execSync in both `session-start.ts`
and `session-start-setup.ts` with `snapshotPluginDir` →
`restoreOrCleanup`, so if Claude's installer *does* wipe the
versioned directory (under some scope or future installer change),
the live session still finds its bundle paths until it exits.
The snapshot is PID-scoped to avoid collisions when multiple Claude
sessions trigger an update concurrently.
- Add `|| true` to each scope command in
`session-start-setup.ts` so a scope that doesn't have hivemind
installed (most of them, in practice) doesn't make the whole chained
command exit non-zero and spuriously log `autoupdate failed`. The
sync hook already had this; the async hook didn't.
- Drop the now-unused `readdirSync` / `rmSync` imports from
session-start.ts.
Tests: the two session-start-hook tests that previously asserted the
destructive behavior (`rmSync` called on old cache entries,
`readdirSync` error tolerance during cleanup) are rewritten to assert
the new contract — no rmSync against version directories happens
in-session, and the new snapshot-outcome log line is emitted. Full
suite (952 tests) green.
Verified end-to-end with `claude -p --plugin-dir ./claude-code` after
lowering `claude-code/.claude-plugin/plugin.json` to force the update
path: both hooks log `autoupdate snapshot outcome: noop` (correct for
the dev-layout guard) and `autoupdate succeeded`, with no `cache
cleanup: removed old version` lines.
With the destructive rmSync loop removed from session-start, something still needs to prune old versioned directories so they don't accumulate forever under ~/.claude/plugins/cache/hivemind/hivemind/. Doing this cleanup on SessionEnd is safe — the session that just ended is no longer holding any bundle paths hostage — and gives us a natural spot to also sweep up stale `.keep-<pid>` snapshots from crashed updates. Behavior: - Keeps the current version (from installed_plugins.json) plus the next two newest by semver (DEFAULT_KEEP_COUNT = 3). Three gives enough headroom that a session pinned to version X through two further updates still finds its bundle when the next SessionEnd fires. - Deletes `<pluginDir>.keep-<pid>` snapshots whose PID is no longer alive. Live-PID snapshots are left alone so an in-progress update in another session isn't corrupted. - Bails without deleting anything if the manifest is missing, malformed, or the current version isn't on disk — we should never "clean up" into an unknown state. - Guarded by `resolveVersionedPluginDir` so a local `--plugin-dir` dev run skips silently and never touches a developer checkout. Registration: added as a second, `async` entry under SessionEnd in `claude-code/hooks/hooks.json` with a 15s timeout. Runs alongside the existing summary-spawning hook; keeps the two concerns separate so `HIVEMIND_CAPTURE=false` environments still get GC even though they skip the summary hook. Build: `plugin-cache-gc` added to the esbuild entry list; the shipped bundle is ~5KB. Tests: - `claude-code/tests/plugin-cache-bundles.test.ts` — static guard over the shipped JS + hooks.json, so the bundler can't silently drop the helpers or forget to wire the hook. - `claude-code/tests/plugin-cache-gc-bundle.integration.test.ts` — spawns the real built `plugin-cache-gc.js` bundle as a node subprocess against a fake HOME. Exercises: keep-current-plus-two, preserves live-PID snapshots, bails without a manifest, and skips the dev layout. This is the closest we get to real session behavior without touching the user's actual plugin cache.
Brings the four files touched by this branch up to the 90% coverage
bar enforced by `vitest.config.ts` PR-scoped thresholds:
utils/plugin-cache.ts stmts=93.45 branch=90.47 funcs=100 lines=96.66
hooks/plugin-cache-gc.ts stmts=100 branch=100 funcs=100 lines=100
hooks/session-start.ts stmts=98.82 branch=90.47 funcs=100 lines=98.76
hooks/session-start-setup.ts stmts=100 branch=100 funcs=100 lines=100
Changes:
- `plugin-cache-gc.ts`: extracted `runGc(bundleDir, opts)` so tests can
exercise the orchestration branches directly (default log / default
manifest path / explicit overrides). The script-mode bootstrap at
the bottom is guarded by `import.meta.url === pathToFileURL(argv[1])`
so imports from tests never trigger it, and wrapped in a `c8 ignore`
block since it's only reachable from a real `node plugin-cache-gc.js`
invocation (already covered by the bundle integration test).
- `claude-code/tests/plugin-cache.test.ts`: three new tests close the
remaining branches in the helper module:
* `restoreOrCleanup` catch branch — `chmodSync` the parent to 0500
so `renameSync` fails EACCES.
* `readCurrentVersionFromManifest` return-null-after-loop — a
manifest where every entry has a non-semver / missing version.
* `executeGc` catch blocks — `chmodSync` to 0500 so both
`rmSync` calls for deleteVersions and deleteSnapshots error out,
asserting the errors are collected without throwing.
- `claude-code/tests/plugin-cache-gc.test.ts` (new): direct unit tests
for `runGc`. Covers HIVEMIND_WIKI_WORKER guard, dev-layout skip,
"nothing to gc", delete-older-versions, explicit keepCount override,
dead-PID vs live-PID snapshot handling, and default opts.
- `claude-code/tests/session-start-setup-branches.test.ts` (new): mocks
`version-check` and `plugin-cache` so we can hit branches the
original source-level test can't reach against the real filesystem —
userName fallback to "unknown", `if (current)` false branch when
`getInstalledVersion` returns null, the `resolved ? snapshot : null`
truthy branch that fires when a real versioned install is detected,
the failure path where `execSync` throws and `restoreOrCleanup` is
called from the catch, and the outer version-check catch when
`getLatestVersion` rejects.
- `vitest.config.ts`: locks the four files at 90/90/90/90 so
regressions surface in the PR coverage comment.
Full suite: 46 files / 967 tests passing under `--coverage`.
…utcome
Addresses the review comment on `plugin-cache.ts:97`. Before this
change, a thrown `renameSync` or `rmSync` inside `restoreOrCleanup`
was silently swallowed and the function returned `"noop"` — the same
value used for the dev-layout path. A genuine failure (pluginDir
wiped by the installer, rename back failed with EACCES/EBUSY) was
therefore indistinguishable from "nothing to do" in the caller's log
line, silently reproducing the original ENOENT bug in every hook
that fired afterwards.
Now:
- Distinct return value `"restore-failed"` (new `RestoreOutcome`
union) is emitted when the try block throws. The existing caller
pattern `log(\`autoupdate snapshot outcome: ${outcome}\`)` in both
session-start.ts and session-start-setup.ts surfaces it without
further changes.
- An explicit `process.stderr.write` before returning, guarded by an
inner try/catch so a broken stderr can't mask the original failure.
Stderr is observable even when the debug log sink is unavailable.
Test: the existing "fs error swallowed" case is rewritten to assert
the new outcome AND that the stderr warning is emitted, using a
`vi.spyOn(process.stderr, "write")` around the chmod-based
rename-fail trick.
All four touched files still ≥ 90 on every metric.
35d04cb to
caf9709
Compare
Resolved conflict in openclaw/package.json and openclaw/openclaw.plugin.json by keeping 0.6.47 (our side, already published to ClawHub). Main had bumped to 0.6.46 via release.yml after PR #72 (plugin-autoupdate session safety). Brought in from main: - plugin-cache helper (src/utils/plugin-cache.ts) - SessionEnd GC hook (src/hooks/plugin-cache-gc.ts) for CC plugin - Autoupdate session-safety fixes in src/hooks/session-start{,-setup}.ts - New tests: plugin-cache-bundles, plugin-cache-gc, plugin-cache-gc-bundle integration — raises full suite to 1031/1031 (45 → 50 files). Nothing in main's changes touches openclaw/; the merge is a pure dependency bump from our perspective. openclaw/dist/index.js rebuilt with the merged src/hooks/memory-path-utils.ts; scanner sweep still reports zero forbidden patterns.
Summary
Fixes
Stop hook error: Plugin directory does not exist: /…/.claude/plugins/cache/hivemind/hivemind/<old-version>errors that appeared for every post-SessionStart hook (Stop, PostToolUse, SessionEnd) once an auto-update happened mid-session.rmSyncloop insrc/hooks/session-start.tsthat explicitlyrm -rf'd every non-latest version directory from inside a running session (the actual source of the error — an empiricalclaude plugin updatedoesn't wipe adjacent versions on its own).claude plugin updateexecSync in bothsession-start.tsandsession-start-setup.tswith a PID-scoped snapshot → restore-or-cleanup dance, so the live session keeps finding its bundle paths even if Claude's installer wipes the versioned directory under some scope.SessionEndhookplugin-cache-gc.tsthat garbage-collects older version directories — keeps the current version plus the two next-newest (DEFAULT_KEEP_COUNT = 3, enough headroom for a session pinned through two further updates) and sweeps up stale.keep-<pid>snapshots from crashed SessionStart updates.|| trueguard to each scope insession-start-setup.ts's update command so it stops spuriously loggingautoupdate failedwhen a scope doesn't have hivemind installed.Why it happens
At session start the hook registry bakes the resolved
CLAUDE_PLUGIN_ROOT(e.g.…/hivemind/0.6.38/) into every hook command. When the syncSessionStarthook runsclaude plugin updateand then iterates the cache directory removing every non-latest version — the currently-executing version among them — later hooks in the same session (Stop, PostToolUse, SessionEnd) all fail with ENOENT on their bundle path. Cleanup now only happens inSessionEnd, by which time the session is gone.Architecture
src/utils/plugin-cache.ts(new): pure helpers —resolveVersionedPluginDir(dev-layout guard),snapshotPluginDir/restoreOrCleanup(PID-scoped backup/restore around the installer),readCurrentVersionFromManifest,planGc/executeGc(leaves non-semver entries and live-PID snapshots alone, bails if current version isn't on disk).src/hooks/plugin-cache-gc.ts(new): thinrunGc(bundleDir, opts)orchestrator wired as an async SessionEnd hook (15s timeout) alongside the existing summary hook — separate hook to keep the concern isolated and to run even whenHIVEMIND_CAPTURE=falsedisables the summary flow.--plugin-dirdev run skips all snapshot/GC work and never touches a developer checkout.Verification
--coverage.vitest.config.ts:utils/plugin-cache.ts— stmts 93.45, branch 90.47, funcs 100, lines 96.66hooks/plugin-cache-gc.ts— 100 / 100 / 100 / 100hooks/session-start.ts— 98.82 / 90.47 / 100 / 98.76hooks/session-start-setup.ts— 100 / 100 / 100 / 100claude -p --plugin-dir ./claude-codewithclaude-code/.claude-plugin/plugin.jsonmanually lowered to0.6.30(to force the update path) — hook-debug.log shows both autoupdate paths firing withautoupdate snapshot outcome: noop(dev-layout guard correct) andautoupdate succeeded, no morecache cleanup: removed old versionlines, andplugin-cache-gcskipping correctly.plugin-cache-gc-bundle.integration.test.tsruns the actual shippedplugin-cache-gc.jsbundle against a fakeHOME— validates keep-current-plus-two, preserves live-PID snapshots, bails without a manifest, and skips dev layouts.0.6.40-ghost/directory survived a no-opclaude plugin update— confirms the installer doesn't wipe adjacent versions on its own, so the destructivermSyncloop was responsible for the errors observed in production. (A real version-bump install could still wipe the directory on its own; the snapshot/restore dance handles that case too.)Known limitation
Accepted tradeoff: a session pinned to version X through three further updates can still lose its bundle. Reference-counting per version would close this; deferred as out of scope.
N = 3gives two updates of headroom, which covers every real-world scenario observed so far.Test plan
Plugin directory does not existerrors in subsequent hooks.hook-debug.log: [plugin-cache-gc] gc kept=… deletedVersions=…).