Skip to content

feat(cli): persistent skill cache to skip prpm install on repeat launches#124

Open
khaliqgant wants to merge 4 commits into
mainfrom
feat/persistent-skill-cache
Open

feat(cli): persistent skill cache to skip prpm install on repeat launches#124
khaliqgant wants to merge 4 commits into
mainfrom
feat/persistent-skill-cache

Conversation

@khaliqgant
Copy link
Copy Markdown
Member

@khaliqgant khaliqgant commented May 15, 2026

Summary

Two layers of caching for agentworkforce agent <persona> skill installs, so repeat launches stop re-running npx prpm install / npx skills add.

Layer 1 — persistent source-keyed cache. Content-addressed dir under ~/.agentworkforce/workforce/cache/plugins/<fingerprint>/, keyed by (harness, sorted skill sources, local-file SHA). On a hit the install subprocess is skipped entirely — claude reuses it as --plugin-dir; opencode/codex mirror it into the relayfile mount. Source-string or local-file edits rotate the fingerprint automatically.

Layer 2 — opt-in upstream drift detection. Because a floating ref (@org/skill, github.com/org/repo#x) doesn't change the fingerprint when upstream publishes, the marker (now schema v2) records each remote skill's upstream identity at install time and re-probes it on a TTL:

Source Recorded Probe
prpm resolved version from prpm.lock GET registry.prpm.dev/api/v1/packages/<pkg>latest_version.version
skill.sh GitHub blob SHA of SKILL.md (path from skills-lock.json) GitHub Contents API w/ If-None-Match → 304 when unchanged
local .md covered by the fingerprint's content hash

If now - lastUpstreamCheckAt > interval (default 24h), parallel probes run; any drift downgrades the cache hit to a miss and the reinstall self-heals the marker. Every probe fails open — a network error, timeout, 4xx/5xx, or malformed body is treated as "no drift" so a flaky registry never blocks or slows a launch.

Control surface

Flag / env Effect
(default) cache on; upstream check every 24h
--refresh-skills unconditional reinstall into the cache dir
--no-skill-cache / AGENTWORKFORCE_NO_SKILL_CACHE=1 bypass the cache entirely (per-session install)
--check-upstream force a drift probe this launch (ignore TTL)
--no-check-upstream skip the drift probe this launch
AGENTWORKFORCE_SKILL_CACHE_CHECK_INTERVAL 24h default; 0=always, never/off=disable; accepts ms/s/m/h/d, bare number = hours
--install-in-repo disengages caching (installs land in the repo, by design)

Tests

  • persona-kit: 209 tests (# fail 0) — incl. 12 fingerprint/marker tests, 22 new drift/probe tests (mocked HTTP: prpm & github success, 304, 404/5xx fail-open, timeout, mixed sets, v1→v2 migration, TTL math, marker round-trip).
  • cli: 188 tests (# fail 0) — incl. flag-parse coverage for all four new flags.
  • Full repo: 590/590 passing across 8 packages.

End-to-end (real services, not mocked)

Verified against live prpm.dev + api.github.com:

  • Probe layer: recorded 1.0.0 → detects 1.0.0 → 1.1.3; recorded current → no drift; GitHub stale SHA → drift; GitHub real SHA → 304 "unchanged".
  • Full CLI cycle with a real prpm skill (@agent-relay/choosing-swarm-patterns):
    1. cache miss → real prpm install → marker records resolved 1.1.3
    2. in-TTL relaunch → plain cache hit, no probe (fast path)
    3. --check-upstream + current → "Skills up to date — reusing cache"
    4. tampered marker (1.0.0) + --check-upstream → live-detected drift → reinstall → marker self-heals to 1.1.3
    5. tampered marker + --no-check-upstream → probe skipped, plain cache hit
  • Layer-1 e2e (claude --plugin-dir, opencode mount mirror, --refresh-skills, --no-skill-cache, local-file edit invalidation) all confirmed.

Decision trail

A trail trajectory documenting the design decisions (cache shape, never-auto-invalidate fingerprint, all-harness scope, drift-probe choice, precise blob SHA vs. repo-HEAD, schema-v2 migration) is committed at .trajectories/completed/2026-05/traj_47ulsb0rwbid.{json,md}.

🤖 Generated with Claude Code

…ches

Every `agentworkforce agent <persona>` previously spawned a fresh
`npx prpm install …` per declared skill, even when the skill set hadn't
changed — `npx` resolution + registry lookups + tarball fetches added
several seconds of latency to every launch.

Introduce a content-addressed cache under
`~/.agentworkforce/workforce/cache/plugins/<fingerprint>/` keyed by a
SHA-256 of `(harness, sorted skill sources, local file contents)`. On a
hit, the install subprocess is skipped entirely:
  - claude reuses the cache dir directly as `--plugin-dir <cacheDir>`.
  - opencode / codex mirror the cache contents into the relayfile mount
    before launch (mount-ignored patterns keep this from syncing back to
    the user's repo).

Invalidation is conservative: changing a skill source string (or editing
a local `.md` skill) rotates the fingerprint and produces a fresh cache
entry. To force-refresh past unchanged remote sources, pass
`--refresh-skills`; to bypass the cache entirely, pass `--no-skill-cache`
or set `AGENTWORKFORCE_NO_SKILL_CACHE=1`.

`--install-in-repo` disengages caching — the user explicitly opted into
having installs land in their working tree, so mirroring a cache back
into the repo would silently re-introduce the leakage that flag is
designed to surface.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 15, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 2a7f1126-e80b-43e3-bc8e-fe8e62a8950a

📥 Commits

Reviewing files that changed from the base of the PR and between 2afd176 and 9ab2ba7.

📒 Files selected for processing (2)
  • packages/cli/src/cli.test.ts
  • packages/cli/src/cli.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/cli/src/cli.ts

📝 Walkthrough

Walkthrough

Adds a persistent, content-addressed skill-install cache and upstream drift probing; exposes cache/probe APIs via persona-kit; adds CLI flags (--no-skill-cache, --refresh-skills, upstream check controls) and integrates cache-aware install/mirror logic, locking, and TTL-gated upstream re-checks into runInteractive and mount/non-mount flows.

Changes

Persistent skill-cache system

Layer / File(s) Summary
Skill-cache module and comprehensive tests
packages/persona-kit/src/skill-cache.ts, packages/persona-kit/src/skill-cache.test.ts
Core module implements canonical SHA-256 fingerprinting (harness + stable-sorted skills, folding in local .md file hashes), marker persistence (.aw-skill-cache.json, schema v2, v1 read-compatible), marker read/write/update, and isSkillCacheValid. Tests cover fingerprint determinism, order-independence, local-content sensitivity, marker round-trip, schema upgrade, upstream round-trip, and updateSkillCacheMarkerUpstream behavior.
Upstream drift probe module and tests
packages/persona-kit/src/skill-upstream-probe.ts, packages/persona-kit/src/skill-upstream-probe.test.ts
Install-time builders extract prpm/GitHub identities from cache lockfiles; launch-time probe re-checks registry/GitHub blob SHAs (with conditional GET/ETag) in parallel using injectable fetch with timeout and fail-open per-skill. Scheduling helpers parse intervals and decide due checks. Tests validate parsing, due logic, record building, conditional GET behavior, drift detection, and fail-open/timeouts.
Persona-kit barrel exports
packages/persona-kit/src/index.ts
Re-exports skill-cache APIs/types (computeSkillCacheFingerprint, skillCacheRoot, resolveSkillCacheDir, readSkillCacheMarker, writeSkillCacheMarker, isSkillCacheValid, updateSkillCacheMarkerUpstream, types) and upstream-probe APIs/types (buildUpstreamRecordsFromCacheDir, detectSkillUpstreamDrift, parseCheckInterval, isUpstreamCheckDue, types).
CLI imports, helpers, flags, and runInteractive wiring
packages/cli/src/cli.ts
Adds fs.cpSync import and persona-kit cache/probe imports; documents --no-skill-cache and --refresh-skills in agent help; introduces advisory lock + mirror/write helpers; extends runInteractive options; computes fingerprint/cacheDir, checks marker (and optional upstream TTL-based re-check), and on hit mirrors cache or on miss/refresh installs into persistent cache, writes marker, then mirrors. Suppresses installer cleanup when caching is enabled; exports resolveEnvCheckIntervalMs and acquireSkillCacheLock.
CLI flag parsing & lock tests
packages/cli/src/cli.test.ts
Adds parseAgentArgs test(s) verifying --no-skill-cache and --refresh-skills parse into AgentFlags and preserve persona selector positional arguments; adds resolveEnvCheckIntervalMs tests and concurrency tests for acquireSkillCacheLock (acquire/block/release and stale/dead-PID stealing).
Trajectories docs & index
.trajectories/completed/*, .trajectories/index.json
Adds a completed trajectory JSON and markdown documenting the persistent cache and upstream probe design and updates the trajectories index lastUpdated and entries.

Sequence Diagram

sequenceDiagram
  participant User
  participant CLI as runInteractive
  participant PersonaKit as persona-kit
  participant Installer
  participant CacheFS as cache dir (~/.agentworkforce/...)
  User->>CLI: invoke agent (flags)
  CLI->>PersonaKit: computeSkillCacheFingerprint(harness, skills)
  CLI->>PersonaKit: resolveSkillCacheDir(fingerprint)
  CLI->>PersonaKit: readSkillCacheMarker(cacheDir)
  alt marker valid and not refresh
    CLI->>CacheFS: mirrorSkillCacheInto(cacheDir, target)
  else marker missing/stale or refresh requested
    CLI->>Installer: run installer into cacheDir
    Installer->>PersonaKit: writeSkillCacheMarker(cacheDir, marker)
    CLI->>CacheFS: mirrorSkillCacheInto(cacheDir, target)
  end
  alt marker TTL expired & check-upstream enabled
    CLI->>PersonaKit: detectSkillUpstreamDrift(marker)
    PersonaKit-->>CLI: drift result
    CLI->>Installer: possibly reinstall if drifted
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • willwashburn

Poem

🐰 I stitched a cache of skills, neat and small,
Fingerprints hum so installs need not sprawl,
Mirrors copy blossoms when the hashes align,
Or refresh the spring when the flags draw a line,
The workforce naps safe in a cached little hall ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 51.22% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main objective: introducing a persistent skill cache for CLI launches to skip unnecessary installs, which aligns with the core changeset across multiple files.
Description check ✅ Passed The description is highly detailed and directly related to the changeset, explaining both the two-layer caching system (persistent source-keyed cache and upstream drift detection) and the control surface, with test results and end-to-end verification.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/persistent-skill-cache

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/cli/src/cli.test.ts`:
- Around line 142-163: The test relies on parseAgentArgs reading
process.env.AGENTWORKFORCE_NO_SKILL_CACHE which can make it flaky; modify the
test (the 'parseAgentArgs: --no-skill-cache and --refresh-skills set flags'
case) to explicitly isolate that env var by saving the original
process.env.AGENTWORKFORCE_NO_SKILL_CACHE, then unset or set it to a known value
before calling parseAgentArgs and restore the original value after the
assertions; ensure this isolation is applied around each call that expects the
default (false) so parseAgentArgs sees a controlled environment.

In `@packages/cli/src/cli.ts`:
- Around line 1378-1390: When the user requested a refresh, the install
currently runs into the existing cache dir which can preserve
lockfiles/artifacts; update the install flow around the block that checks
install.commandString, deferInstallToMount, and skillCacheHit so that if the
refresh-skills flag is set you delete/clean skillCacheDir (when
skillCacheFingerprint && skillCacheDir are present) before calling
runInstall(install.command, installLabel); apply the same change to the later
analogous install block (the second runInstall/writeSkillCacheMarker block) so
both flows clear the cache dir prior to reinstalling.
- Around line 1299-1303: The cache validity check using skillCacheDir,
skillCacheFingerprint, isSkillCacheValid and options.refreshSkills must be
protected by a per-fingerprint inter-process lock to prevent two processes from
simultaneously installing into the same cache; wrap the
read-check-and-possible-install sequence in an exclusive lock keyed by the
fingerprint (e.g., create/acquire a fingerprint-specific lock file like
`${skillCacheFingerprint}.lock` in the cache dir or use an OS/file-locking
library) before calling isSkillCacheValid and before performing the install path
that writes into the cache, ensure the lock is always released (use try/finally)
and fall back to re-checking isSkillCacheValid after acquiring the lock so
redundant installs are avoided.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 2bc42043-8182-4578-81dd-7e002bdf8268

📥 Commits

Reviewing files that changed from the base of the PR and between 1a9352e and 5232fbd.

📒 Files selected for processing (5)
  • packages/cli/src/cli.test.ts
  • packages/cli/src/cli.ts
  • packages/persona-kit/src/index.ts
  • packages/persona-kit/src/skill-cache.test.ts
  • packages/persona-kit/src/skill-cache.ts

Comment thread packages/cli/src/cli.test.ts
Comment thread packages/cli/src/cli.ts Outdated
Comment thread packages/cli/src/cli.ts
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 5 additional findings in Devin Review.

Open in Devin Review

Comment thread packages/cli/src/cli.ts
Comment on lines +1598 to +1614
if (skillCacheHit && skillCacheDir) {
mirrorSkillCacheInto(skillCacheDir, handle.mountDir);
} else if (skillCachingEnabled && skillCacheDir && skillCacheFingerprint) {
mkdirSync(skillCacheDir, { recursive: true });
await runInstallOrThrow(install.command, installLabel, skillCacheDir);
writeSkillCacheMarker(skillCacheDir, {
fingerprint: skillCacheFingerprint,
harness,
skills: effectiveSelection.skills.map((s) => ({
id: s.id,
source: s.source
}))
});
mirrorSkillCacheInto(skillCacheDir, handle.mountDir);
} else {
await runInstallOrThrow(install.command, installLabel, handle.mountDir);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 --refresh-skills does not remove old cache dir before reinstalling, leaving stale artifacts and risking corrupted cache on failure

When --refresh-skills is used, the code sets skillCacheHit = false at cli.ts:1302 to bypass the cache, but never removes or cleans the existing cache directory before running the install into it. The help text at line 142 says "overwriting any existing cache dir" but the install simply runs on top of the existing contents.

This has two consequences:

  1. Stale artifacts on success: If an upstream skill package changed structure between versions (removed files), those old files persist in the cache dir and get served to the harness via --plugin-dir (claude) or mirrorSkillCacheInto (mount harnesses at cli.ts:1611).
  2. Corrupted cache after partial failure: If the refresh install partially completes then fails (runInstallOrThrow throws at cli.ts:1602), the old cache marker from the previous successful install remains on disk. On the next run without --refresh-skills, isSkillCacheValid at cli.ts:1303 matches the old marker's fingerprint and returns a false cache hit, causing the harness to see a mix of old and partially-updated skill artifacts.
Prompt for agents
The --refresh-skills flag needs to clean the existing cache directory before reinstalling to ensure a truly fresh install. There are two code paths that need this fix:

1. The mount-deferred install path (cli.ts around line 1598-1614): Before the else-if branch that runs runInstallOrThrow into skillCacheDir, the old cache dir should be removed (rmSync with recursive+force) so that stale files dont persist and the old marker is cleared.

2. The claude non-mount install path (cli.ts around line 1378-1391): Similarly, when refreshSkills is true, the cache dir should be cleaned before runInstall runs.

A minimal fix would be to add something like: if (options.refreshSkills && skillCacheDir && existsSync(skillCacheDir)) { rmSync(skillCacheDir, { recursive: true, force: true }); } before the install runs in both paths. This ensures (a) stale files are removed on successful refresh, and (b) the old marker is gone if the refresh fails, so subsequent runs correctly see a cache miss instead of a false hit.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 5 files

Re-trigger cubic

Ricky Schema Cascade and others added 2 commits May 15, 2026 22:14
…che hit

The pure cache-hit path is a near-instant cpSync that doesn't print its own
spinner — but the surrounding setupSpinner.stop()/start() was redrawing the
"Setting up sandbox mount…" line a second time. Only stop/start the setup
spinner around branches that actually run an install spinner of their own.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The source-key fingerprint never rotates on an upstream publish when the
persona's `source` string is unchanged (floating refs like `@org/skill`
or `github.com/org/repo#x`). This adds a cheap, opt-in second check so a
new upstream version is actually consumed without a manual refresh.

Marker schema bumped to v2 (v1 stays read-compatible, upgraded in place
with no upstream records). At install time each remote skill's upstream
identity is recorded from the installer lockfiles:
  - prpm      → resolved `version` from prpm.lock
  - skill.sh  → GitHub blob SHA of the installed SKILL.md, located via
                skills-lock.json `skillPath`

On launch, if the marker's `lastUpstreamCheckAt` is older than the
interval (default 24h), parallel lightweight probes compare recorded vs.
current — prpm registry GET, GitHub Contents API with `If-None-Match`
(304, no body, when unchanged). Any drift downgrades the cache hit to a
miss so the reinstall picks up the new content; the marker then
self-heals. Every probe fails OPEN: a network error, timeout, 4xx/5xx,
or malformed body counts as "no drift" so a flaky registry never blocks
or slows a launch beyond the timeout.

Control surface:
  - `--check-upstream`     force a probe this launch (ignore the TTL)
  - `--no-check-upstream`  skip the probe this launch
  - `AGENTWORKFORCE_SKILL_CACHE_CHECK_INTERVAL` (24h default; `0`=always,
    `never`/`off`=disable; accepts ms/s/m/h/d, bare number = hours)

Tests: 22 new persona-kit unit tests (mocked HTTP — prpm/github
success, 304, 404/5xx fail-open, timeout, mixed sets, v1→v2 migration,
marker round-trip, TTL math) + 1 CLI flag-parse test. Verified
end-to-end against live prpm.dev and api.github.com: install records
the resolved version, in-TTL launches skip probing, a tampered stale
version triggers a live-detected reinstall (1.0.0 → 1.1.3), the marker
self-heals, and --no-check-upstream bypasses.

A decision trajectory for this work is recorded under
.trajectories/completed/2026-05/traj_47ulsb0rwbid.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
packages/persona-kit/src/skill-cache.test.ts (1)

100-124: 💤 Low value

Minor: Test setup could be simplified.

Lines 103-105 attempt to write to a non-existent path, catch the error silently, then create the directory. This works but is confusing. You could simplify by just doing the mkdir with { recursive: true } first.

♻️ Suggested simplification
 test('computeSkillCacheFingerprint: local skill content changes invalidate', async () => {
   await withTmpDir(async (dir) => {
     const skillPath = 'skills/foo.md';
-    await writeFile(join(dir, 'skills', 'foo.md').replace('foo.md', ''), '', 'utf8').catch(
-      () => undefined
-    );
-    // Ensure the dir exists, then write content.
-    await import('node:fs/promises').then((fs) =>
-      fs.mkdir(join(dir, 'skills'), { recursive: true })
-    );
+    const { mkdir } = await import('node:fs/promises');
+    await mkdir(join(dir, 'skills'), { recursive: true });
     await writeFile(join(dir, skillPath), '# original', 'utf8');
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/persona-kit/src/skill-cache.test.ts` around lines 100 - 124, In the
test computeSkillCacheFingerprint: local skill content changes invalidate,
simplify setup by creating the skills directory before writing files: replace
the initial writeFile(...).catch(...) with an explicit await mkdir(join(dir,
'skills'), { recursive: true }) (you already call fs.mkdir later) so that the
skills directory exists prior to writeFile; adjust/remove the redundant
import/second mkdir call and ensure the test still writes join(dir, skillPath)
before computing fingerprints in this test function.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@packages/persona-kit/src/skill-cache.test.ts`:
- Around line 100-124: In the test computeSkillCacheFingerprint: local skill
content changes invalidate, simplify setup by creating the skills directory
before writing files: replace the initial writeFile(...).catch(...) with an
explicit await mkdir(join(dir, 'skills'), { recursive: true }) (you already call
fs.mkdir later) so that the skills directory exists prior to writeFile;
adjust/remove the redundant import/second mkdir call and ensure the test still
writes join(dir, skillPath) before computing fingerprints in this test function.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 73177675-b733-4221-8fb5-cb8a5e00890c

📥 Commits

Reviewing files that changed from the base of the PR and between 09caf5b and 2afd176.

📒 Files selected for processing (10)
  • .trajectories/completed/2026-05/traj_47ulsb0rwbid.json
  • .trajectories/completed/2026-05/traj_47ulsb0rwbid.md
  • .trajectories/index.json
  • packages/cli/src/cli.test.ts
  • packages/cli/src/cli.ts
  • packages/persona-kit/src/index.ts
  • packages/persona-kit/src/skill-cache.test.ts
  • packages/persona-kit/src/skill-cache.ts
  • packages/persona-kit/src/skill-upstream-probe.test.ts
  • packages/persona-kit/src/skill-upstream-probe.ts
✅ Files skipped from review due to trivial changes (1)
  • .trajectories/completed/2026-05/traj_47ulsb0rwbid.json
🚧 Files skipped from review as they are similar to previous changes (2)
  • packages/cli/src/cli.test.ts
  • packages/cli/src/cli.ts

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 10 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name=".trajectories/completed/2026-05/traj_47ulsb0rwbid.json">

<violation number="1" location=".trajectories/completed/2026-05/traj_47ulsb0rwbid.json:132">
P2: Avoid committing a user-specific absolute filesystem path in `projectId`; use a stable, repo-agnostic identifier instead.</violation>
</file>

<file name="packages/persona-kit/src/skill-upstream-probe.ts">

<violation number="1" location="packages/persona-kit/src/skill-upstream-probe.ts:344">
P2: Missing upstream metadata for remote skills is forced to `drifted: true`, so transient install-time probe failures can repeatedly trigger unnecessary reinstalls and negate cache reuse.</violation>
</file>

<file name="packages/cli/src/cli.ts">

<violation number="1" location="packages/cli/src/cli.ts:1406">
P2: `AGENTWORKFORCE_SKILL_CACHE_CHECK_INTERVAL=never` is currently ignored because `null` from `parseCheckInterval` is coalesced to the default interval. This causes upstream drift checks to run when the user explicitly asked to disable them.</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.
Fix all with cubic
Re-trigger cubic

],
"commits": [],
"filesChanged": [],
"projectId": "/Users/khaliqgant/Projects/AgentWorkforce/workforce",
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot May 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Avoid committing a user-specific absolute filesystem path in projectId; use a stable, repo-agnostic identifier instead.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .trajectories/completed/2026-05/traj_47ulsb0rwbid.json, line 132:

<comment>Avoid committing a user-specific absolute filesystem path in `projectId`; use a stable, repo-agnostic identifier instead.</comment>

<file context>
@@ -0,0 +1,144 @@
+  ],
+  "commits": [],
+  "filesChanged": [],
+  "projectId": "/Users/khaliqgant/Projects/AgentWorkforce/workforce",
+  "tags": [],
+  "_trace": {
</file context>

Tip: Review your code locally with the cubic CLI to iterate faster.

Fix with Cubic

skillId: skill.id,
source: skill.source,
kind: 'prpm',
drifted: true,
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot May 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Missing upstream metadata for remote skills is forced to drifted: true, so transient install-time probe failures can repeatedly trigger unnecessary reinstalls and negate cache reuse.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/persona-kit/src/skill-upstream-probe.ts, line 344:

<comment>Missing upstream metadata for remote skills is forced to `drifted: true`, so transient install-time probe failures can repeatedly trigger unnecessary reinstalls and negate cache reuse.</comment>

<file context>
@@ -0,0 +1,464 @@
+          skillId: skill.id,
+          source: skill.source,
+          kind: 'prpm',
+          drifted: true,
+          note: 'no upstream record (pre-v2 marker) — reinstall to capture identity'
+        };
</file context>
Fix with Cubic

Comment thread packages/cli/src/cli.ts Outdated
…-interval

Four reviewer findings (CodeRabbit / Devin / cubic):

1. `AGENTWORKFORCE_SKILL_CACHE_CHECK_INTERVAL=never` was ignored:
   `parseCheckInterval` returns `null` for never/off/false, but the
   `?? DEFAULT` coalesced `null` back to the 24h default, silently
   re-enabling drift checks the user disabled. New
   `resolveEnvCheckIntervalMs()` only falls back on `undefined`.

2. `--refresh-skills` now wipes the cache dir before reinstalling, so a
   refresh is a true rebuild — stale files from a prior skill version no
   longer linger, and a partial-failure reinstall can't leave the old
   marker behind to fake a later cache hit. Applied to both the claude
   pre-mount path and the mount-deferred path.

3. Added a per-fingerprint advisory install lock
   (`<cacheDir>.lock`, sibling so a refresh-wipe can't delete a live
   lock). Two concurrent launches of the same persona no longer both
   miss and install into the same dir. Stale/dead-pid locks are stolen;
   if the wait budget is exceeded we proceed unlocked (never hang a
   launch). After acquiring, validity is re-checked so a peer's
   just-finished install is reused instead of redundantly reinstalled.

4. Test isolation: the parseAgentArgs flag test now saves/clears/
   restores AGENTWORKFORCE_NO_SKILL_CACHE so a machine with that env set
   can't flake the default-false assertions.

Tests: +5 (resolveEnvCheckIntervalMs never/default/0; lock acquire-
block-release, stale-steal, dead-pid-steal). Full repo 595/595 green.
E2E re-verified: a planted stale file is gone after --refresh-skills
and no lock file is left behind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@khaliqgant
Copy link
Copy Markdown
Member Author

Addressed all review findings in 9ab2ba7:

  1. never interval ignored (cubic, cli.ts) — fixed. parseCheckInterval returns null for never/off/false, but ?? DEFAULT coalesced null back to 24h. Replaced with resolveEnvCheckIntervalMs() which only falls back to the default on undefined (unset/unparseable); null now correctly disables drift checks. Regression test added.

  2. --refresh-skills left stale artifacts (CodeRabbit + Devin) — fixed. The cache dir is now rmSync'd before the reinstall in both install paths (claude pre-mount + mount-deferred). This both removes stale files from a prior skill version and clears the old marker so a partial-failure reinstall can't fake a later cache hit. The wipe runs under the install lock so a peer's in-progress install isn't clobbered. E2E-verified (planted stale file is gone after refresh).

  3. Concurrent installs into the same cache dir (CodeRabbit) — fixed. Added a per-fingerprint advisory lock at <cacheDir>.lock (sibling file, so a refresh-wipe of the dir can't delete a live lock). Created atomically with wx/O_EXCL. A held lock is stolen only if the holder pid is dead or its heartbeat is >15min stale; if the 10min wait budget is exceeded we proceed unlocked rather than hang a launch. After acquiring, isSkillCacheValid is re-checked so a peer's just-completed install is reused instead of redundantly reinstalled. Tests cover acquire/block/release, stale-steal, and dead-pid-steal.

  4. Env-dependent test flake (CodeRabbit, cli.test.ts) — fixed. The parseAgentArgs flag test now saves/clears/restores AGENTWORKFORCE_NO_SKILL_CACHE.

Full repo: 595/595 tests green (+5 new).

Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/cli/src/cli.ts">

<violation number="1" location="packages/cli/src/cli.ts:724">
P1: The stale-lock check can steal a live lock after 15 minutes because it returns stale on timestamp age before validating PID liveness, which can re-enable concurrent installs into the same cache dir.</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.
Fix all with cubic
Re-trigger cubic

Comment thread packages/cli/src/cli.ts
Comment on lines +724 to +730
if (Number.isFinite(ts) && Date.now() - ts > SKILL_CACHE_LOCK_STALE_MS) {
return true;
}
const pid = Number(pidRaw);
if (Number.isInteger(pid) && pid > 0 && !skillCachePidAlive(pid)) {
return true;
}
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot May 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: The stale-lock check can steal a live lock after 15 minutes because it returns stale on timestamp age before validating PID liveness, which can re-enable concurrent installs into the same cache dir.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/cli/src/cli.ts, line 724:

<comment>The stale-lock check can steal a live lock after 15 minutes because it returns stale on timestamp age before validating PID liveness, which can re-enable concurrent installs into the same cache dir.</comment>

<file context>
@@ -677,6 +677,122 @@ function sessionMountDir(sessionRoot: string): string {
+  try {
+    const [pidRaw, tsRaw] = readFileSync(lockPath, 'utf8').split('\n');
+    const ts = Number(tsRaw);
+    if (Number.isFinite(ts) && Date.now() - ts > SKILL_CACHE_LOCK_STALE_MS) {
+      return true;
+    }
</file context>
Suggested change
if (Number.isFinite(ts) && Date.now() - ts > SKILL_CACHE_LOCK_STALE_MS) {
return true;
}
const pid = Number(pidRaw);
if (Number.isInteger(pid) && pid > 0 && !skillCachePidAlive(pid)) {
return true;
}
const pid = Number(pidRaw);
if (Number.isFinite(ts) && Date.now() - ts > SKILL_CACHE_LOCK_STALE_MS) {
if (!Number.isInteger(pid) || pid <= 0 || !skillCachePidAlive(pid)) {
return true;
}
}
if (Number.isInteger(pid) && pid > 0 && !skillCachePidAlive(pid)) {
return true;
}

Tip: Review your code locally with the cubic CLI to iterate faster.

Fix with Cubic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant