Skip to content

perf(reconcile): skip envelope on steady-state sessions, keep bootstrap safe#11

Merged
crisandrews merged 1 commit intocrisandrews:mainfrom
JD2005L:fix/defer-cron-reconcile-safely
Apr 17, 2026
Merged

perf(reconcile): skip envelope on steady-state sessions, keep bootstrap safe#11
crisandrews merged 1 commit intocrisandrews:mainfrom
JD2005L:fix/defer-cron-reconcile-safely

Conversation

@JD2005L
Copy link
Copy Markdown
Contributor

@JD2005L JD2005L commented Apr 15, 2026

Summary

The SessionStart reconcile hook (hooks/reconcile-crons.sh) emits a ToolSearch + CronList + CronCreate envelope on every session start. On any already-bootstrapped workspace that envelope is pure overhead — the agent runs the ritual, finds every expected harnessTaskId already live, and prints recreated=0 adopted=0 alive=N. Several hundred ms of blocking tool calls on every REPL open, for work that meaningfully happens only on the very first session.

This PR adds a narrow fast-path: if every active registry entry already has a populated harnessTaskId, the hook exits 0 without building or emitting the envelope. If anything is unbootstrapped or a migration offer is pending, behavior is identical to today.

Why this is safe

The fast-path condition is driven entirely by existing state — no new marker files, no flag that can get out of sync:

Scenario Registry state Fast-path? Result
First-ever boot Entries just seeded, harnessTaskId null ❌ falls through Envelope emits, agent bootstraps crons. Identical to today.
Upgrade from older version Existing entries have populated harnessTaskId ✅ taken Immediate fast-path. Zero regression, zero surprise.
Steady-state session All live ✅ taken Hook exits fast. Save the boot-time tool calls.
External cron deletion Writeback clears stale harnessTaskId ❌ falls through Envelope re-emits to reconcile.
Cron deleted, writeback missed it harnessTaskId still populated but harness lacks it ✅ taken this session Heartbeat skill's Cron reconcile step catches drift within 30 min.
Pending migration offer MIGRATION_NEEDED=1 ❌ forced through Migration question still emitted as before.

The worst-case window for unreconciled drift is bounded at 30 min by the heartbeat skill's reconcile step (HEARTBEAT.md: "Cron reconcile — verify live CronList matches memory/crons.json..."). For context, today's behavior runs reconcile every time the user opens a new session — often hours or days apart on a long-lived agent — so this PR actually tightens the guarantee for workspaces that don't session-start often.

Implementation

Single insertion in hooks/reconcile-crons.sh, placed after migration detection (step 6) and before expected-set build (step 7). 26 lines total including comment explaining each of the 6 scenarios above. No other files touched.

if [[ $MIGRATION_NEEDED -eq 0 && -f "$REGISTRY" ]]; then
  UNBOOTSTRAPPED=$(jq '[.entries[]
      | select(.paused == false and .tombstone == null
               and (.harnessTaskId == null or .harnessTaskId == ""))]
      | length' "$REGISTRY" 2>/dev/null || echo 1)
  if [[ "$UNBOOTSTRAPPED" == "0" ]]; then
    exit 0
  fi
fi

The || echo 1 fallback is deliberate: if jq fails unexpectedly (corrupt registry, syntax bug in a future edit), we fall through to the envelope path rather than silently suppress it. Fail-safe toward doing the work, not toward skipping it.

Test plan

  • Fresh workspace (no memory/crons.json pre-existing): hook emits envelope, agent creates default crons, subsequent session takes fast path
  • Existing workspace with populated harnessTaskId: hook exits immediately, no envelope
  • IMPORT_BACKLOG.md + unanswered migration: envelope still emitted (migration question fires)
  • Corrupt crons.json: jq fails, fallback triggers envelope emission
  • External CronDelete of an adopted entry: next heartbeat's reconcile catches it

Not included

This PR is just the perf fix. A tighter self-healing loop (writeback clearing harnessTaskId on CronDelete capture) is out of scope and would be a follow-up.

🤖 Generated with Claude Code

…ap safe

The SessionStart reconcile hook emits a ToolSearch + CronList + CronCreate
envelope on every session start. On any already-bootstrapped workspace that
envelope is pure overhead — the agent dutifully runs the ritual, finds every
expected harnessTaskId already live, and prints "recreated=0 adopted=0 alive=N".
Several hundred ms of blocking tool calls every single time the REPL opens,
for a reconcile that only actually did work on the very first session.

This patch adds a fast-path check after migration detection and before the
envelope is built. If every non-paused, non-tombstoned registry entry already
has a populated harnessTaskId (i.e. the cron is live in the harness), the
hook exits 0 without emitting the envelope. Drift recovery is not lost —
the heartbeat skill's existing cron-reconcile step runs within 30 min and
re-emits the envelope path via `/agent:crons reconcile` when needed.

Bootstrap remains correct because the fast-path condition is driven by
existing state (harnessTaskId presence in the registry). On:

- First-ever boot: registry just got seeded, harnessTaskId is null for every
  default entry → fast-path falls through → envelope emits → agent creates
  crons → writeback populates harnessTaskId → next session takes the fast
  path. Same behavior as before, one session later.

- Upgrade from an older version: existing registry already has populated
  harnessTaskId values from prior sessions → immediate fast-path. Zero
  regression.

- Externally deleted cron: next session's envelope captures the drift (if
  harnessTaskId was cleared by writeback) or the next heartbeat does
  (if the registry still reflects the stale id). Bounded at 30 min either way.

- Pending migration offer: the fast-path is bypassed when MIGRATION_NEEDED=1
  so the migration question still gets asked.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@crisandrews crisandrews merged commit 53d4eab into crisandrews:main Apr 17, 2026
crisandrews added a commit that referenced this pull request Apr 17, 2026
Summary of changes in this release (full detail in CHANGELOG.md):

Added
- Resume-on-restart wrapper for service mode (#7)
- Service hardening defaults: HOME/TERM env, StartLimitBurst guard, persistent log path (#8)
- /agent:update skill + heartbeat version-check with day-gate and per-version dedupe (#12)

Fixed
- WORKSPACE resolution so memory_search hits user's project dir, not plugin dir (#6, closes #5)
- Linux systemd crash loop after Claude Code auto-updates mid-run — PTY wrap in ExecStart + DISABLE_AUTOUPDATER=1 for file-integrity (#9, #17/#18)
- macOS launchd PTY wrap parity (#16)
- Cross-user /agent:import discovery + post-import path sanity check (#10)

Performance
- reconcile-crons.sh fast-path on steady-state sessions (#11)

Thanks to @JD2005L for the whole batch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants