Skip to content

Suppress watcher restart storms during component deploy#822

Merged
kriszyp merged 4 commits into
mainfrom
kris/suppress-autoreload-during-deploy
May 30, 2026
Merged

Suppress watcher restart storms during component deploy#822
kriszyp merged 4 commits into
mainfrom
kris/suppress-autoreload-during-deploy

Conversation

@kriszyp
Copy link
Copy Markdown
Member

@kriszyp kriszyp commented May 27, 2026

Exhausting inotify handles has been observed in production servers in a couple cases.

Summary

Third of three independent mitigations for harper#488 (alongside PR #809 and PR #821). Adds a cross-thread deploy lifecycle so that during extractApplication + npm install, every Harper thread's component file watchers pause emission and suppress requestRestart(). After the deploy, watchers resume — their fresh initial scan emits add events for the post-deploy tree, which the existing restart debounce collapses into a single coalesced restart.

Why

Today, every file change inside a component directory fires scope.requestRestart(). During a deploy, the extract + install writes hundreds of files; without suppression, each one drives a watcher teardown/recreate cycle through componentLoader — briefly doubling inotify occupancy and amplifying the exhaustion risk PR #821 makes self-healing. This PR removes the storm at the source.

Where to look

  • components/deployLifecycle.ts (new) — DeployLifecycle is an EventEmitter keyed by component name with ref-counted in-flight state, so overlapping deploys of the same component compose. broadcastDeployStart awaits acknowledgement (so the caller can rely on every worker pausing before file I/O begins); broadcastDeployEnd is fire-and-forget. Receiver installed at module load — workers don't call the broadcast helpers themselves but must still react to events from main.
  • components/Application.tsprepareApplication brackets extract + install with broadcastDeployStart / finally broadcastDeployEnd. Broadcast failures are non-fatal.
  • components/Scope.ts — each Scope subscribes to deploy events for its own appName. On deploy:start: pauses all EntryHandlers, sets #deployInFlight, re-emits deploy:start on the scope so plugins can observe via scope.on('deploy:start', ...). On deploy:end: resumes EntryHandlers before notifying plugins (so a plugin throwing in its handler can't leave watchers paused), then re-emits deploy:end. requestRestart() gated on #deployInFlight.
  • components/EntryHandler.tspause() closes the chokidar watcher (releases inotify) while keeping the EntryHandler EventEmitter intact (preserving plugin listeners). resume() recreates chokidar; the fresh initial scan fires add events for current files. Includes a close-promise handoff so a rapid pause→resume doesn't overlap teardown/setup under inotify pressure.

Cross-model review feedback addressed

  • Codex flagged (both fixed):
    1. Workers never subscribed because ensureReceiver() was only called from broadcast helpers — now called at module load.
    2. Original close+recreate dropped plugin-registered handlers — refactored to pause+resume, keeping the same EntryHandler instance.
  • Gemini flagged (all fixed):
    1. Race in pause/resume — close promise wasn't retained, so resume's new watcher started while old was still releasing handles. Now #pausedClose is awaited inside #watch().
    2. Blocker: a plugin throwing on deploy:end aborted the function and left watchers permanently paused. Now: resume runs before emit, and emit is wrapped in #safeEmit.
    3. Overlapping deploys could end early — switched from Set to ref-counted Map, so 0→1 fires start, 1→0 fires end, intermediate transitions are silent.

Potential concerns to focus on

  • Cross-thread broadcast under test: I unit-tested the local emit path heavily (deployLifecycle ref-counting, Scope subscriptions, EntryHandler pause/resume), but the actual broadcastWithAcknowledgement round-trip is only smoke-covered. Gemini suggested an integration test triggering a real API deploy against a multi-threaded server. Open to adding that in this PR or a follow-up.
  • prepareApplication is also called from installApplications() on Harper startup. At that point workers haven't joined yet — broadcastWithAcknowledgement resolves immediately against zero peers, so no harm. The local _handle still fires, which is fine (Scopes don't exist yet either).
  • Plugin contract: scope.on('deploy:start' | 'deploy:end', (name) => ...) is a new public surface. Documented in the ScopeEventsMap. Open to feedback on naming.
  • Generated by an LLM (Claude Opus 4.7).

Component deploys (extractApplication + npm install) write into a directory
that every Scope's EntryHandler is watching. Without coordination, each
intermediate file change fires scope.requestRestart() and drives a watcher
teardown/recreate cycle through componentLoader — briefly doubling inotify
occupancy and amplifying the exhaustion risk that motivated harper#488.

Add a cross-thread deploy lifecycle:

- `components/deployLifecycle.ts` — module emitter with ref-counted
  in-flight state, plus `broadcastDeployStart`/`broadcastDeployEnd` helpers
  that propagate via `manageThreads.broadcastWithAcknowledgement` (start) and
  `manageThreads.broadcast` (end). Receiver installed at module load so
  worker threads react to events originating on main.
- `components/Application.ts` — `prepareApplication` wraps extract + install
  in `broadcastDeployStart` / `finally broadcastDeployEnd`. Best-effort: a
  broadcast failure doesn't block the deploy.
- `components/Scope.ts` — each Scope subscribes to the deploy emitter for
  its own component name. On deploy:start, pauses all EntryHandlers and sets
  an in-flight flag that suppresses requestRestart(). On deploy:end, resumes
  the EntryHandlers (which re-scan and fire add events that collapse into a
  single coalesced restart via the existing debounce). Exposes `deploy:start`
  / `deploy:end` events on the scope so plugins can observe.
- `components/EntryHandler.ts` — adds `pause()` / `resume()`. pause()
  closes the chokidar watcher (releasing inotify) but preserves the
  EntryHandler EventEmitter and its listener attachments, so plugins
  registered via `scope.handleEntry(handler)` survive the pause. resume()
  awaits the pause-initiated close before installing a new watcher to avoid
  overlapping teardown/setup under inotify pressure.

Refs harper#488 (third of three independent mitigations, after #809 and
#821).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude

This comment has been minimized.

Pre-existing format drift on main, unrelated to the EntryHandler ignore
list change in this PR. Included here to unblock the Format Check CI
that's been red on every push to this branch (and on main too). Three
ternary spreads collapsed to single lines per prettier's line width.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@kriszyp kriszyp marked this pull request as ready for review May 27, 2026 12:58
@kriszyp
Copy link
Copy Markdown
Member Author

kriszyp commented May 27, 2026

/gemini review

gemini-code-assist[bot]

This comment was marked as resolved.

Three independent fixes from automated review:

1. deployLifecycle._resetForTests no longer flips receiverInstalled.
   manageThreads.onMessageByType has no deregistration API, so flipping
   the flag let a subsequent ensureReceiver() pile a duplicate listener
   and double-increment the refcount on every broadcast.

2. broadcastDeployStart now races broadcastWithAcknowledgement against
   a 5s timeout. Comment claimed best-effort, but an unresponsive worker
   could hang the deploy indefinitely. Now matches the documented intent.

3. EntryHandler.pause() now resets `this.ready` to a fresh pending
   promise. Without this, `await entryHandler.ready` after pause()
   would resolve immediately when the watcher had already become ready
   before pause — violating the documented contract that ready awaits
   resume(). New test covers the contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Member

@Ethan-Arrowood Ethan-Arrowood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it!

if (this.#watcher) {
// Retain the close promise so resume()→#watch() can await full
// teardown before opening a new watcher.
this.#pausedClose = Promise.resolve(this.#watcher.close()).catch(() => {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wrap the this.#watcher.close() in a Promise.resolve() when it returns a promise itself? Is this in case it doesn't?

@kriszyp kriszyp merged commit 9f7c19c into main May 30, 2026
45 checks passed
@kriszyp kriszyp deleted the kris/suppress-autoreload-during-deploy branch May 30, 2026 13:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants