Suppress watcher restart storms during component deploy #822
Merged
Component deploys (extractApplication + npm install) write into a directory that every Scope's EntryHandler is watching. Without coordination, each intermediate file change fires scope.requestRestart() and drives a watcher teardown/recreate cycle through componentLoader, briefly doubling inotify occupancy and amplifying the exhaustion risk that motivated harper#488.

Add a cross-thread deploy lifecycle:

- `components/deployLifecycle.ts`: module emitter with ref-counted in-flight state, plus `broadcastDeployStart`/`broadcastDeployEnd` helpers that propagate via `manageThreads.broadcastWithAcknowledgement` (start) and `manageThreads.broadcast` (end). Receiver installed at module load so worker threads react to events originating on main.
- `components/Application.ts`: `prepareApplication` wraps extract + install in `broadcastDeployStart` / `finally broadcastDeployEnd`. Best-effort: a broadcast failure doesn't block the deploy.
- `components/Scope.ts`: each Scope subscribes to the deploy emitter for its own component name. On deploy:start, pauses all EntryHandlers and sets an in-flight flag that suppresses requestRestart(). On deploy:end, resumes the EntryHandlers (which re-scan and fire add events that collapse into a single coalesced restart via the existing debounce). Exposes `deploy:start` / `deploy:end` events on the scope so plugins can observe.
- `components/EntryHandler.ts`: adds `pause()` / `resume()`. pause() closes the chokidar watcher (releasing inotify) but preserves the EntryHandler EventEmitter and its listener attachments, so plugins registered via `scope.handleEntry(handler)` survive the pause. resume() awaits the pause-initiated close before installing a new watcher to avoid overlapping teardown/setup under inotify pressure.

Refs harper#488 (third of three independent mitigations, after #809 and #821).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
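The ref-counted in-flight state described for `components/deployLifecycle.ts` might look roughly like the sketch below. `DeployLifecycleSketch`, `start`, `end`, and `isInFlight` are hypothetical names chosen for illustration, not the PR's actual API, and the cross-thread broadcast is left out entirely.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical stand-in for the module emitter; names are assumptions.
class DeployLifecycleSketch extends EventEmitter {
	// Component name -> number of deploys currently in flight.
	private readonly inFlight = new Map<string, number>();

	start(name: string): void {
		const count = (this.inFlight.get(name) ?? 0) + 1;
		this.inFlight.set(name, count);
		// Only the 0 -> 1 transition is announced to listeners.
		if (count === 1) this.emit('deploy:start', name);
	}

	end(name: string): void {
		const count = this.inFlight.get(name) ?? 0;
		if (count === 0) return; // Unmatched end: nothing to release.
		if (count === 1) {
			this.inFlight.delete(name);
			// Only the 1 -> 0 transition is announced to listeners.
			this.emit('deploy:end', name);
		} else {
			this.inFlight.set(name, count - 1);
		}
	}

	isInFlight(name: string): boolean {
		return this.inFlight.has(name);
	}
}
```

With this shape, two overlapping deploys of the same component produce one `deploy:start` and one `deploy:end`, so a subscriber never resumes while another deploy is still writing files.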
Pre-existing format drift on main, unrelated to the EntryHandler ignore list change in this PR. Included here to unblock the Format Check CI that's been red on every push to this branch (and on main too). Three ternary spreads collapsed to single lines per prettier's line width. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Member
Author
/gemini review
Three independent fixes from automated review:

1. deployLifecycle._resetForTests no longer flips receiverInstalled. manageThreads.onMessageByType has no deregistration API, so flipping the flag let a subsequent ensureReceiver() pile on a duplicate listener and double-increment the refcount on every broadcast.
2. broadcastDeployStart now races broadcastWithAcknowledgement against a 5s timeout. The comment claimed best-effort, but an unresponsive worker could hang the deploy indefinitely. Now matches the documented intent.
3. EntryHandler.pause() now resets `this.ready` to a fresh pending promise. Without this, `await entryHandler.ready` after pause() would resolve immediately when the watcher had already become ready before the pause, violating the documented contract that ready awaits resume(). New test covers the contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
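Fix 2 can be illustrated with a minimal sketch of racing an acknowledgement promise against a timer. `ackOrTimeout` is a hypothetical helper, and the `ack` argument stands in for whatever `broadcastWithAcknowledgement` returns; its real signature is not shown in this PR text.

```typescript
// Resolves to true if `ack` fulfils within `ms`, false on timeout or rejection.
// Hypothetical helper: the name and boolean result are assumptions.
async function ackOrTimeout(ack: Promise<unknown>, ms: number): Promise<boolean> {
	let timer: ReturnType<typeof setTimeout> | undefined;
	const timeout = new Promise<'timeout'>((resolve) => {
		timer = setTimeout(() => resolve('timeout'), ms);
	});
	try {
		const winner = await Promise.race([ack.then(() => 'acked' as const), timeout]);
		return winner === 'acked';
	} catch {
		// Best-effort: a failed broadcast must not block the deploy.
		return false;
	} finally {
		// Clear the timer so it cannot keep the event loop alive after an early ack.
		clearTimeout(timer);
	}
}
```

A caller would proceed with the deploy whether the result is `true` or `false`; the boolean is only useful for logging that some worker did not acknowledge in time.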
Ethan-Arrowood
approved these changes
May 28, 2026
    if (this.#watcher) {
        // Retain the close promise so resume()→#watch() can await full
        // teardown before opening a new watcher.
        this.#pausedClose = Promise.resolve(this.#watcher.close()).catch(() => {
Member
Why wrap the this.#watcher.close() in a Promise.resolve() when it returns a promise itself? Is this in case it doesn't?
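For context on the snippet under review, here is a minimal sketch of a close-promise handoff between pause and resume. `WatcherLike` and `PausableHandlerSketch` are hypothetical and do not model chokidar. One known property that bears on the question: `Promise.resolve(x)` returns a native promise unchanged and wraps a plain value or thenable, so the wrapper is harmless when `close()` already returns a promise and protective when it does not.

```typescript
// Hypothetical minimal watcher shape; close() may or may not return a promise.
interface WatcherLike {
	close(): Promise<void> | void;
}

class PausableHandlerSketch {
	private watcher: WatcherLike | undefined;
	private pausedClose: Promise<void> | undefined;
	private readonly createWatcher: () => WatcherLike;

	constructor(createWatcher: () => WatcherLike) {
		this.createWatcher = createWatcher;
		this.watcher = createWatcher();
	}

	pause(): void {
		if (!this.watcher) return;
		const closing = this.watcher;
		this.watcher = undefined;
		// Normalize the result to a promise and swallow close errors so that
		// resume() is never blocked by a failed teardown.
		this.pausedClose = Promise.resolve(closing.close()).catch(() => {});
	}

	async resume(): Promise<void> {
		if (this.watcher) return;
		// Wait for the previous watcher to finish closing before opening a new one.
		await this.pausedClose;
		this.pausedClose = undefined;
		// A concurrent resume() may have installed a watcher while we awaited.
		if (!this.watcher) this.watcher = this.createWatcher();
	}

	get watching(): boolean {
		return this.watcher !== undefined;
	}
}
```

The point of retaining the promise is ordering: the new watcher is created only after the old one has released its handles, so a rapid pause→resume does not hold two sets of watches at once.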
Exhaustion of inotify handles has been observed on production servers in a couple of cases.
Summary
Third of three independent mitigations for harper#488 (alongside PR #809 and PR #821). Adds a cross-thread deploy lifecycle so that during `extractApplication` + `npm install`, every Harper thread's component file watchers pause emission and suppress `requestRestart()`. After the deploy, watchers resume; their fresh initial scan emits add events for the post-deploy tree, which the existing restart debounce collapses into a single coalesced restart.

Why
Today, every file change inside a component directory fires `scope.requestRestart()`. During a deploy, the extract + install writes hundreds of files; without suppression, each one drives a watcher teardown/recreate cycle through `componentLoader`, briefly doubling inotify occupancy and amplifying the exhaustion risk that PR #821 makes self-healing. This PR removes the storm at the source.

Where to look
- `components/deployLifecycle.ts` (new): `DeployLifecycle` is an `EventEmitter` keyed by component name with ref-counted in-flight state, so overlapping deploys of the same component compose. `broadcastDeployStart` awaits acknowledgement (so the caller can rely on every worker pausing before file I/O begins); `broadcastDeployEnd` is fire-and-forget. Receiver installed at module load: workers don't call the broadcast helpers themselves but must still react to events from main.
- `components/Application.ts`: `prepareApplication` brackets extract + install with `broadcastDeployStart` / `finally broadcastDeployEnd`. Broadcast failures are non-fatal.
- `components/Scope.ts`: each Scope subscribes to deploy events for its own appName. On `deploy:start`: pauses all EntryHandlers, sets `#deployInFlight`, re-emits `deploy:start` on the scope so plugins can observe via `scope.on('deploy:start', ...)`. On `deploy:end`: resumes EntryHandlers before notifying plugins (so a plugin throwing in its handler can't leave watchers paused), then re-emits `deploy:end`. `requestRestart()` is gated on `#deployInFlight`.
- `components/EntryHandler.ts`: `pause()` closes the chokidar watcher (releases inotify) while keeping the EntryHandler EventEmitter intact (preserving plugin listeners). `resume()` recreates chokidar; the fresh initial scan fires add events for current files. Includes a close-promise handoff so a rapid pause→resume doesn't overlap teardown/setup under inotify pressure.

Cross-model review feedback addressed
- `ensureReceiver()` was only called from broadcast helpers; it is now called at module load.
- `#pausedClose` is awaited inside `#watch()`.
- A plugin handler throwing on `deploy:end` aborted the function and left watchers permanently paused. Now: resume runs before emit, and emit is wrapped in `#safeEmit`.
- In-flight tracking changed from a `Set` to a ref-counted `Map`, so 0→1 fires start, 1→0 fires end, and intermediate transitions are silent.

Potential concerns to focus on
- The `broadcastWithAcknowledgement` round-trip is only smoke-covered. Gemini suggested an integration test triggering a real API deploy against a multi-threaded server. Open to adding that in this PR or a follow-up.
- `prepareApplication` is also called from `installApplications()` on Harper startup. At that point workers haven't joined yet; `broadcastWithAcknowledgement` resolves immediately against zero peers, so no harm. The local `_handle` still fires, which is fine (Scopes don't exist yet either).
- `scope.on('deploy:start' | 'deploy:end', (name) => ...)` is a new public surface. Documented in the `ScopeEventsMap`. Open to feedback on naming.
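The Scope-side ordering described in this PR (gate `requestRestart()` while a deploy is in flight, resume handlers before notifying plugins, guard the emit) can be sketched as follows. `ScopeSketch`, `Pausable`, `onDeployStart`, `onDeployEnd`, and the `restartRequests` counter are hypothetical stand-ins for illustration only; the real Scope debounces restarts rather than counting them.

```typescript
import { EventEmitter } from 'node:events';

// Hypothetical shape of an entry handler that can be paused and resumed.
interface Pausable {
	pause(): void;
	resume(): void;
}

class ScopeSketch extends EventEmitter {
	private deployInFlight = false;
	private readonly handlers: Pausable[];
	restartRequests = 0; // Stand-in for the real debounced restart.

	constructor(handlers: Pausable[]) {
		super();
		this.handlers = handlers;
	}

	requestRestart(): void {
		if (this.deployInFlight) return; // Suppressed while files are being written.
		this.restartRequests++;
	}

	onDeployStart(name: string): void {
		this.deployInFlight = true;
		for (const handler of this.handlers) handler.pause();
		this.safeEmit('deploy:start', name);
	}

	onDeployEnd(name: string): void {
		// Resume first, so a throwing plugin listener cannot leave watchers paused.
		this.deployInFlight = false;
		for (const handler of this.handlers) handler.resume();
		this.safeEmit('deploy:end', name);
	}

	private safeEmit(event: string, name: string): void {
		try {
			this.emit(event, name);
		} catch {
			// A listener error must not break the deploy lifecycle.
		}
	}
}
```

A plugin would observe the lifecycle with something like `scope.on('deploy:end', (name) => { /* refresh caches */ })`. Because `EventEmitter.emit` calls listeners synchronously and rethrows their errors, the guard around `emit` is what keeps one faulty listener from propagating into the lifecycle code.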