feat: persistent hed-lsp client (closes #144)#146
Merged
Conversation
src/lsp/ adds a persistent client for the hed-lsp Node server. HedLspClient owns either a spawned 'node server.js --stdio' child (spawn_stdio) or a Unix-socket connection to a running daemon (connect_unix). Both transports speak the same LSP wire protocol (Content-Length framed JSON-RPC); the client demultiplexes responses by id so concurrent suggest() calls work. src/lsp/daemon.py is the 'hedit-lspd' supervisor entry point. It owns one Node child and proxies LSP messages bidirectionally to multiple Unix-socket peers, so a single warm server can serve many 'hedit annotate' CLI invocations. Per-peer 'shutdown' and 'exit' are intercepted (a peer disconnecting must not kill the shared child). Real-LSP integration tests cover stdio spawn, batched suggest, concurrent demuxing, empty-query failure, idempotent shutdown, missing-server-js error, and the full daemon start -> connect -> suggest -> SIGTERM lifecycle. No mocks; tests skip cleanly when node or hed-lsp isn't installed.
HedAnnotationWorkflow now takes an injected lsp_client and uses one batched suggest(*keywords) call from _semantic_preprocess_node instead of spawning hed-suggest per keyword in a sequential loop. Same change in ValidationAgent for tag-replacement suggestions. The FastAPI lifespan spawns the persistent client at startup (HED_LSP_SERVER_JS env or /app/hed-lsp/server/out/server.js by default) and shuts it down on teardown. A module-level sentinel keeps the 8 per-request workflow construction sites unchanged. HED_LSP_DISABLE=1 falls back to keyword-only preprocessing. For the standalone CLI: - 'hedit lsp start|stop|status' manage an opt-in 'hedit-lspd' daemon under the per-user runtime dir. - 'hedit annotate' picks up the daemon's socket if present; otherwise runs without LSP (HEDIT_LSP=0 to force-skip). - No auto-spawn, no idle-timeout magic; users opt in explicitly. src/validation/hed_lsp.py is now a thin re-export shim over src.lsp plus find_lsp_server_js() / is_hed_lsp_available() helpers. The synchronous subprocess.run path is deleted; no backwards-compat shims.
Dockerfile pins the hed-lsp clone to the feat/hed-suggest-request branch via the new HED_LSP_REF build arg so the runtime image picks up the hed/suggest request handler. Adds HED_LSP_SERVER_JS to the container ENV so the FastAPI lifespan can find the entrypoint without configuration. .env.example documents the new env vars: HED_LSP_SERVER_JS, HED_LSP_DISABLE, HEDIT_LSP_RUNTIME_DIR, HEDIT_LSP. Drops the stale docs for the now-removed HED_LSP_USE_SEMANTIC and HED_LSP_MAX_RESULTS knobs. Patch bump with alpha prerelease per the develop -> main release workflow.
Deploying hedit with
|
| Latest commit: |
3250f94
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://c3bcda98.hedit.pages.dev |
| Branch Preview URL: | https://feature-issue-144-persistent.hedit.pages.dev |
Resolves trivial conflict on version files. Keeping the feature branch's 0.7.10a0 (patch+alpha bump) over main's 0.7.9a2 since this PR is the next alpha promotion off develop.
Code-review and silent-failure audit raised a few real defects and
logging-quality issues. Fixes, paired with the test that catches
the most important one:
- src/lsp/client.py: when the read loop exits (clean EOF, protocol
error, or unexpected crash) the client now sets _shutdown_started
so subsequent suggest() calls fail fast instead of hanging on a
future that will never be resolved. Added a regression test
(test_suggest_after_shutdown_fails_fast) that exercises this.
- src/lsp/client.py: _request_unshutdownable now pops the pending
future on write failure, matching _request's contract, so a
broken pipe during shutdown doesn't strand a future for
_fail_pending to clean up by side effect.
- src/lsp/client.py: shutdown() now cancels the reader task before
closing the writer, so the reader doesn't observe a half-closed
stream and log a spurious 'connection closed' before shutdown
completes. Writer close failures are logged at debug level
instead of being silently swallowed.
- src/lsp/client.py: _dispatch logs unknown-id responses and
late-arriving responses at warning level, and unhandled server
notifications at debug, so protocol-level anomalies aren't
invisible.
- src/lsp/daemon.py: _NodeChild error sentinel now echoes the
pending request id so HedLspClient._dispatch can resolve the
caller's future instead of silently dropping the message.
- src/lsp/daemon.py: _NodeChild._read_loop catches unexpected
exceptions with logger.exception so a reader crash isn't an
asyncio.Task ghost. Server task crashes in serve_until_signal
now log at error with traceback. Peer writer close errors get
a debug log.
- src/api/main.py: LSP spawn / shutdown failures now go through
logging.getLogger('hedit.lsp') with exc_info, not print(), so
log aggregation in prod can detect a silently-degraded server.
The exception clause around spawn_stdio narrows to (RuntimeError,
OSError, TimeoutError) so programming errors propagate.
- src/cli/local_executor.py: connect-to-daemon failure is logged
at debug with the socket path and the error; previously it was
a bare except: pass.
- src/agents/workflow.py and validation_agent.py: the outer except
around lsp_client.suggest() now passes exc_info=True so an
unexpected exception (one that suggest() didn't already convert
to a failure result) leaves a usable traceback.
- tests/lsp/test_daemon_lifecycle.py: replace deprecated
asyncio.get_event_loop().time() with time.monotonic().
| for path in (pid_file_path(rt_dir), socket_file_path(rt_dir), meta_file_path(rt_dir)): | ||
| try: | ||
| path.unlink() | ||
| except FileNotFoundError: |
| task.cancel() | ||
| try: | ||
| await task | ||
| except (asyncio.CancelledError, Exception): |
| def remove_broadcast_queue(self, queue: asyncio.Queue[dict]) -> None: | ||
| try: | ||
| self._broadcast.remove(queue) | ||
| except ValueError: |
| if self._process.returncode is None: | ||
| try: | ||
| self._stdin.close() | ||
| except Exception: |
| task.cancel() | ||
| try: | ||
| await task | ||
| except (asyncio.CancelledError, Exception): |
| for task in list(self._connections): | ||
| try: | ||
| await task | ||
| except (asyncio.CancelledError, Exception): |
| for path in (self._socket_path, self._pid_path, self._meta_path): | ||
| try: | ||
| path.unlink() | ||
| except FileNotFoundError: |
| forward_task.cancel() | ||
| try: | ||
| await forward_task | ||
| except (asyncio.CancelledError, Exception): |
|
|
||
| try: | ||
| asyncio.run(_run()) | ||
| except KeyboardInterrupt: |
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
hed-lsp migrated from npm to pnpm workspaces (hed-standard/hed-lsp#28), so 'npm install' at the repo root no longer hydrates server/node_modules and the subsequent tsc -b fails with TS2307 'Cannot find module vscode-languageserver' etc. Switch to pnpm via corepack, pinned to the same version the upstream CI uses (10.33.4), and drop the now-redundant 'npm link' for the hed-suggest CLI binary — the persistent LSP path spawns server.js directly via HED_LSP_SERVER_JS, so the CLI shim doesn't ship in the image anymore.
The CI build uses deploy/Dockerfile (see docker-build.yml + security-scan.yml), not the root Dockerfile, so the previous npm->pnpm migration commit missed the file CI actually runs against. Mirror the same change: pin hed-lsp to feat/hed-suggest-request, switch the build to pnpm@10.33.4 via corepack, drop the obsolete 'npm link' of hed-suggest, and export HED_LSP_SERVER_JS so the FastAPI lifespan finds the server entrypoint without configuration.
The src/lsp/ module is the bulk of this PR but was gated behind pytest.mark.integration, so the unit-test CI job skipped it and codecov/patch dropped below threshold. The hed_lsp_server_js conftest fixture already skips cleanly when node or hed-lsp aren't available, so the integration marker isn't actually needed for these tests — they're 'integration' with a Node subprocess, not with a paid LLM API. Drop the marker so the tests run in the default lane. CI then needs hed-lsp available: setup-node@v6 + a build step that clones the pinned hed-lsp branch, runs pnpm install + compile, and exports HED_LSP_SERVER_JS so the conftest finds the entrypoint. This makes src/lsp/* execute under coverage in unit-tests, lifting patch coverage above the 50% codecov threshold without weakening any policy.
hed-lsp PR #31 merged as 36447afb5e48bfecc45af0464b652614ba571414 on main (v0.4.0). Pin both deploy/Dockerfile and .github/workflows/test.yml to that commit SHA so the build is reproducible and survives any branch cleanup upstream. tests/lsp/test_daemon_internals.py: in-process exercise of _NodeChild and _Daemon. The existing end-to-end test launches the daemon in a subprocess, which means coverage.py never sees the daemon code. These five new tests instantiate _NodeChild and _Daemon directly within the test process so coverage on src/lsp/daemon.py jumps from ~14% (just the entry-point main()) to most of the proxy and lifecycle logic. Still no mocks: tests use the real Node hed-lsp child and a real Unix socket; the hed_lsp_server_js fixture skips when neither is available.
This was referenced May 21, 2026
neuromechanist
added a commit
that referenced
this pull request
May 21, 2026
Per CLAUDE.md 'Develop Branch Sync Rule': after each alpha release on main (0.7.10a2 here), develop bumps the patch and resets to .dev0 so the two branches share a clean version lineage and dev builds publish to TestPyPI under the next patch series. Fast-forwarded merge from main (no divergence: develop had nothing ahead). All #146 (persistent hed-lsp) and #151 (#148+#150 latency) work is now on develop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Eliminates the dominant per-request latency on the prod backend: the multi-second "Initializing annotation workflow..." gap before annotation starts. The cause was
_semantic_preprocess_nodecallingsubprocess.run(["hed-suggest", ...])once per extracted keyword in a sequential loop. Each spawn cold-booted the LSP server, loaded ~600 MB embeddings, answered one query, and exited (~6 s warm, 91 s very-cold). With 10 keywords that stacked to 60+ s.This PR uses the hed-lsp Node server as designed: one persistent JSON-RPC connection held for the lifetime of the FastAPI process, batched
hed/suggestqueries answered in ~40 ms.Closes #144.
Companion change
hed-lsp branch
feat/hed-suggest-requestadds thehed/suggestJSON-RPC handler. The Dockerfile pins to that branch viaARG HED_LSP_REF; will switch back to a tag-based pin once that PR merges.What's in this PR
src/lsp/— async LSP/JSON-RPC client over stdio and Unix sockets, plushedit-lspdsupervisor daemon.node server.js --stdiochild on startup and shuts it down on teardown.HedAnnotationWorkflow._semantic_preprocess_nodereplaces the per-keywordsubprocess.runloop with one batchedawait client.suggest(*keywords).ValidationAgentuses the same persistent client for invalid-tag replacement suggestions.hedit lsp start|stop|statusfor the opt-in standalone CLI daemon.hedit annotateauto-detects the socket;HEDIT_LSP=0disables.src/validation/hed_lsp.pyreduced to a thin re-export shim; the synchronoussubprocess.runpath is deleted (no backwards-compat shims per project policy).tests/lsp/(7 tests, no mocks): client stdio, batched suggest, concurrent demux, idempotent shutdown, missing-server-js error, full daemon lifecycle.tests/test_hed_lsp.py+tests/test_keyword_extraction.pydeleted.0.7.9.dev4->0.7.10a0.Verification
Local against the hed-lsp
feat/hed-suggest-requestbuild:initializehandshake: 163 mshed/suggest(5 queries, includes schema load): 37 msExisting test suite: 450 passed, 1 skipped, 43 deselected (
-m "not integration").LSP-specific suite: 7 passed in 1.27 s (real LSP, no mocks).
Out of scope
Test plan
uv run pytest -m "not integration"-- 450 passed.uv run pytest tests/lsp/against a local hed-lsp build -- 7 passed.uv run ruff checkclean on all touched files.docker exec hedit pgrep -fa "server.js --stdio"shows the persistent child, and a fresh/annotaterequest shows preprocess wall time <500 ms.