Perf + cleanup: unified OSC scanner, streaming CRLF, smaller render hot path#121
Perf + cleanup: unified OSC scanner, streaming CRLF, smaller render hot path#121
Conversation
Five separate scanners were each sweeping the PTY input to find their own OSC code (color queries, PWD, 51/E, 52, 133), doing O(5N) work per write. Replace them with a single `OscIterator` that yields `(code, payload, term)` for any well-formed OSC, and one `dispatchPostWriteOscs` that handles codes 7/51/52/133 in document order. Colour queries (pre-vtWrite) use the same iterator. Drop the now-unused OscScanner, findOscTerminator, and parseDecimal helpers. Engine micro-benchmarks on bulk input improve ~20-28%: plain: 65.8 → 81.5 MB/s styled: 60.8 → 77.7 MB/s unicode: 40.1 → 48.3 MB/s stream: 63.2 → 73.3 MB/s PTY-path numbers are process-I/O-bound and unchanged within noise.
The old path allocated up to 131 KB of stack scratch (plus heap fallback with a silent-truncation failure mode) and did two passes over the input: one to count bare \n, one to copy-and-insert \r. Replace with a single streaming pass that writes maximal raw segments directly to libghostty's VT parser via vtWrite and emits "\r\n" inline at each bare \n. libghostty's state machine already handles arbitrary chunking (it's how the filter feeds data anyway), so no scratch buffer is needed. Zero allocation, no truncation risk, no behaviour change.
The per-cell loop was issuing two independent cells_get(RAW) calls: one inside the in-prompt branch (for semantic content) and one inside the empty-grapheme branch (for the wide-spacer check). A cell that hit both — empty prompt padding — paid for the fetch twice. Introduce a small tri-state loader that memoizes the fetch per cell. The first branch to need it pays; the second reuses. Cells that need neither (the common case for content rows outside the prompt region) still pay nothing. No behaviour change.
redraw() used to call computeFirstScrollbackRowHash twice in the common active-shell case: once at the top for rotation detection, once at the bottom to stash the baseline for the next redraw's check. Each call scrolls libghostty's viewport to the top, re-runs render_state_update, and scrolls back — not free. Cache the start-of-redraw hash when rotation was not detected and reuse it at end-of-redraw. Invalidate on the two events that actually shift scrollback row 0 within a single redraw: rotation (erase + rebuild) and trim (top rows deleted). Promotion and insert-at-tail append to the scrollback bottom, so row 0 is unchanged and the cache is still valid. Bench is insensitive (each iteration rebuilds the terminal from scratch, so scrollback is rarely populated on the first redraw); savings show up in sustained real-world sessions.
The top-level load block at file-load time and the later `ghostel--load-module' helper did near-identical work with subtly different behaviour: different guards (featurep vs fboundp), different directory-resolution orders, and different failure modes (display-warning vs user-error). Merge both into a single `ghostel--load-module' defined before the top-level call site. A `prompt-user' flag chooses between the warning (load-time, keep the file usable) and user-error (interactive command, abort the flow) failure modes. Directory resolution now tries load-file-name, then locate-library, then buffer-file-name — covering the three distinct contexts (load, post-load-interactive, eval-buffer) that each path had to handle individually.
There was a problem hiding this comment.
Pull request overview
This PR applies a set of performance-focused refactors across the Zig native module and the Emacs Lisp integration to reduce redundant scanning/work in hot paths (PTY input handling and redraw/rendering), while consolidating module-loading behavior.
Changes:
- Replace multi-pass OSC scanning with a single
OscIteratorpass and centralized dispatch for OSC 7/51/52/133. - Stream CRLF normalization directly into libghostty VT writes (removing large scratch buffering and extra passes).
- Reduce redraw/render overhead by caching per-cell RAW handles in
buildRowContentand reusing the first-scrollback-row hash within a redraw; consolidate module-load paths inghostel.el.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/render.zig |
Adds a per-cell RAW handle memoizer to avoid duplicate cells_get(RAW) calls; caches row-0 scrollback hash across a redraw to avoid redundant recomputation. |
src/module.zig |
Streams CRLF normalization into vtWrite, introduces a unified OSC iterator/scanner, and centralizes post-write OSC dispatch. |
ghostel.el |
Consolidates module load logic into ghostel--load-module with load-time warning vs interactive user-error behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| var seg_start: usize = 0; | ||
| var prev_was_cr: bool = false; | ||
| for (raw, 0..) |ch, i| { | ||
| if (ch == '\n' and !prev_was_cr) { | ||
| if (i > seg_start) term.vtWrite(raw[seg_start..i]); | ||
| term.vtWrite("\r\n"); | ||
| seg_start = i + 1; | ||
| prev_was_cr = false; | ||
| } else { | ||
| prev_was_cr = (ch == '\r'); | ||
| } |
There was a problem hiding this comment.
CRLF normalization only tracks prev_was_cr within this single fnWriteInput call. If Emacs delivers a CRLF pair split across two ghostel--write-input calls (chunk A ends with \r, chunk B begins with \n), this logic will treat the leading \n as “bare” and insert an extra \r, producing \r\r\n in libghostty. Consider persisting a last_write_ended_with_cr flag on Terminal (or similar) so prev_was_cr is initialized from prior state and updated based on the final byte written.
There was a problem hiding this comment.
Good catch — fixed in 3b27eb9. Persisted last_wrote_cr on the Terminal struct so a CRLF pair split across two ghostel--write-input calls no longer gets double-normalized to \r\r\n. Cleared alongside the other resize-reset state. Added ghostel-test-crlf-split-across-writes as a regression test.
f2d21c5 to
c46c7c0
Compare
Rolling up all post-initial-push fixes from CI, Copilot review, and the second-round review agent into one commit. Loader & tests - Honour `(fboundp 'ghostel--new)' in the unified module loader so `cl-letf' stubs in the pure-Elisp test suite short-circuit as they did before the consolidation. Restores the title-tracking tests on the elisp-only CI job. CRLF across calls - Persist `last_input_was_cr' on the Terminal struct so a CRLF pair split across two `ghostel--write-input' calls isn't double- normalized into \r\r\n. Reset in `resize' alongside the other reset state. Regression tests: canonical split, empty chunk between CR and LF, standalone CR followed by complete CRLF. OSC iterator correctness - Treat a following `\e]' introducer as a payload boundary so one OSC missing its terminator doesn't cannibalize the next OSC's bytes as its own payload. Input `\e]7;PARTIAL\e]52;c;aGVsbG8=\a' previously dispatched OSC 7 with garbage and starved OSC 52 entirely; now OSC 7 is recognized as partial and skipped, OSC 52 dispatches normally. Regression test added. - Rename `OscEntry.term' -> `OscEntry.terminator' to remove the visual collision with `Terminal *term' in dispatchPostWriteOscs. - Clarify the `dispatchPostWriteOscs' docstring: it handles the four post-vtWrite OSCs; color queries use the same iterator from a different call site. Redraw hash gate - Skip `computeFirstScrollbackRowHash' at end-of-redraw when no writes happened this frame and the start-of-redraw rotation check didn't populate `cached_row0_hash'. Row 0 can't have moved, so the stored hash is still current. Covers cursor-only and idle-timer redraws. Naming and comments - Rename `last_wrote_cr' -> `last_input_was_cr': the field names the input-stream property, not a normalizer implementation detail that might change underneath it. - `loadRawCell' docstring notes the memoize assumes idempotent fetch (true today in libghostty). README - Refresh the 5 MB PTY throughput table: ghostel 70 -> 87 MB/s plain, 56 -> 64 MB/s URL-heavy. Other backends within noise.
c46c7c0 to
1729f24
Compare
Summary
Five self-contained, individually-tested perf and cleanup commits from an architectural review. Each was benchmarked against
mainbefore moving on.OscIteratorthat yields(code, payload, term); post-vtWrite dispatch handles 7/51/52/133 in document order.buildRowContent— a cell that hit both the in-prompt branch and the empty-grapheme branch used to pay forcells_get(RAW)twice. Lazy tri-state loader dedupes without penalizing cells that need neither.redraw()was callingcomputeFirstScrollbackRowHashtwice in the common case. Cache the start-of-redraw value and reuse at end; invalidate only on trim / rotation-erase.ghostel--load-module) merged into one, with aprompt-userflag selecting between warning (load-time) anduser-error(interactive) failure modes.Bench delta (
make bench-quick, ±10% noise)Most of the engine/stream win is commit #1 alone; the remaining commits compound small improvements on top.
Not done (originally in the review, rejected after deeper look)
ghostel--raw-key-sequencefallback — turned out to be defensive, not dead.test/ghostel-test.el:4503explicitly exercises it, and libghostty can legitimately returnSUCCESSwith 0 bytes written.window-starttopoint-minwhen the buffer shrinks; content-keys recover across that. A marker-based substitution would still need the same content-search fallback, adding a code path rather than removing one.Test plan
make -j4 all— 30 + 68 + 126 = 224 tests green at every commitmake bench-quick— numbers abovecat large_log, scroll back, resize window