Perf + cleanup: unified OSC scanner, streaming CRLF, smaller render hot path by dakra · Pull Request #121 · dakra/ghostel

dakra · 2026-04-17T14:55:19Z

Summary

Five self-contained, individually-tested perf and cleanup commits from an architectural review. Each was benchmarked against main before moving on.

Unify OSC scanning into one pass — every PTY write used to get scanned five times (one scanner per of OSC 7 / 51 / 52 / 133 / 4-10-11). Replaced with a single OscIterator that yields (code, payload, term); post-vtWrite dispatch handles 7/51/52/133 in document order.
Stream CRLF normalization — the old path allocated up to 131 KB of stack scratch (with silent-truncation fallback on heap alloc failure) and did two passes over the input. Now streams directly into libghostty's VT parser — zero allocation, no truncation risk.
Share raw-cell fetch in buildRowContent — a cell that hit both the in-prompt branch and the empty-grapheme branch used to pay for cells_get(RAW) twice. Lazy tri-state loader dedupes without penalizing cells that need neither.
Reuse scrollback row-0 hash within a redraw — redraw() was calling computeFirstScrollbackRowHash twice in the common case. Cache the start-of-redraw value and reuse at end; invalidate only on trim / rotation-erase.
Consolidate module-load paths — two near-duplicate loaders (top-level block + ghostel--load-module) merged into one, with a prompt-user flag selecting between warning (load-time) and user-error (interactive) failure modes.

Bench delta (`make bench-quick`, ±10% noise)

scenario	before	after	delta
engine/plain/ghostel-incr	65.8 MB/s	83.6 MB/s	+27%
engine/styled/ghostel-incr	60.8 MB/s	79.4 MB/s	+31%
engine/unicode/ghostel-incr	40.1 MB/s	47.7 MB/s	+19%
stream/ghostel-incr	63.2 MB/s	75.9 MB/s	+20%
pty/plain/ghostel-incr	26.1 MB/s	27.6 MB/s	noise (I/O-bound)

Most of the engine/stream win is commit #1 alone; the remaining commits compound small improvements on top.

Not done (originally in the review, rejected after deeper look)

Remove ghostel--raw-key-sequence fallback — turned out to be defensive, not dead. test/ghostel-test.el:4503 explicitly exercises it, and libghostty can legitimately return SUCCESS with 0 bytes written.
Marker-based scroll preservation — markers can't replace the content-key system. The real problem is that Emacs autonomously clamps window-start to point-min when the buffer shrinks; content-keys recover across that. A marker-based substitution would still need the same content-search fallback, adding a code path rather than removing one.

Test plan

make -j4 all — 30 + 68 + 126 = 224 tests green at every commit
make bench-quick — numbers above
Manual smoke: open a ghostel, run cat large_log, scroll back, resize window
Manual smoke: run a program that uses OSC 7 / 51 / 52 / 133 markers (e.g. the bundled shell integration)

Five separate scanners were each sweeping the PTY input to find their own OSC code (color queries, PWD, 51/E, 52, 133), doing O(5N) work per write. Replace them with a single `OscIterator` that yields `(code, payload, term)` for any well-formed OSC, and one `dispatchPostWriteOscs` that handles codes 7/51/52/133 in document order. Colour queries (pre-vtWrite) use the same iterator. Drop the now-unused OscScanner, findOscTerminator, and parseDecimal helpers. Engine micro-benchmarks on bulk input improve ~20-28%: plain: 65.8 → 81.5 MB/s styled: 60.8 → 77.7 MB/s unicode: 40.1 → 48.3 MB/s stream: 63.2 → 73.3 MB/s PTY-path numbers are process-I/O-bound and unchanged within noise.

The old path allocated up to 131 KB of stack scratch (plus heap fallback with a silent-truncation failure mode) and did two passes over the input: one to count bare \n, one to copy-and-insert \r. Replace with a single streaming pass that writes maximal raw segments directly to libghostty's VT parser via vtWrite and emits "\r\n" inline at each bare \n. libghostty's state machine already handles arbitrary chunking (it's how the filter feeds data anyway), so no scratch buffer is needed. Zero allocation, no truncation risk, no behaviour change.

The per-cell loop was issuing two independent cells_get(RAW) calls: one inside the in-prompt branch (for semantic content) and one inside the empty-grapheme branch (for the wide-spacer check). A cell that hit both — empty prompt padding — paid for the fetch twice. Introduce a small tri-state loader that memoizes the fetch per cell. The first branch to need it pays; the second reuses. Cells that need neither (the common case for content rows outside the prompt region) still pay nothing. No behaviour change.

redraw() used to call computeFirstScrollbackRowHash twice in the common active-shell case: once at the top for rotation detection, once at the bottom to stash the baseline for the next redraw's check. Each call scrolls libghostty's viewport to the top, re-runs render_state_update, and scrolls back — not free. Cache the start-of-redraw hash when rotation was not detected and reuse it at end-of-redraw. Invalidate on the two events that actually shift scrollback row 0 within a single redraw: rotation (erase + rebuild) and trim (top rows deleted). Promotion and insert-at-tail append to the scrollback bottom, so row 0 is unchanged and the cache is still valid. Bench is insensitive (each iteration rebuilds the terminal from scratch, so scrollback is rarely populated on the first redraw); savings show up in sustained real-world sessions.

The top-level load block at file-load time and the later `ghostel--load-module' helper did near-identical work with subtly different behaviour: different guards (featurep vs fboundp), different directory-resolution orders, and different failure modes (display-warning vs user-error). Merge both into a single `ghostel--load-module' defined before the top-level call site. A `prompt-user' flag chooses between the warning (load-time, keep the file usable) and user-error (interactive command, abort the flow) failure modes. Directory resolution now tries load-file-name, then locate-library, then buffer-file-name — covering the three distinct contexts (load, post-load-interactive, eval-buffer) that each path had to handle individually.

Copilot

Pull request overview

This PR applies a set of performance-focused refactors across the Zig native module and the Emacs Lisp integration to reduce redundant scanning/work in hot paths (PTY input handling and redraw/rendering), while consolidating module-loading behavior.

Changes:

Replace multi-pass OSC scanning with a single OscIterator pass and centralized dispatch for OSC 7/51/52/133.
Stream CRLF normalization directly into libghostty VT writes (removing large scratch buffering and extra passes).
Reduce redraw/render overhead by caching per-cell RAW handles in buildRowContent and reusing the first-scrollback-row hash within a redraw; consolidate module-load paths in ghostel.el.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
`src/render.zig`	Adds a per-cell RAW handle memoizer to avoid duplicate `cells_get(RAW)` calls; caches row-0 scrollback hash across a redraw to avoid redundant recomputation.
`src/module.zig`	Streams CRLF normalization into `vtWrite`, introduces a unified OSC iterator/scanner, and centralizes post-write OSC dispatch.
`ghostel.el`	Consolidates module load logic into `ghostel--load-module` with load-time warning vs interactive `user-error` behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-17T15:37:50Z

+    var seg_start: usize = 0;
+    var prev_was_cr: bool = false;
+    for (raw, 0..) |ch, i| {
+        if (ch == '\n' and !prev_was_cr) {
+            if (i > seg_start) term.vtWrite(raw[seg_start..i]);
+            term.vtWrite("\r\n");
+            seg_start = i + 1;
+            prev_was_cr = false;
+        } else {
+            prev_was_cr = (ch == '\r');
        }


CRLF normalization only tracks prev_was_cr within this single fnWriteInput call. If Emacs delivers a CRLF pair split across two ghostel--write-input calls (chunk A ends with \r, chunk B begins with \n), this logic will treat the leading \n as “bare” and insert an extra \r, producing \r\r\n in libghostty. Consider persisting a last_write_ended_with_cr flag on Terminal (or similar) so prev_was_cr is initialized from prior state and updated based on the final byte written.

Good catch — fixed in 3b27eb9. Persisted last_wrote_cr on the Terminal struct so a CRLF pair split across two ghostel--write-input calls no longer gets double-normalized to \r\r\n. Cleared alongside the other resize-reset state. Added ghostel-test-crlf-split-across-writes as a regression test.

Rolling up all post-initial-push fixes from CI, Copilot review, and the second-round review agent into one commit. Loader & tests - Honour `(fboundp 'ghostel--new)' in the unified module loader so `cl-letf' stubs in the pure-Elisp test suite short-circuit as they did before the consolidation. Restores the title-tracking tests on the elisp-only CI job. CRLF across calls - Persist `last_input_was_cr' on the Terminal struct so a CRLF pair split across two `ghostel--write-input' calls isn't double- normalized into \r\r\n. Reset in `resize' alongside the other reset state. Regression tests: canonical split, empty chunk between CR and LF, standalone CR followed by complete CRLF. OSC iterator correctness - Treat a following `\e]' introducer as a payload boundary so one OSC missing its terminator doesn't cannibalize the next OSC's bytes as its own payload. Input `\e]7;PARTIAL\e]52;c;aGVsbG8=\a' previously dispatched OSC 7 with garbage and starved OSC 52 entirely; now OSC 7 is recognized as partial and skipped, OSC 52 dispatches normally. Regression test added. - Rename `OscEntry.term' -> `OscEntry.terminator' to remove the visual collision with `Terminal *term' in dispatchPostWriteOscs. - Clarify the `dispatchPostWriteOscs' docstring: it handles the four post-vtWrite OSCs; color queries use the same iterator from a different call site. Redraw hash gate - Skip `computeFirstScrollbackRowHash' at end-of-redraw when no writes happened this frame and the start-of-redraw rotation check didn't populate `cached_row0_hash'. Row 0 can't have moved, so the stored hash is still current. Covers cursor-only and idle-timer redraws. Naming and comments - Rename `last_wrote_cr' -> `last_input_was_cr': the field names the input-stream property, not a normalizer implementation detail that might change underneath it. - `loadRawCell' docstring notes the memoize assumes idempotent fetch (true today in libghostty). README - Refresh the 5 MB PTY throughput table: ghostel 70 -> 87 MB/s plain, 56 -> 64 MB/s URL-heavy. Other backends within noise.

dakra added 5 commits April 17, 2026 16:26

dakra requested a review from Copilot April 17, 2026 15:31

Copilot started reviewing on behalf of dakra April 17, 2026 15:32 View session

Copilot AI reviewed Apr 17, 2026

View reviewed changes

dakra force-pushed the perf-and-cleanup-ultrareview branch 2 times, most recently from f2d21c5 to c46c7c0 Compare April 17, 2026 16:47

dakra force-pushed the perf-and-cleanup-ultrareview branch from c46c7c0 to 1729f24 Compare April 17, 2026 19:52

dakra merged commit 1729f24 into main Apr 17, 2026
16 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Perf + cleanup: unified OSC scanner, streaming CRLF, smaller render hot path#121

Perf + cleanup: unified OSC scanner, streaming CRLF, smaller render hot path#121
dakra merged 6 commits intomainfrom
perf-and-cleanup-ultrareview

dakra commented Apr 17, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 17, 2026

Uh oh!

dakra Apr 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dakra commented Apr 17, 2026

Summary

Bench delta (make bench-quick, ±10% noise)

Not done (originally in the review, rejected after deeper look)

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

dakra Apr 17, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Bench delta (`make bench-quick`, ±10% noise)