Skip to content

Perf + cleanup: unified OSC scanner, streaming CRLF, smaller render hot path#121

Merged
dakra merged 6 commits intomainfrom
perf-and-cleanup-ultrareview
Apr 17, 2026
Merged

Perf + cleanup: unified OSC scanner, streaming CRLF, smaller render hot path#121
dakra merged 6 commits intomainfrom
perf-and-cleanup-ultrareview

Conversation

@dakra
Copy link
Copy Markdown
Owner

@dakra dakra commented Apr 17, 2026

Summary

Five self-contained, individually-tested perf and cleanup commits from an architectural review. Each was benchmarked against main before moving on.

  • Unify OSC scanning into one pass — every PTY write used to get scanned five times (one scanner per of OSC 7 / 51 / 52 / 133 / 4-10-11). Replaced with a single OscIterator that yields (code, payload, term); post-vtWrite dispatch handles 7/51/52/133 in document order.
  • Stream CRLF normalization — the old path allocated up to 131 KB of stack scratch (with silent-truncation fallback on heap alloc failure) and did two passes over the input. Now streams directly into libghostty's VT parser — zero allocation, no truncation risk.
  • Share raw-cell fetch in buildRowContent — a cell that hit both the in-prompt branch and the empty-grapheme branch used to pay for cells_get(RAW) twice. Lazy tri-state loader dedupes without penalizing cells that need neither.
  • Reuse scrollback row-0 hash within a redrawredraw() was calling computeFirstScrollbackRowHash twice in the common case. Cache the start-of-redraw value and reuse at end; invalidate only on trim / rotation-erase.
  • Consolidate module-load paths — two near-duplicate loaders (top-level block + ghostel--load-module) merged into one, with a prompt-user flag selecting between warning (load-time) and user-error (interactive) failure modes.

Bench delta (make bench-quick, ±10% noise)

scenario before after delta
engine/plain/ghostel-incr 65.8 MB/s 83.6 MB/s +27%
engine/styled/ghostel-incr 60.8 MB/s 79.4 MB/s +31%
engine/unicode/ghostel-incr 40.1 MB/s 47.7 MB/s +19%
stream/ghostel-incr 63.2 MB/s 75.9 MB/s +20%
pty/plain/ghostel-incr 26.1 MB/s 27.6 MB/s noise (I/O-bound)

Most of the engine/stream win is commit #1 alone; the remaining commits compound small improvements on top.

Not done (originally in the review, rejected after deeper look)

  • Remove ghostel--raw-key-sequence fallback — turned out to be defensive, not dead. test/ghostel-test.el:4503 explicitly exercises it, and libghostty can legitimately return SUCCESS with 0 bytes written.
  • Marker-based scroll preservation — markers can't replace the content-key system. The real problem is that Emacs autonomously clamps window-start to point-min when the buffer shrinks; content-keys recover across that. A marker-based substitution would still need the same content-search fallback, adding a code path rather than removing one.

Test plan

  • make -j4 all — 30 + 68 + 126 = 224 tests green at every commit
  • make bench-quick — numbers above
  • Manual smoke: open a ghostel, run cat large_log, scroll back, resize window
  • Manual smoke: run a program that uses OSC 7 / 51 / 52 / 133 markers (e.g. the bundled shell integration)

dakra added 5 commits April 17, 2026 16:26
Five separate scanners were each sweeping the PTY input to find their
own OSC code (color queries, PWD, 51/E, 52, 133), doing O(5N) work per
write.  Replace them with a single `OscIterator` that yields
`(code, payload, term)` for any well-formed OSC, and one
`dispatchPostWriteOscs` that handles codes 7/51/52/133 in document
order.  Colour queries (pre-vtWrite) use the same iterator.

Drop the now-unused OscScanner, findOscTerminator, and parseDecimal
helpers.

Engine micro-benchmarks on bulk input improve ~20-28%:
  plain:   65.8 → 81.5 MB/s
  styled:  60.8 → 77.7 MB/s
  unicode: 40.1 → 48.3 MB/s
  stream:  63.2 → 73.3 MB/s
PTY-path numbers are process-I/O-bound and unchanged within noise.
The old path allocated up to 131 KB of stack scratch (plus heap
fallback with a silent-truncation failure mode) and did two passes
over the input: one to count bare \n, one to copy-and-insert \r.

Replace with a single streaming pass that writes maximal raw segments
directly to libghostty's VT parser via vtWrite and emits "\r\n" inline
at each bare \n.  libghostty's state machine already handles arbitrary
chunking (it's how the filter feeds data anyway), so no scratch buffer
is needed.

Zero allocation, no truncation risk, no behaviour change.
The per-cell loop was issuing two independent
cells_get(RAW) calls: one inside the in-prompt branch (for
semantic content) and one inside the empty-grapheme branch (for
the wide-spacer check).  A cell that hit both — empty prompt
padding — paid for the fetch twice.

Introduce a small tri-state loader that memoizes the fetch per
cell.  The first branch to need it pays; the second reuses.
Cells that need neither (the common case for content rows outside
the prompt region) still pay nothing.

No behaviour change.
redraw() used to call computeFirstScrollbackRowHash twice in the
common active-shell case: once at the top for rotation detection,
once at the bottom to stash the baseline for the next redraw's
check.  Each call scrolls libghostty's viewport to the top,
re-runs render_state_update, and scrolls back — not free.

Cache the start-of-redraw hash when rotation was not detected and
reuse it at end-of-redraw.  Invalidate on the two events that
actually shift scrollback row 0 within a single redraw: rotation
(erase + rebuild) and trim (top rows deleted).  Promotion and
insert-at-tail append to the scrollback bottom, so row 0 is
unchanged and the cache is still valid.

Bench is insensitive (each iteration rebuilds the terminal from
scratch, so scrollback is rarely populated on the first redraw);
savings show up in sustained real-world sessions.
The top-level load block at file-load time and the later
`ghostel--load-module' helper did near-identical work with subtly
different behaviour: different guards (featurep vs fboundp),
different directory-resolution orders, and different failure modes
(display-warning vs user-error).

Merge both into a single `ghostel--load-module' defined before the
top-level call site.  A `prompt-user' flag chooses between the
warning (load-time, keep the file usable) and user-error
(interactive command, abort the flow) failure modes.  Directory
resolution now tries load-file-name, then locate-library, then
buffer-file-name — covering the three distinct contexts (load,
post-load-interactive, eval-buffer) that each path had to handle
individually.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR applies a set of performance-focused refactors across the Zig native module and the Emacs Lisp integration to reduce redundant scanning/work in hot paths (PTY input handling and redraw/rendering), while consolidating module-loading behavior.

Changes:

  • Replace multi-pass OSC scanning with a single OscIterator pass and centralized dispatch for OSC 7/51/52/133.
  • Stream CRLF normalization directly into libghostty VT writes (removing large scratch buffering and extra passes).
  • Reduce redraw/render overhead by caching per-cell RAW handles in buildRowContent and reusing the first-scrollback-row hash within a redraw; consolidate module-load paths in ghostel.el.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File Description
src/render.zig Adds a per-cell RAW handle memoizer to avoid duplicate cells_get(RAW) calls; caches row-0 scrollback hash across a redraw to avoid redundant recomputation.
src/module.zig Streams CRLF normalization into vtWrite, introduces a unified OSC iterator/scanner, and centralizes post-write OSC dispatch.
ghostel.el Consolidates module load logic into ghostel--load-module with load-time warning vs interactive user-error behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/module.zig
Comment on lines +160 to 170
var seg_start: usize = 0;
var prev_was_cr: bool = false;
for (raw, 0..) |ch, i| {
if (ch == '\n' and !prev_was_cr) {
if (i > seg_start) term.vtWrite(raw[seg_start..i]);
term.vtWrite("\r\n");
seg_start = i + 1;
prev_was_cr = false;
} else {
prev_was_cr = (ch == '\r');
}
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRLF normalization only tracks prev_was_cr within this single fnWriteInput call. If Emacs delivers a CRLF pair split across two ghostel--write-input calls (chunk A ends with \r, chunk B begins with \n), this logic will treat the leading \n as “bare” and insert an extra \r, producing \r\r\n in libghostty. Consider persisting a last_write_ended_with_cr flag on Terminal (or similar) so prev_was_cr is initialized from prior state and updated based on the final byte written.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — fixed in 3b27eb9. Persisted last_wrote_cr on the Terminal struct so a CRLF pair split across two ghostel--write-input calls no longer gets double-normalized to \r\r\n. Cleared alongside the other resize-reset state. Added ghostel-test-crlf-split-across-writes as a regression test.

@dakra dakra force-pushed the perf-and-cleanup-ultrareview branch 2 times, most recently from f2d21c5 to c46c7c0 Compare April 17, 2026 16:47
Rolling up all post-initial-push fixes from CI, Copilot review, and
the second-round review agent into one commit.

Loader & tests
- Honour `(fboundp 'ghostel--new)' in the unified module loader so
  `cl-letf' stubs in the pure-Elisp test suite short-circuit as they
  did before the consolidation.  Restores the title-tracking tests
  on the elisp-only CI job.

CRLF across calls
- Persist `last_input_was_cr' on the Terminal struct so a CRLF pair
  split across two `ghostel--write-input' calls isn't double-
  normalized into \r\r\n.  Reset in `resize' alongside the other
  reset state.  Regression tests: canonical split, empty chunk
  between CR and LF, standalone CR followed by complete CRLF.

OSC iterator correctness
- Treat a following `\e]' introducer as a payload boundary so one
  OSC missing its terminator doesn't cannibalize the next OSC's
  bytes as its own payload.  Input `\e]7;PARTIAL\e]52;c;aGVsbG8=\a'
  previously dispatched OSC 7 with garbage and starved OSC 52
  entirely; now OSC 7 is recognized as partial and skipped, OSC 52
  dispatches normally.  Regression test added.
- Rename `OscEntry.term' -> `OscEntry.terminator' to remove the
  visual collision with `Terminal *term' in dispatchPostWriteOscs.
- Clarify the `dispatchPostWriteOscs' docstring: it handles the
  four post-vtWrite OSCs; color queries use the same iterator from
  a different call site.

Redraw hash gate
- Skip `computeFirstScrollbackRowHash' at end-of-redraw when no
  writes happened this frame and the start-of-redraw rotation
  check didn't populate `cached_row0_hash'.  Row 0 can't have
  moved, so the stored hash is still current.  Covers cursor-only
  and idle-timer redraws.

Naming and comments
- Rename `last_wrote_cr' -> `last_input_was_cr': the field names
  the input-stream property, not a normalizer implementation
  detail that might change underneath it.
- `loadRawCell' docstring notes the memoize assumes idempotent
  fetch (true today in libghostty).

README
- Refresh the 5 MB PTY throughput table: ghostel 70 -> 87 MB/s
  plain, 56 -> 64 MB/s URL-heavy.  Other backends within noise.
@dakra dakra force-pushed the perf-and-cleanup-ultrareview branch from c46c7c0 to 1729f24 Compare April 17, 2026 19:52
@dakra dakra merged commit 1729f24 into main Apr 17, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants