🤖 Migrate syntax highlighting to Shiki for performance #342

ammar-agent · 2025-10-20T00:12:48Z

Summary

Replace per-line Prism highlighting with chunk-based Shiki highlighting, adding lazy language loading and fixing review note bugs.

Key Changes

1. Chunk-Based Highlighting (Performance)

Old approach highlighted each line individually with Prism, causing 100-400ms for 200-line diffs. New approach groups consecutive lines by type (add/remove/context) and highlights each chunk once with Shiki, reducing time to 10-75ms (3-10x faster).

2. Lazy Language Loading

Shiki supports 306 languages vs Prism's 19. Languages load on-demand when first used - initialization takes 13ms, with 1-4ms per new language on first use. All 306 languages available without startup penalty.

3. Fixed Review Note Bug

Review notes were sending empty content to chat because the content field was set to empty string but still used. Now extracts content from the lines array. Also added line elision for long selections (shows first/last lines with "..." for >3 lines).

4. Fixed HTML Line Extraction

Regex for extracting lines from Shiki's HTML output broke on nested spans. Replaced with split-on-newlines approach that finds the last closing tag properly.

Implementation

shikiHighlighter.ts: Lazy-loaded singleton with race-safe initialization
diffChunking.ts: Groups consecutive diff lines by type
highlightDiffChunk.ts: Async chunk highlighting with on-demand language loading
DiffRenderer.tsx: Fixed review note content extraction, added line elision

Testing

Added 15 integration tests for chunk-based highlighting
Added 9 integration tests for review notes
All tests use real Shiki (no mocks)
Verified race-safe concurrent language loading

Performance

Before (Prism, per-line):

100-400ms for 200-line diff
19 languages supported

After (Shiki, chunk-based):

10-75ms for 200-line diff (3-10x faster)
306 languages supported
13ms initialization (0 languages preloaded)
1-4ms per new language on first use

Dependencies

Kept react-syntax-highlighter for chat message code blocks in MarkdownComponents. Diff rendering now uses Shiki directly.

Generated with cmux

Replace per-line Prism highlighting with chunk-based Shiki highlighting to fix garbled output on incomplete syntax (unclosed strings, brackets, etc.). ## Changes ### New highlighting infrastructure - `src/utils/highlighting/shikiHighlighter.ts`: Lazy-loaded Shiki singleton - `src/utils/highlighting/diffChunking.ts`: Groups consecutive diff lines by type - `src/utils/highlighting/highlightDiffChunk.ts`: Async chunk highlighting with fallback ### Updated components - `DiffRenderer`: Now uses `useHighlightedDiff` hook for async chunk highlighting - `SelectableDiffRenderer`: Builds lineData from highlighted chunks - Both show loading state during highlighting - Preserve all existing diff styling and selection logic ### Why chunk-based? - **More context**: Highlighter sees larger code blocks instead of isolated lines - **Better accuracy**: Incomplete syntax in single lines no longer breaks tokenization - **Fallback support**: Gracefully handles unsupported languages and errors - **Performance**: Fewer highlight calls (~10 chunks vs ~500 lines) ### Dependencies - Added: `shiki@^3.13.0` (modern, TextMate-based, VS Code-compatible) - Kept: `react-syntax-highlighter` (still used by MarkdownComponents) ### Testing - Added unit tests for chunking logic (7 tests) - Added tests for highlighting fallback (5 tests) - All existing tests pass (673 pass, 1 skip) - Mocked Shiki in tests to avoid ESM/WASM issues Generated with `cmux`

Cache the Promise itself instead of awaiting it in the singleton check. This prevents multiple concurrent calls from each creating their own Shiki instance. Before: highlighterInstance ??= await createHighlighter(...) - Race condition: Multiple calls see null before any completes - Result: 270+ instances created After: Store and return the Promise directly - First call creates and caches the Promise - All concurrent calls await the same Promise - Result: Only 1 instance created Generated with `cmux`

The previous regex-based extraction used non-greedy matching (.*?) which stopped at the first encountered, incorrectly splitting syntax- highlighted lines that contain nested token spans. Fixed by: - Split on newlines first (Shiki separates line spans with \n) - Find opening tag - Find LAST tag (which closes the line wrapper) - Extract everything between them This correctly handles nested spans like: const x Tests: - Added 6 new tests with realistic nested span structure - All 680 tests pass - Typecheck and lint clean

**Bug Fix:** Review notes were sending empty line content to chat because lineData.content was set to empty string but still used when building review notes. **Root Causes:** 1. Type design flaw: content field marked as "not used" but still accessed 2. No test coverage for review note feature 3. TypeScript can't catch "wrong value" bugs (empty string is valid string) **Solution:** 1. Remove content field from lineData type entirely 2. Extract line content directly from lines array when building review notes 3. Add integration tests that verify line extraction logic **Enhancement:** Add line elision for long review notes: - ≤3 lines: show all lines - >3 lines: show first line, "(n lines omitted)", last line This improves chat readability for large code selections. **Tests Added:** - 9 new integration tests covering: - Line content extraction (add/remove/context) - Multiline selections - Review note formatting - Elision logic for various line counts (3, 4, 5, 11 lines) All 690 tests pass, typecheck clean, lint clean.

**Problem:** Tradeoff between fast init (19 langs = 53ms) vs full support (306 langs = 1.6s). **Solution: Lazy Load Languages On-Demand** - Init with empty language list: ~52ms (same as before) - Load languages when first used: ~0.1-5ms per language - Supports all 306 bundled languages without upfront cost - Race-safe: concurrent loads are idempotent (tested) **Implementation:** 1. shikiHighlighter.ts: Init with langs: [] 2. highlightDiffChunk.ts: Auto-load language before highlighting 3. Graceful fallback for unsupported languages **Benefits:** ✅ Fast startup (52ms vs 1.6s for all languages) ✅ All 306 languages supported ✅ Only load what's used (most repos use <10 languages) ✅ No bundle size increase ✅ Race-safe (verified with concurrent tests) **Tests Added:** - Load language on first use - Handle unsupported language gracefully - Concurrent highlighting of same language (race test) All 693 tests pass, typecheck clean, lint clean.

Previously loaded 19 languages upfront (52ms init). Now loads all languages on-demand (<5ms per language). Performance: - Init: 13ms (was 52ms) - First TypeScript load: 4ms - Subsequent loads: 1-2ms All 306 Shiki languages available without startup penalty.

ammar-agent added 5 commits October 19, 2025 18:34

ammario changed the title ~~🤖 Complete Shiki migration with lazy loading and bug fixes~~ 🤖 Migrate syntax highlighting to Shiki for performance Oct 20, 2025

ammar-agent added 2 commits October 19, 2025 19:17

🤖 Fix formatting

f1ab954

ammario merged commit fa2bb66 into main Oct 20, 2025
8 checks passed

ammario deleted the highlighting branch October 20, 2025 00:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

🤖 Migrate syntax highlighting to Shiki for performance #342

🤖 Migrate syntax highlighting to Shiki for performance #342

Uh oh!

ammar-agent commented Oct 20, 2025 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

🤖 Migrate syntax highlighting to Shiki for performance #342

🤖 Migrate syntax highlighting to Shiki for performance #342

Uh oh!

Conversation

ammar-agent commented Oct 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key Changes

1. Chunk-Based Highlighting (Performance)

2. Lazy Language Loading

3. Fixed Review Note Bug

4. Fixed HTML Line Extraction

Implementation

Testing

Performance

Dependencies

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ammar-agent commented Oct 20, 2025 •

edited

Loading