-
Notifications
You must be signed in to change notification settings - Fork 11
🤖 Migrate syntax highlighting to Shiki for performance #342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Replace per-line Prism highlighting with chunk-based Shiki highlighting to fix garbled output on incomplete syntax (unclosed strings, brackets, etc.). ## Changes ### New highlighting infrastructure - `src/utils/highlighting/shikiHighlighter.ts`: Lazy-loaded Shiki singleton - `src/utils/highlighting/diffChunking.ts`: Groups consecutive diff lines by type - `src/utils/highlighting/highlightDiffChunk.ts`: Async chunk highlighting with fallback ### Updated components - `DiffRenderer`: Now uses `useHighlightedDiff` hook for async chunk highlighting - `SelectableDiffRenderer`: Builds lineData from highlighted chunks - Both show loading state during highlighting - Preserve all existing diff styling and selection logic ### Why chunk-based? - **More context**: Highlighter sees larger code blocks instead of isolated lines - **Better accuracy**: Incomplete syntax in single lines no longer breaks tokenization - **Fallback support**: Gracefully handles unsupported languages and errors - **Performance**: Fewer highlight calls (~10 chunks vs ~500 lines) ### Dependencies - Added: `shiki@^3.13.0` (modern, TextMate-based, VS Code-compatible) - Kept: `react-syntax-highlighter` (still used by MarkdownComponents) ### Testing - Added unit tests for chunking logic (7 tests) - Added tests for highlighting fallback (5 tests) - All existing tests pass (673 pass, 1 skip) - Mocked Shiki in tests to avoid ESM/WASM issues Generated with `cmux`
Cache the Promise itself instead of awaiting it in the singleton check. This prevents multiple concurrent calls from each creating their own Shiki instance. Before: highlighterInstance ??= await createHighlighter(...) - Race condition: Multiple calls see null before any completes - Result: 270+ instances created After: Store and return the Promise directly - First call creates and caches the Promise - All concurrent calls await the same Promise - Result: Only 1 instance created Generated with `cmux`
The previous regex-based extraction used non-greedy matching (.*?) which stopped at the first </span> encountered, incorrectly splitting syntax- highlighted lines that contain nested token spans. Fixed by: - Split on newlines first (Shiki separates line spans with \n) - Find opening <span class="line"> tag - Find LAST </span> tag (which closes the line wrapper) - Extract everything between them This correctly handles nested spans like: <span class="line"> <span style="color:#DCDCAA">const</span> <span style="color:#DCDCAA">x</span> </span> Tests: - Added 6 new tests with realistic nested span structure - All 680 tests pass - Typecheck and lint clean
**Bug Fix:** Review notes were sending empty line content to chat because lineData.content was set to empty string but still used when building review notes. **Root Causes:** 1. Type design flaw: content field marked as "not used" but still accessed 2. No test coverage for review note feature 3. TypeScript can't catch "wrong value" bugs (empty string is valid string) **Solution:** 1. Remove content field from lineData type entirely 2. Extract line content directly from lines array when building review notes 3. Add integration tests that verify line extraction logic **Enhancement:** Add line elision for long review notes: - ≤3 lines: show all lines - >3 lines: show first line, "(n lines omitted)", last line This improves chat readability for large code selections. **Tests Added:** - 9 new integration tests covering: - Line content extraction (add/remove/context) - Multiline selections - Review note formatting - Elision logic for various line counts (3, 4, 5, 11 lines) All 690 tests pass, typecheck clean, lint clean.
**Problem:** Tradeoff between fast init (19 langs = 53ms) vs full support (306 langs = 1.6s). **Solution: Lazy Load Languages On-Demand** - Init with empty language list: ~52ms (same as before) - Load languages when first used: ~0.1-5ms per language - Supports all 306 bundled languages without upfront cost - Race-safe: concurrent loads are idempotent (tested) **Implementation:** 1. shikiHighlighter.ts: Init with langs: [] 2. highlightDiffChunk.ts: Auto-load language before highlighting 3. Graceful fallback for unsupported languages **Benefits:** ✅ Fast startup (52ms vs 1.6s for all languages) ✅ All 306 languages supported ✅ Only load what's used (most repos use <10 languages) ✅ No bundle size increase ✅ Race-safe (verified with concurrent tests) **Tests Added:** - Load language on first use - Handle unsupported language gracefully - Concurrent highlighting of same language (race test) All 693 tests pass, typecheck clean, lint clean.
Previously loaded 19 languages upfront (52ms init). Now loads all languages on-demand (<5ms per language). Performance: - Init: 13ms (was 52ms) - First TypeScript load: 4ms - Subsequent loads: 1-2ms All 306 Shiki languages available without startup penalty.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Summary
Replace per-line Prism highlighting with chunk-based Shiki highlighting, adding lazy language loading and fixing review note bugs.
Key Changes
1. Chunk-Based Highlighting (Performance)
Old approach highlighted each line individually with Prism, causing 100-400ms for 200-line diffs. New approach groups consecutive lines by type (add/remove/context) and highlights each chunk once with Shiki, reducing time to 10-75ms (3-10x faster).
2. Lazy Language Loading
Shiki supports 306 languages vs Prism's 19. Languages load on-demand when first used - initialization takes 13ms, with 1-4ms per new language on first use. All 306 languages available without startup penalty.
3. Fixed Review Note Bug
Review notes were sending empty content to chat because the
contentfield was set to empty string but still used. Now extracts content from the lines array. Also added line elision for long selections (shows first/last lines with "..." for >3 lines).4. Fixed HTML Line Extraction
Regex for extracting lines from Shiki's HTML output broke on nested spans. Replaced with split-on-newlines approach that finds the last closing tag properly.
Implementation
shikiHighlighter.ts: Lazy-loaded singleton with race-safe initializationdiffChunking.ts: Groups consecutive diff lines by typehighlightDiffChunk.ts: Async chunk highlighting with on-demand language loadingDiffRenderer.tsx: Fixed review note content extraction, added line elisionTesting
Performance
Before (Prism, per-line):
After (Shiki, chunk-based):
Dependencies
Kept
react-syntax-highlighterfor chat message code blocks in MarkdownComponents. Diff rendering now uses Shiki directly.Generated with
cmux