Add profiling and improve parsing performance #105
Merged
DecimalTurn merged 19 commits into latest on Feb 14, 2026
Replace Cursor/utf16Iterator generator with direct index-based scanning:
- Eliminate per-character IteratorResult object allocation (~14% CPU)
- Replace IS_WHITESPACE regex with charCode comparisons on hot path
- Replace string concatenation (raw += char) with input.slice(start, end)
- Optimize checkThree: backward backslash walk instead of input.slice(0,n).match()
- Inline specialCharacter(): yield token objects directly in main loop

549 unit tests + 855 TOML spec compliance tests pass.
Parse speedup: ~2x across all file sizes (spec example: 9x -> 4.7x vs smol-toml).
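To illustrate the techniques listed in this commit, here is a minimal sketch of index-based scanning with char code comparisons, a single `slice` per token, and a backward backslash walk. The function names (`isWhitespaceCode`, `skipWhitespace`, `scanWord`, `isEscaped`) are hypothetical and are not taken from `src/tokenizer.ts`; the real `checkThree` logic may differ.

```typescript
// Hypothetical sketch; these names are illustrative, not the library's API.
const TAB = 0x09;
const SPACE = 0x20;
const BACKSLASH = 0x5c;

// TOML whitespace is tab or space; compare char codes instead of running a regex.
function isWhitespaceCode(code: number): boolean {
  return code === SPACE || code === TAB;
}

// Returns the index of the first non-whitespace character at or after `start`.
function skipWhitespace(input: string, start: number): number {
  let i = start;
  while (i < input.length && isWhitespaceCode(input.charCodeAt(i))) i++;
  return i;
}

// Scans a run of non-whitespace characters and extracts it with one slice
// call instead of appending one character at a time.
function scanWord(input: string, start: number): { raw: string; end: number } {
  let end = start;
  while (end < input.length && !isWhitespaceCode(input.charCodeAt(end))) end++;
  return { raw: input.slice(start, end), end };
}

// True when the character at `index` is preceded by an odd number of
// backslashes. Walks backward from `index` and allocates nothing.
function isEscaped(input: string, index: number): boolean {
  let count = 0;
  for (let i = index - 1; i >= 0 && input.charCodeAt(i) === BACKSLASH; i--) {
    count++;
  }
  return (count & 1) === 1;
}
```

The backward walk only touches the backslashes directly before the quote, whereas slicing the prefix and matching a regex copies and rescans everything before it.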
parse-string.ts:
- Replace 7-pass regex pipeline with single-pass state machine
- Handle all escape sequences (\b,\t,\n,\f,\r,\",\\,\e,\uXXXX,\UXXXXXXXX,\xHH) in one traversal instead of separate regex passes
- Fold multiline preprocessing (line-ending backslash, newline escaping, quote escaping) into the same single pass
- Use hex lookup table for fast hex digit parsing
- Add fast path: skip processing entirely when no backslash present
- Properly reject \<space> without newline in multiline strings

location.ts:
- Replace Array.findIndex (linear scan) with binary search in findPosition
- Lines array is already sorted; O(T*log L) instead of O(T*L)

549 unit tests + 855 TOML spec compliance tests pass.
Parse improvement vs smol-toml:
- spec example: 4.7x -> 3.4x
- inline-arrays: 12.8x -> 5.3x
- inline-tables: 5.4x -> 4.3x
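The single-pass approach described for parse-string.ts can be sketched roughly as follows. This is not the PR's actual `unescapeBasicString`: the names `HEX_VALUES`, `SIMPLE_ESCAPES`, `readHex`, and `unescapeBasic` are hypothetical, multiline handling (line-ending backslash) is left out, and validation that a code point is a Unicode scalar value is omitted.

```typescript
// Hypothetical sketch, not the PR's implementation.
// Lookup table: char code -> hex digit value, or -1 when not a hex digit.
const HEX_VALUES = new Int8Array(128).fill(-1);
for (let d = 0; d < 10; d++) HEX_VALUES[0x30 + d] = d; // '0'-'9'
for (let d = 0; d < 6; d++) {
  HEX_VALUES[0x41 + d] = 10 + d; // 'A'-'F'
  HEX_VALUES[0x61 + d] = 10 + d; // 'a'-'f'
}

const SIMPLE_ESCAPES = new Map<string, string>([
  ["b", "\b"], ["t", "\t"], ["n", "\n"], ["f", "\f"],
  ["r", "\r"], ["e", "\x1b"], ['"', '"'], ["\\", "\\"],
]);

// Reads exactly `digits` hex digits starting at `start`.
function readHex(input: string, start: number, digits: number): number {
  if (start + digits > input.length) throw new SyntaxError("Truncated escape");
  let value = 0;
  for (let i = start; i < start + digits; i++) {
    // Reads outside the table yield undefined, which maps to -1.
    const digit = HEX_VALUES[input.charCodeAt(i)] ?? -1;
    if (digit < 0) throw new SyntaxError(`Invalid hex digit at index ${i}`);
    value = value * 16 + digit;
  }
  return value;
}

function unescapeBasic(input: string): string {
  // Fast path: no backslash means there is nothing to unescape.
  if (input.indexOf("\\") === -1) return input;

  let out = "";
  let chunkStart = 0; // start of the pending run of literal characters
  let i = 0;
  while (i < input.length) {
    if (input.charCodeAt(i) !== 0x5c) {
      i++;
      continue;
    }
    out += input.slice(chunkStart, i); // flush the literal run in one slice
    if (i + 1 >= input.length) throw new SyntaxError("Dangling backslash");
    const next = input.charAt(i + 1);
    const simple = SIMPLE_ESCAPES.get(next);
    if (simple !== undefined) {
      out += simple;
      i += 2;
    } else if (next === "x" || next === "u" || next === "U") {
      const digits = next === "x" ? 2 : next === "u" ? 4 : 8;
      out += String.fromCodePoint(readHex(input, i + 2, digits));
      i += 2 + digits;
    } else {
      throw new SyntaxError(`Invalid escape sequence \\${next}`);
    }
    chunkStart = i;
  }
  return out + input.slice(chunkStart);
}
```

Each character is visited once, and literal runs between escapes are copied with a single `slice`, so a string with few escapes does little more work than the fast path.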
The generic traverse visits every AST node (Key, String, Integer, etc.) even though toJS only cares about Table, TableArray, and KeyValue. The specialized walk iterates only the relevant nodes and skips Comments and all Value sub-nodes entirely, eliminating traverseNode dispatch, visitor lookup, and type-check overhead for ~60% of nodes.
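A rough sketch of what such a specialized walk could look like is shown below. The node shapes (`TomlDocument`, `TomlTable`, `TomlKeyValue`, `TomlComment`, `TomlValue`) are simplified assumptions and do not match the library's real AST; TableArray handling, dotted keys, and duplicate-key validation are omitted for brevity.

```typescript
// Hypothetical, simplified node shapes; the real AST has more node types.
type TomlValue = { type: "String" | "Integer"; value: string | number };
type TomlKeyValue = { type: "KeyValue"; key: string; value: TomlValue };
type TomlComment = { type: "Comment"; raw: string };
type TomlTable = {
  type: "Table";
  key: string[];
  items: Array<TomlKeyValue | TomlComment>;
};
type TomlDocument = {
  type: "Document";
  items: Array<TomlKeyValue | TomlTable | TomlComment>;
};
type Obj = { [key: string]: unknown };

// Walks (and creates if needed) nested objects along `path`.
function ensureTable(root: Obj, path: string[]): Obj {
  let current = root;
  for (const part of path) {
    let next = current[part];
    if (typeof next !== "object" || next === null) {
      next = {};
      current[part] = next;
    }
    current = next as Obj;
  }
  return current;
}

// Specialized walk: looks only at KeyValue and Table nodes. Comments are
// skipped, and value nodes are read directly instead of being visited.
function toJS(doc: TomlDocument): Obj {
  const result: Obj = {};
  for (const item of doc.items) {
    if (item.type === "KeyValue") {
      result[item.key] = item.value.value;
    } else if (item.type === "Table") {
      const target = ensureTable(result, item.key);
      for (const child of item.items) {
        if (child.type === "KeyValue") target[child.key] = child.value.value;
      }
    }
  }
  return result;
}
```

Because the loop branches on `type` inline, there is no visitor lookup or recursive dispatch for nodes that contribute nothing to the output.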
…te findLines pass

Instead of scanning the entire input upfront to find newline positions (createLocate → findLines), record newlines as the tokenizer encounters them in its main loop and multiline string scanner. findPosition is updated to handle the missing end-sentinel gracefully. This eliminates one full O(n) pass through the input on every parse.
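The combination of incremental newline recording and a binary-search `findPosition` can be sketched as follows. This is an assumption-laden illustration: `scan` stands in for the tokenizer's main loop, the `Position` shape (1-based line, 0-based column) is a guess, and the real `findPosition` signature may differ.

```typescript
// Hypothetical sketch; `scan` stands in for the tokenizer's main loop.
const LF = 0x0a;

// Records the offset of every "\n" during the same pass that would
// produce tokens, so no separate upfront scan is needed.
function scan(input: string): number[] {
  const newlines: number[] = [];
  for (let i = 0; i < input.length; i++) {
    if (input.charCodeAt(i) === LF) newlines.push(i);
    // ... tokenization of input[i] would happen here ...
  }
  return newlines;
}

interface Position {
  line: number; // 1-based (assumed convention)
  column: number; // 0-based (assumed convention)
}

// Binary search for the number of newlines strictly before `offset`.
// `newlines` is sorted because offsets are recorded in scan order. No end
// sentinel is required: offsets past the last newline land on the last line.
function findPosition(newlines: number[], offset: number): Position {
  let lo = 0;
  let hi = newlines.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    if ((newlines[mid] as number) < offset) lo = mid + 1;
    else hi = mid;
  }
  const lineStart = lo === 0 ? 0 : (newlines[lo - 1] as number) + 1;
  return { line: lo + 1, column: offset - lineStart };
}
```

For example, with the input `"a\nbc\nd"` the recorded offsets are `[1, 4]`, and offset 5 (the `d`) resolves to line 3, column 0 after two comparisons.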
Pull request overview
This PR introduces significant performance improvements to the TOML parser through algorithmic optimizations and micro-optimizations across multiple modules. The main focus is on reducing allocations, replacing regex with character code comparisons, and optimizing hot-path operations. A comprehensive profiling script is also added to help identify performance bottlenecks.
Changes:
- Refactored tokenizer from cursor-based to index-based scanning with incremental line indexing
- Optimized string parsing with single-pass unescaping and hex lookup tables
- Replaced generic AST traversal with specialized inline walk for faster processing
- Added micro-optimizations for bare key validation, array operations, and iterator completion
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/tokenizer.ts | Replaced cursor-based scanning with index-based approach, added character code constants, implemented incremental line indexing |
| src/to-js.ts | Removed traverse dependency, added specialized inline walk, optimized validateKey with pre-computed candidates |
| src/parse-toml.ts | Added isBareKeyCode function using charCode ranges, removed unnecessary cloneLocation calls, replaced merge with direct array push |
| src/parse-string.ts | Implemented single-pass unescapeBasicString with hex lookup table, removed regex-based approaches |
| src/location.ts | Replaced linear search with binary search in findPosition, optimized findLines with charCode comparisons |
| src/cursor.ts | Added frozen DONE sentinel object to avoid repeated allocations |
| src/utils.ts | Optimized has() function to use in operator instead of hasOwnProperty.call |
| benchmark/profile.mjs | Added comprehensive profiling script with V8 CPU profiling integration |
| package.json | Added "profile" npm script |
| .gitignore | Added *.cpuprofile to ignore profiling outputs |
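As an illustration of the charCode-range check mentioned for src/parse-toml.ts, here is one way a function like `isBareKeyCode` could be written. The body is an assumption based on the bare-key character set in the TOML 1.0 spec (ASCII letters, digits, `_`, and `-`); the helper `isBareKey` is hypothetical.

```typescript
// Sketch of a charCode-range check; the PR's actual body is not shown here.
function isBareKeyCode(code: number): boolean {
  return (
    (code >= 0x61 && code <= 0x7a) || // a-z
    (code >= 0x41 && code <= 0x5a) || // A-Z
    (code >= 0x30 && code <= 0x39) || // 0-9
    code === 0x5f || // _
    code === 0x2d // -
  );
}

// Hypothetical helper: a key is bare when it is non-empty and every
// character passes the range check. No regex is involved.
function isBareKey(key: string): boolean {
  if (key.length === 0) return false;
  for (let i = 0; i < key.length; i++) {
    if (!isBareKeyCode(key.charCodeAt(i))) return false;
  }
  return true;
}
```

Lowercase letters are tested first on the assumption that they are the most common characters in keys, so typical input exits the condition early.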
This pull request introduces performance improvements and a profiling script.
The main optimizations are in string parsing and bare key validation, along with improvements to line/column location finding and micro-optimizations for array merging and iterator completion.
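The iterator-completion micro-optimization can be pictured with a small sketch: a single frozen "done" result is shared by every call once the iterator is exhausted, rather than allocating a new object each time. The class `CharIterator` is hypothetical and is not the library's `Cursor`.

```typescript
// One shared, frozen completion result; freezing guards against a caller
// accidentally mutating the object that every exhausted iterator returns.
const DONE: IteratorReturnResult<undefined> = Object.freeze({
  done: true as const,
  value: undefined,
});

// Hypothetical iterator over the UTF-16 code units of a string.
class CharIterator implements Iterator<string, undefined> {
  private index = 0;
  private readonly input: string;

  constructor(input: string) {
    this.input = input;
  }

  next(): IteratorResult<string, undefined> {
    // Exhausted: return the shared sentinel instead of a fresh object.
    if (this.index >= this.input.length) return DONE;
    return { done: false, value: this.input.charAt(this.index++) };
  }
}
```

Repeated calls to `next()` after exhaustion then return the same object identity, which avoids allocations in code that polls a finished iterator.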