Add profiling and improve parsing performance #105

Merged
DecimalTurn merged 19 commits into latest from perf-impr2 on Feb 14, 2026
@DecimalTurn (Owner) commented Feb 13, 2026

This pull request introduces performance improvements and a profiling script.
The main optimizations target string parsing and bare key validation. It also speeds up line/column location lookup and adds micro-optimizations for array merging and iterator completion.

Replace Cursor/utf16Iterator generator with direct index-based scanning:
- Eliminate per-character IteratorResult object allocation (~14% CPU)
- Replace IS_WHITESPACE regex with charCode comparisons on hot path
- Replace string concatenation (raw += char) with input.slice(start, end)
- Optimize checkThree: backward backslash walk instead of input.slice(0,n).match()
- Inline specialCharacter() — yield token objects directly in main loop
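
The tokenizer changes listed above can be illustrated with a minimal sketch. It is not the code from src/tokenizer.ts: the names `scanWordsSketch`, `isEscapedAtSketch`, and `RawTokenSketch` are hypothetical, and splitting on spaces and tabs stands in for real TOML tokenization. It shows only the general techniques: index-based scanning with `charCodeAt`, a single `slice(start, end)` per token instead of `raw += char`, and a backward walk that counts backslashes instead of slicing and regex-matching the prefix.

```typescript
// Illustrative character codes (assumed names, not the library's constants).
const CC_TAB = 0x09;
const CC_SPACE = 0x20;
const CC_BACKSLASH = 0x5c;

interface RawTokenSketch {
  raw: string;
  start: number;
  end: number;
}

// Index-based scan: no iterator objects, no regex, no per-character concatenation.
function scanWordsSketch(input: string): RawTokenSketch[] {
  const tokens: RawTokenSketch[] = [];
  const len = input.length;
  let i = 0;
  while (i < len) {
    const code = input.charCodeAt(i);
    if (code === CC_SPACE || code === CC_TAB) {
      i++; // skip whitespace using plain number comparisons
      continue;
    }
    const start = i;
    while (i < len) {
      const c = input.charCodeAt(i);
      if (c === CC_SPACE || c === CC_TAB) break;
      i++;
    }
    // One slice per token instead of building the string character by character.
    tokens.push({ raw: input.slice(start, i), start, end: i });
  }
  return tokens;
}

// Backward walk: the character at `pos` is escaped when it is preceded by an
// odd number of consecutive backslashes.
function isEscapedAtSketch(input: string, pos: number): boolean {
  let count = 0;
  for (let j = pos - 1; j >= 0 && input.charCodeAt(j) === CC_BACKSLASH; j--) {
    count++;
  }
  return (count & 1) === 1;
}
```

The backward walk only touches the backslashes immediately before `pos`, so its cost does not grow with the length of the preceding input.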

549 unit tests + 855 TOML spec compliance tests pass.
Parse speedup: ~2x across all file sizes (spec example: 9x → 4.7x vs smol-toml).

parse-string.ts:
- Replace 7-pass regex pipeline with single-pass state machine
- Handle all escape sequences (\b,\t,\n,\f,\r,\",\\,\e,\uXXXX,\UXXXXXXXX,\xHH)
  in one traversal instead of separate regex passes
- Fold multiline preprocessing (line-ending backslash, newline escaping,
  quote escaping) into the same single pass
- Use hex lookup table for fast hex digit parsing
- Add fast path: skip processing entirely when no backslash present
- Properly reject \<space> without newline in multiline strings
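
A simplified sketch of the single-pass approach follows. It is an assumption-laden illustration, not the `unescapeBasicString` implementation from src/parse-string.ts: the function name `unescapeSketch` and both tables are hypothetical, only a subset of escapes is handled (`\b`, `\t`, `\n`, `\f`, `\r`, `\"`, `\\`, `\uXXXX`), multiline preprocessing is omitted, and the decoded code point is not validated against surrogate ranges.

```typescript
// Hex lookup table: maps an ASCII char code to its digit value, or -1.
const HEX_VALUE = new Int8Array(128).fill(-1);
for (let d = 0; d < 10; d++) HEX_VALUE[0x30 + d] = d; // '0'..'9'
for (let d = 0; d < 6; d++) {
  HEX_VALUE[0x41 + d] = 10 + d; // 'A'..'F'
  HEX_VALUE[0x61 + d] = 10 + d; // 'a'..'f'
}

// Single-character escapes handled by this sketch.
const SIMPLE_ESCAPES: Record<string, string> = {
  b: "\b",
  t: "\t",
  n: "\n",
  f: "\f",
  r: "\r",
  '"': '"',
  "\\": "\\",
};

function unescapeSketch(raw: string): string {
  // Fast path: nothing to unescape, return the input unchanged.
  if (raw.indexOf("\\") === -1) return raw;

  let out = "";
  let chunkStart = 0; // start of the pending run of literal characters
  let i = 0;
  while (i < raw.length) {
    if (raw.charCodeAt(i) !== 0x5c) {
      i++;
      continue;
    }
    out += raw.slice(chunkStart, i); // flush literal run in one slice
    const next = raw.charAt(i + 1);
    const simple = SIMPLE_ESCAPES[next];
    if (simple !== undefined) {
      out += simple;
      i += 2;
    } else if (next === "u") {
      let codeUnit = 0;
      for (let k = 0; k < 4; k++) {
        const code = raw.charCodeAt(i + 2 + k); // NaN when past the end
        const value = code < 128 ? HEX_VALUE[code] : -1;
        if (value === undefined || value < 0) {
          throw new Error("Invalid \\u escape at index " + i);
        }
        codeUnit = codeUnit * 16 + value;
      }
      out += String.fromCharCode(codeUnit);
      i += 6;
    } else {
      throw new Error("Invalid escape sequence at index " + i);
    }
    chunkStart = i;
  }
  return out + raw.slice(chunkStart);
}
```

The design point is that every character is visited once, and literal runs between escapes are copied with a single `slice` rather than rebuilt character by character.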

location.ts:
- Replace Array.findIndex (linear scan) with binary search in findPosition
- Lines array is already sorted; O(T*log L) instead of O(T*L)
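
As a rough illustration of the binary search, here is a sketch under an assumed data layout. It treats the sorted array as the offsets at which each line starts (with `lineStarts[0] === 0`); the actual representation and signature in src/location.ts may differ, and `findPositionSketch` and `PositionSketch` are hypothetical names.

```typescript
interface PositionSketch {
  line: number; // 1-based
  column: number; // 0-based
}

// Finds the last line start that is <= offset in O(log L) steps.
// Assumes lineStarts is sorted ascending, non-empty, and begins with 0.
function findPositionSketch(lineStarts: number[], offset: number): PositionSketch {
  let lo = 0;
  let hi = lineStarts.length - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >>> 1; // round up so the loop always makes progress
    if ((lineStarts[mid] as number) <= offset) {
      lo = mid;
    } else {
      hi = mid - 1;
    }
  }
  return { line: lo + 1, column: offset - (lineStarts[lo] as number) };
}

// Usage: line starts for the input "ab\ncd\n\ne" are 0, 3, 6 and 7.
const exampleLineStarts = [0, 3, 6, 7];
const examplePosition = findPositionSketch(exampleLineStarts, 4); // the "d" on line 2
```

Because the array is already sorted, each lookup needs only a logarithmic number of comparisons, where a linear `findIndex` may visit every line.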

549 unit tests + 855 TOML spec compliance tests pass.
Parse improvement vs smol-toml:
  spec example: 4.7x -> 3.4x
  inline-arrays: 12.8x -> 5.3x
  inline-tables: 5.4x -> 4.3x

The generic traverse visits every AST node (Key, String, Integer, etc.)
even though toJS only cares about Table, TableArray, and KeyValue.
The specialized walk iterates only the relevant nodes and skips Comments
and all Value sub-nodes entirely, eliminating traverseNode dispatch,
visitor lookup, and type-check overhead for ~60% of nodes.
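
The idea of a specialized walk can be sketched as follows. The node shapes here are hypothetical and heavily simplified (values are already plain JS values, TableArray is left out, and no key validation is done), so this is not the AST or the walk from src/to-js.ts. It only shows the structural point: a direct loop that handles Table and KeyValue, skips Comment, and never descends into value sub-nodes.

```typescript
// Hypothetical, simplified node shapes for illustration only.
interface KeyValueSketch {
  type: "KeyValue";
  key: string[];
  value: unknown;
}
interface CommentSketch {
  type: "Comment";
  raw: string;
}
interface TableSketch {
  type: "Table";
  key: string[];
  items: Array<KeyValueSketch | CommentSketch>;
}
type TopLevelSketch = TableSketch | KeyValueSketch | CommentSketch;
type PlainObject = Record<string, unknown>;

// Walks (and creates, if needed) nested objects along a dotted key path.
function ensurePathSketch(root: PlainObject, path: string[]): PlainObject {
  let current = root;
  for (const part of path) {
    let next = current[part];
    if (typeof next !== "object" || next === null) {
      next = {};
      current[part] = next;
    }
    current = next as PlainObject;
  }
  return current;
}

function assignSketch(target: PlainObject, kv: KeyValueSketch): void {
  const parent = ensurePathSketch(target, kv.key.slice(0, -1));
  parent[kv.key[kv.key.length - 1] as string] = kv.value;
}

// Specialized walk: a plain loop with direct type checks, no visitor dispatch.
function toJsSketch(items: TopLevelSketch[]): PlainObject {
  const root: PlainObject = {};
  for (const item of items) {
    if (item.type === "KeyValue") {
      assignSketch(root, item);
    } else if (item.type === "Table") {
      const table = ensurePathSketch(root, item.key);
      for (const child of item.items) {
        if (child.type === "KeyValue") assignSketch(table, child);
        // Comments inside the table are skipped without any dispatch.
      }
    }
    // Top-level comments fall through and are ignored.
  }
  return root;
}
```

A generic visitor would enter every node, including each key and value sub-node, before finding that no handler applies to it. The loop above never reaches those nodes.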
…te findLines pass

Instead of scanning the entire input upfront to find newline positions
(createLocate → findLines), record newlines as the tokenizer encounters
them in its main loop and multiline string scanner. findPosition is
updated to handle the missing end-sentinel gracefully. This eliminates
one full O(n) pass through the input on every parse.
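
A minimal sketch of incremental newline recording, under stated assumptions: `LineIndexSketch` and `scanSketch` are hypothetical names, the scan loop is a stand-in for the real tokenizer (which also records newlines inside its multiline string scanner), and the index stores newline offsets without any end sentinel, so the lookup has to treat offsets after the last recorded newline as belonging to the final line.

```typescript
class LineIndexSketch {
  private readonly newlineOffsets: number[] = [];

  // Called by the scanner each time it steps over a "\n".
  recordNewline(offset: number): void {
    this.newlineOffsets.push(offset);
  }

  // Counts newlines strictly before `offset` with a binary search.
  // Works without an end sentinel: offsets past the last recorded newline
  // simply land on the final line.
  locate(offset: number): { line: number; column: number } {
    let lo = 0;
    let hi = this.newlineOffsets.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if ((this.newlineOffsets[mid] as number) < offset) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    const lineStart = lo === 0 ? 0 : (this.newlineOffsets[lo - 1] as number) + 1;
    return { line: lo + 1, column: offset - lineStart };
  }
}

// Stand-in for the tokenizer main loop: newlines are recorded as a side
// effect of the scan that has to happen anyway, so no separate pass is needed.
function scanSketch(input: string, index: LineIndexSketch): number {
  let nonNewlineChars = 0;
  for (let i = 0; i < input.length; i++) {
    if (input.charCodeAt(i) === 0x0a) {
      index.recordNewline(i);
    } else {
      nonNewlineChars++;
    }
  }
  return nonNewlineChars;
}
```

One consequence of this design is that positions can only be resolved for offsets the scanner has already passed, which is fine when locations are computed for tokens as they are produced.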

Copilot AI left a comment

Pull request overview

This PR introduces significant performance improvements to the TOML parser through algorithmic optimizations and micro-optimizations across multiple modules. The main focus is on reducing allocations, replacing regex with character code comparisons, and optimizing hot-path operations. A comprehensive profiling script is also added to help identify performance bottlenecks.

Changes:

  • Refactored tokenizer from cursor-based to index-based scanning with incremental line indexing
  • Optimized string parsing with single-pass unescaping and hex lookup tables
  • Replaced generic AST traversal with specialized inline walk for faster processing
  • Added micro-optimizations for bare key validation, array operations, and iterator completion

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/tokenizer.ts | Replaced cursor-based scanning with index-based approach, added character code constants, implemented incremental line indexing |
| src/to-js.ts | Removed traverse dependency, added specialized inline walk, optimized validateKey with pre-computed candidates |
| src/parse-toml.ts | Added isBareKeyCode function using charCode ranges, removed unnecessary cloneLocation calls, replaced merge with direct array push |
| src/parse-string.ts | Implemented single-pass unescapeBasicString with hex lookup table, removed regex-based approaches |
| src/location.ts | Replaced linear search with binary search in findPosition, optimized findLines with charCode comparisons |
| src/cursor.ts | Added frozen DONE sentinel object to avoid repeated allocations |
| src/utils.ts | Optimized has() function to use in operator instead of hasOwnProperty.call |
| benchmark/profile.mjs | Added comprehensive profiling script with V8 CPU profiling integration |
| package.json | Added "profile" npm script |
| .gitignore | Added *.cpuprofile to ignore profiling outputs |
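
For the bare key check mentioned for src/parse-toml.ts, a charCode-range test might look like the sketch below. It assumes bare keys are limited to ASCII letters, digits, `_` and `-`, as in TOML 1.0; the function names are hypothetical and the real `isBareKeyCode` may accept a different set.

```typescript
// True when `code` is in A-Z, a-z, 0-9, or is "_" or "-".
function isBareKeyCodeSketch(code: number): boolean {
  return (
    (code >= 0x61 && code <= 0x7a) || // a-z
    (code >= 0x41 && code <= 0x5a) || // A-Z
    (code >= 0x30 && code <= 0x39) || // 0-9
    code === 0x5f || // _
    code === 0x2d // -
  );
}

// Validates a whole key with number comparisons only, no regex.
function isBareKeySketch(key: string): boolean {
  if (key.length === 0) return false;
  for (let i = 0; i < key.length; i++) {
    if (!isBareKeyCodeSketch(key.charCodeAt(i))) return false;
  }
  return true;
}
```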

@DecimalTurn DecimalTurn marked this pull request as ready for review February 14, 2026 06:20
@DecimalTurn DecimalTurn merged commit 3187316 into latest Feb 14, 2026
2 checks passed
@DecimalTurn DecimalTurn changed the title Add profiling and improve performance Add profiling and improve parsing performance Feb 14, 2026