perf: bulk text block scanner bypasses fastparse per-line overhead #689
Open
He-Pin wants to merge 1 commit into databricks:master from
Conversation
Replace the per-line fastparse combinator loop in tripleBarStringBody with a custom bulk scanner that directly accesses the underlying String data. For a 600KB text block with ~8000 lines, this eliminates ~8000 intermediate String allocations and the Seq[String] + mkString join overhead.

Key changes:

- tripleBarStringBodyBulk: custom scanner using IndexedParserInput.data for zero-copy StringBuilder.append(CharSequence, start, end), instead of fastparse's repX combinator, which creates one String per line.
- Hybrid approach: the first line still uses fastparse for proper error messages; subsequent lines use the bulk scanner.
- constructString: skip string interning for strings >1024 chars (avoids expensive hashCode computation on 600KB strings), single-string fast path, pre-sized StringBuilder for multi-line blocks.
- Falls back to the original fastparse path for non-IndexedParserInput inputs.

JMH large_string_template: 2.251 → 1.762 ms/op (-21.7%)
Native large_string_template: ~37% faster

Upstream: explored in he-pin/sjsonnet jit branch
Motivation
Large text blocks (|||...|||) in Jsonnet are parsed line-by-line using fastparse combinators. For a 600KB text block with ~8000 lines (e.g., large_string_template.jsonnet), this creates ~8000 intermediate String objects via fastparse .! captures, accumulates them in a Seq[String], and then joins them with mkString. This overhead dominates parsing time for large text blocks.

Key Design Decision
Replace the per-line fastparse combinator loop with a custom bulk scanner that directly accesses the underlying String from IndexedParserInput.data. Instead of creating one String per line, we use a single StringBuilder with append(CharSequence, start, end) for zero-copy bulk appends. The first line is still parsed with fastparse to preserve error message quality for malformed input. A hybrid approach is used: subsequent lines go through the bulk scanner, which requires direct String access via IndexedParserInput.

Additional optimizations in constructString:

- Skip the mkString join when there is only one string segment.
- Skip string interning, and with it the hashCode computation, on 600KB strings that are unlikely to repeat.

Modification
Parser.scala:

- tripleBarStringBody: delegates to tripleBarStringBodyBulk after the first line.
- tripleBarStringBodyBulk (new): custom scanner using IndexedParserInput.data, with:
  - String.regionMatches for zero-allocation indent matching
  - StringBuilder.append(CharSequence, start, end) for zero-copy line extraction
- constructString: single-string fast path, pre-sized StringBuilder, interning threshold.

Benchmark Results
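The core scanning loop can be illustrated with a short JDK-API sketch. This is not the sjsonnet implementation — the method name `scanTextBlock` and its signature are hypothetical — but it shows the two techniques the change relies on: `String.regionMatches` to check each line's indent in place (no substring allocation), and `StringBuilder.append(CharSequence, start, end)` to copy the line payload straight out of the source string (no per-line String object):

```java
public class BulkScanSketch {
    // Hypothetical sketch: given the raw input and the indent established by
    // the first line, copy each subsequent indented line's payload into one
    // StringBuilder without allocating a per-line String.
    static String scanTextBlock(String data, int start, String indent) {
        StringBuilder sb = new StringBuilder(data.length() - start);
        int pos = start;
        while (pos < data.length()) {
            // Zero-allocation indent check: compares characters in place.
            if (!data.regionMatches(pos, indent, 0, indent.length())) break;
            int lineStart = pos + indent.length();
            int lineEnd = data.indexOf('\n', lineStart);
            if (lineEnd < 0) lineEnd = data.length();
            // Zero-copy bulk append: reads the source String directly.
            sb.append(data, lineStart, lineEnd);
            sb.append('\n');
            pos = lineEnd + 1;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String src = "  hello\n  world\nend\n";
        System.out.println(scanTextBlock(src, 0, "  ")); // prints "hello" and "world"
    }
}
```

In the real parser the scanner also has to report the position where the indented region ends so parsing can resume there; that bookkeeping is omitted here for brevity.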
JMH (JVM, Scala 3.3.7)
All 35 benchmarks checked, zero regressions.
Native (Scala Native, hyperfine --warmup 5 --runs 20)
Native improvement: -18% on large_string_template (17.3ms → 14.2ms)
The remaining gap vs jrsonnet is primarily:
Analysis
The optimization targets the parsing phase specifically. The 600KB text block benchmark spends significant time in per-line String allocation and Seq management. By replacing ~8000 individual string captures with a single StringBuilder bulk scan, we eliminate:

- ~8000 String object allocations (one per line)
- Seq[String] growth and management overhead
- the mkString join of ~8000 strings
- hashCode computation on the 600KB result string (interning skip)

The regionMatches and StringBuilder.append(CharSequence, start, end) APIs enable zero-copy processing where the source String data is read directly without intermediate allocations.
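The constructString fast paths can likewise be sketched against plain JDK APIs. The class and method names below are illustrative, and the 1024-character cutoff is the threshold stated in the PR description; the point is the shape of the logic — a single-segment path that skips the join entirely, a pre-sized builder for the multi-segment path, and an interning skip that avoids a full hashCode scan of a huge result:

```java
import java.util.List;

public class ConstructStringSketch {
    // Assumed cutoff from the PR text: strings longer than this skip interning.
    static final int INTERN_THRESHOLD = 1024;

    static String constructString(List<String> segments, int totalLength) {
        // Single-segment fast path: no join needed at all.
        if (segments.size() == 1) {
            String only = segments.get(0);
            return only.length() <= INTERN_THRESHOLD ? only.intern() : only;
        }
        // Pre-sized builder avoids internal array growth during the join.
        StringBuilder sb = new StringBuilder(totalLength);
        for (String s : segments) sb.append(s);
        String result = sb.toString();
        // Interning a huge string forces a full hashCode scan over it; skip
        // that for large results, which are unlikely to repeat anyway.
        return result.length() <= INTERN_THRESHOLD ? result.intern() : result;
    }

    public static void main(String[] args) {
        System.out.println(constructString(List.of("foo", "bar"), 6)); // prints "foobar"
    }
}
```

Interning small strings keeps repeated short literals deduplicated across the parse, while a 600KB text block pays only the one-time StringBuilder copy.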