perf: optimize parseInt with 4-digits-at-a-time parsing #897
Open
He-Pin wants to merge 1 commit into
Inspired by jsoniter-scala's SWAR digit parsing technique, process 4 decimal digits at a time instead of 1. This reduces the number of loop iterations and charAt calls by 4x for the common case.

The optimization validates all 4 digits before combining them, falling through to the scalar loop if any non-digit is found. Uses a Double accumulator throughout to handle numbers beyond Long range.

Benchmark results (parseInt):
- Before: jrsonnet 1.97x faster
- After: jrsonnet 1.27x faster (36% gap reduction)

Reference: plokhotnyuk/jsoniter-scala@df92601
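To illustrate the point about numbers beyond Long range, here is a small sketch comparing a Long accumulator with a Double accumulator on a digit string one larger than Long.MaxValue. The object and value names are hypothetical and are not taken from the PR.

```scala
object AccumulatorSketch {
  // One more than Long.MaxValue (9223372036854775807).
  val digits = "9223372036854775808"

  // Long accumulator: the final multiply-add overflows and wraps around silently.
  val asLong: Long = digits.foldLeft(0L)((acc, c) => acc * 10 + (c - '0'))

  // Double accumulator: stays in range, at the cost of rounding above 2^53.
  val asDouble: Double = digits.foldLeft(0.0)((acc, c) => acc * 10 + (c - '0'))

  def main(args: Array[String]): Unit = {
    println(asLong == Long.MinValue) // true: the Long result wrapped to a negative value
    println(asDouble > 0)            // true: the Double result keeps the correct sign and magnitude
  }
}
```

The trade-off is precision: a Double represents every integer exactly only up to 2^53, so very long digit strings are rounded rather than rejected.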
He-Pin added a commit to He-Pin/sjsonnet that referenced this pull request on Jun 6, 2026
Motivation:
The bulk numeric array parser (previous commit) used Double.parseDouble(data.substring()) for each element, creating a substring allocation and invoking the full double parser even for simple integers.

Modification:
- Parse simple integers directly using 4-digits-at-a-time technique (inspired by jsoniter-scala/PR databricks#897), avoiding substring allocation and Double.parseDouble overhead entirely
- Use Val.cachedNum for values 0-255, reusing pre-allocated instances instead of creating new Val.Num objects
- Float/exponent numbers still fall back to Double.parseDouble

Result:
Native A/B (member): 7.1ms → 5.9ms (-17.4%). Combined with bulk parser: member gap vs jrsonnet 1.97x → 1.42x. Also improves base64_byte_array: 11.3ms → 10.4ms (-7.9%).
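The fast path and fallback described in this commit message might look roughly like the sketch below. It is an illustration under assumptions, not the committed code: `Num` and `cachedNum` are hypothetical stand-ins modelled on the Val.Num and Val.cachedNum names mentioned above, `parseElement` is an invented helper, the digit loop is written one digit at a time for brevity, and the 15-digit cutoff is an added safeguard so the direct path stays exact.

```scala
object BulkNumSketch {
  // Hypothetical stand-in for sjsonnet's Val.Num; the real class differs.
  final class Num(val value: Double)

  // Hypothetical cache modelled on the Val.cachedNum described above: one instance per 0..255.
  private val cachedNum: Array[Num] = Array.tabulate(256)(i => new Num(i.toDouble))

  /** Parses data[start, end) as a number, avoiding substring for plain integers. */
  def parseElement(data: String, start: Int, end: Int): Num = {
    var i = start
    val negative = i < end && data.charAt(i) == '-'
    if (negative) i += 1
    val digitsStart = i
    var acc = 0.0
    var simple = i < end // an empty digit run is not a simple integer
    while (simple && i < end) {
      val d = data.charAt(i) - '0'
      if (d >= 0 && d <= 9) { acc = acc * 10.0 + d; i += 1 }
      else simple = false // '.', 'e', or anything else: not a plain integer
    }
    if (!simple || i - digitsStart > 15)
      // Floats, exponents, malformed input, and very long integers use the full parser.
      new Num(java.lang.Double.parseDouble(data.substring(start, end)))
    else if (!negative && acc <= 255.0)
      cachedNum(acc.toInt) // reuse a pre-allocated instance
    else
      new Num(if (negative) -acc else acc)
  }
}
```

For example, with `data = "[7,300,-12,1.5e2]"`, the range `(1, 2)` returns the cached instance for 7, `(3, 6)` and `(7, 10)` are parsed directly without a substring, and `(11, 16)` falls back to Double.parseDouble.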
Motivation
The std.parseInt function was 1.97x slower than jrsonnet (Rust implementation) on Scala Native. The main bottleneck was the per-character digit parsing loop.
Key Design Decision
Inspired by jsoniter-scala's SWAR digit parsing technique, process 4 decimal digits at a time instead of 1. This reduces the number of loop iterations and charAt calls by 4x for the common case.
Modification
Changed sjsonnet/src/sjsonnet/stdlib/StringModule.scala:
- Double accumulator throughout to handle numbers beyond Long range
Benchmark Results
Scala Native vs jrsonnet (hyperfine)
JMH (JVM)
Baseline JMH benchmarks are stable; the change benefits all platforms.
Analysis
The 4-digits-at-a-time approach reduces loop overhead by processing 4 digits per iteration instead of 1. For the benchmark input "-123949595" (9 digits), this means 2 four-digit iterations plus 1 scalar step for the trailing digit, instead of 9 single-digit iterations.
The optimization validates all 4 digits before combining them, falling through to the scalar loop if any non-digit is found. This ensures correct error reporting for invalid inputs.
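The approach analysed above can be sketched as follows. This is a minimal illustration assuming a plain base-10 input with an optional leading minus sign; `ParseIntSketch`, `parseDecimal`, `isDigit`, and the error messages are hypothetical and do not reproduce the code in StringModule.scala.

```scala
object ParseIntSketch {
  @inline private def isDigit(d: Int): Boolean = d >= 0 && d <= 9

  /** Parses an optionally negative base-10 integer into a Double, 4 digits per step. */
  def parseDecimal(s: String): Double = {
    val len = s.length
    val negative = len > 0 && s.charAt(0) == '-'
    var i = if (negative) 1 else 0
    if (i >= len) throw new NumberFormatException(s"no digits in: '$s'")

    var acc = 0.0
    var bulk = true
    // Fast path: consume 4 characters per iteration while all 4 are digits.
    while (bulk && i + 4 <= len) {
      val d0 = s.charAt(i) - '0'
      val d1 = s.charAt(i + 1) - '0'
      val d2 = s.charAt(i + 2) - '0'
      val d3 = s.charAt(i + 3) - '0'
      if (isDigit(d0) && isDigit(d1) && isDigit(d2) && isDigit(d3)) {
        // Validate first, then combine the 4 digits in one step.
        acc = acc * 10000.0 + (d0 * 1000 + d1 * 100 + d2 * 10 + d3)
        i += 4
      } else {
        bulk = false // leave i untouched so the scalar loop reports the bad character
      }
    }
    // Scalar loop: the trailing 1-3 digits, or the chunk that failed validation.
    while (i < len) {
      val d = s.charAt(i) - '0'
      if (!isDigit(d))
        throw new NumberFormatException(s"invalid digit '${s.charAt(i)}' at index $i in: '$s'")
      acc = acc * 10.0 + d
      i += 1
    }
    if (negative) -acc else acc
  }
}
```

Because a failed chunk is re-read by the scalar loop rather than partially consumed, the error always points at the first offending character, which is the property the paragraph above relies on for correct error reporting.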
References
- plokhotnyuk/jsoniter-scala@df92601
Result
- ./mill __.test
- ./mill __.reformat