
perf: optimize identity function composition#826

Merged
stephenamar-db merged 1 commit into databricks:master from He-Pin:perf/native-lazy-array-stack
May 11, 2026


@He-Pin He-Pin commented May 7, 2026

Motivation:

bench/resources/cpp_suite/bench.07.jsonnet builds a lazy array of identity-equivalent composed functions. Current master evaluates this deep chain of functions and lazy values with no special handling: it is slow on the JVM and still overflows the stack on Scala Native, even with --max-stack 100000.
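To see why such a chain is expensive without special handling, here is a toy Scala model (not sjsonnet's evaluator, and not the exact contents of bench.07). It assumes level 0 is an identity function and each level k is function(x) f(f(x)) over level k-1, with arguments passed lazily. Each level runs its own body once and the previous level twice, so a chain of depth k executes 2^(k+1) - 1 function bodies. The names NaiveChain, Fn, and level are hypothetical.

```scala
// Toy model only (not sjsonnet's evaluator): level 0 is identity and level k
// is function(x) f(f(x)) where f is level k-1. Arguments are by-name (lazy).
object NaiveChain {
  var bodiesRun = 0L // function bodies executed so far

  trait Fn { def apply(x: => Int): Int }

  // function(x) x
  val base: Fn = new Fn {
    def apply(x: => Int): Int = { bodiesRun += 1; x }
  }

  // function(x) f(f(x)): the inner call is passed to the outer one unevaluated
  def compose(f: Fn): Fn = new Fn {
    def apply(x: => Int): Int = { bodiesRun += 1; f(f(x)) }
  }

  // Build a chain of k self-compositions on top of the identity base
  def level(k: Int): Fn = (1 to k).foldLeft(base)((f, _) => compose(f))
}
```

Scala by-name parameters re-evaluate on every use, unlike Jsonnet's memoized lazy values, but each argument in this model is forced exactly once, so the counts are the same under call-by-need. The work doubles with every level, so skipping calls that are proven to be identity removes almost all of it.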

Key Design Decision

Recognize only statically safe identity-equivalent unary functions, and keep the runtime check conservative. Direct identity (function(x) x) and the exact non-tailstrict self-composition shape (function(x) g(g(...g(x)))) can be elided, the latter only after the captured g is proven to be effectively identity. Recursive composition cycles are treated as non-identity, so normal max-stack behavior is preserved.
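A minimal sketch of the static shape check, written against a hypothetical three-node AST (Id, Call, Fn1) rather than sjsonnet's real Expr classes. It tags function(x) x as direct identity and a non-tailstrict chain g(g(...g(x))) over a single captured name g as a self-composition candidate, and rejects everything else. ShapeTagger, IdentityTag, and SelfComposition are illustrative names only.

```scala
// Hypothetical minimal AST; sjsonnet's real Expr hierarchy is different.
sealed trait Expr
final case class Id(name: String) extends Expr
final case class Call(fn: Expr, arg: Expr, tailstrict: Boolean) extends Expr
final case class Fn1(param: String, body: Expr) extends Expr

sealed trait IdentityTag
case object NotIdentity extends IdentityTag
case object DirectIdentity extends IdentityTag
final case class SelfComposition(base: String) extends IdentityTag

object ShapeTagger {
  def tag(fn: Fn1): IdentityTag = fn.body match {
    // function(x) x
    case Id(p) if p == fn.param => DirectIdentity
    // function(x) g(...): walk the nested calls and require g(g(...g(x)))
    case Call(Id(g), _, false) if g != fn.param =>
      var cur: Expr = fn.body
      var ok = true
      var done = false
      while (!done) {
        cur match {
          case Call(Id(name), inner, false) if name == g => cur = inner
          case Id(p) if p == fn.param                     => done = true
          case _                                          => ok = false; done = true
        }
      }
      if (ok) SelfComposition(g) else NotIdentity
    // anything else, including tailstrict calls, is left alone
    case _ => NotIdentity
  }
}
```

The tag alone proves nothing about g: a SelfComposition tag only makes the function a candidate, and a runtime check must still confirm that g is effectively identity.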

Modification:

  • Tag identity/self-composition function shapes in StaticOptimizer.
  • Forward the tags from Evaluator into Val.Func instances.
  • Add an iterative cached isEffectivelyIdentity probe with explicit Unknown/Yes/No/InProgress byte states.
  • Short-circuit unary function application when the callee is effectively identity.
  • Preserve laziness, tailstrict forcing, non-function errors, non-identity composition, and recursive-cycle behavior with evaluator tests.
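The iterative probe can be sketched as follows, assuming a hypothetical FuncVal that records its static tag and, for self-compositions, the captured base function. It follows base links in a loop instead of recursing, marks visited nodes InProgress, treats reaching an InProgress node as a cycle (answer No), and caches the final answer on every node along the path. The byte constants mirror the Unknown/Yes/No/InProgress states named above; their values and the class layout are assumptions, not sjsonnet's Val.Func.

```scala
import scala.collection.mutable.ArrayBuffer

// Probe states, stored as bytes (assumed encoding).
object IdentityState {
  final val Unknown: Byte = 0
  final val Yes: Byte = 1
  final val No: Byte = 2
  final val InProgress: Byte = 3
}

// Hypothetical function value: `kind` is its static tag; for SelfComp,
// `base` is the captured g (assignable later so tests can build cycles).
final class FuncVal(val kind: Int) {
  var base: FuncVal = null
  var state: Byte = IdentityState.Unknown
}

object FuncVal {
  final val Other = 0
  final val Direct = 1
  final val SelfComp = 2
}

object IdentityProbe {
  import IdentityState._

  def isEffectivelyIdentity(start: FuncVal): Boolean = {
    val path = ArrayBuffer.empty[FuncVal] // SelfComp nodes visited on this probe
    var cur = start
    var result: Byte = Unknown
    while (result == Unknown) {
      if (cur eq null) result = No // missing base: be conservative
      else if (cur.state == Yes || cur.state == No) result = cur.state // cached
      else if (cur.state == InProgress) result = No // cycle: not identity
      else cur.kind match {
        case FuncVal.Direct =>
          cur.state = Yes; result = Yes
        case FuncVal.SelfComp =>
          cur.state = InProgress; path += cur; cur = cur.base
        case _ =>
          cur.state = No; result = No
      }
    }
    for (f <- path) f.state = result // cache the answer along the whole chain
    result == Yes
  }
}
```

Because the walk is a loop over base links, a very long composition chain costs heap for the path buffer rather than JVM or Scala Native stack frames.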

Benchmark Results:

JMH on JDK 21 via ./mill -j 1 bench.runRegressions bench/resources/cpp_suite/bench.07.jsonnet. Lower ms/op is better; ops/ms = 1 / (ms/op), so higher is better.

| Case | master ms/op | PR ms/op | master ops/ms | PR ops/ms | Delta |
|---|---|---|---|---|---|
| cpp_suite/bench.07.jsonnet | 2.447 | 0.039 | 0.409 | 25.64 | +6173% ops/ms (~62.7x faster) |
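The derived columns follow directly from the ms/op figures. A small Scala helper (hypothetical, for checking the arithmetic only) reproduces them: 1 / 2.447 ≈ 0.409 ops/ms, 1 / 0.039 ≈ 25.64 ops/ms, and 2.447 / 0.039 ≈ 62.74, i.e. about +6174%, which agrees with the table's +6173% within rounding.

```scala
// Hypothetical helpers for checking the benchmark table's derived columns.
object BenchMath {
  /** Throughput from latency: ops/ms = 1 / (ms/op). */
  def opsPerMs(msPerOp: Double): Double = 1.0 / msPerOp

  /** How many times faster `after` is than `before`, from ms/op figures. */
  def speedup(beforeMs: Double, afterMs: Double): Double = beforeMs / afterMs

  /** Relative change in ops/ms, in percent. */
  def deltaPercent(beforeMs: Double, afterMs: Double): Double =
    (speedup(beforeMs, afterMs) - 1.0) * 100.0
}
```

The same ratio applied to the hyperfine results gives 28.4 / 5.2 ≈ 5.46, the PR-vs-jrsonnet figure.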

Scala Native, measured with hyperfine (lower is better), comparing a locally built sjsonnet Scala Native binary against jrsonnet 0.5.0-pre98 built from source.

| Target | Result |
|---|---|
| master native | StackOverflowError with --max-stack 100000 |
| PR native | 5.2 ± 1.1 ms |
| jrsonnet (-s 1000000) | 28.4 ± 1.9 ms |
| PR vs jrsonnet | PR is ~5.46x faster |

Analysis:

This optimization intentionally makes no assumptions about arbitrary function equality. It relies only on static shape tags plus a runtime proof that the captured base function is effectively identity. Error and tailstrict semantics are preserved: constructing f2(error ...) remains lazy, calling it forces the original error, and tailstrict still forces the argument eagerly.
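The laziness argument can be illustrated with a minimal sketch, assuming a hypothetical memoizing Lazy cell and a Func1 that carries a precomputed identity flag (neither is sjsonnet's actual Val or Lazy API). When the callee is effectively identity, the application returns the argument's own lazy cell instead of allocating a new one, so an erroring argument is not evaluated until the result is forced. With tailstrict, the argument is forced before the short-circuit, so the error surfaces immediately.

```scala
// Hypothetical memoizing lazy cell: the thunk runs on first force and the
// result is cached; a thunk that throws is retried on the next force.
final class Lazy[A](thunk: () => A) {
  private var value: Option[A] = None
  def force(): A = value match {
    case Some(v) => v
    case None =>
      val v = thunk()
      value = Some(v)
      v
  }
}

// Hypothetical unary function value with a precomputed identity flag.
final class Func1(val isEffectivelyIdentity: Boolean, body: Lazy[Int] => Int) {
  def call(arg: Lazy[Int]): Int = body(arg)
}

object Application {
  var bodiesRun = 0 // how many real function bodies have executed

  def applyLazy(f: Func1, arg: Lazy[Int], tailstrict: Boolean): Lazy[Int] = {
    if (tailstrict) arg.force() // tailstrict: evaluate the argument eagerly
    if (f.isEffectivelyIdentity) arg // short-circuit: reuse the argument's cell
    else new Lazy(() => { bodiesRun += 1; f.call(arg) })
  }
}
```

Returning the same cell is what keeps the error identity intact: forcing the result raises exactly the exception the original argument would have raised.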

References:

Result:

  • ./mill --no-server -j 1 __.reformat && ./mill --no-server -j 1 __.test passed locally.
  • ./mill --no-server -j 1 'sjsonnet.native[3.3.7]'.nativeLink passed locally.
  • PR comments were reviewed; there are no review threads. The prior safety-update comment remains accurate after rebase.

@He-Pin He-Pin force-pushed the perf/native-lazy-array-stack branch from e3e480f to 19fb7ae May 7, 2026 20:50
@He-Pin He-Pin marked this pull request as draft May 7, 2026 20:52
@He-Pin He-Pin force-pushed the perf/native-lazy-array-stack branch from 19fb7ae to 26bedd2 May 8, 2026 05:17
@He-Pin He-Pin marked this pull request as ready for review May 8, 2026 05:17
@He-Pin He-Pin marked this pull request as draft May 8, 2026 06:08
@He-Pin He-Pin force-pushed the perf/native-lazy-array-stack branch from 26bedd2 to 53556cf May 8, 2026 06:09

He-Pin commented May 8, 2026

Safety update pushed in 53556cfd:

  • Added an IdentityChecking state so identity-composition probing detects recursive composition cycles and falls back to normal application instead of spinning in the optimization path.
  • Added self-recursive and object-mutual-recursive regression tests.

Validation:

  • ./mill -i sjsonnet.jvm[3.3.7].test.testOnly sjsonnet.EvaluatorTests
  • ./mill -i sjsonnet.native[3.3.7].test.testOnly sjsonnet.EvaluatorTests
  • ./mill -i bench.runRegressions bench/resources/cpp_suite/bench.07.jsonnet -> 0.039 ms/op
  • ./mill -i __.checkFormat

@He-Pin He-Pin marked this pull request as ready for review May 8, 2026 06:15
@He-Pin He-Pin force-pushed the perf/native-lazy-array-stack branch 3 times, most recently from 8390d28 to 515691b May 8, 2026 06:39
@He-Pin He-Pin marked this pull request as draft May 8, 2026 06:48
@He-Pin He-Pin force-pushed the perf/native-lazy-array-stack branch 4 times, most recently from 6d61e83 to 420a898 May 10, 2026 04:38
Motivation:
bench.07 builds a deep chain of function(x) f(f(x)) over identity functions. Scala Native overflows the stack on this case with --max-stack 100000, and the JVM path creates tens of thousands of lazy values and function calls.

Modification:
Add an apply1 fast path for unary identity functions and recognize the exact non-tailstrict function(x) f(f(x)) shape. The wrapper preserves laziness, keeps explicit tailstrict eager semantics, and checks identity-composition chains iteratively instead of recursively.

Result:
bench.07 now passes on Scala Native, reduces the JVM debug counters from lazy_created=32786/function_calls=65550 to lazy_created=19/function_calls=16, and reports 0.036 ms/op in the single-case JMH run.
@He-Pin He-Pin force-pushed the perf/native-lazy-array-stack branch from 420a898 to 50403de May 10, 2026 09:07

He-Pin commented May 10, 2026

Rebased onto latest master and refreshed local validation/benchmark data in the PR description. Master still overflows on the Scala Native bench.07 case; the rebased PR passes and remains strongly positive.

@He-Pin He-Pin marked this pull request as ready for review May 10, 2026 09:19
@stephenamar-db stephenamar-db merged commit 957cdf5 into databricks:master May 11, 2026
5 checks passed
@He-Pin He-Pin deleted the perf/native-lazy-array-stack branch May 11, 2026 06:44