Optimize splitPart scalar function to reduce allocation #17660

justahuman1 · 2026-02-07T20:52:19Z

Optimized the splitPart function to use direct string traversal instead of allocating a full String[] array, significantly reducing memory pressure in high-throughput query scenarios.

Performance improvements:

Space complexity: O(n) → O(1)
Time complexity: O(n) for both implementations
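To make the traversal idea concrete, here is a minimal sketch of a forward scan that returns one field without building a `String[]`. It is not the code from this PR: the class and method names are hypothetical, and the semantics (every delimiter occurrence ends a field, empty fields are kept, an empty delimiter never splits) are simplifying assumptions that may differ from how `splitPart` treats adjacent delimiters. The `"null"` string for out-of-range indexes follows the test cases in this PR.

```java
public class SplitPartSketch {
  /**
   * Hypothetical helper, not the PR's implementation: returns the field at a
   * non-negative index by walking the input with indexOf, so the only
   * allocation is the returned substring.
   */
  static String splitPartForward(String input, String delimiter, int index) {
    if (index < 0) {
      return "null"; // negative indexes are out of scope for this sketch
    }
    if (delimiter.isEmpty()) {
      // Assumption: an empty delimiter never splits, so only field 0 exists.
      return index == 0 ? input : "null";
    }
    int start = 0;
    // Skip `index` delimiters; `start` ends up at the first char of the wanted field.
    for (int field = 0; field < index; field++) {
      int next = input.indexOf(delimiter, start);
      if (next < 0) {
        return "null"; // the input has fewer than index + 1 fields
      }
      start = next + delimiter.length();
    }
    int end = input.indexOf(delimiter, start);
    return end < 0 ? input.substring(start) : input.substring(start, end);
  }

  public static void main(String[] args) {
    System.out.println(splitPartForward("org.apache.pinot.common.function", ".", 3));
  }
}
```

The scan stops as soon as the requested field has been delimited, so a small index on a long input touches only a prefix of the string.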

Added JMH benchmarks to demonstrate both the improvements and the regressions. The primary regression was in the backward (negative) index case, so I added splitPartNegativeIdxSingleCharDelim (b4272e2), which increases code complexity but makes the kernel much faster.
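One way a negative index can avoid walking the whole string is to scan backward from the end. The sketch below illustrates that idea for a single-character delimiter. Whether splitPartNegativeIdxSingleCharDelim works exactly this way is an assumption: the class and method names here are hypothetical, and empty fields are kept, which may not match the PR's handling of adjacent delimiters.

```java
public class NegativeIndexSketch {
  /**
   * Hypothetical helper, not the PR's implementation: returns the field
   * counted from the end (-1 is the last field) using lastIndexOf, so a
   * small negative index only touches the tail of the input.
   */
  static String splitPartFromEnd(String input, char delimiter, int index) {
    if (index >= 0) {
      throw new IllegalArgumentException("index must be negative");
    }
    int end = input.length(); // exclusive end of the current field
    // Step over one delimiter per field that sits after the wanted one.
    for (int current = -1; current > index; current--) {
      int prev = input.lastIndexOf(delimiter, end - 1);
      if (prev < 0) {
        return "null"; // the input has fewer than -index fields
      }
      end = prev;
    }
    // lastIndexOf returns -1 when no delimiter precedes the field,
    // which makes start + 1 == 0 for the first field.
    int start = input.lastIndexOf(delimiter, end - 1);
    return input.substring(start + 1, end);
  }

  public static void main(String[] args) {
    System.out.println(splitPartFromEnd("org.apache.pinot.common.function", '.', -1));
  }
}
```

A dedicated backward path like this duplicates logic from the forward scan, which is the maintenance cost weighed against the speedup.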

Addresses #17362

JMH Results

Yeah, the common cases are more than 10x faster 🚀. The original results showed a regression with negative_index, hence b4272e2. However, this code may be harder to maintain, so that is the trade-off. The results below are from all commits in the PR.

| Operation Type | New (ns/op) | Old (ns/op) | Speedup (Old/New) |
| --- | ---: | ---: | --- |
| small_index | 777.8 | 38,722.2 | ≈ 49.8× faster |
| large_index | 295,474.5 | 3,867,775.5 | ≈ 13.1× faster |
| negative_index | 13.2 | 38,635.4 | ≈ 2,927× faster |
| large_negative_index | 47,797.4 | 3,824,985.5 | ≈ 80× faster |
| adjacent_delimiters | 779.9 | 3,864.4 | ≈ 5× faster |
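The speedup column is the old score divided by the new score. As a quick way to recompute it, the sketch below takes the ns/op values from the raw JMH output in this PR, including the three scenarios that the summary table leaves out. The class and method names are hypothetical.

```java
public class SpeedupCheck {
  /** Speedup as defined in the table header: Old / New. */
  static double speedup(double oldNsPerOp, double newNsPerOp) {
    return oldNsPerOp / newNsPerOp;
  }

  public static void main(String[] args) {
    String[] scenarios = {
        "small_index", "large_index", "negative_index", "large_negative_index",
        "adjacent_delimiters", "many_fields", "multi_char_delim", "large_multi_char_delim"
    };
    // Scores copied from the testSplitPartNew and testSplitPartOld rows of the JMH output.
    double[] newNs = {777.797, 295474.514, 13.198, 47797.384, 779.944, 1935.942, 4281.167, 1160607.923};
    double[] oldNs = {38722.193, 3867775.470, 38635.403, 3824985.465, 3864.391, 3814644.277, 182955.787,
        1043425954.300};
    for (int i = 0; i < scenarios.length; i++) {
      System.out.printf("%-24s %.1fx%n", scenarios[i], speedup(oldNs[i], newNs[i]));
    }
  }
}
```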

Benchmark                                       (_scenario)  Mode  Cnt           Score          Error  Units
BenchmarkSplitPart.testSplitPartNew             small_index  avgt   10         777.797 ±       22.025  ns/op
BenchmarkSplitPart.testSplitPartNew             large_index  avgt   10      295474.514 ±     2876.945  ns/op
BenchmarkSplitPart.testSplitPartNew          negative_index  avgt   10          13.198 ±        2.553  ns/op
BenchmarkSplitPart.testSplitPartNew    large_negative_index  avgt   10       47797.384 ±      465.119  ns/op
BenchmarkSplitPart.testSplitPartNew     adjacent_delimiters  avgt   10         779.944 ±       14.194  ns/op
BenchmarkSplitPart.testSplitPartNew             many_fields  avgt   10        1935.942 ±       59.342  ns/op
BenchmarkSplitPart.testSplitPartNew        multi_char_delim  avgt   10        4281.167 ±      102.780  ns/op
BenchmarkSplitPart.testSplitPartNew  large_multi_char_delim  avgt   10     1160607.923 ±    20648.116  ns/op
BenchmarkSplitPart.testSplitPartOld             small_index  avgt   10       38722.193 ±     1463.000  ns/op
BenchmarkSplitPart.testSplitPartOld             large_index  avgt   10     3867775.470 ±   107083.008  ns/op
BenchmarkSplitPart.testSplitPartOld          negative_index  avgt   10       38635.403 ±     1325.020  ns/op
BenchmarkSplitPart.testSplitPartOld    large_negative_index  avgt   10     3824985.465 ±    89641.937  ns/op
BenchmarkSplitPart.testSplitPartOld     adjacent_delimiters  avgt   10        3864.391 ±      101.146  ns/op
BenchmarkSplitPart.testSplitPartOld             many_fields  avgt   10     3814644.277 ±   132924.021  ns/op
BenchmarkSplitPart.testSplitPartOld        multi_char_delim  avgt   10      182955.787 ±     3254.768  ns/op
BenchmarkSplitPart.testSplitPartOld  large_multi_char_delim  avgt   10  1043425954.300 ± 29614093.332  ns/op

Allocation via `-prof gc`

     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  48.000 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  48.001 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  48.002 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  48.007 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  56.036 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  56.043 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  6616.001 B/op

     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  304.000 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  3560.001 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  6616.001 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  760984.043 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  760984.044 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  760984.045 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  760984.046 B/op

justahuman1 · 2026-02-07T20:52:46Z

pinot-common/src/main/java/org/apache/pinot/common/function/scalar/StringFunctions.java

   * TODO: Revisit if index should be one-based (both Presto and Postgres use one-based index, which starts with 1)
-   * @param input
-   * @param delimiter
+   * @param input the input String to be split into parts.


Copied verbatim from the overloaded splitPart function below to keep the two in sync.

justahuman1 · 2026-02-07T20:54:00Z

pinot-common/src/test/java/org/apache/pinot/common/function/scalar/StringFunctionsTest.java

        {"org.apache.pinot.common.function", ".", 3, 3, "common", "null"},
        {"+++++", "+", 0, 100, "", ""},
        {"+++++", "+", 1, 100, "null", "null"},
+        {"+++++org++apache++", "", 1, 100, "null", "null"},


These are previously missing cases that I added to validate that the behavior is identical to the previous version.
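Beyond hand-picked rows, one way to check that two implementations agree is a small differential test that runs both over a grid of inputs and indexes. The sketch below is only an illustration of that technique. Both methods are hypothetical stand-ins rather than the old and new `splitPart`: the reference uses `String.split` with a quoted delimiter, both keep empty fields, and a non-empty delimiter is assumed.

```java
import java.util.regex.Pattern;

public class DifferentialCheck {
  /** Stand-in reference: allocates the full array, then picks one element. */
  static String reference(String input, String delimiter, int index) {
    String[] parts = input.split(Pattern.quote(delimiter), -1);
    return index >= 0 && index < parts.length ? parts[index] : "null";
  }

  /** Stand-in candidate: walks the input with indexOf and allocates only the result. */
  static String candidate(String input, String delimiter, int index) {
    if (index < 0) {
      return "null";
    }
    int start = 0;
    for (int field = 0; field < index; field++) {
      int next = input.indexOf(delimiter, start);
      if (next < 0) {
        return "null";
      }
      start = next + delimiter.length();
    }
    int end = input.indexOf(delimiter, start);
    return end < 0 ? input.substring(start) : input.substring(start, end);
  }

  /** Counts (input, delimiter, index) combinations where the two disagree. */
  static int countMismatches(String[] inputs, String[] delimiters, int maxIndex) {
    int mismatches = 0;
    for (String input : inputs) {
      for (String delimiter : delimiters) {
        for (int index = 0; index <= maxIndex; index++) {
          if (!reference(input, delimiter, index).equals(candidate(input, delimiter, index))) {
            mismatches++;
          }
        }
      }
    }
    return mismatches;
  }

  public static void main(String[] args) {
    String[] inputs = {"org.apache.pinot.common.function", "+++++", "", "a..b.", ".a"};
    String[] delimiters = {".", "+"};
    System.out.println("mismatches: " + countMismatches(inputs, delimiters, 7));
  }
}
```

Inputs with leading, trailing, and adjacent delimiters are worth including, since those are the cases where split-based and scan-based code tend to diverge.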

justahuman1 · 2026-02-07T20:54:47Z

pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkSplitPart.java

@@ -0,0 +1,160 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one


I can remove this if required. I added it for my own validation of the performance.

Updated the splitPart function to use direct string traversal instead of allocating a full String[] array, significantly reducing memory pressure in high-throughput query scenarios.

Performance improvements:
- Space complexity: O(n) → O(1)
- Time complexity: O(n) for both implementations

Added JMH benchmarks to demonstrate improvements and regressions. The primary regression is in the backward index case with a large index value. This is the uncommon case and may be worth the trade-off, since memory allocation in the common case is now significantly reduced.

justahuman1 force-pushed the split-part-malloc-fix branch from 33336f1 to 5a2b4f1 Compare February 7, 2026 20:53

justahuman1 mentioned this pull request Feb 7, 2026

Improve splitPart function #17362

Open

justahuman1 force-pushed the split-part-malloc-fix branch from 5a2b4f1 to 24644cf Compare February 7, 2026 21:38

add negative pass

b4272e2

justahuman1 changed the title ~~Update splitPart scalar function to reduce allocation~~ Optimize splitPart scalar function to reduce allocation Feb 10, 2026
