@justahuman1 justahuman1 commented Feb 7, 2026

Optimized the splitPart function to use direct string traversal instead of allocating a full String[] array, significantly reducing memory pressure in high-throughput query scenarios.

Performance improvements:

  • Space complexity: O(n) → O(1)
  • Time complexity: O(n) for both implementations
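To illustrate the idea of direct traversal, here is a minimal sketch, not the code from this PR. It assumes a zero-based, non-negative index, a non-empty delimiter, and the literal string "null" as the out-of-range result. The class name `SplitPartSketch` is hypothetical. Edge cases such as adjacent delimiters are treated naively here (each occurrence starts a new field) and may differ from the behavior the PR has to preserve.

```java
final class SplitPartSketch {
  private SplitPartSketch() {
  }

  /**
   * Returns the zero-based index-th field of input without building a String[].
   * Assumption for this sketch: out-of-range lookups return the literal "null".
   */
  static String splitPart(String input, String delimiter, int index) {
    if (index < 0 || delimiter.isEmpty()) {
      // Negative indexes and empty delimiters are outside the scope of this sketch.
      return "null";
    }
    int start = 0;
    // Skip over the first `index` fields by hopping from delimiter to delimiter.
    for (int field = 0; field < index; field++) {
      int next = input.indexOf(delimiter, start);
      if (next < 0) {
        return "null"; // fewer fields than requested
      }
      start = next + delimiter.length();
    }
    // The requested field ends at the next delimiter or at the end of the input.
    int end = input.indexOf(delimiter, start);
    return end < 0 ? input.substring(start) : input.substring(start, end);
  }
}
```

In this sketch the only allocation is the returned substring, and the scan stops as soon as the requested field is found, so a small index on a long input never touches the rest of the string.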

Added JMH benchmarks to demonstrate both improvements and regressions. The primary regression was in the backward (negative) index case, so I added splitPartNegativeIdxSingleCharDelim (b4272e2), which increases code complexity but makes the kernel much faster.
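For context on why a dedicated backward path can help, the sketch below shows one plausible way to resolve a negative index for a single-character delimiter by scanning from the end of the string. It is not the code from b4272e2. The name `splitPartFromEnd`, the convention that -1 means the last field, and the "null" out-of-range result are all assumptions made for illustration.

```java
final class SplitPartBackwardSketch {
  private SplitPartBackwardSketch() {
  }

  /**
   * Returns the field at a negative index (-1 is the last field) by scanning backward.
   * Assumption for this sketch: out-of-range lookups return the literal "null".
   */
  static String splitPartFromEnd(String input, char delimiter, int index) {
    if (index >= 0) {
      throw new IllegalArgumentException("index must be negative");
    }
    int end = input.length(); // exclusive end of the current field
    // Walk backward over the fields that sit to the right of the requested one.
    for (int current = -1; current > index; current--) {
      int prev = input.lastIndexOf(delimiter, end - 1);
      if (prev < 0) {
        return "null"; // fewer fields than requested
      }
      end = prev;
    }
    // The requested field starts right after the previous delimiter (or at offset 0).
    int prev = input.lastIndexOf(delimiter, end - 1);
    return input.substring(prev + 1, end);
  }
}
```

With this approach a lookup such as index -1 only inspects the tail of the input, rather than counting every field from the front first.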

Addresses #17362

JMH Results

Yeah, the common cases are more than 10x faster 🚀. The original results showed a regression for negative_index, hence b4272e2. However, that code may be harder to maintain, so that is the trade-off. The results below include all commits in the PR.

| Operation Type | New (ns/op) | Old (ns/op) | Speedup (Old/New) |
| --- | --- | --- | --- |
| small_index | 777.8 | 38,722.2 | ≈ 49.8× faster |
| large_index | 295,474.5 | 3,867,775.5 | ≈ 13.1× faster |
| negative_index | 13.2 | 38,635.4 | ≈ 2,929× faster |
| large_negative_index | 47,797.4 | 3,824,985.5 | ≈ 80× faster |
| adjacent_delimiters | 779.9 | 3,864.4 | ≈ 5× faster |

Benchmark                                       (_scenario)  Mode  Cnt           Score          Error  Units
BenchmarkSplitPart.testSplitPartNew             small_index  avgt   10         777.797 ±       22.025  ns/op
BenchmarkSplitPart.testSplitPartNew             large_index  avgt   10      295474.514 ±     2876.945  ns/op
BenchmarkSplitPart.testSplitPartNew          negative_index  avgt   10          13.198 ±        2.553  ns/op
BenchmarkSplitPart.testSplitPartNew    large_negative_index  avgt   10       47797.384 ±      465.119  ns/op
BenchmarkSplitPart.testSplitPartNew     adjacent_delimiters  avgt   10         779.944 ±       14.194  ns/op
BenchmarkSplitPart.testSplitPartNew             many_fields  avgt   10        1935.942 ±       59.342  ns/op
BenchmarkSplitPart.testSplitPartNew        multi_char_delim  avgt   10        4281.167 ±      102.780  ns/op
BenchmarkSplitPart.testSplitPartNew  large_multi_char_delim  avgt   10     1160607.923 ±    20648.116  ns/op
BenchmarkSplitPart.testSplitPartOld             small_index  avgt   10       38722.193 ±     1463.000  ns/op
BenchmarkSplitPart.testSplitPartOld             large_index  avgt   10     3867775.470 ±   107083.008  ns/op
BenchmarkSplitPart.testSplitPartOld          negative_index  avgt   10       38635.403 ±     1325.020  ns/op
BenchmarkSplitPart.testSplitPartOld    large_negative_index  avgt   10     3824985.465 ±    89641.937  ns/op
BenchmarkSplitPart.testSplitPartOld     adjacent_delimiters  avgt   10        3864.391 ±      101.146  ns/op
BenchmarkSplitPart.testSplitPartOld             many_fields  avgt   10     3814644.277 ±   132924.021  ns/op
BenchmarkSplitPart.testSplitPartOld        multi_char_delim  avgt   10      182955.787 ±     3254.768  ns/op
BenchmarkSplitPart.testSplitPartOld  large_multi_char_delim  avgt   10  1043425954.300 ± 29614093.332  ns/op

Allocation via -prof gc

     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  48.000 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  48.001 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  48.002 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  48.007 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  56.036 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  56.043 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartNew:gc.alloc.rate.norm":  6616.001 B/op

     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  304.000 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  3560.001 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  6616.001 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  760984.043 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  760984.044 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  760984.045 B/op
     "org.apache.pinot.perf.BenchmarkSplitPart.testSplitPartOld:gc.alloc.rate.norm":  760984.046 B/op

* TODO: Revisit if index should be one-based (both Presto and Postgres use one-based index, which starts with 1)
* @param input
* @param delimiter
* @param input the input String to be split into parts.

Copied verbatim from the overloaded splitPart function below to keep the two in sync.

@justahuman1 justahuman1 force-pushed the split-part-malloc-fix branch from 33336f1 to 5a2b4f1 Compare February 7, 2026 20:53
{"org.apache.pinot.common.function", ".", 3, 3, "common", "null"},
{"+++++", "+", 0, 100, "", ""},
{"+++++", "+", 1, 100, "null", "null"},
{"+++++org++apache++", "", 1, 100, "null", "null"},

These are previously missing cases that I added to validate that the behavior is identical to the previous version.

@@ -0,0 +1,160 @@
/**
* Licensed to the Apache Software Foundation (ASF) under one

I can remove this if required. I added it for my own validation of the performance.

Updated the splitPart function to use direct string traversal instead of
allocating a full String[] array, significantly reducing memory pressure
in high-throughput query scenarios.

Performance improvements:
  - Space complexity: O(n) → O(1)
  - Time complexity: O(n) for both implementations

Added JMH benchmarks to demonstrate improvements and regressions. The primary regression is in the backward index case with a large index value. This is the uncommon case and may be worth the trade-off, since memory allocation in the common case is now significantly reduced.
@justahuman1 justahuman1 force-pushed the split-part-malloc-fix branch from 5a2b4f1 to 24644cf Compare February 7, 2026 21:38
@justahuman1 justahuman1 changed the title Update splitPart scalar function to reduce allocation Optimize splitPart scalar function to reduce allocation Feb 10, 2026