
Use binary search partitioning in ReaderUtil#partitionByLeaf #15938

Open
gsmiller wants to merge 3 commits into apache:main from gsmiller:GH/leaf-partition-bsearch

Conversation

Contributor

@gsmiller gsmiller commented Apr 7, 2026

Description

This change modifies the "partition" step of ReaderUtil#partitionByLeaf to leverage binary search instead of a linear scan.

  • Current Implementation: Iterates a sorted list of docIDs, checking each against leaf/segment boundaries to partition the sorted docs into their corresponding segments.
  • Proposed Implementation: Iterate the leaves/segments and binary search the sorted docIDs to determine the partitions.
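The proposed approach can be sketched roughly as follows. This is a standalone illustration, not the PR's actual code: `LeafPartitionSketch`, the `leafStarts` array, and its trailing sentinel entry are assumptions standing in for Lucene's `LeafReaderContext#docBase` / `LeafReader#maxDoc` machinery.

```java
import java.util.Arrays;

// Standalone sketch (NOT the PR's code): leaf i covers the global docID
// range [leafStarts[i], leafStarts[i + 1]); the last entry of leafStarts is
// a sentinel equal to the index's total maxDoc.
public class LeafPartitionSketch {
  public static int[][] partitionByLeaf(int[] sortedDocIds, int[] leafStarts) {
    int numLeaves = leafStarts.length - 1;
    int[][] result = new int[numLeaves][];
    int from = 0;
    for (int leafIdx = 0; leafIdx < numLeaves && from < sortedDocIds.length; leafIdx++) {
      int leafEnd = leafStarts[leafIdx + 1];
      if (sortedDocIds[from] >= leafEnd) {
        result[leafIdx] = new int[0]; // no matching docs in this leaf
        continue;
      }
      // Binary search for the first index whose docID is >= leafEnd
      // (the exclusive upper bound of this leaf's slice).
      int to = Arrays.binarySearch(sortedDocIds, from, sortedDocIds.length, leafEnd);
      if (to < 0) {
        to = -to - 1; // not found: convert to insertion point
      }
      result[leafIdx] = Arrays.copyOfRange(sortedDocIds, from, to);
      from = to;
    }
    for (int leafIdx = 0; leafIdx < numLeaves; leafIdx++) {
      if (result[leafIdx] == null) {
        result[leafIdx] = new int[0]; // leaves past the last doc are empty
      }
    }
    return result;
  }
}
```

Each leaf costs one binary search over the remaining docIDs, so the work is O(L * log(D)) rather than the linear scan's O(D + L).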

I've included a JMH benchmark with this change that I used to compare the two approaches. Benchmark results are below. At a high level, the binary search approach outperforms the linear scan except when there's a very large number of index segments relative to the number of docs being partitioned. My sense is that the cases where the linear scan performs best are uncommon "in the wild," so I think the binary search approach generally makes sense. But open to feedback/thoughts of course!

Benchmarks

Some iterations and more details are in #15934 where I initially experimented with this, but the most concise benchmarks are detailed here.

Benchmark Hardware

I ran benchmarks on two AWS ec2 Amazon Linux hosts—one with x86 (m5.12xlarge) and one with ARM (m6g.4xlarge):

            x86 Linux                         ARM Linux
CPU         Intel Xeon Platinum 8175M        Neoverse-N1 (Graviton2)
Clock       2.5 GHz (base), 3.1 GHz (turbo)  ~2.5 GHz
L1d cache   32 KB                            64 KB
L2 cache    1 MB                             1 MB
Cores       48                               16

Summary Results

Ran benchmarks as: java -jar lucene/benchmark-jmh/build/benchmarks/lucene-benchmark-jmh-*.jar PartitionByLeafBenchmark

I fed the results to an AI tool to build the summary tables, but the raw output is also included below.

x86 Results

numDocIds numLeaves Linear (ops/ms) BS (ops/ms) Difference
100 5 4,352 8,429 BS +94% ✅
100 10 3,501 4,880 BS +39% ✅
100 20 2,682 2,589 ~tie
100 50 1,470 1,180 Linear +25% ❌
100 200 625 461 Linear +36% ❌
1,000 5 484 1,911 BS +295% ✅
1,000 10 479 1,922 BS +301% ✅
1,000 20 463 1,470 BS +217% ✅
1,000 50 416 735 BS +77% ✅
1,000 200 264 195 Linear +35% ❌
10,000 5 45 182 BS +304% ✅
10,000 10 46 175 BS +283% ✅
10,000 20 44 155 BS +252% ✅
10,000 50 47 193 BS +311% ✅
10,000 200 43 119 BS +176% ✅
100,000 5 6.1 10.9 BS +80% ✅
100,000 10 7.5 15.8 BS +111% ✅
100,000 20 8.0 17.2 BS +115% ✅
100,000 50 8.2 17.7 BS +116% ✅
100,000 200 8.0 14.1 BS +76% ✅

ARM Results

numDocIds numLeaves Linear (ops/ms) BS (ops/ms) Difference
100 5 5,285 6,608 BS +25% ✅
100 10 3,967 3,920 ~tie
100 20 2,331 1,948 Linear +20% ❌
100 50 1,470 840 Linear +75% ❌
100 200 580 363 Linear +60% ❌
1,000 5 703 1,798 BS +156% ✅
1,000 10 648 1,402 BS +116% ✅
1,000 20 615 1,068 BS +74% ✅
1,000 50 486 550 BS +13% ✅
1,000 200 265 140 Linear +89% ❌
10,000 5 69 273 BS +295% ✅
10,000 10 68 205 BS +201% ✅
10,000 20 67 196 BS +193% ✅
10,000 50 65 169 BS +160% ✅
10,000 200 58 74 BS +27% ✅
100,000 5 7.9 38.9 BS +392% ✅
100,000 10 7.5 36.6 BS +388% ✅
100,000 20 7.6 31.6 BS +316% ✅
100,000 50 7.2 23.9 BS +232% ✅
100,000 200 6.6 16.2 BS +145% ✅

Raw Benchmark Output

x86
Benchmark (numDocIds) (numLeaves) Mode Cnt Score Error Units
PartitionByLeafBenchmark.binarySearchPartition 100 5 thrpt 15 8429.348 ± 440.448 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 100 10 thrpt 15 4880.362 ± 54.478 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 100 20 thrpt 15 2589.496 ± 65.347 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 100 50 thrpt 15 1180.417 ± 93.940 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 100 200 thrpt 15 460.616 ± 22.819 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 1000 5 thrpt 15 1911.034 ± 23.741 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 1000 10 thrpt 15 1922.153 ± 77.887 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 1000 20 thrpt 15 1470.400 ± 32.570 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 1000 50 thrpt 15 734.610 ± 3.304 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 1000 200 thrpt 15 195.314 ± 1.438 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 10000 5 thrpt 15 182.224 ± 2.416 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 10000 10 thrpt 15 174.868 ± 1.510 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 10000 20 thrpt 15 155.064 ± 2.676 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 10000 50 thrpt 15 193.320 ± 1.270 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 10000 200 thrpt 15 119.443 ± 2.780 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 100000 5 thrpt 15 10.909 ± 0.058 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 100000 10 thrpt 15 15.837 ± 0.137 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 100000 20 thrpt 15 17.244 ± 0.160 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 100000 50 thrpt 15 17.748 ± 0.121 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 100000 200 thrpt 15 14.092 ± 0.460 ops/ms
PartitionByLeafBenchmark.linearPartition 100 5 thrpt 15 4351.953 ± 32.738 ops/ms
PartitionByLeafBenchmark.linearPartition 100 10 thrpt 15 3500.875 ± 134.317 ops/ms
PartitionByLeafBenchmark.linearPartition 100 20 thrpt 15 2681.611 ± 14.056 ops/ms
PartitionByLeafBenchmark.linearPartition 100 50 thrpt 15 1469.889 ± 283.971 ops/ms
PartitionByLeafBenchmark.linearPartition 100 200 thrpt 15 624.940 ± 33.866 ops/ms
PartitionByLeafBenchmark.linearPartition 1000 5 thrpt 15 484.472 ± 1.494 ops/ms
PartitionByLeafBenchmark.linearPartition 1000 10 thrpt 15 478.512 ± 5.470 ops/ms
PartitionByLeafBenchmark.linearPartition 1000 20 thrpt 15 463.223 ± 7.010 ops/ms
PartitionByLeafBenchmark.linearPartition 1000 50 thrpt 15 416.226 ± 9.632 ops/ms
PartitionByLeafBenchmark.linearPartition 1000 200 thrpt 15 264.385 ± 2.228 ops/ms
PartitionByLeafBenchmark.linearPartition 10000 5 thrpt 15 44.861 ± 0.470 ops/ms
PartitionByLeafBenchmark.linearPartition 10000 10 thrpt 15 45.709 ± 0.101 ops/ms
PartitionByLeafBenchmark.linearPartition 10000 20 thrpt 15 44.448 ± 0.156 ops/ms
PartitionByLeafBenchmark.linearPartition 10000 50 thrpt 15 47.107 ± 0.505 ops/ms
PartitionByLeafBenchmark.linearPartition 10000 200 thrpt 15 43.276 ± 0.152 ops/ms
PartitionByLeafBenchmark.linearPartition 100000 5 thrpt 15 6.059 ± 0.015 ops/ms
PartitionByLeafBenchmark.linearPartition 100000 10 thrpt 15 7.519 ± 0.169 ops/ms
PartitionByLeafBenchmark.linearPartition 100000 20 thrpt 15 8.019 ± 0.033 ops/ms
PartitionByLeafBenchmark.linearPartition 100000 50 thrpt 15 8.233 ± 0.045 ops/ms
PartitionByLeafBenchmark.linearPartition 100000 200 thrpt 15 8.048 ± 0.034 ops/ms

ARM
Benchmark (numDocIds) (numLeaves) Mode Cnt Score Error Units
PartitionByLeafBenchmark.binarySearchPartition 100 5 thrpt 15 6607.852 ± 248.856 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 100 10 thrpt 15 3920.374 ± 147.577 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 100 20 thrpt 15 1948.412 ± 3.964 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 100 50 thrpt 15 840.036 ± 24.553 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 100 200 thrpt 15 362.670 ± 5.729 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 1000 5 thrpt 15 1798.015 ± 26.241 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 1000 10 thrpt 15 1401.657 ± 25.145 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 1000 20 thrpt 15 1067.532 ± 38.887 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 1000 50 thrpt 15 550.367 ± 5.071 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 1000 200 thrpt 15 139.719 ± 1.729 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 10000 5 thrpt 15 273.486 ± 17.731 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 10000 10 thrpt 15 204.891 ± 1.491 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 10000 20 thrpt 15 195.903 ± 1.537 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 10000 50 thrpt 15 168.672 ± 8.833 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 10000 200 thrpt 15 74.241 ± 3.183 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 100000 5 thrpt 15 38.910 ± 0.332 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 100000 10 thrpt 15 36.608 ± 0.374 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 100000 20 thrpt 15 31.596 ± 0.274 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 100000 50 thrpt 15 23.895 ± 0.305 ops/ms
PartitionByLeafBenchmark.binarySearchPartition 100000 200 thrpt 15 16.244 ± 0.143 ops/ms
PartitionByLeafBenchmark.linearPartition 100 5 thrpt 15 5284.656 ± 31.317 ops/ms
PartitionByLeafBenchmark.linearPartition 100 10 thrpt 15 3966.757 ± 32.919 ops/ms
PartitionByLeafBenchmark.linearPartition 100 20 thrpt 15 2330.805 ± 15.104 ops/ms
PartitionByLeafBenchmark.linearPartition 100 50 thrpt 15 1469.664 ± 49.736 ops/ms
PartitionByLeafBenchmark.linearPartition 100 200 thrpt 15 579.968 ± 8.910 ops/ms
PartitionByLeafBenchmark.linearPartition 1000 5 thrpt 15 703.493 ± 4.639 ops/ms
PartitionByLeafBenchmark.linearPartition 1000 10 thrpt 15 648.284 ± 6.565 ops/ms
PartitionByLeafBenchmark.linearPartition 1000 20 thrpt 15 614.930 ± 6.369 ops/ms
PartitionByLeafBenchmark.linearPartition 1000 50 thrpt 15 485.980 ± 12.309 ops/ms
PartitionByLeafBenchmark.linearPartition 1000 200 thrpt 15 264.950 ± 10.029 ops/ms
PartitionByLeafBenchmark.linearPartition 10000 5 thrpt 15 69.465 ± 4.038 ops/ms
PartitionByLeafBenchmark.linearPartition 10000 10 thrpt 15 68.010 ± 0.186 ops/ms
PartitionByLeafBenchmark.linearPartition 10000 20 thrpt 15 67.008 ± 0.213 ops/ms
PartitionByLeafBenchmark.linearPartition 10000 50 thrpt 15 65.174 ± 3.478 ops/ms
PartitionByLeafBenchmark.linearPartition 10000 200 thrpt 15 58.380 ± 3.943 ops/ms
PartitionByLeafBenchmark.linearPartition 100000 5 thrpt 15 7.858 ± 0.032 ops/ms
PartitionByLeafBenchmark.linearPartition 100000 10 thrpt 15 7.525 ± 0.506 ops/ms
PartitionByLeafBenchmark.linearPartition 100000 20 thrpt 15 7.645 ± 0.029 ops/ms
PartitionByLeafBenchmark.linearPartition 100000 50 thrpt 15 7.245 ± 0.033 ops/ms
PartitionByLeafBenchmark.linearPartition 100000 200 thrpt 15 6.632 ± 0.005 ops/ms

Contributor

@jainankitk jainankitk left a comment


The existing logic was O(D + L) where D = docs and L = leaves. The new approach is O(L * log(D)). This explains why:

  • Binary search wins big when D >> L (the common case) — e.g., 100K docs / 5 leaves: +80-392%
  • Linear scan wins when L is large relative to D (uncommon) — e.g., 100 docs / 200 leaves: linear +36-60%

Given we know the number of matching documents and leaves upfront, I am wondering if it makes sense to keep existing logic for sparse cases?

Contributor Author

gsmiller commented Apr 8, 2026

Given we know the number of matching documents and leaves upfront, I am wondering if it makes sense to keep existing logic for sparse cases?

Great question! I did play with this a bit while benchmarking. I forked the logic based on whether or not there were more leaves than docs (essentially handling the outlier cases, where there are lots of leaves and very few docs, with the current linear scan). I ultimately shelved that idea for two reasons: (1) I questioned the trade-off of maintaining two separate implementations to handle outlier cases that I don't think are very likely, and (2) the tuning heuristic for when to use linear isn't as clean as checking which is larger, docs or leaves (it seemed to be hardware dependent). Also, the cases where the linear scan is more performant are already very fast (because there must be very few docs), so I wasn't sure it was worth further optimizing them with a forked implementation when the practical difference would likely be small. One thing I like about the binary search approach is that it makes the "slow cases" (i.e., lots of docs) much faster.
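For illustration, the shelved dispatch idea might look something like this. Everything here is hypothetical: `PartitionDispatchSketch` is not in the PR, and the naive "which is larger" comparison stands in for the tuned, hardware-dependent threshold the comment above says would actually be needed.

```java
// Hypothetical sketch of the shelved forked approach: dispatch on whether
// leaves outnumber docs. The PR deliberately does NOT do this; the naive
// comparison below is a placeholder for a hardware-dependent threshold.
public class PartitionDispatchSketch {
  public static String chooseStrategy(int numDocIds, int numLeaves) {
    // Linear scan only won in the benchmarks when there were many leaves
    // and few docs (e.g., 100 docs / 200 leaves).
    return numLeaves > numDocIds ? "linear" : "binarySearch";
  }
}
```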

But stepping back, I'm not opposed to having two implementations if there's support for it; we do that sort of thing in other places. I was trying to start simple though, and to avoid feedback in the other direction (e.g., "why are you overcomplicating this with two implementations?" :)).

Greg Miller added 3 commits April 9, 2026 07:14
@gsmiller gsmiller force-pushed the GH/leaf-partition-bsearch branch from f301e05 to d614658 Compare April 9, 2026 14:37
@mikemccand
Member

+1 to keep it simple (just binary search option). I think it's OK if the already fast cases get a bit slower, and the slow cases get sizably faster? We've made similar tradeoffs in the past for query execution optimizations.

Member

@mikemccand mikemccand left a comment


I love it -- it's also a nice code simplification since we no longer need the additional/duplicated special-case handling after the loop.

Thanks @gsmiller.

I also made a fun little JMH benchmark with CC (Claude Code) ... will try to open PR soon. @zihanx's partitionByLeaf PR (#15803) and your JMH benchmarking inspired me!

LeafReaderContext leaf = leaves.get(leafIdx);
int leafEnd = leaf.docBase + leaf.reader().maxDoc();
if (sortedDocIds[from] >= leafEnd) {
result[leafIdx] = EMPTY_INT_ARRAY;
Member


I wish we had per-line code coverage pulled out to the GitHub PR (here) so we could quickly confirm whether unit tests are covering this path... I think we do somewhere run tests with code coverage but it's a fully separate UI maybe?

if (to < 0) {
to = -to - 1;
}
int count = to - from;
Member


Maybe assert count > 0 since we think we're always handling/optimizing the empty case above?

3 participants