Add JMH benchmark comparing various sort algorithms specifically for sorting ScoreDoc[] of varying lengths#15950
Open
mikemccand wants to merge 28 commits into apache:main from
Conversation
…e many options for sorting a Lucene ScoreDoc[] by docid, often helpful for post-processing hits from a search. For example, if you will go and retrieve doc values or stored fields for each of these hits, you'll want to collate+sort by leaf so you can use a single iterator, always moving forwards by docid.
Model: Claude AI Opus 4.6 (claude-opus-4-6)
This commit captures the full prompt history for the iterative development
of ScoreDocSortBenchmark.java and jmh-table.py across this session.
The prior commit (65b3068) included the code but not the prompts.
Files changed:
- lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/ScoreDocSortBenchmark.java
(javadoc tweak + tidy reformatting)
- lucene/benchmark-jmh/jmh-table.py (created in prior commit)
== User prompt 1 ==
i'd like to compare performance of Lucene's numerous built-in sorters on the use case of sorting an array of Lucene's ScoreDoc by ascending docid. one simple way is to use java Arrays.sort with lambda key that compares the .doc of each. another is to instantiate IntroSorter and in the anonymous subclass sort by compare doc. write a standalone benchmark that creates a random array of "realistic"
== User prompt 2 ==
ScoreDoc and then tests many sort methods, measuring their latency
== User prompt 3 (after build required Java 25) ==
- always use Java 25, installed at /usr/lib/jvm/java-25-openjdk/
== User prompt 4 ==
could you write the necessary code/tool so i can pipe JMH output through this tool and it makes a simple interactive table with algo on each row, and columns for the size of each batch. and when i click column header it sorts whole table by that column. click again and it reverses that sort.
== User prompt 5 ==
what is the xdg-open tool
== User prompt 6 ==
is there some way to make the table more visual? shades from green to red for the background of each cell maybe?
== User prompt 7 ==
can JMH sort the results by fastest to slowest somehow? can you rename the methods using JDK sort functions (Arrays.sort) to jdkSuchAndSuch?
== User prompt 8 ==
is it possible to ask jmh to output all values it recorded, in the end
== User prompt 9 ==
yes -- specifically, when i click on a cell in the HTML table, draw a histogram under the table showing the distribution of values that led to that cell's avg (hmm is it average that JMH is reporting?)
== User prompt 10 ==
i'd like to improve efficiency of running this benchmark by using multiple jvms/threads concurrently so generate more data points per second of runtime. is that simple?
== User prompt 11 ==
hmm i see it has a threads option? but that caused exceptions when i tried it. do you need to make each benchy thread safe?
== User prompt 12 ==
sometimes the histogram has poor x axis labels, in one case 10 labels that all say 0.04. can you fix the x axis to always have enough precision? and switch to ns/op if appropriate?
== User prompt 13 ==
fix the html report to also state how JMH was run (forking, threads, iterations). when i click on a cell, in addition to showing histogram, to the right of the full table, show the method (in fixed width font) source code for that algorithm.
== User prompt 14 ==
modify the table so that each cell, under the numeric mean, renders a small sparkle histogram (with no axis labels) so human can visually consume any lurking monsters
== User prompt 15 ==
let's do the trivial comment, and a commit that includes all past prompts in our session here
== Summary of work done ==
1. Created ScoreDocSortBenchmark.java: JMH benchmark comparing 8 sort
implementations for sorting ScoreDoc[] by ascending doc ID:
- jdkSortLambda, jdkSortComparator (JDK Arrays.sort)
- arrayUtilIntroSort, arrayUtilTimSort (Lucene ArrayUtil wrappers)
- introSorterAnonymous, timSorterAnonymous, inPlaceMergeSorterAnonymous
(anonymous Lucene Sorter subclasses)
- jdkSortPrimitiveExtract (pack doc+index into long[], sort primitives)
2. Created jmh-table.py: Python script that parses JMH output (JSON or
plain text) and generates an interactive HTML report with:
- Heatmap table (green=fast, red=slow) with sortable columns
- Inline sparkline SVG histograms in each cell showing distribution
- Click-to-expand full histogram with stats (mean, median, stddev,
p5/p95, range) and adaptive axis precision/unit scaling
- Source code panel showing the clicked benchmark method
- JMH config banner (mode, forks, threads, warmup, measurement, JVM args)
- Auto-detection of JSON vs plain text input
Co-authored-by: Mike McCandless <mikemccand@apache.org>
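The primitive-extract idea listed above (jdkSortPrimitiveExtract: pack doc+index into long[], sort primitives) can be sketched roughly as follows. This is an illustrative standalone version with a stand-in ScoreDoc class and my own method names, not the benchmark's actual code; it assumes non-negative docids and array sizes below 2^31:

```java
import java.util.Arrays;

public class PackedSortSketch {
  // Stand-in for Lucene's ScoreDoc; only the doc field matters here.
  static final class ScoreDoc {
    final int doc;
    ScoreDoc(int doc) { this.doc = doc; }
  }

  // Pack (doc, originalIndex) into one long per hit: doc in the high 32
  // bits, index in the low 32 bits, so Arrays.sort(long[]) orders by doc
  // first and by original index for ties (i.e. the sort is stable).
  static ScoreDoc[] sortByDoc(ScoreDoc[] hits) {
    long[] packed = new long[hits.length];
    for (int i = 0; i < hits.length; i++) {
      packed[i] = ((long) hits[i].doc << 32) | i;
    }
    Arrays.sort(packed); // primitive dual-pivot quicksort, cache friendly
    ScoreDoc[] sorted = new ScoreDoc[hits.length];
    for (int i = 0; i < hits.length; i++) {
      sorted[i] = hits[(int) packed[i]]; // low 32 bits = original index
    }
    return sorted;
  }
}
```

The appeal is that the comparison-heavy phase touches only a flat long[], avoiding pointer chasing through ScoreDoc objects.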
- × was double-escaped by html.escape(); use Unicode \u00d7 instead
- config banner is now a proper <table> with one label+value per row
  instead of all on one line

Model: Claude AI Opus 4.6 (claude-opus-4-6)

== User prompt ==
the new section in the HTML report that shows JMH settings is rendering × visually, not as a single unicode multiplication character. something is escaping & maybe? also, make that table one config+value per line.

Co-authored-by: Mike McCandless <mikemccand@apache.org>
Clicking a cell now updates the browser URL to #method|size
(e.g. #introSorterAnonymous|1000). Opening that URL highlights the cell
and shows its histogram and source code. Uses history.replaceState to
avoid polluting browser history.

Model: Claude AI Opus 4.6 (claude-opus-4-6)

== User prompt ==
can you make each cell externally linkable? when i click on the cell, somehow the browser's location changes to reference that cell, so that when i copy that url and share it, others will also show that cell highlighted and its large histogram under?

Co-authored-by: Mike McCandless <mikemccand@apache.org>
…tions

Documents full command lines for building, running with JSON output, and
piping through jmh-table.py. Lists all interactive HTML report features
(heatmap, sparklines, click-to-histogram, source code panel, deep
linking, sortable columns, config banner).

Model: Claude AI Opus 4.6 (claude-opus-4-6)

== User prompt ==
enhance the javadoc on ScoreDocSortBenchmark to explain how to run everything (show the full commandline, including the python rendering tool), explain about clicking on cells, etc.

Co-authored-by: Mike McCandless <mikemccand@apache.org>
Config banner now shows:

- JVM identity (JDK version, VM name+version, path)
- JMH version
- Fork JVM args: user/@Fork annotation args (-Xmx, -XX:, etc.)
- Harness JVM args: JMH infrastructure args (--module-path, -Djdk.module.main)

Split is heuristic: args starting with --module-path or
-Djdk.module.main are classified as harness; everything else as
fork/benchmark args.

Model: Claude AI Opus 4.6 (claude-opus-4-6)

== User prompt ==
when i ran JMH i had provided other command-line options (to output detailed json). is that preserved anywhere in the JMH output? can you add that to top table (args i provided originally to the JVM, in addition to args JMH is adding when it forks JVMs). in the table, differentiate those two jvms.

Co-authored-by: Mike McCandless <mikemccand@apache.org>
Hash format is now #cellKey;sort=col,asc
(e.g. #introSorterAnonymous|1000;sort=2,desc). Both cell selection and
sort state are restored on page load from the hash. Clicking a column
header or a cell updates the hash to reflect current state.

Model: Claude AI Opus 4.6 (claude-opus-4-6)

== User prompt ==
does the deep link also preserve which column i am sorting by, and its direction

Co-authored-by: Mike McCandless <mikemccand@apache.org>
Fail with a usage message if no Java source path is provided, and fail
if no @Benchmark methods are found in the given file.

Model: Claude AI Opus 4.6 (claude-opus-4-6)

== User prompt ==
ahh please fix jmh-table.py to fail if i don't provide source

Co-authored-by: Mike McCandless <mikemccand@apache.org>
- Added sizes 50 and 500 to @Param
- Renamed jdkSortPrimitiveExtract -> jdkSortPrimitiveExtractLong
- Added jdkSortPrimitiveExtractAdaptive: packs (doc, index) into int[]
  when docBits + indexBits <= 32, else falls back to long[]. With
  maxDoc=5M (23 bits), sizes 10/50/100/500 (<=9 bits) use int[]; sizes
  1000/10000 fall back to long[].

Model: Claude AI Opus 4.6 (claude-opus-4-6)

== User prompt 1 ==
if the max_doc and size of ScoreDoc[] to sort is small enough, couldn't the primitive extract sort use int[] sometimes?

== User prompt 2 ==
add 50 and 500 sizes too

Co-authored-by: Mike McCandless <mikemccand@apache.org>
Shell script that runs gradlew :lucene:benchmark-jmh:assemble first,
then execs java with the module-path, passing all args through to JMH.
Prevents running stale bytecode.
Usage:
./lucene/benchmark-jmh/run-benchmark.sh ScoreDocSortBenchmark \
-rf json -rff results.json
Model: Claude AI Opus 4.6 (claude-opus-4-6)
== User prompt ==
is it somehow possible when i run the benchmark for it to check whether
i failed to compile? or maybe i'm not running the right gradle target
and it's bypassing that check? or we have a bug in our gradle config
that fails to check & compile the benchy source? oh the problem is,
i run the benchy directly with a java invocation. is there a better
way to run via gradle so it recompiles the benchy source if necessary?
Co-authored-by: Mike McCandless <mikemccand@apache.org>
Promotes run-benchmark.sh (compile + run) as the recommended approach,
keeps manual gradlew + java as an alternative.

Model: Claude AI Opus 4.6 (claude-opus-4-6)

== User prompt ==
also fix the docs in the java benchmark source to talk about this new script

Co-authored-by: Mike McCandless <mikemccand@apache.org>
Change @State(Scope.Benchmark) to @State(Scope.Thread) so each JMH
thread gets its own state instance (template and work arrays), enabling
safe concurrent execution with -t N.

User prompt: "i'd like to enable threads to run bench methods concurrently. can you make them threadsafe?"

Co-authored-by: Mike McCandless <mikemccand@apache.org>
Replace manual loop with Arrays.copyOf (delegates to System.arraycopy).

User prompt: "why not use Arrays.copy to make the shallow 'work' copy?"

Co-authored-by: Mike McCandless <mikemccand@apache.org>
User Prompt: read review.txt and make a plan for those improvements

Files changed:
- lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/ScoreDocSortBenchmark.java

Summary: Phase 1 of improvements from review.txt:
- Extracted MAX_DOC constant (5_000_000) to ensure consistency between setup and adaptive sort.
- Updated bitsNeeded comment from Javadoc to a simple code comment.
- Optimized setupInvocation to use System.arraycopy.

Co-authored-by: Mike McCandless <mikemccand@apache.org>
User Prompt: read review.txt and make a plan for those improvements

Files changed:
- lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/ScoreDocSortBenchmark.java

Summary: Phase 2 of improvements from review.txt:
- Added LSBRadixSorter contender (using Lucene's internal utility).
- Added manual 2-pass 16-bit radix sort for int[] packed values.
- Added Arrays.parallelSort contender.
- These additions help identify the fastest sorting approach across different array sizes.

Co-authored-by: Mike McCandless <mikemccand@apache.org>
User Prompt: read review.txt and make a plan for those improvements

Files changed:
- lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/ScoreDocSortBenchmark.java

Summary: Phase 3 of improvements from review.txt:
- Added distribution parameter (random, nearly_sorted, reversed) to test algorithm performance on different input patterns.
- Updated setupTrial to handle new distributions.
- Added documentation comments explaining Level.Invocation overhead, intentional inline lambda usage, and adaptive sort execution paths.

Co-authored-by: Mike McCandless <mikemccand@apache.org>
User Prompt: read review.txt and make a plan for those improvements

Files changed:
- lucene/benchmark-jmh/jmh-table.py

Summary: Phase 4 of improvements from review.txt (Visualization):
- Made inline sparklines monochrome (#6688aa) to avoid semantic confusion with heatmap colors.
- Added a 'Show relative (x fastest)' toggle to the HTML report.
- Implemented JS logic to switch cell values between absolute time and relative speedup.
- Updated URL hash handling to preserve the toggle state.

Co-authored-by: Mike McCandless <mikemccand@apache.org>
User Prompt: is there some way to validate that each sort is sorting correctly, once up front, without counting that validation in the benchy time for each method?

Files changed:
- lucene/benchmark-jmh/src/java/org/apache/lucene/benchmark/jmh/ScoreDocSortBenchmark.java

Summary: Refactored the benchmark to include a comprehensive validation
suite during the @Setup(Level.Trial) phase.
- Extracted sorting logic into private 'run' methods for each implementation.
- Implemented a 'verify' helper that checks for ascending order and data integrity (using IdentityHashMap to ensure all original ScoreDoc instances are preserved).
- All 12 sorting implementations are now validated once per trial (per parameter combination) before measurement starts.
- Throws IllegalStateException with detailed error context if any sort fails validation.

Co-authored-by: Mike McCandless <mikemccand@apache.org>
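The 'verify' helper described above can be sketched as follows. This is an illustrative standalone version with a stand-in ScoreDoc class, not the benchmark's actual code; the key idea is using IdentityHashMap so instances are matched by reference, catching a sort that clones or drops elements:

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class VerifySketch {
  // Stand-in for Lucene's ScoreDoc; only the doc field matters here.
  static final class ScoreDoc {
    final int doc;
    ScoreDoc(int doc) { this.doc = doc; }
  }

  // Checks that `sorted` is in ascending docid order and holds exactly
  // the original instances (no loss, no duplication), by identity.
  static void verify(ScoreDoc[] original, ScoreDoc[] sorted) {
    if (original.length != sorted.length) {
      throw new IllegalStateException("length changed");
    }
    for (int i = 1; i < sorted.length; i++) {
      if (sorted[i - 1].doc > sorted[i].doc) {
        throw new IllegalStateException("not ascending at index " + i);
      }
    }
    // Reference-equality map: each original instance must appear exactly once.
    Map<ScoreDoc, Boolean> seen = new IdentityHashMap<>();
    for (ScoreDoc sd : original) {
      seen.put(sd, Boolean.TRUE);
    }
    for (ScoreDoc sd : sorted) {
      if (seen.remove(sd) == null) {
        throw new IllegalStateException("unknown or duplicated instance");
      }
    }
  }
}
```

Running this once per trial in @Setup keeps the check entirely outside the measured region.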
User Prompt: let's fix the visualization. put back the different sizes as column headers, and make the selection of random, nearly sorted, and reversed a dropbox letting me pick which mode to study. also, the new runXXX methods broke the source code display panel -- fix it to show the runXXX method instead of the one-line delegation it now shows

Files changed:
- lucene/benchmark-jmh/jmh-table.py

Summary: Improved the JMH visualization script:
- Restored 'size' as the primary column dimension.
- Added a distribution dropdown (Distribution: random/nearly_sorted/reversed) to switch between different benchmark runs.
- Enhanced method extraction to include both the @Benchmark delegator and the corresponding runXXX implementation in the source panel.
- Refactored JS to dynamically update cell values, colors, and sparklines when switching distributions.
- Recalculated heatmap gradients per distribution-size group for better contrast.

Co-authored-by: Mike McCandless <mikemccand@apache.org>
User Prompt: is there some way to validate that each sort is sorting correctly, once up front, without counting that validation in the benchy time for each method? let's fix the visualization. put back the different sizes as column headers, and make the selection of random, nearly sorted, and reversed a dropbox letting me pick which mode to study. also, the new runXXX methods broke the source code display panel -- fix it to show the runXXX method instead of the one-line delegation it now shows

Files changed:
- lucene/benchmark-jmh/jmh-table.py

Summary: Fixed and improved the JMH results visualization:
- Resolved a Python SyntaxError (unterminated triple-quoted string) by correctly restoring missing JavaScript helper functions (pickDisplayUnit, smartPrecision, fmtVal, drawHistogram).
- Restored 'size' as the primary column dimension in the results table.
- Added a distribution selection dropdown to the UI, allowing users to switch the view between 'random', 'nearly_sorted', and 'reversed' data distributions dynamically.
- Refactored the JavaScript logic to update cell values (absolute vs relative), heatmap colors, and sparklines based on the selected distribution.
- Enhanced the source code extraction in the Python script to include both the @Benchmark delegator and the corresponding runXXX implementation method in the interactive source panel.
- Verified that URL hash state correctly preserves the selected distribution, sort column, and relative toggle.

Co-authored-by: Mike McCandless <mikemccand@apache.org>
… --skip flag for jmh-table.py

User prompts (Mike McCandless):
=============

Prompt 1: "i ran the jmh bench with recent changes to also add
distributions like nearly sorted and reversed. but why is radixsort2pass
having a ~77 usec floor even for the tiny sorts?

ScoreDocSortBenchmark.radixSort2Pass  random         10     avgt  50   77.581 ± 0.381  us/op
ScoreDocSortBenchmark.radixSort2Pass  random         50     avgt  50   77.182 ± 0.576  us/op
ScoreDocSortBenchmark.radixSort2Pass  random         100    avgt  50   77.405 ± 0.491  us/op
ScoreDocSortBenchmark.radixSort2Pass  random         500    avgt  50   78.457 ± 0.526  us/op
ScoreDocSortBenchmark.radixSort2Pass  random         1000   avgt  50    9.232 ± 0.120  us/op
ScoreDocSortBenchmark.radixSort2Pass  random         10000  avgt  50  298.583 ± 1.500  us/op
ScoreDocSortBenchmark.radixSort2Pass  nearly_sorted  10     avgt  50   77.323 ± 0.358  us/op
ScoreDocSortBenchmark.radixSort2Pass  nearly_sorted  50     avgt  50   77.309 ± 0.361  us/op
ScoreDocSortBenchmark.radixSort2Pass  nearly_sorted  100    avgt  50   77.299 ± 0.374  us/op
ScoreDocSortBenchmark.radixSort2Pass  nearly_sorted  500    avgt  50   78.108 ± 0.505  us/op
ScoreDocSortBenchmark.radixSort2Pass  nearly_sorted  1000   avgt  50    4.415 ± 0.041  us/op
ScoreDocSortBenchmark.radixSort2Pass  nearly_sorted  10000  avgt  50   49.229 ± 0.517  us/op
ScoreDocSortBenchmark.radixSort2Pass  reversed       10     avgt  50   76.901 ± 0.625  us/op
ScoreDocSortBenchmark.radixSort2Pass  reversed       50     avgt  50   77.004 ± 0.391  us/op
ScoreDocSortBenchmark.radixSort2Pass  reversed       100    avgt  50   77.178 ± 0.473  us/op
ScoreDocSortBenchmark.radixSort2Pass  reversed       500    avgt  50   77.887 ± 0.488  us/op
ScoreDocSortBenchmark.radixSort2Pass  reversed       1000   avgt  50    3.386 ± 0.029  us/op
ScoreDocSortBenchmark.radixSort2Pass  reversed       10000  avgt  50   60.995 ± 0.658  us/op"

AI analysis: The ~77us floor for sizes <= 500 is caused by allocating
int[65536] (a 256KB bucket array) every invocation. Sizes 10-500 take
the int path (docBits=23 + indexBits<=9 = 32 bits), paying the 256KB
alloc+zero cost. Size 1000 takes the long[] fallback (23+10=33 > 32
bits) which uses Arrays.sort -- no big bucket. AI proposed 4 fixes; user
chose option 1.

Prompt 2: "let's do 1"

Prompt 3: "how are the cell colors assigned? they are too green now, because we have standout super slow sort algos. can you change it to log scale maybe?"

Prompt 4: "fix the visualizer to take an argument of which algos to skip. i want to skip the crazy slow radix two pass."

Changes:
--------
1. ScoreDocSortBenchmark.java: Replaced the 2-pass 16-bit radix sort
   (int[65536] = 256KB bucket) with an up-to-4-pass 8-bit radix sort
   (int[256] = 1KB bucket). Dynamically computes the number of passes as
   ceil(totalBits/8). Ping-pongs between packed[] and workArray[] across
   passes.
2. jmh-table.py: Changed the heatmap color scale from linear to
   logarithmic (t = log(score/min) / log(max/min)) so outlier-slow
   algorithms don't compress everything else into green. Added a --skip
   argument for excluding algorithms by substring match
   (case-insensitive).
3. run-benchmark.sh: Moved --sun-misc-unsafe-memory-access=allow into
   -jvmArgs so it applies to forked JMH processes.

Co-authored-by: Mike McCandless <mikemccand@apache.org>
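The up-to-4-pass 8-bit radix sort described above can be sketched like this. It is an illustrative standalone version (method and variable names are mine, not the benchmark's): a stable LSB counting sort per 8-bit digit, ping-ponging between the input and a scratch array, with only an int[257] bucket array instead of int[65536]:

```java
import java.util.Arrays;

public class ByteRadixSketch {
  // LSB radix sort of non-negative packed ints whose values fit in
  // `totalBits` bits, processing 8 bits per pass.
  static void radixSort(int[] packed, int totalBits) {
    int passes = (totalBits + 7) / 8; // ceil(totalBits / 8)
    int[] src = packed;
    int[] dst = new int[packed.length]; // scratch for ping-ponging
    int[] counts = new int[257];        // tiny bucket array (vs int[65536])
    for (int pass = 0; pass < passes; pass++) {
      int shift = pass * 8;
      Arrays.fill(counts, 0);
      // Histogram of this pass's 8-bit digit, offset by one slot...
      for (int v : src) {
        counts[((v >>> shift) & 0xFF) + 1]++;
      }
      // ...so the prefix sum yields each bucket's starting offset.
      for (int i = 1; i < 257; i++) {
        counts[i] += counts[i - 1];
      }
      // Stable scatter into dst, then swap roles of src and dst.
      for (int v : src) {
        dst[counts[(v >>> shift) & 0xFF]++] = v;
      }
      int[] tmp = src;
      src = dst;
      dst = tmp;
    }
    // After an odd number of passes the result lives in the scratch array.
    if (src != packed) {
      System.arraycopy(src, 0, packed, 0, packed.length);
    }
  }
}
```

The per-invocation allocation cost now scales with the tiny bucket array rather than a fixed 256KB, which is exactly what removes the ~77us floor for small inputs.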
User prompt (Mike McCandless): "fix the benchy to allow running only one distribution (random) but keep the others in case i want to run again"

Changed @Param({"random", "nearly_sorted", "reversed"}) to
@Param({"random"}) with a comment noting the other values. Cuts the
benchmark matrix by 3x.

Co-authored-by: Mike McCandless <mikemccand@apache.org>
User prompt (Mike McCandless): "java.lang.IllegalStateException: jdkSortPrimitiveExtractAdaptive failed: doc mismatch at index 0. expected 9029 but got 4194350"

Root cause: for size=500, docBits=23 + indexBits=9 = 32 uses all 32
bits. Docs >= 2^22 (4,194,304) set bit 31 (the sign bit) when shifted
left by 9, producing negative packed ints. Arrays.sort uses signed
comparison, so these sort before positive values, corrupting the output.

Fix: changed the threshold from <= 32 to <= 31 in
runJdkSortPrimitiveExtractAdaptive. Size=500 now takes the long[]
fallback path. The radix sort methods are unaffected (they sort by
unsigned bit pattern).

Co-authored-by: Mike McCandless <mikemccand@apache.org>
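The overflow is easy to reproduce in isolation. In this sketch, pack is a hypothetical helper mirroring the packing described above (doc in the high bits, original index in the low indexBits bits), not the benchmark's actual method:

```java
public class SignBitDemo {
  // Hypothetical helper: pack (doc, index) into one 32-bit int.
  static int pack(int doc, int index, int indexBits) {
    return (doc << indexBits) | index;
  }

  public static void main(String[] args) {
    // With docBits=23 + indexBits=9 = 32, the doc from the bug report
    // (4,194,350 >= 2^22) pushes its top bit onto bit 31 after the
    // shift, so the packed value goes negative and signed Arrays.sort
    // orders it before every non-negative packed value.
    System.out.println(pack(4_194_350, 0, 9) < 0);
    // A doc below 2^22 stays non-negative and sorts correctly.
    System.out.println(pack(4_194_303, 0, 9) > 0);
  }
}
```

Capping the packed width at 31 bits (as the fix does) guarantees bit 31 stays clear, so signed and unsigned ordering coincide.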
User prompt (Mike McCandless): "i see, any chance we could contain that exception so the other methods can run"

Added a safeVerify() wrapper that catches IllegalStateException and
prints a warning to stderr. A single broken sort algorithm no longer
aborts the entire trial for that parameter set.

Co-authored-by: Mike McCandless <mikemccand@apache.org>
User Prompt: does the output results json tell you total time it took JMH to run all tests? if so, let's add that to the HTML report?

Files changed:
- lucene/benchmark-jmh/jmh-table.py

Summary: Updated the visualization script to calculate and display the
total benchmark execution time.
- Modified parse_jmh_json to estimate total time by summing (forks * (warmup + measurement) * time_per_iteration) across all benchmark configurations.
- Added logic to format the total time into seconds, minutes, or hours for readability.
- Updated build_html to include a 'Total time (approx)' row in the configuration banner at the top of the report.
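The estimate described above is simple arithmetic. As a hedged sketch (names are mine, not jmh-table.py's, and written in Java rather than Python for consistency with the other examples here), assuming every configuration shares the same fork and iteration settings:

```java
public class TotalTimeEstimate {
  // Rough wall-clock estimate: each of `configs` benchmark configurations
  // runs `forks` forks, and each fork executes all warmup plus
  // measurement iterations of a fixed duration. The real script instead
  // sums the per-configuration values it reads from the JMH JSON, which
  // also handles configurations with differing settings.
  static long estimateSeconds(
      int configs, int forks, int warmupIters, int measureIters, int secondsPerIteration) {
    return (long) configs * forks * (warmupIters + measureIters) * secondsPerIteration;
  }
}
```

For example, 12 configurations with 2 forks of 5+5 one-second iterations come out to 240 seconds of estimated runtime.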
Contributor:
Oh fun! I took a look through the jmh benchmark itself and left a couple questions but nothing major. I haven't looked at the other two scripts yet. This is cool though!
[Spinoff from @zihanx's recent PR (#15803) adding a ReaderUtil.partitionByLeaf helper to collate any ScoreDoc[] into per-leaf sorted groups.]

TL;DR: see this cool benchmark results UI testing various algorithms to sort ScoreDoc[] (smaller is better; the numbers are microseconds per sort call).

The hits from a Lucene query (ScoreDoc[]) come out sorted by something important to the user (e.g. relevance), but in order to retrieve stored fields or doc values, you instead need to partition by leaf and sort by docid within each leaf, retrieve your values via Lucene's usual iterators, and then somehow invert that step so you can stick/associate the retrieved values with the original ScoreDoc[] (sorted for the user).

We currently use Arrays.sort(int[]) after extracting just the docid from each hit, but that loses association with the original ScoreDoc[]. #15905 is about NOT losing that association by also keeping track of the int ordinal position (in the original array). And then @gsmiller made the cool JMH bench to find a faster partition step (after sorting) -- #15938.

This all made me curious about the absolute fastest way to sort these hits. So, I worked with Claude Code (and maybe Gemini for some prompts too) to create this JMH benchmark (ScoreDocSortBenchmark.java) plus a simple HTML UI to understand the results.

It tests 12 different approaches across array sizes from 10 to 10,000:

- Arrays.sort with lambdas, static comparators, and parallelSort
- Lucene's sorters: IntroSorter, TimSorter, InPlaceMergeSorter, and LSBRadixSorter
- primitive extraction: packing the docid and the original array index into int[] or long[] for cache-friendly primitive sorting

The benchmark does upfront validation during the @Setup phase to ensure every sort is actually sorting correctly and not losing/duplicating any ScoreDoc instances, all without polluting the actual JMH measurement time.

Claude and I also added a companion Python script (jmh-table.py) to convert the raw JMH JSON output into an interactive HTML heatmap.

I tried to preserve all local commits, and told the AI to make a local commit after each prompt+code iteration.
Here is a screenshot of the output:
Here's a hosted version that should work for you (click column headers to sort all rows!).
NOTES/Learnings

- Level.Invocation overhead: we have to do a shallow array copy before every invocation. It inflates the absolute numbers a bit for tiny arrays, but the overhead is consistent across all algorithms.
- … parallelSort on large (10K) arrays, but that's using multiple CPU cores. This, despite that they must make a first pass to encode to long[] and an ending pass to get back to ScoreDoc[].
- TimSorter (is this still used for Python's listobject sort?) was surprisingly slow.