Background
The 6 perf-optimization PRs currently open (#3494, #3496, #3500, #3504, #3506, #3510) report headline numbers (12x decode speedup, 7193% Binary.hashCode improvement, etc.) but cite JMH benchmarks that do not exist on master. Reviewers cannot reproduce the numbers without manually copying benchmark sources from elsewhere.
This issue tracks contributing the JMH benchmarks themselves so reviewers can reproduce, validate, and continue measuring across future changes.
Problems
1. parquet-benchmarks shaded jar is broken on master
A build of parquet-benchmarks from the current master produces a jar that is non-functional:
$ java -jar parquet-benchmarks/target/parquet-benchmarks.jar
Exception in thread "main" java.lang.RuntimeException: ERROR: Unable to find the resource: /META-INF/BenchmarkList
The parquet-benchmarks/pom.xml is missing two pieces of configuration:
- The maven-compiler-plugin lacks the annotationProcessorPaths / annotationProcessors configuration for jmh-generator-annprocess. As a result the JMH annotation processor never runs, and META-INF/BenchmarkList and META-INF/CompilerHints are never generated. (Workaround: pass -Dmaven.compiler.proc=full, but this is undiscoverable.)
- The maven-shade-plugin lacks AppendingTransformer entries for META-INF/BenchmarkList and META-INF/CompilerHints. Even if the resources were generated, shading would drop them.
2. No benchmarks for the optimizations under review
The 6 open perf PRs touch encode/decode paths in parquet-column and parquet-common (PlainValuesReader/Writer, Binary.hashCode, ByteStreamSplitValuesReader/Writer, BinaryPlainValuesReader). Master's parquet-benchmarks covers only file-level read/write, not these CPU-bound encoding paths.
Proposal
Land the following in a single PR against parquet-benchmarks:
- pom.xml fix: add the JMH annotation-processor configuration plus AppendingTransformer entries so the shaded jar is runnable.
- 11 new JMH benchmark files covering the encoding/decoding paths under optimization, plus supporting infrastructure:
- IntEncodingBenchmark — encode/decode with PLAIN, DELTA_BINARY_PACKED, BYTE_STREAM_SPLIT, RLE, and dictionary encodings, parameterized on value count and data distribution
- BinaryEncodingBenchmark — Binary write/read paths (PLAIN, dictionary), parameterized on length and cardinality
- ByteStreamSplitEncodingBenchmark, ByteStreamSplitDecodingBenchmark — BSS encode/decode for float/double/int/long
- FixedLenByteArrayEncodingBenchmark — FLBA encode/decode
- FileReadBenchmark, FileWriteBenchmark — CPU-focused file-level benchmarks (minimal I/O via temp files)
- RowGroupFlushBenchmark — flush-path benchmark
- ConcurrentReadWriteBenchmark — multi-threaded read/write throughput
- BlackHoleOutputFile — an OutputFile that discards bytes, used to isolate CPU work from I/O
- TestDataFactory — shared test-data generation utilities
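The I/O-isolation idea behind BlackHoleOutputFile can be sketched independently of Parquet's org.apache.parquet.io.OutputFile interface. The class below is hypothetical and illustrates only the technique: discard every byte while counting them, so a benchmark measures encoding CPU cost rather than disk throughput.

```java
import java.io.OutputStream;

// Hypothetical sketch of the "black hole" technique: an OutputStream that
// discards all bytes but tracks how many were written. The real
// BlackHoleOutputFile would wrap something like this in Parquet's
// OutputFile/PositionOutputStream API.
public class BlackHoleOutputStream extends OutputStream {
  private long bytesWritten = 0;

  @Override
  public void write(int b) {
    bytesWritten++; // discard the byte, count it
  }

  @Override
  public void write(byte[] buf, int off, int len) {
    bytesWritten += len; // bulk writes are counted without copying
  }

  public long bytesWritten() {
    return bytesWritten;
  }

  public static void main(String[] args) {
    BlackHoleOutputStream out = new BlackHoleOutputStream();
    out.write(new byte[4096], 0, 4096);
    out.write(7);
    System.out.println(out.bytesWritten()); // 4097
  }
}
```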
After this lands, each existing perf PR will be amended with a one-line "How to reproduce" snippet pointing at the relevant *Benchmark class.
Out of scope (deferred)
The existing ReadBenchmarks, WriteBenchmarks, and NestedNullWritingBenchmarks could be modernized (Hadoop-free LocalInputFile, parameterized over compression and writer version, JMH-idiomatic state setup). That is a separate concern and will be proposed in a follow-up PR.
Validation
With the proposed pom changes, the shaded jar contains a populated META-INF/BenchmarkList (87 benchmarks registered) and runs cleanly. As a sanity check, IntEncodingBenchmark.decodePlain reproduces the ~91M ops/s baseline cited in #3493/#3494 (master branch, JDK 21, JMH 1.37, 3 warmup + 5 measurement iterations).
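With the fixed jar, reviewers can list and run benchmarks with standard JMH command-line flags. The invocations below are illustrative (the benchmark name assumes the classes proposed above land; -l lists registered benchmarks, -wi/-i/-f set warmup iterations, measurement iterations, and fork count):

```shell
# List the benchmarks registered in META-INF/BenchmarkList
java -jar parquet-benchmarks/target/parquet-benchmarks.jar -l

# Run one benchmark with 3 warmup and 5 measurement iterations in a single fork
java -jar parquet-benchmarks/target/parquet-benchmarks.jar \
  'IntEncodingBenchmark.decodePlain' -wi 3 -i 5 -f 1
```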