Background
The 6 perf-optimization PRs currently open (#3494, #3496, #3500, #3504, #3506, #3510) report headline numbers (12x decode speedup, 7193% Binary.hashCode improvement, etc.) but cite JMH benchmarks that do not exist on master. Reviewers cannot reproduce the numbers without manually copying benchmark sources from elsewhere.
This issue tracks contributing the JMH benchmarks themselves so reviewers can reproduce, validate, and continue measuring across future changes.
Problems
1. parquet-benchmarks shaded jar is broken on master
A build of parquet-benchmarks from the current master produces a jar that is non-functional:
$ java -jar parquet-benchmarks/target/parquet-benchmarks.jar
Exception in thread "main" java.lang.RuntimeException: ERROR: Unable to find the resource: /META-INF/BenchmarkList
The parquet-benchmarks/pom.xml is missing two pieces of configuration:
- The maven-compiler-plugin lacks the annotationProcessorPaths / annotationProcessors configuration for jmh-generator-annprocess. As a result the JMH annotation processor never runs, and META-INF/BenchmarkList and META-INF/CompilerHints are never generated. (Workaround: pass -Dmaven.compiler.proc=full, but this is undiscoverable.)
- The maven-shade-plugin lacks AppendingTransformer entries for META-INF/BenchmarkList and META-INF/CompilerHints. Even if the resources were generated, shading would drop them.
2. No benchmarks for the optimizations under review
The 6 open perf PRs touch encode/decode paths in parquet-column and parquet-common (PlainValuesReader/Writer, Binary.hashCode, ByteStreamSplitValuesReader/Writer, BinaryPlainValuesReader). Master's parquet-benchmarks covers only file-level read/write, not these CPU-bound encoding paths.
Proposal
Land the following in a single PR against parquet-benchmarks:
- pom.xml fix: add the JMH annotation-processor configuration plus AppendingTransformer entries so the shaded jar is runnable.
- 11 new JMH benchmark files covering the encoding/decoding paths under optimization, plus supporting infrastructure:
- IntEncodingBenchmark — encode/decode with PLAIN, DELTA_BINARY_PACKED, BYTE_STREAM_SPLIT, RLE, and dictionary encodings, parameterized on value count and data distribution
- BinaryEncodingBenchmark — Binary write/read paths (PLAIN, dictionary), parameterized on length and cardinality
- ByteStreamSplitEncodingBenchmark, ByteStreamSplitDecodingBenchmark — BSS encode/decode for float/double/int/long
- FixedLenByteArrayEncodingBenchmark — FLBA encode/decode
- FileReadBenchmark, FileWriteBenchmark — CPU-focused file-level benchmarks (minimal I/O via temp files)
- RowGroupFlushBenchmark — flush-path benchmark
- ConcurrentReadWriteBenchmark — multi-threaded read/write throughput
- BlackHoleOutputFile — an OutputFile that discards bytes, used to isolate CPU work from I/O
- TestDataFactory — shared test-data generation utilities
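The I/O-isolation idea behind BlackHoleOutputFile can be sketched independently of Parquet's org.apache.parquet.io.OutputFile interface. The class below is hypothetical and illustrates only the technique: discard every byte while counting them, so a benchmark measures encoding CPU cost rather than disk throughput.

```java
import java.io.OutputStream;

// Hypothetical sketch of the "black hole" technique: an OutputStream that
// discards all bytes but tracks how many were written. The real
// BlackHoleOutputFile would wrap something like this in Parquet's
// OutputFile/PositionOutputStream API.
public class BlackHoleOutputStream extends OutputStream {
  private long bytesWritten = 0;

  @Override
  public void write(int b) {
    bytesWritten++; // discard the byte, count it
  }

  @Override
  public void write(byte[] buf, int off, int len) {
    bytesWritten += len; // bulk writes are counted without copying
  }

  public long bytesWritten() {
    return bytesWritten;
  }

  public static void main(String[] args) {
    BlackHoleOutputStream out = new BlackHoleOutputStream();
    out.write(new byte[4096], 0, 4096);
    out.write(7);
    System.out.println(out.bytesWritten()); // 4097
  }
}
```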
After this lands, each existing perf PR will be amended with a one-line "How to reproduce" snippet pointing at the relevant *Benchmark class.
Out of scope (deferred)
The existing ReadBenchmarks, WriteBenchmarks, and NestedNullWritingBenchmarks could be modernized (Hadoop-free LocalInputFile, parameterized over compression and writer version, JMH-idiomatic state setup). That is a separate concern and will be proposed in a follow-up PR.
Validation
With the proposed pom changes, the shaded jar contains a populated META-INF/BenchmarkList (87 benchmarks registered) and runs cleanly. As a sanity check, IntEncodingBenchmark.decodePlain reproduces the ~91M ops/s baseline cited in #3493/#3494 (master branch, JDK 21, JMH 1.37, 3 warmup + 5 measurement iterations).
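With the fixed jar, reviewers can list and run benchmarks with standard JMH command-line flags. The invocations below are illustrative (the benchmark name assumes the classes proposed above land; -l lists registered benchmarks, -wi/-i/-f set warmup iterations, measurement iterations, and fork count):

```shell
# List the benchmarks registered in META-INF/BenchmarkList
java -jar parquet-benchmarks/target/parquet-benchmarks.jar -l

# Run one benchmark with 3 warmup and 5 measurement iterations in a single fork
java -jar parquet-benchmarks/target/parquet-benchmarks.jar \
  'IntEncodingBenchmark.decodePlain' -wi 3 -i 5 -f 1
```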