ORC-2082: Support Parquet LZ4 in bench module #2521

dongjoon-hyun · 2026-02-07T00:58:29Z

What changes were proposed in this pull request?

This PR adds Parquet LZ4 support to the bench module.

Why are the changes needed?

So that Parquet with LZ4 can be benchmarked in the same way as the other codecs.

How was this patch tested?

Manually ran the following steps.

BUILD

$ cd java

$ mvn package -DskipTests -Pbenchmark

WRITE

$ java -jar core/target/orc-benchmarks-core-*-uber.jar generate data -d sales -c lz4 -f parquet
Processing sales [parquet]
[main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new compressor [.lz4]

FILE NAME

$ ls -alR data/generated/sales
total 13396024
drwxr-xr-x@ 4 dongjoon  staff         128 Feb  6 16:51 .
drwxr-xr-x@ 3 dongjoon  staff          96 Feb  6 14:50 ..
-rw-r--r--@ 1 dongjoon  staff  3768120878 Feb  6 16:53 parquet.lz4

READ

$ java -jar core/target/orc-benchmarks-core-*-uber.jar scan data -d sales -c lz4 -f parquet
...
[main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - block read in memory in 10 ms. row count = 374588
data/generated/sales/parquet.lz4 rows: 25000000 batches: 24415
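As a quick sanity check on the scan summary above, the reported batch count can be reproduced with ceiling division. This is only a sketch: `batches_for` is a hypothetical helper, and the batch size of 1024 rows is an assumption that is not stated in this PR; it simply happens to match the reported numbers.

```shell
# Hypothetical helper: number of batches needed to hold `rows` rows
# when each batch carries at most `batch_size` rows (ceiling division).
batches_for() {
  rows="$1"
  batch_size="$2"
  echo $(( (rows + batch_size - 1) / batch_size ))
}

# 25000000 rows with an ASSUMED batch size of 1024 rows.
batches_for 25000000 1024   # prints 24415
```

Under that assumption, 25000000 rows give 24414 full batches plus one partial batch, which agrees with `batches: 24415` in the output.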

PARQUET

$ parquet meta data/generated/sales/parquet.lz4 | head -n3

File path:  data/generated/sales/parquet.lz4
Created by: parquet-mr version 1.17.0 (build fac0c746532e133beb928a7f6a7e57b510b477a1)

$ parquet footer data/generated/sales/parquet.lz4 | grep -i LZ | sort | uniq
        "codec" : "LZ4_RAW",

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Opus 4.5 on Claude Code

ORC-2082: Support Parquet LZ4 in bench module

1c35b2c

github-actions bot added the JAVA label Feb 7, 2026

dongjoon-hyun closed this in 66375ec Feb 7, 2026
