@dongjoon-hyun dongjoon-hyun commented Feb 7, 2026

What changes were proposed in this pull request?

This PR aims to support the Parquet LZ4 codec in the benchmark module.

Why are the changes needed?

To make it possible to benchmark Parquet with LZ4, like the other codecs.

How was this patch tested?

Manually ran the following commands.

BUILD

$ cd java

$ mvn package -DskipTests -Pbenchmark

WRITE

$ java -jar core/target/orc-benchmarks-core-*-uber.jar generate data -d sales -c lz4 -f parquet
Processing sales [parquet]
[main] WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[main] INFO org.apache.hadoop.io.compress.CodecPool - Got brand-new compressor [.lz4]

FILE NAME

$ ls -alR data/generated/sales
total 13396024
drwxr-xr-x@ 4 dongjoon  staff         128 Feb  6 16:51 .
drwxr-xr-x@ 3 dongjoon  staff          96 Feb  6 14:50 ..
-rw-r--r--@ 1 dongjoon  staff  3768120878 Feb  6 16:53 parquet.lz4

READ

$ java -jar core/target/orc-benchmarks-core-*-uber.jar scan data -d sales -c lz4 -f parquet
...
[main] INFO org.apache.parquet.hadoop.InternalParquetRecordReader - block read in memory in 10 ms. row count = 374588
data/generated/sales/parquet.lz4 rows: 25000000 batches: 24415
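
The batch count in the scan output is consistent with reading 25,000,000 rows in batches of 1,024 rows. The batch size is an assumption here, since the log does not print it, and the `BatchCount` helper below is hypothetical; it only checks the ceiling division.

```java
final class BatchCount {
    // Ceiling division: number of batches needed to cover all rows.
    static long batches(long rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        long rows = 25_000_000L; // "rows" value from the scan output
        int batchSize = 1024;    // ASSUMPTION: batch size is not shown in the log
        System.out.println(batches(rows, batchSize)); // prints 24415
    }
}
```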

PARQUET

$ parquet meta data/generated/sales/parquet.lz4 | head -n3

File path:  data/generated/sales/parquet.lz4
Created by: parquet-mr version 1.17.0 (build fac0c746532e133beb928a7f6a7e57b510b477a1)

$ parquet footer data/generated/sales/parquet.lz4 | grep -i LZ | sort | uniq
        "codec" : "LZ4_RAW",
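
Note that the footer reports `LZ4_RAW` rather than `LZ4`. Parquet defines both codec names: `LZ4` is the deprecated Hadoop-framed variant, while `LZ4_RAW` is the plain LZ4 block format. A minimal sketch of how the `-c` values could be mapped to Parquet codec names is shown below. The class `ParquetCodecNames`, its method, and the set of accepted option values other than `lz4` are hypothetical and not taken from this patch; codec names are returned as plain strings so the sketch runs without Parquet on the classpath.

```java
import java.util.Locale;

final class ParquetCodecNames {
    // HYPOTHETICAL mapping from a bench "-c" value to a Parquet codec name.
    // Only the lz4 -> LZ4_RAW pairing is confirmed by the footer output above;
    // the other option names are assumptions for illustration.
    static String toParquetCodec(String cliName) {
        switch (cliName.toLowerCase(Locale.ROOT)) {
            case "none":
                return "UNCOMPRESSED";
            case "snappy":
                return "SNAPPY";
            case "gz":
                return "GZIP";
            case "zstd":
                return "ZSTD";
            case "lz4":
                // Use the raw block format, not the deprecated Hadoop-framed LZ4.
                return "LZ4_RAW";
            default:
                throw new IllegalArgumentException("Unknown codec: " + cliName);
        }
    }

    public static void main(String[] args) {
        System.out.println(toParquetCodec("lz4")); // prints LZ4_RAW
    }
}
```

With the real library, the returned string would typically be resolved to an enum constant with `CompressionCodecName.valueOf(...)` from `org.apache.parquet.hadoop.metadata`.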

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Opus 4.5 on Claude Code
