Add benchmarks for Parquet struct leaf-level projection pruning #21180

Open
friendlymatthew wants to merge 2 commits into apache:main from pydantic:friendlymatthew/parquet-struct-bench

Conversation

@friendlymatthew
Contributor

Rationale for this change

This PR adds benchmarks that measure the performance of projecting individual fields from struct columns in Parquet files. #20925 introduced leaf-level projection masking, so that select s['small_int'] on a struct with large string fields reads only the small integer leaf and skips the expensive string decoding entirely.

Three dataset shapes are covered, each with ~262K rows of 8 KB string payloads: a narrow struct (2 leaves), a wide struct (5 leaves), and a nested struct. Each shape benchmarks full-struct reads against single-field projections.
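
For readers unfamiliar with the idea, here is a rough, stdlib-only sketch of what a leaf-level projection mask amounts to. It is a toy model, not the code from #20925 or from these benchmarks: the `Field` enum, `leaf_paths`, and `leaf_mask` are hypothetical names invented for illustration, and the column names only mirror the example above. Leaves are numbered depth-first, and a leaf is kept only if the projected path is a prefix of that leaf's full path.

```rust
/// Toy schema node (hypothetical, for illustration only):
/// either a primitive leaf or a struct with named children.
#[derive(Debug)]
enum Field {
    Leaf(&'static str),
    Struct(&'static str, Vec<Field>),
}

impl Field {
    fn name(&self) -> &'static str {
        match self {
            Field::Leaf(n) => *n,
            Field::Struct(n, _) => *n,
        }
    }
}

/// Record the full path of every leaf under `field`, in depth-first order.
fn leaf_paths(
    field: &Field,
    prefix: &mut Vec<&'static str>,
    out: &mut Vec<Vec<&'static str>>,
) {
    prefix.push(field.name());
    match field {
        Field::Leaf(_) => out.push(prefix.clone()),
        Field::Struct(_, children) => {
            for child in children {
                leaf_paths(child, prefix, out);
            }
        }
    }
    prefix.pop();
}

/// Build one boolean per leaf: `true` means the leaf must be decoded.
/// A leaf is selected when `projection` is a prefix of its path, so
/// projecting a whole struct selects all of its leaves, and an empty
/// projection selects every leaf in the schema.
fn leaf_mask(schema: &[Field], projection: &[&str]) -> Vec<bool> {
    let mut paths = Vec::new();
    let mut prefix = Vec::new();
    for field in schema {
        leaf_paths(field, &mut prefix, &mut paths);
    }
    paths
        .iter()
        .map(|path| {
            path.len() >= projection.len()
                && path.iter().zip(projection.iter()).all(|(a, b)| a == b)
        })
        .collect()
}
```

With a schema of a top-level `id` leaf plus a struct `s` holding `small_int` and `large_string`, `leaf_mask(&schema, &["s", "small_int"])` yields `[false, true, false]`, while `leaf_mask(&schema, &["s"])` yields `[false, true, true]`. The single-field benchmarks correspond to the first case, where the large string leaf is never decoded; the full-struct reads correspond to the second.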

@github-actions github-actions bot added the core Core DataFusion crate label Mar 26, 2026
Contributor

@adriangb adriangb left a comment

Oh how I wish we could write SQL benchmarks easily... adding to #21165

Labels

core Core DataFusion crate

2 participants