Add benchmarks for Parquet struct leaf-level projection pruning by friendlymatthew · Pull Request #21180 · apache/datafusion

friendlymatthew · 2026-03-26T15:01:49Z

Rationale for this change

This PR adds benchmarks that measure the performance of projecting individual fields from struct columns in Parquet files. #20925 introduced leaf-level projection masking, so that a query such as select s['small_int'] on a struct with large string fields reads only the small integer leaf and skips the expensive string decoding entirely.

Three dataset shapes are covered, each with ~262K rows of 8 KB string payloads: a narrow struct (2 leaves), a wide struct (5 leaves), and a nested struct. Each shape benchmarks full-struct reads against single-field projections.
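To make the idea of leaf-level pruning concrete, here is a minimal, standard-library-only sketch of how a field path could map to the set of leaf columns that need decoding. It is a toy model, not the code from this PR or from #20925, and it does not use the DataFusion or parquet crate APIs. The `Field` type, the helper functions, and the schema names (`id`, `s`, `large_string`) are hypothetical; only `small_int` comes from the description above. It assumes leaves are numbered in depth-first order.

```rust
/// Toy schema node (hypothetical, not a DataFusion or parquet type):
/// either a primitive leaf or a struct with named children.
enum Field {
    Leaf(&'static str),
    Struct(&'static str, Vec<Field>),
}

impl Field {
    fn name(&self) -> &'static str {
        match self {
            Field::Leaf(n) | Field::Struct(n, _) => *n,
        }
    }
}

/// Collect the dotted path of every leaf in depth-first order.
/// A path's position in the returned Vec plays the role of its leaf index.
fn leaf_paths(fields: &[Field]) -> Vec<String> {
    fn walk(field: &Field, prefix: &str, out: &mut Vec<String>) {
        let path = if prefix.is_empty() {
            field.name().to_string()
        } else {
            format!("{}.{}", prefix, field.name())
        };
        match field {
            Field::Leaf(_) => out.push(path),
            Field::Struct(_, children) => {
                for child in children {
                    walk(child, &path, out);
                }
            }
        }
    }

    let mut out = Vec::new();
    for field in fields {
        walk(field, "", &mut out);
    }
    out
}

/// Indices of the leaves needed to answer a projection of `path`.
/// A leaf is needed if it is `path` itself or is nested underneath it.
fn leaves_for(fields: &[Field], path: &str) -> Vec<usize> {
    let nested_prefix = format!("{}.", path);
    let paths = leaf_paths(fields);
    paths
        .iter()
        .enumerate()
        .filter(|(_, p)| p.as_str() == path || p.starts_with(nested_prefix.as_str()))
        .map(|(index, _)| index)
        .collect()
}

/// Hypothetical "narrow struct" shape: one top-level column plus a
/// struct with a small integer leaf and a large string leaf.
fn example_schema() -> Vec<Field> {
    vec![
        Field::Leaf("id"),
        Field::Struct(
            "s",
            vec![Field::Leaf("small_int"), Field::Leaf("large_string")],
        ),
    ]
}
```

With this schema, `leaves_for(&example_schema(), "s.small_int")` selects only leaf 1, while projecting the whole struct with `"s"` selects leaves 1 and 2. The benchmarks compare these two kinds of read: in the first case the large string leaf never has to be decoded.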

adriangb

Oh how I wish we could write SQL benchmarks easily... adding to #21165

commit benchmarks

f2830a9

github-actions bot added the core Core DataFusion crate label Mar 26, 2026

adriangb approved these changes Mar 26, 2026


try 128kb string

41a8238

adriangb mentioned this pull request Mar 26, 2026

[EPIC] Benchmark improvements #21165

Open


friendlymatthew wants to merge 2 commits into apache:main from pydantic:friendlymatthew/parquet-struct-bench