Add struct pushdown query benchmark and projection pushdown tests #19962

adriangb · 2026-01-23T23:14:17Z

Summary

Extract benchmarks and sqllogictest cases from #19538 for easier review.

This PR includes:

- New benchmark: parquet_struct_query.rs - benchmarks SQL queries on struct columns in Parquet files
  - 524,288 rows across 8 row groups
  - 20 benchmark queries covering struct access, filtering, joins, and aggregations
  - Schema: id (Int32) and s (Struct with fields id (Int32) and value (Utf8))
- SQLLogicTest: projection_pushdown.slt - tests for the projection pushdown optimization
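
As a quick sanity check on the dataset shape listed above, the sketch below works out the rows per row group. It assumes the 524,288 rows are split evenly across the 8 row groups, which the description does not state. `rows_per_group` is a hypothetical helper written for illustration and is not part of the benchmark.

```rust
/// Total rows and row-group count quoted in the PR description.
const TOTAL_ROWS: usize = 524_288;
const ROW_GROUPS: usize = 8;

/// Hypothetical helper: rows per row group, assuming an even split.
/// Returns `None` if there are no row groups or the split is uneven.
fn rows_per_group(total_rows: usize, row_groups: usize) -> Option<usize> {
    if row_groups == 0 || total_rows % row_groups != 0 {
        return None;
    }
    Some(total_rows / row_groups)
}

fn main() {
    // Under the even-split assumption, 524_288 / 8 = 65_536 rows per group.
    match rows_per_group(TOTAL_ROWS, ROW_GROUPS) {
        Some(n) => println!("{n} rows per row group"),
        None => println!("rows do not divide evenly across row groups"),
    }
}
```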

Changes

- Added datafusion/core/benches/parquet_struct_query.rs
- Updated datafusion/core/Cargo.toml with benchmark entry
- Added datafusion/sqllogictest/test_files/projection_pushdown.slt

Test Plan

- Run benchmark: cargo bench --profile dev --bench parquet_struct_query
- All 20 benchmark queries execute successfully
- Parquet file generated with correct row count (524,288) and row groups (8)

🤖 Generated with Claude Code

Copilot

Pull request overview

This PR extracts benchmarks and sqllogictest cases from PR #19538 for easier review, focusing on testing struct field access projection pushdown optimization in DataFusion.

Changes:

- Added comprehensive benchmark suite for SQL queries on struct columns in Parquet files with 20 different query patterns
- Added 1000+ line SQLLogicTest file covering projection pushdown behavior with get_field expressions through various operators
- Updated Cargo.toml to register the new benchmark

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| datafusion/core/benches/parquet_struct_query.rs | New benchmark file testing struct field queries on Parquet data with various SQL patterns (filters, joins, aggregations, etc.) |
| datafusion/core/Cargo.toml | Added benchmark entry for parquet_struct_query with parquet feature requirement |
| datafusion/sqllogictest/test_files/projection_pushdown.slt | Comprehensive test suite for get_field projection pushdown through Filter, Sort, TopK, and multi-partition scenarios |

Extract benchmarks and sqllogictest cases from apache#19538 for easier review.

Includes a new benchmark for SQL queries on struct columns in Parquet files, covering struct access, filtering, joins, and aggregations with 524K rows and 8 row groups.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

alamb

Makes sense to me -- thank you @adriangb

alamb · 2026-01-24T01:16:29Z

datafusion/sqllogictest/test_files/projection_pushdown.slt

+logical_plan
+01)Projection: simple_struct.id, get_field(simple_struct.s, Utf8("value"))
+02)--TableScan: simple_struct projection=[id, s]
+physical_plan DataSourceExec: file_groups={1 group: [[WORKSPACE_ROOT/datafusion/sqllogictest/test_files/scratch/projection_pushdown/simple.parquet]]}, projection=[id, get_field(s@1, value) as simple_struct.s[value]], file_type=parquet


It is interesting that these expressions have already been pushed down to the datasource.

Yep, in some cases (no sort, no repartition, etc.) it already works, but only because all projections are pushed down.
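
To make the exchange above concrete, here is a toy model of the behavior being described: a projection that sits directly on top of a scan is folded into the scan, while one separated from the scan by another operator (a sort here) stays where it is. This is only a sketch. The `Plan` enum and `push_down` function are hypothetical and are not DataFusion's types or optimizer, and expressions are plain strings for brevity.

```rust
/// Toy plan nodes (hypothetical, not DataFusion's API).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Data source that evaluates `projection` itself.
    Scan { projection: Vec<String> },
    /// Any operator sitting between a projection and the scan.
    Sort { input: Box<Plan> },
    /// Projection evaluated above its input.
    Projection { exprs: Vec<String>, input: Box<Plan> },
}

/// Fold a projection into the scan only when the scan is its direct input.
fn push_down(plan: Plan) -> Plan {
    match plan {
        Plan::Projection { exprs, input } => match *input {
            // Direct child is a scan: the scan takes over the expressions.
            Plan::Scan { .. } => Plan::Scan { projection: exprs },
            // Anything else in between: leave the projection in place.
            other => Plan::Projection { exprs, input: Box::new(other) },
        },
        other => other,
    }
}

fn main() {
    let exprs = vec!["id".to_string(), "get_field(s, value)".to_string()];
    let scan = Plan::Scan { projection: vec!["id".to_string(), "s".to_string()] };

    // Projection directly over the scan: folded into the scan.
    let direct = Plan::Projection { exprs: exprs.clone(), input: Box::new(scan.clone()) };
    println!("{:?}", push_down(direct));

    // Projection over a sort: unchanged in this toy model.
    let sorted = Plan::Projection {
        exprs,
        input: Box::new(Plan::Sort { input: Box::new(scan) }),
    };
    println!("{:?}", push_down(sorted));
}
```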

alamb · 2026-01-24T01:20:03Z

datafusion/core/Cargo.toml

 [[bench]]
 harness = false
 name = "parquet_query_sql"
 required-features = ["parquet"]


Is there any reason not to just add the benchmarks to parquet_query_sql?

I could, but it’s kind of nice to be able to run them in isolation easily, at least for now while we’re developing just these. And in some sense the feature we’re working on needn’t be Parquet-specific (e.g. Vortex). We can always fold them in later.

adriangb · 2026-01-24T03:10:13Z

Thanks @alamb !

github-actions bot added the core (Core DataFusion crate) and sqllogictest (SQL Logic Tests (.slt)) labels Jan 23, 2026

adriangb requested review from alamb and Copilot January 23, 2026 23:15

Copilot started reviewing on behalf of adriangb January 23, 2026 23:15

Copilot AI reviewed Jan 23, 2026

adriangb changed the title ~~Add parquet struct query benchmark and projection pushdown tests~~ Add struct pushdown query benchmark and projection pushdown tests Jan 23, 2026

adriangb and others added 2 commits January 23, 2026 18:53

fmt

30b5888

adriangb force-pushed the add-tests-bench branch from 414b451 to 30b5888 Compare January 23, 2026 23:53

alamb approved these changes Jan 24, 2026

adriangb added this pull request to the merge queue Jan 24, 2026

Merged via the queue into apache:main with commit 23f5003 Jan 24, 2026
31 checks passed
