chore: Experiments with `native_datafusion` scan optimizations by comphead · Pull Request #3755 · apache/datafusion-comet

comphead · 2026-03-21T19:37:33Z

Which issue does this PR close?

Part of #3748

Rationale for this change

This PR addresses some low-hanging fruit in `native_datafusion` scans:

- Cache Parquet footer metadata across partitions to avoid unnecessary work, which is especially important for files with a huge schema.
- Optimize the O(n*m) `to_lowercase` calls made while handling schema case sensitivity. The call stack currently shows:

  22.57 MB       0.2%	1210640	 	alloc::str::_$LT$impl$u20$str$GT$::to_lowercase::h97626021e3e4d091
  22.57 MB       0.2%	1210560	 	 _$LT$comet..parquet..schema_adapter..SparkPhysicalExprAdapterFactory$u20$as$u20$datafusion_physical_expr_adapter..schema_rewriter..PhysicalExprAdapterFactory$GT$::create::hd8340ae81808f4b1
  22.57 MB       0.2%	1210560	 	  _$LT$datafusion_datasource_parquet..opener..ParquetOpener$u20$as$u20$datafusion_datasource..file_stream..FileOpener$GT$::open::_$u7b$$u7b$closure$u7d$$u7d$::h3da9f8bfc88e2ec1
  22.57 MB       0.2%	1210560	 	   _$LT$datafusion_datasource..file_stream..FileStream$u20$as$u20$futures_core..stream..Stream$GT$::poll_next::hdbfbe1789f8ca04e
  22.57 MB       0.2%	1210560	 	    _$LT$datafusion_physical_plan..coop..CooperativeStream$LT$T$GT$$u20$as$u20$futures_core..stream..Stream$GT$::poll_next::haf0d3c646b21dc34
  22.57 MB       0.2%	1210560	 	     _$LT$datafusion_physical_plan..stream..BatchSplitStream$u20$as$u20$futures_core..stream..Stream$GT$::poll_next::h776c49f166956374
  22.57 MB       0.2%	1210560	 	      comet::execution::jni_api::Java_org_apache_comet_Native_executePlan::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::hedc575c357701add
  22.57 MB       0.2%	1210560	 	       tokio::runtime::task::raw::poll::h9f995a0e9bae688f
  22.57 MB       0.2%	1210560	 	        tokio::runtime::scheduler::multi_thread::worker::Context::run_task::h14fd0b61f20c7a54
  22.57 MB       0.2%	1210560	 	         tokio::runtime::scheduler::multi_thread::worker::run::hbc8a3dbb6ce91c58
  22.57 MB       0.2%	1210560	 	          tokio::runtime::task::raw::poll::h578114713c014b13
  22.57 MB       0.2%	1210560	 	           std::sys::backtrace::__rust_begin_short_backtrace::h721a2d1d9a0ad1e9
  22.57 MB       0.2%	1210560	 	            core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::hcb79850325811dbc

This allocates about 1M times during the case-sensitivity transformation in the schema adapter, for only 20K rows with a deeply nested schema.
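To illustrate the first item (footer caching), here is a minimal sketch of a per-instance cache keyed by file path. It is not the code in this PR: `FooterMetadata`, `FooterCache`, and `get_or_load` are hypothetical names, and the real Parquet metadata type is replaced by a stand-in struct so the example needs only the standard library.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for parsed Parquet footer metadata.
#[derive(Debug)]
struct FooterMetadata {
    num_columns: usize,
}

/// Hypothetical cache owned by one scan instance (not a process-wide global).
/// Partitions that read the same file share one parsed footer through an Arc.
#[derive(Default)]
struct FooterCache {
    entries: Mutex<HashMap<String, Arc<FooterMetadata>>>,
}

impl FooterCache {
    /// Returns the cached footer for `path`, calling `load` only on a miss.
    /// The lock is held while loading, which keeps the sketch simple but
    /// serializes concurrent misses; a real implementation may want finer locking.
    fn get_or_load<F>(&self, path: &str, load: F) -> Arc<FooterMetadata>
    where
        F: FnOnce() -> FooterMetadata,
    {
        let mut entries = self.entries.lock().unwrap();
        if let Some(meta) = entries.get(path) {
            return Arc::clone(meta);
        }
        let meta = Arc::new(load());
        entries.insert(path.to_string(), Arc::clone(&meta));
        meta
    }
}
```

With a cache like this, the footer of a file with a very wide schema would be parsed once per scan instance rather than once per partition that touches the file.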

What changes are included in this PR?

How are these changes tested?

andygrove · 2026-03-22T16:44:43Z

native/core/src/parquet/schema_adapter.rs

+            // Pre-compute lowercased physical field names to avoid repeated
+            // to_lowercase() calls in the O(n*m) matching loop.


Is there any chance we could use `eq_ignore_ascii_case` to avoid allocating the lowercase strings?

That's a good point.
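For context, the two approaches discussed in this thread can be sketched side by side. This is an illustration under assumptions, not the schema adapter's actual code: both function names are hypothetical, and field names are plain `&str` slices rather than Arrow fields.

```rust
/// Approach described in the diff comment: lowercase the physical names once
/// up front, then compare against them inside the O(n*m) loop.
/// Allocates one String per physical field plus one per logical field.
fn match_with_precomputed_lowercase(logical: &[&str], physical: &[&str]) -> Vec<Option<usize>> {
    let physical_lower: Vec<String> = physical.iter().map(|p| p.to_lowercase()).collect();
    logical
        .iter()
        .map(|l| {
            let l_lower = l.to_lowercase();
            physical_lower.iter().position(|p| *p == l_lower)
        })
        .collect()
}

/// Reviewer's suggestion: compare in place with `str::eq_ignore_ascii_case`,
/// which does not allocate any intermediate strings.
fn match_with_eq_ignore_ascii_case(logical: &[&str], physical: &[&str]) -> Vec<Option<usize>> {
    logical
        .iter()
        .map(|&l| physical.iter().position(|&p| p.eq_ignore_ascii_case(l)))
        .collect()
}
```

One trade-off to keep in mind: `to_lowercase` is Unicode-aware, while `eq_ignore_ascii_case` only folds ASCII letters, so the two functions can disagree on field names that contain non-ASCII characters.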

comphead · 2026-03-22T18:00:27Z

Not sure why this test fails: `partitioned table is cached when partition pruning is true *** FAILED *** (1 second, 574 milliseconds)` 👀

comphead added 3 commits March 21, 2026 12:36

chore: Experiments with native_datafusion scan optimizations

ba4ebfd

fmt

c1f1718

fmt

2d96eda

comphead requested review from andygrove and mbutrovich March 22, 2026 04:23

fmt

052015a

andygrove reviewed Mar 22, 2026


eq_ignore_ascii_case

a119840

make cache instance specific

bec4fc6

comphead wants to merge 6 commits into apache:main from comphead:tests
