Adaptive filter scheduling in the Parquet decoder (replaces PR #9) #11
Conversation
run benchmark clickbench_partitioned

    baseline:
      ref: main
      env:
        DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: false
        DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: false
    changed:
      ref: HEAD
      env:
        DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
        DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true
Replaces PR #9's morsel-per-row-group split with an in-decoder strategy swap: one `ParquetPushDecoder` per file, one `BoxStream` per file, filter placement re-evaluated at every row-group boundary using the shared `SelectivityTracker`.

# What's removed (vs PR #9)

- The chunk loop (`ParquetAccessPlan::split_into_chunks`, `Vec<BoxStream>` returns from `build_stream`).
- Per-chunk `AsyncFileReader::create_reader` minting and per-chunk `RowFilter` rebuild.
- The `EarlyStoppingStream`-on-chunk-0-only special case for the non-`Clone` `FilePruner`.
- `LazyMorselShared` per-morsel Arc churn — the source of the ~10% aggregate ClickBench regression you flagged in PR #9 review.

# What's added

`AdaptiveParquetStream` (new in `opener.rs`) drives one row group at a time via `try_next_reader`:

1. Pull a `ParquetRecordBatchReader` for the next row group.
2. Iterate it synchronously; each batch goes through any post-scan filters (which feed per-filter stats into the tracker) and then through the projector.
3. When the reader exhausts, ask the tracker to re-partition filters based on accumulated stats. If the row-filter set changed, build a new `RowFilter` and call the new arrow-rs `ParquetPushDecoder::swap_strategy` before requesting the next reader. Post-scan filters update in lockstep.

`PushBuffers` carries through the swap so already-fetched bytes are preserved, and the optional-filter mid-stream skip mechanism (existing `OptionalFilterPhysicalExpr` + `tracker.is_filter_skipped`) keeps working unchanged inside `apply_post_scan_filters_with_stats`.

# Carried-over machinery (file-level checkout from `dbcf5ac1e`)

- `selectivity.rs` — `SelectivityTracker`, `PartitionedFilters`, `FilterId`, Welford CI bounds. Verbatim.
- `row_filter.rs` — new `build_row_filter` signature returning `(Option<RowFilter>, UnbuildableFilters)` plus `total_compressed_bytes`, plus `DatafusionArrowPredicate` stat hooks.
- `physical_expr.rs` — `OptionalFilterPhysicalExpr`, `snapshot_generation` helpers. `Display` is **pass-through** here (PR #9 used `Optional(...)`), keeping every existing sqllogictest expected output intact.
- `config.rs` — adds `filter_pushdown_min_bytes_per_sec` / `filter_collecting_byte_ratio_threshold` / `filter_confidence_z`. **`reorder_filters` is preserved as a deprecated no-op** (per request) — the adaptive tracker subsumes it.
- `selectivity_tracker.rs` bench — verbatim.
- Per-file plumbing in `source.rs`: `predicate_conjuncts: Vec<(FilterId, Arc<PhysicalExpr>)>` instead of a single AND-ed predicate, so per-conjunct stats accumulate across files.

# arrow-rs companion branch

Depends on `pydantic/arrow-rs:adaptive-strategy-swap`, which adds `ParquetPushDecoder::can_swap_strategy()` / `swap_strategy(StrategySwap)` and the `StrategySwap` builder. The `Cargo.toml` `[patch.crates-io]` block points at it.

# What's not in this PR (deferred)

- Sub-row-group adaptation (would need a `ParquetRecordBatchReader::pause` primitive in arrow-rs to yield a residual `RowSelection`); useful for TPCDS-style single-huge-row-group files. Deferred.
- The three new config knobs aren't in the proto schema yet; `from_proto` fills them with config defaults so a roundtrip preserves behavior.

# Tests

- `cargo test -p datafusion-datasource-parquet --lib` — 143 passed
- `cargo test -p datafusion --lib` — 410 passed
- `cargo test -p datafusion --test core_integration` — 935 passed
- `cargo test -p datafusion-sqllogictest --test sqllogictests` — all pass except `encrypted_parquet.slt` (pre-existing on upstream/main, not related to this change)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
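The Welford CI bounds that `selectivity.rs` relies on can be sketched as follows — a minimal illustration, assuming the standard Welford recurrence and a z-scored bound on the mean; `WelfordStats` and its methods are illustrative names, not the actual DataFusion types, and `z` plays the role of the `filter_confidence_z` knob:

```rust
/// Running mean/variance via Welford's online algorithm, plus a
/// z-scored upper confidence bound — illustrative sketch only.
#[derive(Default)]
pub struct WelfordStats {
    count: u64,
    mean: f64,
    m2: f64, // running sum of squared deviations from the mean
}

impl WelfordStats {
    /// Fold in one observation (e.g. a batch's observed selectivity).
    pub fn update(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        self.m2 += delta * (x - self.mean);
    }

    pub fn mean(&self) -> f64 {
        self.mean
    }

    /// Upper bound of a z-scored confidence interval on the mean.
    /// With fewer than two samples there is no variance estimate, so the
    /// bound is vacuous (infinite) and the caller should keep collecting.
    pub fn upper_bound(&self, z: f64) -> f64 {
        if self.count < 2 {
            return f64::INFINITY;
        }
        let var = self.m2 / (self.count - 1) as f64; // sample variance
        self.mean + z * (var / self.count as f64).sqrt()
    }
}
```

Two observations (0.4, 0.6) give mean 0.5 and, at z = 1, an upper bound of 0.6 — the bound tightens toward the mean as samples accumulate, which is what lets the tracker commit to a placement decision only once it is confident.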
- Fix 6 broken intra-doc links in `opener.rs`: `RowFilter`, `PushBuffers`, `AsyncFileReader::create_reader`, and `SelectivityTracker` weren't visible from the doc-comment scope. Reword to plain backticks for the names that don't have a stable in-scope path; route `SelectivityTracker` through `crate::selectivity::SelectivityTracker`.
- Regenerate `docs/source/user-guide/configs.md` via `dev/update_config_docs.sh` to surface the three new `filter_pushdown_min_bytes_per_sec` / `filter_collecting_byte_ratio_threshold` / `filter_confidence_z` rows the CI doc check expects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…33dd62 — picks up the rustdoc fix from the arrow-rs companion branch so the DataFusion CI doc job resolves cleanly too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(force-pushed 4a4e300 → d379196)
The example asserts `pushdown_rows_pruned=1` to demonstrate that the row-filter path actually evicts rows. Under the adaptive scheduler's default `filter_pushdown_min_bytes_per_sec = 100 MB/s`, a small example file's filter starts on the post-scan path (where `pushdown_rows_pruned` stays 0) and the assertion fails. Set `filter_pushdown_min_bytes_per_sec = 0` to disable the throughput check and force every filter to row-level — the same lever the `physical_plan/parquet.rs` test harness uses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(force-pushed d379196 → 88ab545)
🤖 Benchmark running (GKE): comparing HEAD (88ab545) to main using clickbench_partitioned.

🤖 Benchmark completed (GKE): clickbench_partitioned results for base (merge-base) and branch.
Two fixes for benchmark regressions and crashes on hits_partitioned ClickBench queries.

# Hard failures (Q36, Q38, Q41, Q42)

`build_stream` was building the wide `ProjectionMask` from `user projection ∪ post_scan_conjuncts` only, but a row-level conjunct can get demoted to post-scan mid-stream by `maybe_swap_strategy`. When that happened, the demoted filter's column wasn't in the `stream_schema`, and the post-scan rebase via `reassign_expr_columns` fired a `Schema error: Unable to get field named "..."` against the narrow batch.

Fix: include **every** predicate conjunct's columns in the wide projection regardless of current placement. Filter-only columns are still stripped after post-scan filtering by the projector, so the user-visible schema is unchanged.

# Initial-placement regressions (Q10, Q11, Q13, Q14, Q26)

Queries shaped like `SELECT col, ... FROM t WHERE col <> '' GROUP BY col` had the filter column already in the user projection. The byte-ratio heuristic was counting filter bytes against projection bytes naively, so `MobilePhoneModel_bytes / (MobilePhoneModel_bytes + UserID_bytes) ≈ 0.5` exceeded the 0.20 threshold and pushed the filter to post-scan — even though row-level was strictly better (zero extra I/O, late materialization saves the UserID decode for pruned rows).

Fix: change the heuristic numerator from `filter_bytes` to **extra** bytes — bytes for filter columns *not* already in the user projection. A filter that only references projection columns now gets `byte_ratio = 0` and starts at row-level.

Threading required: add `projection_columns: &HashSet<usize>` to `SelectivityTracker::partition_filters` (and the inner impl); the opener's `AdaptiveParquetStream` carries it for mid-stream re-evals.

# Test plan

- All 4 hard-failure queries (Q36/Q38/Q41/Q42) now run to completion locally on hits_partitioned.
- 143 datasource-parquet unit tests pass (38 `partition_filters` call-sites in the test module updated to the new signature).
- Benchmark expectations: Q23/Q22/Q6 wins should hold; Q10/Q11/Q13/Q14 regressions should resolve via the better initial placement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
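The overlap-aware numerator described above can be sketched in a few lines — an illustrative sketch only (`byte_ratio` and its parameters are assumed names, not the actual `SelectivityTracker` API), with `col_bytes[i]` standing in for column *i*'s compressed size:

```rust
use std::collections::HashSet;

/// Overlap-aware byte-ratio heuristic sketch: only bytes for filter columns
/// *not* already in the user projection count as extra cost of row-level
/// evaluation. Names and signature are illustrative.
pub fn byte_ratio(
    filter_cols: &HashSet<usize>,
    projection_cols: &HashSet<usize>,
    col_bytes: &[u64],
) -> f64 {
    // Extra bytes: filter columns the scan would not otherwise decode.
    let extra: u64 = filter_cols
        .difference(projection_cols)
        .map(|&c| col_bytes[c])
        .sum();
    let projected: u64 = projection_cols.iter().map(|&c| col_bytes[c]).sum();
    if extra == 0 {
        // Filter only touches projection columns: zero extra I/O.
        return 0.0;
    }
    extra as f64 / (extra + projected) as f64
}
```

Under this shape, the Q10 case (`MobilePhoneModel` filtered and projected) yields `extra = 0` so `byte_ratio = 0`, while a filter on a column outside the projection contributes its full compressed size to the numerator.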
run benchmark clickbench_partitioned

    baseline:
      ref: main
      env:
        DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: false
        DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: false
    changed:
      ref: HEAD
      env:
        DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
        DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true
🤖 Benchmark running (GKE): comparing HEAD (1301112) to main using clickbench_partitioned.

🤖 Benchmark completed (GKE): clickbench_partitioned results for base (merge-base) and branch.
…ement

Bench showed Q10/Q11/Q13/Q14/Q26 still regressing 1.20–1.47x even after the overlap-aware heuristic. These queries are shaped like `SELECT col, ... FROM t WHERE col <> '' GROUP BY col` — the filter column is entirely in the projection, so `extra_bytes = 0` and `byte_ratio = 0`. The previous heuristic placed them at row-level since `0 <= threshold`, but row-level *isn't* free even at zero extra I/O: predicate-cache eviction on heavy string columns means the filter column gets decoded twice (once for the predicate eval, once for the projection), and the late-materialization payoff depends on a selectivity we don't know yet.

Local timings on hits_partitioned (release mode):

| Query | main + no-pushdown (baseline) | branch (old heuristic) | branch (new heuristic) |
|-------|------------------------------:|-----------------------:|-----------------------:|
| Q23 | 3708 ms | 219 ms* | 219 ms |
| Q22 | 1344 ms | 902 ms* | 902 ms |
| Q26 | 41 ms | 60 ms | 48 ms |
| Q10 | 82 ms | 109 ms | 88 ms |

Q23/Q22 wins are preserved (Q23 +17x faster vs baseline, Q22 +1.5x). Q10/Q26 regressions go from 1.32–1.45x to 1.07–1.17x — the residual is the general cost of `pushdown_filters=true` vs `false`, not our adaptive layer.

Why Q23 isn't hurt: its huge speedup comes from row-group statistics pruning via the TopK dynamic filter on EventTime, not from row-level filter evaluation. Pruning is independent of row-level vs post-scan placement; the dynamic filter still reaches the source and the PruningPredicate still applies. (Local repro confirms — Q23 actually gets slightly faster on the new heuristic because we skip the double-decode of the heavy URL string column.)

Implementation: change the new-filter row-level condition from `byte_ratio <= threshold` to `extra_bytes > 0 && byte_ratio <= threshold`. Pure-overlap filters (`extra_bytes == 0`) start at post-scan; the tracker promotes them later if measured bytes-saved-per-sec justifies it. Filters with a non-zero extra cost that fits within `byte_ratio_threshold` (a small int predicate against a heavy string projection) still start at row-level — that's the case where the heuristic is genuinely useful.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
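The refined initial-placement rule is a one-line predicate; a hedged sketch (the function name and parameters are illustrative, not the actual tracker code):

```rust
/// Refined initial-placement rule sketch: pure-overlap filters
/// (extra_bytes == 0) start at post-scan and rely on later promotion;
/// filters with a small non-zero extra cost start at row-level.
pub fn starts_at_row_level(extra_bytes: u64, byte_ratio: f64, threshold: f64) -> bool {
    extra_bytes > 0 && byte_ratio <= threshold
}
```

The `extra_bytes > 0` guard is what flips the Q10-shaped queries to post-scan while leaving the "cheap int predicate, heavy string projection" case at row-level.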
run benchmark clickbench_partitioned

    baseline:
      ref: main
      env:
        DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: false
        DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: false
    changed:
      ref: HEAD
      env:
        DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
        DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true
🤖 Benchmark running (GKE): comparing HEAD (40a1ff8) to main using clickbench_partitioned.

🤖 Benchmark completed (GKE): clickbench_partitioned results for base (merge-base) and branch.
Two changes that work together to make Q10/Q11/Q13/Q14/Q26 stop regressing without giving up the Q23/Q22 wins.

# 1. Prune-rate gate on PostScan → RowFilter promotion

Adds a second gate on top of the existing `filter_pushdown_min_bytes_per_sec` CI bound: a filter only gets promoted from post-scan to row-level if it actually prunes ≥ 99% of the rows it sees.

Why: the bytes-saved-per-sec metric is "potential savings if at row-level" (rows_pruned × non-filter-projection-bytes-per-row ÷ eval_time). For ClickBench Q10 (`MobilePhoneModel <> ''`) the selectivity is ~94% and the projection is heavy, so bytes-saved-per-sec clears the 100 MB/s threshold easily. But row-level *actually loses* to post-scan there because the survivors are uniformly scattered: at 8K rows per page, p^N for p = 0.94 is ~10^-220 — effectively zero pages can be skipped, so RowSelection-driven decode is just as expensive as a contiguous post-scan read but with extra predicate-cache eviction on the heavy string column.

The 0.99 gate captures the scatter problem structurally:

- Clustered survivors (TopK dynamic filter, hash-join build): prune_rate trivially ≥ 0.99 once K shrinks. Page-skip works. Promote.
- Uniform survivors at moderate selectivity (Q10/Q11/Q13/Q14/Q26): prune_rate stays at 0.5–0.95. Page-skip can't work no matter how big bytes-saved-per-sec is. Stay at post-scan.

Q22's `Title LIKE '%Google%'` (prune_rate ~1.0) and Q23's `URL LIKE '%google%'` (similar) trivially clear the gate, so their big wins are preserved.

# 2. Drop STATS_SAMPLE_INTERVAL (1/32 → every batch)

I added the 1/32 sampling earlier when the per-batch `Instant` + `tracker.update` was clearly hot — but at the time the heuristic was over-promoting these queries to row-level, making the per-batch path matter much more. Now that the prune-rate gate keeps them at post-scan, sampling actually *hurts*: with 1/32 the Welford accumulator converges 32× slower, so the tracker takes longer to realize "this filter is bad at row-level" and the in-flight filter flips state more often. Updating every batch is faster on every query I measured (Q23, Q22, Q26, Q10).

`SKIP_FLAG_CHECK_INTERVAL = 4` stays — it gates the OptionalFilter skip-flag check, not the Welford update, and removing *it* added ~200 ms to Q22 (the post-update lock-juggle isn't free).

# Local timings (warm, hits_partitioned, 12 partitions)

| Query | main+nopush | branch | Δ |
|-------|------------:|-------:|---|
| Q23 | 3271ms | 168ms | **+19.5x** |
| Q22 | 1069ms | 901ms | +1.19x |
| Q26 | 39ms | 41ms | matches (+2ms) |
| Q10 | 68ms | 59ms | **+1.15x** |

All four ≥ baseline. Q26 is essentially break-even; the residual 2 ms is below run-to-run noise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
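The two-gate promotion check described above can be sketched as a single predicate — an illustrative sketch, not the actual tracker API; all names and the scatter comment are assumptions drawn from the description:

```rust
/// Two-gate promotion check sketch: a post-scan filter is promoted to
/// row-level only if its measured savings rate clears `min_bytes_per_sec`
/// *and* its prune rate clears `min_prune_rate` (0.99 in the text above).
/// The prune-rate gate is a proxy for "survivors are clustered enough that
/// page-skip can fire": under uniform scatter, a W-row page is fully
/// prunable with probability ~p^W, negligible at p = 0.94, W = 8192.
pub fn should_promote(
    rows_seen: u64,
    rows_pruned: u64,
    bytes_saved_per_sec: f64,
    min_bytes_per_sec: f64,
    min_prune_rate: f64,
) -> bool {
    if rows_seen == 0 {
        return false; // no evidence yet
    }
    let prune_rate = rows_pruned as f64 / rows_seen as f64;
    bytes_saved_per_sec >= min_bytes_per_sec && prune_rate >= min_prune_rate
}
```

A Q10-shaped filter (94% pruned, huge potential savings) fails the second conjunct and stays at post-scan; a TopK dynamic filter at 99.9% pruned passes both.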
run benchmark clickbench_partitioned

    baseline:
      ref: main
      env:
        DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: false
        DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: false
    changed:
      ref: HEAD
      env:
        DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
        DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true
🤖 Benchmark running (GKE): comparing HEAD (05590a2) to main using clickbench_partitioned.

🤖 Benchmark completed (GKE): clickbench_partitioned results for base (merge-base) and branch.
Earlier I had two sampling/gate constants protecting the hot per-batch update path:

- `STATS_SAMPLE_INTERVAL = 32` in opener.rs: skip the `Instant::now` + `tracker.update` work on 31 of every 32 batches.
- `SKIP_FLAG_CHECK_INTERVAL = 4` in selectivity.rs: inside `tracker.update`, skip the post-stats CI-bound + lock-juggle path on 3 of every 4 calls.

Both were "right" given the prior over-promotion problem (filters landing at row-level when they shouldn't, making the per-batch path hot and the CI calc wasted). With the new `prune_rate >= 0.99` gate those filters stay at post-scan and the measurements no longer support sampling:

- Removing `STATS_SAMPLE_INTERVAL` (every batch updates) is *faster* than 1/32 across Q23/Q22/Q26/Q10. Slower convergence on 1/32 made the tracker take longer to settle, so the in-flight filter chain flipped state more often.
- `SKIP_FLAG_CHECK_INTERVAL = 4` was protecting *non-optional* filters from a wasted-work path (post-stats CI calc + lock release + is_optional HashMap read + lock reacquire) that they didn't need at all. The right fix is to early-return for non-optional filters *before* that path, not to amortize it across 4 calls.

This refactor:

1. Caches `is_optional: bool` inline on `SelectivityStats`. Non-optional filters early-return after the Welford update with a single field load on the already-held stats lock — no extra HashMap, no `RwLock::read()`, no `drop` + reacquire.
2. For optional filters (hash-join build / TopK dynamic), the skip-flag CI check now runs every batch. That's what we want: when a filter's selectivity collapses, the skip flag should fire ASAP. Q26's TopK dynamic filter benefits visibly from this.
3. Drops the now-redundant `SelectivityTracker::is_optional` HashMap and `PartitionResult::new_filter_ids` (which duplicated `new_optional_flags`). The is_optional bit moves to where it's read.
4. Drops the sampling in `apply_post_scan_filters_with_stats`. `tracker.update` is now cheap enough on the fast path that sampling actively hurts (slower convergence > saved work).
Local timings (warm, hits_partitioned, 12 partitions):
| Query | main+nopush | branch | Δ |
|-------|------------:|-------:|---|
| Q23 | 3271ms | 139ms | **+23.5x** |
| Q22 | 1069ms | 898ms | +1.19x |
| Q26 | 39ms | 39ms | matches |
| Q10 | 68ms | 59ms | **+1.15x** |
143 lib tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
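The shape of the early-return refactor above can be sketched schematically — all types, field names, and the skip condition here are illustrative placeholders, not the real `selectivity.rs` code (the actual CI-bound check is reduced to a stub):

```rust
/// Sketch of per-filter stats with `is_optional` cached inline, so the
/// non-optional fast path ends with a single field load on the stats that
/// are already held — no second HashMap probe, no extra lock traffic.
pub struct SelectivityStats {
    pub rows_seen: u64,
    pub rows_pruned: u64,
    pub is_optional: bool, // cached at registration; was a separate HashMap
}

/// Fold one batch's counts into the stats; returns whether the skip flag
/// should fire (only ever true for optional filters).
pub fn update(stats: &mut SelectivityStats, rows: u64, pruned: u64) -> bool {
    stats.rows_seen += rows;
    stats.rows_pruned += pruned;
    // Non-optional filters never consult the skip-flag path at all.
    if !stats.is_optional {
        return false;
    }
    // Optional filters check the skip condition on every batch now.
    skip_flag_should_fire(stats)
}

/// Placeholder for the CI-bound check: fire the skip flag when the filter
/// prunes almost nothing (illustrative threshold, not the real bound).
fn skip_flag_should_fire(stats: &SelectivityStats) -> bool {
    stats.rows_seen > 0
        && (stats.rows_pruned as f64 / stats.rows_seen as f64) < 0.01
}
```

The point of the shape: the branch on `is_optional` sits immediately after the Welford-style accumulation, so non-optional filters pay one predictable branch instead of an amortized lock-release/reacquire cycle.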
run benchmark clickbench_partitioned

    baseline:
      ref: main
      env:
        DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: false
        DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: false
    changed:
      ref: HEAD
      env:
        DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
        DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true
🤖 Benchmark running (GKE): comparing HEAD (ca51dec) to main using clickbench_partitioned.

🤖 Benchmark completed (GKE): clickbench_partitioned results for base (merge-base) and branch.
Replaces the all-or-nothing batch-level "if matched == 0, all skippable; otherwise 0" computation with a sub-batch windowed analysis fed by a new `count_skippable_bytes` helper. The metric is now, for each batch:

    skippable_bytes_for_batch = total_other_projection_bytes_for_batch
        × (windows-with-zero-survivors / total-windows)

with W = 8192 rows (short-circuited so total_windows = 1 ⇒ binary "is the whole batch all-pruned" — equivalent to the old behavior on typical 8K batch sizes, but with the structure in place for finer W on larger pages or different writers).

Why: `filter_pushdown_min_bytes_per_sec` is the right *unit*, but the metric feeding it overestimated savings whenever the filter pruned rows that the row-level decoder couldn't actually drop a page on. A 50% filter on uniform data still costs full I/O at row-level (every page has survivors); a 50% filter on contiguous data lets the decoder skip half the pages. The windowed analysis discriminates these — same formula at post-scan (predicting what row-level would save) and at row-level (measuring what the decoder did skip, modulo within-window RowSelection narrowing, which is an uncounted bonus).

Same metric on both sides means `min_bytes_per_sec` is the only knob; no separate prune-rate gate. The 0.99 gate is now redundant — if the prune rate is high enough that page-skip works, the metric already clears the threshold; if the prune rate is high but the scatter is uniform (case C, ClickBench Q10/Q11/Q13/Q14/Q26), the metric stays low and the filter stays at post-scan.

The helper short-circuits when:

- the batch is fully pruned (`true_count == 0`) → all skippable,
- the batch has no zeros (`true_count == n`) → 0 skippable,
- there's only one window (`n ≤ W`) and the answer is determined.

This avoids the ~2× per-batch `true_count` work that was visible as a regression when I first wired the helper through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
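The helper's semantics, including the three short-circuits, can be sketched like this — `count_skippable_bytes` is the name from the commit, but the signature is an illustrative guess (a plain boolean mask where `true` means the row survives; W is shrunk in the usage example for readability):

```rust
/// Windowed skippable-bytes sketch: the fraction of W-row windows with zero
/// survivors scales the batch's other-projection bytes. Requires w > 0.
pub fn count_skippable_bytes(mask: &[bool], other_projection_bytes: u64, w: usize) -> u64 {
    assert!(w > 0);
    let n = mask.len();
    if n == 0 {
        return 0;
    }
    let survivors = mask.iter().filter(|&&b| b).count();
    // Short-circuits mirroring the commit message:
    if survivors == 0 {
        return other_projection_bytes; // fully pruned: everything skippable
    }
    if survivors == n || n <= w {
        return 0; // no zeros, or a single window that has survivors
    }
    // Windowed analysis: count windows with zero survivors.
    let total_windows = mask.chunks(w).len();
    let zero_windows = mask
        .chunks(w)
        .filter(|win| !win.iter().any(|&b| b))
        .count();
    (other_projection_bytes as f64 * zero_windows as f64 / total_windows as f64) as u64
}
```

A half-clustered mask (one all-pruned window, one all-surviving window) scores half the bytes as skippable, while a 50% uniform mask scores zero — exactly the contiguous-vs-uniform discrimination described above.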
run benchmark clickbench_partitioned

    baseline:
      ref: main
      env:
        DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: false
        DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: false
    changed:
      ref: HEAD
      env:
        DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
        DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true
🤖 Benchmark running (GKE): comparing HEAD (97c62a6) to main using clickbench_partitioned.
# Summary

Replaces PR #9's morsel-per-row-group split with an in-decoder strategy swap: one `ParquetPushDecoder` per file, one `BoxStream` per file, filter placement re-evaluated at every row-group boundary using the shared `SelectivityTracker`. Filters can now adapt mid-stream (between row groups) without splitting files into chunks. The arrow-rs companion change adds a small `ParquetPushDecoder::swap_strategy` API; the DataFusion side uses it from a single adaptive stream wrapper.

# arrow-rs companion branch

This PR depends on `pydantic/arrow-rs` branch `adaptive-strategy-swap` (CI green at pydantic/arrow-rs#9), referenced via `[patch.crates-io]` in the workspace `Cargo.toml`. The arrow-rs additions:

- `pub fn can_swap_strategy(&self) -> bool` — true between row groups.
- `pub fn swap_strategy(&mut self, swap: StrategySwap) -> Result<()>` — replaces projection / row filter / row selection policy at a row-group boundary; rejected mid-row-group.
- `pub struct StrategySwap` (`#[non_exhaustive]`) with builder methods.
- `pub fn row_groups_remaining(&self) -> usize` for diagnostics.

`PushBuffers` carries through the swap, so bytes already fetched for columns that survive the new strategy are reused.

# What's removed (vs PR #9)

- The chunk loop (`ParquetAccessPlan::split_into_chunks`, `Vec<BoxStream>` returns from `build_stream`).
- Per-chunk `AsyncFileReader::create_reader` minting and per-chunk `RowFilter` rebuild (`RowFilter` is `!Clone`).
- The `EarlyStoppingStream`-on-chunk-0-only special case for the non-`Clone` `FilePruner`.
- `LazyMorselShared` per-morsel Arc churn — the source of the ~10% aggregate ClickBench regression flagged in PR #9 review.

# What's added

`AdaptiveParquetStream` in `opener.rs` drives one row group at a time via `try_next_reader`:

1. Pull a `ParquetRecordBatchReader` for the next row group.
2. Iterate it synchronously; each batch goes through any post-scan filters (feeding per-filter stats into the tracker) and then through the projector.
3. When the reader exhausts, re-partition filters from the accumulated stats; if the row-filter set changed, build a new `RowFilter` and call `decoder.swap_strategy(...)` before requesting the next reader. Post-scan filters update in lockstep.

`PushBuffers` carries through the swap so already-fetched bytes are preserved. The optional-filter mid-stream skip mechanism (existing `OptionalFilterPhysicalExpr` + `tracker.is_filter_skipped`) keeps working unchanged inside `apply_post_scan_filters_with_stats`.

# Carried-over machinery (file-level checkout from `dbcf5ac1e`)

- `selectivity.rs` — `SelectivityTracker`, `PartitionedFilters`, `FilterId`, Welford CI bounds.
- `row_filter.rs` — new `build_row_filter` signature returning `(Option<RowFilter>, UnbuildableFilters)` plus `total_compressed_bytes`, plus `DatafusionArrowPredicate` stat hooks.
- `physical_expr.rs` — `OptionalFilterPhysicalExpr`, `snapshot_generation` helpers. `Display` is pass-through here (PR #9 used `Optional(...)`), keeping every existing sqllogictest expected output intact.
- `config.rs` — adds `filter_pushdown_min_bytes_per_sec` / `filter_collecting_byte_ratio_threshold` / `filter_confidence_z`. `reorder_filters` is preserved as a deprecated no-op — the adaptive tracker subsumes it.
- `source.rs` — `predicate_conjuncts: Vec<(FilterId, Arc<PhysicalExpr>)>` instead of a single AND-ed predicate, so per-conjunct stats accumulate across files.

# Deferred

- Sub-row-group adaptation (would need a `ParquetRecordBatchReader::pause` in arrow-rs to yield a residual `RowSelection`). Useful for TPCDS-style single-huge-row-group files. Out of scope here.
- The three new config knobs aren't in the proto schema yet; `from_proto` fills them with config defaults so a roundtrip preserves behavior. Worth a follow-up to plumb them through the proto.

# Known pre-existing CI flake

`datafusion/sqllogictest/test_files/explain_analyze.slt:103` (the `output_rows_skew` metric test) is failing on `apache/datafusion` `main` itself — see run 25027102370 on commit `310dd5d4`, with the identical diff (expected 84.31% / actual 100%). Not introduced by this branch; fixing it is out of scope, and this PR matches the pre-existing CI baseline.

# Test plan

- `cargo fmt --all`
- `cargo clippy --workspace --all-targets ... -- -D warnings`
- `cargo test -p datafusion-datasource-parquet --lib` — 143 passed
- `cargo test -p datafusion --lib` — 410 passed
- `cargo test -p datafusion --test core_integration` — 935 passed
- `cargo doc --workspace --no-deps` — clean
- `cargo run --example data_io` — `json_shredding` passes with `pushdown_rows_pruned=1`
- `cargo test -p datafusion-sqllogictest --test sqllogictests` — pass except the pre-existing `explain_analyze.slt` and `encrypted_parquet.slt` flakes that fail on `apache/main`
- Benchmarks (608f280c4 / 32195d6da / 4a4e300eb): `LazyMorselShared` churn disappears
- `small_table JOIN large_table` with `WHERE small_table.v >= 50`: unchanged from main

🤖 Generated with Claude Code