Adaptive filter scheduling in the Parquet decoder (replaces PR #9)#11

Open
adriangb wants to merge 9 commits into main from adaptive-filters-in-decoder

Conversation

@adriangb
Owner

@adriangb adriangb commented Apr 27, 2026

Summary

Replaces PR #9's morsel-per-row-group split with an in-decoder strategy swap: one ParquetPushDecoder per file, one BoxStream per file, and filter placement re-evaluated at every row-group boundary using the shared SelectivityTracker.

Filters can now adapt mid-stream (between row groups) without splitting files into chunks. The arrow-rs companion change adds a small ParquetPushDecoder::swap_strategy API; the DataFusion side uses it from a single adaptive stream wrapper.

arrow-rs companion branch

This PR depends on pydantic/arrow-rs branch adaptive-strategy-swap (CI green at pydantic/arrow-rs#9), referenced via [patch.crates-io] in the workspace Cargo.toml.

The arrow-rs additions:

  • pub fn can_swap_strategy(&self) -> bool — true between row groups.
  • pub fn swap_strategy(&mut self, swap: StrategySwap) -> Result<()> — replaces projection / row filter / row selection policy at a row-group boundary; rejected mid-row-group.
  • pub struct StrategySwap (#[non_exhaustive]) with builder methods.
  • pub fn row_groups_remaining(&self) -> usize for diagnostics.

PushBuffers carries through the swap, so bytes already fetched for columns that survive the new strategy are reused.
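The contract listed above (a swap is accepted between row groups and rejected mid-row-group) can be illustrated with a small mock. This is only a sketch: `MockDecoder`, `MockSwap`, their fields, and the `begin_row_group` / `finish_row_group` helpers are hypothetical stand-ins, not the arrow-rs types. Only the method names `can_swap_strategy`, `swap_strategy`, and `row_groups_remaining` come from the list above, and their real signatures may differ.

```rust
/// Hypothetical stand-in for `StrategySwap`; it carries only a list of
/// filter names to evaluate inside the decoder.
#[derive(Debug, Clone, PartialEq, Default)]
struct MockSwap {
    row_filters: Vec<String>,
}

impl MockSwap {
    fn new() -> Self {
        Self::default()
    }

    /// Builder-style setter (illustrative, not the real builder API).
    fn with_row_filters(mut self, filters: &[&str]) -> Self {
        self.row_filters = filters.iter().map(|s| s.to_string()).collect();
        self
    }
}

/// Hypothetical decoder that only tracks row-group boundaries.
struct MockDecoder {
    row_groups_remaining: usize,
    in_row_group: bool,
    active: MockSwap,
}

impl MockDecoder {
    fn new(row_groups: usize) -> Self {
        Self { row_groups_remaining: row_groups, in_row_group: false, active: MockSwap::new() }
    }

    /// True only between row groups.
    fn can_swap_strategy(&self) -> bool {
        !self.in_row_group
    }

    /// Replaces the active strategy; rejected while a row group is being decoded.
    fn swap_strategy(&mut self, swap: MockSwap) -> Result<(), String> {
        if !self.can_swap_strategy() {
            return Err("cannot swap strategy mid-row-group".to_string());
        }
        self.active = swap;
        Ok(())
    }

    fn row_groups_remaining(&self) -> usize {
        self.row_groups_remaining
    }

    /// Starts decoding the next row group; returns false when none are left.
    fn begin_row_group(&mut self) -> bool {
        if self.row_groups_remaining == 0 {
            return false;
        }
        self.row_groups_remaining -= 1;
        self.in_row_group = true;
        true
    }

    fn finish_row_group(&mut self) {
        self.in_row_group = false;
    }
}

fn main() {
    let mut decoder = MockDecoder::new(2);
    decoder.begin_row_group();
    // Mid-row-group: the swap is rejected.
    let rejected = decoder.swap_strategy(MockSwap::new().with_row_filters(&["a > 1"]));
    println!("mid-row-group swap: {:?}", rejected);
    decoder.finish_row_group();
    // At the boundary: the swap is accepted.
    let accepted = decoder.swap_strategy(MockSwap::new().with_row_filters(&["a > 1"]));
    println!("boundary swap: {:?}, remaining: {}", accepted, decoder.row_groups_remaining());
}
```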

What's removed (vs PR #9)

  • The chunk loop (ParquetAccessPlan::split_into_chunks, Vec<BoxStream> returns from build_stream).
  • Per-chunk AsyncFileReader::create_reader minting and per-chunk RowFilter rebuild (RowFilter is !Clone).
  • The EarlyStoppingStream-on-chunk-0-only special case for the non-Clone FilePruner.
  • LazyMorselShared per-morsel Arc churn, the source of the ~10% aggregate ClickBench regression flagged in the review of PR #9 (Adaptive filter scheduling + row-group morsel split).

What's added

AdaptiveParquetStream in opener.rs drives one row group at a time via try_next_reader:

  1. Pull a ParquetRecordBatchReader for the next row group.
  2. Iterate synchronously; each batch goes through any post-scan filters (which feed per-filter stats into the tracker) and then through the projector.
  3. When the reader exhausts, ask the tracker to re-partition filters based on accumulated stats. If the row-filter set changed, build a new RowFilter and call decoder.swap_strategy(...) before requesting the next reader. Post-scan filters update in lockstep.

PushBuffers carries through the swap so already-fetched bytes are preserved. The optional-filter mid-stream skip mechanism (existing OptionalFilterPhysicalExpr + tracker.is_filter_skipped) keeps working unchanged inside apply_post_scan_filters_with_stats.
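The three steps above can be sketched as a toy loop over in-memory "row groups". Everything here is a simplified assumption for illustration: `ToyTracker`, `run`, and the pass-rate rule are hypothetical. The real tracker decides placement from throughput and byte-ratio statistics, not from a bare pass rate, and the real stream drives a `ParquetPushDecoder` rather than vectors of integers.

```rust
use std::collections::HashMap;

type FilterId = usize;

/// Hypothetical tracker: per-filter (rows_seen, rows_passed) counters.
#[derive(Default)]
struct ToyTracker {
    stats: HashMap<FilterId, (u64, u64)>,
}

impl ToyTracker {
    fn record(&mut self, id: FilterId, seen: u64, passed: u64) {
        let entry = self.stats.entry(id).or_insert((0, 0));
        entry.0 += seen;
        entry.1 += passed;
    }

    /// Filters whose observed pass rate is at most `max_pass_rate` are placed
    /// row-level; filters without stats stay post-scan.
    fn row_level_set(&self, ids: &[FilterId], max_pass_rate: f64) -> Vec<FilterId> {
        ids.iter()
            .copied()
            .filter(|id| match self.stats.get(id) {
                Some(&(seen, passed)) if seen > 0 => {
                    let rate = passed as f64 / seen as f64;
                    rate <= max_pass_rate
                }
                _ => false,
            })
            .collect()
    }
}

/// Returns the surviving rows and the row-level placement used for each row group.
fn run(
    row_groups: &[Vec<i64>],
    filters: &[fn(i64) -> bool],
    max_pass_rate: f64,
) -> (Vec<i64>, Vec<Vec<FilterId>>) {
    let ids: Vec<FilterId> = (0..filters.len()).collect();
    let mut tracker = ToyTracker::default();
    let mut row_level: Vec<FilterId> = Vec::new(); // every filter starts post-scan
    let mut placements = Vec::new();
    let mut out = Vec::new();

    for row_group in row_groups {
        placements.push(row_level.clone());
        // Steps 1 and 2: read the row group, apply each filter, record its stats.
        let mut rows = row_group.clone();
        for &id in &ids {
            let seen = rows.len() as u64;
            rows.retain(|&v| filters[id](v));
            tracker.record(id, seen, rows.len() as u64);
        }
        out.extend(rows);
        // Step 3: re-partition at the boundary; "swap" only if the set changed.
        let next = tracker.row_level_set(&ids, max_pass_rate);
        if next != row_level {
            row_level = next;
        }
    }
    (out, placements)
}

fn is_even(v: i64) -> bool {
    v % 2 == 0
}

fn is_positive(v: i64) -> bool {
    v > 0
}

fn main() {
    let filters: [fn(i64) -> bool; 2] = [is_even, is_positive];
    let (rows, placements) = run(&[vec![1, 2, 3, 4], vec![5, 6, 7, 8]], &filters, 0.6);
    println!("rows: {:?}, placements per row group: {:?}", rows, placements);
}
```

In this toy both placements filter rows identically, so only the recorded placement changes; in the PR the difference is where the predicate runs (inside the decoder or after the scan).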

Carried-over machinery (file-level checkout from dbcf5ac1e)

  • selectivity.rs: SelectivityTracker, PartitionedFilters, FilterId, Welford CI bounds.
  • row_filter.rs: new build_row_filter signature returning (Option<RowFilter>, UnbuildableFilters) plus total_compressed_bytes, plus DatafusionArrowPredicate stat hooks.
  • physical_expr.rs: OptionalFilterPhysicalExpr, snapshot_generation helpers. Display is pass-through here (PR #9 used Optional(...)), keeping every existing sqllogictest expected output intact.
  • config.rs: adds filter_pushdown_min_bytes_per_sec / filter_collecting_byte_ratio_threshold / filter_confidence_z. reorder_filters is preserved as a deprecated no-op, since the adaptive tracker subsumes it.
  • Per-file plumbing in source.rs: predicate_conjuncts: Vec<(FilterId, Arc<PhysicalExpr>)> instead of a single AND-ed predicate so per-conjunct stats accumulate across files.
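For readers unfamiliar with the "Welford CI bounds" mentioned above, here is a minimal sketch of Welford's online mean/variance update with a normal-approximation confidence interval. The `Welford` struct and its methods are hypothetical names; which quantity the tracker actually feeds into such an accumulator, and how `filter_confidence_z` is applied, is an assumption not confirmed by this PR text.

```rust
/// Online mean and variance via Welford's algorithm.
#[derive(Debug, Default, Clone, Copy)]
struct Welford {
    n: u64,
    mean: f64,
    m2: f64, // running sum of squared deviations from the mean
}

impl Welford {
    fn update(&mut self, x: f64) {
        self.n += 1;
        let delta = x - self.mean;
        self.mean += delta / self.n as f64;
        self.m2 += delta * (x - self.mean);
    }

    /// Unbiased sample variance; undefined for fewer than two samples.
    fn sample_variance(&self) -> Option<f64> {
        if self.n < 2 {
            None
        } else {
            Some(self.m2 / (self.n - 1) as f64)
        }
    }

    /// Interval `mean ± z * sqrt(variance / n)`; `z` plays the role that a
    /// knob like `filter_confidence_z` presumably plays (assumption).
    fn confidence_interval(&self, z: f64) -> Option<(f64, f64)> {
        let variance = self.sample_variance()?;
        let half_width = z * (variance / self.n as f64).sqrt();
        Some((self.mean - half_width, self.mean + half_width))
    }
}

fn main() {
    let mut acc = Welford::default();
    for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0].iter() {
        acc.update(*x);
    }
    println!("mean = {}, ci(z = 2) = {:?}", acc.mean, acc.confidence_interval(2.0));
}
```

A scheduler built on bounds like these can wait to move a filter until the whole interval falls on one side of a threshold, which avoids flapping on noisy early samples.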

Deferred

  • Sub-row-group adaptation (would need ParquetRecordBatchReader::pause in arrow-rs to yield a residual RowSelection). Useful for TPCDS-style single-huge-row-group files. Out of scope here.
  • Three new config knobs aren't in the proto schema yet; from_proto fills with config defaults so a roundtrip preserves behavior. Worth a follow-up to plumb them through the proto.

Known pre-existing CI flake

datafusion/sqllogictest/test_files/explain_analyze.slt:103 (the output_rows_skew metric test) is failing on apache/datafusion main itself: see run 25027102370 on commit 310dd5d4, which shows the identical diff (expected 84.31%, actual 100%). It is not introduced by this branch. Fixing it is out of scope; this PR matches the pre-existing CI baseline.

Test plan

  • cargo fmt --all
  • cargo clippy --workspace --all-targets ... -- -D warnings
  • cargo test -p datafusion-datasource-parquet --lib — 143 passed
  • cargo test -p datafusion --lib — 410 passed
  • cargo test -p datafusion --test core_integration — 935 passed
  • cargo doc --workspace --no-deps — clean
  • cargo run --example data_io -- json_shredding passes with pushdown_rows_pruned=1
  • cargo test -p datafusion-sqllogictest --test sqllogictests — pass except the pre-existing explain_analyze.slt and encrypted_parquet.slt flakes that fail on apache/main
  • CI (in progress — currently re-running after the doc/configs.md/json_shredding fixes pushed in 608f280c4/32195d6da/4a4e300eb)
  • ClickBench (pre/post): aggregate + per-query. Expected: ~10% regression from LazyMorselShared churn disappears
  • Hash-join dynamic filter blog-post benchmark (small_table JOIN large_table with WHERE small_table.v >= 50): unchanged from main

🤖 Generated with Claude Code

@adriangb
Owner Author

run benchmark clickbench_partitioned

baseline:
    ref: main
    env:
       DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: false
       DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: false
changed:
    ref: HEAD
    env:
       DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
       DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true

adriangb and others added 3 commits April 27, 2026 23:49
Replaces PR #9's morsel-per-row-group split with in-decoder strategy
swap. One `ParquetPushDecoder` per file, one `BoxStream` per file,
filter placement re-evaluated at every row-group boundary using the
shared `SelectivityTracker`.

# What's removed (vs PR #9)

- The chunk loop (`ParquetAccessPlan::split_into_chunks`,
  `Vec<BoxStream>` returns from `build_stream`).
- Per-chunk `AsyncFileReader::create_reader` minting and per-chunk
  `RowFilter` rebuild.
- The `EarlyStoppingStream`-on-chunk-0-only special case for the
  non-`Clone` `FilePruner`.
- `LazyMorselShared` per-morsel Arc churn — the source of the ~10%
  aggregate ClickBench regression you flagged in PR #9 review.

# What's added

`AdaptiveParquetStream` (new in `opener.rs`) drives one row group at a
time via `try_next_reader`:

1. Pull a `ParquetRecordBatchReader` for the next row group.
2. Iterate it synchronously; each batch goes through any post-scan
   filters (which feed per-filter stats into the tracker) and then
   through the projector.
3. When the reader exhausts, ask the tracker to re-partition filters
   based on accumulated stats. If the row-filter set changed, build
   a new `RowFilter` and call the new arrow-rs
   `ParquetPushDecoder::swap_strategy` before requesting the next
   reader. Post-scan filters update in lockstep.

`PushBuffers` carries through the swap so already-fetched bytes are
preserved, and the optional-filter mid-stream skip mechanism (existing
`OptionalFilterPhysicalExpr` + `tracker.is_filter_skipped`) keeps
working unchanged inside `apply_post_scan_filters_with_stats`.

# Carried-over machinery (file-level checkout from `dbcf5ac1e`)

- `selectivity.rs` — `SelectivityTracker`, `PartitionedFilters`,
  `FilterId`, Welford CI bounds. Verbatim.
- `row_filter.rs` — new `build_row_filter` signature returning
  `(Option<RowFilter>, UnbuildableFilters)` plus
  `total_compressed_bytes`, plus `DatafusionArrowPredicate` stat hooks.
- `physical_expr.rs` — `OptionalFilterPhysicalExpr`, `snapshot_generation`
  helpers. `Display` is **pass-through** here (PR #9 used
  `Optional(...)`), keeping every existing sqllogictest expected output
  intact.
- `config.rs` — adds `filter_pushdown_min_bytes_per_sec` /
  `filter_collecting_byte_ratio_threshold` / `filter_confidence_z`.
  **`reorder_filters` is preserved as a deprecated no-op** (per
  request) — the adaptive tracker subsumes it.
- `selectivity_tracker.rs` bench — verbatim.
- Per-file plumbing in `source.rs`: `predicate_conjuncts:
  Vec<(FilterId, Arc<PhysicalExpr>)>` instead of a single AND-ed
  predicate so per-conjunct stats accumulate across files.

# arrow-rs companion branch

Depends on `pydantic/arrow-rs:adaptive-strategy-swap`, which adds
`ParquetPushDecoder::can_swap_strategy()` /
`swap_strategy(StrategySwap)` and the `StrategySwap` builder. The
`Cargo.toml` `[patch.crates-io]` block points at it.

# What's not in this PR (deferred)

- Sub-row-group adaptation (would need a `ParquetRecordBatchReader::pause`
  primitive in arrow-rs to yield a residual `RowSelection`); useful for
  TPCDS-style single-huge-row-group files. Defer.
- Three new config knobs aren't in the proto schema yet; `from_proto`
  fills with config defaults so a roundtrip preserves behavior.

# Tests

- `cargo test -p datafusion-datasource-parquet --lib` — 143 passed
- `cargo test -p datafusion --lib` — 410 passed
- `cargo test -p datafusion --test core_integration` — 935 passed
- `cargo test -p datafusion-sqllogictest --test sqllogictests` — all
  pass except `encrypted_parquet.slt` (pre-existing on upstream/main,
  not related to this change)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Fix 6 broken intra-doc links in `opener.rs`: `RowFilter`,
  `PushBuffers`, `AsyncFileReader::create_reader`, `SelectivityTracker`
  weren't visible from the doc-comment scope. Reword to plain backticks
  for the names that don't have a stable in-scope path; route
  `SelectivityTracker` through `crate::selectivity::SelectivityTracker`.
- Regenerate `docs/source/user-guide/configs.md` via
  `dev/update_config_docs.sh` to surface the three new
  `filter_pushdown_min_bytes_per_sec` /
  `filter_collecting_byte_ratio_threshold` / `filter_confidence_z`
  rows the CI doc check expects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…33dd62

Picks up the rustdoc fix from the arrow-rs companion branch so the
DataFusion CI doc job resolves clean too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@adriangb adriangb force-pushed the adaptive-filters-in-decoder branch from 4a4e300 to d379196 on April 28, 2026 04:49
The example asserts `pushdown_rows_pruned=1` to demonstrate that the
row-filter path actually evicts rows. Under the adaptive scheduler's
default `filter_pushdown_min_bytes_per_sec = 100 MB/s`, a small
example file's filter starts on the post-scan path (where
`pushdown_rows_pruned` stays 0) and the assertion fires.

Set `filter_pushdown_min_bytes_per_sec = 0` to disable the throughput
check and force every filter to row-level — the same lever
`physical_plan/parquet.rs` test harness uses.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4332426918-1855-h54bv 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing HEAD (88ab545) to main diff using: clickbench_partitioned
Results will be posted here when complete


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Comparing HEAD and adaptive-filters-in-decoder
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃           adaptive-filters-in-decoder ┃         Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.20 / 4.66 ±6.79 / 18.24 ms │          1.17 / 4.54 ±6.68 / 17.89 ms │      no change │
│ QQuery 1  │        13.14 / 13.19 ±0.04 / 13.24 ms │        13.82 / 14.24 ±0.33 / 14.82 ms │   1.08x slower │
│ QQuery 2  │        36.38 / 36.56 ±0.15 / 36.78 ms │        36.70 / 36.96 ±0.29 / 37.49 ms │      no change │
│ QQuery 3  │        31.37 / 32.09 ±0.81 / 33.65 ms │        31.12 / 31.63 ±0.37 / 32.18 ms │      no change │
│ QQuery 4  │     238.64 / 243.37 ±3.14 / 246.95 ms │     244.79 / 246.47 ±0.98 / 247.76 ms │      no change │
│ QQuery 5  │     283.07 / 284.47 ±1.25 / 286.79 ms │     280.52 / 282.28 ±1.47 / 284.63 ms │      no change │
│ QQuery 6  │           6.53 / 7.16 ±0.51 / 8.00 ms │           5.02 / 5.51 ±0.52 / 6.52 ms │  +1.30x faster │
│ QQuery 7  │        14.07 / 14.82 ±1.30 / 17.41 ms │        15.12 / 15.57 ±0.69 / 16.95 ms │   1.05x slower │
│ QQuery 8  │     326.33 / 329.14 ±3.55 / 336.15 ms │     320.69 / 323.00 ±1.42 / 325.17 ms │      no change │
│ QQuery 9  │     443.16 / 455.47 ±9.10 / 468.07 ms │     444.31 / 455.94 ±9.19 / 471.53 ms │      no change │
│ QQuery 10 │        74.73 / 75.66 ±0.54 / 76.21 ms │       94.37 / 98.80 ±5.73 / 109.78 ms │   1.31x slower │
│ QQuery 11 │        86.55 / 88.06 ±1.34 / 90.18 ms │     104.15 / 105.40 ±1.06 / 107.25 ms │   1.20x slower │
│ QQuery 12 │     274.12 / 278.09 ±2.48 / 281.90 ms │     262.43 / 267.70 ±4.64 / 275.99 ms │      no change │
│ QQuery 13 │     397.43 / 404.81 ±6.02 / 414.09 ms │    420.17 / 441.01 ±18.25 / 474.11 ms │   1.09x slower │
│ QQuery 14 │     288.99 / 291.07 ±1.90 / 293.98 ms │     315.56 / 319.39 ±3.10 / 322.82 ms │   1.10x slower │
│ QQuery 15 │     281.39 / 285.72 ±5.57 / 296.33 ms │     282.13 / 287.98 ±4.24 / 295.29 ms │      no change │
│ QQuery 16 │     613.29 / 622.44 ±5.02 / 627.96 ms │     606.07 / 612.22 ±3.76 / 617.80 ms │      no change │
│ QQuery 17 │     621.04 / 629.08 ±7.12 / 638.93 ms │     609.29 / 614.56 ±3.30 / 619.53 ms │      no change │
│ QQuery 18 │ 1253.22 / 1270.40 ±15.97 / 1299.12 ms │ 1208.08 / 1236.98 ±39.29 / 1314.45 ms │      no change │
│ QQuery 19 │        29.10 / 29.80 ±0.74 / 30.72 ms │        28.27 / 29.80 ±2.22 / 34.19 ms │      no change │
│ QQuery 20 │     516.87 / 526.19 ±8.40 / 541.70 ms │     516.54 / 522.83 ±5.22 / 531.97 ms │      no change │
│ QQuery 21 │     603.26 / 607.10 ±3.06 / 611.09 ms │     564.63 / 573.00 ±4.45 / 576.92 ms │  +1.06x faster │
│ QQuery 22 │  1069.06 / 1081.19 ±9.34 / 1096.09 ms │     753.87 / 763.45 ±9.31 / 778.77 ms │  +1.42x faster │
│ QQuery 23 │ 3357.28 / 3391.83 ±23.44 / 3426.46 ms │     281.92 / 288.57 ±6.58 / 299.20 ms │ +11.75x faster │
│ QQuery 24 │        42.35 / 42.89 ±0.39 / 43.52 ms │        34.17 / 37.02 ±4.01 / 44.95 ms │  +1.16x faster │
│ QQuery 25 │     114.20 / 115.43 ±1.45 / 118.15 ms │     115.94 / 121.45 ±8.29 / 137.71 ms │   1.05x slower │
│ QQuery 26 │        42.74 / 44.52 ±3.29 / 51.09 ms │        59.56 / 65.43 ±9.44 / 84.14 ms │   1.47x slower │
│ QQuery 27 │     670.41 / 674.19 ±2.74 / 678.74 ms │     649.14 / 651.81 ±2.65 / 655.25 ms │      no change │
│ QQuery 28 │  3018.40 / 3026.36 ±6.70 / 3036.04 ms │ 2985.51 / 3012.13 ±24.03 / 3046.13 ms │      no change │
│ QQuery 29 │        42.61 / 43.07 ±0.45 / 43.72 ms │        41.88 / 42.22 ±0.39 / 42.98 ms │      no change │
│ QQuery 30 │    309.49 / 323.23 ±21.54 / 366.03 ms │    312.33 / 322.80 ±12.99 / 345.92 ms │      no change │
│ QQuery 31 │     305.53 / 312.06 ±5.76 / 322.65 ms │     300.48 / 309.34 ±6.15 / 315.90 ms │      no change │
│ QQuery 32 │ 1027.62 / 1061.77 ±52.19 / 1164.27 ms │   951.86 / 975.99 ±18.76 / 1000.77 ms │  +1.09x faster │
│ QQuery 33 │ 1429.60 / 1443.21 ±10.90 / 1458.00 ms │ 1402.12 / 1423.09 ±12.71 / 1434.18 ms │      no change │
│ QQuery 34 │ 1455.67 / 1467.63 ±12.20 / 1488.73 ms │  1421.97 / 1429.57 ±8.17 / 1444.88 ms │      no change │
│ QQuery 35 │    291.95 / 301.56 ±11.57 / 322.72 ms │    282.57 / 297.01 ±12.50 / 318.94 ms │      no change │
│ QQuery 36 │        66.38 / 73.62 ±5.60 / 80.58 ms │                                  FAIL │   incomparable │
│ QQuery 37 │        35.91 / 36.13 ±0.22 / 36.43 ms │        32.80 / 32.96 ±0.15 / 33.17 ms │  +1.10x faster │
│ QQuery 38 │        40.41 / 42.84 ±1.50 / 44.73 ms │                                  FAIL │   incomparable │
│ QQuery 39 │     131.31 / 140.06 ±7.74 / 152.82 ms │     114.06 / 126.39 ±8.50 / 137.51 ms │  +1.11x faster │
│ QQuery 40 │        14.53 / 14.80 ±0.24 / 15.26 ms │        16.90 / 17.56 ±0.44 / 18.22 ms │   1.19x slower │
│ QQuery 41 │        13.86 / 14.11 ±0.25 / 14.55 ms │                                  FAIL │   incomparable │
│ QQuery 42 │        13.37 / 15.38 ±3.89 / 23.17 ms │                                  FAIL │   incomparable │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                          ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                          │ 20049.32ms │
│ Total Time (adaptive-filters-in-decoder)   │ 16442.58ms │
│ Average Time (HEAD)                        │   514.09ms │
│ Average Time (adaptive-filters-in-decoder) │   421.60ms │
│ Queries Faster                             │          8 │
│ Queries Slower                             │          9 │
│ Queries with No Change                     │         22 │
│ Queries with Failure                       │          4 │
└────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

| Metric      | Value    |
|-------------|----------|
| Wall time   | 105.0s   |
| Peak memory | 30.2 GiB |
| Avg memory  | 23.0 GiB |
| CPU user    | 1063.9s  |
| CPU sys     | 67.1s    |
| Peak spill  | 0 B      |

clickbench_partitioned — branch

| Metric      | Value    |
|-------------|----------|
| Wall time   | 85.0s    |
| Peak memory | 28.9 GiB |
| Avg memory  | 23.4 GiB |
| CPU user    | 875.9s   |
| CPU sys     | 53.5s    |
| Peak spill  | 0 B      |

Two fixes for benchmark regressions and crashes on hits_partitioned
ClickBench queries:

# Hard failures (Q36, Q38, Q41, Q42)

`build_stream` was building the wide ProjectionMask from `user
projection ∪ post_scan_conjuncts` only, but a row-level conjunct can
get demoted to post-scan mid-stream by `maybe_swap_strategy`. When
that happened, the demoted filter's column wasn't in the
`stream_schema`, and the post-scan rebase via `reassign_expr_columns`
fired a `Schema error: Unable to get field named "..."` against the
narrow batch.

Fix: include **every** predicate conjunct's columns in the wide
projection regardless of current placement. Filter-only columns are
still stripped after post-scan filtering by the projector, so the
user-visible schema is unchanged.
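A minimal sketch of the difference between the two projection rules follows. The `Conjunct` struct and the three helper functions are hypothetical; the real code builds a Parquet `ProjectionMask` rather than a set of column indices.

```rust
use std::collections::BTreeSet;

/// Hypothetical conjunct: the column indices it reads and its current placement.
struct Conjunct {
    columns: Vec<usize>,
    row_level: bool,
}

/// Old rule: user projection plus columns of conjuncts that are post-scan right now.
fn narrow_projection(user: &[usize], conjuncts: &[Conjunct]) -> BTreeSet<usize> {
    let mut cols: BTreeSet<usize> = user.iter().copied().collect();
    for conjunct in conjuncts.iter().filter(|c| !c.row_level) {
        cols.extend(conjunct.columns.iter().copied());
    }
    cols
}

/// Fixed rule: user projection plus columns of every conjunct, whatever its placement.
fn wide_projection(user: &[usize], conjuncts: &[Conjunct]) -> BTreeSet<usize> {
    let mut cols: BTreeSet<usize> = user.iter().copied().collect();
    for conjunct in conjuncts {
        cols.extend(conjunct.columns.iter().copied());
    }
    cols
}

/// Columns a demoted conjunct would need that the stream schema does not carry.
fn missing_after_demotion(projection: &BTreeSet<usize>, demoted: &Conjunct) -> Vec<usize> {
    demoted
        .columns
        .iter()
        .copied()
        .filter(|c| !projection.contains(c))
        .collect()
}

fn main() {
    let conjuncts = vec![
        Conjunct { columns: vec![1], row_level: true },
        Conjunct { columns: vec![2], row_level: false },
    ];
    let narrow = narrow_projection(&[0], &conjuncts);
    let wide = wide_projection(&[0], &conjuncts);
    // Demoting the first conjunct mid-stream needs column 1 in the batch.
    println!("narrow is missing {:?}", missing_after_demotion(&narrow, &conjuncts[0]));
    println!("wide is missing {:?}", missing_after_demotion(&wide, &conjuncts[0]));
}
```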

# Initial-placement regressions (Q10, Q11, Q13, Q14, Q26)

Queries shaped like
`SELECT col, ... FROM t WHERE col <> '' GROUP BY col` had the filter
column already in the user projection. The byte-ratio heuristic was
counting filter bytes against projection bytes naively, so
`MobilePhoneModel_bytes / (MobilePhoneModel_bytes + UserID_bytes) ≈ 0.5`
exceeded the 0.20 threshold and pushed the filter to post-scan — even
though row-level was strictly better (zero extra I/O, late
materialization saves UserID decode for pruned rows).

Fix: change the heuristic numerator from `filter_bytes` to **extra**
bytes — bytes for filter columns *not* already in the user
projection. A filter that only references projection columns now
gets `byte_ratio = 0` and starts at row-level. Threading required:
add `projection_columns: &HashSet<usize>` to
`SelectivityTracker::partition_filters` (and the inner impl);
opener's `AdaptiveParquetStream` carries it for mid-stream re-evals.
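The change in numerator can be sketched as follows. The function names and the per-column byte map are hypothetical, and the denominator (total bytes of the projected columns) is an assumption inferred from the `MobilePhoneModel` / `UserID` example above; the real `partition_filters` signature takes more inputs than shown here.

```rust
use std::collections::{HashMap, HashSet};

/// Sums the compressed size of the given columns (unknown columns count as 0).
fn total_bytes(cols: impl Iterator<Item = usize>, column_bytes: &HashMap<usize, u64>) -> u64 {
    cols.map(|c| column_bytes.get(&c).copied().unwrap_or(0)).sum()
}

fn ratio(numerator: u64, denominator: u64) -> f64 {
    if numerator == 0 {
        0.0
    } else if denominator == 0 {
        f64::INFINITY // assumption: extra bytes with an empty projection always exceed the threshold
    } else {
        numerator as f64 / denominator as f64
    }
}

/// Old heuristic: all filter bytes count, even for columns the query reads anyway.
fn naive_byte_ratio(
    filter_columns: &HashSet<usize>,
    projection_columns: &HashSet<usize>,
    column_bytes: &HashMap<usize, u64>,
) -> f64 {
    let filter = total_bytes(filter_columns.iter().copied(), column_bytes);
    let projection = total_bytes(projection_columns.iter().copied(), column_bytes);
    ratio(filter, projection)
}

/// New heuristic: only bytes for filter columns not already in the projection count.
fn extra_byte_ratio(
    filter_columns: &HashSet<usize>,
    projection_columns: &HashSet<usize>,
    column_bytes: &HashMap<usize, u64>,
) -> f64 {
    let extra = total_bytes(filter_columns.difference(projection_columns).copied(), column_bytes);
    let projection = total_bytes(projection_columns.iter().copied(), column_bytes);
    ratio(extra, projection)
}

fn main() {
    // Column 0 stands in for MobilePhoneModel, column 1 for UserID (sizes are made up).
    let column_bytes = HashMap::from([(0usize, 100u64), (1, 100), (2, 50)]);
    let projection = HashSet::from([0usize, 1]);
    let filter = HashSet::from([0usize]);
    println!("naive ratio: {}", naive_byte_ratio(&filter, &projection, &column_bytes));
    println!("extra ratio: {}", extra_byte_ratio(&filter, &projection, &column_bytes));
}
```

With the made-up sizes above, the naive ratio exceeds the 0.20 threshold while the extra-bytes ratio is zero, which matches the placement change described in the commit message.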

# Test plan

- All 4 hard-failure queries (Q36/Q38/Q41/Q42) now run to completion
  locally on hits_partitioned.
- 143 datasource-parquet unit tests pass (38 partition_filters
  call-sites in the test module updated to the new signature).
- Benchmark expectations: Q23/Q22/Q6 wins should hold; Q10/Q11/Q13/Q14
  regressions should resolve via the better initial placement.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@adriangb
Owner Author

run benchmark clickbench_partitioned

baseline:
    ref: main
    env:
       DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: false
       DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: false
changed:
    ref: HEAD
    env:
       DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
       DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4336024373-1867-g8jwq 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux


Comparing HEAD (1301112) to main diff using: clickbench_partitioned
Results will be posted here when complete


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Comparing HEAD and adaptive-filters-in-decoder
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃           adaptive-filters-in-decoder ┃         Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.20 / 4.73 ±6.93 / 18.58 ms │          1.18 / 4.61 ±6.78 / 18.17 ms │      no change │
│ QQuery 1  │        12.86 / 13.04 ±0.13 / 13.24 ms │        13.22 / 13.69 ±0.26 / 14.00 ms │      no change │
│ QQuery 2  │        36.60 / 36.91 ±0.39 / 37.63 ms │        35.81 / 36.01 ±0.18 / 36.30 ms │      no change │
│ QQuery 3  │        31.50 / 32.07 ±0.80 / 33.64 ms │        30.51 / 30.76 ±0.23 / 31.07 ms │      no change │
│ QQuery 4  │     240.20 / 245.06 ±4.61 / 251.53 ms │     242.15 / 246.87 ±2.82 / 250.72 ms │      no change │
│ QQuery 5  │     285.38 / 287.38 ±1.41 / 289.33 ms │     281.78 / 283.53 ±1.80 / 286.32 ms │      no change │
│ QQuery 6  │           6.35 / 7.04 ±0.52 / 7.60 ms │           5.32 / 5.60 ±0.36 / 6.28 ms │  +1.26x faster │
│ QQuery 7  │        13.81 / 14.58 ±1.31 / 17.20 ms │        14.71 / 15.44 ±1.26 / 17.96 ms │   1.06x slower │
│ QQuery 8  │     330.37 / 332.95 ±3.75 / 340.40 ms │    318.42 / 329.79 ±13.41 / 356.14 ms │      no change │
│ QQuery 9  │     457.25 / 461.55 ±5.21 / 471.45 ms │    453.02 / 468.07 ±15.16 / 491.92 ms │      no change │
│ QQuery 10 │        74.34 / 77.11 ±2.98 / 82.47 ms │      95.87 / 101.79 ±7.33 / 113.43 ms │   1.32x slower │
│ QQuery 11 │        86.70 / 87.82 ±1.21 / 89.70 ms │     105.50 / 106.42 ±1.08 / 108.47 ms │   1.21x slower │
│ QQuery 12 │     277.09 / 281.58 ±3.62 / 287.57 ms │     270.08 / 274.48 ±3.74 / 280.65 ms │      no change │
│ QQuery 13 │     398.21 / 404.74 ±3.74 / 409.06 ms │     433.94 / 444.16 ±7.99 / 453.85 ms │   1.10x slower │
│ QQuery 14 │     290.19 / 294.03 ±4.21 / 301.75 ms │     317.75 / 320.25 ±2.13 / 323.27 ms │   1.09x slower │
│ QQuery 15 │     289.80 / 292.24 ±2.10 / 295.19 ms │     284.53 / 299.01 ±9.90 / 313.32 ms │      no change │
│ QQuery 16 │     623.03 / 635.74 ±7.24 / 643.32 ms │     607.81 / 611.46 ±2.01 / 614.02 ms │      no change │
│ QQuery 17 │     626.74 / 634.31 ±6.48 / 645.54 ms │     605.53 / 615.59 ±5.13 / 619.41 ms │      no change │
│ QQuery 18 │  1267.62 / 1284.27 ±9.98 / 1294.20 ms │ 1205.75 / 1221.35 ±11.12 / 1237.07 ms │      no change │
│ QQuery 19 │        28.54 / 30.54 ±3.69 / 37.92 ms │       28.14 / 33.57 ±10.15 / 53.86 ms │   1.10x slower │
│ QQuery 20 │     517.39 / 527.39 ±9.86 / 546.09 ms │    517.13 / 532.25 ±22.61 / 576.99 ms │      no change │
│ QQuery 21 │     592.38 / 598.48 ±3.34 / 601.21 ms │    562.57 / 578.16 ±12.67 / 596.99 ms │      no change │
│ QQuery 22 │  1063.71 / 1067.89 ±3.37 / 1072.63 ms │     750.89 / 758.13 ±5.37 / 767.17 ms │  +1.41x faster │
│ QQuery 23 │ 3345.93 / 3376.50 ±21.96 / 3400.74 ms │    266.68 / 287.00 ±10.78 / 297.47 ms │ +11.76x faster │
│ QQuery 24 │        43.25 / 45.97 ±3.77 / 53.27 ms │       35.96 / 50.03 ±12.13 / 65.94 ms │   1.09x slower │
│ QQuery 25 │     113.48 / 117.30 ±4.45 / 125.82 ms │     117.86 / 123.66 ±5.87 / 134.62 ms │   1.05x slower │
│ QQuery 26 │        42.34 / 42.62 ±0.25 / 42.99 ms │        60.27 / 61.99 ±2.41 / 66.75 ms │   1.45x slower │
│ QQuery 27 │     660.56 / 671.15 ±6.20 / 678.20 ms │     641.32 / 647.80 ±3.86 / 652.91 ms │      no change │
│ QQuery 28 │ 2996.58 / 3015.67 ±11.62 / 3028.09 ms │ 2992.60 / 3015.46 ±16.93 / 3043.83 ms │      no change │
│ QQuery 29 │        42.58 / 48.17 ±6.87 / 60.49 ms │        42.30 / 44.70 ±4.43 / 53.56 ms │  +1.08x faster │
│ QQuery 30 │     312.33 / 316.62 ±3.87 / 323.18 ms │     348.50 / 349.30 ±0.91 / 350.89 ms │   1.10x slower │
│ QQuery 31 │     306.42 / 310.75 ±3.99 / 317.09 ms │     340.76 / 347.31 ±3.66 / 351.12 ms │   1.12x slower │
│ QQuery 32 │  1008.33 / 1016.12 ±4.59 / 1021.77 ms │    954.37 / 968.25 ±17.01 / 997.89 ms │      no change │
│ QQuery 33 │ 1435.05 / 1462.13 ±24.70 / 1507.72 ms │ 1436.04 / 1446.29 ±10.53 / 1462.77 ms │      no change │
│ QQuery 34 │ 1454.21 / 1472.42 ±17.29 / 1505.31 ms │  1448.13 / 1458.99 ±6.03 / 1466.55 ms │      no change │
│ QQuery 35 │    298.82 / 318.88 ±24.29 / 361.84 ms │     289.06 / 295.77 ±4.05 / 300.19 ms │  +1.08x faster │
│ QQuery 36 │        62.62 / 65.13 ±2.57 / 68.55 ms │      71.05 / 80.16 ±10.89 / 100.01 ms │   1.23x slower │
│ QQuery 37 │       36.23 / 43.42 ±10.13 / 62.83 ms │        39.29 / 43.62 ±5.85 / 54.80 ms │      no change │
│ QQuery 38 │        40.89 / 48.60 ±6.39 / 59.19 ms │        36.86 / 37.94 ±1.03 / 39.51 ms │  +1.28x faster │
│ QQuery 39 │     131.84 / 137.75 ±4.24 / 144.94 ms │     118.15 / 127.64 ±6.76 / 134.72 ms │  +1.08x faster │
│ QQuery 40 │        14.77 / 18.63 ±4.56 / 24.57 ms │        18.67 / 19.54 ±1.26 / 22.05 ms │      no change │
│ QQuery 41 │        14.08 / 14.25 ±0.13 / 14.47 ms │        17.50 / 17.66 ±0.13 / 17.88 ms │   1.24x slower │
│ QQuery 42 │        13.60 / 17.18 ±4.35 / 23.76 ms │        16.27 / 17.72 ±2.21 / 22.11 ms │      no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                          ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                          │ 20210.75ms │
│ Total Time (adaptive-filters-in-decoder)   │ 16771.83ms │
│ Average Time (HEAD)                        │   470.02ms │
│ Average Time (adaptive-filters-in-decoder) │   390.04ms │
│ Queries Faster                             │          7 │
│ Queries Slower                             │         13 │
│ Queries with No Change                     │         23 │
│ Queries with Failure                       │          0 │
└────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

| Metric      | Value    |
|-------------|---------:|
| Wall time   | 105.0s   |
| Peak memory | 30.5 GiB |
| Avg memory  | 23.4 GiB |
| CPU user    | 1073.8s  |
| CPU sys     | 60.0s    |
| Peak spill  | 0 B      |

clickbench_partitioned — branch

| Metric      | Value    |
|-------------|---------:|
| Wall time   | 85.0s    |
| Peak memory | 30.5 GiB |
| Avg memory  | 23.2 GiB |
| CPU user    | 888.0s   |
| CPU sys     | 50.4s    |
| Peak spill  | 0 B      |


…ement

The benchmark showed Q10/Q11/Q13/Q14/Q26 still regressing 1.20-1.47x even
after the overlap-aware heuristic. These queries are shaped like
`SELECT col, ... FROM t WHERE col <> '' GROUP BY col`: the filter
column is entirely in the projection, so `extra_bytes = 0` and
`byte_ratio = 0`. The previous heuristic placed them at row-level
since `0 <= threshold`, but row-level *isn't* free even at zero
extra I/O: predicate-cache eviction on heavy string columns means the
filter column gets decoded twice (once for the predicate eval, once
for the projection), and the late-materialization payoff depends on a
selectivity we don't know yet.

Local timings on hits_partitioned (release mode):

| Query | main + no-pushdown (baseline) | branch (old heuristic) | branch (new heuristic) |
|-------|------------------------------:|-----------------------:|-----------------------:|
| Q23   |                       3708 ms |                219 ms* |                 219 ms |
| Q22   |                       1344 ms |                902 ms* |                 902 ms |
| Q26   |                         41 ms |                  60 ms |                  48 ms |
| Q10   |                         82 ms |                 109 ms |                  88 ms |

Q23/Q22 wins are preserved (Q23 +17x faster vs baseline, Q22 +1.5x).
Q10/Q26 regressions go from 1.32-1.45x to 1.07-1.17x — the residual
is the cost of pushdown_filters=true vs false generally, not our
adaptive layer.
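
The ratios quoted above follow from the local timings table. A minimal sketch of the arithmetic, where `speedup` and `slowdown` are hypothetical helper names introduced only for this illustration:

```rust
/// Factor by which the branch is faster than the baseline (> 1.0 means faster).
fn speedup(baseline_ms: f64, branch_ms: f64) -> f64 {
    baseline_ms / branch_ms
}

/// Factor by which the branch is slower than the baseline (> 1.0 means slower).
fn slowdown(baseline_ms: f64, branch_ms: f64) -> f64 {
    branch_ms / baseline_ms
}
```

With the table's values, `speedup(3708.0, 219.0)` is about 16.9 (Q23), `speedup(1344.0, 902.0)` is about 1.49 (Q22), and the new-heuristic slowdowns are `slowdown(82.0, 88.0)` ≈ 1.07 (Q10) and `slowdown(41.0, 48.0)` ≈ 1.17 (Q26).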

Why Q23 isn't hurt: its huge speedup comes from row-group statistics
pruning via the TopK dynamic filter on EventTime, not from row-level
filter evaluation. Pruning is independent of row-level vs post-scan
placement; the dynamic filter still reaches the source and the
PruningPredicate still applies. (Local repro confirms — Q23 actually
gets slightly faster on the new heuristic because we skip the
double-decode of the heavy URL string column.)

Implementation: change the new-filter row-level condition from
`byte_ratio <= threshold` to `extra_bytes > 0 && byte_ratio <=
threshold`. Pure-overlap filters (`extra_bytes == 0`) start at
post-scan; the tracker promotes them later if measured
bytes-saved-per-sec justifies it. Filters with non-zero extra cost
that fits within `byte_ratio_threshold` (small int predicate
against a heavy string projection) still start at row-level; that's
the case where the heuristic is genuinely useful.
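
The revised starting rule can be sketched as follows. This is an illustration, not the PR's code: the `Placement` enum, the `initial_placement` function, and the definition of `byte_ratio` as `extra_bytes / projection_bytes` are assumptions made for the example. Only the condition `extra_bytes > 0 && byte_ratio <= threshold` comes from the description above.

```rust
/// Where a newly seen filter starts out (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Placement {
    /// Evaluated inside the Parquet decoder as a row filter.
    RowLevel,
    /// Evaluated on decoded batches after the scan.
    PostScan,
}

/// Initial placement for a filter that has no measured statistics yet.
///
/// `extra_bytes` counts filter-column bytes that are NOT already part of the
/// projection. `projection_bytes` is the assumed denominator of `byte_ratio`.
fn initial_placement(
    extra_bytes: u64,
    projection_bytes: u64,
    byte_ratio_threshold: f64,
) -> Placement {
    // Pure-overlap filters start at post-scan; the tracker may promote later.
    if extra_bytes == 0 {
        return Placement::PostScan;
    }
    // Assumed definition: extra I/O relative to the projected bytes.
    let byte_ratio = extra_bytes as f64 / projection_bytes.max(1) as f64;
    if byte_ratio <= byte_ratio_threshold {
        Placement::RowLevel
    } else {
        Placement::PostScan
    }
}
```

Under these assumptions, a `WHERE col <> ''` filter whose column is also projected (`extra_bytes == 0`) starts at post-scan, while a small integer predicate against a heavy string projection still starts at row-level.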

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@adriangb
Owner Author

run benchmark clickbench_partitioned

baseline:
    ref: main
    env:
       DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: false
       DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: false
changed:
    ref: HEAD
    env:
       DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
       DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4336687674-1868-kqsf5 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing HEAD (40a1ff8) to main diff using: clickbench_partitioned
Results will be posted here when complete



@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and adaptive-filters-in-decoder
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃           adaptive-filters-in-decoder ┃         Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.21 / 4.59 ±6.71 / 18.02 ms │          1.21 / 4.61 ±6.73 / 18.06 ms │      no change │
│ QQuery 1  │        12.14 / 12.63 ±0.38 / 13.25 ms │        13.46 / 13.87 ±0.21 / 14.05 ms │   1.10x slower │
│ QQuery 2  │        36.32 / 36.78 ±0.37 / 37.38 ms │        36.27 / 36.43 ±0.22 / 36.86 ms │      no change │
│ QQuery 3  │        30.90 / 31.82 ±0.93 / 33.59 ms │        30.85 / 33.57 ±2.64 / 37.55 ms │   1.05x slower │
│ QQuery 4  │     236.98 / 244.96 ±4.60 / 249.86 ms │     237.26 / 244.34 ±4.54 / 251.27 ms │      no change │
│ QQuery 5  │     281.12 / 282.69 ±0.93 / 283.88 ms │     277.17 / 279.96 ±1.71 / 281.80 ms │      no change │
│ QQuery 6  │           6.25 / 7.02 ±0.71 / 7.92 ms │           5.44 / 5.96 ±0.49 / 6.63 ms │  +1.18x faster │
│ QQuery 7  │        13.38 / 13.57 ±0.17 / 13.82 ms │        14.96 / 15.55 ±0.96 / 17.46 ms │   1.15x slower │
│ QQuery 8  │     325.09 / 326.93 ±2.19 / 331.12 ms │     314.20 / 317.29 ±3.37 / 323.52 ms │      no change │
│ QQuery 9  │     445.57 / 449.55 ±3.20 / 454.78 ms │     440.55 / 451.83 ±5.94 / 456.36 ms │      no change │
│ QQuery 10 │        73.60 / 74.82 ±1.05 / 76.45 ms │       94.48 / 97.11 ±4.45 / 105.99 ms │   1.30x slower │
│ QQuery 11 │        83.98 / 84.89 ±0.53 / 85.54 ms │     104.52 / 105.05 ±0.49 / 105.80 ms │   1.24x slower │
│ QQuery 12 │     273.18 / 277.07 ±3.57 / 282.48 ms │     265.29 / 268.65 ±3.21 / 274.50 ms │      no change │
│ QQuery 13 │     390.28 / 396.99 ±6.94 / 407.32 ms │    431.63 / 443.93 ±12.68 / 462.03 ms │   1.12x slower │
│ QQuery 14 │     283.33 / 285.06 ±1.71 / 288.10 ms │     315.09 / 318.89 ±3.39 / 323.58 ms │   1.12x slower │
│ QQuery 15 │     280.08 / 282.68 ±1.53 / 284.68 ms │     281.14 / 289.06 ±9.48 / 306.82 ms │      no change │
│ QQuery 16 │     615.73 / 618.98 ±3.36 / 625.33 ms │     602.10 / 607.17 ±4.43 / 613.15 ms │      no change │
│ QQuery 17 │     619.35 / 627.14 ±7.17 / 640.19 ms │     607.36 / 613.02 ±5.37 / 623.21 ms │      no change │
│ QQuery 18 │ 1236.73 / 1262.93 ±19.32 / 1285.93 ms │  1211.33 / 1219.38 ±8.13 / 1235.06 ms │      no change │
│ QQuery 19 │        28.27 / 29.42 ±1.98 / 33.38 ms │        27.73 / 32.68 ±5.33 / 39.54 ms │   1.11x slower │
│ QQuery 20 │    515.26 / 527.72 ±11.63 / 544.03 ms │    514.59 / 526.12 ±15.02 / 552.96 ms │      no change │
│ QQuery 21 │     596.95 / 602.61 ±4.48 / 610.21 ms │     563.95 / 570.72 ±7.04 / 583.31 ms │  +1.06x faster │
│ QQuery 22 │  1060.76 / 1068.14 ±9.01 / 1083.47 ms │     755.06 / 762.08 ±7.32 / 775.46 ms │  +1.40x faster │
│ QQuery 23 │ 3307.09 / 3346.80 ±29.29 / 3390.79 ms │     172.80 / 177.96 ±5.16 / 185.24 ms │ +18.81x faster │
│ QQuery 24 │       41.80 / 48.32 ±11.65 / 71.58 ms │        33.32 / 37.51 ±4.75 / 46.70 ms │  +1.29x faster │
│ QQuery 25 │     112.63 / 113.74 ±0.95 / 115.11 ms │     115.84 / 120.05 ±4.15 / 127.88 ms │   1.06x slower │
│ QQuery 26 │        42.45 / 46.20 ±4.45 / 54.30 ms │        60.41 / 62.24 ±2.12 / 66.01 ms │   1.35x slower │
│ QQuery 27 │     664.71 / 668.50 ±4.92 / 678.16 ms │     641.60 / 646.73 ±4.61 / 654.47 ms │      no change │
│ QQuery 28 │ 3003.38 / 3027.08 ±15.02 / 3046.59 ms │ 2987.72 / 3000.06 ±15.81 / 3030.70 ms │      no change │
│ QQuery 29 │        41.95 / 44.41 ±3.83 / 52.01 ms │       41.69 / 49.17 ±12.82 / 74.73 ms │   1.11x slower │
│ QQuery 30 │     302.55 / 312.60 ±9.21 / 329.50 ms │     344.22 / 347.20 ±2.69 / 352.02 ms │   1.11x slower │
│ QQuery 31 │     301.35 / 306.93 ±5.15 / 314.87 ms │     342.18 / 344.73 ±2.62 / 349.20 ms │   1.12x slower │
│ QQuery 32 │   998.14 / 1007.37 ±7.11 / 1016.74 ms │    945.45 / 959.63 ±11.54 / 979.30 ms │      no change │
│ QQuery 33 │  1411.37 / 1425.75 ±9.94 / 1438.05 ms │  1406.84 / 1414.10 ±7.70 / 1428.11 ms │      no change │
│ QQuery 34 │ 1421.56 / 1455.75 ±31.52 / 1514.16 ms │  1424.14 / 1435.66 ±8.91 / 1449.76 ms │      no change │
│ QQuery 35 │    286.47 / 296.09 ±10.52 / 315.69 ms │    286.87 / 305.30 ±16.08 / 330.72 ms │      no change │
│ QQuery 36 │        62.73 / 67.33 ±4.12 / 74.49 ms │        60.63 / 64.39 ±2.24 / 66.63 ms │      no change │
│ QQuery 37 │        36.23 / 40.29 ±4.13 / 47.35 ms │        35.42 / 41.72 ±5.55 / 49.54 ms │      no change │
│ QQuery 38 │        42.36 / 45.61 ±5.16 / 55.89 ms │        35.97 / 38.51 ±2.30 / 41.86 ms │  +1.18x faster │
│ QQuery 39 │     119.18 / 133.06 ±7.47 / 141.18 ms │     114.37 / 122.56 ±4.83 / 128.64 ms │  +1.09x faster │
│ QQuery 40 │        14.36 / 15.96 ±1.78 / 19.42 ms │        18.10 / 20.45 ±2.22 / 24.32 ms │   1.28x slower │
│ QQuery 41 │        13.45 / 13.73 ±0.23 / 14.16 ms │        17.06 / 19.38 ±2.89 / 25.05 ms │   1.41x slower │
│ QQuery 42 │        12.99 / 13.53 ±0.93 / 15.38 ms │        16.64 / 19.20 ±4.30 / 27.78 ms │   1.42x slower │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                          ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                          │ 19978.05ms │
│ Total Time (adaptive-filters-in-decoder)   │ 16483.80ms │
│ Average Time (HEAD)                        │   464.61ms │
│ Average Time (adaptive-filters-in-decoder) │   383.34ms │
│ Queries Faster                             │          7 │
│ Queries Slower                             │         16 │
│ Queries with No Change                     │         20 │
│ Queries with Failure                       │          0 │
└────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

| Metric      | Value    |
|-------------|---------:|
| Wall time   | 105.0s   |
| Peak memory | 29.9 GiB |
| Avg memory  | 23.0 GiB |
| CPU user    | 1058.8s  |
| CPU sys     | 61.7s    |
| Peak spill  | 0 B      |

clickbench_partitioned — branch

| Metric      | Value    |
|-------------|---------:|
| Wall time   | 85.0s    |
| Peak memory | 30.6 GiB |
| Avg memory  | 23.5 GiB |
| CPU user    | 874.6s   |
| CPU sys     | 49.2s    |
| Peak spill  | 0 B      |


Two changes that work together to make Q10/Q11/Q13/Q14/Q26 stop
regressing without giving up the Q23/Q22 wins.

# 1. Prune-rate gate on PostScan → RowFilter promotion

Adds a second gate on top of the existing
`filter_pushdown_min_bytes_per_sec` CI bound: a filter only gets
promoted from post-scan to row-level if it actually prunes >= 99% of
rows it sees.

Why: the bytes-saved-per-sec metric is "potential savings if at
row-level" (rows_pruned × non-filter-projection-bytes-per-row ÷
eval_time). For ClickBench Q10 (`MobilePhoneModel <> ''`) the
selectivity (fraction of rows pruned) is ~94% and the projection is
heavy, so bytes-saved-per-sec clears the 100 MB/s threshold easily.
But row-level *actually loses* to post-scan there because survivors
are uniformly scattered: at 8K rows per page, p^N for p=0.94 is
~10^-220, so effectively zero pages can be skipped. RowSelection-driven
decode is then just as expensive as a contiguous post-scan read, but
with extra predicate-cache eviction on the heavy string column.
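
The p^N figure can be checked with a short calculation. This sketch assumes each row is pruned independently with probability `prune_rate` and reads "8K" as 8192 rows. The function names are hypothetical.

```rust
/// Probability that all `rows_per_page` rows of a page are pruned, so the
/// page can be skipped, assuming independent rows with the same prune rate.
fn page_skip_probability(prune_rate: f64, rows_per_page: i32) -> f64 {
    prune_rate.powi(rows_per_page)
}

/// The same quantity as a base-10 exponent. This stays finite even when the
/// probability itself is too small to represent as an f64.
fn page_skip_log10(prune_rate: f64, rows_per_page: u32) -> f64 {
    f64::from(rows_per_page) * prune_rate.log10()
}
```

For `prune_rate = 0.94` and 8192 rows, `page_skip_log10` evaluates to roughly -220.1, matching the ~10^-220 estimate above.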

The 0.99 gate captures the scatter problem structurally:
- Clustered survivors (TopK dynamic filter, hash-join build): prune_rate
  trivially ≥ 0.99 once K shrinks. Page-skip works. Promote.
- Uniform survivors at moderate selectivity (Q10/Q11/Q13/Q14/Q26):
  prune_rate stays at 0.5–0.95. Page-skip can't work no matter how
  big bytes-saved-per-sec is. Stay at post-scan.

Q22's `Title LIKE '%Google%'` (prune_rate ~1.0) and Q23's
`URL LIKE '%google%'` (similar) trivially clear the gate, so their
big wins are preserved.
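
A minimal sketch of the two-gate promotion check described in this section. The struct, field, and function names are hypothetical, and treating the "CI bound" as a lower confidence bound on bytes-saved-per-sec is an assumption. The 0.99 prune-rate gate and the bytes-saved-per-sec formula come from the text above.

```rust
/// Measurements for one filter while it runs post-scan (hypothetical type).
struct FilterStats {
    rows_seen: u64,
    rows_pruned: u64,
    /// Assumed: lower confidence bound on potential bytes saved per second.
    bytes_saved_per_sec_lower: f64,
}

/// Second gate: the filter must prune at least this fraction of rows.
const MIN_PRUNE_RATE: f64 = 0.99;

/// Potential savings if the filter ran at row level:
/// rows_pruned × non-filter projection bytes per row ÷ eval time.
fn bytes_saved_per_sec(rows_pruned: u64, other_bytes_per_row: f64, eval_secs: f64) -> f64 {
    rows_pruned as f64 * other_bytes_per_row / eval_secs
}

/// Promote from post-scan to row-level only when both gates pass.
fn should_promote(stats: &FilterStats, min_bytes_per_sec: f64) -> bool {
    if stats.rows_seen == 0 {
        return false; // nothing measured yet
    }
    let prune_rate = stats.rows_pruned as f64 / stats.rows_seen as f64;
    stats.bytes_saved_per_sec_lower >= min_bytes_per_sec && prune_rate >= MIN_PRUNE_RATE
}
```

With a 100 MB/s threshold, a Q10-like filter that prunes 94% of rows stays at post-scan even when its bytes-saved-per-sec bound is 500 MB/s, while a filter pruning 99.95% of rows with the same bound is promoted.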

# 2. Drop STATS_SAMPLE_INTERVAL (1/32 → every batch)

I added the 1/32 sampling earlier when the per-batch
`Instant + tracker.update` was clearly hot — but at the time the
heuristic was over-promoting these queries to row-level, making the
per-batch path matter much more. Now that the prune-rate gate keeps
them at post-scan, sampling actually *hurts*: with 1/32 the Welford
accumulator converges 32× slower, so the tracker takes longer to
realize "this filter is bad at row-level" and the in-flight filter
flips state more often. Updating every batch is faster on every
query I measured (Q23, Q22, Q26, Q10).

`SKIP_FLAG_CHECK_INTERVAL = 4` stays — it gates the
OptionalFilter skip-flag check, not the Welford update, and removing
*it* added ~200ms to Q22 (the post-update lock-juggle isn't free).
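
For reference, a generic Welford accumulator and the effect of sampling on how many observations it receives. This is a textbook sketch, not the tracker's actual code: the `Welford` struct and the `observe` helper are hypothetical, and `sample_interval` stands in for the sampling constant discussed above.

```rust
/// Running mean and variance via Welford's online algorithm.
#[derive(Default)]
struct Welford {
    count: u64,
    mean: f64,
    m2: f64, // sum of squared deviations from the running mean
}

impl Welford {
    fn update(&mut self, x: f64) {
        self.count += 1;
        let delta = x - self.mean;
        self.mean += delta / self.count as f64;
        self.m2 += delta * (x - self.mean);
    }

    /// Unbiased sample variance; undefined with fewer than two samples.
    fn sample_variance(&self) -> Option<f64> {
        if self.count < 2 {
            None
        } else {
            Some(self.m2 / (self.count - 1) as f64)
        }
    }
}

/// Feed every `sample_interval`-th per-batch value into an accumulator.
/// `sample_interval` must be at least 1 (`step_by` panics on 0).
fn observe(per_batch_values: &[f64], sample_interval: usize) -> Welford {
    let mut acc = Welford::default();
    for x in per_batch_values.iter().step_by(sample_interval) {
        acc.update(*x);
    }
    acc
}
```

Over 64 batches, `observe(&values, 1)` accumulates 64 samples while `observe(&values, 32)` accumulates only 2, which illustrates why a sampled accumulator needs many more batches before its estimate settles.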

# Local timings (warm, hits_partitioned, 12 partitions)

| Query | main+nopush | branch | Δ |
|-------|------------:|-------:|---|
| Q23   |      3271ms |  168ms | **+19.5x** |
| Q22   |      1069ms |  901ms | +1.19x |
| Q26   |        39ms |   41ms | matches (+2ms) |
| Q10   |        68ms |   59ms | **+1.15x** |

Q23, Q22, and Q10 beat the baseline. Q26 is essentially break-even; the
residual 2ms is below run-to-run noise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@adriangb
Owner Author

run benchmark clickbench_partitioned

baseline:
    ref: main
    env:
       DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: false
       DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: false
changed:
    ref: HEAD
    env:
       DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
       DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4337978163-1870-bmpv7 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux


Comparing HEAD (05590a2) to main diff using: clickbench_partitioned
Results will be posted here when complete



@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and adaptive-filters-in-decoder
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃           adaptive-filters-in-decoder ┃         Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.18 / 4.54 ±6.68 / 17.90 ms │          1.17 / 4.61 ±6.77 / 18.14 ms │      no change │
│ QQuery 1  │        12.36 / 12.76 ±0.21 / 12.93 ms │        13.26 / 13.74 ±0.30 / 14.15 ms │   1.08x slower │
│ QQuery 2  │        36.47 / 36.88 ±0.32 / 37.44 ms │        35.75 / 36.07 ±0.20 / 36.34 ms │      no change │
│ QQuery 3  │        31.80 / 32.42 ±0.75 / 33.56 ms │        30.04 / 30.71 ±0.37 / 31.06 ms │  +1.06x faster │
│ QQuery 4  │     241.90 / 245.49 ±3.95 / 253.15 ms │     240.42 / 243.33 ±2.43 / 246.19 ms │      no change │
│ QQuery 5  │     283.83 / 284.53 ±0.53 / 285.12 ms │     280.40 / 282.20 ±1.24 / 283.61 ms │      no change │
│ QQuery 6  │           6.23 / 6.96 ±0.44 / 7.43 ms │           5.16 / 5.35 ±0.15 / 5.58 ms │  +1.30x faster │
│ QQuery 7  │        13.75 / 14.52 ±1.48 / 17.48 ms │        14.68 / 15.68 ±1.56 / 18.78 ms │   1.08x slower │
│ QQuery 8  │     327.84 / 330.42 ±2.00 / 332.96 ms │     319.46 / 324.73 ±5.13 / 334.44 ms │      no change │
│ QQuery 9  │     449.42 / 452.52 ±3.38 / 457.78 ms │     447.41 / 458.85 ±8.02 / 471.74 ms │      no change │
│ QQuery 10 │        74.72 / 77.12 ±4.05 / 85.18 ms │        69.27 / 69.74 ±0.44 / 70.52 ms │  +1.11x faster │
│ QQuery 11 │        86.11 / 87.41 ±1.12 / 89.46 ms │        79.23 / 81.17 ±1.90 / 84.75 ms │  +1.08x faster │
│ QQuery 12 │     278.81 / 283.39 ±3.16 / 288.03 ms │     263.02 / 266.43 ±2.87 / 271.59 ms │  +1.06x faster │
│ QQuery 13 │     395.86 / 404.45 ±7.57 / 413.28 ms │    408.22 / 418.33 ±10.39 / 437.88 ms │      no change │
│ QQuery 14 │     286.82 / 290.90 ±3.70 / 296.63 ms │     270.17 / 273.29 ±3.56 / 279.66 ms │  +1.06x faster │
│ QQuery 15 │     281.98 / 286.53 ±2.79 / 290.16 ms │     279.76 / 286.79 ±4.93 / 292.26 ms │      no change │
│ QQuery 16 │     626.93 / 630.77 ±3.81 / 637.01 ms │     596.80 / 610.54 ±8.93 / 623.26 ms │      no change │
│ QQuery 17 │     626.64 / 632.50 ±4.40 / 639.90 ms │     600.87 / 605.50 ±3.71 / 611.51 ms │      no change │
│ QQuery 18 │  1256.30 / 1272.25 ±8.56 / 1281.01 ms │ 1204.91 / 1226.35 ±16.67 / 1246.85 ms │      no change │
│ QQuery 19 │        28.34 / 29.12 ±0.57 / 29.91 ms │        28.22 / 29.74 ±2.90 / 35.53 ms │      no change │
│ QQuery 20 │     518.38 / 525.52 ±8.32 / 541.74 ms │     519.45 / 530.10 ±9.34 / 546.54 ms │      no change │
│ QQuery 21 │     596.27 / 605.48 ±7.61 / 614.87 ms │    623.38 / 646.49 ±22.81 / 690.19 ms │   1.07x slower │
│ QQuery 22 │ 1056.72 / 1068.62 ±10.71 / 1088.63 ms │     887.78 / 896.05 ±8.72 / 909.36 ms │  +1.19x faster │
│ QQuery 23 │ 3337.53 / 3353.35 ±10.99 / 3365.55 ms │     163.64 / 167.75 ±6.09 / 179.87 ms │ +19.99x faster │
│ QQuery 24 │        43.48 / 49.15 ±6.40 / 59.28 ms │       31.41 / 40.39 ±13.48 / 66.78 ms │  +1.22x faster │
│ QQuery 25 │     113.52 / 115.61 ±1.19 / 117.15 ms │     117.04 / 119.10 ±2.12 / 122.37 ms │      no change │
│ QQuery 26 │        43.54 / 45.32 ±2.07 / 49.09 ms │        50.38 / 51.49 ±0.95 / 52.80 ms │   1.14x slower │
│ QQuery 27 │    669.91 / 683.31 ±12.64 / 702.46 ms │     646.14 / 649.80 ±3.22 / 655.86 ms │      no change │
│ QQuery 28 │ 2989.63 / 3017.84 ±14.48 / 3028.90 ms │ 2994.03 / 3011.56 ±14.49 / 3033.87 ms │      no change │
│ QQuery 29 │        42.46 / 42.76 ±0.21 / 43.09 ms │        41.82 / 45.13 ±5.58 / 56.28 ms │   1.06x slower │
│ QQuery 30 │     311.28 / 314.33 ±3.16 / 319.81 ms │     298.95 / 306.50 ±5.16 / 312.35 ms │      no change │
│ QQuery 31 │     303.76 / 309.22 ±3.12 / 312.96 ms │     336.19 / 346.63 ±5.53 / 352.15 ms │   1.12x slower │
│ QQuery 32 │  1005.24 / 1009.85 ±2.60 / 1013.11 ms │    940.31 / 953.30 ±10.38 / 965.31 ms │  +1.06x faster │
│ QQuery 33 │ 1426.75 / 1458.34 ±26.58 / 1507.21 ms │ 1403.83 / 1426.34 ±13.59 / 1443.98 ms │      no change │
│ QQuery 34 │ 1427.29 / 1450.45 ±12.86 / 1466.03 ms │ 1413.35 / 1431.53 ±16.46 / 1454.64 ms │      no change │
│ QQuery 35 │    286.19 / 302.61 ±16.08 / 324.33 ms │    284.18 / 296.49 ±12.72 / 313.40 ms │      no change │
│ QQuery 36 │        64.03 / 70.11 ±9.16 / 88.36 ms │        59.76 / 69.94 ±7.99 / 78.98 ms │      no change │
│ QQuery 37 │        35.01 / 36.30 ±1.23 / 38.66 ms │        35.59 / 38.34 ±3.36 / 42.84 ms │   1.06x slower │
│ QQuery 38 │        39.45 / 42.06 ±1.62 / 44.07 ms │        35.85 / 37.93 ±2.07 / 41.56 ms │  +1.11x faster │
│ QQuery 39 │     134.90 / 138.97 ±3.75 / 143.85 ms │    115.64 / 128.83 ±10.31 / 145.52 ms │  +1.08x faster │
│ QQuery 40 │        14.43 / 14.83 ±0.50 / 15.82 ms │        18.26 / 19.64 ±1.08 / 21.04 ms │   1.32x slower │
│ QQuery 41 │        13.83 / 18.50 ±9.13 / 36.76 ms │        16.98 / 17.61 ±0.87 / 19.34 ms │      no change │
│ QQuery 42 │        13.10 / 13.40 ±0.23 / 13.78 ms │        15.95 / 16.18 ±0.15 / 16.35 ms │   1.21x slower │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                          ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                          │ 20101.43ms │
│ Total Time (adaptive-filters-in-decoder)   │ 16530.26ms │
│ Average Time (HEAD)                        │   467.48ms │
│ Average Time (adaptive-filters-in-decoder) │   384.42ms │
│ Queries Faster                             │         12 │
│ Queries Slower                             │          9 │
│ Queries with No Change                     │         22 │
│ Queries with Failure                       │          0 │
└────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

| Metric | Value |
|--------|-------|
| Wall time | 105.0s |
| Peak memory | 30.5 GiB |
| Avg memory | 23.2 GiB |
| CPU user | 1065.1s |
| CPU sys | 62.7s |
| Peak spill | 0 B |

clickbench_partitioned — branch

| Metric | Value |
|--------|-------|
| Wall time | 85.0s |
| Peak memory | 31.6 GiB |
| Avg memory | 23.6 GiB |
| CPU user | 875.5s |
| CPU sys | 49.3s |
| Peak spill | 0 B |


Earlier I had two sampling/gate constants protecting the hot per-batch
update path:

  - `STATS_SAMPLE_INTERVAL = 32` in opener.rs: skip the
    `Instant::now` + `tracker.update` work on 31 of every 32 batches.
  - `SKIP_FLAG_CHECK_INTERVAL = 4` in selectivity.rs: inside
    `tracker.update`, skip the post-stats CI-bound + lock-juggle path on
    3 of every 4 calls.

Both made sense given the earlier over-promotion problem: filters
landed at row-level when they shouldn't have, which made the per-batch
path hot and wasted the CI calculation. With the new
`prune_rate >= 0.99` gate those filters stay at post-scan, and the
measurements no longer support sampling:

  - Removing `STATS_SAMPLE_INTERVAL` (every batch updates) is
    *faster* than 1/32 sampling across Q23/Q22/Q26/Q10. At 1/32 the
    tracker converged more slowly and took longer to settle, so the
    in-flight filter chain flipped state more often.
  - `SKIP_FLAG_CHECK_INTERVAL = 4` was protecting *non-optional*
    filters from a wasted-work path (post-stats CI calc + lock release
    + is_optional HashMap read + lock reacquire) that they didn't need
    at all. The right fix is to early-return for non-optional filters
    *before* that path, not to amortize it across 4 calls.

This refactor:

  1. Caches `is_optional: bool` inline on `SelectivityStats`.
     Non-optional filters early-return after the Welford update with
     a single field load on the already-held stats lock: no extra
     HashMap, no `RwLock::read()`, no `drop` + reacquire.

  2. For optional filters (hash-join build / TopK dynamic), the
     skip-flag CI check now runs every batch. That's what we want:
     when a filter's selectivity collapses, the skip flag should fire
     as soon as possible. Q26's TopK dynamic filter benefits visibly
     from this.

  3. Drops the now-redundant `SelectivityTracker::is_optional`
     HashMap and `PartitionResult::new_filter_ids` (which duplicated
     `new_optional_flags`). The is_optional bit moves to where it's
     read.

  4. Drops the sampling in `apply_post_scan_filters_with_stats`.
     `tracker.update` is now cheap enough on the fast path that
     sampling actively hurts (the cost of slower convergence exceeds
     the work saved).
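
A minimal sketch of the fast path described in item 1, assuming a
simplified `SelectivityStats` that is not the PR's actual struct. The
field names, the `UpdateOutcome` enum, and the method signatures are
hypothetical; only the idea (a Welford update followed by an early
return keyed on an inline `is_optional` flag) comes from the text above.

```rust
/// Hypothetical stand-in for the PR's `SelectivityStats`.
/// All field names here are assumptions made for illustration.
#[derive(Debug, Default)]
struct SelectivityStats {
    /// Cached inline so the hot path needs no side-table lookup.
    is_optional: bool,
    count: u64,
    mean: f64,
    /// Running sum of squared deviations from the mean (Welford's M2).
    m2: f64,
}

/// What the caller should do after an update (hypothetical).
#[derive(Debug, PartialEq, Eq)]
enum UpdateOutcome {
    /// Non-optional filter: stats recorded, nothing else to do.
    Recorded,
    /// Optional filter: the caller may run the skip-flag CI check.
    CheckSkipFlag,
}

impl SelectivityStats {
    fn new(is_optional: bool) -> Self {
        Self { is_optional, ..Default::default() }
    }

    /// Welford's online update for one per-batch sample.
    fn update(&mut self, sample: f64) -> UpdateOutcome {
        self.count += 1;
        let delta = sample - self.mean;
        self.mean += delta / self.count as f64;
        self.m2 += delta * (sample - self.mean);

        // Early return: one field load, no extra map or lock traffic.
        if !self.is_optional {
            return UpdateOutcome::Recorded;
        }
        UpdateOutcome::CheckSkipFlag
    }

    /// Unbiased sample variance; `None` until two samples exist.
    fn sample_variance(&self) -> Option<f64> {
        if self.count < 2 {
            None
        } else {
            Some(self.m2 / (self.count - 1) as f64)
        }
    }
}
```

In this sketch the caller already holds whatever lock guards the stats,
so the non-optional branch never touches a second data structure.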

Local timings (warm, hits_partitioned, 12 partitions):

  | Query | main+nopush | branch | Δ |
  |-------|------------:|-------:|---|
  | Q23   |      3271ms |  139ms | **+23.5x** |
  | Q22   |      1069ms |  898ms | +1.19x |
  | Q26   |        39ms |   39ms | matches |
  | Q10   |        68ms |   59ms | **+1.15x** |

143 lib tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@adriangb

run benchmark clickbench_partitioned

baseline:
    ref: main
    env:
       DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: false
       DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: false
changed:
    ref: HEAD
    env:
       DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
       DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4338381183-1872-j57zl 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing HEAD (ca51dec) to main diff using: clickbench_partitioned
Results will be posted here when complete



@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)


Comparing HEAD and adaptive-filters-in-decoder
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃           adaptive-filters-in-decoder ┃         Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.20 / 4.59 ±6.69 / 17.97 ms │          1.26 / 4.83 ±7.01 / 18.84 ms │   1.05x slower │
│ QQuery 1  │        12.31 / 12.68 ±0.20 / 12.93 ms │        13.90 / 14.57 ±0.38 / 14.97 ms │   1.15x slower │
│ QQuery 2  │        36.50 / 37.09 ±0.46 / 37.90 ms │        36.90 / 37.18 ±0.34 / 37.84 ms │      no change │
│ QQuery 3  │        31.25 / 32.04 ±0.58 / 32.85 ms │        31.87 / 32.14 ±0.14 / 32.26 ms │      no change │
│ QQuery 4  │    243.49 / 259.14 ±13.40 / 278.55 ms │     268.98 / 273.94 ±3.18 / 277.53 ms │   1.06x slower │
│ QQuery 5  │     283.64 / 294.98 ±8.94 / 306.99 ms │    280.18 / 287.77 ±11.05 / 309.71 ms │      no change │
│ QQuery 6  │           6.80 / 7.46 ±0.48 / 8.09 ms │           5.25 / 5.72 ±0.47 / 6.53 ms │  +1.30x faster │
│ QQuery 7  │        13.46 / 13.78 ±0.20 / 14.06 ms │        15.01 / 15.33 ±0.34 / 15.91 ms │   1.11x slower │
│ QQuery 8  │     329.45 / 343.82 ±8.50 / 354.54 ms │    319.77 / 330.82 ±12.64 / 350.17 ms │      no change │
│ QQuery 9  │    448.17 / 467.91 ±19.41 / 494.36 ms │    444.91 / 459.22 ±12.11 / 477.01 ms │      no change │
│ QQuery 10 │        74.41 / 78.36 ±2.72 / 82.96 ms │        73.70 / 76.03 ±1.28 / 77.34 ms │      no change │
│ QQuery 11 │        85.55 / 86.47 ±1.10 / 88.62 ms │        82.79 / 83.01 ±0.24 / 83.42 ms │      no change │
│ QQuery 12 │    274.14 / 289.65 ±12.55 / 311.29 ms │    266.77 / 282.12 ±15.81 / 305.86 ms │      no change │
│ QQuery 13 │    392.07 / 420.94 ±16.49 / 442.08 ms │    432.45 / 449.87 ±13.48 / 465.95 ms │   1.07x slower │
│ QQuery 14 │     289.57 / 294.30 ±4.51 / 302.51 ms │    278.60 / 289.82 ±10.13 / 301.30 ms │      no change │
│ QQuery 15 │     283.24 / 289.25 ±4.44 / 293.45 ms │    285.92 / 309.71 ±12.58 / 321.28 ms │   1.07x slower │
│ QQuery 16 │    661.29 / 678.02 ±12.63 / 691.73 ms │    615.14 / 635.73 ±11.46 / 649.75 ms │  +1.07x faster │
│ QQuery 17 │    643.23 / 664.22 ±13.59 / 683.14 ms │    620.71 / 641.09 ±19.62 / 676.27 ms │      no change │
│ QQuery 18 │ 1311.53 / 1325.76 ±12.86 / 1348.98 ms │ 1239.60 / 1279.96 ±32.67 / 1311.64 ms │      no change │
│ QQuery 19 │        28.55 / 35.28 ±5.80 / 43.60 ms │       27.88 / 35.83 ±15.14 / 66.10 ms │      no change │
│ QQuery 20 │    519.01 / 541.09 ±20.71 / 575.36 ms │     516.05 / 527.31 ±6.30 / 535.18 ms │      no change │
│ QQuery 21 │     599.45 / 606.81 ±7.33 / 616.03 ms │    634.03 / 646.29 ±13.09 / 669.99 ms │   1.07x slower │
│ QQuery 22 │ 1067.53 / 1083.61 ±10.91 / 1100.56 ms │     895.87 / 908.17 ±9.11 / 919.76 ms │  +1.19x faster │
│ QQuery 23 │ 3385.66 / 3463.68 ±77.71 / 3607.51 ms │     164.65 / 175.92 ±6.93 / 184.19 ms │ +19.69x faster │
│ QQuery 24 │        44.10 / 51.05 ±5.40 / 58.23 ms │       33.00 / 39.84 ±10.53 / 60.60 ms │  +1.28x faster │
│ QQuery 25 │     114.94 / 118.16 ±2.92 / 123.51 ms │     116.98 / 118.97 ±1.49 / 121.29 ms │      no change │
│ QQuery 26 │        42.95 / 43.77 ±0.60 / 44.47 ms │        50.77 / 51.59 ±1.10 / 53.68 ms │   1.18x slower │
│ QQuery 27 │     678.59 / 686.46 ±6.16 / 693.70 ms │    653.11 / 663.02 ±10.68 / 682.64 ms │      no change │
│ QQuery 28 │ 3041.25 / 3082.31 ±42.41 / 3162.35 ms │ 3053.19 / 3066.66 ±11.59 / 3086.56 ms │      no change │
│ QQuery 29 │        43.99 / 44.34 ±0.34 / 44.97 ms │        43.30 / 45.82 ±4.12 / 54.03 ms │      no change │
│ QQuery 30 │    309.73 / 318.42 ±13.01 / 344.15 ms │     334.30 / 338.05 ±3.40 / 343.46 ms │   1.06x slower │
│ QQuery 31 │    310.21 / 322.84 ±10.85 / 338.68 ms │    358.28 / 379.05 ±15.41 / 400.71 ms │   1.17x slower │
│ QQuery 32 │ 1047.59 / 1066.29 ±19.15 / 1095.42 ms │     990.14 / 994.25 ±2.12 / 995.91 ms │  +1.07x faster │
│ QQuery 33 │ 1449.00 / 1504.72 ±40.65 / 1562.39 ms │ 1474.76 / 1510.00 ±30.31 / 1546.17 ms │      no change │
│ QQuery 34 │ 1460.59 / 1495.54 ±26.75 / 1526.05 ms │ 1491.23 / 1518.05 ±17.04 / 1537.15 ms │      no change │
│ QQuery 35 │    314.70 / 327.55 ±18.93 / 365.11 ms │    294.29 / 319.70 ±21.27 / 355.66 ms │      no change │
│ QQuery 36 │        65.08 / 66.35 ±1.27 / 68.57 ms │        65.75 / 69.42 ±3.05 / 75.00 ms │      no change │
│ QQuery 37 │        37.17 / 38.65 ±1.58 / 41.70 ms │        35.05 / 39.79 ±4.68 / 46.86 ms │      no change │
│ QQuery 38 │       43.16 / 51.29 ±11.10 / 72.61 ms │        36.76 / 41.08 ±3.39 / 47.17 ms │  +1.25x faster │
│ QQuery 39 │    130.90 / 142.87 ±10.29 / 161.91 ms │     130.80 / 137.97 ±3.76 / 141.19 ms │      no change │
│ QQuery 40 │        15.34 / 18.99 ±4.65 / 27.67 ms │        18.74 / 20.63 ±2.26 / 25.05 ms │   1.09x slower │
│ QQuery 41 │        14.74 / 15.94 ±1.14 / 18.05 ms │        18.22 / 18.63 ±0.42 / 19.33 ms │   1.17x slower │
│ QQuery 42 │        14.41 / 14.60 ±0.17 / 14.87 ms │        17.22 / 17.71 ±0.34 / 18.22 ms │   1.21x slower │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────┴────────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                          ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                          │ 20741.06ms │
│ Total Time (adaptive-filters-in-decoder)   │ 17192.61ms │
│ Average Time (HEAD)                        │   482.35ms │
│ Average Time (adaptive-filters-in-decoder) │   399.83ms │
│ Queries Faster                             │          7 │
│ Queries Slower                             │         13 │
│ Queries with No Change                     │         23 │
│ Queries with Failure                       │          0 │
└────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

| Metric | Value |
|--------|-------|
| Wall time | 105.0s |
| Peak memory | 29.6 GiB |
| Avg memory | 23.2 GiB |
| CPU user | 1099.1s |
| CPU sys | 64.5s |
| Peak spill | 0 B |

clickbench_partitioned — branch

| Metric | Value |
|--------|-------|
| Wall time | 90.0s |
| Peak memory | 30.2 GiB |
| Avg memory | 23.2 GiB |
| CPU user | 906.9s |
| CPU sys | 53.4s |
| Peak spill | 0 B |


Replaces the all-or-nothing batch-level computation ("if matched == 0,
all skippable; otherwise 0") with a sub-batch windowed analysis fed by
a new `count_skippable_bytes` helper. The metric is now:

  for each batch:
    skippable_bytes_for_batch = total_other_projection_bytes_for_batch
      × (windows-with-zero-survivors / total-windows)

with W = 8192 rows. When total_windows = 1 the helper short-circuits
to the binary "is the whole batch all-pruned" answer, which is
equivalent to the old behavior on typical 8K batch sizes, but the
structure is now in place for finer W on larger pages or different
writers.

Why: `filter_pushdown_min_bytes_per_sec` is the right *unit*, but the
metric feeding it overestimated savings whenever the filter pruned
rows without letting the row-level decoder drop a whole page. A
50% filter on uniform data still costs full IO at row-level (every
page has survivors); a 50% filter on contiguous data lets the
decoder skip half the pages. The windowed analysis discriminates
between these. The same formula applies at post-scan (predicting what
row-level would save) and at row-level (measuring what the decoder
did skip, modulo within-window RowSelection narrowing, which is an
uncounted bonus).

Same metric on both sides means `min_bytes_per_sec` is the only knob;
there is no separate prune-rate gate. The 0.99 gate is now redundant:
if prune-rate is high enough that page-skip works, the metric already
clears the threshold; if prune-rate is high but scatter is uniform
(case C, ClickBench Q10/Q11/Q13/Q14/Q26), the metric stays low and
the filter stays at post-scan.
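
One way to picture the single-knob gate is the sketch below. It is an
assumption-heavy illustration, not the PR's code: `Placement`,
`choose_placement`, and the choice of filter evaluation time as the
denominator are all hypothetical. The commit message only states that
the skippable-bytes metric is compared against a bytes-per-second
threshold.

```rust
use std::time::Duration;

/// Where a filter is evaluated (hypothetical enum for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Placement {
    RowLevel,
    PostScan,
}

/// Decide placement from accumulated skippable bytes and the time
/// spent evaluating the filter. Dividing by evaluation time is an
/// assumption; the source only names the bytes-per-second unit.
fn choose_placement(
    skippable_bytes: u64,
    eval_time: Duration,
    min_bytes_per_sec: f64,
) -> Placement {
    let secs = eval_time.as_secs_f64();
    if secs <= 0.0 {
        // No timing evidence yet: stay at post-scan (assumed default).
        return Placement::PostScan;
    }
    let bytes_per_sec = skippable_bytes as f64 / secs;
    if bytes_per_sec >= min_bytes_per_sec {
        Placement::RowLevel
    } else {
        Placement::PostScan
    }
}
```

Under these assumptions a filter that prunes many rows but leaves
survivors in every window accumulates few skippable bytes, so it falls
below the threshold without needing a separate prune-rate check.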

The helper short-circuits when:
- the batch is fully pruned (`true_count == 0`) → all skippable,
- the batch has no zeros (`true_count == n`) → 0 skippable,
- there's only one window (`n ≤ W`) and the answer is determined.

This avoids the ~2× per-batch `true_count` work that was visible as a
regression when I first wired the helper through.
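
The windowed metric and its short-circuits can be sketched as follows.
This is a simplified illustration over a plain `&[bool]` mask using
only the standard library; the real helper presumably works on Arrow
boolean arrays, and the signature shown here (including the explicit
`window` parameter) is hypothetical.

```rust
/// Window size in rows, as stated in the commit message.
const WINDOW_ROWS: usize = 8192;

/// Estimate bytes of the other projected columns that could be
/// skipped for one batch. `mask[i]` is true when row `i` survives
/// the filter. Signature is an assumption made for illustration.
fn count_skippable_bytes(
    mask: &[bool],
    other_projection_bytes: u64,
    window: usize,
) -> u64 {
    let n = mask.len();
    if n == 0 || window == 0 {
        return 0;
    }
    let true_count = mask.iter().filter(|&&keep| keep).count();

    // Short-circuit 1: fully pruned batch, everything is skippable.
    if true_count == 0 {
        return other_projection_bytes;
    }
    // Short-circuit 2: no pruned rows, nothing is skippable.
    if true_count == n {
        return 0;
    }
    // Short-circuit 3: one window that has at least one survivor.
    if n <= window {
        return 0;
    }

    // General case: scale by the fraction of windows with no survivors.
    let total_windows = mask.chunks(window).count();
    let empty_windows = mask
        .chunks(window)
        .filter(|w| w.iter().all(|&keep| !keep))
        .count();
    let scaled = other_projection_bytes as u128 * empty_windows as u128
        / total_windows as u128;
    scaled as u64
}
```

With a small window of 4 rows, a 50% filter whose survivors are
contiguous (four kept rows, then four pruned rows) reports half the
bytes as skippable, while an alternating 50% mask reports zero, which
is the uniform-versus-contiguous distinction drawn above. Passing
`WINDOW_ROWS` with a typical 8K-row batch always hits one of the three
short-circuits.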

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@adriangb

run benchmark clickbench_partitioned

baseline:
    ref: main
    env:
       DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: false
       DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: false
changed:
    ref: HEAD
    env:
       DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS: true
       DATAFUSION_EXECUTION_PARQUET_REORDER_FILTERS: true

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4338719111-1874-9qd99 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux


Comparing HEAD (97c62a6) to main diff using: clickbench_partitioned
Results will be posted here when complete


