@adriangb
Contributor

@adriangb adriangb commented Dec 5, 2025

Closes #14993

Once this is merged I think we can say we support projection expression pushdown into scans and it is implemented for Parquet.

Remaining TODOs which I think should be tracked in other issues (I'll find them or create them later):

@adriangb adriangb requested a review from alamb December 5, 2025 14:28
@github-actions github-actions bot added physical-expr Changes to the physical-expr crates datasource Changes to the datasource crate sqllogictest SQL Logic Tests (.slt) labels Dec 5, 2025
@github-actions github-actions bot removed the sqllogictest SQL Logic Tests (.slt) label Dec 5, 2025
  let table_schema = table_schema.into();
  Self {
-     projection: SplitProjection::unprojected(&table_schema),
+     projection: ProjectionExprs::from_schema(table_schema.table_schema()),
Contributor Author

The reason for creating this upfront, even though it will often be replaced, is that the most important thing is to do the minimum amount of compute each time we create a ParquetOpener (once per file). Pre-computing it here and storing it in ParquetSource is the easiest way to do that.
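
As a rough illustration of the "compute once in the source, reuse per file" idea, here is a minimal sketch. The `Projection`, `Source`, and `Opener` types are hypothetical stand-ins (they are not the real `ProjectionExprs`, `ParquetSource`, or `ParquetOpener`), and sharing through an `Arc` is an assumption made for the sketch.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a projection: the column indices to read.
#[derive(Debug, PartialEq)]
struct Projection {
    columns: Vec<usize>,
}

impl Projection {
    /// Build the "select every column" projection for `num_fields` fields.
    fn from_schema(num_fields: usize) -> Self {
        Self { columns: (0..num_fields).collect() }
    }
}

/// Hypothetical source: builds the default projection once, at construction.
struct Source {
    projection: Arc<Projection>,
}

/// Hypothetical per-file opener: it only holds a shared pointer.
struct Opener {
    projection: Arc<Projection>,
}

impl Source {
    fn new(num_fields: usize) -> Self {
        Self { projection: Arc::new(Projection::from_schema(num_fields)) }
    }

    /// Called once per file: a reference-count bump, no recomputation.
    fn create_opener(&self) -> Opener {
        Opener { projection: Arc::clone(&self.projection) }
    }
}
```

Every opener created from the same source points at the same pre-computed value, so the per-file cost stays constant regardless of schema width.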

@github-actions github-actions bot added the core Core DataFusion crate label Dec 5, 2025
github-merge-queue bot pushed a commit that referenced this pull request Dec 8, 2025
This allows it to be Arc'ed, have multiple references, etc. It's a
backwards compatible change (aside from producing compiler warnings
about unnecessarily mutable variables).

This will help with #19111
where we'll want to re-use a simplifier instance for predicate and
projection.
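
To show why taking `&self` matters, here is a small sketch with a hypothetical string-based `Simplifier` (not the real `PhysicalExprSimplifier`). Because `simplify` only needs a shared borrow, one instance behind an `Arc` can serve both the predicate and the projection.

```rust
use std::sync::Arc;

/// Hypothetical simplifier. `simplify` takes `&self`, so no exclusive
/// borrow is needed and the value can be shared freely.
struct Simplifier {
    /// Redundant trailing clause to strip, e.g. " AND true".
    suffix: &'static str,
}

impl Simplifier {
    fn simplify(&self, expr: &str) -> String {
        expr.strip_suffix(self.suffix).unwrap_or(expr).to_string()
    }
}

/// Reuse one shared simplifier for a predicate and a list of projections.
fn simplify_all(
    simplifier: &Arc<Simplifier>,
    predicate: &str,
    projection: &[&str],
) -> (String, Vec<String>) {
    let predicate = simplifier.simplify(predicate);
    let projection = projection.iter().map(|e| simplifier.simplify(e)).collect();
    (predicate, projection)
}
```

With a `&mut self` method the same value could not be held behind an `Arc` without extra locking.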
github-merge-queue bot pushed a commit that referenced this pull request Dec 9, 2025
This PR does some refactoring of `PhysicalExprAdapter` and
`PhysicalExprSimplifier` that I found necessary and/or beneficial while
working on #19111.

## Changes made

### Replace `PhysicalExprAdapter::with_partition_values` with
`replace_columns_with_literals`

This is a nice improvement because it:
1. Makes the `PhysicalExprAdapter` trait that users might need to
implement simpler (less boilerplate for users).
2. Decouples these two transformations so that we can replace partition
values and then apply a projection without having to also do the schema
mapping (it would be from the logical schema to the logical schema,
confusing and a waste of compute). I ran into this need in
#19111. I think there may be
other ways of doing it (e.g. piping in the expected output schema from
ParquetSource) but it felt nicer this way and I expect other places may
also need the decoupled transformation.
3. I think we can use it in the future to implement #19089 (edit:
evidently I was right, see identical function in
#19136).
4. It's less lines of code 😄

This will require any users calling `PhysicalExprAdapter` directly to
change their code; I can add an entry to the upgrade guide.


### Remove partition pruning logic from `FilePruner` and deprecate now
unused `PrunableStatistics` and `CompositePruningStatistics`.

Since we replace partition values with literals, we no longer need these
structures; partition values get handled like any other literals.
This is a good chunk of code / complexity that we can bin off.
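
A minimal sketch of why dedicated partition-pruning structures become unnecessary, using a toy `Expr` enum (hypothetical; not DataFusion's `PhysicalExpr`). The helper names `replace_with_literals`, `fold`, and `can_skip_file` are invented for illustration. Once a file's partition values are substituted as literals, ordinary constant folding decides whether the file can be skipped.

```rust
use std::collections::HashMap;

/// Toy expression tree (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Int(i64),
    Bool(bool),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

/// Replace every column that has a known per-file value with that literal.
fn replace_with_literals(expr: Expr, values: &HashMap<String, i64>) -> Expr {
    match expr {
        Expr::Column(name) => match values.get(&name) {
            Some(v) => Expr::Int(*v),
            None => Expr::Column(name),
        },
        Expr::Eq(l, r) => Expr::Eq(
            Box::new(replace_with_literals(*l, values)),
            Box::new(replace_with_literals(*r, values)),
        ),
        Expr::And(l, r) => Expr::And(
            Box::new(replace_with_literals(*l, values)),
            Box::new(replace_with_literals(*r, values)),
        ),
        other => other,
    }
}

/// Ordinary constant folding; nothing here knows about partitions.
fn fold(expr: Expr) -> Expr {
    match expr {
        Expr::Eq(l, r) => match (fold(*l), fold(*r)) {
            (Expr::Int(a), Expr::Int(b)) => Expr::Bool(a == b),
            (l, r) => Expr::Eq(Box::new(l), Box::new(r)),
        },
        Expr::And(l, r) => match (fold(*l), fold(*r)) {
            (Expr::Bool(false), _) | (_, Expr::Bool(false)) => Expr::Bool(false),
            (Expr::Bool(true), other) | (other, Expr::Bool(true)) => other,
            (l, r) => Expr::And(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}

/// A file can be skipped when its predicate folds to `false`.
fn can_skip_file(predicate: Expr, partition_values: &HashMap<String, i64>) -> bool {
    fold(replace_with_literals(predicate, partition_values)) == Expr::Bool(false)
}
```

For a predicate `year = 2025 AND id = 7`, a file in partition `year=2024` folds to `false` and is skipped, while a file in `year=2025` is left with just `id = 7` to evaluate.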


### Use `TableSchema` instead of `SchemaRef` + `Vec<FieldRef>` in
`ParquetOpener`

`TableSchema` is basically `SchemaRef` + `Vec<FieldRef>` already and
since `ParquetSource` has a `TableSchema` it's less code and fewer clones
to propagate that into `ParquetOpener`.

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@adriangb adriangb force-pushed the parquet-expression-pushdown branch from d002d19 to 6aba2d1 Compare December 9, 2025 18:15
@github-actions github-actions bot removed physical-expr Changes to the physical-expr crates core Core DataFusion crate labels Dec 9, 2025
Comment on lines 326 to 344
  if let Some(expr_adapter_factory) = expr_adapter_factory.as_ref() {
+     // After rewriting to the file schema, further simplifications may be possible.
+     // For example, if `'a' = col_that_is_missing` becomes `'a' = NULL` that can then be simplified to `FALSE`
+     // and we can avoid doing any more work on the file (bloom filters, loading the page index, etc.).
+     // Additionally, if any casts were inserted we can move casts from the column to the literal side:
+     // `CAST(col AS INT) = 5` can become `col = CAST(5 AS <col type>)`, which can be evaluated statically.
+     let simplifier = PhysicalExprSimplifier::new(&physical_file_schema);
+     let rewriter = expr_adapter_factory.create(
+         Arc::clone(&logical_file_schema),
+         Arc::clone(&physical_file_schema),
+     );
      predicate = predicate
-         .map(|p| {
-             let expr = expr_adapter_factory
-                 .create(
-                     Arc::clone(&logical_file_schema),
-                     Arc::clone(&physical_file_schema),
-                 )
-                 .rewrite(p)?;
-             // After rewriting to the file schema, further simplifications may be possible.
-             // For example, if `'a' = col_that_is_missing` becomes `'a' = NULL` that can then be simplified to `FALSE`
-             // and we can avoid doing any more work on the file (bloom filters, loading the page index, etc.).
-             PhysicalExprSimplifier::new(&physical_file_schema).simplify(expr)
-         })
+         .map(|p| simplifier.simplify(rewriter.rewrite(p)?))
          .transpose()?;
      predicate_file_schema = Arc::clone(&physical_file_schema);
+     // Adapt projections to the physical file schema as well
+     projection = projection
+         .try_map_exprs(|p| simplifier.simplify(rewriter.rewrite(p)?))?;
  }
Contributor Author

Recommend viewing this in split view. The new pattern applies the expression adapter and simplifier to both the projection and predicate in a unified manner.
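
The unified pattern can be sketched with a toy expression tree. Everything below is hypothetical (the `Expr` enum and the `rewrite`, `simplify`, `adapt`, and `rejects_all_rows` helpers are not DataFusion APIs). One deliberate difference from the code comment in the diff: the sketch keeps `'a' = NULL` as `NULL` rather than folding it to `FALSE`, so that the same pipeline stays valid for projections, and the filter side then treats a `NULL` predicate as rejecting every row.

```rust
use std::collections::HashSet;

/// Toy expression tree (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Str(String),
    Null,
    Eq(Box<Expr>, Box<Expr>),
}

/// Step 1 (adapter): columns absent from the file become NULL.
fn rewrite(expr: Expr, file_columns: &HashSet<&str>) -> Expr {
    match expr {
        Expr::Column(name) if !file_columns.contains(name.as_str()) => Expr::Null,
        Expr::Eq(l, r) => Expr::Eq(
            Box::new(rewrite(*l, file_columns)),
            Box::new(rewrite(*r, file_columns)),
        ),
        other => other,
    }
}

/// Step 2 (simplifier): a comparison against NULL is itself NULL.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Eq(l, r) => match (simplify(*l), simplify(*r)) {
            (Expr::Null, _) | (_, Expr::Null) => Expr::Null,
            (l, r) => Expr::Eq(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}

/// Run the same rewrite-then-simplify pipeline over predicate and projection.
fn adapt(
    predicate: Option<Expr>,
    projection: Vec<Expr>,
    file_columns: &HashSet<&str>,
) -> (Option<Expr>, Vec<Expr>) {
    let pipeline = |e: Expr| simplify(rewrite(e, file_columns));
    (
        predicate.map(&pipeline),
        projection.into_iter().map(&pipeline).collect(),
    )
}

/// A predicate that is statically NULL can never be true for any row.
fn rejects_all_rows(predicate: &Option<Expr>) -> bool {
    matches!(predicate, Some(Expr::Null))
}
```

Defining the pipeline once and mapping it over both inputs mirrors the shape of the new code, where `simplifier` and `rewriter` are built once and shared.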

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Dec 9, 2025
@alamb
Contributor

alamb commented Dec 10, 2025

> Replace remaining uses of SchemaAdapter with PhysicalExprAdapter and decide if we want to actually deprecate SchemaAdapter

I think this is important (though a tech debt item to be sure)

@alamb
Contributor

alamb commented Dec 10, 2025

Starting to check this one out

@alamb
Contributor

alamb commented Dec 10, 2025

run benchmarks

@alamb
Contributor

alamb commented Dec 10, 2025

run benchmark clickbench_pushdown

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing parquet-expression-pushdown (a560759) to dc78613 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

Contributor

@alamb alamb left a comment

Thank you @adriangb

I went through this code carefully and it looks really nice. Super well done! It has been quite a road but the result is 🧑‍🍳 👌

I do think we should work on deprecating SchemaAdapter for this release, to ensure that we have ported over all tests, for example.

let projection = Arc::clone(&self.projection);
let logical_file_schema = Arc::clone(self.table_schema.table_schema());
// Apply partition column replacement to projection expressions
let mut projection = self.projection.clone();
Contributor

I now find this code very nice and easy to follow

projector.project_batch(&b)
b = projector.project_batch(&b)?;
if replace_schema {
// Ensure the output batch has the expected schema.
Contributor

Thank you for the comments -- I see now that this is similar, but not the same, as what SchemaAdapter did.

@adriangb
Contributor Author

adriangb commented Dec 10, 2025

I will follow up with deprecating SchemaAdapter. We already have a tracking issue so 👍🏻

edit: #16800

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and parquet-expression-pushdown
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ parquet-expression-pushdown ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │  2721.07 ms │                  2736.82 ms │     no change │
│ QQuery 1     │  1292.73 ms │                  1349.72 ms │     no change │
│ QQuery 2     │  2524.63 ms │                  2601.50 ms │     no change │
│ QQuery 3     │  1154.30 ms │                  1133.46 ms │     no change │
│ QQuery 4     │  2307.05 ms │                  2266.25 ms │     no change │
│ QQuery 5     │ 28732.44 ms │                 28344.86 ms │     no change │
│ QQuery 6     │  3842.87 ms │                  3958.96 ms │     no change │
│ QQuery 7     │  3935.96 ms │                  3688.65 ms │ +1.07x faster │
└──────────────┴─────────────┴─────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                          ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                          │ 46511.06ms │
│ Total Time (parquet-expression-pushdown)   │ 46080.23ms │
│ Average Time (HEAD)                        │  5813.88ms │
│ Average Time (parquet-expression-pushdown) │  5760.03ms │
│ Queries Faster                             │          1 │
│ Queries Slower                             │          0 │
│ Queries with No Change                     │          7 │
│ Queries with Failure                       │          0 │
└────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ parquet-expression-pushdown ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.42 ms │                     2.38 ms │     no change │
│ QQuery 1     │    51.29 ms │                    50.96 ms │     no change │
│ QQuery 2     │   134.35 ms │                   136.96 ms │     no change │
│ QQuery 3     │   169.90 ms │                   154.51 ms │ +1.10x faster │
│ QQuery 4     │  1168.60 ms │                  1157.25 ms │     no change │
│ QQuery 5     │  1559.63 ms │                  1527.51 ms │     no change │
│ QQuery 6     │     2.13 ms │                     2.16 ms │     no change │
│ QQuery 7     │    56.51 ms │                    56.88 ms │     no change │
│ QQuery 8     │  1520.95 ms │                  1489.90 ms │     no change │
│ QQuery 9     │  1954.75 ms │                  1958.49 ms │     no change │
│ QQuery 10    │   370.32 ms │                   396.79 ms │  1.07x slower │
│ QQuery 11    │   423.96 ms │                   446.94 ms │  1.05x slower │
│ QQuery 12    │  1456.61 ms │                  1408.72 ms │     no change │
│ QQuery 13    │  2067.61 ms │                  2114.71 ms │     no change │
│ QQuery 14    │  1326.83 ms │                  1322.19 ms │     no change │
│ QQuery 15    │  1322.46 ms │                  1263.95 ms │     no change │
│ QQuery 16    │  2791.90 ms │                  2751.82 ms │     no change │
│ QQuery 17    │  2704.24 ms │                  2741.74 ms │     no change │
│ QQuery 18    │  6233.25 ms │                  5199.36 ms │ +1.20x faster │
│ QQuery 19    │   128.12 ms │                   121.68 ms │ +1.05x faster │
│ QQuery 20    │  2001.23 ms │                  1914.54 ms │     no change │
│ QQuery 21    │  2270.79 ms │                  2250.03 ms │     no change │
│ QQuery 22    │  6912.99 ms │                  3850.20 ms │ +1.80x faster │
│ QQuery 23    │ 24834.41 ms │                 12818.38 ms │ +1.94x faster │
│ QQuery 24    │   220.95 ms │                   210.68 ms │     no change │
│ QQuery 25    │   482.33 ms │                   494.99 ms │     no change │
│ QQuery 26    │   239.10 ms │                   227.76 ms │     no change │
│ QQuery 27    │  2799.37 ms │                  2724.83 ms │     no change │
│ QQuery 28    │ 24523.37 ms │                 23512.76 ms │     no change │
│ QQuery 29    │   970.21 ms │                   940.01 ms │     no change │
│ QQuery 30    │  1409.46 ms │                  1355.68 ms │     no change │
│ QQuery 31    │  1363.73 ms │                  1341.41 ms │     no change │
│ QQuery 32    │  5342.92 ms │                  4828.96 ms │ +1.11x faster │
│ QQuery 33    │  6289.19 ms │                  6000.26 ms │     no change │
│ QQuery 34    │  6273.54 ms │                  6240.09 ms │     no change │
│ QQuery 35    │  2058.76 ms │                  1912.86 ms │ +1.08x faster │
│ QQuery 36    │   122.18 ms │                   118.36 ms │     no change │
│ QQuery 37    │    56.07 ms │                    53.79 ms │     no change │
│ QQuery 38    │   122.32 ms │                   118.91 ms │     no change │
│ QQuery 39    │   191.74 ms │                   194.07 ms │     no change │
│ QQuery 40    │    45.21 ms │                    46.68 ms │     no change │
│ QQuery 41    │    41.47 ms │                    41.46 ms │     no change │
│ QQuery 42    │    34.32 ms │                    34.98 ms │     no change │
└──────────────┴─────────────┴─────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary                          ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)                          │ 114051.51ms │
│ Total Time (parquet-expression-pushdown)   │  95536.55ms │
│ Average Time (HEAD)                        │   2652.36ms │
│ Average Time (parquet-expression-pushdown) │   2221.78ms │
│ Queries Faster                             │           7 │
│ Queries Slower                             │           2 │
│ Queries with No Change                     │          34 │
│ Queries with Failure                       │           0 │
└────────────────────────────────────────────┴─────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ parquet-expression-pushdown ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1     │ 137.43 ms │                   136.51 ms │    no change │
│ QQuery 2     │  29.49 ms │                    31.24 ms │ 1.06x slower │
│ QQuery 3     │  38.61 ms │                    37.14 ms │    no change │
│ QQuery 4     │  29.58 ms │                    30.02 ms │    no change │
│ QQuery 5     │  89.28 ms │                    87.43 ms │    no change │
│ QQuery 6     │  19.78 ms │                    19.90 ms │    no change │
│ QQuery 7     │ 227.79 ms │                   236.45 ms │    no change │
│ QQuery 8     │  36.28 ms │                    36.19 ms │    no change │
│ QQuery 9     │  96.56 ms │                   108.39 ms │ 1.12x slower │
│ QQuery 10    │  63.55 ms │                    63.45 ms │    no change │
│ QQuery 11    │  19.07 ms │                    19.28 ms │    no change │
│ QQuery 12    │  52.70 ms │                    51.68 ms │    no change │
│ QQuery 13    │  48.62 ms │                    50.24 ms │    no change │
│ QQuery 14    │  14.12 ms │                    14.16 ms │    no change │
│ QQuery 15    │  25.22 ms │                    25.03 ms │    no change │
│ QQuery 16    │  25.60 ms │                    25.34 ms │    no change │
│ QQuery 17    │ 152.99 ms │                   159.14 ms │    no change │
│ QQuery 18    │ 291.05 ms │                   291.70 ms │    no change │
│ QQuery 19    │  37.16 ms │                    37.60 ms │    no change │
│ QQuery 20    │  50.35 ms │                    50.69 ms │    no change │
│ QQuery 21    │ 327.29 ms │                   313.53 ms │    no change │
│ QQuery 22    │  17.81 ms │                    17.95 ms │    no change │
└──────────────┴───────────┴─────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                          ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                          │ 1830.34ms │
│ Total Time (parquet-expression-pushdown)   │ 1843.05ms │
│ Average Time (HEAD)                        │   83.20ms │
│ Average Time (parquet-expression-pushdown) │   83.78ms │
│ Queries Faster                             │         0 │
│ Queries Slower                             │         2 │
│ Queries with No Change                     │        20 │
│ Queries with Failure                       │         0 │
└────────────────────────────────────────────┴───────────┘

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing parquet-expression-pushdown (a560759) to dc78613 diff using: clickbench_pushdown
Results will be posted here when complete

@adriangb
Contributor Author

> │ QQuery 22 │ 6912.99 ms │ 3850.20 ms │ +1.80x faster │
> │ QQuery 23 │ 24834.41 ms │ 12818.38 ms │ +1.94x faster │

Well that's unexpected but welcome 😆

@adriangb
Contributor Author

Benchmarks generally look great!
The only slowdown >10% I see is │ QQuery 9 │ 96.56 ms │ 108.39 ms │ 1.12x slower │ for tpch_mem_sf1. Those are smaller / faster runs, which tend to be noisier, so I think this is noise. On balance, other slow queries became up to 1.94x faster, so this is a huge perf improvement.
With that I'm going to go ahead and merge this 😄

@adriangb adriangb added this pull request to the merge queue Dec 10, 2025
Merged via the queue into apache:main with commit 021188e Dec 10, 2025
28 checks passed
@adriangb adriangb deleted the parquet-expression-pushdown branch December 10, 2025 16:48
@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and parquet-expression-pushdown
--------------------
Benchmark clickbench_pushdown.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ parquet-expression-pushdown ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.26 ms │                     2.41 ms │  1.07x slower │
│ QQuery 1     │    53.62 ms │                    51.61 ms │     no change │
│ QQuery 2     │   134.60 ms │                   130.53 ms │     no change │
│ QQuery 3     │   155.17 ms │                   154.34 ms │     no change │
│ QQuery 4     │  1123.75 ms │                  1123.81 ms │     no change │
│ QQuery 5     │  1514.37 ms │                  1536.30 ms │     no change │
│ QQuery 6     │     2.26 ms │                     2.17 ms │     no change │
│ QQuery 7     │    76.68 ms │                    70.52 ms │ +1.09x faster │
│ QQuery 8     │  1459.28 ms │                  1420.99 ms │     no change │
│ QQuery 9     │  1878.44 ms │                  1795.30 ms │     no change │
│ QQuery 10    │   494.50 ms │                   520.26 ms │  1.05x slower │
│ QQuery 11    │   543.31 ms │                   562.08 ms │     no change │
│ QQuery 12    │  1509.50 ms │                  1558.53 ms │     no change │
│ QQuery 13    │  2190.46 ms │                  2215.45 ms │     no change │
│ QQuery 14    │  1456.12 ms │                  1488.77 ms │     no change │
│ QQuery 15    │  1269.48 ms │                  1282.77 ms │     no change │
│ QQuery 16    │  2727.17 ms │                  2769.96 ms │     no change │
│ QQuery 17    │  2704.53 ms │                  2685.11 ms │     no change │
│ QQuery 18    │  5150.98 ms │                  5049.46 ms │     no change │
│ QQuery 19    │   142.68 ms │                   146.58 ms │     no change │
│ QQuery 20    │  1904.23 ms │                  1877.19 ms │     no change │
│ QQuery 21    │  2266.80 ms │                  2294.33 ms │     no change │
│ QQuery 22    │  3917.83 ms │                  3993.94 ms │     no change │
│ QQuery 23    │  1073.86 ms │                  1092.86 ms │     no change │
│ QQuery 24    │   245.12 ms │                   249.38 ms │     no change │
│ QQuery 25    │   611.62 ms │                   628.58 ms │     no change │
│ QQuery 26    │   330.76 ms │                   333.03 ms │     no change │
│ QQuery 27    │  2981.19 ms │                  3010.61 ms │     no change │
│ QQuery 28    │ 24981.98 ms │                 23806.92 ms │     no change │
│ QQuery 29    │   956.72 ms │                   947.38 ms │     no change │
│ QQuery 30    │  1326.57 ms │                  1367.64 ms │     no change │
│ QQuery 31    │  1360.53 ms │                  1331.76 ms │     no change │
│ QQuery 32    │  4810.38 ms │                  4526.22 ms │ +1.06x faster │
│ QQuery 33    │  5765.04 ms │                  5639.42 ms │     no change │
│ QQuery 34    │  6089.36 ms │                  5952.52 ms │     no change │
│ QQuery 35    │  1896.46 ms │                  1900.08 ms │     no change │
│ QQuery 36    │    29.49 ms │                    28.57 ms │     no change │
│ QQuery 37    │    28.54 ms │                    28.12 ms │     no change │
│ QQuery 38    │    29.20 ms │                    28.51 ms │     no change │
│ QQuery 39    │    27.29 ms │                    27.22 ms │     no change │
│ QQuery 40    │    31.24 ms │                    29.95 ms │     no change │
│ QQuery 41    │    28.76 ms │                    28.28 ms │     no change │
│ QQuery 42    │    27.89 ms │                    28.13 ms │     no change │
└──────────────┴─────────────┴─────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                          ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                          │ 85310.04ms │
│ Total Time (parquet-expression-pushdown)   │ 83717.59ms │
│ Average Time (HEAD)                        │  1983.95ms │
│ Average Time (parquet-expression-pushdown) │  1946.92ms │
│ Queries Faster                             │          2 │
│ Queries Slower                             │          2 │
│ Queries with No Change                     │         39 │
│ Queries with Failure                       │          0 │
└────────────────────────────────────────────┴────────────┘

@alamb
Contributor

alamb commented Dec 10, 2025

I know we are trying to go fast, but I think for some of these slightly less trivial PRs we should consider leaving them open for a while after approval and before merge to allow additional time to review from people who may not be awake, as described in https://datafusion.apache.org/contributor-guide/index.html#major-and-minor-prs

The major/minor distinction is definitely subjective, but I do think it would be good to make sure as many people who are interested get the chance to comment if they want

@alamb
Contributor

alamb commented Dec 10, 2025

> Well that's unexpected but welcome 😆

Or maybe noise -- I'll see if it reproduces

@alamb
Contributor

alamb commented Dec 10, 2025

run benchmark clickbench_partitioned

@adriangb
Contributor Author

> I know we are trying to go fast, but I think for some of these slightly less trivial PRs we should consider leaving them open for a while after approval and before merge to allow additional time to review from people who may not be awake, as described in https://datafusion.apache.org/contributor-guide/index.html#major-and-minor-prs
>
> The major/minor distinction is definitely subjective, but I do think it would be good to make sure as many people who are interested get the chance to comment if they want

Noted, thank you for the reminder. Just very excited to close the loop on this change. I’m happy to address any feedback that comes in the next couple days or revert if there are issues

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing parquet-expression-pushdown (a560759) to dc78613 diff using: clickbench_partitioned
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and parquet-expression-pushdown
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ parquet-expression-pushdown ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.66 ms │                     2.44 ms │ +1.09x faster │
│ QQuery 1     │    52.37 ms │                    52.16 ms │     no change │
│ QQuery 2     │   132.50 ms │                   137.61 ms │     no change │
│ QQuery 3     │   156.09 ms │                   157.64 ms │     no change │
│ QQuery 4     │  1086.85 ms │                  1126.73 ms │     no change │
│ QQuery 5     │  1465.56 ms │                  1546.83 ms │  1.06x slower │
│ QQuery 6     │     2.32 ms │                     2.33 ms │     no change │
│ QQuery 7     │    55.52 ms │                    58.52 ms │  1.05x slower │
│ QQuery 8     │  1421.21 ms │                  1472.01 ms │     no change │
│ QQuery 9     │  1888.56 ms │                  1983.74 ms │  1.05x slower │
│ QQuery 10    │   362.53 ms │                   403.26 ms │  1.11x slower │
│ QQuery 11    │   412.21 ms │                   465.76 ms │  1.13x slower │
│ QQuery 12    │  1358.08 ms │                  1433.94 ms │  1.06x slower │
│ QQuery 13    │  1985.98 ms │                  2197.32 ms │  1.11x slower │
│ QQuery 14    │  1245.50 ms │                  1378.06 ms │  1.11x slower │
│ QQuery 15    │  1238.65 ms │                  1316.70 ms │  1.06x slower │
│ QQuery 16    │  2615.62 ms │                  2878.27 ms │  1.10x slower │
│ QQuery 17    │  2628.17 ms │                  2851.35 ms │  1.08x slower │
│ QQuery 18    │  4949.58 ms │                  5467.01 ms │  1.10x slower │
│ QQuery 19    │   123.13 ms │                   131.93 ms │  1.07x slower │
│ QQuery 20    │  1867.45 ms │                  2053.97 ms │  1.10x slower │
│ QQuery 21    │  2172.60 ms │                  2428.20 ms │  1.12x slower │
│ QQuery 22    │  3673.32 ms │                  4133.95 ms │  1.13x slower │
│ QQuery 23    │ 12274.34 ms │                 14116.63 ms │  1.15x slower │
│ QQuery 24    │   208.12 ms │                   246.54 ms │  1.18x slower │
│ QQuery 25    │   454.12 ms │                   519.11 ms │  1.14x slower │
│ QQuery 26    │   218.45 ms │                   244.25 ms │  1.12x slower │
│ QQuery 27    │  2672.43 ms │                  2912.40 ms │  1.09x slower │
│ QQuery 28    │ 24105.72 ms │                 25179.93 ms │     no change │
│ QQuery 29    │   949.26 ms │                  1063.45 ms │  1.12x slower │
│ QQuery 30    │  1304.31 ms │                  1470.26 ms │  1.13x slower │
│ QQuery 31    │  1352.16 ms │                  1491.63 ms │  1.10x slower │
│ QQuery 32    │  4634.82 ms │                  5505.02 ms │  1.19x slower │
│ QQuery 33    │  5562.99 ms │                  6473.22 ms │  1.16x slower │
│ QQuery 34    │  6427.87 ms │                  6776.48 ms │  1.05x slower │
│ QQuery 35    │  1978.07 ms │                  2155.63 ms │  1.09x slower │
│ QQuery 36    │   119.39 ms │                   123.67 ms │     no change │
│ QQuery 37    │    55.37 ms │                    56.54 ms │     no change │
│ QQuery 38    │   122.04 ms │                   125.64 ms │     no change │
│ QQuery 39    │   194.07 ms │                   208.48 ms │  1.07x slower │
│ QQuery 40    │    45.00 ms │                    47.94 ms │  1.07x slower │
│ QQuery 41    │    41.12 ms │                    44.05 ms │  1.07x slower │
│ QQuery 42    │    34.80 ms │                    36.86 ms │  1.06x slower │
└──────────────┴─────────────┴─────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary                          ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (HEAD)                          │  93650.93ms │
│ Total Time (parquet-expression-pushdown)   │ 102477.48ms │
│ Average Time (HEAD)                        │   2177.93ms │
│ Average Time (parquet-expression-pushdown) │   2383.20ms │
│ Queries Faster                             │           1 │
│ Queries Slower                             │          32 │
│ Queries with No Change                     │          10 │
│ Queries with Failure                       │           0 │
└────────────────────────────────────────────┴─────────────┘

@adriangb
Contributor Author

@alamb hmm I'm going to re-run again. That last run seems quite... interesting.

@adriangb
Contributor Author

run benchmark clickbench_partitioned

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing parquet-expression-pushdown (a560759) to dc78613 diff using: clickbench_partitioned
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and parquet-expression-pushdown
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ parquet-expression-pushdown ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.40 ms │                     2.37 ms │     no change │
│ QQuery 1     │    52.61 ms │                    49.25 ms │ +1.07x faster │
│ QQuery 2     │   135.64 ms │                   134.63 ms │     no change │
│ QQuery 3     │   156.73 ms │                   153.75 ms │     no change │
│ QQuery 4     │  1166.55 ms │                  1078.23 ms │ +1.08x faster │
│ QQuery 5     │  1519.99 ms │                  1516.74 ms │     no change │
│ QQuery 6     │     2.31 ms │                     2.07 ms │ +1.11x faster │
│ QQuery 7     │    56.42 ms │                    54.52 ms │     no change │
│ QQuery 8     │  1490.54 ms │                  1428.63 ms │     no change │
│ QQuery 9     │  1904.07 ms │                  1885.60 ms │     no change │
│ QQuery 10    │   421.98 ms │                   402.19 ms │     no change │
│ QQuery 11    │   471.67 ms │                   448.42 ms │     no change │
│ QQuery 12    │  1409.18 ms │                  1424.11 ms │     no change │
│ QQuery 13    │  2078.34 ms │                  2086.84 ms │     no change │
│ QQuery 14    │  1303.71 ms │                  1306.96 ms │     no change │
│ QQuery 15    │  1296.91 ms │                  1302.74 ms │     no change │
│ QQuery 16    │  2670.09 ms │                  2710.35 ms │     no change │
│ QQuery 17    │  2650.82 ms │                  2708.04 ms │     no change │
│ QQuery 18    │  5010.92 ms │                  5052.67 ms │     no change │
│ QQuery 19    │   126.50 ms │                   122.06 ms │     no change │
│ QQuery 20    │  1855.68 ms │                  1936.77 ms │     no change │
│ QQuery 21    │  2183.46 ms │                  2248.01 ms │     no change │
│ QQuery 22    │  3723.50 ms │                  3843.00 ms │     no change │
│ QQuery 23    │ 12288.36 ms │                 12894.64 ms │     no change │
│ QQuery 24    │   219.92 ms │                   228.60 ms │     no change │
│ QQuery 25    │   463.80 ms │                   480.40 ms │     no change │
│ QQuery 26    │   220.68 ms │                   227.54 ms │     no change │
│ QQuery 27    │  2689.73 ms │                  2740.15 ms │     no change │
│ QQuery 28    │ 24186.45 ms │                 23504.51 ms │     no change │
│ QQuery 29    │   971.75 ms │                   973.99 ms │     no change │
│ QQuery 30    │  1318.79 ms │                  1369.42 ms │     no change │
│ QQuery 31    │  1335.78 ms │                  1360.98 ms │     no change │
│ QQuery 32    │  4891.45 ms │                  4737.19 ms │     no change │
│ QQuery 33    │  5859.44 ms │                  5843.16 ms │     no change │
│ QQuery 34    │  5937.03 ms │                  5893.95 ms │     no change │
│ QQuery 35    │  1876.42 ms │                  1930.82 ms │     no change │
│ QQuery 36    │   120.91 ms │                   121.32 ms │     no change │
│ QQuery 37    │    55.09 ms │                    56.34 ms │     no change │
│ QQuery 38    │   117.28 ms │                   118.01 ms │     no change │
│ QQuery 39    │   191.85 ms │                   189.64 ms │     no change │
│ QQuery 40    │    44.67 ms │                    43.58 ms │     no change │
│ QQuery 41    │    41.26 ms │                    39.75 ms │     no change │
│ QQuery 42    │    35.04 ms │                    34.81 ms │     no change │
└──────────────┴─────────────┴─────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                          ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                          │ 94555.72ms │
│ Total Time (parquet-expression-pushdown)   │ 94686.77ms │
│ Average Time (HEAD)                        │  2198.97ms │
│ Average Time (parquet-expression-pushdown) │  2202.02ms │
│ Queries Faster                             │          3 │
│ Queries Slower                             │          0 │
│ Queries with No Change                     │         40 │
│ Queries with Failure                       │          0 │
└────────────────────────────────────────────┴────────────┘

@adriangb
Contributor Author

I think my conclusion here is that the benchmarks are just too noisy to draw any conclusions but this change (as expected) seems neutral on performance.
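
To put a number on the noise, the sketch below recomputes the branch/HEAD ratio from the "Total Time" rows of the three clickbench_partitioned runs posted in this thread, all of which compared the same commit (a560759) against the same base (dc78613). The function and constant names are illustrative only.

```rust
/// (HEAD total ms, branch total ms) for the three clickbench_partitioned
/// runs in this thread, in the order they were posted.
const RUNS: [(f64, f64); 3] = [
    (114051.51, 95536.55),
    (93650.93, 102477.48),
    (94555.72, 94686.77),
];

/// Branch time divided by HEAD time; above 1.0 means the branch was slower.
fn ratio(head_ms: f64, branch_ms: f64) -> f64 {
    branch_ms / head_ms
}

/// Ratios for every run, in posting order.
fn ratios() -> Vec<f64> {
    RUNS.iter().map(|&(head, branch)| ratio(head, branch)).collect()
}

/// Difference between the largest and smallest ratio across runs.
fn spread(values: &[f64]) -> f64 {
    let max = values.iter().cloned().fold(f64::MIN, f64::max);
    let min = values.iter().cloned().fold(f64::MAX, f64::min);
    max - min
}
```

The ratios come out at roughly 0.84, 1.09, and 1.00, a spread of about 0.26 between runs of identical code, which is far larger than any of the per-query changes the bot flagged.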

Labels

datasource Changes to the datasource crate documentation Improvements or additions to documentation

Development

Successfully merging this pull request may close these issues.

Support Push down expression evaluation in TableProviders

3 participants