Hooks to re-evaluate table level filters/indexes during file scans

Consider the following query:

```sql
SELECT *
FROM large_table
JOIN small_table ON large_table.id = small_table.id
WHERE small_table.name = 'Adrian';
```

As per [our recent blog post](https://datafusion.apache.org/blog/2025/09/10/dynamic-filters/), we will first scan `small_table`, find the `id` for `'Adrian'`, and then scan `large_table` with that information available. But what if we had an external table-level point lookup index for `large_table.id`? We won't be able to use it during the scan, because the `id` only becomes known at execution time, after planning is done.

One option is to add hooks to the Parquet readers that get called before each file is scanned, something like:


```rust
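// Sketch only: `FileScanPlan` stands for the per-file scan plan the reader would execute.
trait ScanPlanUpdater {
    /// Called before any work is done on `file`; returns the (possibly narrowed) plan.
    async fn rescan(&self, file: PartitionedFile, plan: FileScanPlan) -> Result<FileScanPlan>;
}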
```

We would call this before doing any more work on the file, giving the hook a chance to consult the point lookup index. The main issue with this option is that it could result in *a lot more* lookups into the point lookup index than if the lookup were done once at the table level. Maybe implementations of `ScanPlanUpdater` could keep some sort of cache? I don't see a way to do it at the table level: the concept of a table is long gone by this point, and I can't think of a low-friction way to apply a filter to an entire `DataSourceExec`.
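To make the caching idea concrete, here is a minimal sketch of what such an implementation might look like. Everything in it is an assumption for illustration: the trait is a synchronous simplification of the `async` one above, `PartitionedFile` and `FileScanPlan` are tiny stand-ins rather than DataFusion's real types, and `PointLookupIndex`, `InMemoryIndex`, and `CachingUpdater` are hypothetical names. The point is only that the index is consulted once per `id` and the answer is reused for every file.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

// Hypothetical stand-ins, NOT DataFusion's real types.
#[derive(Debug, Clone, PartialEq)]
struct PartitionedFile {
    path: String,
}

#[derive(Debug, Clone, PartialEq)]
enum FileScanPlan {
    Scan,
    Skip,
}

/// Hypothetical external index: which files contain a given `id`?
trait PointLookupIndex {
    fn files_containing(&self, id: i64) -> HashSet<String>;
}

/// Synchronous simplification of the hook proposed above.
trait ScanPlanUpdater {
    fn rescan(&self, file: &PartitionedFile, plan: FileScanPlan) -> FileScanPlan;
}

/// Toy index that counts how often it is queried.
struct InMemoryIndex {
    entries: HashMap<i64, HashSet<String>>,
    lookups: AtomicUsize,
}

impl PointLookupIndex for InMemoryIndex {
    fn files_containing(&self, id: i64) -> HashSet<String> {
        self.lookups.fetch_add(1, Ordering::Relaxed);
        self.entries.get(&id).cloned().unwrap_or_default()
    }
}

/// Caches index answers per `id` so each id costs one lookup, not one per file.
struct CachingUpdater<I: PointLookupIndex> {
    index: I,
    /// Values taken from the dynamic filter (assumed already known here).
    ids: Vec<i64>,
    cache: Mutex<HashMap<i64, HashSet<String>>>,
}

impl<I: PointLookupIndex> ScanPlanUpdater for CachingUpdater<I> {
    fn rescan(&self, file: &PartitionedFile, plan: FileScanPlan) -> FileScanPlan {
        let mut cache = self.cache.lock().unwrap();
        // Keep the file only if the index says it holds at least one wanted id.
        let wanted = self.ids.iter().any(|id| {
            cache
                .entry(*id)
                .or_insert_with(|| self.index.files_containing(*id))
                .contains(&file.path)
        });
        if wanted { plan } else { FileScanPlan::Skip }
    }
}
```

In this sketch the lock is held while the index is queried, which serializes concurrent file opens on a cache miss. A real implementation would probably want something finer-grained, and would also need to decide when a cached answer goes stale as the dynamic filter is updated.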

Hooks to re-evaluate table level filters/indexes during file scans #17954

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Hooks to re-evaluate table level filters/indexes during file scans #17954

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions