IcebergTableProvider::supports_filters_pushdown marks every filter as Inexact, causing a redundant FilterExec above IcebergTableScan #2363

@toutane

Description

Is your feature request related to a problem or challenge?

IcebergTableProvider::supports_filters_pushdown in crates/integrations/datafusion/src/table/mod.rs:163-169 unconditionally returns TableProviderFilterPushDown::Inexact for every filter:

fn supports_filters_pushdown(
    &self,
    filters: &[&Expr],
) -> DFResult<Vec<TableProviderFilterPushDown>> {
    Ok(vec![TableProviderFilterPushDown::Inexact; filters.len()])
}

Inexact tells DataFusion: "the scan may apply this filter, but don't trust it; re-evaluate it above the scan." DataFusion therefore keeps a FilterExec on top of IcebergTableScan that re-evaluates the same predicate on every row the scan emits.
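To make the planner-side effect concrete, here is a minimal, self-contained sketch. It does not use the real DataFusion types: `PushDown` is a toy stand-in for `TableProviderFilterPushDown`, and `residual_filters` is a hypothetical helper that models which filters still need a FilterExec above the scan.

```rust
// Toy stand-in for DataFusion's TableProviderFilterPushDown (not the real type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum PushDown {
    Unsupported, // scan ignores the filter; it must be evaluated above the scan
    Inexact,     // scan may use the filter, but it is still re-evaluated above
    Exact,       // scan guarantees the filter; nothing is needed above
}

/// Hypothetical helper: returns the filters that must still be evaluated
/// above the scan, given the provider's answer for each filter.
fn residual_filters<'a>(filters: &[&'a str], support: &[PushDown]) -> Vec<&'a str> {
    filters
        .iter()
        .zip(support)
        // Only Exact lets the planner drop the filter above the scan.
        .filter(|(_, s)| **s != PushDown::Exact)
        .map(|(f, _)| *f)
        .collect()
}

fn main() {
    let filters = ["foo = 1"];

    // Current behaviour: every filter is Inexact, so the filter is kept above the scan.
    assert_eq!(residual_filters(&filters, &[PushDown::Inexact]), vec!["foo = 1"]);

    // With Exact, no residual filter remains.
    assert!(residual_filters(&filters, &[PushDown::Exact]).is_empty());

    // Unsupported filters are also kept above the scan.
    assert_eq!(residual_filters(&filters, &[PushDown::Unsupported]), vec!["foo = 1"]);
}
```

In this model, the residual list is empty only when the provider answers Exact, which is why the current blanket Inexact always leaves a FilterExec in the plan.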

However, the scan already applies the predicate exactly. In crates/iceberg/src/arrow/reader.rs:244-256, the bound predicate is pushed into the Parquet reader as an Arrow RowFilter (ArrowPredicateFn), so every batch yielded by the scan already satisfies the filter. The FilterExec above it is pure overhead.

Reproducer

EXPLAIN ANALYZE output for the query SELECT * FROM t WHERE foo = 1 (a filter that converts losslessly to an Iceberg Predicate):

CoalesceBatchesExec: ..., metrics=[output_rows=1, elapsed_compute=19µs]
  FilterExec: foo@0 = 1, metrics=[output_rows=1, elapsed_compute=90µs]     <-- redundant
    RepartitionExec: ..., metrics=[fetch_time=4.4ms, ...]
      IcebergTableScan predicate:[foo = 1] metrics=[]

FilterExec evaluates foo@0 = 1 a second time on every row the scan emits.

Describe the solution you'd like

In supports_filters_pushdown, return TableProviderFilterPushDown::Exact for filters that convert_filter_to_predicate translates losslessly into an Iceberg Predicate, and Inexact (or Unsupported) otherwise.

The non-trivial part is detecting lossy conversions, that is, cases where the resulting Predicate could match more rows than the original filter.
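As a rough illustration of the proposed classification, the sketch below uses toy types only. `Expr`, `PushDown`, `converts_losslessly`, and this free-standing `supports_filters_pushdown` are all hypothetical stand-ins, not the DataFusion or iceberg-rust APIs. It assumes, purely for illustration, that a conversion becomes lossy when any sub-expression (here `Expr::Opaque`) cannot be translated, for example when one side of an AND is dropped.

```rust
// Toy expression tree (hypothetical; not datafusion::logical_expr::Expr).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
    /// Stands in for anything the converter cannot translate (assumption).
    Opaque(String),
}

// Toy stand-in for TableProviderFilterPushDown.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum PushDown {
    Unsupported,
    Inexact,
    Exact,
}

/// Hypothetical check: true only if every node of the filter is translatable,
/// so no part of the filter would be silently dropped during conversion.
fn converts_losslessly(expr: &Expr) -> bool {
    match expr {
        Expr::Eq(l, r) => matches!(
            (&**l, &**r),
            (Expr::Column(_), Expr::Literal(_)) | (Expr::Literal(_), Expr::Column(_))
        ),
        Expr::And(l, r) => converts_losslessly(l) && converts_losslessly(r),
        _ => false,
    }
}

/// Sketch of the proposed per-filter answer: Exact when lossless, Inexact otherwise.
fn supports_filters_pushdown(filters: &[&Expr]) -> Vec<PushDown> {
    filters
        .iter()
        .map(|f| if converts_losslessly(f) { PushDown::Exact } else { PushDown::Inexact })
        .collect()
}

/// Convenience constructor for `name = value`.
fn col_eq(name: &str, value: i64) -> Expr {
    Expr::Eq(Box::new(Expr::Column(name.to_string())), Box::new(Expr::Literal(value)))
}

fn main() {
    let simple = col_eq("foo", 1);
    // One side of the AND cannot be translated, so the conversion would be lossy.
    let partial = Expr::And(
        Box::new(col_eq("foo", 1)),
        Box::new(Expr::Opaque("my_udf(bar)".to_string())),
    );

    let answers = supports_filters_pushdown(&[&simple, &partial]);
    assert_eq!(answers, vec![PushDown::Exact, PushDown::Inexact]);
}
```

The conservative default matters here: anything the check does not recognise falls back to Inexact, so a misclassification can cost performance but never correctness.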

Willingness to contribute

I would be willing to contribute to this feature with guidance from the Iceberg Rust community

Labels: enhancement