
proto: preserve DynamicFilterPhysicalExpr identity across round-trip#21786

Open
adriangb wants to merge 7 commits into apache:main from adriangb:ser-filters

Conversation

Contributor

@adriangb adriangb commented Apr 22, 2026

Which issue does this PR close?

Rationale for this change

Given a plan like

HashJoinExec(dynamic_filter_1 on a@0)
  (...left side of join)
  ProjectionExec(a := Column("a", source_index))
    DataSourceExec
      ParquetSource(predicate = dynamic_filter_2)

after serialize/deserialize, the two DynamicFilterPhysicalExpr wrappers should still share the same mutable inner, so that a HashJoinExec update during execution is visible at the pushed-down ParquetSource and the scan prunes rows. Today this breaks for three reasons:

  1. serialize_physical_expr_with_converter calls snapshot_physical_expr, which replaces DynamicFilterPhysicalExpr with its current inner expression (often lit(true)) — identity is lost.
  2. The existing dedup key hashes the outer Arc::as_ptr, but the HashJoinExec side and the ParquetSource side hold different outer Arcs (one comes from with_new_children), so they never share expr_id.
  3. Even if the filter survived the top-level trip, HashJoinExec and SortExec (TopK) don't serialize their own dynamic_filter field — on deserialize they mint a fresh filter that's disconnected from anything pushed into the scan.
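
Reason 2 can be illustrated with a minimal, self-contained sketch. The `Wrapper` and `Inner` types and the `pointer_key` / `id_key` helpers below are hypothetical stand-ins, not the DataFusion types; they only model one outer Arc per site around a shared `Arc<RwLock<...>>`, and the pointer key is simplified (no session id or pid).

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for the shared mutable state of a dynamic filter.
struct Inner {
    current: String,
}

/// Hypothetical stand-in for the outer wrapper: per-site children plus a
/// handle to the shared inner state and an id that travels with it.
struct Wrapper {
    id: u64,
    children: Vec<String>,
    inner: Arc<RwLock<Inner>>,
}

impl Wrapper {
    fn new(id: u64, children: Vec<String>) -> Arc<Self> {
        let inner = Arc::new(RwLock::new(Inner {
            current: "true".to_string(),
        }));
        Arc::new(Self { id, children, inner })
    }

    /// Allocates a new outer Arc that keeps the id and shares `inner`.
    fn with_new_children(&self, children: Vec<String>) -> Arc<Self> {
        Arc::new(Self {
            id: self.id,
            children,
            inner: Arc::clone(&self.inner),
        })
    }

    fn update(&self, expr: &str) {
        self.inner.write().unwrap().current = expr.to_string();
    }

    fn current(&self) -> String {
        self.inner.read().unwrap().current.clone()
    }
}

/// Simplified pointer-based key: differs for every outer allocation.
fn pointer_key(expr: &Arc<Wrapper>) -> usize {
    Arc::as_ptr(expr) as usize
}

/// Id-based key: identical for every wrapper derived from the same filter.
fn id_key(expr: &Wrapper) -> u64 {
    expr.id
}
```

In this model, a wrapper produced by `with_new_children` gets a different `pointer_key` from its source while `id_key` stays the same, which is the property the new dedup key relies on.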

What changes are included in this PR?

Seven commits, grouped by concern. Refactor commits change API shape only and should be reviewable independently. Change commits add behavior.

Base

  1. proto: preserve DynamicFilterPhysicalExpr identity across round-trip — Adds PhysicalExpr::expression_id(&self) -> Option<u64> (default None). DynamicFilterPhysicalExpr gains a stable u64 id that follows the shared Arc<RwLock<Inner>> across with_new_children. New proto variant PhysicalDynamicFilterExprNode { current_expr, children, generation, is_complete } — the inner expr tree is serialized natively instead of snapshotted away. DeduplicatingSerializer stamps expr_id = expr.expression_id() (no more Arc::as_ptr/session_id/pid hashing). DeduplicatingDeserializer uses a single unified cache path: on miss, parse + cache; on hit, parse to recover this site's children and overlay via with_new_children on the cached canonical — gives DynamicFilter its shared-inner semantics without any type-specific code in the deserializer. Three proto-level round-trip tests cover shared-inner preservation, per-site remapped children with update propagation, and the FilterExec → ProjectionExec → DataSourceExec(ParquetSource) shape.
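
The unified cache path described above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the code in this PR: `Expr`, `Node`, `parse`, and `DedupCache` are hypothetical names, children are plain strings, and the shared state is a single `String`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical simplified expression: per-site children plus shared state.
struct Expr {
    children: Vec<String>,
    inner: Arc<RwLock<String>>,
}

impl Expr {
    /// New outer wrapper with different children but the same shared state.
    fn with_new_children(&self, children: Vec<String>) -> Arc<Expr> {
        Arc::new(Expr {
            children,
            inner: Arc::clone(&self.inner),
        })
    }
}

/// Hypothetical serialized form: optional id, this site's children, state.
struct Node {
    expr_id: Option<u64>,
    children: Vec<String>,
    current: String,
}

/// Stand-in for ordinary parsing with no deduplication.
fn parse(node: &Node) -> Arc<Expr> {
    Arc::new(Expr {
        children: node.children.clone(),
        inner: Arc::new(RwLock::new(node.current.clone())),
    })
}

#[derive(Default)]
struct DedupCache {
    by_id: HashMap<u64, Arc<Expr>>,
}

impl DedupCache {
    fn deserialize(&mut self, node: &Node) -> Arc<Expr> {
        // Always parse, so this site's own children are recovered.
        let parsed = parse(node);
        let id = match node.expr_id {
            Some(id) => id,
            None => return parsed, // no identity, nothing to share
        };
        if let Some(canonical) = self.by_id.get(&id) {
            // Hit: overlay this site's children on the cached canonical.
            return canonical.with_new_children(parsed.children.clone());
        }
        // Miss: the first occurrence becomes the canonical instance.
        self.by_id.insert(id, Arc::clone(&parsed));
        parsed
    }
}
```

Nothing in `deserialize` inspects the expression type; the sharing falls out of `with_new_children` keeping the inner handle, which mirrors the "no type-specific code in the deserializer" claim.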

Refactor

  1. physical-expr: builder setters on DynamicFilterPhysicalExpr — Replaces a growing 5-arg constructor with fluent with_id(u64) / with_generation(u64) / with_is_complete(bool) setters on Self, following the HashJoinExecBuilder style. new(children, inner) keeps the default case.
  2. hash_join: public builder setter + accessor for the dynamic filter — Makes HashJoinExecBuilder::with_dynamic_filter public and changes its signature from the crate-private Option<HashJoinExecDynamicFilter> to plain Arc<DynamicFilterPhysicalExpr>, wrapping it internally. The inner SharedBuildAccumulator stays private; only the filter crosses the API boundary. Promotes dynamic_filter_for_test to dynamic_filter().
  3. sort: add SortExec::with_dynamic_filter builder setter + accessor — Mirror for SortExec. Fluent setter on Self + a dynamic_filter() accessor that reads through the TopKDynamicFilters wrapper.
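
For readers unfamiliar with the fluent-setter shape, here is a minimal sketch of the definition side. `FilterState` is a hypothetical struct used only for illustration, and its `new()` uses a fixed placeholder id of 0 to stay deterministic, whereas the real `new()` picks a random id.

```rust
/// Hypothetical stand-in for the state restored on the deserialize side.
#[derive(Debug, Clone, PartialEq)]
struct FilterState {
    id: u64,
    generation: u64,
    is_complete: bool,
}

impl FilterState {
    /// Default case: generation 1, not complete (placeholder id 0 here).
    fn new() -> Self {
        Self {
            id: 0,
            generation: 1,
            is_complete: false,
        }
    }

    /// Each setter consumes `self` and returns it, so calls can be chained.
    fn with_id(mut self, id: u64) -> Self {
        self.id = id;
        self
    }

    fn with_generation(mut self, generation: u64) -> Self {
        self.generation = generation;
        self
    }

    fn with_is_complete(mut self, is_complete: bool) -> Self {
        self.is_complete = is_complete;
        self
    }
}
```

Because each setter takes `self` by value, it can only be applied before the value is placed behind a shared `Arc`, which matches the stated intent that the setters are for deserialization rather than live mutation.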

Change

  1. proto: round-trip HashJoinExec dynamic_filter — Adds optional PhysicalExprNode dynamic_filter to HashJoinExecNode. to_proto emits the filter via the new accessor; from_proto parses it (hitting the id cache populated by the scan-side pushed-down predicate) and installs it via HashJoinExecBuilder::with_dynamic_filter. Because the cache returns the same canonical Arc, the join's filter and the scan's filter share inner automatically.
  2. proto: round-trip SortExec's TopK dynamic filter — Same pattern for SortExecNode. with_fetch may auto-create a TopK filter; with_dynamic_filter then replaces it with the sender's so the id matches the pushed-down scan's copy.
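
The ordering concern in the SortExec commit (an auto-created filter being replaced by the sender's) can be modelled with a small sketch. `Filter` and `Sort` are hypothetical simplifications; in particular, the auto-created filter here always has id 0, which is only a placeholder for "a fresh, unrelated filter".

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a dynamic filter, reduced to its id.
struct Filter {
    id: u64,
}

/// Hypothetical stand-in for a sort node with an optional TopK filter.
struct Sort {
    fetch: Option<usize>,
    filter: Option<Arc<Filter>>,
}

impl Sort {
    fn new() -> Self {
        Self {
            fetch: None,
            filter: None,
        }
    }

    /// Setting a fetch limit auto-creates a fresh filter in this model.
    fn with_fetch(mut self, fetch: Option<usize>) -> Self {
        self.fetch = fetch;
        self.filter = fetch.map(|_| Arc::new(Filter { id: 0 }));
        self
    }

    /// Replaces whatever filter is installed with a caller-provided one.
    fn with_dynamic_filter(mut self, filter: Arc<Filter>) -> Self {
        self.filter = Some(filter);
        self
    }

    fn dynamic_filter(&self) -> Option<Arc<Filter>> {
        self.filter.clone()
    }
}
```

Calling `with_dynamic_filter` after `with_fetch` is what makes the node end up holding the same Arc as the scan; in this model, the reverse order would overwrite the sender's filter with a fresh one.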

Integration tests

  1. test: add datafusion-tests workspace crate for cross-crate integration tests — Introduces a new datafusion-tests workspace member at datafusion/tests/. Tests living here can dev-depend on both datafusion and datafusion-proto simultaneously; putting them in either of those crates' own tests/ directory closes a dev-dep cycle caught by dev/depcheck. Hosts two end-to-end SQL tests that prove the round-trip actually drives scan-level pruning (see "Are these changes tested?" below).

Comparison with #20416

PR #20416 introduces a generic PhysicalExprId { exact, shallow } and keeps an Arc-ptr default so every expression is stamped. This PR instead makes the identity hook opt-in per type (expression_id()), restricting the blast radius to the one type that needs it today. A follow-up can implement expression_id() on other types (e.g. InList, literals) to re-introduce generic within-process dedup on a case-by-case basis.

Are these changes tested?

Yes.

Unit tests (datafusion/physical-expr/src/expressions/dynamic_filters.rs):

  • test_expression_id_stable_across_with_new_children — the id survives with_new_children.

Proto-level round-trip tests (datafusion/proto/tests/cases/roundtrip_physical_plan.rs):

  • roundtrip_dynamic_filter_preserves_shared_inner — two wrappers in a BinaryExpr(And) predicate share inner after round-trip; an update() on one is observable via the other.
  • roundtrip_dynamic_filter_preserves_remapped_children — two wrappers with different effective children; each preserves its site-specific projection, both share identity, update() propagates, and current() on the remapped side applies the column substitution.
  • roundtrip_dynamic_filter_in_parquet_pushdown — the plan shape from the PR description (FilterExec → ProjectionExec → DataSourceExec(ParquetSource)).

End-to-end SQL tests (datafusion/tests/tests/dynamic_filter_proto_roundtrip.rs):

  • hash_join_dynamic_filter_prunes_via_sql — registers a parquet file, runs INNER JOIN ... WHERE b.n_nationkey = 5, round-trips via DeduplicatingProtoConverter, executes, and asserts the probe-side scan emitted strictly fewer rows than the full table.
  • topk_dynamic_filter_prunes_files_via_sql — writes two single-row parquet files (a.parquet key=1, b.parquet key=2), runs ORDER BY ... LIMIT 1 with target_partitions=1, round-trips, executes, and asserts the scan emitted exactly 1 row. b.parquet is pruned by row-group statistics once TopK's dynamic filter tightens after reading a.parquet.

Removed: the pre-existing test_expression_deduplication_arc_sharing, test_deduplication_within_plan_deserialization, test_deduplication_within_expr_deserialization, and two test_session_id_rotation_* tests — they asserted the generic Arc-ptr dedup contract that this PR deliberately drops.

CI is green: cargo fmt, clippy, circular dependency check, detect-unused-dependencies, all cargo test matrixes pass.

Are there any user-facing changes?

  • New trait method PhysicalExpr::expression_id(&self) -> Option<u64> with a safe default (None). Existing implementations keep compiling.
  • New public DynamicFilterPhysicalExpr surface: with_id, with_generation, with_is_complete, is_complete(). The Debug impl now hides the random id field so plan snapshots stay deterministic.
  • HashJoinExecBuilder::with_dynamic_filter promoted to pub with an Arc<DynamicFilterPhysicalExpr> signature; HashJoinExec::dynamic_filter() accessor promoted from dynamic_filter_for_test. Callers of the renamed method need to update.
  • New SortExec::with_dynamic_filter setter + SortExec::dynamic_filter() accessor.
  • DeduplicatingProtoConverter no longer deduplicates arbitrary expressions by Arc pointer. Plans that relied on that (e.g. for a large shared InList) will serialize independently until a per-type expression_id() is added. Dynamic filters now round-trip with shared state, which is the primary motivation.
  • New workspace member datafusion-tests (internal; publish = false) hosting cross-crate integration tests.
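
To show why the new trait method is source-compatible, here is a minimal sketch of an opt-in identity hook with a default body. `PhysicalExprLike`, `Literal`, `DynamicFilterLike`, and `stamp` are hypothetical names for illustration; the real `PhysicalExpr` trait has many more methods.

```rust
/// Hypothetical, cut-down trait with an opt-in identity hook.
trait PhysicalExprLike {
    fn name(&self) -> String;

    /// Defaults to `None`, so existing implementors need no changes.
    fn expression_id(&self) -> Option<u64> {
        None
    }
}

/// An implementor written before the hook existed: it never mentions it.
struct Literal(i64);

impl PhysicalExprLike for Literal {
    fn name(&self) -> String {
        format!("lit({})", self.0)
    }
}

/// An implementor that opts in by reporting a stable id.
struct DynamicFilterLike {
    id: u64,
}

impl PhysicalExprLike for DynamicFilterLike {
    fn name(&self) -> String {
        "dynamic_filter".to_string()
    }

    fn expression_id(&self) -> Option<u64> {
        Some(self.id)
    }
}

/// What a stateless serializer would stamp onto the serialized node.
fn stamp(expr: &dyn PhysicalExprLike) -> Option<u64> {
    expr.expression_id()
}
```

Types that do not override the method are stamped with `None` and are therefore serialized independently, which is the behavior change called out for shared InList expressions above.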

🤖 Generated with Claude Code

@github-actions github-actions Bot added the physical-expr (Changes to the physical-expr crates) and proto (Related to proto crate) labels Apr 22, 2026
@adriangb adriangb force-pushed the ser-filters branch 2 times, most recently from 2df2293 to 955117c Compare April 23, 2026 00:35
@github-actions github-actions Bot added the core (Core DataFusion crate) and physical-plan (Changes to the physical-plan crate) labels Apr 23, 2026
adriangb and others added 4 commits April 23, 2026 07:13
Without this change a plan with a DynamicFilterPhysicalExpr referenced
from two sites (e.g. a HashJoinExec and a pushed-down ParquetSource
predicate) loses referential integrity when serialized and deserialized:
the filter is snapshotted away at serialize time, and even if it
survived, the existing Arc-pointer dedup scheme would give the two sites
different ids because the pushed-down side comes from `with_new_children`
with a different outer Arc.

Changes:

- `PhysicalExpr::expression_id(&self) -> Option<u64>` new trait method,
  defaulting to None. Only DynamicFilterPhysicalExpr reports an identity.
- DynamicFilterPhysicalExpr stores a stable random u64 id that follows
  the shared inner Arc through `with_new_children`. Custom Debug hides
  the random id so plan snapshots stay deterministic.
- New proto variant PhysicalDynamicFilterExprNode carrying the current
  expression, the site's children view, generation, and is_complete.
- serialize_physical_expr_with_converter stops calling
  snapshot_physical_expr at the top — dynamic filters survive as
  themselves. HashTableLookupExpr still gets the lit(true) replacement.
- DeduplicatingSerializer is now stateless: it stamps
  `expr_id = expr.expression_id()`. The old session_id/Arc::as_ptr/pid
  hashing is dropped; dedup only fires for expressions that opt in via
  expression_id. Restoring within-process dedup for other types is a
  follow-up (implement expression_id on InList, literals, etc.).
- DeduplicatingDeserializer has a single unified cache path: on miss
  parse + cache, on hit parse once to recover this site's children and
  overlay via `with_new_children` on the cached canonical. This gives
  DynamicFilter its shared-inner semantics without any type-specific
  code in the deserializer.
- Three integration tests in roundtrip_physical_plan.rs cover shared
  inner preservation, per-site remapped children + update propagation
  including column remap, and the FilterExec → ProjectionExec →
  DataSourceExec(ParquetSource) shape from the PR description.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the 5-arg with_id_and_state(id, children, inner, generation,
is_complete) constructor with three fluent setters on Self that match
the HashJoinExecBuilder style used elsewhere in the crate:

    DynamicFilterPhysicalExpr::new(children, inner)
        .with_id(id)
        .with_generation(generation)
        .with_is_complete(is_complete)

new() keeps the default case (random id, generation 1, not complete).
The setters are intended for the deserialize side of proto round-trip
and assume the filter hasn't been shared yet; update() and
mark_complete() remain the correct path for live mutation.

No behavior change — only from_proto is migrated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Change HashJoinExecBuilder::with_dynamic_filter from a crate-private
`Option<HashJoinExecDynamicFilter>` setter into a public API that takes
just `Arc<DynamicFilterPhysicalExpr>` and wraps it internally. The inner
`HashJoinExecDynamicFilter` and its `SharedBuildAccumulator` stay
private; only the filter itself crosses the API boundary.

Promote `dynamic_filter_for_test` to a plain `pub fn dynamic_filter()`
accessor — proto serialization has a legitimate non-test use for it.
Existing test caller migrated.

No behavior change. Sets up HashJoinExec.dynamic_filter to be
round-trippable through proto in the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirroring HashJoinExecBuilder::with_dynamic_filter: expose a fluent
setter on SortExec that installs a caller-provided
`Arc<DynamicFilterPhysicalExpr>` as the TopK dynamic filter, replacing
any auto-created one. Add a matching `dynamic_filter()` accessor that
reads through the `TopKDynamicFilters` wrapper.

No behavior change. Sets up SortExec.filter to be round-trippable
through proto in the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add optional PhysicalExprNode dynamic_filter field to HashJoinExecNode.
Emit it from to_proto via the new HashJoinExec::dynamic_filter()
accessor; on deserialize, parse it and install via
HashJoinExecBuilder::with_dynamic_filter.

Critically, because the scan-side pushed-down predicate is already in
the id cache by the time we deserialize the HashJoinExec's field, the
cache hit returns the same canonical Arc — the join's filter and the
scan's filter share `inner` automatically. Build-side update() during
execution is observed by the scan, and the scan prunes rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add optional PhysicalExprNode dynamic_filter field to SortExecNode.
Emit from to_proto via SortExec::dynamic_filter(); on deserialize parse
it and install via SortExec::with_dynamic_filter after the usual
with_fetch / with_preserve_partitioning chain. The with_fetch step may
auto-create a TopK filter when fetch is set; with_dynamic_filter then
replaces it with the one from the sender so the id matches the
pushed-down scan's copy (shared via the id cache).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@adriangb adriangb force-pushed the ser-filters branch 3 times, most recently from 1136f73 to 5ff6a25 Compare April 23, 2026 14:12
…n tests

Introduce a new workspace member `datafusion-tests` at datafusion/tests
dedicated to integration tests that need to depend on both `datafusion`
and `datafusion-proto`. Placing such tests in either of those crates'
own `tests/` directory would close a dev-dependency cycle caught by the
workspace's circular-dependency check (see `dev/depcheck`).

The first two tests exercise end-to-end SQL round-trips proving the
DynamicFilterPhysicalExpr wiring added in preceding commits actually
drives scan-level pruning:

- `hash_join_dynamic_filter_prunes_via_sql`: INNER JOIN with a WHERE
  on the build side; after round-trip the probe-side `ParquetSource`
  emits fewer rows than the full table.
- `topk_dynamic_filter_prunes_files_via_sql`: ORDER BY ... LIMIT 1
  over two single-row parquet files; after round-trip the second file
  is pruned by row-group statistics because TopK's dynamic filter
  tightens after reading the first, and the scan sees exactly one row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@adriangb adriangb marked this pull request as ready for review April 23, 2026 14:25

Labels

core (Core DataFusion crate), physical-expr (Changes to the physical-expr crates), physical-plan (Changes to the physical-plan crate), proto (Related to proto crate)

Development

Successfully merging this pull request may close these issues.

Serialize dynamic filters across network boundaries
