refactor(physical-expr): add proto ctx expr helpers and adopt in InList/Like #22513
Merged
Expressions migrating to the `try_to_proto`/`try_from_proto` hooks (see apache#21835, apache#22418) repeatedly hand-roll the same boilerplate the central `datafusion-proto` match already factors out via `parse_physical_exprs`, `parse_required_physical_expr`, and `serialize_physical_exprs`. Those free functions can't be reused by expression authors: they take `PhysicalProtoConverterExtension`/`PhysicalPlanDecodeContext`, which the ctx structs deliberately hide.

Add the same shapes as thin methods on the ctx structs, built on the existing `encode_child`/`decode` primitives:

- `PhysicalExprEncodeCtx::encode_children_expressions`
- `PhysicalExprDecodeCtx::decode_required_expression`
- `PhysicalExprDecodeCtx::decode_children_expressions`

`decode_required_expression` takes the expression name and proto field so its "missing required field" error still names where the field is missing (e.g. `InListExpr is missing required field 'expr'`), while keeping the format in one place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
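As a rough illustration of the three shapes listed in this commit message, the sketch below models them on simplified stand-in types. `ProtoExpr`, `Expr`, `EncodeCtx`, and `DecodeCtx` are hypothetical and are not DataFusion's real types; the real methods live on `PhysicalExprEncodeCtx`/`PhysicalExprDecodeCtx`, and since their exact signatures are not shown here, the parameter lists and the `String` error type are assumptions.

```rust
// Hypothetical stand-ins: none of these types are DataFusion's real ones.
#[derive(Debug, Clone, PartialEq)]
struct ProtoExpr(String); // stands in for a serialized child node

#[derive(Debug, Clone, PartialEq)]
struct Expr(String); // stands in for a decoded physical expression

struct EncodeCtx;
struct DecodeCtx;

impl EncodeCtx {
    // Primitive: encode one child (assumed fallible).
    fn encode_child(&self, expr: &Expr) -> Result<ProtoExpr, String> {
        Ok(ProtoExpr(expr.0.clone()))
    }

    // Helper: encode a repeated child field, stopping at the first error.
    fn encode_children_expressions(&self, exprs: &[Expr]) -> Result<Vec<ProtoExpr>, String> {
        exprs.iter().map(|e| self.encode_child(e)).collect()
    }
}

impl DecodeCtx {
    // Primitive: decode one node (this sketch rejects an empty payload).
    fn decode(&self, proto: &ProtoExpr) -> Result<Expr, String> {
        if proto.0.is_empty() {
            return Err("cannot decode empty node".to_string());
        }
        Ok(Expr(proto.0.clone()))
    }

    // Helper: presence check plus decode, with the error format in one place.
    fn decode_required_expression(
        &self,
        field: Option<&ProtoExpr>,
        expr_name: &str,
        field_name: &str,
    ) -> Result<Expr, String> {
        let proto = field.ok_or_else(|| {
            format!("{expr_name} is missing required field '{field_name}'")
        })?;
        self.decode(proto)
    }

    // Helper: decode a repeated child field, stopping at the first error.
    fn decode_children_expressions(&self, protos: &[ProtoExpr]) -> Result<Vec<Expr>, String> {
        protos.iter().map(|p| self.decode(p)).collect()
    }
}
```

In this model, an expression's decode path would call `decode_required_expression(proto.expr.as_ref(), "InListExpr", "expr")` for each required child and `decode_children_expressions(&proto.list)` for the repeated field, rather than spelling out the `ok_or_else` check and the list map itself. The field names `expr` and `list` in that call are illustrative only.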
Replace the hand-rolled list maps and missing-required-field checks in `InListExpr` and `LikeExpr`'s `try_to_proto`/`try_from_proto` with the new `encode_children_expressions`, `decode_children_expressions`, and `decode_required_expression` ctx methods.

Behavior note: `decode_required_expression` couples the presence check with the decode, so `LikeExpr` now decodes children left-to-right rather than validating both required fields up front. The end result is unchanged (a missing required field still errors), but a present sibling is decoded before a later missing field is reported; the `try_from_proto_rejects_missing_pattern` test is updated accordingly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
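The ordering change in the behavior note can be illustrated with a toy model. Every name below (`Node`, `decode`, `decode_required`, and both `like_*` functions) is invented for illustration and is not a DataFusion API; the sketch also assumes a child decode that can fail, and its error strings are placeholders. The interesting case is an `expr` child that is present but fails to decode while `pattern` is missing.

```rust
// Toy model of the ordering change; every name here is hypothetical.
#[derive(Debug, Clone, PartialEq)]
struct Node(String);

// Assumed fallible child decode: a payload of "bad" fails.
fn decode(node: &Node) -> Result<String, String> {
    if node.0 == "bad" {
        Err(format!("failed to decode child '{}'", node.0))
    } else {
        Ok(node.0.clone())
    }
}

// Couples the presence check with the decode, as the new helper is described to do.
fn decode_required(
    field: Option<&Node>,
    expr_name: &str,
    field_name: &str,
) -> Result<String, String> {
    let node = field
        .ok_or_else(|| format!("{expr_name} is missing required field '{field_name}'"))?;
    decode(node)
}

// Old shape: validate both required fields first, then decode.
fn like_validate_up_front(
    expr: Option<&Node>,
    pattern: Option<&Node>,
) -> Result<(String, String), String> {
    let expr = expr.ok_or_else(|| "LikeExpr is missing required field 'expr'".to_string())?;
    let pattern =
        pattern.ok_or_else(|| "LikeExpr is missing required field 'pattern'".to_string())?;
    Ok((decode(expr)?, decode(pattern)?))
}

// New shape: each field is checked and decoded in turn, left to right.
fn like_left_to_right(
    expr: Option<&Node>,
    pattern: Option<&Node>,
) -> Result<(String, String), String> {
    let expr = decode_required(expr, "LikeExpr", "expr")?;
    let pattern = decode_required(pattern, "LikeExpr", "pattern")?;
    Ok((expr, pattern))
}
```

With `expr = Some(&Node("bad".into()))` and `pattern = None`, `like_validate_up_front` reports the missing `pattern` field, whereas `like_left_to_right` surfaces the child decode error first. Both still return an error, which matches the note that the end result is unchanged; only which error is reported can differ.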
c0bf17f to f899f89
Contributor
Author
@kumarUjjawal since you've reviewed some of the PRs for this EPIC, could you take a look at this PR?
Contributor
Pull request overview
This PR adds small proto encode/decode helper methods to PhysicalExprEncodeCtx / PhysicalExprDecodeCtx (in datafusion-physical-expr-common) and adopts them in LikeExpr and InListExpr to remove repeated boilerplate when implementing try_to_proto / try_from_proto for physical expressions.
Changes:
- Added `encode_children_expressions` to `PhysicalExprEncodeCtx` for encoding repeated child expression fields.
- Added `decode_required_expression` (standardized missing-required-field error) and `decode_children_expressions` to `PhysicalExprDecodeCtx`.
- Refactored `LikeExpr` and `InListExpr` proto encode/decode implementations and adjusted the affected unit tests accordingly.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| `datafusion/physical-expr-common/src/physical_expr.rs` | Adds encode/decode convenience helpers on the proto ctx types to centralize repeated patterns. |
| `datafusion/physical-expr/src/expressions/like.rs` | Uses `decode_required_expression` for required children; updates test to reflect decode-before-missing-field behavior. |
| `datafusion/physical-expr/src/expressions/in_list.rs` | Uses `decode_required_expression` / `decode_children_expressions` / `encode_children_expressions`; updates missing-field test expectation. |
kumarUjjawal approved these changes on May 25, 2026
kumarUjjawal (Contributor) left a comment:
Thanks @adriangb Looks good!
Which issue does this PR close?
- `PhysicalExpr` implementations to use `try_from_proto`/`try_to_proto` #22418 (decentralizing `datafusion-proto` serialization onto the expressions themselves; follows Split proto serialization to encapsulate private state #21835, Port LikeExpr to use try_to_proto / try_from_proto #22471, refactor: port InListExpr to use try_to_proto/try_from_proto hooks #22503).

Rationale for this change
Expressions migrating to the `try_to_proto`/`try_from_proto` hooks keep hand-rolling the same boilerplate that the central `datafusion-proto` match already factors out via `parse_physical_exprs`, `parse_required_physical_expr`, and `serialize_physical_exprs`. Those free functions can't be reused by expression authors: they take `PhysicalProtoConverterExtension`/`PhysicalPlanDecodeContext`, which the `PhysicalExprEncodeCtx`/`PhysicalExprDecodeCtx` surfaces deliberately hide.

This was raised in review on #22503: rather than re-implement the list maps and "missing required field" checks in every migrated expression, expose the same shapes on the ctx structs.
What changes are included in this PR?
- `datafusion-physical-expr-common`: three thin convenience methods, built on the existing `encode_child`/`decode` primitives:
  - `PhysicalExprEncodeCtx::encode_children_expressions`
  - `PhysicalExprDecodeCtx::decode_required_expression` (also standardizes the `Missing required field "<name>"` error so each expression no longer spells its own)
  - `PhysicalExprDecodeCtx::decode_children_expressions`
- `datafusion-physical-expr`: adopt them in `InListExpr` and `LikeExpr`, removing the hand-rolled list maps and per-field `ok_or_else` checks.

Behavior note
`decode_required_expression` couples the presence check with the decode, so `LikeExpr` now decodes children left-to-right rather than validating both required fields up front. The end result is unchanged (a missing required field still errors), but a present sibling is decoded before a later missing field is reported. The `try_from_proto_rejects_missing_pattern` unit test is updated to reflect this.

Are these changes tested?
Yes, covered by existing tests; no new ones needed:
- `proto_tests` modules in `in_list.rs` and `like.rs` already exercise all three helpers (list encode/decode, required decode, and the missing-field + child-error paths) through the migrated `try_to_proto`/`try_from_proto`.
- `datafusion-proto` round-trip integration tests (`roundtrip_inlist`, `roundtrip_like`, `roundtrip_filter_with_not_and_in_list`, `test_tpch_part_in_list_query_with_real_parquet_data`, etc.) continue to pass.
- `cargo clippy --all-targets --features proto -D warnings` is clean on the touched crates; `cargo fmt --all` applied.

Are there any user-facing changes?
No. The per-expression `missing required field` error wording is preserved: `LikeExpr` is byte-for-byte identical, and `InList` only changes its label from `InList` to `InListExpr` (the actual type name). The format now lives in one place (`decode_required_expression`) instead of being hand-spelled per expression.