refactor(physical-expr): add proto ctx expr helpers and adopt in InList/Like #22513

Merged
adriangb merged 2 commits into apache:main from pydantic:proto-expr-ctx-helpers
May 25, 2026

Conversation

@adriangb adriangb commented May 25, 2026

Which issue does this PR close?

Rationale for this change

Expressions migrating to the try_to_proto / try_from_proto hooks keep hand-rolling the same boilerplate that the central datafusion-proto match already factors out via parse_physical_exprs, parse_required_physical_expr, and serialize_physical_exprs. Those free functions can't be reused by expression authors: they take PhysicalProtoConverterExtension / PhysicalPlanDecodeContext, which the PhysicalExprEncodeCtx / PhysicalExprDecodeCtx surfaces deliberately hide.

This was raised in review on #22503 — rather than re-implement the list maps and "missing required field" checks in every migrated expression, expose the same shapes on the ctx structs.

What changes are included in this PR?

  • datafusion-physical-expr-common: three thin convenience methods, built on the existing encode_child / decode primitives:
    • PhysicalExprEncodeCtx::encode_children_expressions
    • PhysicalExprDecodeCtx::decode_required_expression (also standardizes the Missing required field "<name>" error so each expression no longer spells its own)
    • PhysicalExprDecodeCtx::decode_children_expressions
  • datafusion-physical-expr: adopt them in InListExpr and LikeExpr, removing the hand-rolled list maps and per-field ok_or_else checks.
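
The shape of the three helpers can be sketched with toy types. This is a minimal illustration, not the real DataFusion code: `ProtoNode`, `Expr`, `EncodeCtx`, `DecodeCtx`, the `String` error type, and every signature below are hypothetical stand-ins, and the error wording is assumed from the example in the commit message. It only shows how each helper could layer on the `encode_child` / `decode` primitives.

```rust
// Toy stand-ins for illustration only; these are NOT the real DataFusion types.
#[derive(Debug, Clone, PartialEq)]
pub struct ProtoNode(pub String); // pretend wire form of one expression

#[derive(Debug, Clone, PartialEq)]
pub struct Expr(pub String); // pretend in-memory physical expression

pub struct EncodeCtx;
pub struct DecodeCtx;

impl EncodeCtx {
    /// Primitive: encode a single child expression.
    pub fn encode_child(&self, expr: &Expr) -> Result<ProtoNode, String> {
        Ok(ProtoNode(expr.0.clone()))
    }

    /// Helper: encode a repeated child field, stopping at the first error.
    pub fn encode_children_expressions(
        &self,
        exprs: &[Expr],
    ) -> Result<Vec<ProtoNode>, String> {
        exprs.iter().map(|e| self.encode_child(e)).collect()
    }
}

impl DecodeCtx {
    /// Primitive: decode a single node. An empty payload stands in for a
    /// decode failure in this toy model.
    pub fn decode(&self, node: &ProtoNode) -> Result<Expr, String> {
        if node.0.is_empty() {
            Err("cannot decode empty node".to_string())
        } else {
            Ok(Expr(node.0.clone()))
        }
    }

    /// Helper: presence check plus decode, with one shared error format.
    pub fn decode_required_expression(
        &self,
        field: Option<&ProtoNode>,
        expr_name: &str,
        field_name: &str,
    ) -> Result<Expr, String> {
        let node = field.ok_or_else(|| {
            // Assumed wording, modelled on the commit message example.
            format!("{} is missing required field '{}'", expr_name, field_name)
        })?;
        self.decode(node)
    }

    /// Helper: decode a repeated child field, stopping at the first error.
    pub fn decode_children_expressions(
        &self,
        nodes: &[ProtoNode],
    ) -> Result<Vec<Expr>, String> {
        nodes.iter().map(|n| self.decode(n)).collect()
    }
}
```

Collecting an iterator of `Result` values into `Result<Vec<_>, _>` short-circuits on the first error, which is what replaces the hand-rolled list maps in this sketch.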

Behavior note

decode_required_expression couples the presence check with the decode, so LikeExpr now decodes children left-to-right rather than validating both required fields up front. The end result is unchanged (a missing required field still errors), but a present sibling is decoded before a later missing field is reported. The try_from_proto_rejects_missing_pattern unit test is updated to reflect this.
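
The ordering difference can be made concrete with a small model. This is a sketch under stated assumptions, not the actual `LikeExpr` code: `LikeNode`, its `expr` and `pattern` fields, both decode functions, and the error strings are hypothetical. An empty payload stands in for a child that is present but fails to decode.

```rust
// Hypothetical model of a LikeExpr-style node with two required children.
#[derive(Debug, Clone, PartialEq)]
pub struct ProtoNode(pub String);

pub struct LikeNode {
    pub expr: Option<ProtoNode>,
    pub pattern: Option<ProtoNode>,
}

/// Toy child decoder: an empty payload models a decode failure.
fn decode(node: &ProtoNode) -> Result<String, String> {
    if node.0.is_empty() {
        Err("child decode failed".to_string())
    } else {
        Ok(node.0.clone())
    }
}

/// Presence check coupled with the decode, as the behavior note describes.
fn decode_required(field: Option<&ProtoNode>, name: &str) -> Result<String, String> {
    let node = field
        .ok_or_else(|| format!("LikeExpr is missing required field '{}'", name))?;
    decode(node)
}

/// Earlier shape: confirm both fields are present, then decode them.
pub fn decode_validate_first(n: &LikeNode) -> Result<(String, String), String> {
    let expr = n
        .expr
        .as_ref()
        .ok_or_else(|| "LikeExpr is missing required field 'expr'".to_string())?;
    let pattern = n
        .pattern
        .as_ref()
        .ok_or_else(|| "LikeExpr is missing required field 'pattern'".to_string())?;
    Ok((decode(expr)?, decode(pattern)?))
}

/// Newer shape: each field is checked and decoded in turn, left to right.
pub fn decode_left_to_right(n: &LikeNode) -> Result<(String, String), String> {
    let expr = decode_required(n.expr.as_ref(), "expr")?;
    let pattern = decode_required(n.pattern.as_ref(), "pattern")?;
    Ok((expr, pattern))
}
```

In this model the two functions agree whenever both children are present and valid, and both report the missing field when the present sibling decodes cleanly. They differ only when an earlier child is present but undecodable while a later one is missing: the left-to-right version surfaces the child's decode error first.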

Are these changes tested?

Yes — covered by existing tests, no new ones needed:

  • The isolated proto_tests modules in in_list.rs and like.rs already exercise all three helpers (list encode/decode, required decode, and the missing-field + child-error paths) through the migrated try_to_proto / try_from_proto.
  • The datafusion-proto round-trip integration tests (roundtrip_inlist, roundtrip_like, roundtrip_filter_with_not_and_in_list, test_tpch_part_in_list_query_with_real_parquet_data, etc.) continue to pass.
  • cargo clippy --all-targets --features proto -- -D warnings is clean on the touched crates; cargo fmt --all applied.

Are there any user-facing changes?

No. The per-expression missing required field error wording is preserved: LikeExpr is byte-for-byte identical, and InList only changes its label from InList to InListExpr (the actual type name). The format now lives in one place (decode_required_expression) instead of being hand-spelled per expression.

@github-actions github-actions Bot added the physical-expr Changes to the physical-expr crates label May 25, 2026
adriangb and others added 2 commits May 25, 2026 09:50
Expressions migrating to the `try_to_proto`/`try_from_proto` hooks (see
apache#21835, apache#22418) repeatedly hand-roll the same boilerplate the central
`datafusion-proto` match already factors out via `parse_physical_exprs`,
`parse_required_physical_expr`, and `serialize_physical_exprs`. Those
free functions can't be reused by expression authors — they take
`PhysicalProtoConverterExtension`/`PhysicalPlanDecodeContext`, which the
ctx structs deliberately hide.

Add the same shapes as thin methods on the ctx structs, built on the
existing `encode_child`/`decode` primitives:

- `PhysicalExprEncodeCtx::encode_children_expressions`
- `PhysicalExprDecodeCtx::decode_required_expression`
- `PhysicalExprDecodeCtx::decode_children_expressions`

`decode_required_expression` takes the expression name and proto field
so its "missing required field" error still names where the error is
(e.g. `InListExpr is missing required field 'expr'`), while keeping the
format in one place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the hand-rolled list maps and missing-required-field checks in
`InListExpr` and `LikeExpr`'s `try_to_proto`/`try_from_proto` with the
new `encode_children_expressions`, `decode_children_expressions`, and
`decode_required_expression` ctx methods.

Behavior note: `decode_required_expression` couples the presence check
with the decode, so `LikeExpr` now decodes children left-to-right rather
than validating both required fields up front. The end result is
unchanged (a missing required field still errors), but a present sibling
is decoded before a later missing field is reported; the
`try_from_proto_rejects_missing_pattern` test is updated accordingly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@adriangb adriangb force-pushed the proto-expr-ctx-helpers branch from c0bf17f to f899f89 Compare May 25, 2026 14:53
@adriangb adriangb commented

@kumarUjjawal since you've reviewed some of the PRs for this EPIC could you take a look at this PR?

Copilot AI left a comment

Pull request overview

This PR adds small proto encode/decode helper methods to PhysicalExprEncodeCtx / PhysicalExprDecodeCtx (in datafusion-physical-expr-common) and adopts them in LikeExpr and InListExpr to remove repeated boilerplate when implementing try_to_proto / try_from_proto for physical expressions.

Changes:

  • Added encode_children_expressions to PhysicalExprEncodeCtx for encoding repeated child expression fields.
  • Added decode_required_expression (standardized missing-required-field error) and decode_children_expressions to PhysicalExprDecodeCtx.
  • Refactored LikeExpr and InListExpr proto encode/decode implementations + adjusted the affected unit tests accordingly.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| datafusion/physical-expr-common/src/physical_expr.rs | Adds encode/decode convenience helpers on the proto ctx types to centralize repeated patterns. |
| datafusion/physical-expr/src/expressions/like.rs | Uses decode_required_expression for required children; updates test to reflect decode-before-missing-field behavior. |
| datafusion/physical-expr/src/expressions/in_list.rs | Uses decode_required_expression / decode_children_expressions / encode_children_expressions; updates missing-field test expectation. |

@kumarUjjawal kumarUjjawal left a comment

Thanks @adriangb, looks good!

@adriangb adriangb added this pull request to the merge queue May 25, 2026
Merged via the queue into apache:main with commit fcc9cc4 May 25, 2026
38 of 39 checks passed
@adriangb adriangb deleted the proto-expr-ctx-helpers branch May 25, 2026 16:26