Preserve logical cast field semantics during physical lowering and schema rewrite by kosiew · Pull Request #20836 · apache/datafusion

kosiew · 2026-03-10T03:21:42Z

Which issue does this PR close?

Part of Extend CastExpr and Converge on Single Cast Expression #20164

Rationale for this change

Physical lowering in planner.rs was dropping logical cast field semantics by forwarding only the target DataType when building physical cast expressions. That meant metadata-bearing target fields were rejected or reduced to type-only behavior, and planner-produced casts could diverge from adapter-produced casts.

This patch fixes that gap by preserving the logical target Field through lowering and related rewrite paths. It also aligns cast behavior across the planner and schema rewriter, so field metadata and logical nullability intent remain available in the produced physical expression.
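The sketch below shows why type-only lowering loses information. It is not DataFusion or arrow-rs code. `DataType`, `Field`, `PhysicalCast`, `lower_type_only`, and `lower_field_aware` are hypothetical, simplified stand-ins, assumed only to show the difference between forwarding a bare type and forwarding the whole target field.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for Arrow's `DataType`.
#[derive(Clone, Debug, PartialEq)]
pub enum DataType {
    Int32,
    Int64,
}

/// Hypothetical, simplified stand-in for Arrow's `Field`.
#[derive(Clone, Debug, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
    pub metadata: HashMap<String, String>,
}

/// Hypothetical physical cast node that records the field it produces.
#[derive(Clone, Debug, PartialEq)]
pub struct PhysicalCast {
    pub output: Field,
}

/// Type-only lowering: only the target `DataType` is forwarded, so the
/// output field is rebuilt from defaults (input name, nullable, no metadata).
pub fn lower_type_only(input_name: &str, logical_target: &Field) -> PhysicalCast {
    PhysicalCast {
        output: Field {
            name: input_name.to_string(),
            data_type: logical_target.data_type.clone(),
            nullable: true,
            metadata: HashMap::new(),
        },
    }
}

/// Field-aware lowering: the whole logical target field is carried through.
pub fn lower_field_aware(logical_target: &Field) -> PhysicalCast {
    PhysicalCast {
        output: logical_target.clone(),
    }
}
```

In this model the type-only path still produces the right `DataType`, which is why the loss is easy to miss: only the name, nullability, and metadata silently change.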

What changes are included in this PR?

This PR updates physical cast construction to be field-aware end to end.

- Replaces type-only cast lowering for Expr::Cast in planner.rs with field-aware lowering using the logical target FieldRef.
- Removes the previous planner restriction that rejected metadata-bearing cast target fields purely because the lowering path only supported type-only casts.
- Adds cast_with_target_field_and_options so physical cast construction can preserve target field metadata and nullability semantics.
- Optimizes same-type casts so they are still elided when there is no semantic difference, while preserving an explicit CastExpr when the target field carries distinct field semantics such as name, metadata, or nullability.
- Updates CastExpr::nullable() so runtime nullability follows the child expression, while logical target field nullability remains exposed through return_field().
- Updates the schema rewriter to build field-aware CastExpr instances instead of CastColumnExpr, including name-based physical column resolution when indexes differ.
- Adds regression coverage to ensure planner- and adapter-produced casts behave consistently for standard casts, metadata-bearing casts, same-type casts, struct casts, and mismatched column index scenarios.
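The same-type elision rule can be sketched as a single comparison. This is a simplified model under stated assumptions, not the PR's implementation: `Field`, `DataType`, `Lowered`, and `lower_cast` are hypothetical, and "no semantic difference" is assumed to mean the type, name, nullability, and metadata all match.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified type enum for this sketch.
#[derive(Clone, Debug, PartialEq)]
pub enum DataType {
    Int32,
    Int64,
}

/// Hypothetical, simplified field: type plus the semantics a cast may change.
#[derive(Clone, Debug, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
    pub metadata: HashMap<String, String>,
}

/// Result of lowering a logical cast in this sketch.
#[derive(Debug, PartialEq)]
pub enum Lowered {
    /// No semantic difference: the cast is elided and the input is reused.
    Elided,
    /// An explicit cast node is kept, carrying the target field.
    Cast(Field),
}

pub fn lower_cast(input: &Field, target: &Field) -> Lowered {
    let same_type = input.data_type == target.data_type;
    // Field semantics beyond the type that an explicit target may change.
    let same_field_semantics = input.name == target.name
        && input.nullable == target.nullable
        && input.metadata == target.metadata;
    if same_type && same_field_semantics {
        Lowered::Elided
    } else {
        Lowered::Cast(target.clone())
    }
}
```

A same-type cast whose target only renames the column is kept as a cast here, while a target identical to the input is dropped.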

Are these changes tested?

Yes.

This PR adds and updates unit tests in the affected areas to cover:

- planner lowering preserving target field metadata
- planner lowering preserving logical target field semantics for nullability and field name
- same-type casts being preserved when the explicit target field differs semantically
- default same-type casts still being elided
- standard non-metadata casts continuing to validate and lower correctly
- schema rewriter use of field-aware CastExpr
- regression coverage for resolving the correct physical column by name rather than relying on a stale column index
- struct cast compatibility and field-aware cast expectations
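The stale-index scenario can be illustrated with a toy schema. `Schema`, `ColumnRef`, `resolve_by_index`, and `resolve_by_name` are hypothetical helpers, assuming a column reference records both a name and an index taken from the logical schema, while the physical file may order its columns differently.

```rust
/// Hypothetical, simplified physical schema: an ordered list of column names.
pub struct Schema {
    pub columns: Vec<String>,
}

/// Hypothetical column reference captured against the logical schema.
pub struct ColumnRef {
    pub name: String,
    pub index: usize,
}

/// Trusts the recorded index, which is stale if the file's column order differs.
pub fn resolve_by_index(col: &ColumnRef, physical: &Schema) -> Option<usize> {
    (col.index < physical.columns.len()).then_some(col.index)
}

/// Looks the column up by name in the physical schema instead.
pub fn resolve_by_name(col: &ColumnRef, physical: &Schema) -> Option<usize> {
    physical.columns.iter().position(|c| *c == col.name)
}
```

With a physical schema of `["b", "a"]` and a reference to `a` recorded at index 0, index-based resolution silently points at `b`, while name-based resolution finds `a` at index 1.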

Are there any user-facing changes?

There are no intended user-facing API changes, but physical cast expressions now preserve logical field metadata and nullability intent more faithfully during lowering. This should improve correctness and consistency for queries that depend on field metadata or logical cast semantics.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

Update cast handling in planner.rs to retain the logical target FieldRef metadata during cast lowering. Introduce a new field-aware helper in cast.rs that keeps metadata, name, and nullability intent intact, even for same-type casts. Update planner tests to verify:
- preservation of metadata and nullability in lowered casts
- CastExpr production for same-type casts with different field semantics
- regression checks for standard non-metadata casts
- CAST retaining extension metadata while TRY_CAST still rejects it

Make `cast_with_target_field_and_options` crate-internal. Remove the public re-export of this planner-only helper while keeping planner access through the internal module path, and make the `cast` module visible within the crate.

Make the `cast` module private again. Expose the helper crate-internally from expressions::mod, and update the planner to use this internal entry point instead of reaching into the module directly.
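The resulting visibility layout follows a common Rust pattern: a private module whose helper is re-exported with `pub(crate)` from its parent. In the sketch below, `cast_helper` is a hypothetical stand-in for `cast_with_target_field_and_options`; its signature and body are invented, and only the module and visibility structure is the point.

```rust
mod expressions {
    // The `cast` module itself stays private to `expressions`.
    mod cast {
        /// Hypothetical helper standing in for the real field-aware cast builder.
        pub(crate) fn cast_helper(input: &str, target: &str) -> String {
            format!("CAST({input} AS {target})")
        }
    }

    // Crate-internal entry point: usable elsewhere in the crate,
    // but not part of the public API.
    pub(crate) use cast::cast_helper;
}

mod planner {
    // The planner reaches the helper through `expressions`, not through the
    // private `expressions::cast` module path.
    use crate::expressions::cast_helper;

    pub(crate) fn lower(input: &str, target: &str) -> String {
        cast_helper(input, target)
    }
}
```

Writing `use crate::expressions::cast::cast_helper;` in `planner` would fail to compile, because `cast` is private to `expressions`; the `pub(crate) use` re-export is the only route in.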

Simplify the is_valid_cast logic to compute a single decision, return a single not_impl_err! for invalid casts, and construct CastExpr::new_with_target_field(...) once on the success path. This improves clarity and reduces duplication.
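The "single decision, single error, single construction" shape can be sketched as follows. Everything here is hypothetical: `DataType`, `CastNode`, `is_supported_pair`, and `build_cast` are illustrative stand-ins, the compatibility table is made up, and a `String` error takes the place of `not_impl_err!`.

```rust
/// Hypothetical, simplified type enum for this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DataType {
    Boolean,
    Int32,
    Int64,
    Utf8,
}

/// Hypothetical stand-in for a constructed cast expression.
#[derive(Debug, PartialEq)]
pub struct CastNode {
    pub from: DataType,
    pub to: DataType,
}

/// Hypothetical compatibility table; real cast rules are much richer.
fn is_supported_pair(from: DataType, to: DataType) -> bool {
    matches!(
        (from, to),
        (DataType::Int32, DataType::Int64)
            | (DataType::Int32, DataType::Utf8)
            | (DataType::Int64, DataType::Utf8)
    )
}

pub fn build_cast(from: DataType, to: DataType) -> Result<CastNode, String> {
    // One boolean decision instead of a chain of per-case branches.
    let is_valid_cast = from == to || is_supported_pair(from, to);
    if !is_valid_cast {
        // Single error path (the PR uses one `not_impl_err!` here).
        return Err(format!("Unsupported CAST from {from:?} to {to:?}"));
    }
    // Single construction site on the success path.
    Ok(CastNode { from, to })
}
```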

Rename the test from `test_cast_to_extension_type` to `test_cast_preserves_extension_metadata` to better reflect its purpose. Replace the arrow.uuid fixture with a neutral extension name "datafusion.test.int64_extension". Update the assertion to check for the new extension name, ensuring that the test remains focused on verifying that cast lowering preserves target field metadata while surfacing extension metadata in the TryCast error path.
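The behavior this test targets, CAST keeping extension metadata while TRY_CAST reports it in an error, can be modeled roughly as below. `ARROW:extension:name` is Arrow's standard metadata key for extension type names; `Field`, `CastKind`, and `lower` are hypothetical, and the rejection rule is a simplified assumption rather than DataFusion's actual check.

```rust
use std::collections::HashMap;

/// Arrow's standard field-metadata key for extension type names.
pub const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";

/// Hypothetical, simplified target field: only name and metadata are modeled.
#[derive(Clone, Debug, PartialEq)]
pub struct Field {
    pub name: String,
    pub metadata: HashMap<String, String>,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum CastKind {
    Cast,
    TryCast,
}

/// CAST keeps the target field (metadata included); TRY_CAST to an
/// extension-typed field is rejected, naming the extension in the error.
pub fn lower(kind: CastKind, target: &Field) -> Result<Field, String> {
    match (kind, target.metadata.get(EXTENSION_NAME_KEY)) {
        (CastKind::TryCast, Some(ext)) => {
            Err(format!("TRY_CAST to extension type {ext} is not supported"))
        }
        _ => Ok(target.clone()),
    }
}
```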

Remove an unnecessary DataType clone and collapse the cast-validity branches into a single boolean expression. Behavior and the public surface are unchanged.

Restore no-op cast canonicalization in cast_with_target_field_and_options(...). Update CastExpr::nullable() to align with runtime nullability, while maintaining logical target field semantics via return_field(). Adjust tests to ensure type-changing casts preserve logical target field semantics, and validate cast nullability against runtime semantics.
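The split between runtime and logical nullability can be pictured with a minimal trait. `PhysExpr`, `Column`, `Field`, and `Cast` are hypothetical and only loosely modeled on a physical expression interface; the sketch assumes nothing beyond the rule stated above: `nullable()` follows the child, while `return_field()` reports the logical target field.

```rust
/// Hypothetical, minimal physical-expression interface for this sketch.
pub trait PhysExpr {
    /// Whether this expression can produce nulls at runtime.
    fn nullable(&self) -> bool;
}

/// Hypothetical column expression with known runtime nullability.
pub struct Column {
    pub nullable: bool,
}

impl PhysExpr for Column {
    fn nullable(&self) -> bool {
        self.nullable
    }
}

/// Hypothetical logical field: only name and nullability are modeled.
#[derive(Clone, Debug, PartialEq)]
pub struct Field {
    pub name: String,
    pub nullable: bool,
}

/// Hypothetical cast node holding its child and the logical target field.
pub struct Cast<E: PhysExpr> {
    pub child: E,
    pub target: Field,
}

impl<E: PhysExpr> Cast<E> {
    /// Exposes the logical target field, including its declared nullability.
    pub fn return_field(&self) -> &Field {
        &self.target
    }
}

impl<E: PhysExpr> PhysExpr for Cast<E> {
    /// Runtime nullability follows the child, independent of `target.nullable`.
    fn nullable(&self) -> bool {
        self.child.nullable()
    }
}
```

Casting a non-nullable column to a target declared nullable therefore reports `nullable() == false` while `return_field().nullable` stays `true`.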

Clarify nullability handling in CastExpr and improve comments for struct casting validation

Refine cast handling in both cast.rs and schema_rewriter.rs. Explicit same-type casts now preserve CastExpr semantics, while default type-only casts are elided. Update planner tests to distinguish between the two cases and to ensure consistent behavior between the planner and adapters. Add low-level tests for the preserved and elided same-type cases.

kosiew added 6 commits March 10, 2026 11:18

Refactor cast_with_target_field_and_options

90922d9


Refactor cast-target-field path in cast.rs

59d3eae


github-actions bot added the physical-expr label (Changes to the physical-expr crates) Mar 10, 2026

kosiew added 4 commits March 10, 2026 11:35

cargo fmt

9aeb678

Clarify nullability handling in CastExpr and improve comments for struct casting validation

a878670

kosiew changed the title ~~Preserve target field metadata and nullability when lowering logical CASTs~~ Preserve logical field semantics during physical cast lowering Mar 10, 2026

cargo fmt

c7c5d80

kosiew force-pushed the cast-02-20164 branch from 3fa7858 to 40874f9 Mar 10, 2026 06:50

cargo clippy fix

df5486c

kosiew force-pushed the cast-02-20164 branch from 1058b83 to df5486c Mar 10, 2026 07:22

kosiew changed the title ~~Preserve logical field semantics during physical cast lowering~~ Preserve logical cast field semantics during physical lowering and schema rewrite Mar 10, 2026

kosiew marked this pull request as ready for review March 10, 2026 09:10

kosiew requested a review from adriangb March 13, 2026 03:23


kosiew wants to merge 12 commits into apache:main from kosiew:cast-02-20164
