Preserve logical cast field semantics during physical lowering and schema rewrite #20836

Open
kosiew wants to merge 12 commits into apache:main from kosiew:cast-02-20164

@kosiew kosiew commented Mar 10, 2026

Which issue does this PR close?

Rationale for this change

Physical lowering in planner.rs was dropping logical cast field semantics by forwarding only the target DataType when building physical cast expressions. That meant metadata-bearing target fields were rejected or reduced to type-only behavior, and planner-produced casts could diverge from adapter-produced casts.

This patch fixes that gap by preserving the logical target Field through lowering and related rewrite paths. It also aligns cast behavior across the planner and schema rewriter, so field metadata and logical nullability intent remain available in the produced physical expression.
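The gap described above can be illustrated with a minimal sketch. It does not use the real Arrow or DataFusion types: `DataType`, `Field`, `FieldRef`, `lower_type_only`, `lower_field_aware`, and `sample_target` are all hypothetical, simplified stand-ins, chosen only to show what is lost when lowering forwards a type instead of a field.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

// Hypothetical, simplified stand-ins for Arrow's `DataType` and `Field`.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
    metadata: BTreeMap<String, String>,
}

type FieldRef = Arc<Field>;

/// Type-only lowering (assumed shape): name, nullability, and metadata are dropped.
fn lower_type_only(target: &FieldRef) -> DataType {
    target.data_type.clone()
}

/// Field-aware lowering (assumed shape): the whole target field is carried forward.
fn lower_field_aware(target: &FieldRef) -> FieldRef {
    Arc::clone(target)
}

/// Builds an illustrative non-nullable target field with one metadata entry.
fn sample_target() -> FieldRef {
    let mut metadata = BTreeMap::new();
    metadata.insert(
        "ARROW:extension:name".to_string(),
        "datafusion.test.int64_extension".to_string(),
    );
    Arc::new(Field {
        name: "id".to_string(),
        data_type: DataType::Int64,
        nullable: false,
        metadata,
    })
}
```

Calling `lower_type_only(&sample_target())` yields only `DataType::Int64`, whereas `lower_field_aware` returns a field that still carries the metadata entry and the non-nullable flag.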

What changes are included in this PR?

This PR updates physical cast construction to be field-aware end to end.

  • Replaces type-only cast lowering for Expr::Cast in planner.rs with field-aware lowering using the logical target FieldRef.
  • Removes the previous planner restriction that rejected metadata-bearing cast target fields purely because the lowering path only supported type-only casts.
  • Adds cast_with_target_field_and_options so physical cast construction can preserve target field metadata and nullability semantics.
  • Optimizes same-type casts so they are still elided when there is no semantic difference, while preserving an explicit CastExpr when the target field carries distinct field semantics such as name, metadata, or nullability.
  • Updates CastExpr::nullable() so runtime nullability follows the child expression, while logical target field nullability remains exposed through return_field().
  • Updates the schema rewriter to build field-aware CastExpr instances instead of CastColumnExpr, including name-based physical column resolution when indexes differ.
  • Adds regression coverage to ensure planner- and adapter-produced casts behave consistently for standard casts, metadata-bearing casts, same-type casts, struct casts, and mismatched column index scenarios.
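The same-type rule in the list above might look roughly like the following sketch. The exact comparison the PR uses to decide "no semantic difference" is not shown in this description, so the check against a default type-only field is an assumption, and `Field`, `Lowered`, `default_target`, and `lower_cast` are hypothetical names rather than DataFusion APIs.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified stand-ins for Arrow's `DataType` and `Field`.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    nullable: bool,
    metadata: BTreeMap<String, String>,
}

/// Result of lowering: either the child is reused or an explicit cast is kept.
#[derive(Debug, PartialEq)]
enum Lowered {
    Child,
    Cast(Field),
}

/// Assumed "default" target for a type-only cast: empty name, nullable, no metadata.
fn default_target(data_type: &DataType) -> Field {
    Field {
        name: String::new(),
        data_type: data_type.clone(),
        nullable: true,
        metadata: BTreeMap::new(),
    }
}

/// Elide the cast only when the type is unchanged AND the target field adds nothing.
fn lower_cast(child_type: &DataType, target: &Field) -> Lowered {
    let same_type = *child_type == target.data_type;
    let carries_semantics = *target != default_target(&target.data_type);
    if same_type && !carries_semantics {
        Lowered::Child
    } else {
        Lowered::Cast(target.clone())
    }
}
```

Under these assumptions, a cast from `Int64` to a plain `Int64` target is elided, while the same cast to a target that carries metadata, a name, or non-default nullability keeps an explicit cast node.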

Are these changes tested?

Yes.

This PR adds and updates unit tests in the affected areas to cover:

  • planner lowering preserving target field metadata
  • planner lowering preserving logical target field semantics for nullability and field name
  • same-type casts being preserved when the explicit target field differs semantically
  • default same-type casts still being elided
  • standard non-metadata casts continuing to validate and lower correctly
  • schema rewriter use of field-aware CastExpr
  • regression coverage for resolving the correct physical column by name rather than relying on a stale column index
  • struct cast compatibility and field-aware cast expectations
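The stale-index scenario covered by the regression tests can be sketched as follows. This is an illustration only: `Column` and `resolve_by_name` are hypothetical, the physical schema is reduced to a slice of field names, and the choice to keep the index when it still matches the name is an assumption, not something this PR description confirms.

```rust
/// Hypothetical simplified column reference: a name plus a possibly stale index.
struct Column {
    name: String,
    index: usize,
}

/// Returns the position of `column` in the physical schema, trusting the stored
/// index only when it still points at a field with the same name.
fn resolve_by_name(column: &Column, physical_schema: &[&str]) -> Option<usize> {
    let index_still_valid = physical_schema
        .get(column.index)
        .map_or(false, |field_name| *field_name == column.name);
    if index_still_valid {
        return Some(column.index);
    }
    // Index is stale or out of range: fall back to a lookup by name.
    physical_schema
        .iter()
        .position(|field_name| *field_name == column.name)
}
```

With a physical schema of `["b", "a"]` and a column named `a` that still carries index 0 from the logical schema, the lookup resolves to position 1 instead of silently reading column `b`.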

Are there any user-facing changes?

There are no intended user-facing API changes, but physical cast expressions now preserve logical field metadata and nullability intent more faithfully during lowering. This should improve correctness and consistency for queries that depend on field metadata or logical cast semantics.
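The distinction between runtime nullability and logical nullability intent can be sketched as below. `CastSketch` and its fields are hypothetical and greatly simplified; only the method names `nullable()` and `return_field()` come from this PR description, and the real `CastExpr` signatures take additional arguments that are omitted here.

```rust
/// Hypothetical simplified target field: only the nullability flag is modeled.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    nullable: bool,
}

/// Hypothetical simplified cast node.
struct CastSketch {
    /// Whether the child expression can produce nulls at runtime.
    child_nullable: bool,
    /// The logical target field requested by the cast.
    target_field: Field,
}

impl CastSketch {
    /// Runtime nullability follows the child expression.
    fn nullable(&self) -> bool {
        self.child_nullable
    }

    /// The logical target field, including its nullability intent, stays available.
    fn return_field(&self) -> &Field {
        &self.target_field
    }
}
```

For a nullable child cast to a non-nullable target field, `nullable()` reports `true` while `return_field().nullable` still reports `false`, so neither piece of information is lost.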

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

kosiew added 6 commits March 10, 2026 11:18
Update cast handling in planner.rs to retain logical target
FieldRef metadata during the cast lowering process. Introduced
a new field-aware helper in cast.rs for this purpose, which
ensures that metadata, name, and nullability intent are kept
intact, even for same-type casts.

Updated planner tests to verify:
- Preservation of metadata and nullability in lowered casts
- CastExpr production for same-type casts with different
  field semantics
- Regression checks for standard non-metadata casts
- Enhanced CAST behavior to retain extension metadata while
  ensuring TRY_CAST still rejects it.

Change `cast_with_target_field_and_options` to crate-internal
by modifying its visibility. Remove public re-export of the
planner-only helper while retaining planner access via the
internal module path. Make the `cast` module available
within the crate for internal usage.

Make the cast module private again by changing its
visibility to module only. Provide crate-internal access
to the helper function from expressions::mod, and update
the planner to use this internal entry point instead
of direct module exposure.

Simplify is_valid_cast logic to compute a single decision.
Return a single not_impl_err! for invalid cast cases.
Construct CastExpr::new_with_target_field(...) once
on the successful path, enhancing code clarity and
reducing duplication.

Rename the test from `test_cast_to_extension_type` to
`test_cast_preserves_extension_metadata` to better reflect its
purpose. Replace the arrow.uuid fixture with a neutral extension
name "datafusion.test.int64_extension". Update the assertion to
check for the new extension name, ensuring that the test remains
focused on verifying that cast lowering preserves target field
metadata while surfacing extension metadata in the TryCast error
path.

Eliminate unnecessary DataType clone and simplify the cast-validity
branches into a single boolean expression. Ensure that the behavior
and public surface remain unchanged while improving code efficiency.
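The "single boolean expression" refactor described in these commits can be sketched roughly as follows. Everything here is illustrative: `can_cast` is a hypothetical stand-in for a real castability check with an arbitrary rule, `build_cast` returns strings instead of a physical expression, and the error text is invented rather than taken from the PR.

```rust
// Hypothetical, simplified stand-in for Arrow's `DataType`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Binary,
}

/// Hypothetical castability check; the rule is arbitrary and for illustration only.
fn can_cast(from: DataType, to: DataType) -> bool {
    !matches!(
        (from, to),
        (DataType::Binary, DataType::Int32) | (DataType::Binary, DataType::Int64)
    )
}

/// Computes validity once, returns a single error for the invalid case, and
/// constructs the cast in exactly one place on the successful path.
fn build_cast(from: DataType, to: DataType) -> Result<String, String> {
    let is_valid_cast = from == to || can_cast(from, to);
    if !is_valid_cast {
        return Err(format!("Unsupported CAST from {from:?} to {to:?}"));
    }
    Ok(format!("CAST({from:?} AS {to:?})"))
}
```

Keeping one decision and one construction site means any later change to how the cast node is built only needs to be made once.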
@github-actions github-actions bot added the physical-expr label (Changes to the physical-expr crates) Mar 10, 2026
kosiew added 4 commits March 10, 2026 11:35
Restore no-op cast canonicalization in
cast_with_target_field_and_options(...). Update
CastExpr::nullable() to align with runtime nullability,
while maintaining logical target field semantics via
return_field(). Adjust tests to ensure type-changing
casts preserve logical target field semantics,
and validate cast nullability against runtime semantics.

Refine cast handling in both cast.rs and schema_rewriter.rs.
Explicit same-type casts now preserve CastExpr semantics, while
default type-only casts are elided. Update planner tests to
properly distinguish between the two and ensure consistent
unified behavior across adapters. Added low-level tests for
preserved and elided same-type cases for better coverage.
@kosiew kosiew changed the title from "Preserve target field metadata and nullability when lowering logical CASTs" to "Preserve logical field semantics during physical cast lowering" Mar 10, 2026
@kosiew kosiew changed the title from "Preserve logical field semantics during physical cast lowering" to "Preserve logical cast field semantics during physical lowering and schema rewrite" Mar 10, 2026
@kosiew kosiew marked this pull request as ready for review March 10, 2026 09:10
@kosiew kosiew requested a review from adriangb March 13, 2026 03:23