Consolidate ScalarValue cast implementations (towards a non-copying cast_to) #22577

@alamb

Is your feature request related to a problem or challenge?

DataFusion currently has (at least) three overlapping implementations of "cast a scalar / literal to a target type", with different contracts and duplicated per-type logic:

  1. ScalarValue::cast_to / cast_to_with_options in datafusion/common/src/scalar/mod.rs. (Recently gained array-free fast paths for identity and string↔string casts in #22576, "perf: array-free fast paths for ScalarValue::cast_to".)

  2. try_cast_literal_to_type in datafusion/expr-common/src/casts.rs. An array-free, hand-rolled cast used by the unwrap-cast optimizations. It returns Option and has its own per-type helpers: it performs only value-preserving casts and returns None for out-of-range numeric, precision-losing decimal, lossy date↔timestamp, timestamp→string, and string→numeric conversions. These restrictions are load-bearing for optimizer correctness (you may only unwrap CAST(col AS T) = lit when lit converts back to its original type exactly).

  3. cast_literal_to_type_with_op in datafusion/optimizer/src/simplify_expressions/unwrap_cast.rs. Yet another special case (for Utf8 = Int-style comparisons), implemented via cast_to plus a manual round-trip check.

Because the logic is duplicated, casting behavior and optimizations have to be implemented in multiple places, and the implementations can (and do) diverge in subtle, correctness-relevant ways.
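To make the "value-preserving, returns Option" contract of item 2 concrete, here is a minimal sketch using only the standard library. `Lit`, `Ty`, and `try_cast_value_preserving` are hypothetical stand-ins invented for illustration; they are not DataFusion's `ScalarValue`, `DataType`, or `try_cast_literal_to_type`, and cover only a tiny subset of types.

```rust
/// Toy stand-in for a scalar literal (hypothetical, not DataFusion's ScalarValue).
#[derive(Debug, Clone, PartialEq)]
pub enum Lit {
    Int32(i32),
    Int64(i64),
    Utf8(String),
}

/// Toy stand-in for a target type (hypothetical, not Arrow's DataType).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Ty {
    Int32,
    Int64,
    Utf8,
}

/// Returns `Some` only when the conversion preserves the value exactly.
pub fn try_cast_value_preserving(lit: &Lit, target: Ty) -> Option<Lit> {
    match (lit, target) {
        // Identity casts always preserve the value.
        (Lit::Int32(v), Ty::Int32) => Some(Lit::Int32(*v)),
        (Lit::Int64(v), Ty::Int64) => Some(Lit::Int64(*v)),
        (Lit::Utf8(s), Ty::Utf8) => Some(Lit::Utf8(s.clone())),
        // Widening is lossless.
        (Lit::Int32(v), Ty::Int64) => Some(Lit::Int64(i64::from(*v))),
        // Narrowing succeeds only when the value is in range.
        (Lit::Int64(v), Ty::Int32) => i32::try_from(*v).ok().map(Lit::Int32),
        // Assumption for this sketch: string <-> numeric is always refused.
        _ => None,
    }
}
```

In this sketch, casting `Lit::Int64(7)` to `Ty::Int32` yields `Some(Lit::Int32(7))`, while a value just past `i32::MAX` yields `None` rather than a wrapped or saturated result. A general-purpose cast may legitimately choose a different answer for the same input, which is one way duplicated implementations can diverge.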

Describe the solution you'd like

Consolidate the cast implementations so there is a single place to implement and optimize scalar casting.

My goal is to be able to add a non-copying version of cast_to: today cast_to(&self) forces a clone/allocation even on the fast paths, and try_cast_literal_to_type re-allocates strings via to_string().
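As a rough illustration of what "non-copying" could mean, the sketch below contrasts a borrowing cast with a consuming one on a toy type. Everything here is an assumption for illustration: `Lit`, `Ty`, and the method name `into_cast` are hypothetical, and the eventual DataFusion API may look quite different (for example, it could return a `Cow` instead of taking `self` by value).

```rust
/// Toy stand-in for a scalar literal (hypothetical, not DataFusion's ScalarValue).
#[derive(Debug, PartialEq)]
pub enum Lit {
    Utf8(String),
    LargeUtf8(String),
    Int64(i64),
}

/// Toy stand-in for a target type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Ty {
    Utf8,
    LargeUtf8,
    Int64,
}

impl Lit {
    /// Borrowing cast: the input is only borrowed, so a string result
    /// must be a fresh allocation even when just the type tag changes.
    pub fn cast_to(&self, target: Ty) -> Option<Lit> {
        match (self, target) {
            (Lit::Utf8(s) | Lit::LargeUtf8(s), Ty::Utf8) => Some(Lit::Utf8(s.clone())),
            (Lit::Utf8(s) | Lit::LargeUtf8(s), Ty::LargeUtf8) => Some(Lit::LargeUtf8(s.clone())),
            (Lit::Int64(v), Ty::Int64) => Some(Lit::Int64(*v)),
            _ => None,
        }
    }

    /// Consuming cast: takes ownership, so the existing string buffer
    /// is moved into the result instead of being copied.
    pub fn into_cast(self, target: Ty) -> Option<Lit> {
        match (self, target) {
            (Lit::Utf8(s) | Lit::LargeUtf8(s), Ty::Utf8) => Some(Lit::Utf8(s)),
            (Lit::Utf8(s) | Lit::LargeUtf8(s), Ty::LargeUtf8) => Some(Lit::LargeUtf8(s)),
            (Lit::Int64(v), Ty::Int64) => Some(Lit::Int64(v)),
            _ => None,
        }
    }
}
```

With the consuming variant, the `String` inside the result points at the same heap buffer as the input, whereas the borrowing variant has to clone it. The trade-off is that the caller gives up the original value, which is why such a variant would sit alongside `cast_to(&self)` rather than replace it.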

First step (this issue): consolidate the implementations, without changing behavior.

  • Make ScalarValue::cast_to the single canonical scalar-cast implementation.
  • Re-express try_cast_literal_to_type in terms of cast_to.
  • Fold cast_literal_to_type_with_op into the same path where possible.
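One possible shape for such a consolidation is a thin strict wrapper over a single permissive cast, using the "cast plus round-trip check" idea already described for cast_literal_to_type_with_op. The sketch below is a toy model under stated assumptions: `Lit`, `Ty`, and `try_cast_exact` are hypothetical names, and the toy `cast_to` is not DataFusion's. A round-trip check alone would not reproduce every restriction of try_cast_literal_to_type (which, for example, refuses string→numeric conversions outright), so a real wrapper would likely need additional per-type rules.

```rust
/// Toy stand-in for a scalar literal (hypothetical, not DataFusion's ScalarValue).
#[derive(Debug, Clone, PartialEq)]
pub enum Lit {
    Int64(i64),
    Utf8(String),
}

/// Toy stand-in for a target type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Ty {
    Int64,
    Utf8,
}

impl Lit {
    /// The type of this literal.
    pub fn ty(&self) -> Ty {
        match self {
            Lit::Int64(_) => Ty::Int64,
            Lit::Utf8(_) => Ty::Utf8,
        }
    }

    /// The single permissive cast: formats integers, parses (trimmed) strings.
    pub fn cast_to(&self, target: Ty) -> Option<Lit> {
        match (self, target) {
            (Lit::Int64(v), Ty::Int64) => Some(Lit::Int64(*v)),
            (Lit::Utf8(s), Ty::Utf8) => Some(Lit::Utf8(s.clone())),
            (Lit::Int64(v), Ty::Utf8) => Some(Lit::Utf8(v.to_string())),
            (Lit::Utf8(s), Ty::Int64) => s.trim().parse::<i64>().ok().map(Lit::Int64),
        }
    }
}

/// Strict wrapper: accept the cast only if converting the result back
/// to the literal's original type reproduces the literal exactly.
pub fn try_cast_exact(lit: &Lit, target: Ty) -> Option<Lit> {
    let cast = lit.cast_to(target)?;
    let back = cast.cast_to(lit.ty())?;
    (back == *lit).then_some(cast)
}
```

Here `"123"` survives the round trip and is accepted, while `"0123"` parses to 123 but formats back as `"123"`, so the wrapper rejects it. All per-type conversion logic lives in `cast_to`; the wrapper only adds the exactness policy.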

Once consolidated, a follow-up can add the non-copying cast variant and, if possible, optimize it in that one place.

Describe alternatives you've considered

Additional context

Labels: enhancement
