Is your feature request related to a problem or challenge?
DataFusion currently has (at least) three overlapping implementations of "cast a scalar / literal to a target type", with different contracts and duplicated per-type logic:
- `ScalarValue::cast_to` / `cast_to_with_options`: `datafusion/common/src/scalar/mod.rs`. (Recently gained array-free fast paths for identity and string↔string casts in perf: array-free fast paths for `ScalarValue::cast_to` #22576.)
- `try_cast_literal_to_type`: `datafusion/expr-common/src/casts.rs`. An array-free, hand-rolled cast used by the unwrap-cast optimizations. Returns `Option`, with its own per-type helpers: it only performs value-preserving casts and returns `None` for out-of-range numeric, precision-losing decimal, lossy date↔timestamp, timestamp→string, and string→numeric conversions. These restrictions are load-bearing for optimizer correctness (you may only unwrap `CAST(col AS T) = lit` when `lit` converts back to its original type exactly).
- `cast_literal_to_type_with_op`: `datafusion/optimizer/src/simplify_expressions/unwrap_cast.rs`. Yet another special case (for `Utf8 = Int`-style comparisons), implemented via `cast_to` plus a manual round-trip check.

Because the logic is duplicated, casting behavior and optimizations have to be implemented in multiple places, and the implementations can (and do) diverge in subtle, correctness-relevant ways.
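To make the value-preserving requirement concrete, here is a minimal standalone sketch of why an out-of-range literal must produce `None` rather than a narrowed value. It uses plain Rust integers instead of DataFusion's types, and every function name in it is hypothetical; it is only meant to illustrate the contract described above, not the actual code in `casts.rs`.

```rust
/// Lossy narrowing: keeps only the low 32 bits of the literal.
/// (Hypothetical helper, shown only as the behavior to avoid.)
fn wrapping_narrow(lit: i64) -> i32 {
    lit as i32
}

/// Value-preserving narrowing: `None` when the literal does not fit in `i32`.
fn lossless_narrow(lit: i64) -> Option<i32> {
    i32::try_from(lit).ok()
}

/// Evaluates the original predicate `CAST(col AS BIGINT) = lit`.
fn original_predicate(col: i32, lit: i64) -> bool {
    i64::from(col) == lit
}

/// Evaluates the unwrapped predicate `col = narrowed`.
fn unwrapped_predicate(col: i32, narrowed: i32) -> bool {
    col == narrowed
}

fn main() {
    let lit = 4_294_967_297_i64; // 2^32 + 1, does not fit in i32
    let col = 1_i32;

    // The original predicate is false for this row.
    assert!(!original_predicate(col, lit));

    // Unwrapping with a lossy cast would rewrite the literal to 1 and
    // wrongly match the row.
    assert_eq!(wrapping_narrow(lit), 1);
    assert!(unwrapped_predicate(col, wrapping_narrow(lit)));

    // A value-preserving cast refuses, so the rewrite is not attempted.
    assert_eq!(lossless_narrow(lit), None);

    // In-range literals convert exactly and can be unwrapped safely.
    assert_eq!(lossless_narrow(42), Some(42));
    assert!(unwrapped_predicate(42, 42) == original_predicate(42, 42));
}
```

The same reasoning applies to the other rejected conversions listed above: whenever the literal cannot be converted back to its original value, comparing against the converted literal can change which rows match.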
Describe the solution you'd like
Consolidate the cast implementations so there is a single place to implement and optimize scalar casting.
My goal is to be able to add a non-copying version of `cast_to`: today `cast_to(&self)` forces a clone/allocation even on the fast paths, and `try_cast_literal_to_type` re-allocates strings via `to_string()`.

First step (this issue): consolidate the implementations, without changing behavior.

- Make `ScalarValue::cast_to` the single canonical scalar-cast implementation.
- Re-express `try_cast_literal_to_type` in terms of `cast_to`.
- Fold `cast_literal_to_type_with_op` into the same path where possible.

Once consolidated, a follow-up can add the non-copying cast variant and optimize it in that one place, if possible.
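As a rough illustration of where this could end up, the sketch below shows one possible shape: a single cast routine that borrows on the identity path (via `std::borrow::Cow`), and a strict wrapper built on top of it with a round-trip check. The `Scalar` and `Ty` types, the `Option<Cow<...>>` signature, and `try_cast_lossless` are all assumptions made for the example; they are stand-ins, not DataFusion's `ScalarValue`, `DataType`, or any existing API.

```rust
use std::borrow::Cow;

// Hypothetical stand-ins; these are not DataFusion's `ScalarValue` / `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int32(i32),
    Int64(i64),
    Utf8(String),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    Int32,
    Int64,
    Utf8,
}

impl Scalar {
    fn ty(&self) -> Ty {
        match self {
            Scalar::Int32(_) => Ty::Int32,
            Scalar::Int64(_) => Ty::Int64,
            Scalar::Utf8(_) => Ty::Utf8,
        }
    }

    /// The single cast implementation. It borrows on the identity path, so
    /// casting a `Utf8` literal to `Utf8` does not copy the string.
    fn cast_to(&self, target: Ty) -> Option<Cow<'_, Scalar>> {
        if self.ty() == target {
            return Some(Cow::Borrowed(self));
        }
        let out = match (self, target) {
            (Scalar::Int32(v), Ty::Int64) => Scalar::Int64(i64::from(*v)),
            (Scalar::Int64(v), Ty::Int32) => Scalar::Int32(i32::try_from(*v).ok()?),
            (Scalar::Int32(v), Ty::Utf8) => Scalar::Utf8(v.to_string()),
            (Scalar::Int64(v), Ty::Utf8) => Scalar::Utf8(v.to_string()),
            (Scalar::Utf8(s), Ty::Int32) => Scalar::Int32(s.trim().parse::<i32>().ok()?),
            (Scalar::Utf8(s), Ty::Int64) => Scalar::Int64(s.trim().parse::<i64>().ok()?),
            _ => return None,
        };
        Some(Cow::Owned(out))
    }
}

/// Strict wrapper: accept the cast only if converting the result back to the
/// literal's original type reproduces the literal exactly.
fn try_cast_lossless(lit: &Scalar, target: Ty) -> Option<Scalar> {
    let casted = lit.cast_to(target)?;
    let round_trips = match casted.cast_to(lit.ty()) {
        Some(back) => *back == *lit,
        None => false,
    };
    if round_trips {
        Some(casted.into_owned())
    } else {
        None
    }
}

fn main() {
    // The identity cast borrows instead of cloning the string.
    let s = Scalar::Utf8("hello".to_string());
    assert!(matches!(s.cast_to(Ty::Utf8), Some(Cow::Borrowed(_))));

    // "7" -> 7 -> "7" round-trips, so the strict cast succeeds.
    let seven = Scalar::Utf8("7".to_string());
    assert_eq!(try_cast_lossless(&seven, Ty::Int32), Some(Scalar::Int32(7)));

    // "007" -> 7 -> "7" does not reproduce the literal, so it is rejected.
    let padded = Scalar::Utf8("007".to_string());
    assert_eq!(try_cast_lossless(&padded, Ty::Int32), None);

    // Out-of-range numerics fail in the shared cast and are rejected too.
    assert_eq!(try_cast_lossless(&Scalar::Int64(1 << 40), Ty::Int32), None);
}
```

Note that a round-trip check alone is more permissive than what `try_cast_literal_to_type` is described as doing today (it rejects string→numeric outright), so a behavior-preserving consolidation would presumably also need an explicit allow-list of source/target type pairs on top of the shared cast.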
Describe alternatives you've considered
- Optimize `try_cast_literal_to_type` directly (explored in perf: avoid redundant string allocation in LIKE prefix pruning (follow-up to #22562) #22574). This works, but optimizes only one of the three implementations and leaves the duplication in place.

Additional context
- […] `'prefix%'` pruning), which surfaced a redundant string allocation in the literal-cast path.
- perf: array-free fast paths for `ScalarValue::cast_to` #22576 added the identity / string↔string array-free fast paths to `cast_to`, and made the `cast_round_trip` test cross-check `cast_to` against the arrow cast kernel.