Align median and percentile_cont implementations (preserve Decimal in percentile_cont, alias median) #22042

@CuteChuanChuan

Is your feature request related to a problem or challenge?

In #21988, we fixed median to return Float64 for integer inputs (avoiding truncation) while preserving Decimal and floating-point types as-is. This means after #21988:

  • median(int_col) → Float64
  • median(float_col) → Float32/Float64 (preserved)
  • median(decimal_col) → Decimal* (preserved)

However, percentile_cont — which is functionally equivalent to median when the percentile is 0.5 — casts all Numeric inputs (including Decimal) to Float64:
https://github.com/apache/datafusion/blob/main/datafusion/functions-aggregate/src/percentile_cont.rs#L143-L157

So median(decimal_col) and percentile_cont(0.5) WITHIN GROUP (ORDER BY decimal_col) return different types for the same input — Decimal* vs Float64. As @Jefffrey noted in #21988, this divergence between two essentially identical functions is surprising.

Describe the solution you'd like

Per @alamb's suggestion in #21988 (comment), align in two phases:

Phase 1 — extend percentile_cont to preserve Decimal (and Float32)

Update PercentileCont::new so the value-argument coercion accepts Decimal and floating-point types directly, matching median's post-#21988 behavior. The percentile argument (the 0.5) keeps its current Float64 coercion.

After this change:

  • percentile_cont(0.5) WITHIN GROUP (ORDER BY decimal_col) → Decimal* (preserved)
  • percentile_cont(0.5) WITHIN GROUP (ORDER BY int_col) → Float64
  • percentile_cont(0.5) WITHIN GROUP (ORDER BY float_col) → Float32/Float64 (preserved)
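
To make the target mapping concrete, here is a minimal sketch of the return-type rule both functions would share after Phase 1, next to the Float64-everywhere rule described above for today's percentile_cont. `ValueType`, `aligned_return_type`, and `current_percentile_cont_return_type` are hypothetical names for illustration only; the real code works with Arrow's `DataType` and DataFusion's signature/coercion machinery, which this does not model.

```rust
/// Simplified, hypothetical stand-in for Arrow's `DataType`;
/// only the variants needed for this illustration.
#[derive(Debug, Clone, PartialEq)]
enum ValueType {
    Int32,
    Int64,
    Float32,
    Float64,
    /// Decimal with (precision, scale).
    Decimal128(u8, i8),
}

/// Hypothetical helper: the return type proposed for both `median`
/// and `percentile_cont` once Phase 1 lands.
fn aligned_return_type(input: &ValueType) -> ValueType {
    match input {
        // Integers widen to Float64 so the result is not truncated.
        ValueType::Int32 | ValueType::Int64 => ValueType::Float64,
        // Floats and decimals keep their input type.
        other => other.clone(),
    }
}

/// Hypothetical helper: the behavior this issue describes for
/// `percentile_cont` today, where every numeric input becomes Float64.
fn current_percentile_cont_return_type(_input: &ValueType) -> ValueType {
    ValueType::Float64
}
```

Under these assumptions, `aligned_return_type(&ValueType::Decimal128(10, 2))` yields `Decimal128(10, 2)`, while `current_percentile_cont_return_type` yields `Float64` for the same input, which is the divergence this issue is about.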

Phase 2 — fold median into percentile_cont as an alias

Once the signatures and return types align, remove the standalone Median implementation and register median as an alias of percentile_cont with 0.5 as the percentile.
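
As a rough illustration of what "median as an alias" means semantically, the sketch below implements a continuous percentile over a plain `f64` slice using linear interpolation between the two nearest ranks, and defines `median` by delegating to it with 0.5. This is an assumption-laden toy, not DataFusion's accumulator: the function names, the slice-based signature, and the `Option` return are all made up for illustration, and alias registration in the real function registry is not shown.

```rust
/// Toy continuous percentile: sorts a copy of the input and linearly
/// interpolates between the two nearest ranks.
/// Returns `None` for empty input or when `p` is outside [0, 1].
fn percentile_cont(values: &[f64], p: f64) -> Option<f64> {
    if values.is_empty() || !(0.0..=1.0).contains(&p) {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    // Fractional zero-based rank of the requested percentile.
    let rank = p * (sorted.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    let frac = rank - lo as f64;

    Some(sorted[lo] + (sorted[hi] - sorted[lo]) * frac)
}

/// `median` expressed purely in terms of `percentile_cont`.
fn median(values: &[f64]) -> Option<f64> {
    percentile_cont(values, 0.5)
}
```

With this definition, `median(&[1.0, 2.0, 3.0, 4.0])` and `percentile_cont(&[1.0, 2.0, 3.0, 4.0], 0.5)` both evaluate to `Some(2.5)`, so a single implementation serves both names.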

Describe alternatives you've considered

Make median match percentile_cont's current Float64-everywhere behavior. This was discussed in #21988 and not chosen, because casting Decimal to Float64 discards the exact-precision guarantee users opt into when picking Decimal.
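
The precision concern can be shown with a small sketch. It treats a decimal as its unscaled integer (a Decimal128 is stored as an `i128` plus a scale) and round-trips that integer through `f64`. The helper name is hypothetical, and the real cast also applies the scale, so this only demonstrates the underlying limit: `f64` has a 53-bit significand and cannot represent every integer above 2^53.

```rust
/// Hypothetical helper: convert an unscaled decimal value to f64 and back,
/// mimicking what a Decimal -> Float64 -> Decimal round trip does to the digits.
fn round_trip_through_f64(unscaled: i128) -> i128 {
    (unscaled as f64) as i128
}

/// 2^53 + 1: the smallest positive integer that f64 cannot represent exactly.
const FIRST_INEXACT: i128 = 9_007_199_254_740_993;
```

Small values survive unchanged (`round_trip_through_f64(12_345)` is `12_345`), but `round_trip_through_f64(FIRST_INEXACT)` comes back as `9_007_199_254_740_992`, one less than the input. A decimal column with 16 or more significant digits can hit this, which is why preserving Decimal matters.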

Labels: enhancement