Unify granularity / field-name parsing across temporal functions #22033

@sdf-jkl

Is your feature request related to a problem or challenge?

While working on #18319, I noticed that date_trunc defines its own DateTruncGranularity enum:

enum DateTruncGranularity {
    Microsecond,
    Millisecond,
    Second,
    Minute,
    Hour,
    Day,
    Week,
    Month,
    Quarter,
    Year,
}

date_part solves the same parsing problem differently: it delegates to arrow_cast::parse::IntervalUnit::from_str and falls back to a plain string match for field names that are not interval units:

        IntervalUnit::Microsecond => seconds_as_i32(array.as_ref(), Microsecond)?,
        IntervalUnit::Nanosecond => seconds_ns(array.as_ref())?,
        // century and decade are not supported by `DatePart`, although they are supported in postgres
        _ => return exec_err!("Date part '{part}' not supported"),
    }
} else {
    // special cases that can be extracted (in postgres) but are not interval units
    match part_trim.to_lowercase().as_str() {
        "isoyear" => date_part(array.as_ref(), DatePart::YearISO)?,
        "qtr" | "quarter" => date_part(array.as_ref(), DatePart::Quarter)?,
        "doy" => date_part(array.as_ref(), DatePart::DayOfYear)?,
        "dow" => date_part(array.as_ref(), DatePart::DayOfWeekSunday0)?,
        "isodow" => date_part(array.as_ref(), DatePart::DayOfWeekMonday0)?,
        "epoch" => epoch(array.as_ref())?,
        _ => return exec_err!("Date part '{part}' not supported"),

This leads to a user-visible inconsistency between the two functions. For example, date_part accepts "mon" | "mons" | "month" | "months", while date_trunc only accepts "month".

IntervalUnit, however, is the wrong enum for either of these functions. It exists to parse Postgres INTERVAL literals (INTERVAL '1 year 2 months'), which is why it has Century / Decade but no Quarter (Postgres rejects INTERVAL '1 QUARTER'). The right counterpart is arrow_arith::temporal::DatePart, which represents the date-part field namespace, includes Quarter, and is #[non_exhaustive]:
https://github.com/apache/arrow-rs/blob/8091f3f17b2de355f7c47e7a0907000d308f8f3e/arrow-arith/src/temporal.rs#L38-L79

Describe the solution you'd like

I propose:

  • Add a FromStr impl for arrow_arith::temporal::DatePart (with the Postgres alias map currently inside IntervalUnit::from_str plus the date-part-only field names).
  • Migrate date_part, date_trunc, and the Spark variants (date_trunc, trunc, time_trunc, date_part) to the new parser. The unparser sites in sql/src/unparser/utils.rs can also use it as an input canonicalizer.
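
To make the proposal concrete, here is a rough sketch of what a shared parser could look like. It is not the arrow-rs API: Field is a hypothetical local stand-in for arrow_arith::temporal::DatePart with only a few variants, ParseFieldError and parse_trunc_granularity are made-up names, and the alias table only covers the spellings mentioned in this issue. The set of fields that the truncation helper accepts is likewise an assumption for illustration.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for `arrow_arith::temporal::DatePart`.
/// Only a few variants are mirrored to keep the sketch short.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[non_exhaustive]
enum Field {
    Year,
    YearISO,
    Quarter,
    Month,
    Day,
    DayOfYear,
    DayOfWeekSunday0,
    DayOfWeekMonday0,
}

/// Error type invented for this sketch; it carries the rejected input.
#[derive(Debug, PartialEq, Eq)]
struct ParseFieldError(String);

impl FromStr for Field {
    type Err = ParseFieldError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Normalize once so that every caller shares the same
        // whitespace and case handling.
        match s.trim().to_lowercase().as_str() {
            "year" => Ok(Field::Year),
            "isoyear" => Ok(Field::YearISO),
            "qtr" | "quarter" => Ok(Field::Quarter),
            "mon" | "mons" | "month" | "months" => Ok(Field::Month),
            "day" => Ok(Field::Day),
            "doy" => Ok(Field::DayOfYear),
            "dow" => Ok(Field::DayOfWeekSunday0),
            "isodow" => Ok(Field::DayOfWeekMonday0),
            other => Err(ParseFieldError(format!("unknown field '{other}'"))),
        }
    }
}

/// A date_trunc-style caller parses with the shared parser, then rejects
/// fields it cannot truncate to (the accepted subset is an assumption).
fn parse_trunc_granularity(s: &str) -> Result<Field, ParseFieldError> {
    let field: Field = s.parse()?;
    match field {
        Field::Year | Field::Quarter | Field::Month | Field::Day => Ok(field),
        other => Err(ParseFieldError(format!("cannot truncate to {other:?}"))),
    }
}
```

With this shape, "mons".parse::<Field>() and parse_trunc_granularity("mons") resolve to the same variant, so alias support can no longer drift between date_part and date_trunc. Each function still decides for itself which fields it supports.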

Describe alternatives you've considered

No response

Additional context

No response

Labels: enhancement
