Is your feature request related to a problem or challenge?
While working on #18319, I noticed that `date_trunc` defines its own `DateTruncGranularity` enum.

datafusion/datafusion/functions/src/datetime/date_trunc.rs, lines 56 to 67 in 2c7af17:

    enum DateTruncGranularity {
        Microsecond,
        Millisecond,
        Second,
        Minute,
        Hour,
        Day,
        Week,
        Month,
        Quarter,
        Year,
    }

`date_part` solves the same parsing problem differently: it delegates to `arrow_cast::parse::IntervalUnit::from_str`.

datafusion/datafusion/functions/src/datetime/date_part.rs, lines 230 to 244 in 2c7af17:

                IntervalUnit::Microsecond => seconds_as_i32(array.as_ref(), Microsecond)?,
                IntervalUnit::Nanosecond => seconds_ns(array.as_ref())?,
                // century and decade are not supported by `DatePart`, although they are supported in postgres
                _ => return exec_err!("Date part '{part}' not supported"),
            }
        } else {
            // special cases that can be extracted (in postgres) but are not interval units
            match part_trim.to_lowercase().as_str() {
                "isoyear" => date_part(array.as_ref(), DatePart::YearISO)?,
                "qtr" | "quarter" => date_part(array.as_ref(), DatePart::Quarter)?,
                "doy" => date_part(array.as_ref(), DatePart::DayOfYear)?,
                "dow" => date_part(array.as_ref(), DatePart::DayOfWeekSunday0)?,
                "isodow" => date_part(array.as_ref(), DatePart::DayOfWeekMonday0)?,
                "epoch" => epoch(array.as_ref())?,
                _ => return exec_err!("Date part '{part}' not supported"),

This leads to a user-visible inconsistency between the two functions. For example, `date_part` accepts `"mon" | "mons" | "month" | "months"`, while `date_trunc` only accepts `"month"`.
`IntervalUnit`, however, is the wrong enum for either of these functions. It exists to parse Postgres INTERVAL literals (`INTERVAL '1 year 2 months'`), which is why it has `Century` and `Decade` but no `Quarter` (Postgres rejects `INTERVAL '1 QUARTER'`). The right counterpart is `arrow_arith::temporal::DatePart`, which represents the date-part field namespace, includes `Quarter`, and is `#[non_exhaustive]`:
https://github.com/apache/arrow-rs/blob/8091f3f17b2de355f7c47e7a0907000d308f8f3e/arrow-arith/src/temporal.rs#L38-L79
Describe the solution you'd like
I propose:
- Add a `FromStr` impl for `arrow_arith::temporal::DatePart` (with the Postgres alias map currently inside `IntervalUnit::from_str` plus the date-part-only field names).
- Migrate `date_part`, `date_trunc`, and the Spark variants (`date_trunc`, `trunc`, `time_trunc`, `date_part`) to the new parser. The unparser sites in `sql/src/unparser/utils.rs` can also use it as an input canonicalizer.
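
To make the proposal concrete, here is a rough sketch of what such a parser could look like. It is not the arrow-rs implementation: `DatePartField` and `ParseDatePartError` are hypothetical stand-ins (the real target would be `arrow_arith::temporal::DatePart`), and the alias set is illustrative. Only the month aliases and the date-part-only names (`isoyear`, `qtr`, `doy`, `dow`, `isodow`) are taken from the snippets above; accepting the plain lowercase names for the other fields is an assumption.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical stand-in for `arrow_arith::temporal::DatePart`, reduced to
/// the variants needed for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DatePartField {
    Year,
    YearISO,
    Quarter,
    Month,
    Week,
    Day,
    DayOfWeekSunday0,
    DayOfWeekMonday0,
    DayOfYear,
    Hour,
    Minute,
    Second,
    Millisecond,
    Microsecond,
}

/// Hypothetical error type for this sketch; it carries the rejected input.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParseDatePartError(String);

impl fmt::Display for ParseDatePartError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Date part '{}' not supported", self.0)
    }
}

impl std::error::Error for ParseDatePartError {}

impl FromStr for DatePartField {
    type Err = ParseDatePartError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Trim and lowercase once, so every caller shares the same matching rules.
        match s.trim().to_lowercase().as_str() {
            // Alias group quoted in the issue for `date_part`.
            "mon" | "mons" | "month" | "months" => Ok(Self::Month),
            // Date-part-only names from the `date_part` fallback branch.
            "isoyear" => Ok(Self::YearISO),
            "qtr" | "quarter" => Ok(Self::Quarter),
            "doy" => Ok(Self::DayOfYear),
            "dow" => Ok(Self::DayOfWeekSunday0),
            "isodow" => Ok(Self::DayOfWeekMonday0),
            // Plain names only (assumption); the full alias map would go here.
            "year" => Ok(Self::Year),
            "week" => Ok(Self::Week),
            "day" => Ok(Self::Day),
            "hour" => Ok(Self::Hour),
            "minute" => Ok(Self::Minute),
            "second" => Ok(Self::Second),
            "millisecond" => Ok(Self::Millisecond),
            "microsecond" => Ok(Self::Microsecond),
            _ => Err(ParseDatePartError(s.to_string())),
        }
    }
}
```

With a single parser like this, a call such as `"mons".parse::<DatePartField>()` would resolve to the same variant no matter which function received the string, and each function would only need to reject the variants it cannot handle.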
Describe alternatives you've considered
No response
Additional context
No response