feat: Add Spark-compatible monthname function to datafusion-spark #21639
JeelRajodiya wants to merge 6 commits into apache:main
Implements `monthname(date_or_timestamp)` that returns the three-letter abbreviated month name (Jan, Feb, ..., Dec) from a date or timestamp, matching Apache Spark's behavior.
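For readers who want to see the behavior in isolation, here is a minimal std-only sketch of the mapping described above. It is not the code in this PR: `month_from_epoch_days` and this `monthname` helper are hypothetical names, and the input is assumed to be a day count relative to 1970-01-01 (the Arrow `Date32` layout), converted with the standard days-to-civil arithmetic for the proleptic Gregorian calendar.

```rust
/// Three-letter month abbreviations, indexed by (month number - 1).
const MONTH_ABBREVIATIONS: [&str; 12] = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

/// Hypothetical helper: month (1..=12) for a day count relative to 1970-01-01.
/// Uses the common days-to-civil conversion with a March-based year.
fn month_from_epoch_days(days: i32) -> u32 {
    let z = i64::from(days) + 719_468; // shift the epoch to 0000-03-01
    let doe = z.rem_euclid(146_097); // day within the 400-year era
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, 0 = March 1
    let mp = (5 * doy + 2) / 153; // month index, 0 = March
    (if mp < 10 { mp + 3 } else { mp - 9 }) as u32
}

/// Hypothetical helper: NULL in (None) gives NULL out, otherwise the abbreviation.
fn monthname(days: Option<i32>) -> Option<&'static str> {
    days.map(|d| MONTH_ABBREVIATIONS[(month_from_epoch_days(d) - 1) as usize])
}
```

With this sketch, `monthname(Some(19797))` (2024-03-15) yields `Some("Mar")`, and a `None` input stays `None`.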
Jefffrey left a comment:
Should be good to go once CI is green
I've fixed the CI errors.
impl SparkMonthName {
    pub fn new() -> Self {
        Self {
            signature: Signature::exact(vec![DataType::Date32], Volatility::Immutable),
Spark supports input types TIMESTAMP, TIMESTAMP_NTZ, DATE. Perhaps DataFusion will coerce the equivalent types, or should explicit support be added here?
Sure, I've added explicit support for TIMESTAMP, TIMESTAMP_NTZ, DATE. It was needed.
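One way to picture what supporting all three input types involves: each can be reduced to a local day count before the month is extracted. The sketch below is illustrative only and does not reflect the actual signature or kernel used in this PR. `TemporalInput` and `to_local_epoch_days` are hypothetical names, and modeling TIMESTAMP with a fixed UTC offset in seconds is a simplifying assumption (real session time zones have DST transitions).

```rust
/// Hypothetical stand-in for the three Spark input types discussed above.
enum TemporalInput {
    /// DATE: days since 1970-01-01.
    Date(i32),
    /// TIMESTAMP_NTZ: microseconds since epoch, interpreted as wall-clock time.
    TimestampNtz(i64),
    /// TIMESTAMP: microseconds since epoch in UTC, plus an assumed fixed
    /// zone offset in seconds (a simplification of a real time zone).
    Timestamp(i64, i32),
}

const MICROS_PER_DAY: i64 = 86_400_000_000;

/// Reduces any supported input to a local day count relative to 1970-01-01.
/// `div_euclid` floors toward negative infinity, so pre-1970 timestamps
/// land on the correct calendar day instead of being truncated toward zero.
fn to_local_epoch_days(input: &TemporalInput) -> i64 {
    match *input {
        TemporalInput::Date(days) => i64::from(days),
        TemporalInput::TimestampNtz(micros) => micros.div_euclid(MICROS_PER_DAY),
        TemporalInput::Timestamp(micros, offset_secs) => {
            (micros + i64::from(offset_secs) * 1_000_000).div_euclid(MICROS_PER_DAY)
        }
    }
}
```

The flooring division matters for the tests requested later in this thread: a timestamp one microsecond before the epoch belongs to 1969-12-31, so its month name should be Dec, not Jan.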
# Error: wrong argument type (string without cast)
statement error Failed to coerce arguments to satisfy a call to 'monthname' function
SELECT monthname('not-a-date');
I think Spark returns NULL in this case if ANSI mode is disabled, which is the default prior to Spark 4.
monthname was added in Spark 4.0 (source code).
Should we be returning NULL in such cases? If yes, I'll add the ANSI mode flag so we return NULL.
@ExpressionDescription(
  usage = "_FUNC_(date) - Returns the three-letter abbreviated month name from the given date.",
  examples = """
    Examples:
      > SELECT _FUNC_('2008-02-20');
       Feb
  """,
  group = "datetime_funcs",
  since = "4.0.0")
case class MonthName(child: Expression) extends GetDateField with DefaultStringProducingExpression {
  override val func = DateTimeUtils.getMonthName
  override val funcName = "getMonthName"
  override protected def withNewChildInternal(newChild: Expression): MonthName =
    copy(child = newChild)
}
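If the answer to the NULL question above turns out to be yes, the two behaviors could be sketched roughly as follows. This is a std-only illustration under stated assumptions, not the PR's implementation: `parse_month`, `monthname_from_str`, and the `ansi_mode` parameter are hypothetical, the parser accepts only a strict `YYYY-MM-DD` form (Spark's real string-to-date cast is more lenient), and the error text is made up.

```rust
/// Three-letter month abbreviations, indexed by (month number - 1).
const MONTH_ABBREVIATIONS: [&str; 12] = [
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

/// Hypothetical strict parser: returns the month (1..=12) of a `YYYY-MM-DD`
/// string, or None if the string does not have that shape. Day validation is
/// deliberately loose (1..=31 for every month) to keep the sketch short.
fn parse_month(s: &str) -> Option<u32> {
    let mut parts = s.trim().splitn(3, '-');
    let year = parts.next()?;
    let month = parts.next()?;
    let day = parts.next()?;
    year.parse::<i32>().ok()?;
    let month: u32 = month.parse().ok()?;
    let day: u32 = day.parse().ok()?;
    if (1..=12).contains(&month) && (1..=31).contains(&day) {
        Some(month)
    } else {
        None
    }
}

/// Hypothetical entry point: with `ansi_mode` off, an unparseable string
/// yields NULL (Ok(None)); with it on, the same input is an error.
fn monthname_from_str(s: &str, ansi_mode: bool) -> Result<Option<&'static str>, String> {
    match parse_month(s) {
        Some(m) => Ok(Some(MONTH_ABBREVIATIONS[(m - 1) as usize])),
        None if ansi_mode => Err(format!("cannot cast '{}' to DATE", s)),
        None => Ok(None),
    }
}
```

Under these assumptions, `monthname_from_str("not-a-date", false)` gives `Ok(None)`, while the same call with `ansi_mode` set to `true` returns an `Err`.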
# Scalar date input
query T
SELECT monthname('2024-03-15'::DATE);
Could you also add tests with timestamp input to show that the coercion works as expected?
Rationale
The `datafusion-spark` crate is missing the `monthname` function. Spark's `monthname(date)` returns the three-letter abbreviated month name (Jan, Feb, ..., Dec) from a date or timestamp, and is commonly used in Spark SQL workloads.
What changes are included in this PR?
Adds `SparkMonthName` to `datafusion-spark`'s datetime functions. It uses `arrow::compute::date_part(DatePart::Month)` to extract the month number and maps it to the abbreviated name. The signature accepts Timestamp types with automatic coercion from Date32/Date64.
Are these changes tested?
Yes — 6 unit tests covering scalar dates, array dates with nulls, null scalars, timestamp microseconds, all 12 months, and return field nullability.
Are there any user-facing changes?
New `monthname` scalar function available when using `datafusion-spark`.