Support to unparse ScalarValue::IntervalMonthDayNano to String #10956

Merged
8 changes: 7 additions & 1 deletion datafusion/expr/src/expr_fn.rs
@@ -31,8 +31,9 @@ use crate::{
Signature, Volatility,
};
use crate::{AggregateUDFImpl, ColumnarValue, ScalarUDFImpl, WindowUDF, WindowUDFImpl};
use arrow::compute::kernels::cast_utils::parse_interval_month_day_nano;
use arrow::datatypes::{DataType, Field};
use datafusion_common::{Column, Result};
use datafusion_common::{Column, Result, ScalarValue};
use std::any::Any;
use std::fmt::Debug;
use std::ops::Not;
@@ -670,6 +671,11 @@ impl WindowUDFImpl for SimpleWindowUDF {
}
}

pub fn interval_month_day_nano_lit(value: &str) -> Expr {
let interval = parse_interval_month_day_nano(value).ok();
Expr::Literal(ScalarValue::IntervalMonthDayNano(interval))
}

#[cfg(test)]
mod test {
use super::*;
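A note on the new helper: `interval_month_day_nano_lit` calls `.ok()` on the parse result, so as written an unparseable string becomes `ScalarValue::IntervalMonthDayNano(None)`, which the unparser renders as NULL instead of reporting an error. The sketch below illustrates that pattern with the standard library only. `parse_months`, `months_lit_lenient` and `months_lit_strict` are hypothetical names invented for illustration, and the toy parser is an assumption, not arrow's `parse_interval_month_day_nano`.

```rust
/// Hypothetical toy parser standing in for a real interval parser.
/// It accepts only the form "<integer> MONTH".
fn parse_months(value: &str) -> Result<i32, String> {
    let mut parts = value.split_whitespace();
    match (parts.next(), parts.next(), parts.next()) {
        (Some(n), Some("MONTH"), None) => n
            .parse::<i32>()
            .map_err(|e| format!("bad number {:?}: {}", n, e)),
        _ => Err(format!("unsupported interval: {:?}", value)),
    }
}

/// Mirrors the `.ok()` pattern: a parse failure silently becomes `None`,
/// which corresponds to a NULL literal.
fn months_lit_lenient(value: &str) -> Option<i32> {
    parse_months(value).ok()
}

/// Alternative shape (an assumption, not what the PR does): keep the error
/// so the caller can tell a typo apart from an intentional NULL.
fn months_lit_strict(value: &str) -> Result<Option<i32>, String> {
    parse_months(value).map(Some)
}
```

For a helper used mainly in tests the lenient form is convenient, but a typo in the input string would then show up as an unexpected NULL, not as a parse error.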
125 changes: 120 additions & 5 deletions datafusion/sql/src/unparser/expr.rs
@@ -20,8 +20,9 @@ use std::{fmt::Display, vec};

use arrow_array::{Date32Array, Date64Array};
use arrow_schema::DataType;
use sqlparser::ast::Value::SingleQuotedString;
use sqlparser::ast::{
self, Expr as AstExpr, Function, FunctionArg, Ident, UnaryOperator,
self, Expr as AstExpr, Function, FunctionArg, Ident, Interval, UnaryOperator,
};

use datafusion_common::{
@@ -825,8 +826,26 @@ impl Unparser<'_> {
not_impl_err!("Unsupported scalar: {v:?}")
}
ScalarValue::IntervalDayTime(None) => Ok(ast::Expr::Value(ast::Value::Null)),
ScalarValue::IntervalMonthDayNano(Some(_i)) => {
not_impl_err!("Unsupported scalar: {v:?}")
ScalarValue::IntervalMonthDayNano(Some(i)) => {
let mut s = vec![];
if i.months != 0 {
s.push(format!("{} MONTH", i.months));
}
if i.days != 0 {
s.push(format!("{} DAY", i.days));
}
if i.nanoseconds != 0 {
s.push(Self::process_interval_nanosecond(i.nanoseconds));
}

let interval = Interval {
value: Box::new(ast::Expr::Value(SingleQuotedString(s.join(" ")))),
leading_field: None,
leading_precision: None,
last_field: None,
fractional_seconds_precision: None,
};
Ok(ast::Expr::Interval(interval))
}
ScalarValue::IntervalMonthDayNano(None) => {
Ok(ast::Expr::Value(ast::Value::Null))
@@ -859,6 +878,35 @@ impl Unparser<'_> {
}
}

fn process_interval_nanosecond(nano: i64) -> String {
Contributor Author

Thanks @alamb. It looks great. I'll address this today. :)

Contributor Author

@goldmedal goldmedal Jun 18, 2024

@alamb After switching to ArrayFormatter, I noticed that the output contains a lot of redundant zero-valued fields. For example:

input expr: interval_month_day_nano_lit("-3 MONTH"),
output result: r#"INTERVAL '0 YEARS -3 MONS 0 DAYS 0 HOURS 0 MINS 0.000000000 SECS'"#,
---
input expr: interval_month_day_nano_lit("1 YEAR 1 MONTH 1 DAY 3 HOUR 10 MINUTE 20 SECOND"),
output result: r#"INTERVAL '0 YEARS 13 MONS 1 DAYS 3 HOURS 10 MINS 20.000000000 SECS'"#,

I think both of these are valid SQL. They just don't look very smart, but it's okay with me.

I implemented it like this:

            ScalarValue::IntervalMonthDayNano(Some(_i)) => {
                let wrap_array = v.to_array()?;
                let Some(result) = array_value_to_string(&wrap_array, 0).ok() else {
                    return internal_err!("Unable to convert IntervalMonthDayNano to string");
                };
                let interval = Interval {
                    value: Box::new(ast::Expr::Value(SingleQuotedString(result.to_uppercase()))),
                    leading_field: None,
                    leading_precision: None,
                    last_field: None,
                    fractional_seconds_precision: None,
                };
                Ok(ast::Expr::Interval(interval))
            }

What do you think? Does it make sense?

Contributor Author

@goldmedal goldmedal Jun 18, 2024

Hmm... OK. I think this would be an enhancement for ArrayFormatter. If we want to make the output smarter, I think we should change the behavior in the arrow-cast crate.

Contributor

> Hmm... OK. I think this would be an enhancement for ArrayFormatter. If we want to make the output smarter, I think we should change the behavior in the arrow-cast crate.

I agree -- I filed apache/arrow-rs#5914 to track this suggestion
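To make the trade-off in this thread concrete, here is a minimal standalone sketch of the compact formatting the diff uses: emit only the non-zero fields, largest unit first. It uses the standard library only. `MonthDayNano` and `compact_interval_string` are hypothetical names for illustration, standing in for the arrow interval value and for the combination of the `ScalarValue::IntervalMonthDayNano(Some(i))` arm with `process_interval_nanosecond`.

```rust
/// Hypothetical stand-in for a month/day/nanosecond interval value.
#[derive(Debug, Clone, Copy)]
pub struct MonthDayNano {
    pub months: i32,
    pub days: i32,
    pub nanoseconds: i64,
}

const NANOS_PER_SECOND: i64 = 1_000_000_000;
const NANOS_PER_MINUTE: i64 = 60 * NANOS_PER_SECOND;
const NANOS_PER_HOUR: i64 = 60 * NANOS_PER_MINUTE;

/// Render only the non-zero fields, largest unit first, separated by spaces.
pub fn compact_interval_string(i: MonthDayNano) -> String {
    let n = i.nanoseconds;
    // Each entry pairs a field value with its unit keyword.
    let fields: [(i64, &str); 8] = [
        (i64::from(i.months), "MONTH"),
        (i64::from(i.days), "DAY"),
        (n / NANOS_PER_HOUR, "HOUR"),
        (n / NANOS_PER_MINUTE % 60, "MINUTE"),
        (n / NANOS_PER_SECOND % 60, "SECOND"),
        (n / 1_000_000 % 1_000, "MILLISECOND"),
        (n / 1_000 % 1_000, "MICROSECOND"),
        (n % 1_000, "NANOSECOND"),
    ];
    fields
        .iter()
        .filter(|(value, _)| *value != 0)
        .map(|(value, unit)| format!("{} {}", value, unit))
        .collect::<Vec<_>>()
        .join(" ")
}
```

One thing the sketch makes visible: an interval whose fields are all zero produces an empty string under this scheme, so the corresponding literal would be `INTERVAL ''`. The verbose ArrayFormatter-style output does not have that edge case.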

let mut s = vec![];
let hour = nano / 3_600_000_000_000;
let minute = nano / 60_000_000_000 % 60;
let second = nano / 1_000_000_000 % 60;
let millisecond = nano / 1_000_000 % 1_000;
let microsecond = nano / 1_000 % 1_000;
let nanosecond = nano % 1_000;
if hour != 0 {
s.push(format!("{} HOUR", hour));
}
if minute != 0 {
s.push(format!("{} MINUTE", minute));
}
if second != 0 {
s.push(format!("{} SECOND", second));
}
if millisecond != 0 {
s.push(format!("{} MILLISECOND", millisecond));
}
if microsecond != 0 {
s.push(format!("{} MICROSECOND", microsecond));
}
if nanosecond != 0 {
s.push(format!("{} NANOSECOND", nanosecond));
}
s.join(" ")
}

fn arrow_dtype_to_ast_dtype(&self, data_type: &DataType) -> Result<ast::DataType> {
match data_type {
DataType::Null => {
@@ -954,19 +1002,19 @@ impl Unparser<'_> {

#[cfg(test)]
mod tests {
use std::ops::{Add, Sub};
use std::{any::Any, sync::Arc, vec};

use arrow::datatypes::{Field, Schema};
use arrow_schema::DataType::Int8;

use datafusion_common::TableReference;
use datafusion_expr::AggregateExt;
use datafusion_expr::{
case, col, cube, exists, grouping_set, lit, not, not_exists, out_ref_col,
placeholder, rollup, table_scan, try_cast, when, wildcard, ColumnarValue,
ScalarUDF, ScalarUDFImpl, Signature, Volatility, WindowFrame,
WindowFunctionDefinition,
};
use datafusion_expr::{interval_month_day_nano_lit, AggregateExt};
use datafusion_functions_aggregate::count::count_udaf;
use datafusion_functions_aggregate::expr_fn::sum;

@@ -1256,6 +1304,73 @@ mod tests {
),
(col("need-quoted").eq(lit(1)), r#"("need-quoted" = 1)"#),
(col("need quoted").eq(lit(1)), r#"("need quoted" = 1)"#),
(
interval_month_day_nano_lit("3 NANOSECOND"),
r#"INTERVAL '3 NANOSECOND'"#,
),
(
interval_month_day_nano_lit("1000 NANOSECOND"),
r#"INTERVAL '1 MICROSECOND'"#,
),
(
interval_month_day_nano_lit("1000000 NANOSECOND"),
r#"INTERVAL '1 MILLISECOND'"#,
),
(
interval_month_day_nano_lit("1000000000 NANOSECOND"),
r#"INTERVAL '1 SECOND'"#,
),
(
interval_month_day_nano_lit("1001001001 NANOSECOND"),
r#"INTERVAL '1 SECOND 1 MILLISECOND 1 MICROSECOND 1 NANOSECOND'"#,
),
(
interval_month_day_nano_lit("3 SECOND"),
r#"INTERVAL '3 SECOND'"#,
),
(
interval_month_day_nano_lit("3 MINUTE"),
r#"INTERVAL '3 MINUTE'"#,
),
(
interval_month_day_nano_lit("3 HOUR"),
r#"INTERVAL '3 HOUR'"#,
),
(
interval_month_day_nano_lit("3 HOUR 10 MINUTE 20 SECOND"),
r#"INTERVAL '3 HOUR 10 MINUTE 20 SECOND'"#,
),
(interval_month_day_nano_lit("3 DAY"), r#"INTERVAL '3 DAY'"#),
(
interval_month_day_nano_lit("3 MONTH"),
r#"INTERVAL '3 MONTH'"#,
),
(
interval_month_day_nano_lit("1 MONTH 1 DAY 10 SECOND"),
r#"INTERVAL '1 MONTH 1 DAY 10 SECOND'"#,
),
(
interval_month_day_nano_lit("15 MONTH"),
r#"INTERVAL '15 MONTH'"#,
),
(
interval_month_day_nano_lit("1.5 MONTH"),
r#"INTERVAL '1 MONTH 15 DAY'"#,
),
(
interval_month_day_nano_lit("-3 MONTH"),
r#"INTERVAL '-3 MONTH'"#,
),
(
interval_month_day_nano_lit("1 MONTH")
.add(interval_month_day_nano_lit("1 DAY")),
r#"(INTERVAL '1 MONTH' + INTERVAL '1 DAY')"#,
),
(
interval_month_day_nano_lit("1 MONTH")
.sub(interval_month_day_nano_lit("1 DAY")),
r#"(INTERVAL '1 MONTH' - INTERVAL '1 DAY')"#,
),
];

for (expr, expected) in tests {
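The test table covers `-3 MONTH` but no negative sub-day interval. Because Rust's `/` and `%` on integers truncate toward zero, the division chain in `process_interval_nanosecond` gives every non-zero component the sign of the input. The sketch below shows that arithmetic on minutes and contrasts it with the Euclidean variants from the standard library. `split_truncating` and `split_euclid` are hypothetical helper names, and whether a given SQL dialect accepts per-unit signs is not something this PR establishes.

```rust
/// Truncating split, as with the `/` and `%` chain in the diff:
/// quotient and remainder both carry the sign of the input.
fn split_truncating(total_minutes: i64) -> (i64, i64) {
    (total_minutes / 60, total_minutes % 60)
}

/// Euclidean split: the remainder is always in 0..60, so only the
/// leading field can be negative.
fn split_euclid(total_minutes: i64) -> (i64, i64) {
    (total_minutes.div_euclid(60), total_minutes.rem_euclid(60))
}
```

With the truncating form, minus ninety minutes splits into `(-1, -30)`, which the formatting in this PR would render as `-1 HOUR -30 MINUTE`. The Euclidean form gives `(-2, 30)` for the same input.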