New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-37568][SQL] Support 2-arguments by the convert_timezone() function #34896
Conversation
Kubernetes integration test starting |
Kubernetes integration test status failure |
Test build #146184 has finished for PR 34896 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, make ConvertTimezone
as a time zone aware expression.
@@ -3011,13 +3011,17 @@ object SubtractDates { | |||
arguments = """ | |||
Arguments: | |||
* sourceTz - the time zone for the input timestamp | |||
- if not specified, SESSION_LOCAL_TIMEZONE is used when sourceTs is timestamp_ntz, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SESSION_LOCAL_TIMEZONE
is not valid for end users. Point out the SQL config spark.sql.session.timeZone
or better SESSION_LOCAL_TIMEZONE.key
@@ -3028,6 +3032,15 @@ case class ConvertTimezone( | |||
sourceTs: Expression) | |||
extends TernaryExpression with ImplicitCastInputTypes with NullIntolerant { | |||
|
|||
def this(targetTz: Expression, sourceTs: Expression) = { | |||
this( | |||
Literal(SQLConf.get.sessionLocalTimeZone), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, make the expression as time zone aware via extending TimeZoneAwareExpression
.
Literal(SQLConf.get.sessionLocalTimeZone), | ||
targetTz, | ||
if (sourceTs.dataType == TimestampNTZType) sourceTs | ||
else Cast(sourceTs, TimestampNTZType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How other types can appear if this below
override def inputTypes: Seq[AbstractDataType] = Seq(StringType, StringType, TimestampNTZType)
accepts only TimestampNTZType
. Could add an end-to-end tests to timestamp-ltz.sql
like
https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/inputs/timestamp-ntz.sql#L19
Can one of the admins verify this patch? |
@MaxGekk Let me confirm how to extends |
case class ConvertTimezone(
sourceTz: Expression,
targetTz: Expression,
sourceTs: Expression,
timeZoneId: Option[String] = None)
extends TernaryExpression
with ImplicitCastInputTypes
with NullIntolerant
with TimeZoneAwareExpression {
override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = {
copy(timeZoneId = Option(timeZoneId))
}
def this(sourceTz: Expression, targetTz: Expression, sourceTs: Expression) =
this(sourceTz, targetTz, sourceTs, None)
def this(targetTz: Expression, sourceTs: Expression) =
this(Literal("UTC", StringType), targetTz, sourceTs) The source time zone in the second constructor doesn't matter since you are not going to use it. |
@MaxGekk
Please let me know if there is something I missed to use the mix-in. |
@MaxGekk
I think there is small benefit for mixing |
…tion Add new constructor to the `ConvertTimezone` expression (see apache#34817) which accepts only 2 arguments: 1. `<targetTz>` - the time zone to which the input timestamp should be converted. 2. `<sourceTs>` - the timestamp to convert. and sets `<sourceTz>` to the current session time zone (see the SQL config `spark.sql.session.timeZone`). Closes apache#34896 To help users in migrations from other systems to Spark SQL. Other systems support optional first parameter: - https://docs.aws.amazon.com/redshift/latest/dg/CONVERT_TIMEZONE.html - https://docs.snowflake.com/en/sql-reference/functions/convert_timezone.html No, it extends the existing signature, and the function hasn't been released yet. By running new tests: ``` $ build/sbt "sql/test:testOnly org.apache.spark.sql.expressions.ExpressionInfoSuite" $ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z timestamp-ltz.sql" $ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z timestamp-ntz.sql" ``` Closes apache#35951 from MaxGekk/convert_timezone-2-params. Authored-by: Max Gekk <max.gekk@gmail.com> Signed-off-by: Max Gekk <max.gekk@gmail.com> (cherry picked from commit 8eb8a42) Signed-off-by: Max Gekk <max.gekk@gmail.com>
What changes were proposed in this pull request?
Support 2-arguments by the convert_timezone() function.
If users omit sourceTz, spark takes it from sourceTimestamp or SESSION_LOCAL_TIMEZONE.
Why are the changes needed?
Users can call this API with fewer arguments. The same function in Snowflake supports 2 arguments.
Does this PR introduce any user-facing change?
Yes, users can call this function with 2 arguments.
How was this patch tested?
unit tests