Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-37568][SQL] Support 2-arguments by the convert_timezone() function #34896

Closed

Conversation

yoda-mon
Copy link
Contributor

What changes were proposed in this pull request?

Support 2-arguments by the convert_timezone() function.
If users omit sourceTz, spark takes it from sourceTimestamp or SESSION_LOCAL_TIMEZONE.

Why are the changes needed?

Users can call this API with fewer arguments. The same function in Snowflake supports 2 arguments.

Does this PR introduce any user-facing change?

Yes, users can call this function with 2 arguments.

How was this patch tested?

unit tests

@github-actions github-actions bot added the SQL label Dec 14, 2021
@SparkQA
Copy link

SparkQA commented Dec 14, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50657/

@SparkQA
Copy link

SparkQA commented Dec 14, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50657/

@SparkQA
Copy link

SparkQA commented Dec 14, 2021

Test build #146184 has finished for PR 34896 at commit a802c64.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon HyukjinKwon changed the title [SPARK-37568][SQL]Support 2-arguments by the convert_timezone() function [SPARK-37568][SQL] Support 2-arguments by the convert_timezone() function Dec 15, 2021
Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, make ConvertTimezone as a time zone aware expression.

@@ -3011,13 +3011,17 @@ object SubtractDates {
arguments = """
Arguments:
* sourceTz - the time zone for the input timestamp
- if not specified, SESSION_LOCAL_TIMEZONE is used when sourceTs is timestamp_ntz,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SESSION_LOCAL_TIMEZONE is not valid for end users. Point out the SQL config spark.sql.session.timeZone or better SESSION_LOCAL_TIMEZONE.key

@@ -3028,6 +3032,15 @@ case class ConvertTimezone(
sourceTs: Expression)
extends TernaryExpression with ImplicitCastInputTypes with NullIntolerant {

def this(targetTz: Expression, sourceTs: Expression) = {
this(
Literal(SQLConf.get.sessionLocalTimeZone),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, make the expression as time zone aware via extending TimeZoneAwareExpression.

Literal(SQLConf.get.sessionLocalTimeZone),
targetTz,
if (sourceTs.dataType == TimestampNTZType) sourceTs
else Cast(sourceTs, TimestampNTZType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How other types can appear if this below

override def inputTypes: Seq[AbstractDataType] = Seq(StringType, StringType, TimestampNTZType)

accepts only TimestampNTZType. Could add an end-to-end tests to timestamp-ltz.sql like
https://github.com/apache/spark/blob/master/sql/core/src/test/resources/sql-tests/inputs/timestamp-ntz.sql#L19

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@yoda-mon
Copy link
Contributor Author

@MaxGekk Let me confirm how to extends TimeZoneAwareExpression.
At first I simply extends basic constructor which takes 3 arguments and overides timeZoneId and withTimeZone
https://github.com/yoda-mon/spark/pull/2/files
It seems to work fine and passed all tests, but I wonder if you want me to extends the constructor which takes 2 arguments, or to use withTimeZone in the constructor.
I took auxiliary constructor for case class, so I think it is difficult to implement both two ways. Should I switch to use companion object ?

@MaxGekk
Copy link
Member

MaxGekk commented Dec 27, 2021

@yoda-mon

  1. Extend TimeZoneAwareExpression and add one more arg timeZoneId to the case class
case class ConvertTimezone(
    sourceTz: Expression,
    targetTz: Expression,
    sourceTs: Expression,
    timeZoneId: Option[String] = None)
  extends TernaryExpression
  with ImplicitCastInputTypes
  with NullIntolerant
  with TimeZoneAwareExpression {
  1. Override withTimeZone()
  override def withTimeZone(timeZoneId: String): TimeZoneAwareExpression = {
    copy(timeZoneId = Option(timeZoneId))
  }
  1. Add 2 constructors that take 2 and 3 arguments
  def this(sourceTz: Expression, targetTz: Expression, sourceTs: Expression) =
    this(sourceTz, targetTz, sourceTs, None)
  def this(targetTz: Expression, sourceTs: Expression) =
    this(Literal("UTC", StringType), targetTz, sourceTs)

The source time zone in the second constructor doesn't matter since you are not going to use it.

@yoda-mon
Copy link
Contributor Author

yoda-mon commented Jan 7, 2022

@MaxGekk
Sorry for the delayed response, I tried to follow your advice.
However at

  def this(targetTz: Expression, sourceTs: Expression) =
    this(Literal("UTC"), targetTz, sourceTs)

sourceTz is not ignored. After the constructor I could use timeZoneId assourceTz that copied by the mix-in, but in that case, it's difficult to distinguish TimeStampLTZ and TimeStampNTZ.

Please let me know if there is something I missed to use the mix-in.

@yoda-mon
Copy link
Contributor Author

@MaxGekk
Gentle reminder,

  • Extending TimeZoneAwareExpression seems not so simple.
    • sourceTz is not ignored so I have to set appropriate timezone there.
    • At the constructor, timeZoneId by mix-in could not be used because it would be copied after that point.
    • At nullSafeEval or doGenCode I could use timeZoneId, but I have to check whether sourceTs is TimeStampNTZ or TimeStampLTZ so I have to pass it in some ways.

I think there is small benefit for mixing TimeZoneAwareExpression... Is the first idea is simple enough to add the option ?

@cloud-fan
Copy link
Contributor

@MaxGekk

@MaxGekk MaxGekk closed this in 8eb8a42 Mar 24, 2022
MaxGekk added a commit to MaxGekk/spark that referenced this pull request Mar 24, 2022
…tion

Add new constructor to the `ConvertTimezone` expression (see apache#34817) which accepts only 2 arguments:
1. `<targetTz>` - the time zone to which the input timestamp should be converted.
2. `<sourceTs>` - the timestamp to convert.

and sets `<sourceTz>` to the current session time zone (see the SQL config `spark.sql.session.timeZone`).

Closes apache#34896

To help users in migrations from other systems to Spark SQL. Other systems support optional first parameter:
- https://docs.aws.amazon.com/redshift/latest/dg/CONVERT_TIMEZONE.html
- https://docs.snowflake.com/en/sql-reference/functions/convert_timezone.html

No, it extends the existing signature, and the function hasn't been released yet.

By running new tests:
```
$ build/sbt "sql/test:testOnly org.apache.spark.sql.expressions.ExpressionInfoSuite"
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z timestamp-ltz.sql"
$ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z timestamp-ntz.sql"
```

Closes apache#35951 from MaxGekk/convert_timezone-2-params.

Authored-by: Max Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 8eb8a42)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants