[SPARK-32272][SQL] Add SQL standard command SET TIME ZONE #29064

Closed
wants to merge 12 commits into from

Member

@yaooqinn yaooqinn commented Jul 10, 2020

What changes were proposed in this pull request?

This PR adds the SQL standard command `SET TIME ZONE`, which sets the current default time zone displacement for the current SQL session. It is equivalent to the existing `set spark.sql.session.timeZone=xxx`.

In summary, this PR adds the following syntax:

SET TIME ZONE LOCAL;
SET TIME ZONE 'valid time zone';  -- zone offset or region
SET TIME ZONE INTERVAL XXXX; -- the interval must be in [-18, +18] hours; this range is wider than ANSI's [-14, +14]

Why are the changes needed?

ANSI compliance, and to give pure SQL users a way to retrieve all supported time zones.

Does this PR introduce any user-facing change?

Yes, it adds new syntax.

How was this patch tested?

Added unit tests, and verified the reference doc locally.


@yaooqinn
Member Author

cc @cloud-fan @maropu @dongjoon-hyun thanks for reviewing

@cloud-fan
Contributor

> because it has to be qualified with interval qualifier HOUR TO MINUTE which is not fully supported in Spark now.

HOUR TO MINUTE is a supported interval literal syntax, isn't it?

@yaooqinn
Member Author

> because it has to be qualified with interval qualifier HOUR TO MINUTE which is not fully supported in Spark now.
>
> HOUR TO MINUTE is a supported interval literal syntax, isn't it?

OK, I will add full support then.


SparkQA commented Jul 10, 2020

Test build #125602 has finished for PR 29064 at commit 13056b5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 10, 2020

Test build #125604 has finished for PR 29064 at commit 225695f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Member Author

The test case for listing all time zones has been removed because its result varies across JDKs.

The interval form is now supported, with some extensions:

  1. the offset range is [-18, +18] hours in Spark, which is wider than ANSI's,
  2. a seconds part is supported in Spark, e.g. +14:14:14, while ANSI only specifies hours and minutes,
  3. multi-unit intervals are much more common than unit-to-unit ones, so this interval form is supported too.
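
To make the three extensions concrete, here is a minimal Python sketch. It is not Spark's implementation, and the helper name `offset_from_interval` is hypothetical. It maps a multi-unit interval such as `INTERVAL 14 HOURS 14 MINUTES 14 SECONDS` to an offset string with second precision and enforces the [-18, +18] hour range.

```python
MAX_OFFSET_SECONDS = 18 * 3600  # the [-18, +18] hour range discussed in this PR


def offset_from_interval(hours: int = 0, minutes: int = 0, seconds: int = 0) -> str:
    """Hypothetical helper: map interval units to a '+HH:MM:SS' offset string."""
    total = hours * 3600 + minutes * 60 + seconds
    if abs(total) > MAX_OFFSET_SECONDS:
        raise ValueError("The interval value must be in the range of [-18, +18] hours")
    sign = "-" if total < 0 else "+"
    # Split the absolute number of seconds into hours, minutes and seconds.
    h, rem = divmod(abs(total), 3600)
    m, s = divmod(rem, 60)
    return f"{sign}{h:02d}:{m:02d}:{s:02d}"
```

Under these assumptions, `offset_from_interval(14, 14, 14)` yields `+14:14:14`, which is valid here although the ANSI range stops at 14 hours.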

cc @cloud-fan


SparkQA commented Jul 13, 2020

Test build #125749 has finished for PR 29064 at commit 3a62ecc.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val interval = parseIntervalLiteral(ctx.interval)
if (interval.months != 0 || interval.days != 0 ||
    math.abs(interval.microseconds) > 18 * DateTimeConstants.MICROS_PER_HOUR) {
  throw new ParseException("The interval value must be in the range of [-18, +18] hours",
Contributor

shall we also fail for things like INTERVAL 1 MICROSECOND?

Member Author

If we want to reject inputs like this completely, we need a new interval parser; otherwise, we can only forbid them based on the resulting interval value. E.g., INTERVAL 1 MICROSECOND -1 MICROSECOND 1 HOUR will still work...

Contributor

INTERVAL 1 MICROSECOND -1 MICROSECOND 1 HOUR is OK. We should forbid truncation, as in INTERVAL 1 MICROSECOND, where the 1 microsecond would have to be silently ignored.
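
The check discussed in this thread can be sketched in Python as follows. This is an illustration only, not the code in this PR: `Interval` is a simplified stand-in for the parsed interval (months, days, microseconds), and the second error message is made up. Validation runs on the resulting interval value, so units that cancel out pass, while a leftover sub-second part is rejected instead of being truncated.

```python
from dataclasses import dataclass

MICROS_PER_SECOND = 1_000_000
MICROS_PER_HOUR = 3600 * MICROS_PER_SECOND


@dataclass(frozen=True)
class Interval:
    """Simplified stand-in for a parsed interval: months, days, microseconds."""
    months: int = 0
    days: int = 0
    microseconds: int = 0


def to_offset_seconds(interval: Interval) -> int:
    """Return the offset in whole seconds, or raise if the interval is unusable."""
    if (interval.months != 0 or interval.days != 0
            or abs(interval.microseconds) > 18 * MICROS_PER_HOUR):
        raise ValueError("The interval value must be in the range of [-18, +18] hours")
    if interval.microseconds % MICROS_PER_SECOND != 0:
        # Converting would silently drop the sub-second part, so reject instead.
        # (Hypothetical message, not taken from Spark.)
        raise ValueError("The interval value must have at most second precision")
    return interval.microseconds // MICROS_PER_SECOND
```

With this sketch, an interval equivalent to `INTERVAL 1 MICROSECOND -1 MICROSECOND 1 HOUR` resolves to 3600 seconds, whereas `INTERVAL 1 MICROSECOND` raises an error.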

| SET TIME ZONE ALL #listTimeZones
| SET TIME ZONE interval #setTimeZone
| SET TIME ZONE timezone=(STRING | LOCAL) #setTimeZone
| SET TIME ZONE .*? #setTimeZone
Contributor

So we add this only for a better parser error message?

Member Author

This is used to fail invalid SET TIME ZONE syntax explicitly, because currently the following is accepted:

spark-sql (default)> set time zone abcd;
key	value
time zone abcd	<undefined>

-- !query schema
struct<key:string,value:string>
-- !query output
spark.sql.session.timeZone Asia/Hong_Kong
Contributor

@cloud-fan cloud-fan Jul 13, 2020

This looks like a weird output for SET TIME ZONE 'Asia/Hong_Kong'. Shall we add a Project node on the SetCommand to give better output?

Member Author

What do we expect as output? only keep the value Asia/Hong_Kong here?

Contributor

is there any reference?

Member Author

postgres=# SET TIME ZONE 'Asia/Hong_kong';
SET

@yaooqinn changed the title from "[SPARK-32272][SQL] Add and extend SQL standard command SET TIME ZONE" to "[SPARK-32272][SQL] Add SQL standard command SET TIME ZONE" on Jul 13, 2020

SparkQA commented Jul 13, 2020

Test build #125759 has finished for PR 29064 at commit 95a756a.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 13, 2020

Test build #125763 has finished for PR 29064 at commit c95fa7d.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 13, 2020

Test build #125773 has finished for PR 29064 at commit 5501213.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Member

maropu commented Jul 13, 2020

retest this please


SparkQA commented Jul 14, 2020

Test build #125791 has finished for PR 29064 at commit 5501213.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

We should also add a documentation page in the SQL reference for it.

@yaooqinn
Member Author

cc @maropu @cloud-fan @huaxingao. Please check the reference doc for set tz command.


### Description

The `SET TIME ZONE` command sets the current default time zone(`spark.sql.session.timeZone`) displacement for the `SparkSession`.
Contributor

The SET TIME ZONE command sets the time zone of the current session.


* **LOCAL**

Respectively set the time zone the one specified in environment variable `TZ`, or to the operating system time zone if `TZ` is undefined.
Contributor

set the time zone "to" the one specified

Contributor

operating system -> system? We actually use the JVM time zone, which can differ from the operating system's.

Member Author

yeah, it prefers user.timezone


SparkQA commented Jul 15, 2020

Test build #125891 has finished for PR 29064 at commit b3ac6f4.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.


* **LOCAL**

Respectively set the time zone the one specified in environment variable `TZ`, or to the operating system time zone if `TZ` is undefined.
Contributor

nit: Respectively set -> Sets?


* **interval_literal**

The [interval literal](sql-ref-literals.html#interval-literal) represents the displacement of time zone to the 'UTC'. It must be in the range of [-18, 18] hours and max to second precision, e.g. `INTERVAL 2 HOURS 30 MINITUES` or `INTERVAL '15:40:32' HOUR TO SECOND`.
Contributor

I guess something like this?
The interval literal represents the difference between the (system?) time zone and UTC.

SET TIME ZONE LOCAL;

-- Set time zone to the region-based zone ID.
SET TIME ZONE 'America/Los_Angeles'
Contributor

nit: add a `;` at the end?

@huaxingao
Contributor

@yaooqinn Please also change menu-sql.yaml to add SET TIME ZONE to the side menu.


SparkQA commented Jul 16, 2020

Test build #125940 has finished for PR 29064 at commit e5aa3b3.

  • This patch fails due to an unknown error code, -9.
  • This patch does not merge cleanly.
  • This patch adds no public classes.


* **LOCAL**

Set the time zone to the one specified in the java `user.timezone` property or environment variable `TZ`, or to the system time zone if they are undefined.
Contributor

Can we clarify what happens if both the Java `user.timezone` property and the environment variable `TZ` are specified?

Member Author

How about

Set the time zone to the one specified in the java `user.timezone` property,
or to the environment variable `TZ` if `user.timezone` is undefined,
or to the system time zone if both of them are undefined.
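
The precedence proposed above can be sketched as a small Python function. This only illustrates the wording: the name `resolve_local_time_zone` and its parameters are hypothetical, and in Spark the actual resolution is delegated to the JVM's default time zone rather than implemented like this.

```python
from typing import Mapping, Optional


def resolve_local_time_zone(
    jvm_properties: Mapping[str, str],
    environment: Mapping[str, str],
    system_time_zone: str,
) -> str:
    """Pick the zone for LOCAL using the precedence proposed above."""
    user_timezone: Optional[str] = jvm_properties.get("user.timezone")
    if user_timezone:  # assumption: a missing or empty property counts as undefined
        return user_timezone
    tz = environment.get("TZ")
    if tz:  # consulted only when user.timezone is undefined
        return tz
    return system_time_zone  # fallback when both are undefined
```

When both `user.timezone` and `TZ` are set, the property wins, which is the case the reviewer asked to have spelled out.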


SparkQA commented Jul 16, 2020

Test build #125959 has finished for PR 29064 at commit 11f0fed.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yaooqinn
Member Author

retest this please

@cloud-fan
Contributor

All GitHub Actions checks passed; I think we are good to go. Thanks, merging to master!

@cloud-fan cloud-fan closed this in bdeb626 Jul 16, 2020

SparkQA commented Jul 16, 2020

Test build #125963 has finished for PR 29064 at commit a4c60f3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


SparkQA commented Jul 16, 2020

Test build #125974 has finished for PR 29064 at commit a4c60f3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -240,6 +240,9 @@ statement
| MSCK REPAIR TABLE multipartIdentifier #repairTable
| op=(ADD | LIST) identifier (STRING | .*?) #manageResource
| SET ROLE .*? #failNativeCommand
| SET TIME ZONE interval #setTimeZone
| SET TIME ZONE timezone=(STRING | LOCAL) #setTimeZone
| SET TIME ZONE .*? #setTimeZone
Member

Contributor

We are close to the DB2 syntax, except that we support interval and LOCAL, and we don't allow the optional SESSION keyword.
