[SPARK-31005][SQL] Support time zone ids in casting strings to timestamps#27753
MaxGekk wants to merge 4 commits into apache:master
Test build #119143 has finished for PR 27753 at commit
}
} else if (i == 5 || i == 6) {
if (b == 'Z') {
if (b == '-' || b == '+') {
`getZoneId()` can handle zone offsets with a `-` or `+` prefix, but it doesn't support the format `7:3` like in
@cloud-fan @HyukjinKwon Please review the PR. I haven't updated comments for
Test build #119228 has finished for PR 27753 at commit
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
checkStringToTimestamp("2015-03-18T12:03:17.123121+7:30", expected)

zoneId = getZoneId("GMT+07:30")
expected = Option(date(2015, 3, 18, 12, 3, 17, 123120, zid = zoneId))
why drop this test? 123120 is different from 123121
reverted it back, and added more tests
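For readers following the `+7:30` test input: `java.time.ZoneId.of` rejects offsets whose hour or minute has a single digit, which is why such ids need some normalization before lookup. The sketch below is only an illustration of one way to do that, not the code from this PR; `ZoneIdSketch`, `padOffset` and `parse` are hypothetical names, and the regular expressions are an assumption about what a reasonable padding step could look like.

```scala
import java.time.ZoneId
import scala.util.Try

// Hypothetical helper object, not part of Spark.
object ZoneIdSketch {
  // Pad a single-digit hour (right after the sign) and a single-digit
  // trailing minute with a leading zero, e.g. "+7:3" becomes "+07:03".
  def padOffset(id: String): String =
    id.replaceAll("(?<=[+-])(\\d)(?=:)", "0$1")
      .replaceAll("(?<=:)(\\d)$", "0$1")

  // Resolve the padded id, also allowing the short ids from ZoneId.SHORT_IDS.
  // Returns None when java.time cannot interpret the id.
  def parse(id: String): Option[ZoneId] =
    Try(ZoneId.of(padOffset(id), ZoneId.SHORT_IDS)).toOption
}
```

With this sketch, `ZoneIdSketch.parse("GMT+7:30")` resolves through the padded id `GMT+07:30`, while region ids such as `Europe/Moscow` pass through unchanged.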
i += 1
tz = Some(43)
} else if (b == '-' || b == '+') {
tz = Some(new String(bytes, j, 1))
why not just b.toChar.toString?
Just for consistency with another change
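As a side note on the two spellings discussed here, both yield the same one-character string when the byte is an ASCII sign. A minimal sketch, assuming a UTF-8 encoded input; the object name, the sample input and the explicit charset argument are illustrative additions (the line under review calls `new String(bytes, j, 1)` without a charset):

```scala
import java.nio.charset.StandardCharsets

// Hypothetical demo object, not part of Spark.
object SignByteSketch {
  // Sample input; the sign of the zone offset sits at index 8.
  val bytes: Array[Byte] = "12:03:17+07:30".getBytes(StandardCharsets.UTF_8)
  val j: Int = 8
  val b: Byte = bytes(j)

  // Spelling used in the diff (charset made explicit here for determinism).
  val viaSlice: String = new String(bytes, j, 1, StandardCharsets.UTF_8)
  // Spelling suggested in the review.
  val viaChar: String = b.toChar.toString
}
```

The slice form generalizes to multi-byte zone ids such as `Europe/Moscow`, which may be what "consistency" refers to, though the thread does not say so explicitly.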
Test build #119309 has finished for PR 27753 at commit
thanks, merging to master/3.0!
…amps

### What changes were proposed in this pull request?
In the PR, I propose to change `DateTimeUtils.stringToTimestamp` to support any valid time zone id at the end of input string. After the changes, the function accepts zone ids in the formats:
- no zone id. In that case, the function uses the local session time zone from the SQL config `spark.sql.session.timeZone`
- -[h]h:[m]m
- +[h]h:[m]m
- Z
- Short zone id, see https://docs.oracle.com/javase/8/docs/api/java/time/ZoneId.html#SHORT_IDS
- Zone ID starts with 'UTC+', 'UTC-', 'GMT+', 'GMT-', 'UT+' or 'UT-'. The ID is split in two, with a two or three letter prefix and a suffix starting with the sign. The suffix must be in the formats:
  - +|-h[h]
  - +|-hh[:]mm
  - +|-hh:mm:ss
  - +|-hhmmss
- Region-based zone IDs in the form `{area}/{city}`, such as `Europe/Paris` or `America/New_York`. The default set of region ids is supplied by the IANA Time Zone Database (TZDB).

### Why are the changes needed?
- To use `stringToTimestamp` as a substitution of removed `stringToTime`, see #27710 (comment)
- Improve UX of Spark SQL by allowing flexible formats of zone ids. Currently, Spark accepts only `Z` and zone offsets that can be inconvenient when a time zone offset is shifted due to daylight saving rules. For instance:
```sql
spark-sql> select cast('2015-03-18T12:03:17.123456 Europe/Moscow' as timestamp);
NULL
```

### Does this PR introduce any user-facing change?
Yes. After the changes, casting strings to timestamps allows time zone id at the end of the strings:
```sql
spark-sql> select cast('2015-03-18T12:03:17.123456 Europe/Moscow' as timestamp);
2015-03-18 12:03:17.123456
```

### How was this patch tested?
- Added new test cases to the `string to timestamp` test in `DateTimeUtilsSuite`.
- Run `CastSuite` and `AnsiCastSuite`.

Closes #27753 from MaxGekk/stringToTimestamp-uni-zoneId.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 1fd9a91)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
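To make the zone id forms and the daylight saving argument in the description concrete, here is a small sketch built only on `java.time`. It assumes the ids are resolved with `ZoneId.of(id, ZoneId.SHORT_IDS)`, which may differ from what Spark's `getZoneId` does internally; `ZoneIdFormsSketch`, `resolve` and `offsetAt` are hypothetical names.

```scala
import java.time.{LocalDateTime, ZoneId, ZoneOffset}

// Hypothetical demo object, not part of Spark.
object ZoneIdFormsSketch {
  // Resolve an id using java.time, with short ids such as "CTT" enabled.
  def resolve(id: String): ZoneId = ZoneId.of(id, ZoneId.SHORT_IDS)

  // Offset in effect in the given zone at a local date-time.
  def offsetAt(id: String, ldt: LocalDateTime): ZoneOffset =
    resolve(id).getRules.getOffset(ldt)

  def main(args: Array[String]): Unit = {
    // One sample per family of ids listed in the description.
    Seq("Z", "+07:30", "UTC+7", "GMT-01:30", "CTT", "Europe/Paris")
      .foreach(id => println(s"$id -> ${resolve(id)}"))

    // A region id follows daylight saving rules; a fixed offset cannot.
    val winter = LocalDateTime.of(2015, 1, 18, 12, 3, 17)
    val summer = LocalDateTime.of(2015, 7, 18, 12, 3, 17)
    println(offsetAt("America/New_York", winter)) // standard time offset
    println(offsetAt("America/New_York", summer)) // daylight saving offset
  }
}
```

A fixed offset such as `-05:00` would be correct for New York only part of the year, which is the inconvenience the description mentions.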