[SPARK-42635][SQL][3.3] Fix the TimestampAdd expression #40264
This PR fixed the counter-intuitive behaviors of the `TimestampAdd` expression mentioned in https://issues.apache.org/jira/browse/SPARK-42635. See the following *user-facing* changes for details.

Yes. This PR fixes the three problems mentioned in SPARK-42635:

1. When the time is close to daylight saving time transition, the result may be discontinuous and not monotonic.
2. Adding month, quarter, and year silently ignores `Int` overflow during unit conversion.
3. Adding sub-month units (week, day, hour, minute, second, millisecond, microsecond) silently ignores `Long` overflow during unit conversion.

Some examples of the result changes:

Old results:

```
// In America/Los_Angeles timezone:
timestampadd(DAY, 1, 2011-03-12 03:00:00) = 2011-03-13 03:00:00 (this is correct, put it here for comparison)
timestampadd(HOUR, 23, 2011-03-12 03:00:00) = 2011-03-13 03:00:00
timestampadd(HOUR, 24, 2011-03-12 03:00:00) = 2011-03-13 03:00:00
timestampadd(SECOND, 86400 - 1, 2011-03-12 03:00:00) = 2011-03-13 03:59:59
timestampadd(SECOND, 86400, 2011-03-12 03:00:00) = 2011-03-13 03:00:00
// In UTC timezone:
timestampadd(quarter, 1431655764, 1970-01-01 00:00:00) = 1969-09-01 00:00:00
timestampadd(day, 106751992, 1970-01-01 00:00:00) = -290308-12-22 15:58:10.448384
```

New results:

```
// In America/Los_Angeles timezone:
timestampadd(DAY, 1, 2011-03-12 03:00:00) = 2011-03-13 03:00:00
timestampadd(HOUR, 23, 2011-03-12 03:00:00) = 2011-03-13 03:00:00
timestampadd(HOUR, 24, 2011-03-12 03:00:00) = 2011-03-13 04:00:00
timestampadd(SECOND, 86400 - 1, 2011-03-12 03:00:00) = 2011-03-13 03:59:59
timestampadd(SECOND, 86400, 2011-03-12 03:00:00) = 2011-03-13 04:00:00
// In UTC timezone:
timestampadd(quarter, 1431655764, 1970-01-01 00:00:00) = throw overflow exception
timestampadd(day, 106751992, 1970-01-01 00:00:00) = throw overflow exception
```

Pass existing tests and some new tests.

Closes apache#40237 from chenhao-db/SPARK-42635.

Authored-by: Chenhao Li <chenhao.li@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
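The daylight saving examples above can be reproduced outside Spark. The following is a minimal Python sketch, not Spark's implementation: the helper name `add_elapsed` is hypothetical, and it assumes the new behavior for sub-day units amounts to adding elapsed (physical) time, which is modeled here by doing the arithmetic in UTC and converting back to the session time zone. Python's own arithmetic on an aware `datetime` is wall-clock based, so it stands in for the calendar-style `DAY` addition.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+, needs the system tz database

LA = ZoneInfo("America/Los_Angeles")
FMT = "%Y-%m-%d %H:%M:%S"


def add_elapsed(ts: datetime, delta: timedelta) -> datetime:
    """Hypothetical helper: add physical time by going through UTC."""
    return (ts.astimezone(timezone.utc) + delta).astimezone(ts.tzinfo)


# One day before the 2011 spring-forward transition in Los Angeles.
start = datetime(2011, 3, 12, 3, 0, 0, tzinfo=LA)

# Calendar-style addition: same wall-clock time on the next day.
plus_one_day = (start + timedelta(days=1)).strftime(FMT)

# Elapsed-time addition: results keep increasing across the transition.
plus_23h = add_elapsed(start, timedelta(hours=23)).strftime(FMT)
plus_24h = add_elapsed(start, timedelta(hours=24)).strftime(FMT)
plus_86399s = add_elapsed(start, timedelta(seconds=86400 - 1)).strftime(FMT)
plus_86400s = add_elapsed(start, timedelta(seconds=86400)).strftime(FMT)

print(plus_one_day)  # 2011-03-13 03:00:00
print(plus_23h)      # 2011-03-13 03:00:00
print(plus_24h)      # 2011-03-13 04:00:00
print(plus_86399s)   # 2011-03-13 03:59:59
print(plus_86400s)   # 2011-03-13 04:00:00
```

Because 13 March 2011 has only 23 hours in this time zone, `DAY, 1` and `HOUR, 23` land on the same instant, while `HOUR, 24` lands one hour later. That matches the "New results" listed above and avoids the old jump from `03:59:59` back to `03:00:00`.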
@MaxGekk Please take a look, thanks for reviewing!
@chenhao-db Could you fix the build errors:
@MaxGekk It seems that
Seems like the test failure is related to the changes:
@MaxGekk I see. In old versions Spark doesn't include the error class in the error message: https://github.com/apache/spark/blob/branch-3.3/core/src/main/scala/org/apache/spark/ErrorInfo.scala#L74. I just removed the error class prefix in the expected error message.
+1, LGTM. All GAs passed. Merging to 3.3.
This is a backport of #40237.

### What changes were proposed in this pull request?

This PR fixed the counter-intuitive behaviors of the `TimestampAdd` expression mentioned in https://issues.apache.org/jira/browse/SPARK-42635. See the following *user-facing* changes for details.

### Does this PR introduce _any_ user-facing change?

Yes. This PR fixes the three problems mentioned in SPARK-42635:

1. When the time is close to daylight saving time transition, the result may be discontinuous and not monotonic.
2. Adding month, quarter, and year silently ignores `Int` overflow during unit conversion.
3. Adding sub-month units (week, day, hour, minute, second, millisecond, microsecond) silently ignores `Long` overflow during unit conversion.

Some examples of the result changes:

Old results:

```
// In America/Los_Angeles timezone:
timestampadd(DAY, 1, 2011-03-12 03:00:00) = 2011-03-13 03:00:00 (this is correct, put it here for comparison)
timestampadd(HOUR, 23, 2011-03-12 03:00:00) = 2011-03-13 03:00:00
timestampadd(HOUR, 24, 2011-03-12 03:00:00) = 2011-03-13 03:00:00
timestampadd(SECOND, 86400 - 1, 2011-03-12 03:00:00) = 2011-03-13 03:59:59
timestampadd(SECOND, 86400, 2011-03-12 03:00:00) = 2011-03-13 03:00:00
// In UTC timezone:
timestampadd(quarter, 1431655764, 1970-01-01 00:00:00) = 1969-09-01 00:00:00
timestampadd(day, 106751992, 1970-01-01 00:00:00) = -290308-12-22 15:58:10.448384
```

New results:

```
// In America/Los_Angeles timezone:
timestampadd(DAY, 1, 2011-03-12 03:00:00) = 2011-03-13 03:00:00
timestampadd(HOUR, 23, 2011-03-12 03:00:00) = 2011-03-13 03:00:00
timestampadd(HOUR, 24, 2011-03-12 03:00:00) = 2011-03-13 04:00:00
timestampadd(SECOND, 86400 - 1, 2011-03-12 03:00:00) = 2011-03-13 03:59:59
timestampadd(SECOND, 86400, 2011-03-12 03:00:00) = 2011-03-13 04:00:00
// In UTC timezone:
timestampadd(quarter, 1431655764, 1970-01-01 00:00:00) = throw overflow exception
timestampadd(day, 106751992, 1970-01-01 00:00:00) = throw overflow exception
```

### How was this patch tested?

Pass existing tests and some new tests.

Closes #40264 from chenhao-db/cherry-pick-SPARK-42635.

Authored-by: Chenhao Li <chenhao.li@databricks.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
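The two overflow cases in the description come down to plain arithmetic, which the following Python sketch illustrates. It is not Spark's code: `wrap_int32` and `multiply_exact` are hypothetical helpers that emulate, respectively, a silently wrapping 32-bit multiplication and a checked multiplication that raises on overflow. The sketch assumes quarters are converted to months (factor 3) in an `Int` and days to microseconds in a `Long`, as the problem statements above suggest.

```python
INT_BITS = 32
LONG_MIN, LONG_MAX = -2**63, 2**63 - 1
MICROS_PER_DAY = 86_400 * 1_000_000
MONTHS_PER_QUARTER = 3


def wrap_int32(x: int) -> int:
    """Hypothetical helper: emulate silent two's-complement Int overflow."""
    half = 2 ** (INT_BITS - 1)
    return (x + half) % 2**INT_BITS - half


def multiply_exact(a: int, b: int, lo: int, hi: int) -> int:
    """Hypothetical helper: multiply and fail loudly if outside [lo, hi]."""
    result = a * b
    if not lo <= result <= hi:
        raise OverflowError(f"{a} * {b} does not fit in [{lo}, {hi}]")
    return result


# Old behavior for quarters: the month count wraps around to a small
# negative number, which is why the old result was 1969-09-01.
wrapped_months = wrap_int32(1431655764 * MONTHS_PER_QUARTER)
print(wrapped_months)  # -4

# Checked conversion of days to microseconds: 106751991 days still fits
# in a Long, 106751992 days does not.
fits = multiply_exact(106751991, MICROS_PER_DAY, LONG_MIN, LONG_MAX)
try:
    multiply_exact(106751992, MICROS_PER_DAY, LONG_MIN, LONG_MAX)
    day_overflow_detected = False
except OverflowError:
    day_overflow_detected = True
print(day_overflow_detected)  # True
```

A wrapped value of -4 months, applied to `1970-01-01 00:00:00`, gives the old result `1969-09-01 00:00:00`. With a checked multiplication the same input is rejected instead, which corresponds to the "throw overflow exception" entries in the new results.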