
[SPARK-31111][SQL][Tests] Fix interval output issue in ExtractBenchmark #27867

Closed · wants to merge 4 commits

Conversation

yaooqinn (Member) commented Mar 10, 2020

What changes were proposed in this pull request?

Fix the error caused by interval output in ExtractBenchmark.

Why are the changes needed?

Fix a bug in the test:

```
[info]   Running case: cast to interval
[error] Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot use interval type in the table schema.;;
[error] OverwriteByExpression RelationV2[] noop-table, true, true
[error] +- Project [(subtractdates(cast(cast(id#0L as timestamp) as date), -719162) + subtracttimestamps(cast(id#0L as timestamp), -30610249419876544)) AS ((CAST(CAST(id AS TIMESTAMP) AS DATE) - DATE '0001-01-01') + (CAST(id AS TIMESTAMP) - TIMESTAMP '1000-01-01 01:02:03.123456'))#2]
[error]    +- Range (1262304000, 1272304000, step=1, splits=Some(1))
[error]
[error] 	at org.apache.spark.sql.catalyst.util.TypeUtils$.failWithIntervalType(TypeUtils.scala:106)
[error] 	at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$25(CheckAnalysis.scala:389)
[error] 	at org.a
```
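The failure is reproducible outside the benchmark: in Spark 3.0, subtracting two timestamps yields a CalendarIntervalType column, and CheckAnalysis rejects interval types in any output schema at write time, even for the side-effect-free noop sink. A minimal reproduction sketch (assumes an active SparkSession named `spark`):

```scala
// Writing an interval-typed column fails analysis with
// "Cannot use interval type in the table schema", as in the log above.
spark.range(1)
  .selectExpr("cast(id as timestamp) - timestamp'2000-01-01 00:00:00' as i")
  .write.format("noop").mode("overwrite").save()
```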

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Re-ran the benchmark.

yaooqinn (Member Author) commented

cc @cloud-fan, thanks.

SparkQA commented Mar 10, 2020

Test build #119625 has finished for PR 27867 at commit f4501cd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

```
                                          Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
MILLISECONDS of interval                           1435           1449          16         7.0         143.5       0.9X
MICROSECONDS of interval                           1304           1314           9         7.7         130.4       1.0X
EPOCH of interval                                  1440           1453          19         6.9         144.0       0.9X
cast to interval                                   3984           4034          79         2.5         398.4       1.0X
```
Member

So, this slowdown came from the change, right?

Member Author

Hold on a bit... something is wrong here.

Member Author

I fixed the problem, thanks, @dongjoon-hyun.

The `cast to interval` benchmark is slower because of the extra cast-to-string logic.

Member Author

The extract-related benchmark tests are OK now.

case "timestamp" => "cast(id as timestamp)"
case "date" => "cast(cast(id as timestamp) as date)"
case "interval" if toStr => "cast((cast(cast(id as timestamp) as date) - date'0001-01-01') + " +
Contributor

Why do we need to cast to string?

yaooqinn (Member Author) commented Mar 11, 2020

Hmm, in this benchmark there are two kinds of tests: one is extract/date_part, and the other is cast-to-interval; the latter fails type checking when saving.
Or shall we just remove the cast-to-interval case here: https://github.com/apache/spark/pull/27867/files#diff-b89131adf146fef7cd6e3db381fd0807L111

Contributor

Why is it not a problem for date/timestamp?

Member Author

We forbid interval in the output schema only.

Contributor

Maybe we should not use the noop sink here, but `df.queryExecution.toRdd.foreach(_ => ())`.
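A minimal sketch of this suggestion (assumes an active SparkSession named `spark`): `queryExecution.toRdd` materializes the physical plan directly, so the write-side analysis check that rejects interval columns never runs.

```scala
// Evaluate every row without a sink: toRdd exposes the RDD[InternalRow]
// of the physical plan, bypassing the output-schema check entirely.
val df = spark.range(10)
  .selectExpr("cast(id as timestamp) - timestamp'2000-01-01 00:00:00' as i")
df.queryExecution.toRdd.foreach(_ => ())
```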

Member Author

Good advice, thanks!

SparkQA commented Mar 11, 2020

Test build #119644 has finished for PR 27867 at commit e5be851.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Mar 11, 2020

Test build #119650 has finished for PR 27867 at commit 14722a1.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

yaooqinn (Member Author) commented

retest this please

SparkQA commented Mar 11, 2020

Test build #119654 has finished for PR 27867 at commit 14722a1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan (Contributor) commented

thanks, merging to master/3.0!

cloud-fan closed this in 2b46662 on Mar 11, 2020
cloud-fan pushed a commit that referenced this pull request Mar 11, 2020
Closes #27867 from yaooqinn/SPARK-31111.

Authored-by: Kent Yao <yaooqinn@hotmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 2b46662)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
```
@@ -42,7 +42,9 @@ object ExtractBenchmark extends SqlBasedBenchmark {
    spark
      .range(sinceSecond, sinceSecond + cardinality, 1, 1)
      .selectExpr(exprs: _*)
      .noop()
```
Member

FYI, @MaxGekk, too.
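Per the hunk header above (7 lines become 9) and the review discussion, the merged version presumably replaces the `.noop()` call with direct RDD materialization. A sketch of the resulting helper (the method name `doBenchmark` is an assumption; `spark` and `sinceSecond` come from the enclosing benchmark object, as in the hunk):

```scala
// Sketch: evaluate all projected expressions without writing to a sink,
// so interval-typed results are not rejected by the output-schema check.
private def doBenchmark(cardinality: Long, exprs: String*): Unit = {
  spark
    .range(sinceSecond, sinceSecond + cardinality, 1, 1)
    .selectExpr(exprs: _*)
    .queryExecution
    .toRdd
    .foreach(_ => ())
}
```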

sjincho pushed a commit to sjincho/spark that referenced this pull request Apr 15, 2020