[SPARK-31291][SQL][TEST] SQLQueryTestSuite: Sharing test data and test tables among multiple test cases #28060
Test build #120521 has finished for PR 28060 at commit
Test build #120522 has finished for PR 28060 at commit
retest this please
Looks cool, thanks for the work, @beliefer! btw, how long will
Test build #120525 has finished for PR 28060 at commit
After the optimization, the total time is nearly one minute less than before.
Test build #120530 has finished for PR 28060 at commit
if (testTables.contains("arraydata")) {
  ((Seq(1, 2, 3), Seq(Seq(1, 2, 3))) :: (Seq(2, 3, 4), Seq(Seq(2, 3, 4))) :: Nil)
    .toDF("arraycol", "nestedarraycol")
    .createOrReplaceTempView("arraydata")
To avoid the overhead of per-session init, can't we just move these local temp views into a session-independent place, e.g., global temp views?
That is not the main source of conflict. For example, test case A creates a view testdata, and test case B also creates a view testdata, but the two views have different schemas. If the same session is shared globally, the two definitions conflict, especially under parallel execution.
> If the same session is shared globally

I think that's not what @maropu means. We still create a fresh session for each test file, but the test views are created as global temp views, which are shared across all sessions.
@cloud-fan Thanks. I understand @maropu's suggestion now. I will try to use `createGlobalTempView` and share these views between all sessions.
Sorry, I missed your reply, @beliefer. Yes, that's what I wanted to say; thanks, @cloud-fan. I'll check later.
@cloud-fan If I use `sparkSession.newSession()`, test cases fail, for example:
23:25:05.078 ERROR org.apache.spark.sql.SQLQueryTestSuite: Error using configs:
[info] - operators.sql *** FAILED *** (5 seconds, 722 milliseconds)
[info] operators.sql
[info] Expected "struct<[(- key):int,(+ key):int]>", but got "struct<[]>" Schema did not match for query #4
[info] select -key, +key from testdata where key = 2: -- !query
[info] select -key, +key from testdata where key = 2
[info] -- !query schema
[info] struct<>
[info] -- !query output
[info] org.apache.spark.sql.AnalysisException
[info] Table or view not found: testdata; line 1 pos 23 (SQLQueryTestSuite.scala:464)
[info] org.scalatest.exceptions.TestFailedException:
[info] at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
val a = spark.sql("show views;")
a.show();
+---------+---------+-----------+
|namespace| viewName|isTemporary|
+---------+---------+-----------+
| | aggtest| true|
| |arraydata| true|
| | mapdata| true|
| | onek| true|
| | tenk1| true|
+---------+---------+-----------+
val localSparkSession = spark.newSession()
val a2 = localSparkSession.sql("show views;")
a2.show();
+---------+--------+-----------+
|namespace|viewName|isTemporary|
+---------+--------+-----------+
+---------+--------+-----------+
Am I missing something?
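The empty `show views` result from the new session is consistent with how Spark scopes temp views: a view created with `createOrReplaceTempView` belongs to the session that created it, while a global temp view lives in the application-wide `global_temp` database and has to be referenced with that qualifier. The following is a minimal sketch of that difference, assuming a local `SparkSession`; the object name and the view names `testdata_local` and `testdata_global` are hypothetical and not taken from the PR.

```scala
import org.apache.spark.sql.SparkSession

object TempViewVisibilitySketch {
  // Returns (is the local temp view visible in a new session?, row count read
  // from the global temp view in that new session).
  def check(spark: SparkSession): (Boolean, Long) = {
    import spark.implicits._
    val df = Seq((1, "1"), (2, "2")).toDF("key", "value")

    // Session-scoped: registered only in the session that created it.
    df.createOrReplaceTempView("testdata_local")
    // Application-scoped: registered in the shared `global_temp` database.
    df.createOrReplaceGlobalTempView("testdata_global")

    val other = spark.newSession()
    val localVisible = other.catalog.tableExists("testdata_local")
    val globalCount =
      other.sql("SELECT count(*) FROM global_temp.testdata_global").head().getLong(0)
    (localVisible, globalCount)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("temp-view-visibility-sketch")
      .getOrCreate()
    try println(check(spark)) finally spark.stop()
  }
}
```

Because global temp views must be addressed as `global_temp.<name>`, existing test queries that say `from testdata` would not resolve them without further changes.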
like this? master...maropu:SPARK-31291
> master...maropu:SPARK-31291

That approach works but would require too many changes. After an offline discussion with @cloud-fan, I will try using `df.write.saveAsTable` instead of `df.createTempView`.
@cloud-fan @maropu I have replaced `df.createTempView` with `df.write.saveAsTable`.
Because the original temp views are now tables, I had to regenerate some golden files.
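A rough sketch of the table-based approach follows. It is an illustration under assumptions, not the PR's code: the names `createTestTables` and `removeTestTables` come from this review, but their bodies, the object name, and the 1 to 100 key/value data are made up here. The point it shows is that a table written with `saveAsTable` is registered in the catalog shared by all sessions of the application, so a fresh session can read it by its plain name.

```scala
import org.apache.spark.sql.SparkSession

object SharedTestTableSketch {
  // Write the test data once as a persistent Parquet table.
  def createTestTables(spark: SparkSession): Unit = {
    import spark.implicits._
    (1 to 100).map(i => (i, i.toString)).toDF("key", "value")
      .write
      .format("parquet")
      .mode("overwrite")
      .saveAsTable("testdata")
  }

  // Drop the table again when the suite finishes.
  def removeTestTables(spark: SparkSession): Unit =
    spark.sql("DROP TABLE IF EXISTS testdata")

  // Reads the table from a brand-new session, as a per-file test session would.
  def countFromNewSession(spark: SparkSession): Long =
    spark.newSession().table("testdata").count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("shared-test-table-sketch")
      .getOrCreate()
    createTestTables(spark)
    try println(countFromNewSession(spark))
    finally {
      removeTestTables(spark)
      spark.stop()
    }
  }
}
```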
Hi, All. cc @gatorsmile and @gengliangwang
This was assigned to @beliefer after our offline talk. He is trying to find out why SQLQueryTestSuite took 35 minutes to finish. The time cost of each step/phase can help us locate the root cause. It would be interesting to know whether our compiler overhead is too large for these short queries.
@dongjoon-hyun I think even in parallel execution, this PR will still help.
cc @cloud-fan
Force-pushed from c392c03 to 45bba4c.
Test build #120977 has finished for PR 28060 at commit
@@ -7,8 +7,8 @@ SELECT * FROM testdata LIMIT 2
-- !query schema
struct<key:int,value:string>
-- !query output
1 1
2 2
51 51
Can we add a sort before the limit?
I resolved the issue by using repartition.
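For reference, the reviewer's alternative (sorting before the limit) can be sketched as below. Without an `ORDER BY`, `LIMIT 2` returns whichever rows happen to be produced first, which is presumably why the golden output changed once the view became a table. This sketch shows the sort-based variant only, not the repartition fix the author chose; the object name, the view name `limit_sketch`, and the 1 to 100 data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object DeterministicLimitSketch {
  // Returns the two rows with the smallest keys; ORDER BY pins down which
  // rows LIMIT keeps, regardless of how the data is partitioned.
  def firstTwo(spark: SparkSession): Seq[(Int, String)] = {
    import spark.implicits._
    (1 to 100).map(i => (i, i.toString)).toDF("key", "value")
      .repartition(4) // shuffle rows so partition order no longer follows key order
      .createOrReplaceTempView("limit_sketch")
    spark.sql("SELECT key, value FROM limit_sketch ORDER BY key LIMIT 2")
      .as[(Int, String)]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("deterministic-limit-sketch")
      .getOrCreate()
    try firstTwo(spark).foreach(println) finally spark.stop()
  }
}
```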
@@ -668,6 +690,7 @@ class SQLQueryTestSuite extends QueryTest with SharedSparkSession {
try {
  TimeZone.setDefault(originalTimeZone)
  Locale.setDefault(originalLocale)
  unloadTestData(spark)
I'd prefer `createTestTables` and `removeTestTables`.
OK.
The change LGTM; can you regenerate the benchmark numbers?
OK.
Test build #121005 has finished for PR 28060 at commit
retest this please
Test build #121015 has finished for PR 28060 at commit
thanks, merging to master/3.0!
…t tables among multiple test cases

### What changes were proposed in this pull request?
`SQLQueryTestSuite` takes 35 minutes to run. I've listed the 10 test cases that took the longest time in the `SQL` module below.

Class | Time spent ↑ | Failure | Skip | Pass | Total test case
-- | -- | -- | -- | -- | --
SQLQueryTestSuite | 35 minutes | 0 | 1 | 230 | 231
TPCDSQuerySuite | 3 minutes 8 seconds | 0 | 0 | 156 | 156
SQLQuerySuite | 2 minutes 52 seconds | 0 | 0 | 185 | 185
DynamicPartitionPruningSuiteAEOff | 1 minute 52 seconds | 0 | 0 | 22 | 22
DataFrameFunctionsSuite | 1 minute 37 seconds | 0 | 0 | 102 | 102
DynamicPartitionPruningSuiteAEOn | 1 minute 24 seconds | 0 | 0 | 22 | 22
DataFrameSuite | 1 minute 14 seconds | 0 | 2 | 157 | 159
SubquerySuite | 1 minute 12 seconds | 0 | 1 | 70 | 71
SingleLevelAggregateHashMapSuite | 1 minute 1 second | 0 | 0 | 50 | 50
DataFrameAggregateSuite | 59 seconds | 0 | 0 | 50 | 50

I checked the code of `SQLQueryTestSuite` and found that it loads test data repeatedly. This PR improves the performance of `SQLQueryTestSuite`.

The total time to run `SQLQueryTestSuite` before and after this PR is shown below.

Before

No | Time
-- | --
1 | 20 minutes, 22 seconds
2 | 23 minutes, 21 seconds
3 | 21 minutes, 19 seconds
4 | 22 minutes, 26 seconds
5 | 20 minutes, 8 seconds

After

No | Time
-- | --
1 | 20 minutes, 52 seconds
2 | 20 minutes, 47 seconds
3 | 20 minutes, 7 seconds
4 | 21 minutes, 10 seconds
5 | 20 minutes, 4 seconds

### Why are the changes needed?
Improve the performance of `SQLQueryTestSuite`.

### Does this PR introduce any user-facing change?
'No'.

### How was this patch tested?
Jenkins test

Closes #28060 from beliefer/avoid-load-test-data-repeatedly.

Lead-authored-by: gengjiaan <gengjiaan@360.cn>
Co-authored-by: beliefer <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 014d335)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
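As a quick sanity check on the five Before and five After timings quoted in the commit message, the averages can be worked out as below. This is only arithmetic on the numbers given there (the object and helper names are hypothetical); it gives roughly 1291 s before and 1236 s after, a difference of about 55 seconds. With only five runs per side and a spread of several minutes between runs, that difference should be read as indicative rather than precise.

```scala
object TimingSketch {
  // Converts a "M minutes, S seconds" reading into total seconds.
  def seconds(min: Int, sec: Int): Int = min * 60 + sec

  // Timings copied from the "Before" and "After" tables of the commit message.
  val before: Seq[Int] =
    Seq(seconds(20, 22), seconds(23, 21), seconds(21, 19), seconds(22, 26), seconds(20, 8))
  val after: Seq[Int] =
    Seq(seconds(20, 52), seconds(20, 47), seconds(20, 7), seconds(21, 10), seconds(20, 4))

  def mean(xs: Seq[Int]): Double = xs.sum.toDouble / xs.size

  // Average time saved per run, in seconds.
  def savedSeconds: Double = mean(before) - mean(after)

  def main(args: Array[String]): Unit =
    println(s"before=${mean(before)} s, after=${mean(after)} s, saved=$savedSeconds s")
}
```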
@cloud-fan @maropu Thanks for reviewing this PR.
…a and test tables among multiple test cases

### What changes were proposed in this pull request?
This PR is related to #28060.
`ThriftServerQueryTestSuite` takes 17 minutes to run.
I've listed all the test cases in the `hive-thriftserver` module below, ordered by time descending.

Class | Time spent ↑ | Failure | Skip | Pass | Total test case
-- | -- | -- | -- | -- | --
ThriftServerQueryTestSuite | 17 minutes | 0 | 15 | 140 | 155
CliSuite | 8 minutes 24 seconds | 0 | 0 | 24 | 24
SparkThriftServerProtocolVersionsSuite | 59 seconds | 0 | 0 | 210 | 210
HiveThriftBinaryServerSuite | 36 seconds | 0 | 1 | 21 | 22
SparkMetadataOperationSuite | 19 seconds | 0 | 0 | 7 | 7
HiveCliSessionStateSuite | 16 seconds | 0 | 0 | 2 | 2
SparkSQLEnvSuite | 16 seconds | 0 | 0 | 1 | 1
HiveThriftHttpServerSuite | 15 seconds | 0 | 0 | 3 | 3
SingleSessionSuite | 14 seconds | 0 | 0 | 3 | 3
JdbcConnectionUriSuite | 2.1 seconds | 0 | 0 | 1 | 1
ThriftServerWithSparkContextSuite | 1.4 seconds | 0 | 0 | 1 | 1
SparkExecuteStatementOperationSuite | 63 milliseconds | 0 | 0 | 2 | 2
UISeleniumSuite | -1 milliseconds | 0 | 1 | 0 | 1

I checked the code of `ThriftServerQueryTestSuite` and found that it loads test data repeatedly. This PR improves the performance of `ThriftServerQueryTestSuite`.
Because #28060 provides `createTestTables`(https://github.com/apache/spark/blob/e42a3945acd614a26c7941a9eed161b500fb4520/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala#L574) and `removeTestTables`(https://github.com/apache/spark/blob/e42a3945acd614a26c7941a9eed161b500fb4520/sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala#L666), this PR reuses them.

The total time to run `ThriftServerQueryTestSuite` before and after this PR is shown below.

Before

No | Time
-- | --
1 | 18 minutes, 8 seconds
2 | 22 minutes, 44 seconds
3 | 17 minutes, 48 seconds
4 | 18 minutes, 30 seconds

After

No | Time
-- | --
1 | 16 minutes, 11 seconds
2 | 17 minutes, 19 seconds
3 | 18 minutes, 15 seconds
4 | 17 minutes, 27 seconds

### Why are the changes needed?
Improve the performance of `ThriftServerQueryTestSuite`.

### Does this PR introduce any user-facing change?
'No'.

### How was this patch tested?
Jenkins test

Closes #28180 from beliefer/avoid-load-thrift-test-data-repeatedly.

Authored-by: beliefer <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

The same commit message appears a second time with these trailing lines:
(cherry picked from commit 2d3692e)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Hi, all.
Could you take a look?
.createOrReplaceTempView("mapdata")
.write
.format("parquet")
.saveAsTable("mapdata")

session
  .read
  .format("csv")
  .options(Map("delimiter" -> "\t", "header" -> "false"))
  .schema("a int, b float")
  .load(testFile("test-data/postgresql/agg.data"))
This seems to be the root cause of failure.
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: jar:file:/home/jenkins/workspace/spark-branch-3.0-test-maven-hadoop-2.7-hive-2.3/sql/core/target/spark-sql_2.12-3.0.1-SNAPSHOT-tests.jar!/test-data/postgresql/agg.data
Since I found the root cause, I'll make a follow-up PR soon.
One quick fix is copying the test file from …. Since this PR is about performance, the fix will increase the test time a little because of the copying.
I made a follow-up PR to recover
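The quick fix described above (copy the file out of the jar before handing its path to the reader) can be sketched roughly as follows. The idea is that a `jar:file:...!/...` URL is not something a file-based reader can open as a path, whereas a plain temp file is. This is an illustrative sketch, not the code from the follow-up PR; `ResourceCopySketch` and `copyToTempFile` are hypothetical names.

```scala
import java.io.{File, InputStream}
import java.net.URL
import java.nio.file.{Files, StandardCopyOption}

object ResourceCopySketch {
  // Copies whatever `url` points at (for example an entry inside a jar)
  // into a fresh temp file and returns that file.
  def copyToTempFile(url: URL, prefix: String, suffix: String): File = {
    val tmp = Files.createTempFile(prefix, suffix)
    val in: InputStream = url.openStream()
    try {
      // createTempFile already created an empty file, so allow overwriting it.
      Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING)
    } finally {
      in.close()
    }
    val file = tmp.toFile
    file.deleteOnExit() // best-effort cleanup when the JVM exits
    file
  }
}
```

In a test suite, the URL would typically come from the class loader, e.g. `getClass.getClassLoader.getResource("test-data/postgresql/agg.data")`, and the returned file's absolute path would be passed to `.load(...)` in place of the `jar:` URL.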
…ftServerQueryTestSuite

### What changes were proposed in this pull request?
[SPARK-31291](#28060) broke `ThriftServerQueryTestSuite` in the Maven environment. This PR fixes it by copying the resource file from the jar to a local temp file.

### Why are the changes needed?
To recover the Jenkins jobs in `master` and `branch-3.0`.
- https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-maven-hadoop-2.7-hive-2.3/211/

```
org.apache.spark.sql.hive.thriftserver.ThriftServerQueryTestSuite *** ABORTED ***
...
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: jar:file:/home/jenkins/workspace/spark-branch-3.0-test-maven-hadoop-2.7-hive-2.3/sql/core/target/spark-sql_2.12-3.0.1-SNAPSHOT-tests.jar!/test-data/postgresql/agg.data
```

![Screen Shot 2020-04-10 at 9 54 28 PM](https://user-images.githubusercontent.com/9700541/79035702-f03ad900-7b75-11ea-9eee-0c1581a28838.png)

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
Pass the Jenkins with SBT and Maven.
- [x] Sbt (`Test build #121117` #28186 (comment))
- [x] Maven (`Test build #121118` #28186 (comment))

Closes #28186 from dongjoon-hyun/SPARK-31291.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

The same commit message appears a second time with these trailing lines:
(cherry picked from commit b4c438a)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>