Skip to content

Conversation

LuciferYang
Copy link
Contributor

@LuciferYang LuciferYang commented Jan 20, 2021

What changes were proposed in this pull request?

There is one Java UT error when testing sql/hive module independently in Scala 2.13 after SPARK-33212, the error message as follow:

[ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 20.353 s <<< FAILURE! - in org.apache.spark.sql.hive.JavaDataFrameSuite
[ERROR] org.apache.spark.sql.hive.JavaDataFrameSuite.testUDAF  Time elapsed: 18.548 s  <<< ERROR!
java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport
	at org.apache.spark.sql.hive.JavaDataFrameSuite.checkAnswer(JavaDataFrameSuite.java:41)
	at org.apache.spark.sql.hive.JavaDataFrameSuite.testUDAF(JavaDataFrameSuite.java:92)
Caused by: java.lang.ClassNotFoundException: scala.collection.parallel.TaskSupport
	at org.apache.spark.sql.hive.JavaDataFrameSuite.checkAnswer(JavaDataFrameSuite.java:41)
	at org.apache.spark.sql.hive.JavaDataFrameSuite.testUDAF(JavaDataFrameSuite.java:92)

This pr add a Scala-2.13 profile with dependency of scala-parallel-collections_ to sql/hive module to fix the Java UT in Scala 2.13.

Why are the changes needed?

Recover the independent mvn test ability of sql/hive module in Scala 2.13.

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Pass the Jenkins or GitHub Action
  • Manual test
dev/change-scala-version.sh 2.13

mvn clean install -Dhadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13 -pl  sql/hive -am -DskipTests

mvn test -Dhadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13 -pl  sql/hive

Before

[ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 18.725 s <<< FAILURE! - in org.apache.spark.sql.hive.JavaDataFrameSuite
[ERROR] org.apache.spark.sql.hive.JavaDataFrameSuite.testUDAF  Time elapsed: 16.853 s  <<< ERROR!
java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport
	at org.apache.spark.sql.hive.JavaDataFrameSuite.checkAnswer(JavaDataFrameSuite.java:41)
	at org.apache.spark.sql.hive.JavaDataFrameSuite.testUDAF(JavaDataFrameSuite.java:92)
Caused by: java.lang.ClassNotFoundException: scala.collection.parallel.TaskSupport
	at org.apache.spark.sql.hive.JavaDataFrameSuite.checkAnswer(JavaDataFrameSuite.java:41)
	at org.apache.spark.sql.hive.JavaDataFrameSuite.testUDAF(JavaDataFrameSuite.java:92)

[INFO] Running org.apache.spark.sql.hive.JavaMetastoreDataSourcesSuite
16:15:36.186 WARN org.apache.spark.sql.hive.test.TestHiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.json. Persisting data source table `default`.`javasavedtable` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
16:15:36.288 WARN org.apache.hadoop.hive.ql.session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
16:15:36.396 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.internal.ss.authz.settings.applied.marker does not exist
16:15:36.397 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
16:15:36.397 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.stats.retries.wait does not exist
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.481 s - in org.apache.spark.sql.hive.JavaMetastoreDataSourcesSuite
[INFO] 
[INFO] Results:
[INFO] 
[ERROR] Errors: 
[ERROR]   JavaDataFrameSuite.testUDAF:92->checkAnswer:41 » NoClassDefFound scala/collect...
[INFO] 
[ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0

After

[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 19.287 s - in org.apache.spark.sql.hive.JavaDataFrameSuite
[INFO] Running org.apache.spark.sql.hive.JavaMetastoreDataSourcesSuite
16:12:16.697 WARN org.apache.spark.sql.hive.test.TestHiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.json. Persisting data source table `default`.`javasavedtable` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
16:12:17.540 WARN org.apache.hadoop.hive.ql.session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
16:12:17.653 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.internal.ss.authz.settings.applied.marker does not exist
16:12:17.653 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
16:12:17.654 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.stats.retries.wait does not exist
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.58 s - in org.apache.spark.sql.hive.JavaMetastoreDataSourcesSuite
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0

@LuciferYang LuciferYang changed the title [SPARK-34176][BUILD][SQL][test-maven] Recover the independent mvn test ability of sql/hive module in Scala 2.13 [WIP][SPARK-34176][BUILD][SQL][test-maven] Recover the independent mvn test ability of sql/hive module in Scala 2.13 Jan 20, 2021
@LuciferYang LuciferYang marked this pull request as draft January 20, 2021 07:49
@LuciferYang LuciferYang changed the title [WIP][SPARK-34176][BUILD][SQL][test-maven] Recover the independent mvn test ability of sql/hive module in Scala 2.13 [WIP][SPARK-34176][BUILD][SQL][test-maven] Restore the independent mvn test ability of sql/hive module in Scala 2.13 Jan 20, 2021
@LuciferYang LuciferYang changed the title [WIP][SPARK-34176][BUILD][SQL][test-maven] Restore the independent mvn test ability of sql/hive module in Scala 2.13 [SPARK-34176][BUILD][SQL][test-maven] Restore the independent mvn test ability of sql/hive module in Scala 2.13 Jan 20, 2021
@LuciferYang LuciferYang marked this pull request as ready for review January 20, 2021 08:18
@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38842/

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Test build #134256 has finished for PR 31259 at commit ad3cdb3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38842/

@HyukjinKwon
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Test build #134267 has started for PR 31259 at commit ad3cdb3.

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38853/

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38853/

@dongjoon-hyun
Copy link
Member

Thank you, @LuciferYang . Actually, the last Jenkins failed at R unit testing.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @LuciferYang and @HyukjinKwon .
Merged to master.

cc @sunchao

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-34176][BUILD][SQL][test-maven] Restore the independent mvn test ability of sql/hive module in Scala 2.13 [SPARK-34176][BUILD] Restore the independent mvn test ability of sql/hive module in Scala 2.13 Jan 20, 2021
@LuciferYang
Copy link
Contributor Author

thx ~ @HyukjinKwon and @dongjoon-hyun

skestle pushed a commit to skestle/spark that referenced this pull request Feb 3, 2021
…hive module in Scala 2.13

### What changes were proposed in this pull request?
There is one Java UT error when testing sql/hive module independently in Scala 2.13 after SPARK-33212,  the error message as follow:

```
[ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 20.353 s <<< FAILURE! - in org.apache.spark.sql.hive.JavaDataFrameSuite
[ERROR] org.apache.spark.sql.hive.JavaDataFrameSuite.testUDAF  Time elapsed: 18.548 s  <<< ERROR!
java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport
	at org.apache.spark.sql.hive.JavaDataFrameSuite.checkAnswer(JavaDataFrameSuite.java:41)
	at org.apache.spark.sql.hive.JavaDataFrameSuite.testUDAF(JavaDataFrameSuite.java:92)
Caused by: java.lang.ClassNotFoundException: scala.collection.parallel.TaskSupport
	at org.apache.spark.sql.hive.JavaDataFrameSuite.checkAnswer(JavaDataFrameSuite.java:41)
	at org.apache.spark.sql.hive.JavaDataFrameSuite.testUDAF(JavaDataFrameSuite.java:92)
```

This pr add a Scala-2.13 profile with dependency of `scala-parallel-collections_` to `sql/hive` module to fix the Java UT in Scala 2.13.

### Why are the changes needed?
Recover the independent mvn test ability of sql/hive module in Scala 2.13.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- Pass the Jenkins or GitHub Action
- Manual test

```
dev/change-scala-version.sh 2.13

mvn clean install -Dhadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13 -pl  sql/hive -am -DskipTests

mvn test -Dhadoop-3.2 -Phive-2.3 -Phadoop-cloud -Pmesos -Pyarn -Pkinesis-asl -Phive-thriftserver -Pspark-ganglia-lgpl -Pkubernetes -Phive -Pscala-2.13 -pl  sql/hive
```

**Before**

```
[ERROR] Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 18.725 s <<< FAILURE! - in org.apache.spark.sql.hive.JavaDataFrameSuite
[ERROR] org.apache.spark.sql.hive.JavaDataFrameSuite.testUDAF  Time elapsed: 16.853 s  <<< ERROR!
java.lang.NoClassDefFoundError: scala/collection/parallel/TaskSupport
	at org.apache.spark.sql.hive.JavaDataFrameSuite.checkAnswer(JavaDataFrameSuite.java:41)
	at org.apache.spark.sql.hive.JavaDataFrameSuite.testUDAF(JavaDataFrameSuite.java:92)
Caused by: java.lang.ClassNotFoundException: scala.collection.parallel.TaskSupport
	at org.apache.spark.sql.hive.JavaDataFrameSuite.checkAnswer(JavaDataFrameSuite.java:41)
	at org.apache.spark.sql.hive.JavaDataFrameSuite.testUDAF(JavaDataFrameSuite.java:92)

[INFO] Running org.apache.spark.sql.hive.JavaMetastoreDataSourcesSuite
16:15:36.186 WARN org.apache.spark.sql.hive.test.TestHiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.json. Persisting data source table `default`.`javasavedtable` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
16:15:36.288 WARN org.apache.hadoop.hive.ql.session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
16:15:36.396 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.internal.ss.authz.settings.applied.marker does not exist
16:15:36.397 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
16:15:36.397 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.stats.retries.wait does not exist
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.481 s - in org.apache.spark.sql.hive.JavaMetastoreDataSourcesSuite
[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR]   JavaDataFrameSuite.testUDAF:92->checkAnswer:41 » NoClassDefFound scala/collect...
[INFO]
[ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0
```

**After**

```
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 19.287 s - in org.apache.spark.sql.hive.JavaDataFrameSuite
[INFO] Running org.apache.spark.sql.hive.JavaMetastoreDataSourcesSuite
16:12:16.697 WARN org.apache.spark.sql.hive.test.TestHiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.json. Persisting data source table `default`.`javasavedtable` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
16:12:17.540 WARN org.apache.hadoop.hive.ql.session.SessionState: METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
16:12:17.653 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.internal.ss.authz.settings.applied.marker does not exist
16:12:17.653 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
16:12:17.654 WARN org.apache.hadoop.hive.conf.HiveConf: HiveConf of name hive.stats.retries.wait does not exist
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.58 s - in org.apache.spark.sql.hive.JavaMetastoreDataSourcesSuite
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0
```

Closes apache#31259 from LuciferYang/SPARK-34176.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@LuciferYang LuciferYang deleted the SPARK-34176 branch June 6, 2022 03:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants