[SPARK-51600][CORE] Prepend classes of sql/hive and sql/hive-thriftserver when isTesting || isTestingSql is true #50385

Closed
LuciferYang wants to merge 5 commits into apache:master from LuciferYang:SPARK-51600-2

@LuciferYang
Contributor

@LuciferYang LuciferYang commented Mar 25, 2025

What changes were proposed in this pull request?

This PR adds an isTesting || isTestingSql check to shouldPrePendSparkHive and shouldPrePendSparkHiveThriftServer, so that when running Maven tests, classes are prepended for the sql/hive and sql/hive-thriftserver modules.
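
As a rough illustration of the decision described above, here is a minimal shell sketch. It assumes the testing check is OR-ed with the existing JAR-presence check; the function name and its argument convention are hypothetical, and the real logic lives in Spark's launcher code and may be structured differently.

```shell
# Hypothetical model of the prepend decision, not Spark's actual launcher code.
# Arguments are "1" (true) or anything else (false):
#   $1 = is_testing, $2 = is_testing_sql, $3 = hive_jar_present
should_prepend_hive() {
  if [ "$1" = "1" ] || [ "$2" = "1" ]; then
    return 0   # test run: prepend even if the assembled JAR is missing
  fi
  [ "$3" = "1" ]   # otherwise keep the JAR-presence rule (assumed from SPARK-49534)
}

# Example: a Maven test run where the assembly jars directory has no spark-hive JAR.
if should_prepend_hive 1 0 0; then
  echo "prepend sql/hive classes"
else
  echo "skip sql/hive classes"
fi
```

With neither testing flag set, the sketch falls back to the JAR-presence rule, which is the behavior #48015 relied on.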

Why are the changes needed?

Since SPARK-49534 was merged, classes are no longer prepended for sql/hive when spark-hive_xxx.jar is not present in the assembly/target/scala-2.13/jars directory. sql/hive-thriftserver is handled similarly.

Although this resolves the issue described in #48015, it introduces another problem:

When we run mvn test directly on the sql/hive and sql/hive-thriftserver modules without the dependent JARs having been collected into the assembly/target/scala-2.13/jars directory beforehand, some tests fail.

Consider the following testing approach:

build/mvn clean -Phive -Phive-thriftserver
build/mvn clean install -DskipTests -pl sql/hive-thriftserver -am -Phive -Phive-thriftserver
build/mvn clean install -pl sql/hive-thriftserver -Phive -Phive-thriftserver
build/mvn clean install -pl sql/hive -Phive   

The tests for the sql/hive-thriftserver module abort (*** RUN ABORTED ***) with the following output:

HiveThriftBinaryServerSuite:
18:48:19.595 ERROR org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite: 
=====================================
HiveThriftServer2Suite failure output
=====================================

### Attempt 0 ###
HiveThriftServer2 command line: ArraySeq(../../sbin/start-thriftserver.sh, --master, local, --hiveconf, javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/home/runner/work/spark/spark/sql/hive-thriftserver/target/tmp/spark-2bca44a3-c220-485c-b2a4-289262293652;create=true, --hiveconf, hive.metastore.warehouse.dir=/home/runner/work/spark/spark/sql/hive-thriftserver/target...

18:48:22.634 WARN org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite: 

===== POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.hive.thriftserver.HiveThriftBinaryServerSuite, threads: Thread-10 (daemon=true), Thread-11 (daemon=true) =====


*** RUN ABORTED ***
An exception or error caused a run to abort: Future timed out after [3 minutes] 
  java.util.concurrent.TimeoutException: Future timed out after [3 minutes]
  at scala.concurrent.impl.Promise$DefaultPromise.tryAwait0(Promise.scala:248)
  at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:261)
  at org.apache.spark.util.SparkThreadUtils$.awaitResultNoSparkExceptionConversion(SparkThreadUtils.scala:61)
  at org.apache.spark.util.SparkThreadUtils$.awaitResult(SparkThreadUtils.scala:45)
  at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:342)
  at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2TestBase.startThriftServer(HiveThriftServer2Suites.scala:1345)
  at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2TestBase.$anonfun$beforeAll$4(HiveThriftServer2Suites.scala:1403)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
  at scala.util.Try$.apply(Try.scala:217)
  at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2TestBase.$anonfun$beforeAll$3(HiveThriftServer2Suites.scala:1402)
  ...

Error: Failed to load class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.
Failed to load main class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.

HiveSparkSubmitSuite has 15 failed tests, with errors like the following:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/hive/HiveUtils$
  	at org.apache.spark.sql.hive.SetMetastoreURLTest$.main(HiveSparkSubmitSuite.scala:390)
  	at org.apache.spark.sql.hive.SetMetastoreURLTest.main(HiveSparkSubmitSuite.scala)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
  	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
  	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1027)
  	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:204)
  	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:227)
  	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:96)
  	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1132)
  	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1141)
  	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
  Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.hive.HiveUtils$
  	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
  	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
  	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
  	... 14 more

The Maven daily test does not trigger this issue because it runs a full build before the tests, which collects the JARs into the assembly/target/scala-2.13/jars directory.
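
To check locally which situation a checkout is in, something like the following sketch could be used. The helper name `has_jar` is hypothetical; the `spark-hive_` prefix follows the spark-hive_xxx.jar name mentioned above, and the directory is the one named in this description.

```shell
# Hypothetical helper: succeeds if DIR contains a file named PREFIX*.jar.
has_jar() {
  dir=$1
  prefix=$2
  for f in "$dir"/"$prefix"*.jar; do
    # An unmatched glob stays literal, so confirm the file really exists.
    [ -e "$f" ] && return 0
  done
  return 1
}

# Run from the Spark source root.
jars_dir="assembly/target/scala-2.13/jars"
if has_jar "$jars_dir" "spark-hive_"; then
  echo "spark-hive JAR found in $jars_dir"
else
  echo "no spark-hive JAR in $jars_dir"
fi
```

The second branch corresponds to the case where the module tests are run without a prior full build.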

Does this PR introduce any user-facing change?

No

How was this patch tested?

build/mvn clean -Phive -Phive-thriftserver
build/mvn clean install -DskipTests -pl sql/hive-thriftserver -am -Phive -Phive-thriftserver
build/mvn clean install -pl sql/hive-thriftserver -Phive -Phive-thriftserver
build/mvn clean install -pl sql/hive -Phive   

sql/hive-thriftserver

Run completed in 12 minutes, 55 seconds.
Total number of tests run: 640
Suites: completed 20, aborted 0
Tests: succeeded 640, failed 0, canceled 0, ignored 26, pending 0
All tests passed.

sql/hive

Run completed in 1 hour, 17 minutes, 15 seconds.
Total number of tests run: 3987
Suites: completed 148, aborted 0
Tests: succeeded 3987, failed 0, canceled 2, ignored 606, pending 0
All tests passed.

Was this patch authored or co-authored using generative AI tooling?

No

@LuciferYang LuciferYang marked this pull request as draft March 25, 2025 13:03
@github-actions github-actions bot added the INFRA label Mar 25, 2025
@github-actions github-actions bot removed the INFRA label Mar 25, 2025
@LuciferYang LuciferYang marked this pull request as ready for review March 26, 2025 03:04
@LuciferYang
Contributor Author

LuciferYang commented Mar 26, 2025

Actually, we can reproduce the issue described in this PR in GitHub Actions (GA) by changing the execution process of the Maven daily test.

Afterwards, I will turn #50387 into a formal PR to enhance the Maven daily test.

@LuciferYang
Contributor Author

cc @HyukjinKwon FYI

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @LuciferYang .

LuciferYang added a commit that referenced this pull request Mar 26, 2025
…tserver` when `isTesting || isTestingSql` is true

### What changes were proposed in this pull request?
This PR adds an `isTesting || isTestingSql` check to `shouldPrePendSparkHive` and `shouldPrePendSparkHiveThriftServer`, so that when running Maven tests, classes are prepended for the `sql/hive` and `sql/hive-thriftserver` modules.

### Why are the changes needed?
Since SPARK-49534 was merged, classes are no longer prepended for `sql/hive` when `spark-hive_xxx.jar` is not present in the `assembly/target/scala-2.13/jars` directory. `sql/hive-thriftserver` is handled similarly.

Although this resolves the issue described in #48015, it introduces another problem:

When we run `mvn test` directly on the `sql/hive` and `sql/hive-thriftserver` modules without the dependent JARs having been collected into the `assembly/target/scala-2.13/jars` directory beforehand, some tests fail.

Consider the following testing approach:

```
build/mvn clean -Phive -Phive-thriftserver
build/mvn clean install -DskipTests -pl sql/hive-thriftserver -am -Phive -Phive-thriftserver
build/mvn clean install -pl sql/hive-thriftserver -Phive -Phive-thriftserver
build/mvn clean install -pl sql/hive -Phive
```

The tests for the `sql/hive-thriftserver` module abort (`*** RUN ABORTED ***`) with the following output:

```
HiveThriftBinaryServerSuite:
18:48:19.595 ERROR org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite:
=====================================
HiveThriftServer2Suite failure output
=====================================

### Attempt 0 ###
HiveThriftServer2 command line: ArraySeq(../../sbin/start-thriftserver.sh, --master, local, --hiveconf, javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/home/runner/work/spark/spark/sql/hive-thriftserver/target/tmp/spark-2bca44a3-c220-485c-b2a4-289262293652;create=true, --hiveconf, hive.metastore.warehouse.dir=/home/runner/work/spark/spark/sql/hive-thriftserver/target...

18:48:22.634 WARN org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite:

===== POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.hive.thriftserver.HiveThriftBinaryServerSuite, threads: Thread-10 (daemon=true), Thread-11 (daemon=true) =====

*** RUN ABORTED ***
An exception or error caused a run to abort: Future timed out after [3 minutes]
  java.util.concurrent.TimeoutException: Future timed out after [3 minutes]
  at scala.concurrent.impl.Promise$DefaultPromise.tryAwait0(Promise.scala:248)
  at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:261)
  at org.apache.spark.util.SparkThreadUtils$.awaitResultNoSparkExceptionConversion(SparkThreadUtils.scala:61)
  at org.apache.spark.util.SparkThreadUtils$.awaitResult(SparkThreadUtils.scala:45)
  at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:342)
  at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2TestBase.startThriftServer(HiveThriftServer2Suites.scala:1345)
  at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2TestBase.$anonfun$beforeAll$4(HiveThriftServer2Suites.scala:1403)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
  at scala.util.Try$.apply(Try.scala:217)
  at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2TestBase.$anonfun$beforeAll$3(HiveThriftServer2Suites.scala:1402)
  ...

```

```
Error: Failed to load class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.
Failed to load main class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.
```

`HiveSparkSubmitSuite` has 15 failed tests, with errors like the following:

```
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/hive/HiveUtils$
  	at org.apache.spark.sql.hive.SetMetastoreURLTest$.main(HiveSparkSubmitSuite.scala:390)
  	at org.apache.spark.sql.hive.SetMetastoreURLTest.main(HiveSparkSubmitSuite.scala)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  	at java.base/java.lang.reflect.Method.invoke(Method.java:569)
  	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
  	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1027)
  	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:204)
  	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:227)
  	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:96)
  	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1132)
  	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1141)
  	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
  Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.hive.HiveUtils$
  	at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
  	at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
  	at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
  	... 14 more

```

The Maven daily test does not trigger this issue because it runs a full build before the tests, which collects the JARs into the `assembly/target/scala-2.13/jars` directory.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- Pass Maven test: https://github.com/LuciferYang/spark/runs/39370781864

![image](https://github.com/user-attachments/assets/a8ad2c08-b970-4ccf-81c5-b98430a0ab4a)

- Re-checked the test in #48015; the changes in this PR do not break it.

- Local test

```
build/mvn clean -Phive -Phive-thriftserver
build/mvn clean install -DskipTests -pl sql/hive-thriftserver -am -Phive -Phive-thriftserver
build/mvn clean install -pl sql/hive-thriftserver -Phive -Phive-thriftserver
build/mvn clean install -pl sql/hive -Phive
```

**sql/hive-thriftserver**

```
Run completed in 12 minutes, 55 seconds.
Total number of tests run: 640
Suites: completed 20, aborted 0
Tests: succeeded 640, failed 0, canceled 0, ignored 26, pending 0
All tests passed.
```

**sql/hive**

```
Run completed in 1 hour, 17 minutes, 15 seconds.
Total number of tests run: 3987
Suites: completed 148, aborted 0
Tests: succeeded 3987, failed 0, canceled 2, ignored 606, pending 0
All tests passed.
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #50385 from LuciferYang/SPARK-51600-2.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit b5f9a28)
Signed-off-by: yangjie01 <yangjie01@baidu.com>
@LuciferYang
Contributor Author

Merged into master/branch-4.0. Thanks @dongjoon-hyun

LuciferYang added a commit that referenced this pull request Mar 27, 2025
…ing in maven daily test

### What changes were proposed in this pull request?
This PR adds a cleanup step for the `assembly` module before `mvn test` in the Maven daily test (except for the `connect` module, because some tests in the `connect-client-jvm` module strongly depend on a completed `assembly` module build), so that the issue described in SPARK-51600 (#50385) can be verified by the Maven daily test.
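
The rule described above might look roughly like the following sketch. The function names, the way `connect` modules are detected, and the exact cleanup command are all assumptions made for illustration; they are not taken from the actual workflow file. The sketch only prints the commands it would run.

```shell
# Hypothetical: skip the assembly cleanup for any module list mentioning "connect".
needs_assembly_cleanup() {
  case "$1" in
    *connect*) return 1 ;;
    *) return 0 ;;
  esac
}

# Print (not run) the commands for a comma-separated module list.
plan_test_run() {
  modules=$1
  if needs_assembly_cleanup "$modules"; then
    echo "build/mvn clean -pl assembly"   # assumed form of the cleanup step
  fi
  echo "build/mvn test -pl $modules"
}

plan_test_run "sql/hive,sql/hive-thriftserver"
```

Cleaning `assembly` first means the Hive module tests run without the pre-collected JARs, which is the condition under which SPARK-51600 shows up.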

### Why are the changes needed?
Reduce the dependency of Maven daily test on the completion of the `assembly` module build.

### Does this PR introduce _any_ user-facing change?
No, this only affects the Maven daily test.

### How was this patch tested?
- Pass GitHub Actions
- Test with Maven: https://github.com/LuciferYang/spark/runs/39456959866

![image](https://github.com/user-attachments/assets/e6dd3d50-acee-4991-b2e6-a888412564cf)

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #50387 from LuciferYang/maven-daily-remove-assembly-before-tests.

Lead-authored-by: yangjie01 <yangjie01@baidu.com>
Co-authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
a0x8o added a commit to a0x8o/spark that referenced this pull request Mar 27, 2025
…ing in maven daily test

zifeif2 pushed a commit to zifeif2/spark that referenced this pull request Nov 14, 2025
…tserver` when `isTesting || isTestingSql` is true

Closes apache#50385 from LuciferYang/SPARK-51600-2.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit ea561d2)
Signed-off-by: yangjie01 <yangjie01@baidu.com>