[SPARK-51600][CORE] Prepend classes of `sql/hive` and `sql/hive-thriftserver` when `isTesting || isTestingSql` is true #50385
Closed
LuciferYang wants to merge 5 commits into apache:master from
Contributor (Author):

Actually, we can reproduce the issue described in this PR in GitHub Actions (GA) by changing how the Maven daily test is executed (for example, for `sql/hive-thriftserver`). Afterwards, I will turn #50387 into a formal PR to enhance the Maven daily test.
Contributor (Author):

cc @HyukjinKwon FYI
dongjoon-hyun (Member) approved these changes on Mar 26, 2025:

+1, LGTM. Thank you, @LuciferYang.
LuciferYang added a commit that referenced this pull request on Mar 26, 2025:
…tserver` when `isTesting || isTestingSql` is true

### What changes were proposed in this pull request?

This PR aims to add a condition check for `isTesting || isTestingSql` to `shouldPrePendSparkHive` and `shouldPrePendSparkHiveThriftServer`, so that classes of the `sql/hive` and `sql/hive-thriftserver` modules are prepended when running Maven tests.

### Why are the changes needed?

After SPARK-49534 was merged, classes of `sql/hive` are no longer prepended when `spark-hive_xxx.jar` is not present in the `assembly/target/scala-2.13/jars` directory. The same handling was applied to `sql/hive-thriftserver`.

Although this resolves the issue described in #48015, it introduces another problem: when we execute `mvn test` directly on the `sql/hive` and `sql/hive-thriftserver` modules without first collecting the dependent JARs into the `assembly/target/scala-2.13/jars` directory, some tests fail.

Consider the following testing approach:

```
build/mvn clean -Phive -Phive-thriftserver
build/mvn clean install -DskipTests -pl sql/hive-thriftserver -am -Phive -Phive-thriftserver
build/mvn clean install -pl sql/hive-thriftserver -Phive -Phive-thriftserver
build/mvn clean install -pl sql/hive -Phive
```

The tests for the `sql/hive-thriftserver` module abort (`*** RUN ABORTED ***`) for the following reasons:

```
HiveThriftBinaryServerSuite:
18:48:19.595 ERROR org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite:
=====================================
HiveThriftServer2Suite failure output
=====================================
### Attempt 0 ###
HiveThriftServer2 command line: ArraySeq(../../sbin/start-thriftserver.sh, --master, local, --hiveconf, javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=/home/runner/work/spark/spark/sql/hive-thriftserver/target/tmp/spark-2bca44a3-c220-485c-b2a4-289262293652;create=true, --hiveconf, hive.metastore.warehouse.dir=/home/runner/work/spark/spark/sql/hive-thriftserver/target...
18:48:22.634 WARN org.apache.spark.sql.hive.thriftserver.HiveThriftBinaryServerSuite:
===== POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.hive.thriftserver.HiveThriftBinaryServerSuite, threads: Thread-10 (daemon=true), Thread-11 (daemon=true) =====
*** RUN ABORTED ***
An exception or error caused a run to abort: Future timed out after [3 minutes]
java.util.concurrent.TimeoutException: Future timed out after [3 minutes]
  at scala.concurrent.impl.Promise$DefaultPromise.tryAwait0(Promise.scala:248)
  at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:261)
  at org.apache.spark.util.SparkThreadUtils$.awaitResultNoSparkExceptionConversion(SparkThreadUtils.scala:61)
  at org.apache.spark.util.SparkThreadUtils$.awaitResult(SparkThreadUtils.scala:45)
  at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:342)
  at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2TestBase.startThriftServer(HiveThriftServer2Suites.scala:1345)
  at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2TestBase.$anonfun$beforeAll$4(HiveThriftServer2Suites.scala:1403)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
  at scala.util.Try$.apply(Try.scala:217)
  at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2TestBase.$anonfun$beforeAll$3(HiveThriftServer2Suites.scala:1402)
  ...
```

```
Error: Failed to load class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.
Failed to load main class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.
```

`HiveSparkSubmitSuite` has 15 failed tests for the following reasons:

```
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/hive/HiveUtils$
  at org.apache.spark.sql.hive.SetMetastoreURLTest$.main(HiveSparkSubmitSuite.scala:390)
  at org.apache.spark.sql.hive.SetMetastoreURLTest.main(HiveSparkSubmitSuite.scala)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.base/java.lang.reflect.Method.invoke(Method.java:569)
  at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
  at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1027)
  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:204)
  at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:227)
  at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:96)
  at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1132)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1141)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.hive.HiveUtils$
  at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
  at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
  at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:525)
  ... 14 more
```

The Maven daily test does not trigger this issue because it runs a full build before the tests, which collects the JARs into the `assembly/target/scala-2.13/jars` directory.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

- Pass GitHub Actions
- Pass Maven test: https://github.com/LuciferYang/spark/runs/39370781864
- Re-check the test in #48015; the changes in this PR do not break it.
- Local test:

```
build/mvn clean -Phive -Phive-thriftserver
build/mvn clean install -DskipTests -pl sql/hive-thriftserver -am -Phive -Phive-thriftserver
build/mvn clean install -pl sql/hive-thriftserver -Phive -Phive-thriftserver
build/mvn clean install -pl sql/hive -Phive
```

**sql/hive-thriftserver**

```
Run completed in 12 minutes, 55 seconds.
Total number of tests run: 640
Suites: completed 20, aborted 0
Tests: succeeded 640, failed 0, canceled 0, ignored 26, pending 0
All tests passed.
```

**sql/hive**

```
Run completed in 1 hour, 17 minutes, 15 seconds.
Total number of tests run: 3987
Suites: completed 148, aborted 0
Tests: succeeded 3987, failed 0, canceled 2, ignored 606, pending 0
All tests passed.
```

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #50385 from LuciferYang/SPARK-51600-2.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
(cherry picked from commit b5f9a28)
Signed-off-by: yangjie01 <yangjie01@baidu.com>
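The `NoClassDefFoundError: org/apache/spark/sql/hive/HiveUtils$` above is a classpath problem: in a module-only Maven run, `assembly/target/scala-2.13/jars` holds no `spark-hive_*` JAR, so if the `sql/hive` classes directory is not prepended either, nothing on the launched JVM's classpath provides `HiveUtils$`. The Java sketch below models only that part of classpath construction. The class `ClasspathSketch`, its short module list, and the `<module>/target/scala-<version>/classes` layout are simplifying assumptions for illustration, not a copy of Spark's launcher code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of how a launcher could build a test classpath.
// Not Spark's implementation: the module list and path layout are illustrative.
final class ClasspathSketch {
  static final List<String> MODULES =
      List.of("core", "sql/core", "sql/hive", "sql/hive-thriftserver");

  static List<String> buildClasspath(
      String sparkHome,
      String scalaVersion,
      boolean prependClasses,      // assumed: true when prepending module classes is enabled
      boolean prependHive,         // result of a shouldPrePendSparkHive-style check
      boolean prependThriftServer, // result of a shouldPrePendSparkHiveThriftServer-style check
      List<String> assemblyJars) { // files found in assembly/target/scala-<v>/jars
    List<String> cp = new ArrayList<>();
    if (prependClasses) {
      for (String module : MODULES) {
        if (module.equals("sql/hive") && !prependHive) {
          continue; // skipped: HiveUtils$ must then come from a spark-hive_* JAR
        }
        if (module.equals("sql/hive-thriftserver") && !prependThriftServer) {
          continue; // skipped: HiveThriftServer2 must then come from a JAR
        }
        cp.add(sparkHome + "/" + module + "/target/scala-" + scalaVersion + "/classes");
      }
    }
    cp.addAll(assemblyJars);
    return cp;
  }

  // True if some classpath entry could supply the sql/hive classes.
  static boolean providesHiveClasses(List<String> cp) {
    for (String entry : cp) {
      if (entry.contains("/sql/hive/target/") || entry.contains("spark-hive_")) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<String> noJars = List.of(); // module-only Maven run: no JARs collected
    // SPARK-49534 behavior in that run: no Hive JAR, so no Hive classes prepended.
    List<String> before = buildClasspath("/spark", "2.13", true, false, false, noJars);
    // Behavior after this PR: the testing flag forces the prepend.
    List<String> after = buildClasspath("/spark", "2.13", true, true, true, noJars);
    System.out.println(providesHiveClasses(before)); // false, hence NoClassDefFoundError
    System.out.println(providesHiveClasses(after));  // true
  }
}
```

With a full build, the `spark-hive_*` JAR in the assembly directory supplies the classes even when the classes directory is skipped, which matches why the Maven daily test did not catch the problem.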
Contributor (Author):

Merged into master/branch-4.0. Thanks @dongjoon-hyun
LuciferYang added a commit that referenced this pull request on Mar 27, 2025:
…ing in maven daily test

### What changes were proposed in this pull request?

During the Maven daily test process, this PR adds cleanup of the `assembly` module before `mvn test` (except for the `connect` module, because some tests in the `connect-client-jvm` module strongly depend on the completion of the `assembly` module build), so that the issue described in SPARK-51600 (#50385) can be verified in the Maven daily test.

### Why are the changes needed?

Reduce the dependency of the Maven daily test on the completion of the `assembly` module build.

### Does this PR introduce _any_ user-facing change?

No, it only affects the Maven daily test.

### How was this patch tested?

- Pass GitHub Actions
- Test with Maven: https://github.com/LuciferYang/spark/runs/39456959866

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #50387 from LuciferYang/maven-daily-remove-assembly-before-tests.

Lead-authored-by: yangjie01 <yangjie01@baidu.com>
Co-authored-by: YangJie <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
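Cleaning the `assembly` module before `mvn test` is what exposes the SPARK-51600 problem: assuming the cleanup removes `assembly/target`, no `spark-hive_*` JAR is left in `assembly/target/scala-2.13/jars`, so a JAR-presence check like the one SPARK-49534 introduced returns false. The Java sketch below models such a check. `JarProbe.isJarAvailable` is a hypothetical helper (not Spark's API) that looks for a file matching `<prefix>*.jar` in one directory; the JAR names in the example are made up.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not Spark's API: checks whether a JAR whose file name
// starts with `prefix` exists directly inside `jarsDir`.
final class JarProbe {
  static boolean isJarAvailable(Path jarsDir, String prefix) throws IOException {
    if (!Files.isDirectory(jarsDir)) {
      return false; // e.g. the assembly module was cleaned or never built
    }
    // Glob such as "spark-hive_*.jar"; the trailing '_' keeps "spark-hive_"
    // from matching "spark-hive-thriftserver_..." JARs.
    try (DirectoryStream<Path> matches = Files.newDirectoryStream(jarsDir, prefix + "*.jar")) {
      return matches.iterator().hasNext();
    }
  }

  public static void main(String[] args) throws IOException {
    Path jars = Files.createTempDirectory("jars");
    System.out.println(isJarAvailable(jars.resolve("missing"), "spark-hive_")); // false
    System.out.println(isJarAvailable(jars, "spark-hive_"));                    // false
    Files.createFile(jars.resolve("spark-hive-thriftserver_2.13-test.jar"));
    System.out.println(isJarAvailable(jars, "spark-hive_"));                    // false
    Files.createFile(jars.resolve("spark-hive_2.13-test.jar"));
    System.out.println(isJarAvailable(jars, "spark-hive_"));                    // true
  }
}
```

The prefix is spliced into a glob pattern, so it should not contain glob metacharacters such as `*` or `{`; the prefixes used here do not.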
a0x8o added a commit to a0x8o/spark that referenced this pull request on Mar 27, 2025
zifeif2 pushed a commit to zifeif2/spark that referenced this pull request on Nov 14, 2025



### What changes were proposed in this pull request?

This PR aims to add a condition check for `isTesting || isTestingSql` to `shouldPrePendSparkHive` and `shouldPrePendSparkHiveThriftServer`, so that classes of the `sql/hive` and `sql/hive-thriftserver` modules are prepended when running Maven tests.

### Why are the changes needed?
After SPARK-49534 was merged, classes of `sql/hive` are no longer prepended when `spark-hive_xxx.jar` is not present in the `assembly/target/scala-2.13/jars` directory. The same handling was applied to `sql/hive-thriftserver`. Although this resolves the issue described in #48015, it introduces another problem:

When we execute `mvn test` directly on the `sql/hive` and `sql/hive-thriftserver` modules without first collecting the dependent JARs into the `assembly/target/scala-2.13/jars` directory, some tests fail. For example, first build `sql/hive-thriftserver` and its upstream modules with `-DskipTests` (`-pl sql/hive-thriftserver -am`), then run `build/mvn clean install` on `sql/hive-thriftserver` and on `sql/hive` alone, without building the `assembly` module.
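In this setup, the tests for the `sql/hive-thriftserver` module abort (`*** RUN ABORTED ***`): `HiveThriftBinaryServerSuite` times out after 3 minutes waiting for the Thrift server to start, and the output reports `Failed to load main class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2`. `HiveSparkSubmitSuite` has 15 failed tests caused by `java.lang.NoClassDefFoundError: org/apache/spark/sql/hive/HiveUtils$`.

The Maven daily test does not trigger this issue because it runs a full build before the tests, which collects the JARs into the `assembly/target/scala-2.13/jars` directory.

### Does this PR introduce _any_ user-facing change?

No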
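### How was this patch tested?

- Pass GitHub Actions
- Pass Maven test: https://github.com/LuciferYang/spark/runs/39370781864
- Re-check the test in #48015 ([SPARK-49534][CORE] No longer prepend `sql/hive` and `sql/hive-thriftserver` when `spark-hive_xxx.jar` is not in the classpath); the changes in this PR do not break it.
- Local test:
  - `sql/hive-thriftserver`: 640 tests run in 20 suites, all passed (26 ignored).
  - `sql/hive`: 3987 tests run in 148 suites, all passed (2 canceled, 606 ignored).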
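The decision rule described under "What changes were proposed in this pull request?" can be checked against the three situations discussed in this PR: a full build (JARs collected into `assembly`), a module-only Maven test run, and a non-test setup without Hive JARs, which is presumably the case #48015 addressed. The Java sketch below is a hypothetical model, not Spark's code: the method names, the assumption that `isTesting`/`isTestingSql` come from the `SPARK_TESTING`/`SPARK_SQL_TESTING` environment variables being `"1"`, and the rule that Thrift-server classes are prepended only together with Hive classes are all assumptions.

```java
// Hypothetical model of the prepend decision; names are illustrative, not Spark's API.
final class PrependDecision {

  // Assumed source of the flags: environment variables set to "1" by the test build.
  static boolean envFlag(String name) {
    return "1".equals(System.getenv(name));
  }

  // Rule as described for SPARK-49534: depends only on the JAR being present.
  static boolean prependHiveBefore(boolean hiveJarPresent) {
    return hiveJarPresent;
  }

  // Rule as described for this PR: test runs always prepend.
  static boolean prependHiveAfter(boolean hiveJarPresent, boolean isTesting, boolean isTestingSql) {
    return hiveJarPresent || isTesting || isTestingSql;
  }

  // Assumption: Thrift-server classes are prepended only if Hive classes are.
  static boolean prependThriftServerAfter(
      boolean prependHive, boolean thriftJarPresent, boolean isTesting, boolean isTestingSql) {
    return prependHive && (thriftJarPresent || isTesting || isTestingSql);
  }

  public static void main(String[] args) {
    boolean isTesting = envFlag("SPARK_TESTING");
    boolean isTestingSql = envFlag("SPARK_SQL_TESTING");
    System.out.println("prepend sql/hive in this environment: "
        + prependHiveAfter(false, isTesting, isTestingSql));

    // Module-only Maven run: no JARs in assembly, but running under tests.
    System.out.println(prependHiveBefore(false));              // false (tests fail)
    System.out.println(prependHiveAfter(false, true, false));  // true
    // Non-test setup without Hive JARs: still not prepended.
    System.out.println(prependHiveAfter(false, false, false)); // false
  }
}
```

Keeping the JAR check as one operand of the `||` means non-test behavior is unchanged from SPARK-49534; only test runs gain the extra prepend.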
### Was this patch authored or co-authored using generative AI tooling?

No