
[SPARK-47118][BUILD][CORE][SQL][UI] Migrate from Jetty 10 to Jetty 11 #45154

Closed · wants to merge 38 commits

Conversation

@HiuKwok (Contributor) commented Feb 18, 2024

What changes were proposed in this pull request?

This is an upgrade ticket to bump the Jetty version from 10 to 11 and the Jersey version from 2 to 3, so the project can gradually move away from the javax.servlet package and adopt the Jakarta standard. At a high level, the code changes involve:

  1. Bump jakarta.servlet-api from 4 to 5 in order to adopt the new namespace.

  2. Re-introduce the javax.servlet-api and jaxb-api jars, as Hive-related jars still reference the old servlet classes internally and would otherwise throw class-not-found errors at test time and runtime; this also decouples the Hive upgrade from this MR.

  3. Rewrite the TServlet class from LibThrift against the jakarta namespace instead of javax; again, this decouples the LibThrift upgrade task from this MR, as otherwise we would need to wait for Hive 4.

  4. Update MimaExcludes.scala to exclude the compatibility breakages introduced by the javax -> jakarta migration.

  5. Update the Scalac options to prevent the enum scan on the third-party jar org.dmg.pmml, which failed the build.

  6. Update all internal servlet implementations from javax to jakarta (a minimal sketch follows this list).
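For illustration, a minimal sketch of what step 6 means in practice; the servlet class here is hypothetical, and the point is only the package-prefix change:

```scala
// Hypothetical servlet, before: javax namespace (Jetty 10 era)
// import javax.servlet.http.{HttpServlet, HttpServletRequest, HttpServletResponse}

// After: jakarta namespace (Jetty 11 era). The API surface is unchanged;
// only the package prefix moves from javax.servlet to jakarta.servlet.
import jakarta.servlet.http.{HttpServlet, HttpServletRequest, HttpServletResponse}

class PingServlet extends HttpServlet {
  override def doGet(req: HttpServletRequest, res: HttpServletResponse): Unit = {
    res.setStatus(HttpServletResponse.SC_OK)
    res.getWriter.write("pong")
  }
}
```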

Why are the changes needed?

To move Spark one step closer to the latest versions of Jetty and Jersey, in order to receive up-to-date security fixes.

Does this PR introduce any user-facing change?

No

How was this patch tested?

CI build & Unit test

Was this patch authored or co-authored using generative AI tooling?

No

@dongjoon-hyun (Member) commented:

Thank you for making this effort, @HiuKwok. Could you resolve the conflicts?

@HiuKwok (Contributor, author) commented Feb 19, 2024

@dongjoon-hyun done, awaiting the CI build.

@HiuKwok (Contributor, author) commented Feb 19, 2024

Checking on the failed tests

@HiuKwok (Contributor, author) commented Feb 19, 2024

The MR is ready for review.

@dongjoon-hyun (Member) commented:

Thank you, @HiuKwok!

pom.xml (two resolved review threads)
"-Wconf:cat=deprecation&msg=it will become a keyword in Scala 3:e"
"-Wconf:cat=deprecation&msg=it will become a keyword in Scala 3:e",
// SPARK-46938 to prevent enum scan on pmml-model, under spark-mllib module.
"-Wconf:cat=other&site=org.dmg.pmml.*:w"
Review comment (Member):

Do we need this? It seems that this PR only adds the following.

    <dependency>
      <groupId>javax.xml.bind</groupId>
      <artifactId>jaxb-api</artifactId>
    </dependency>

Review comment (Member):

If this is the result of the following removal, can we proceed with this in a separate PR first, @HiuKwok?

- ExclusionRule("javax.servlet", "javax.servlet-api"),
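For readers unfamiliar with the rule being removed: in sbt, an `ExclusionRule` is attached to a dependency to filter a transitive artifact out of the classpath. A hedged sketch of the mechanism (the hive-exec coordinates and version here are illustrative):

```scala
// Hedged sketch: with the rule in place, the transitive javax.servlet-api
// jar brought in by Hive never lands on Spark's classpath. Removing the
// rule lets it back in.
libraryDependencies += ("org.apache.hive" % "hive-exec" % "2.3.9")
  .excludeAll(ExclusionRule("javax.servlet", "javax.servlet-api"))
```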

@HiuKwok (Contributor, author) commented Feb 20, 2024

1. jaxb-api
I will need to double-check whether this is caused by the jaxb-api or not, and I will get back to you.
FYI, this is the error I faced earlier, if that helps.

[error] While parsing annotations in /home/runner/.cache/coursier/v1/https/maven-central.storage-download.googleapis.com/maven2/org/jpmml/pmml-model/1.4.8/pmml-model-1.4.8.jar(org/dmg/pmml/regression/RegressionModel.class), could not find FIELD in enum <none>.
[error] This is likely due to an implementation restriction: an annotation argument cannot refer to a member of the annotated class (scala/bug#7014).
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=other, site=org.dmg.pmml.regression.RegressionModel
[error] While parsing annotations in /home/runner/.cache/coursier/v1/https/maven-central.storage-download.googleapis.com/maven2/org/jpmml/pmml-model/1.4.8/pmml-model-1.4.8.jar(org/dmg/pmml/DataField.class), could not find FIELD in enum <none>.
[error] This is likely due to an implementation restriction: an annotation argument cannot refer to a member of the annotated class (scala/bug#7014).
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=other, site=org.dmg.pmml.DataField
[error] While parsing annotations in /home/runner/.cache/coursier/v1/https/maven-central.storage-download.googleapis.com/maven2/org/jpmml/pmml-model/1.4.8/pmml-model-1.4.8.jar(org/dmg/pmml/regression/RegressionTable.class), could not find FIELD in enum <none>.
[error] This is likely due to an implementation restriction: an annotation argument cannot refer to a member of the annotated class (scala/bug#7014).
[error] Applicable -Wconf / @nowarn filters for this fatal warning: msg=<part of the message>, cat=other, site=org.dmg.pmml.regression.RegressionTable
[error] While parsing annotations in /home/runner/.cache/coursier/v1/https/maven-central.storage-download.googleapis.com/maven2/org/jpmml/pmml-model/1.4.8/pmml-model-1.4.8.jar(org/dmg/pmml/regression/NumericPredictor.class), could not find FIELD in enum <none>.

and, for reference, here is one of my historical builds:
https://github.com/HiuKwok/spark/actions/runs/7932736253/job/21659910112

2. ExclusionRule
Yes, I will create a separate PR for it.

@dongjoon-hyun (Member) commented Feb 19, 2024

BTW, I realized that we need to make this PR less intrusive by reducing the surface of change. We had better introduce a layer inside Spark, not only for this but also for the migration from Jetty 11 to Jetty 12.

(screenshot attached)

I made a PR for that.
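A hypothetical sketch of such a layer (the names here are illustrative, not the actual follow-up PR): re-export the servlet types from one Spark-owned object, so the rest of the codebase never names `jakarta.servlet` directly and a future Jetty 12 migration touches only this file.

```scala
package org.apache.spark.ui

// Hypothetical indirection layer: the single place that knows which
// servlet namespace Spark is built against.
object ServletCompat {
  type Filter              = jakarta.servlet.Filter
  type ServletContext      = jakarta.servlet.ServletContext
  type HttpServlet         = jakarta.servlet.http.HttpServlet
  type HttpServletRequest  = jakarta.servlet.http.HttpServletRequest
  type HttpServletResponse = jakarta.servlet.http.HttpServletResponse
}
```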

@pan3793 (Member) commented Feb 20, 2024

> Re-introduce the javax.servlet-api and jaxb-api jars, as Hive-related jars still reference the old servlet classes internally and would otherwise throw class-not-found errors at test time and runtime; this also decouples the Hive upgrade from this MR.

Could you please elaborate more on the invocation chains, e.g. provide a stack trace?

@HiuKwok (Contributor, author) commented Feb 20, 2024

> BTW, I realized that we need to make this PR less intrusive by reducing the surface of change. We had better introduce a layer inside Spark, not only for this but also for the migration from Jetty 11 to Jetty 12.
>
> I made a PR for that.

Sure, I will review the existing code changes and check how we can centralise the servlet references; most likely this will be another MR prior to this upgrade.

@HiuKwok (Contributor, author) commented Feb 20, 2024

> Re-introduce the javax.servlet-api and jaxb-api jars, as Hive-related jars still reference the old servlet classes internally and would otherwise throw class-not-found errors at test time and runtime; this also decouples the Hive upgrade from this MR.

> Could you please elaborate more on the invocation chains, e.g. provide a stack trace?

This is the stack trace I had from one of the old builds during development; it mainly complains about javax/servlet/Filter.

[info] org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextInBinarySuite *** ABORTED *** (4 milliseconds)
[info]   java.lang.NoClassDefFoundError: javax/servlet/Filter
[info]   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logAuditEvent(HiveMetaStore.java:297)
[info]   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.logInfo(HiveMetaStore.java:782)
[info]   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.access$1100(HiveMetaStore.java:228)
[info]   at org.apache.hadoop.hive.metastore.HiveMetaStore.cleanupRawStore(HiveMetaStore.java:7283)
[info]   at org.apache.hadoop.hive.metastore.HiveMetaStore.access$600(HiveMetaStore.java:163)
[info]   at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.shutdown(HiveMetaStore.java:844)
[info]   at jdk.internal.reflect.GeneratedMethodAccessor74.invoke(Unknown Source)
[info]   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info]   at java.base/java.lang.reflect.Method.invoke(Method.java:568)
[info]   at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
[info]   at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
[info]   at jdk.proxy2/jdk.proxy2.$Proxy35.shutdown(Unknown Source)
[info]   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.close(HiveMetaStoreClient.java:553)
[info]   at jdk.internal.reflect.GeneratedMethodAccessor73.invoke(Unknown Source)
[info]   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[info]   at java.base/java.lang.reflect.Method.invoke(Method.java:568)
[info]   at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
[info]   at jdk.proxy2/jdk.proxy2.$Proxy36.close(Unknown Source)
[info]   at org.apache.hadoop.hive.ql.metadata.Hive.close(Hive.java:416)
[info]   at org.apache.hadoop.hive.ql.metadata.Hive.access$000(Hive.java:169)
[info]   at org.apache.hadoop.hive.ql.metadata.Hive$1.remove(Hive.java:190)
[info]   at org.apache.hadoop.hive.ql.metadata.Hive.closeCurrent(Hive.java:383)
[info]   at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.afterAll(SharedThriftServer.scala:77)
[info]   at org.apache.spark.sql.hive.thriftserver.SharedThriftServer.afterAll$(SharedThriftServer.scala:69)
[info]   at org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextInBinarySuite.afterAll(ThriftServerWithSparkContextSuite.scala:275)
[info]   at org.scalatest.BeforeAndAfterAll.$anonfun$run$1(BeforeAndAfterAll.scala:225)
[info]   at org.scalatest.Status.$anonfun$withAfterEffect$1(Status.scala:377)
[info]   at org.scalatest.Status.$anonfun$withAfterEffect$1$adapted(Status.scala:373)
[info]   at org.scalatest.FailedStatus$.whenCompleted(Status.scala:505)
[info]   at org.scalatest.Status.withAfterEffect(Status.scala:373)
[info]   at org.scalatest.Status.withAfterEffect$(Status.scala:371)
[info]   at org.scalatest.FailedStatus$.withAfterEffect(Status.scala:477)
[info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:224)
[info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info]   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:69)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
[info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414)
[info]   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[info]   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[info]   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[info]   at java.base/java.lang.Thread.run(Thread.java:840)

Also, here is the old build that threw the exception, if that provides more clarity on the issue:
https://github.com/HiuKwok/spark/actions/runs/7940788196/job/21687327232

github-actions bot added the DOCS label on Feb 20, 2024
@HiuKwok (Contributor, author) commented Feb 20, 2024

@dongjoon-hyun I need some more time to investigate the jaxb-api issue, but in the meantime I will create another MR, similar to #45168, which aims to centralise the Jakarta references and minimise the required code change for the Jetty upgrade.

Resolved review threads: docs/core-migration-guide.md (outdated), project/MimaExcludes.scala
@pan3793 (Member) commented Feb 21, 2024

@HiuKwok Hmm... it seems that initializing a class triggers loading of all the classes it references. We may need to fix it on the Hive side by breaking Utils up, to avoid loading javax.* classes:
https://github.com/apache/hive/blob/rel/release-2.3.9/shims/common/src/main/java/org/apache/hadoop/hive/shims/Utils.java
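A hedged toy illustration of that failure mode (the real trigger is Hive's shims `Utils`, not this class): a method signature that mentions `javax.servlet.Filter` puts the type in the class's constant pool, so loading and verifying the class can demand `javax.servlet.Filter` even though the caller only wanted an unrelated method.

```scala
// Toy sketch: callers only ever use logAuditEvent, but because authFilter's
// signature references javax.servlet.Filter, resolving this class can fail
// with NoClassDefFoundError: javax/servlet/Filter when no javax servlet-api
// jar is on the classpath -- analogous to HiveMetaStore.logAuditEvent above.
object AuditUtils {
  def logAuditEvent(event: String): Unit =
    println(s"audit: $event")

  def authFilter(): javax.servlet.Filter =
    throw new UnsupportedOperationException("never called in this sketch")
}
```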

@dongjoon-hyun (Member) commented Feb 21, 2024

> Very curious why these are showing up in javadoc

Ya, I had the same question initially. For now, I have no idea about that, @mridulm. Apache Spark definitely didn't handle them properly in both MiMa and unidoc; I haven't taken an in-depth look yet, either.

@dongjoon-hyun (Member) commented:

Thank you for the update, @HiuKwok.

@HiuKwok (Contributor, author) commented Feb 23, 2024

> Thank you for the update, @HiuKwok.

Indeed, that was a Maven scope problem; fixed.

@dongjoon-hyun (Member) commented:

Thank you. I'll revisit this PR tomorrow morning with fresh eyes, @HiuKwok. I'm currently in California (PST) timezone.

@HiuKwok (Contributor, author) commented Feb 23, 2024

> Thank you. I'll revisit this PR tomorrow morning with fresh eyes, @HiuKwok. I'm currently in California (PST) timezone.

Sure, take your time 👍

@dongjoon-hyun (Member) commented:

Merged to master for Apache Spark 4.0.0.

Thank you so much for making this happen, @HiuKwok and all!

To @HiuKwok: you can make a follow-up PR for the above comment (and some future post-commit review comments).

@HiuKwok (Contributor, author) commented Feb 24, 2024

> Merged to master for Apache Spark 4.0.0.
>
> Thank you so much for making this happen, @HiuKwok and all!
>
> To @HiuKwok: you can make a follow-up PR for the above comment (and some future post-commit review comments).

Sure, I will open a separate JIRA under SPARK-47046 for that.

dongjoon-hyun pushed a commit that referenced this pull request Feb 26, 2024
….servlet-api` dependency scope in `connect/server` module

### What changes were proposed in this pull request?
This is a follow-up change from #45154 to remove the redundant `<scope>` for both servlet-api dependencies, as `compile` is the default scope.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI build

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #45258 from HiuKwok/ft-hf-jetty-deps-scope.

Authored-by: HiuFung Kwok <hiufkwok@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
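For reference, a hedged before/after sketch of the cleanup in that follow-up (coordinates illustrative; Maven treats a missing `<scope>` as `compile`):

```xml
<!-- Before: scope spelled out explicitly -->
<dependency>
  <groupId>jakarta.servlet</groupId>
  <artifactId>jakarta.servlet-api</artifactId>
  <scope>compile</scope>
</dependency>

<!-- After: <scope> omitted; compile is Maven's default -->
<dependency>
  <groupId>jakarta.servlet</groupId>
  <artifactId>jakarta.servlet-api</artifactId>
</dependency>
```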
@pan3793 (Member) commented May 10, 2024

@HiuKwok @dongjoon-hyun I met an issue on the latest master branch that seems related to this change. Brief reproduction steps:

  1. Make a distribution: `dev/make-distribution.sh --tgz -Phive,hive-thriftserver,yarn`
  2. Set up the yarn/hadoop conf and run `spark-sql --master=yarn`:
2024-05-10 17:36:32 ERROR SparkContext: Error initializing SparkContext.
org.sparkproject.jetty.util.MultiException: Multiple exceptions
	at org.sparkproject.jetty.util.MultiException.ifExceptionThrow(MultiException.java:117) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:751) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.sparkproject.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.sparkproject.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:902) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.sparkproject.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:306) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.ui.ServerInfo.addHandler(JettyUtils.scala:514) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2(SparkUI.scala:81) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2$adapted(SparkUI.scala:81) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619) ~[scala-library-2.13.13.jar:?]
	at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617) ~[scala-library-2.13.13.jar:?]
	at scala.collection.AbstractIterable.foreach(Iterable.scala:935) ~[scala-library-2.13.13.jar:?]
	at org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1(SparkUI.scala:81) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1$adapted(SparkUI.scala:79) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?]
	at org.apache.spark.ui.SparkUI.attachAllHandlers(SparkUI.scala:79) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.SparkContext.$anonfun$new$31(SparkContext.scala:690) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.SparkContext.$anonfun$new$31$adapted(SparkContext.scala:690) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?]
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:690) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2963) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1118) ~[spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at scala.Option.getOrElse(Option.scala:201) [scala-library-2.13.13.jar:?]
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1112) [spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:64) [spark-hive-thriftserver_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:405) [spark-hive-thriftserver_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:162) [spark-hive-thriftserver_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) [spark-hive-thriftserver_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
	at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[?:?]
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) [spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1019) [spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:196) [spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:219) [spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:95) [spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1109) [spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1118) [spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) [spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	Suppressed: jakarta.servlet.UnavailableException: Class loading error for holder org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter-1bb4c431==org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter@1bb4c431{inst=false,async=true,src=EMBEDDED:null}
		at org.sparkproject.jetty.servlet.BaseHolder.doStart(BaseHolder.java:104) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.sparkproject.jetty.servlet.FilterHolder.doStart(FilterHolder.java:93) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.sparkproject.jetty.servlet.ServletHandler.lambda$initialize$2(ServletHandler.java:724) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625) ~[?:?]
		at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:734) ~[?:?]
		at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762) ~[?:?]
		at org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:749) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.sparkproject.jetty.servlet.ServletContextHandler.startContext(ServletContextHandler.java:392) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.sparkproject.jetty.server.handler.ContextHandler.doStart(ContextHandler.java:902) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.sparkproject.jetty.servlet.ServletContextHandler.doStart(ServletContextHandler.java:306) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.apache.spark.ui.ServerInfo.addHandler(JettyUtils.scala:514) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2(SparkUI.scala:81) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$2$adapted(SparkUI.scala:81) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:619) ~[scala-library-2.13.13.jar:?]
		at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:617) ~[scala-library-2.13.13.jar:?]
		at scala.collection.AbstractIterable.foreach(Iterable.scala:935) ~[scala-library-2.13.13.jar:?]
		at org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1(SparkUI.scala:81) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.apache.spark.ui.SparkUI.$anonfun$attachAllHandlers$1$adapted(SparkUI.scala:79) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?]
		at org.apache.spark.ui.SparkUI.attachAllHandlers(SparkUI.scala:79) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.apache.spark.SparkContext.$anonfun$new$31(SparkContext.scala:690) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.apache.spark.SparkContext.$anonfun$new$31$adapted(SparkContext.scala:690) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.13.jar:?]
		at org.apache.spark.SparkContext.<init>(SparkContext.scala:690) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2963) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:1118) ~[spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at scala.Option.getOrElse(Option.scala:201) [scala-library-2.13.13.jar:?]
		at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:1112) [spark-sql_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.init(SparkSQLEnv.scala:64) [spark-hive-thriftserver_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.<init>(SparkSQLCLIDriver.scala:405) [spark-hive-thriftserver_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:162) [spark-hive-thriftserver_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) [spark-hive-thriftserver_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
		at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) ~[?:?]
		at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
		at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[?:?]
		at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) [spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1019) [spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:196) [spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:219) [spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:95) [spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1109) [spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1118) [spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
		at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) [spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
Caused by: jakarta.servlet.UnavailableException: Class loading error for holder org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter==org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter@2ae2fa13{inst=false,async=true,src=EMBEDDED:null}
	at org.sparkproject.jetty.servlet.BaseHolder.doStart(BaseHolder.java:104) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.sparkproject.jetty.servlet.FilterHolder.doStart(FilterHolder.java:93) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at org.sparkproject.jetty.servlet.ServletHandler.lambda$initialize$2(ServletHandler.java:724) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625) ~[?:?]
	at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:734) ~[?:?]
	at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762) ~[?:?]
	at org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:749) ~[spark-core_2.13-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
	... 38 more

@pan3793 (Member) commented May 10, 2024

Also cc the 4.0.0-preview1 release manager @cloud-fan.

@dongjoon-hyun (Member) commented:

Please file a proper JIRA issue, @pan3793.

@pan3793 (Member) commented May 11, 2024

I raised SPARK-48238. I have no solution yet other than reverting the javax => jakarta namespace migration.

@cloud-fan (Contributor) commented:

According to https://issues.apache.org/jira/browse/SPARK-48238, we need to revert this. I'll cut another RC after it.

@pan3793 (Member) commented May 22, 2024

Hi @HiuKwok, I noticed this PR migrated javax.{servlet|ws.rs} to jakarta.{servlet|ws.rs}; do you have a plan to migrate javax.xml to jakarta.xml? I suppose this is work that Spark eventually has to do, though it would introduce breaking changes.

@hiufung-kwok (Contributor) commented Oct 14, 2024

> Hi @HiuKwok, I noticed this PR migrated javax.{servlet|ws.rs} to jakarta.{servlet|ws.rs}; do you have a plan to migrate javax.xml to jakarta.xml? I suppose this is work that Spark eventually has to do, though it would introduce breaking changes.

Actually, not just javax.servlet|ws.rs; we should also review all usages of javax packages.
I created an umbrella ticket for this purpose and will continue to investigate:
https://issues.apache.org/jira/browse/SPARK-49963

dongjoon-hyun pushed a commit that referenced this pull request Oct 15, 2024
### What changes were proposed in this pull request?

- Remove the dependency on `javax.ws.rs.ws-rs-api`, as it's no longer required.

Prior discussion can be found on:
 - #41340
 - #45154

### Why are the changes needed?
In the past, the codebase had a few .scala classes referencing and using the `ws-rs-api`, such as b7fdc23#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8R624-R627

However, as time passed, all usages of `ws-rs-api` were either removed or refactored. Hence there is no need to import it in the root POM now, and we can always re-introduce it later if the usage can be justified again.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit tests, to make sure the codebase is not impacted by the removal of the dependency.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #48461 from hiufung-kwok/ft-hf-SPARK-49963-remove-ws-rs-api.

Authored-by: HiuFung Kwok <hiufung.kwok.852@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
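For context, a hedged sketch of the kind of root-POM entry that commit removes (coordinates as published on Maven Central for the `javax.ws.rs-api` artifact; treat as illustrative):

```xml
<!-- Removed: no remaining usages of the JAX-RS API in the codebase -->
<dependency>
  <groupId>javax.ws.rs</groupId>
  <artifactId>javax.ws.rs-api</artifactId>
</dependency>
```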