Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-11081] Shade Jersey and javax.rs.ws #9615

Closed
wants to merge 1 commit into from

Conversation

mccheah
Copy link
Contributor

@mccheah mccheah commented Nov 11, 2015

Jersey and javax.rs.ws are commonly depended on and cause dependency
conflicts in many applications that depend on the Spark maven artifacts.
This patch attempts to shade them.

However this patch set is a work in progress. Currently, the javax.rs.ws shading appears to be broken. When I run the mvn package task and generate the assembly jar, and unpack it, I'm missing the javax.rs.ws classes - they're not under org/spark-project/javax/rs/ws. I would appreciate some assistance in debugging how the shading is being done incorrectly here.

Jersey and javax.rs.ws are commonly depended on and cause dependency
conflicts in many applications that depend on the Spark maven artifacts.
This patch attempts to shade them.
@SparkQA
Copy link

SparkQA commented Nov 11, 2015

Test build #45586 has finished for PR 9615 at commit 83f98c9.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -2165,6 +2166,9 @@
<include>org.eclipse.jetty:jetty-security</include>
<include>org.eclipse.jetty:jetty-util</include>
<include>org.eclipse.jetty:jetty-server</include>
<include>com.sun.jersey:jersey-core</include>
<include>com.sun.jersey:jersey-json</include>
<include>com.sun.jersey:jersey-server</include>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you also need to add the javax.ws.rs artifacts to this <includes> list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

javax.ws.rs is actually inside the Jersey artifacts.

The shading didn't work as I expected regardless if I added the javax.rs.ws stuff to here as well though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @mccheah - if you look at jetty-server as an example, there are other build changes related to shading that you haven't done here:

  1. In the root pom, they should be marked as provided.
  2. In core/pom.xml they need to be added in includeArtifactIds.
  3. They may also need to be listed in other poms as well due to some compiler bugs.

I'd just go through and look everywhere in the source code you see jetty-server and do the same for these 3 artifacts. If it's still not working then, let me know and I can take a look.

> git grep jetty-server
core/pom.xml:      <artifactId>jetty-server</artifactId>
core/pom.xml:                guava,jetty-io,jetty-servlet,jetty-continuation,jetty-http,jetty-plus,jetty-util,jetty-server,jetty-security
pom.xml:        <artifactId>jetty-server</artifactId>
pom.xml:              <include>org.eclipse.jetty:jetty-server</include>
repl/pom.xml:      <artifactId>jetty-server</artifactId>
streaming/pom.xml:      <artifactId>jetty-server</artifactId>
yarn/pom.xml:      <artifactId>jetty-server</artifactId>

@vanzin
Copy link
Contributor

vanzin commented Nov 11, 2015

Another thing to keep in mind, given the recent issues with @VisibleForTesting: JAX-RS APIs use lots of annotations, and if you shade them, that is likely to cause problems with scalac-generated code (see #9585).

The good news is that since new versions of these APIs are backwards compatible, you shouldn't need to shade the javax.rs stuff at all; someone who wants to override them just needs to put the newer version in front of Spark's classpath. It's not as easy as including it in your application, but it's doable.

Given that, I wonder if it's even necessary to shade jersey at all; the version of jersey used in Spark is old and doesn't clash with the new package name used by newer releases of Jersey.

@mccheah
Copy link
Contributor Author

mccheah commented Nov 11, 2015

@vanzin we've hit cases specifically where swapping in the newer version of javax.rs.ws causes the Spark UI to break. I'm not convinced that javax.rs.ws versions are compatible between what Spark uses and the newer versions.

@pwendell great I'll try these suggestions.

Apologies for my lack of knowledge, but what are the sorts of things I should be testing with regards to this change? That is, what Spark components are specifically using the javax.rs.ws and Jersey stuff? I know the Spark UI is one of them, is there anything else? I'm also inspecting the jar files after building.

@pwendell
Copy link
Contributor

Regarding testing, the best way is to inspect the contents of the jar to make sure that the shaded version is inlined. If Spark code uses the shaded dependency directly, you can also use javap to inspect the byte code and make sure that the references are to the shaded versions of jersey rather than the real one.

@mccheah given the comments by @vanzin, can you say more about the specific incompatibility you are facing? It would be good to make sure that if we shade something it's due to a known incompatibility between some set of versions.

@vanzin
Copy link
Contributor

vanzin commented Nov 12, 2015

My concern is specifically with shading the annotations; we've seen that it causes issues with scalac-generated code. If you manage to fix the PR to shade jax-rs annotations, please do extensive tests with spark-shell to make sure it still works.

I'm also interested in which incompatibilities you ran into. These APIs are generally pretty good at maintaining backwards compatibility; unless you're using some milestone build (I remember pre-release jars of the JAX-RS 2.0 API having some differences from the final version), it should all work.

@mccheah
Copy link
Contributor Author

mccheah commented Nov 12, 2015

We're seeing that exception handling has changed between Jersey 01 and Jersey 02. In our application, we are including both Jersey 2 and javax.ws.rs-api-2.0.1.jar on the driver and executor classpath.

I simplified it to a Spark shell session:

bin/spark-shell --master spark://mcheah-mbp:7077 --jars /Users/mcheah/Downloads/jersey-common-2.19.jar,/Users/mcheah/Downloads/javax.ws.rs-api-2.0.1.jar

scala> new javax.ws.rs.NotFoundException()
java.lang.NoSuchMethodError: javax.ws.rs.ClientErrorException.validate(Ljavax/ws/rs/core/Response;Ljavax/ws/rs/core/Response$Status$Family;)Ljavax/ws/rs/core/Response;
at javax.ws.rs.ClientErrorException.<init>(ClientErrorException.java:64)
at javax.ws.rs.NotFoundException.<init>(NotFoundException.java:60)

We have other examples as well where similar errors are happening. I think in this case, javax.ws.rs.NotFoundException extends a class that is being loaded from the Jersey 1 jar, but the class in the Jersey 1 jar isn't compatible with what javax.ws.rs-api.2.0.1's equivalent class is.

cc @mingyukim @kevinychen

@mccheah
Copy link
Contributor Author

mccheah commented Nov 16, 2015

What does everyone think about the case I've outlined above?

@vanzin
Copy link
Contributor

vanzin commented Nov 16, 2015

You cannot use --jars alone to override Spark's classpath. You have to either use --jars and --conf spark.driver.userClassPathFirst=true, or use --driver-class-path.

@mccheah
Copy link
Contributor Author

mccheah commented Nov 16, 2015

I used

mcheah@mcheah-mbp spark-1.5.1-bin-hadoop2.4 $ bin/spark-shell --master spark://mcheah-mbp:7077 --driver-class-path /Users/mcheah/Downloads/jersey-common-2.19.jar,/Users/mcheah/Downloads/javax.ws.rs-api-2.0.1.jar --conf spark.driver.userClassPathFirst=true

And this time I got...

scala> import javax.ws.rs.NotFoundException
<console>:19: error: object NotFoundException is not a member of package javax.ws.rs
       import javax.ws.rs.NotFoundException

Edit: Using --jars and --conf spark.driver.userClassPathFirst=true led to the same NoSuchMethodError I listed above.

@vanzin
Copy link
Contributor

vanzin commented Nov 16, 2015

The separator for --driver-class-path is :, not ,...

@mccheah
Copy link
Contributor Author

mccheah commented Nov 16, 2015

The NoSuchMethodError occurs if I use --driver-class-path with the right delimiter.

And regardless - should we be shading so that users shouldn't have to set the userClassPathFirst settings on drivers and/or executors?

Edit: Actually I looked more closely at my exception trace. It's a different error, but it still doesn't work:

mcheah@mcheah-mbp spark-1.5.1-bin-hadoop2.4 $ bin/spark-shell --master spark://mcheah-mbp:7077 --driver-class-path /Users/mcheah/Downloads/jersey-common-2.19.jar:/Users/mcheah/Downloads/javax.ws.rs-api-2.0.1.jar

scala> import javax.ws.rs.NotFoundException
import javax.ws.rs.NotFoundException

scala> new NotFoundException()
java.lang.AbstractMethodError: javax.ws.rs.core.Response.getStatusInfo()Ljavax/ws/rs/core/Response$StatusType;
    at javax.ws.rs.WebApplicationException.validate(WebApplicationException.java:307)
    at javax.ws.rs.ClientErrorException.<init>(ClientErrorException.java:64)
    at javax.ws.rs.NotFoundException.<init>(NotFoundException.java:60)

@vanzin
Copy link
Contributor

vanzin commented Nov 17, 2015

And regardless - should we be shading so that users shouldn't have to set the userClassPathFirst settings on drivers and/or executors?

But you stated before that shading wasn't yet fully done by this patch? Is that not the case anymore? I've stated what my worry is: JAX-RS is based on annotations and shading annotations tends to break scalac-generated code.

If you have fixed the shading, you need to (i) build spark using maven and (ii) test that everything, especially the spark-shell, still works. Please refer to the PR I linked to before (#9585). The PR builder is not enough, since even if you trigger it with maven, it adds too many things to the test classpath, preventing the error from happening.

@mccheah
Copy link
Contributor Author

mccheah commented Nov 17, 2015

Ah, I haven't actually fixed the shading yet - despite adding a bunch of logic to pom files the classes weren't being included in the assembly jar the way I expected.

However I posted the reproduction cases to demonstrate the need for shading in the first place. Are we all on the same page with the reason why we need to do the shading here? If we're all on board I'll post what I've tried and we can debug further from there.

@vanzin
Copy link
Contributor

vanzin commented Nov 17, 2015

Ok, so I figured out why you were seeing the exceptions with --driver-class-path.

The old version of jersey hooks up to the JAX-RS API using META-INF/services files; the new version doesn't. So, when you have both the new jersey and the old jersey in the classpath, the JAX-RS code will load classes from the old jersey, because that's what the META-INF entries tell it to do.

The old jersey implementation of the Response class does not have the getStatusInfo method, so you see that error.

This also means that plain shading will not work. You would have to modify all the services files too; not sure whether the plugins does that, but I don't think so. And that's assuming that shading JAX-RS annotations doesn't break other things.

One thing I tried and seems to have worked in a quick test is to not package the old jersey service files in the assembly. I just added this to the sbt build:

diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 67724c4..e8317b0 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -445,6 +445,7 @@ object Assembly {
       case m if m.toLowerCase.endsWith("manifest.mf")          => MergeStrategy.discard
       case m if m.toLowerCase.matches("meta-inf.*\\.sf$")      => MergeStrategy.discard
       case "log4j.properties"                                  => MergeStrategy.discard
+      case m if m.toLowerCase.startsWith("meta-inf/services/javax.ws") => MergeStrategy.discard
       case m if m.toLowerCase.startsWith("meta-inf/services/") => MergeStrategy.filterDistinctLines
       case "reference.conf"                                    => MergeStrategy.concat
       case _                                                   => MergeStrategy.first

When I added the new jersey to the class path, the new classes were loaded (and worked), and the Spark UI was still able to serve things under /api. Without the new classes, /api also works. So it seems there might be a way out even without shading.

@mccheah
Copy link
Contributor Author

mccheah commented Nov 17, 2015

Hm... based on what we've found so far, it sounds like shading Jersey is a pretty difficult task. In general any scenario where the application code uses a different version of Jersey than what Spark uses will be a painful experience.

Based on this, does it perhaps make sense to try to remove our dependency on Jersey entirely? What are we using Jersey for? Would it be a significant undertaking to replace the parts that use Jersey with something else? (This was also suggested on the JIRA comments)

@vanzin
Copy link
Contributor

vanzin commented Nov 17, 2015

What are we using Jersey for?

Jersey is used to serve the JSON API. See http://spark.apache.org/docs/latest/monitoring.html, under "REST API".

But, if you paid attention to my investigation, it seems that if you remove the service files from the old jersey, it should be possible to use "userClassPathFirst" to override it for the application, which is how you should override libraries used by Spark anyway. That doesn't require shading anything.

@mccheah
Copy link
Contributor Author

mccheah commented Nov 17, 2015

But if you remove the META-INF/service files, but the user doesn't ship / use Jersey, then, the Jersey 1 stuff won't work, right? What are the implications for users that don't depend on Jersey? Edit: I see that "without the new classes, /api also works" - but what is Jersey falling back to then when the META-INF/services part is being excluded?

Also, shouldn't we be aiming for a solution that doesn't require the userClassPathFirst setting?

@vanzin
Copy link
Contributor

vanzin commented Nov 17, 2015

I did not remove the old jersey classes, just the META-INF/services entries. The classes are still there, and it seems the code is smart enough to figure out the implementation without the META-INF stuff.

If that really works, then it opens a door to allowing users to override jersey in their apps. My quick test worked, but more thorough testing would need to be done.

@mccheah
Copy link
Contributor Author

mccheah commented Nov 17, 2015

I think I have a different issue cropping up that won't be solved by just not including the META-INF/services/javax.ws.rs files in the assembly.

For context, my application is pulling in the spark-* artifacts as maven dependencies, and just spawning a Spark context in its JVM to connect to a Spark cluster.

I ran the following experiment with my app:

  1. Use a Spark assembly jar that merely excludes the javax.ws.rs files, as @vanzin had done in the above investigations. Start the Spark standalone master and worker daemons.
  2. In my application, I pull in Jersey 2 and explicitly exclude Jersey 1 dependencies. I went to my application's "lib" directory and verified that there were no jersey-1.X jars being loaded on my app's classpath and only Jersey 2.x jars were being loaded.
  3. My Spark context should connect to my cluster I'm running on localhost.
  4. My application fails at starting the Spark context with the following exception:
Caused by: java.lang.NoClassDefFoundError: com/sun/jersey/spi/container/servlet/ServletContainer
    at org.apache.spark.status.api.v1.ApiRootResource$.getServletHandler(ApiRootResource.scala:174) ~[spark-core_2.10-1.4.1-palantir3.jar:1.4.1-palantir3]
    at org.apache.spark.ui.SparkUI.initialize(SparkUI.scala:68) ~[spark-core_2.10-1.4.1-palantir3.jar:1.4.1-palantir3]
    at org.apache.spark.ui.SparkUI.<init>(SparkUI.scala:74) ~[spark-core_2.10-1.4.1-palantir3.jar:1.4.1-palantir3]
    at org.apache.spark.ui.SparkUI$.create(SparkUI.scala:190) ~[spark-core_2.10-1.4.1-palantir3.jar:1.4.1-palantir3]

(Side note: 1.4.1-palantir3 is just a few extra patches that are in master/1.5+ on top of 1.4.1. https://github.com/palantir/spark/commits/v1.4.1-palantir3)

@vanzin
Copy link
Contributor

vanzin commented Nov 17, 2015

For context, my application is pulling in the spark-* artifacts as maven dependencies

Yeah, that's a different use case, which isn't really covered by my suggestions. For that case, a different approach might be needed.

Instead of shading, a small pom file could be created to create a spark-specific version of the old jersey artifacts without the offending files in META-INF. That way, when you pull jersey transitively via spark-core, you'd be getting this sanitized jar instead of the original one.

Another approach could be to upgrade the version of jersey used by Spark. I am not particularly a fan of upgrading dependencies, but it's unlikely people depend on that dependency currently.

@mccheah
Copy link
Contributor Author

mccheah commented Nov 18, 2015

The exception we're getting here seems like it's a namespace-difference between Jersey 1 and Jersey 2, and Spark is trying to use Jersey 1's namespace when the class has been moved.

Upgrading to Jersey 2 is an option; if we shade Jersey 2 - which from my understanding doesn't get us tripped up by META-INF/services, because Jersey 2 doesn't use that - then can we get a solution that doesn't force Jersey users to specify userClassPathFirst?

@vanzin
Copy link
Contributor

vanzin commented Nov 18, 2015

The exception you're getting is because you explicitly removed the jersey 1 dependency, right? My suggestion would keep the classes, get rid of the META-INF stuff. So you'd have both jerseys in your app, Spark using the old one, your app using the new one.

if we shade Jersey 2

If possible, I prefer not shading anything.

@mccheah
Copy link
Contributor Author

mccheah commented Nov 18, 2015

To clarify, does this solution require that I load the jersey 2 jar before the jersey 1 jar in order to make sure the javax.rs.ws classes are loaded from jersey 2 in my app? Or are we saying that if META-INF/services is purged from Jersey 1 then it doesn't matter which javax.ws.rs classes are loaded?

@vanzin
Copy link
Contributor

vanzin commented Nov 18, 2015

If you purge the META-INF stuff, you'll just need to make sure your app has the newest javax.ws.rs classes available; that may or may not mean adding exclusions when adding a spark-core dependency, depending on the artifact names. The order the jersey 1 and 2 jars are included does not matter, since they do not conflict with each other.

@mccheah
Copy link
Contributor Author

mccheah commented Nov 18, 2015

I don't think the javax.ws.rs classes are actually compatible with each other.

I just tried another experiment. I took my Spark package hacked to remove the META-INF/services/javax.ws.rs* files. I then ran this:

bin/spark-shell --master spark://mcheah-mbp:7077 --driver-class-path /Users/mcheah/Downloads/jersey-common-2.19.jar:/Users/mcheah/Downloads/javax.ws.rs-api-2.0.1.jar

scala> new javax.ws.rs.NotFoundException
java.lang.NoClassDefFoundError: org/glassfish/hk2/utilities/Binder
    at org.glassfish.jersey.internal.RuntimeDelegateImpl.<init>(RuntimeDelegateImpl.java:63)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)

@vanzin
Copy link
Contributor

vanzin commented Nov 18, 2015

That's because jersey 2 has a bunch of other dependencies you need aside from the two jars you're adding. If you find all of them and add them to the classpath, then it will work.

@mccheah
Copy link
Contributor Author

mccheah commented Nov 18, 2015

So I tried the following experiment to simulate what we want to be doing and testing the hypothesis that it doesn't matter what order the Jersey jars are loaded in. What I did was, take all of the jersey 1 jars in my application's classpath, and remove the META-INF/services/javax.ws.* files from them. I also renamed my jersey-1 jars, prefixing them with 1-jersey-*.

I started my application and on the driver side I created a javax.ws.rs.NotFoundException, and hit this error:

Caused by: java.lang.NoSuchMethodError: javax.ws.rs.ClientErrorException.validate(Ljavax/ws/rs/core/Response;Ljavax/ws/rs/core/Response$Status$Family;)Ljavax/w
    at javax.ws.rs.ClientErrorException.<init>(ClientErrorException.java:64) ~[javax.ws.rs-api-2.0.1.jar:2.0.1]
    at javax.ws.rs.NotFoundException.<init>(NotFoundException.java:60) ~[javax.ws.rs-api-2.0.1.jar:2.0.1]

javax.ws.rs.NotFoundException is a class in the Jersey 2 jar, but it extends javax.ws.rs.ClientErrorException... which in turn extends javax.ws.rs.WebApplicationException. The WebApplicationException class exists in both Jersey jars but are different. In particular, WebApplicationException in the Jersey 2 jar contains the validate() method and that method is absent in Jersey 1. WebApplicationException was loaded from Jersey 1 and so it is missing the validate() method that ClientErrorException tries to call.

The NoSuchMethodError didn't manifest if I didn't rename the Jersey jars, indicating that class loading order indeed does matter here.

@vanzin
Copy link
Contributor

vanzin commented Nov 18, 2015

Seems like the old jersey-core jar has the javax classes within it, instead of depending on a separate artifact; so you'd need the new javax jar in front of it in the classpath.

If going with the "modify Spark build" path, a similar approach to removing META-INF files could be used to create a separate jar for the javax api only, that could then be excluded from spark-core dependencies if you want to.

@mccheah
Copy link
Contributor Author

mccheah commented Nov 18, 2015

So to recap this approach holistically, both from the perspective of my application and from what needs to change Spark side:

  1. Create "org.spark-project.jersey" artifact (loosely speaking) which is the Jersey 1 jar minus all the javax.ws.rs stuff (no need to actually shade/namespace the classes that way, just the artifact name)
  2. Put all the javax.ws.rs stuff extracted from step 1 into its own artifact, say "org.spark-project.javax.ws.rs". (META-INF/services/javax.ws.rs* files live in this artifact as well)
  3. Spark-core's pom depends on org.spark-project artifacts from step 1 and 2
  4. Spark assembly excludes META-INF/services/javax.ws.rs.*
  5. In my application's dependencies, exclude org.spark-project.javax.ws.rs from spark-core
  6. Set spark.executor.userClassPathFirst=true and ship Jersey 2 and new javax.ws.rs jars to the executors

Am I missing anything? Thanks a lot by the way @vanzin , this discussion has been extremely helpful.

@vanzin
Copy link
Contributor

vanzin commented Nov 18, 2015

Steps 1 and 4 are both needed for your app, since you're embedding Spark. Basically "org.spark-project.jersey" can't contain either javax.ws classes nor the META-INF files.

@mccheah
Copy link
Contributor Author

mccheah commented Nov 18, 2015

My app would depend on Jersey 2 only which doesn't package the javax.ws.rs in itself and doesn't have the META-INF/services/javax.ws.rs* stuff.

@vanzin
Copy link
Contributor

vanzin commented Nov 18, 2015

But it also depends on spark-core, no? You said so above. And spark-core would depend on the spark-specific jersey jar which in your list would have the META-INF classes...

@mccheah
Copy link
Contributor Author

mccheah commented Nov 18, 2015

No - spark-specific Jersey jar would not have javax.ws.rs classes. That would be in org.spark-project.javax.ws.rs.

Edit: spark-specific jersey jar would also exclude META-INF/service/javax.ws.rs* files and put those in org.spark-project.javax.ws.rs.

@vanzin
Copy link
Contributor

vanzin commented Nov 18, 2015

But it (and spark-core) need to depend on that jar containing the javax stuff...

@mccheah
Copy link
Contributor Author

mccheah commented Nov 18, 2015

No because my app would replace org.spark-project.javax.ws.rs with its own javax.ws.rs (which is transitively pulled in by Jersey 2 anyways), and those newer javax.ws.rs classes should be usable by org.spark-project.jersey as well. Although I think this particular part of the compatibility story is something we'll have to confirm in testing, or reason about a bit more precisely.

@vanzin
Copy link
Contributor

vanzin commented Nov 18, 2015

Your app is one app. This needs to work for all apps.

If spark-core doesn't depend on the artifact that has the javax classes, Spark won't work.

@mccheah
Copy link
Contributor Author

mccheah commented Nov 18, 2015

Spark-core would depend on the org.spark-project.javax.ws.rs but I would exclude it in my app specifically, as well as anyone who would use Jersey 2.

@vanzin
Copy link
Contributor

vanzin commented Nov 18, 2015

I see. I somehow missed that you were including the META-INF files in the javax jar.

Anyway, I'm pretty sure the current code in this PR won't work; would you mind updating the bug with your latest summary above, and closing this PR?

@mccheah
Copy link
Contributor Author

mccheah commented Nov 18, 2015

Hm alright. We could also update this PR inline but we can also just link to it for the discussion's sake in the follow-up PR. Closing! Thanks again!

@mccheah mccheah closed this Nov 18, 2015
@robert3005 robert3005 deleted the feature/shading-jersey branch September 24, 2016 04:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants