[SPARK-11195][CORE] Use correct classloader for TaskResultGetter #9367
Conversation
Make sure we are using the context classloader when deserializing failed TaskResults instead of the Spark classloader.
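The change itself is tiny. A minimal sketch of the idea (not the actual `TaskResultGetter` source; the helper here only mirrors what `Utils.getContextOrSparkClassLoader` does) is:

```scala
// Sketch only: prefer the thread's context classloader (which can see the
// spark-submitted user jar) and fall back to the loader that loaded this
// class (playing the role of the Spark classloader) when none is set.
object LoaderChoice {
  def contextOrDefaultLoader: ClassLoader = {
    val ctx = Thread.currentThread().getContextClassLoader
    if (ctx != null) ctx else getClass.getClassLoader
  }
}
```

Deserializing the failed TaskResult through this loader is what lets a user-defined exception class resolve; the plain application classloader only sees Spark's own classes.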
Jenkins, this is ok to test.

I have a manual test that exhibits this behavior in https://issues.apache.org/jira/browse/SPARK-11195 and I am working on adding a test to the repo. In order to test this I basically want to mirror the classloader hierarchy created by

Jenkins, this is ok to test.

@brkyvz might know more about testing Spark Submit.

How about this: in the main method, you can catch the exception and, if it is the expected type, let the main method finish successfully. Otherwise, throw an exception, which causes a non-zero exit code.
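A minimal sketch of this suggestion (the names, such as `ExpectedException`, are illustrative and not from the PR):

```scala
// The test app catches only the expected exception type; in that case main
// returns normally (exit code 0). Any other throwable propagates out of
// main and the JVM exits non-zero, which the calling test can detect.
object ExitCodeApp {
  final class ExpectedException extends RuntimeException("expected")

  def run(): Unit = throw new ExpectedException

  def main(args: Array[String]): Unit = {
    try {
      run()
    } catch {
      case _: ExpectedException => () // the failure we wanted: succeed
    }
  }
}
```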
Test build #44655 has finished for PR 9367 at commit
This looks good as I'm not otherwise clear why these two blocks would use different classloaders.

@choochootrain are you able to put together a small test like what @yhuai mentions? then I think this is good to go.

i'm writing the test right now, is there an easy way to get the relative path to the assembled spark jar so I can compile my job against it?

@choochootrain I think you can write your app in SparkSubmitSuite (https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala#L590-L614 is an example).

@yhuai i can't write the test directly in

@choochootrain I just noticed that this PR is for branch-1.5. Should we also fix it in master?

@yhuai yep this should also be in master. should I also submit a pr on master when this one looks good or will a maintainer be able to cherry-pick the commit?

@choochootrain It will be great if you can submit a pr against the master. Our merge script only cherry-picks commits from master to a branch.

Test build #44987 has finished for PR 9367 at commit
This test compiles an external Spark job which throws a user defined exception and asserts that Spark handles the TaskResult deserialization properly.
whoops, fixed the style error.

    |}
  """.stripMargin)
// scalastyle:on line.size.limit
val sparkJar = "../assembly/target/scala-2.10/spark-assembly-1.5.1-hadoop2.2.0.jar"
the only remaining issue is that this jar is hardcoded for the maven profiles that i am using. i was experimenting with putting the entire maven target/classes directory on the classpath but that seems equally janky. any suggestions here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I probably missed something. Why do we need to put spark's assembly jar here? When we run tests, spark's classes are already on the classpath.
btw, when you unit test `SparkSubmitSuite`, you have to do `assembly/assembly` before you can run any of these tests.
This is what our jenkins does.
in order to test this issue, i need to submit an external jar to `spark-submit`. i can compile my job using `TestUtils`, but i need to put (some version of) spark on the classpath so that `javac` can resolve `RDD` and so on.
Why do you need to compile using Spark? Can't you just create your own exception, compile it, and pass that to Spark Submit? Then move all of the code here down to something like `SimpleApplicationTest`, where you create your exception through reflection and then throw it?
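A hedged sketch of this reflection idea; a JDK exception class stands in for the user-defined one so the snippet is self-contained (the real test would pass the name of the class packaged in the jar):

```scala
object ReflectiveThrow {
  // Instantiate an exception by name through the context classloader, the
  // way the suggested test app would create its jar-only MyException.
  def throwByName(className: String, message: String): Nothing = {
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    val cls = Class.forName(className, true, loader)
    throw cls.getConstructor(classOf[String])
      .newInstance(message)
      .asInstanceOf[Throwable]
  }
}
```

Because the test file never mentions the exception class by its static type, the class only needs to exist in the submitted jar, not on the suite's compile-time classpath.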
Or maybe I didn't understand the issue that this patch is trying to solve very well
Maybe you can compile a jar that just includes your exception and put your app below. In your app, you use reflection to create an instance of your exception.
all this patch does is make `TaskResult` deserialization use `Utils.getContextOrSparkClassLoader` (the classloader which loaded the `spark-submit`ted jar) instead of `Utils.getSparkClassLoader` (this is `AppClassLoader`, which only has spark classes in it). without this patch, a failed task would not be able to deserialize an exception if it did not exist in `Utils.getSparkClassLoader`.

in order to reproduce this issue, i set up a situation where `Utils.getContextOrSparkClassLoader` contains `MyException` but `Utils.getSparkClassLoader` does not (see https://issues.apache.org/jira/browse/SPARK-11195). this is easy to manually test with `spark-submit` and a user defined exception, but turning this into an automated test is proving to be much trickier. here are the 3 options:
- ❌ if i place all of the code into `SparkSubmitSuite`, the bug won't be hit because `MyException` will be in the root classloader and my patch makes no difference.
- ❔ if i place all of the code into an external jar and run `spark-submit`, i can set up the same situation as my repro which found this bug. the issue i am running into is that i need a spark classpath in order to compile my jar. i can use the assembled jar, but this changes depending on the maven profiles that are enabled and so on.
- ❔ i can try @brkyvz & @yhuai's hybrid approach of putting only the exception into a jar and the rest of the code into `SparkSubmitSuite`. i will have to do the following in order to repro this issue:
  - load the jar with `MyException` in a new classloader whose parent is the root classloader
  - somehow allow this classloader to be used by the driver and the executor without changing `Utils.getSparkClassLoader`.

  at this point am i not reimplementing `spark-submit`? :)
the final approach is certainly worth trying, i'll take a look at it later today.
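For what it's worth, the visibility asymmetry behind all three options can be shown in isolation (a sketch, not test code from this PR): a loader that delegates only to the bootstrap loader plays the role of `Utils.getSparkClassLoader`, and `scala.Option` plays the role of the jar-only `MyException`.

```scala
import java.net.{URL, URLClassLoader}

object VisibilityDemo {
  // An empty URLClassLoader with a null parent delegates straight to the
  // bootstrap loader, so it resolves JDK classes but sees nothing from
  // the application classpath.
  val bootstrapOnly = new URLClassLoader(Array.empty[URL], null)

  def canLoad(name: String): Boolean =
    try { bootstrapOnly.loadClass(name); true }
    catch { case _: ClassNotFoundException => false }
}
```

`canLoad("java.lang.RuntimeException")` succeeds while `canLoad("scala.Option")` fails, which is exactly the `ClassNotFoundException` a failed task hits when its exception class is missing from the deserializing loader.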
We should go with the simplest option that reproduces the issue. In other `SparkSubmitSuite` tests we used (2), but only out of necessity, where we just prepackage a jar and put it in the test resources dir. This makes it a little hard to maintain, e.g. we need a separate jar for scala-2.11.

In this case, maybe (3) is the simplest and most maintainable. It's unlikely that we'll ever have to modify `MyException`, but the reproduction code itself should be kept flexible. Could you give it a try?
Test build #44991 has finished for PR 9367 at commit
@yhuai @choochootrain how is this one going? it does seem desirable to test this as much as possible. If we're having to introduce very complex mechanisms to do so and they aren't working, is there anything simpler, even if less effective, we can do to test it?

I feel the easiest way is to have something like https://github.com/apache/spark/tree/master/sql/hive/src/test/resources/regression-test-SPARK-8489. So, we will not need to change anything in TestUtils. We just have a jar containing your class and the main object.

it seems like the SPARK-8489 approach would be less invasive, but also less maintainable. any preferences? i'd like to stick with one and get this out asap.

My vote is with (3). I feel it requires the least amount of new code that you have to write and is more maintainable.

I'd like to go with 3.

should be less invasive now :)

Test build #45913 has finished for PR 9367 at commit
// load the exception from the jar
val loader = new MutableURLClassLoader(new Array[URL](0), Thread.currentThread.getContextClassLoader)
loader.addURL(jarFile.toURI.toURL)
Thread.currentThread().setContextClassLoader(loader)
Can we set the original loader back?
++
Test build #46008 timed out for PR 9367 at commit

Test build #2073 has finished for PR 9367 at commit

Test build #2079 has finished for PR 9367 at commit

Test build #2080 has finished for PR 9367 at commit

@choochootrain We should also open a PR for master, right?
assert(unknownFailure.findFirstMatchIn(exceptionMessage).isEmpty)

// reset the classloader to the default value
Thread.currentThread.setContextClassLoader(originalClassLoader)
Can we use the following pattern?

    // Get the original classloader
    try {
      // do our test
    } finally {
      // reset our classloader
    }

So, we will not mess up the classloader even if the test somehow fails.
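That suggestion can be sketched as a small reusable helper (the helper name is illustrative, not from the PR):

```scala
object ClassLoaderReset {
  // Install `loader` as the context classloader for the duration of `body`,
  // restoring the original loader even when the body throws.
  def withContextClassLoader[T](loader: ClassLoader)(body: => T): T = {
    val original = Thread.currentThread().getContextClassLoader
    Thread.currentThread().setContextClassLoader(loader)
    try body
    finally Thread.currentThread().setContextClassLoader(original)
  }
}
```

The `finally` branch is the point: a test that throws after swapping the loader would otherwise leak its child classloader into every test that runs after it on the same thread.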
++
@choochootrain Please open a pr for the master branch (this is our typical workflow), so we can fix it in master and branch-1.6. I'd prefer fixing master and branch-1.6 first and then backporting the fix to 1.5.

sounds good, i'll go ahead and squash the commits and submit a pr to master and branch 1.6
Make sure we are using the context classloader when deserializing failed TaskResults instead of the Spark classloader. The issue is that `enqueueFailedTask` was using the incorrect classloader which results in `ClassNotFoundException`. Adds a test in TaskResultGetterSuite that compiles a custom exception, throws it on the executor, and asserts that Spark handles the TaskResult deserialization instead of returning `UnknownReason`. See #9367 for previous comments See SPARK-11195 for a full repro Author: Hurshal Patel <hpatel516@gmail.com> Closes #9779 from choochootrain/spark-11195-master. (cherry picked from commit 3cca5ff) Signed-off-by: Yin Huai <yhuai@databricks.com>
Make sure we are using the context classloader when deserializing failed TaskResults instead of the Spark classloader. The issue is that `enqueueFailedTask` was using the incorrect classloader which results in `ClassNotFoundException`. Adds a test in TaskResultGetterSuite that compiles a custom exception, throws it on the executor, and asserts that Spark handles the TaskResult deserialization instead of returning `UnknownReason`. See apache#9367 for previous comments See SPARK-11195 for a full repro Author: Hurshal Patel <hpatel516@gmail.com> Closes apache#9779 from choochootrain/spark-11195-master.
Make sure we are using the context classloader when deserializing failed TaskResults instead of the Spark classloader. The issue is that `enqueueFailedTask` was using the incorrect classloader which results in `ClassNotFoundException`. Adds a test in TaskResultGetterSuite that compiles a custom exception, throws it on the executor, and asserts that Spark handles the TaskResult deserialization instead of returning `UnknownReason`. See #9367 for previous comments See SPARK-11195 for a full repro Author: Hurshal Patel <hpatel516@gmail.com> Closes #9779 from choochootrain/spark-11195-master. (cherry picked from commit 3cca5ff) Signed-off-by: Yin Huai <yhuai@databricks.com> Conflicts: core/src/main/scala/org/apache/spark/TestUtils.scala
@choochootrain #9779 has been merged. Can you close this one?

thanks!