
[SPARK-11195][CORE] Use correct classloader for TaskResultGetter #9367

Closed
wants to merge 4 commits into from

Conversation

choochootrain

Make sure we are using the context classloader when deserializing failed TaskResults instead of the Spark classloader.
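
For reference, a minimal sketch of the change being described (not the exact diff; per the discussion and merged commit below, the relevant code is in `TaskResultGetter.enqueueFailedTask`):

```scala
// Simplified sketch of the fix in TaskResultGetter.enqueueFailedTask.
// Before: deserialization used Utils.getSparkClassLoader, which cannot
// see user classes from the spark-submitted jar, so a user-defined
// exception in a failed TaskResult raised ClassNotFoundException.
// After: use the context classloader, which can see them.
val loader = Utils.getContextOrSparkClassLoader
if (serializedData != null && serializedData.limit() > 0) {
  reason = serializer.get().deserialize[TaskEndReason](serializedData, loader)
}
```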

Make sure we are using the context classloader when deserializing failed
TaskResults instead of the Spark classloader.
@JoshRosen
Contributor

Jenkins, this is ok to test.

@choochootrain
Author

I have a manual test that exhibits this behavior in https://issues.apache.org/jira/browse/SPARK-11195 and I am working on adding a test to the repo.

In order to test this I basically want to mirror the classloader hierarchy created by spark-submit - are there any conventions or existing tests which do something like this that I can look at?

@JoshRosen
Contributor

Jenkins, this is ok to test.

@JoshRosen
Contributor

@brkyvz might know more about testing Spark Submit.

@yhuai
Contributor

yhuai commented Oct 30, 2015

Will SparkSubmitSuite help?

@choochootrain
Author

SparkSubmitSuite is helpful, but I want to catch and assert the type of exception that is thrown when the job fails - calling into doMain in SparkSubmit.scala would be closer?

@yhuai
Contributor

yhuai commented Oct 30, 2015

How about this: in the main method, you can catch the exception, and if it is the expected type, let the main method finish successfully. Otherwise, throw an exception, which causes a non-zero exit code. In runSparkSubmit, we will see that the exit code is not 0 and fail the test.
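
A minimal sketch of that pattern (all names are hypothetical; only the exit-code convention comes from the suggestion above):

```scala
// Hypothetical application submitted via spark-submit. It exits 0 only
// if the job fails with the expected exception; any other outcome exits
// non-zero, which runSparkSubmit reports as a test failure.
object TestApp {
  // Hypothetical job body standing in for a real Spark job; it fails
  // with the user-defined exception on purpose.
  def runJob(): Unit = throw new RuntimeException("MyException")

  def main(args: Array[String]): Unit = {
    try {
      runJob()
      System.exit(1) // the job did not fail at all: fail the test
    } catch {
      case e: Throwable if e.getMessage == "MyException" =>
        () // expected failure: fall through and exit 0, so the test passes
      case _: Throwable =>
        System.exit(1) // unexpected exception: non-zero exit fails the test
    }
  }
}
```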

@SparkQA

SparkQA commented Oct 30, 2015

Test build #44655 has finished for PR 9367 at commit 90c47aa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Oct 30, 2015

This looks good as I'm not otherwise clear why these two blocks would use different classloaders.

@srowen
Member

srowen commented Nov 1, 2015

@choochootrain are you able to put together a small test like what @yhuai mentions? then I think this is good to go.

@choochootrain
Author

i'm writing the test right now, is there an easy way to get the relative path to the assembled spark jar so I can compile my job against it?

@yhuai
Contributor

yhuai commented Nov 2, 2015

@choochootrain
Author

@yhuai i can't write the test directly in SparkSubmitSuite because this is a classloader issue that only repros (as far as I can tell) when an external jar is loaded. I'm adding a test that uses TestUtils.createCompiledClass to compile and submit my external jar.

@yhuai
Contributor

yhuai commented Nov 3, 2015

@choochootrain I just noticed that this PR is for branch-1.5. Should we also fix it in master?

@choochootrain
Author

@yhuai yep this should also be in master. should I also submit a pr on master when this one looks good or will a maintainer be able to cherry-pick the commit?

@yhuai
Contributor

yhuai commented Nov 3, 2015

@choochootrain It would be great if you can submit a PR against master. Our merge script only cherry-picks commits from master to a branch.

@SparkQA

SparkQA commented Nov 4, 2015

Test build #44987 has finished for PR 9367 at commit c63ca09.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

This test compiles an external Spark job which throws a user defined
exception and asserts that Spark handles the TaskResult deserialization
properly.
@choochootrain
Author

whoops, fixed the style error.

|}
""".stripMargin)
// scalastyle:on line.size.limit
val sparkJar = "../assembly/target/scala-2.10/spark-assembly-1.5.1-hadoop2.2.0.jar"
Author

the only remaining issue is that this jar is hardcoded for the maven profiles that i am using. i was experimenting with putting the entire maven target/classes directory on the classpath but that seems equally janky. any suggestions here?

Contributor

I probably missed something. Why do we need to put Spark's assembly jar here? When we run tests, Spark's classes are already on the class path.

Contributor

btw, when you unit test SparkSubmitSuite, you have to do assembly/assembly before you can run any of these tests.

Contributor

This is what our jenkins does.

Author

in order to test this issue, i need to submit an external jar to spark-submit. i can compile my job using TestUtils, but i need to put (some version of) spark on the classpath so that javac can resolve RDD and so on.

Contributor

Why do you need to compile using Spark? Can't you just create your own exception, compile it, and pass that to Spark Submit? Then move all of the code here down to something like SimpleApplicationTest, where you create your exception through reflection and then throw it?

Contributor

Or maybe I didn't understand the issue that this patch is trying to solve very well

Contributor

Maybe you can compile a jar that just includes your exception class, and put your app below. In your app, you use reflection to create an instance of your exception.
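
A sketch of that reflection idea (the class name and jar layout are hypothetical):

```scala
// The exception class lives only in the external jar, so the test app
// cannot reference it at compile time; load, instantiate, and throw it
// via reflection instead.
val exceptionClass = Thread.currentThread().getContextClassLoader
  .loadClass("repro.MyException") // hypothetical FQCN inside the jar
val exception = exceptionClass
  .getDeclaredConstructor()
  .newInstance()
  .asInstanceOf[Exception]
throw exception
```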

Author

all this patch does is make TaskResult deserialization use Utils.getContextOrSparkClassLoader (the classloader which loaded the spark-submitted jar) instead of Utils.getSparkClassLoader (this is AppClassLoader, which only has spark classes in it). without this patch, a failed task is not able to deserialize an exception whose class is not visible to Utils.getSparkClassLoader.
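
For reference, the two helpers differ roughly like this (paraphrased from org.apache.spark.util.Utils; a sketch, not the verbatim source):

```scala
// The Spark classloader is whatever loaded Spark itself (AppClassLoader
// under spark-submit); the context variant prefers the thread's context
// classloader, which spark-submit points at the user jar, falling back
// to the Spark one when no context loader is set.
def getSparkClassLoader: ClassLoader = getClass.getClassLoader

def getContextOrSparkClassLoader: ClassLoader =
  Option(Thread.currentThread().getContextClassLoader)
    .getOrElse(getSparkClassLoader)
```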

in order to reproduce this issue, i set up a situation where Utils.getContextOrSparkClassLoader contains MyException but Utils.getSparkClassLoader does not (see https://issues.apache.org/jira/browse/SPARK-11195). this is easy to test manually with spark-submit and a user-defined exception, but turning this into an automated test is proving to be much trickier. here are the 3 options:

  • ❌ if i place all of the code into SparkSubmitSuite, the bug won't be hit because MyException will be in the root classloader and my patch makes no difference.

  • ❔ if i place all of the code into an external jar and run spark-submit, i can set up the same situation as my repro which found this bug. the issue i am running into is that i need a spark classpath in order to compile my jar. i can use the assembled jar, but this changes depending on the maven profiles that are enabled and so on.

  • ❔ i can try @brkyvz & @yhuai's hybrid approach of putting only the exception into a jar and the rest of the code into SparkSubmitSuite. i will have to do the following in order to repro this issue:

    • load the jar with MyException in a new classloader whose parent is the root classloader
    • somehow allow this classloader to be used by the driver and the executor without changing Utils.getSparkClassLoader.

    at this point am i not reimplementing spark-submit? :)

the final approach is certainly worth trying, i'll take a look at it later today.

Contributor

We should go with the simplest option that reproduces the issue. In other SparkSubmitSuite tests we used (2) but only out of necessity, where we just prepackage a jar and put it in the test resources dir. This makes it a little hard to maintain, e.g. we need a separate jar for scala-2.11.

In this case, maybe (3) is the simplest and most maintainable. It's unlikely that we'll ever have to modify MyException, but the reproduction code itself should be kept flexible. Could you give it a try?

@SparkQA

SparkQA commented Nov 4, 2015

Test build #44991 has finished for PR 9367 at commit 53f7c4c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Nov 10, 2015

@yhuai @choochootrain how is this one going? it does seem desirable to test this as much as possible. If we're having to introduce very complex mechanisms to do so and they aren't working, is there anything simpler even if less effective we can do to test it?

@yhuai
Contributor

yhuai commented Nov 10, 2015

I feel the easiest way is to have something like https://github.com/apache/spark/tree/master/sql/hive/src/test/resources/regression-test-SPARK-8489. So, we will not need to change anything in TestUtils. We just have a jar containing your class and the main object.

@choochootrain
Author

@srowen @andrewor14 @yhuai

it seems like the SPARK-8489 approach would be less invasive, but also less maintainable. any preferences? i'd like to stick with one and get this out asap.

@brkyvz
Contributor

brkyvz commented Nov 11, 2015

My vote is with (3). I feel it requires the least amount of new code that you have to write and is more maintainable.

@yhuai
Contributor

yhuai commented Nov 11, 2015

I'd like to go with 3.

@choochootrain
Author

should be less invasive now :)
note that i still have to make some minor changes to TestUtils in order for the jar to be in the correct format.

@SparkQA

SparkQA commented Nov 14, 2015

Test build #45913 has finished for PR 9367 at commit 71f8df9.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// load the exception from the jar
val loader = new MutableURLClassLoader(
  new Array[URL](0), Thread.currentThread.getContextClassLoader)
loader.addURL(jarFile.toURI.toURL)
Thread.currentThread().setContextClassLoader(loader)
Contributor

Can we set the original loader back?

Author

++

@SparkQA

SparkQA commented Nov 16, 2015

Test build #46008 timed out for PR 9367 at commit fadf2ca after a configured wait of 175m.

@SparkQA

SparkQA commented Nov 17, 2015

Test build #2073 has finished for PR 9367 at commit fadf2ca.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 17, 2015

Test build #2079 has finished for PR 9367 at commit fadf2ca.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Nov 17, 2015

Test build #2080 has finished for PR 9367 at commit fadf2ca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Contributor

yhuai commented Nov 17, 2015

@choochootrain We should also open a PR for master, right?

assert(unknownFailure.findFirstMatchIn(exceptionMessage).isEmpty)

// reset the classloader to the default value
Thread.currentThread.setContextClassLoader(originalClassLoader)
Contributor

Can we use the following pattern?

// Get the original classloader
try {
  // do our test
} finally {
  // reset our classloader
}

So, we will not mess up the classloader even if the test somehow failed.
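
Applied to the classloader swap from the earlier snippet, that pattern looks roughly like this (a sketch; jarFile comes from the test setup):

```scala
// Save the original loader, install one that can also see the test jar,
// and restore the original even if the test body throws.
val originalClassLoader = Thread.currentThread().getContextClassLoader
try {
  val loader = new MutableURLClassLoader(
    new Array[URL](0), originalClassLoader)
  loader.addURL(jarFile.toURI.toURL)
  Thread.currentThread().setContextClassLoader(loader)
  // ... run the spark-submit repro and assertions ...
} finally {
  Thread.currentThread().setContextClassLoader(originalClassLoader)
}
```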

Author

++

@yhuai
Contributor

yhuai commented Nov 17, 2015

@choochootrain Please open a PR against the master branch (this is our typical workflow), so we can fix it in master and branch-1.6. I'd prefer fixing master and branch-1.6 first and then backporting the fix to 1.5.

@choochootrain
Author

sounds good, i'll go ahead and squash the commits and submit a pr to master and branch 1.6

asfgit pushed a commit that referenced this pull request Nov 18, 2015
Make sure we are using the context classloader when deserializing failed TaskResults instead of the Spark classloader.

The issue is that `enqueueFailedTask` was using the incorrect classloader which results in `ClassNotFoundException`.

Adds a test in TaskResultGetterSuite that compiles a custom exception, throws it on the executor, and asserts that Spark handles the TaskResult deserialization instead of returning `UnknownReason`.

See #9367 for previous comments
See SPARK-11195 for a full repro

Author: Hurshal Patel <hpatel516@gmail.com>

Closes #9779 from choochootrain/spark-11195-master.

(cherry picked from commit 3cca5ff)
Signed-off-by: Yin Huai <yhuai@databricks.com>
ghost pushed a commit to dbtsai/spark that referenced this pull request Nov 18, 2015
Make sure we are using the context classloader when deserializing failed TaskResults instead of the Spark classloader.

The issue is that `enqueueFailedTask` was using the incorrect classloader which results in `ClassNotFoundException`.

Adds a test in TaskResultGetterSuite that compiles a custom exception, throws it on the executor, and asserts that Spark handles the TaskResult deserialization instead of returning `UnknownReason`.

See apache#9367 for previous comments
See SPARK-11195 for a full repro

Author: Hurshal Patel <hpatel516@gmail.com>

Closes apache#9779 from choochootrain/spark-11195-master.
asfgit pushed a commit that referenced this pull request Nov 18, 2015
Make sure we are using the context classloader when deserializing failed TaskResults instead of the Spark classloader.

The issue is that `enqueueFailedTask` was using the incorrect classloader which results in `ClassNotFoundException`.

Adds a test in TaskResultGetterSuite that compiles a custom exception, throws it on the executor, and asserts that Spark handles the TaskResult deserialization instead of returning `UnknownReason`.

See #9367 for previous comments
See SPARK-11195 for a full repro

Author: Hurshal Patel <hpatel516@gmail.com>

Closes #9779 from choochootrain/spark-11195-master.

(cherry picked from commit 3cca5ff)
Signed-off-by: Yin Huai <yhuai@databricks.com>

Conflicts:
	core/src/main/scala/org/apache/spark/TestUtils.scala
@yhuai
Contributor

yhuai commented Nov 18, 2015

@choochootrain #9779 has been merged. Can you close this one?

@choochootrain
Author

thanks!

@choochootrain choochootrain deleted the spark-11195 branch November 18, 2015 18:45