Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-18646][REPL] Set parent classloader as null for ExecutorClassLoader #17074

Closed
wants to merge 7 commits into from
Closed

[SPARK-18646][REPL] Set parent classloader as null for ExecutorClassLoader #17074

wants to merge 7 commits into from

Conversation

taroplus
Copy link
Contributor

@taroplus taroplus commented Feb 26, 2017

What changes were proposed in this pull request?

By default, a class loader will have java's system class loader as its parent, however this ExecutorClassLoader should always resolve classes from given 'paren't class loader.

Problem happens when an application has a class loader structure like below

System-Classloader [ClassA, Runtime]
|
Application-Classloader [ClassA, Spark, Jackson etc etc] : parent
|
Executor Class Loader

The application serializes ClassA using Application class loader however Spark deserializes the class using System Class loader, this leads to various problems.

The change here to pass null to the parent class so that a link between ExecutorClassLoader and SystemClassLoader no longer exists.

For most of situations, the parent class loader is the system class loader so this wouldn't have any impact.

How was this patch tested?

UTs

This change is to address a classloader problem under sbt environment
more details can be found in Jira [https://issues.apache.org/jira/browse/SPARK-19675]
@taroplus
Copy link
Contributor Author

@holdenk @zsxwing

@holdenk
Copy link
Contributor

holdenk commented Feb 26, 2017

Thanks for working on this. It's been awhile since I looked at the classloader bits so I'd need to refresh my memory a bit, but regardless this PR probably needs better tests - specifically including something which wasn't working before and does work with the change.

@taroplus
Copy link
Contributor Author

@holdenk
I believe this was introduced in this change
fa0524f#diff-bb538fda94224dd0af01d0fd7e1b4ea0R39

the change is to stop passing 'parent' to the parent class(ClassLoader), however this introduced a link between ExecutorClassLoader to SystemClassLoader.

Most of the time, SystemClassLoader is parent loader itself. Then it causes issues with userClassPathFirst (SPARK-18646) and if it is not parent loader, it causes issues like InvalidClassException(SPARK-19675)

I added a UT for this

@vanzin
Copy link
Contributor

vanzin commented Mar 3, 2017

ok to test

@SparkQA
Copy link

SparkQA commented Mar 3, 2017

Test build #73796 has finished for PR 17074 at commit 13b0167.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 3, 2017

Test build #73799 has finished for PR 17074 at commit d66ad0c.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 3, 2017

Test build #73801 has finished for PR 17074 at commit 92c810f.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 3, 2017

Test build #73805 has started for PR 17074 at commit f9204c2.

@taroplus
Copy link
Contributor Author

taroplus commented Mar 4, 2017

@sknapp
Why my build was canceled ?

@jiangxb1987
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Jun 19, 2017

Test build #78258 has finished for PR 17074 at commit f9204c2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor

@jiangxb1987 jiangxb1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks good overall, only have some minor comments.

@@ -44,7 +44,7 @@ class ExecutorClassLoader(
env: SparkEnv,
classUri: String,
parent: ClassLoader,
userClassPathFirst: Boolean) extends ClassLoader with Logging {
userClassPathFirst: Boolean) extends ClassLoader(null) with Logging {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add comment over this to explain why we are setting a null value to parent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added to the description of this class

assert(scalaDir.mkdir(), s"Failed to create 'scala' directory in $tempDir1")

// move the generated class into scala dir
val filename = "Option.class"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use createCompiledClass() here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's because createCompiledClass doesn't handle package name, it needs to put the generated class under 'scala' package and also it needs to move the generated class to 'scala' folder.

Since createCompiledClass is widely used, I didn't want to touch it just for this very unique situation.

@jiangxb1987
Copy link
Contributor

Also cc @cloud-fan

@jiangxb1987
Copy link
Contributor

jiangxb1987 commented Jun 21, 2017

In the jira, you mentioned ExecutorClassLoader.getParent() will return the unexpected class loader, could you explain where we call this? At least I didn't see them in Spark codebase(neither current master branch nor branch 1.6)

@SparkQA
Copy link

SparkQA commented Jun 21, 2017

Test build #78368 has finished for PR 17074 at commit f9f9770.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@taroplus
Copy link
Contributor Author

taroplus commented Jun 21, 2017

SPARK-18646 is not the one I created, SPARK-19675 is the one I filed. But it kept getting closed again and again, so I linked my PR to SPARK-18646 which is still open.

But I think I know what it means. The author of that jira suggested that the fix would be to pass parent class loader to the constructor. If we call ExecutorClassLoader.getParent via debugger or something, it returns SystemClassLoader which is wrong and the person says that it should be the 'parent' class loader.

Original change (fa0524f), which introduced this issue, was not to pass parent to the constructor so that this class loader takes precedence over parent class loader, however it looks to me that the change forgot to set parent as null then if system class loader has something different class from the current or parent class loader, problems like this can happen.

I'm not sure if I explained well, .. let me know if you need more info

Copy link
Contributor

@jiangxb1987 jiangxb1987 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach looks good to merge, and it would be even better if a concise example of how this is causing a failure in production environment could be provided.

@@ -38,13 +38,15 @@ import org.apache.spark.util.{ParentClassLoader, Utils}
* Allows the user to specify if user class path should be first.
* This class loader delegates getting/finding resources to parent loader,
* which makes sense until REPL never provide resource dynamically.
* This class does not set parent classloader since this class loader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about rephrase to:

[[ClassLoader]] will preferentially load class from parent. Only when parent is null or the load failed, that it will call the overridden `findClass` function. To avoid the potential issue caused by loading class using inappropriate class loader, we should set the parent of ClassLoader to null, so that we can fully control which class loader is used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also add a empty line before this statement.

@jiangxb1987
Copy link
Contributor

ping @taroplus

asfgit pushed a commit that referenced this pull request Jul 13, 2017
…oader

## What changes were proposed in this pull request?

`ClassLoader` will preferentially load class from `parent`. Only when `parent` is null or the load failed, that it will call the overridden `findClass` function. To avoid the potential issue caused by loading class using inappropriate class loader, we should set the `parent` of `ClassLoader` to null, so that we can fully control which class loader is used.

This is take over of #17074,  the primary author of this PR is taroplus .

Should close #17074 after this PR get merged.

## How was this patch tested?

Add test case in `ExecutorClassLoaderSuite`.

Author: Kohki Nishio <taroplus@me.com>
Author: Xingbo Jiang <xingbo.jiang@databricks.com>

Closes #18614 from jiangxb1987/executor_classloader.

(cherry picked from commit e08d06b)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@asfgit asfgit closed this in e08d06b Jul 13, 2017
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
…oader

## What changes were proposed in this pull request?

`ClassLoader` will preferentially load class from `parent`. Only when `parent` is null or the load failed, that it will call the overridden `findClass` function. To avoid the potential issue caused by loading class using inappropriate class loader, we should set the `parent` of `ClassLoader` to null, so that we can fully control which class loader is used.

This is take over of apache#17074,  the primary author of this PR is taroplus .

Should close apache#17074 after this PR get merged.

## How was this patch tested?

Add test case in `ExecutorClassLoaderSuite`.

Author: Kohki Nishio <taroplus@me.com>
Author: Xingbo Jiang <xingbo.jiang@databricks.com>

Closes apache#18614 from jiangxb1987/executor_classloader.

(cherry picked from commit e08d06b)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
rdblue pushed a commit to rdblue/spark that referenced this pull request Apr 3, 2019
…oader

## What changes were proposed in this pull request?

`ClassLoader` will preferentially load class from `parent`. Only when `parent` is null or the load failed, that it will call the overridden `findClass` function. To avoid the potential issue caused by loading class using inappropriate class loader, we should set the `parent` of `ClassLoader` to null, so that we can fully control which class loader is used.

This is take over of apache#17074,  the primary author of this PR is taroplus .

Should close apache#17074 after this PR get merged.

## How was this patch tested?

Add test case in `ExecutorClassLoaderSuite`.

Author: Kohki Nishio <taroplus@me.com>
Author: Xingbo Jiang <xingbo.jiang@databricks.com>

Closes apache#18614 from jiangxb1987/executor_classloader.
@taroplus taroplus deleted the executor_classloader branch September 23, 2021 04:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants