-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPARK-5358: Rework the classloader impelementation. #4166
Conversation
The fundamental issue is that you can't change the delegation scheme without overriding loadClass (rather than findClass). And, if you override loadClass, you kind of have to do it in Java, because you need a static initializer call to register yourself as parallel-capable.
The fundamental issue is that you can't change the delegation scheme without overriding loadClass (rather than findClass). And, if you override loadClass, you kind of have to do it in Java, because you need a static initializer call to register yourself as parallel-capable. This is an alternative to PR apache#4165. That PR requires Java 1.7. This one sticks with 1.6, and implements the classloader as a trait, because it can.
Can one of the admins verify this patch? |
BTW, I spent a few minutes pondering the deadlock scenario, and the delegation changes you see here. I'm pretty sure we're safe with 1.6-style coarse locking, because no class from a parent classloader will ever be able to refer to any of the child classloader's classes (statically - instances don't really matter). We don't have funky cyclic references between CLs, so I don't think we can have funky deadlocks. |
@MattWhelan I believe you that loadClass is preferable to findClass, but can you articulate why? And maybe as a comment on this SO post so that future people that copy/paste from the 1st answer can know as well? :-) Tangentially, I'd eventually like to port/copy/paste the Hadoop ApplicationClassLoader (which I believe they copy/pasted from Jetty) so that we can ignore any org.apache.spark.* classes that might accidentally be in the user's jar: I'm also half-tempted to copy/paste their ApplicationClassLoader into a separate utility project, so that going forward Jetty/Hadoop/Spark/etc. could all reuse the same tested functionality instead of copy/pasting the code around. But then I'd have to publish to maven central, it'd be another dependency to worry about, etc., etc... |
* (cache, self, parent). This lets this CL "hide" or "override" | ||
* class defs that also exist in the parent loader. | ||
*/ | ||
private[spark] trait GreedyClassLoader extends ClassLoader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than making this a trait, can you just modify ChildExecutorURLClassLoader to have this logic? We don't need it anywhere else at this point, and this is the main intended goal of ChildExecutorURLClassLoader (to do greedy loading like this).
If you bring this one up to date we may be able to slip it into 1.3. Just let me know, thanks! |
Hey, sorry I was away for a bit. After an extended discussion with Vanzin on his #3233, I'm convinced that PR covers the problem that this one solves. Since that's been merged, let's go with that. |
The fundamental issue is that you can't change the delegation scheme
without overriding loadClass (rather than findClass). And, if you
override loadClass, you kind of have to do it in Java, because you need
a static initializer call to register yourself as parallel-capable.
This is an alternative to PR #4165. That PR requires Java 1.7. This
one sticks with 1.6, and implements the classloader as a trait, because
it can.