New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-4030] Make destroy public for broadcast variables #2922
Conversation
Also log where destroy was called from if a broadcast variable is used after destruction.
Test build #22125 has started for PR 2922 at commit
|
Test build #22125 has finished for PR 2922 at commit
|
Test FAILed. |
*/ | ||
private[spark] def destroy(blocking: Boolean) { | ||
def destroy(blocking: Boolean) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we only expose a version where blocking is set to true for users? It seems like asynchronous destroy is a bit more complex. @shivaram does your app need the async version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No - I actually prefer the synchronous version in my applications. I will add another destroy() which is always synchronous and make it public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay if that's the case I'd propose making destroy()
public and keeping destroy(blocking: Boolean)
as private. That way we minimize the surface area of public APIs.
small question - this looks good overall |
oh one thing - can we add a java version of this? should be pretty simple, right? |
@@ -60,6 +62,8 @@ abstract class Broadcast[T: ClassTag](val id: Long) extends Serializable { | |||
*/ | |||
@volatile private var _isValid = true | |||
|
|||
private var _destroySite = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How useful is it to store this? it only helps in case of an invalid Broadcast
instance. (PS you can use string interpolation instead of format
in these changes if you care to)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pwendell requested this in the JIRA -- The main reason is that we'd like to make it easy to debug if users call destroy
by mistake.
Oh and I used .format
to be consistent with the rest of the file. I'm not very sure what our policy is on this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@srowen yeah I asked for tracking the callsite because we now have this case where someone can try to use a destroyed broadcast (destroyed by e.g. another thread) and it will be very hard for users to debug this. Tracking the callsite has almost no overhead here and it seemed like it might be useful for debugging.
Test build #22276 has started for PR 2922 at commit
|
Test build #22276 has finished for PR 2922 at commit
|
Test FAILed. |
Test build #22277 has started for PR 2922 at commit
|
@pwendell - I made destroy blocking by default and only made that version public (its not clear we need the non-blocking version to also be public -- we can add it later if required) Also all the Broadcast stuff in the Java API seems to come directly from the java classes ? Let me know if I missed something. |
Oh right yeah. Great, LGTM. |
Test build #22277 has finished for PR 2922 at commit
|
Test PASSed. |
Thanks. Merging this |
This change makes the destroy function public for broadcast variables. Motivation for the change is described in https://issues.apache.org/jira/browse/SPARK-4030.
This patch also logs where destroy was called from if a broadcast variable is used after destruction.