Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-4030] Make destroy public for broadcast variables #2922

Closed
wants to merge 3 commits into from

Conversation

shivaram
Copy link
Contributor

This change makes the destroy function public for broadcast variables. Motivation for the change is described in https://issues.apache.org/jira/browse/SPARK-4030.
This patch also logs where destroy was called from if a broadcast variable is used after destruction.

Also log where destroy was called from if a broadcast variable is used after destruction.
@shivaram
Copy link
Contributor Author

cc @pwendell @rxin for review

@SparkQA
Copy link

SparkQA commented Oct 24, 2014

Test build #22125 has started for PR 2922 at commit e80c1ab.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 24, 2014

Test build #22125 has finished for PR 2922 at commit e80c1ab.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class Broadcast[T: ClassTag](val id: Long) extends Serializable with Logging

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22125/
Test FAILed.

*/
private[spark] def destroy(blocking: Boolean) {
def destroy(blocking: Boolean) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we only expose a version where blocking is set to true for users? It seems like asynchronous destroy is a bit more complex. @shivaram does your app need the async version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No - I actually prefer the synchronous version in my applications. I will add another destroy() which is always synchronous and make it public.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay if that's the case I'd propose making destroy() public and keeping destroy(blocking: Boolean) as private. That way we minimize the surface area of public APIs.

@pwendell
Copy link
Contributor

small question - this looks good overall

@pwendell
Copy link
Contributor

oh one thing - can we add a java version of this? should be pretty simple, right?

@@ -60,6 +62,8 @@ abstract class Broadcast[T: ClassTag](val id: Long) extends Serializable {
*/
@volatile private var _isValid = true

private var _destroySite = ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How useful is it to store this? it only helps in case of an invalid Broadcast instance. (PS you can use string interpolation instead of format in these changes if you care to)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pwendell requested this in the JIRA -- The main reason is that we'd like to make it easy to debug if users call destroy by mistake.

Oh and I used .format to be consistent with the rest of the file. I'm not very sure what our policy is on this ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen yeah I asked for tracking the callsite because we now have this case where someone can try to use a destroyed broadcast (destroyed by e.g. another thread) and it will be very hard for users to debug this. Tracking the callsite has almost no overhead here and it seemed like it might be useful for debugging.

@SparkQA
Copy link

SparkQA commented Oct 27, 2014

Test build #22276 has started for PR 2922 at commit bed9c9d.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 27, 2014

Test build #22276 has finished for PR 2922 at commit bed9c9d.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22276/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Oct 27, 2014

Test build #22277 has started for PR 2922 at commit a11abab.

  • This patch merges cleanly.

@shivaram
Copy link
Contributor Author

@pwendell - I made destroy blocking by default and only made that version public (its not clear we need the non-blocking version to also be public -- we can add it later if required)

Also all the Broadcast stuff in the Java API seems to come directly from the java classes ? Let me know if I missed something.

@pwendell
Copy link
Contributor

Oh right yeah. Great, LGTM.

@SparkQA
Copy link

SparkQA commented Oct 27, 2014

Test build #22277 has finished for PR 2922 at commit a11abab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22277/
Test PASSed.

@shivaram
Copy link
Contributor Author

Thanks. Merging this

@asfgit asfgit closed this in 9aa340a Oct 27, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants