Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-7318][Streaming] DStream cleans objects that are not closures #5860

Closed

Conversation

andrewor14
Copy link
Contributor

I added a check in ClosureCleaner#clean to fail fast if this is detected in the future. @tdas

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 2, 2015

Test build #31655 has started for PR 5860 at commit 67eeff4.

@SparkQA
Copy link

SparkQA commented May 2, 2015

Test build #31655 has finished for PR 5860 at commit 67eeff4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class ShuffleHandle(val shuffleId: Int) extends Serializable

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31655/
Test FAILed.

@tdas
Copy link
Contributor

tdas commented May 2, 2015

Fantastic catch! Though this does not merge.

@andrewor14
Copy link
Contributor Author

Yeah I know. I conflicted with myself.

Andrew Or added 2 commits May 2, 2015 17:13
…re-cleaner

Conflicts:
	core/src/main/scala/org/apache/spark/util/ClosureCleaner.scala
@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 3, 2015

Test build #31673 has started for PR 5860 at commit 5ee4e25.

@SparkQA
Copy link

SparkQA commented May 3, 2015

Test build #31673 has finished for PR 5860 at commit 5ee4e25.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class Dialect
    • class DialectException(msg: String, cause: Throwable) extends Exception(msg, cause)

@AmplabJenkins
Copy link

Merged build finished. Test FAILed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31673/
Test FAILed.

This breaks a valid use case where the user code passes in a case
class into `map`. See ml.NormalizerSuite.
@andrewor14
Copy link
Contributor Author

@pwendell I don't think we can throw an exception if it's not a closure after all. The user code may look like sc.parallelize(1 to 10).map(MyCaseClass), where the "closure" is really a case class MyCaseClass(i: Int). This is failing one of the MLlib tests because this exact usage is observed there. We can log a warning at the most.

@andrewor14 andrewor14 changed the title [SPARK-7318] DStream incorrectly cleans RDD instead of closures [SPARK-7318][Streaming] Streaming cleans objects that are not closures May 3, 2015
@andrewor14 andrewor14 changed the title [SPARK-7318][Streaming] Streaming cleans objects that are not closures [SPARK-7318][Streaming] DStream cleans objects that are not closures May 3, 2015
@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@SparkQA
Copy link

SparkQA commented May 3, 2015

Test build #31681 has started for PR 5860 at commit 8e971d7.

@SparkQA
Copy link

SparkQA commented May 3, 2015

Test build #31681 has finished for PR 5860 at commit 8e971d7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Merged build finished. Test PASSed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31681/
Test PASSed.

@@ -179,6 +179,11 @@ private[spark] object ClosureCleaner extends Logging {
cleanTransitively: Boolean,
accessedFields: Map[Class[_], Set[String]]): Unit = {

if (!isClosure(func.getClass)) {
logWarning("Expected a closure; got " + func.getClass.getName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pwendell Is this okay to log a warning?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to make this an assertion? Isn't it simply invalid if we call this on something that is not a closure?

@tdas
Copy link
Contributor

tdas commented May 5, 2015

LGTM, barring @pwendell LGTMs the ClosureCleaner change.

@pwendell
Copy link
Contributor

pwendell commented May 5, 2015

I LGTM too - I had a comment but realize it was already addressed.

@andrewor14
Copy link
Contributor Author

Thanks for all the LGTMs. I'm merging this into master 1.4

asfgit pushed a commit that referenced this pull request May 5, 2015
I added a check in `ClosureCleaner#clean` to fail fast if this is detected in the future. tdas

Author: Andrew Or <andrew@databricks.com>

Closes #5860 from andrewor14/streaming-closure-cleaner and squashes the following commits:

8e971d7 [Andrew Or] Do not throw exception if object to clean is not closure
5ee4e25 [Andrew Or] Fix tests
eed3390 [Andrew Or] Merge branch 'master' of github.com:apache/spark into streaming-closure-cleaner
67eeff4 [Andrew Or] Add tests
a4fa768 [Andrew Or] Clean the closure, not the RDD

(cherry picked from commit 57e9f29)
Signed-off-by: Andrew Or <andrew@databricks.com>
@asfgit asfgit closed this in 57e9f29 May 5, 2015
@andrewor14 andrewor14 deleted the streaming-closure-cleaner branch May 7, 2015 20:27
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
I added a check in `ClosureCleaner#clean` to fail fast if this is detected in the future. tdas

Author: Andrew Or <andrew@databricks.com>

Closes apache#5860 from andrewor14/streaming-closure-cleaner and squashes the following commits:

8e971d7 [Andrew Or] Do not throw exception if object to clean is not closure
5ee4e25 [Andrew Or] Fix tests
eed3390 [Andrew Or] Merge branch 'master' of github.com:apache/spark into streaming-closure-cleaner
67eeff4 [Andrew Or] Add tests
a4fa768 [Andrew Or] Clean the closure, not the RDD
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
I added a check in `ClosureCleaner#clean` to fail fast if this is detected in the future. tdas

Author: Andrew Or <andrew@databricks.com>

Closes apache#5860 from andrewor14/streaming-closure-cleaner and squashes the following commits:

8e971d7 [Andrew Or] Do not throw exception if object to clean is not closure
5ee4e25 [Andrew Or] Fix tests
eed3390 [Andrew Or] Merge branch 'master' of github.com:apache/spark into streaming-closure-cleaner
67eeff4 [Andrew Or] Add tests
a4fa768 [Andrew Or] Clean the closure, not the RDD
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
I added a check in `ClosureCleaner#clean` to fail fast if this is detected in the future. tdas

Author: Andrew Or <andrew@databricks.com>

Closes apache#5860 from andrewor14/streaming-closure-cleaner and squashes the following commits:

8e971d7 [Andrew Or] Do not throw exception if object to clean is not closure
5ee4e25 [Andrew Or] Fix tests
eed3390 [Andrew Or] Merge branch 'master' of github.com:apache/spark into streaming-closure-cleaner
67eeff4 [Andrew Or] Add tests
a4fa768 [Andrew Or] Clean the closure, not the RDD
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants