[SPARK-22328][Core] ClosureCleaner should not miss referenced superclass fields #19556

Closed
wants to merge 7 commits into from

Member

@viirya viirya commented Oct 23, 2017

What changes were proposed in this pull request?

When a closure uses fields defined in a superclass, ClosureCleaner fails to detect them and does not set them properly. Those fields end up with null values.

How was this patch tested?

Added test.
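
For readers unfamiliar with the underlying reflection behavior, here is a minimal sketch of why superclass fields are easy to miss. It is not the ClosureCleaner code: `Base`, `Outer`, `declaredFieldNames`, and `allFieldNames` are hypothetical names used only for illustration. The sketch relies on the fact that `Class.getDeclaredFields` returns only the fields declared on that class itself, so inherited fields are found only by walking the superclass chain.

```scala
object SuperclassFieldsSketch {
  // Hypothetical classes, used only for illustration.
  abstract class Base extends Serializable {
    val inherited: Int = 1
  }

  class Outer extends Base {
    val own: Int = 2
  }

  /** Names of the fields declared directly on `cls`; inherited fields are excluded. */
  def declaredFieldNames(cls: Class[_]): Set[String] =
    cls.getDeclaredFields.map(_.getName).toSet

  /** Names of the fields declared on `cls` or on any of its superclasses. */
  def allFieldNames(cls: Class[_]): Set[String] =
    Iterator.iterate[Class[_]](cls)(_.getSuperclass)
      .takeWhile(_ != null) // getSuperclass returns null after java.lang.Object
      .flatMap(c => declaredFieldNames(c))
      .toSet
}
```

With these definitions, `declaredFieldNames(classOf[Outer])` would not include `inherited`, while `allFieldNames(classOf[Outer])` would.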

SparkQA commented Oct 23, 2017

Test build #82971 has finished for PR 19556 at commit 29c5d73.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Oct 23, 2017

Test build #82973 has finished for PR 19556 at commit 6606910.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Oct 23, 2017

Test build #82975 has finished for PR 19556 at commit 5ac7540.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • test("SPARK-22328: ClosureCleaner misses referenced superclass fields: case 1")
      • test("SPARK-22328: ClosureCleaner misses referenced superclass fields: case 2")
      • abstract class TestAbstractClass2 extends Serializable

SparkQA commented Oct 23, 2017

Test build #82987 has finished for PR 19556 at commit da747ca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

setAccessedFields(outerClass, clone, obj, accessedFields)

var superClass = outerClass.getSuperclass()
while (superClass != null) {
Member

Can this be something more like ...

var currentClass = outerClass
do {
  setAccessedFields(currentClass, clone, obj, accessedFields)
  currentClass = currentClass.getSuperclass()
} while (currentClass != null)

Just avoids repeating the key method call here. Same above and below.

Member Author

Thanks. Looks good.

Member Author

viirya commented Oct 24, 2017

cc @cloud-fan for review too.

SparkQA commented Oct 24, 2017

Test build #82999 has finished for PR 19556 at commit 5d7efd1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@srowen srowen left a comment

I don't feel very expert on the ClosureCleaner, but have been looking at it a lot recently, and this looks reasonable to me at a glance.

/** Initializes the accessed fields for outer classes and their super classes. */
private def initAccessedFields(
    accessedFields: Map[Class[_], Set[String]],
    outerClasses: Seq[Class[_]]): Unit = {
Contributor

What if multiple outer classes have the same parent class?

Member Author

I think that, from the closure's point of view, even if multiple outer classes have the same parent class, accesses to the fields in the parent class shouldn't conflict.

Member Author

I added a related test. Please see if it addresses your concern.
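
To make the shared-parent case concrete, here is a rough sketch of a class-keyed initialization in the spirit of the `initAccessedFields` signature shown in the diff above. It is an assumption-laden illustration rather than the merged code: the classes `P`, `A`, and `B` are hypothetical, and the use of `getOrElseUpdate` is a choice made for this sketch. Because the map is keyed by class, a parent shared by several outer classes ends up with a single entry.

```scala
import scala.collection.mutable

object AccessedFieldsSketch {
  // Hypothetical hierarchy: A and B share the parent P.
  class P extends Serializable
  class A extends P
  class B extends P

  /** Registers an empty field set for every outer class and each of its superclasses. */
  def initAccessedFields(
      accessedFields: mutable.Map[Class[_], mutable.Set[String]],
      outerClasses: Seq[Class[_]]): Unit = {
    for (cls <- outerClasses) {
      var currentClass: Class[_] = cls
      while (currentClass != null) {
        // getOrElseUpdate leaves an existing entry untouched, so a parent
        // shared by several outer classes is registered only once.
        accessedFields.getOrElseUpdate(currentClass, mutable.Set.empty[String])
        currentClass = currentClass.getSuperclass
      }
    }
  }
}
```

Calling it with `Seq[Class[_]](classOf[A], classOf[B])` on an empty map would leave four keys: `A`, `B`, `P`, and `java.lang.Object`.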

Contributor

@jiangxb1987 jiangxb1987 left a comment

LGTM

SparkQA commented Oct 24, 2017

Test build #83016 has finished for PR 19556 at commit de5cbde.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

    outerClasses: Seq[Class[_]]): Unit = {
  for (cls <- outerClasses) {
    var currentClass = cls
    do {
Contributor

I feel it's better to use a while loop here. Logically, the loop requires currentClass != null even on the first iteration. To fully preserve the previous behavior, we can add an assert(cls != null) before the loop.

Member Author

Ok. Updated.
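
As an illustration of the loop shape requested above, here is a small sketch of walking a class chain with a `while` loop guarded by an up-front assertion. The helper name `classChain` is hypothetical and only the loop structure is meant to mirror the review suggestion. A `do`/`while` would run its body once before checking for null, whereas this form checks the condition before every iteration and fails early with a clear message if the starting class is null.

```scala
object ClassChainSketch {
  /** Returns `cls` followed by each of its superclasses, ending at java.lang.Object. */
  def classChain(cls: Class[_]): List[Class[_]] = {
    // Fail fast with a clear message instead of a later NullPointerException.
    assert(cls != null, "The outer class can't be null.")
    var chain = List.empty[Class[_]]
    var currentClass: Class[_] = cls
    while (currentClass != null) {
      chain = currentClass :: chain
      currentClass = currentClass.getSuperclass
    }
    chain.reverse
  }
}
```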


var currentClass = outerClass
do {
setAccessedFields(currentClass, clone, obj, accessedFields)
Contributor

Assume classes A and B share the same parent class P, and P has two fields, a and b. The closure accesses A.a and B.b, so when we clone the A object we should only set field a, and when we clone the B object we should only set field b. However, it seems that here we set both a and b on both the A and B objects, which is sub-optimal.

Contributor

This also seems to be an issue for the outerClasses; maybe I missed something...

Member Author

@viirya viirya Oct 25, 2017

That seems to be true. For a closure that only accesses A.a, we clone the whole A object, which contains both fields a and b. This is already how the existing ClosureCleaner behaves.

Member Author

As this is not a regression, IIUC, will it block this change?

Contributor

Yeah, let's leave it.
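
The trade-off discussed in this thread can be sketched as follows. This is a simplified illustration, not the ClosureCleaner implementation: it assumes accessed field names are recorded per declaring class (here the shared parent `P`), and the names `copyFields` and `cloneWithAccessed` are hypothetical. Under that assumption, cloning an `A` copies every field recorded for `P`, including `b`, even if the closure only reads `a` through `A`.

```scala
object SharedParentSketch {
  // Hypothetical hierarchy from the review comment: A and B share parent P.
  class P extends Serializable {
    var a: String = null
    var b: String = null
  }
  class A extends P
  class B extends P

  /** Copies the named fields declared on `declaring` from `src` to `dst` via reflection. */
  def copyFields(declaring: Class[_], names: Set[String], src: AnyRef, dst: AnyRef): Unit =
    names.foreach { name =>
      val field = declaring.getDeclaredField(name)
      field.setAccessible(true) // Scala compiles vars to private fields
      field.set(dst, field.get(src))
    }

  /** Builds a fresh A and copies every recorded field, class by class. */
  def cloneWithAccessed(src: A, accessed: Map[Class[_], Set[String]]): A = {
    val dst = new A
    accessed.foreach { entry => copyFields(entry._1, entry._2, src, dst) }
    dst
  }
}
```

If the recorded set for `P` is `Set("a", "b")`, the clone of an `A` receives both values. That is more than strictly needed, but it matches the behavior described above as pre-existing rather than a regression.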

SparkQA commented Oct 25, 2017

Test build #83037 has finished for PR 19556 at commit 4d8f91e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • assert(currentClass != null, "The outer class can't be null.")

val clone = instantiateClass(outerClass, parent)

var currentClass = outerClass
do {
Contributor

Please update this do-while loop too.

Member Author

Ok. It's late; I will update this one and the one below tomorrow. Thanks.

new FieldAccessFinder(fields, findTransitively, Some(m), visitedMethods), 0)

var currentClass = cl
do {
Contributor

here too

Member Author

viirya commented Oct 26, 2017

@cloud-fan The two remaining do-while loops are updated.

SparkQA commented Oct 26, 2017

Test build #83069 has finished for PR 19556 at commit e26d093.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • assert(currentClass != null, "The outer class can't be null.")
      • assert(currentClass != null, "The outer class can't be null.")

@asfgit asfgit closed this in 4f8dc6b Oct 26, 2017
@cloud-fan
Contributor

Thanks, merging to master/2.2!

asfgit pushed a commit that referenced this pull request Oct 26, 2017
…ass fields

When a closure uses fields defined in a superclass, `ClosureCleaner` fails to detect them and does not set them properly. Those fields end up with null values.

Added test.

Author: Liang-Chi Hsieh <viirya@gmail.com>

Closes #19556 from viirya/SPARK-22328.

(cherry picked from commit 4f8dc6b)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
@viirya viirya deleted the SPARK-22328 branch December 27, 2023 18:21