-
Notifications
You must be signed in to change notification settings - Fork 29.1k
[SPARK-27656][GraphX] Safely register class for GraphX #24555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-27656][GraphX] Safely register class for GraphX #24555
Conversation
|
Test build #105250 has finished for PR 24555 at commit
|
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the upside -- that GraphX classes won't work if registration is enforced?
| classOf[HighlyCompressedMapStatus], | ||
| classOf[BitSet], | ||
| classOf[CompactBuffer[_]], | ||
| classOf[OpenHashSet[Int]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these actually different types, if the generic type is all that varies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/GraphXUtils.scala#L45, I think they are different, since type specialization is used in https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala#L44
| // We load them safely, ignore it if the class not found. | ||
| Seq( | ||
| "org.apache.spark.graphx.Edge", | ||
| "org.apache.spark.graphx.Edge$mcB$sp", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are synthetic classes and their name may change. Are they needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they are needed if we want to register Edge, since type specialization is used in https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/Edge.scala#L32
I had test this, if we do not reigster org.apache.spark.graphx.Edge$mcB$sp, Edge[Boolean] will not be handled by kryo.
| import org.apache.spark.serializer.KryoSerializer | ||
|
|
||
| class GraphXPrimitiveKeyOpenHashMapSuite extends SparkFunSuite { | ||
| test("Kryo class register") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @zhengruifeng . This seems to fail on Jenkins. Is this tested in your local environment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not test this class in local env, it seems that too many anonymous classes are envolved here. I will remove this.
|
Test build #105273 has finished for PR 24555 at commit
|
|
Just for completeness, what does this achieve? GraphX doesn't otherwise work at all with registration enforced? and it's faster when it's not enforced? |
|
@srowen I locally tested on some small datasets, and find the difference is tiny. |
What changes were proposed in this pull request?
Safely register class for GraphX
How was this patch tested?
added suites