-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-5374][CORE] abstract RDD's DAG graph iteration in DAGScheduler #4134
Conversation
visit(waitingForVisit.pop()) | ||
} | ||
missing.toList | ||
stage.parents.filter(s => getCacheLocs(s.rdd).contains(Nil) && !s.isAvailable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rdd in getMissingParentStages is not always stage's rdd. stage has a sequence of rdds and stage's rdd is child of stage. if stage's rdd at front of child rdd is cached, this line cannot filter this stage. but getMissingParentStages can complete it.
ping @JoshRosen |
I'm pretty busy with other work at the moment, so it'll be a little while before I can actually review this, but I'd be glad to let Jenkins test it to see whether it uncovers any problems (like I hit in my original patch). Jenkins, this is ok to test. |
Thanks for doing it. I took a quick look at this. While it does reduce the LOC, I feel the change is not necessary and actually makes the code harder to understand with the closures. Do we really want something like this? |
I'll take a deeper look over the weekend, but on a first pass I had a similar reaction to @rxin -- I'm not seeing a lot of benefit in terms of code clarity or maintainability, and we tend to avoid making changes to the DAGScheduler that don't offer significant benefits. |
Can one of the admins verify this patch? |
Closing this one, will do a more meaningful |
There are many methods in
DAGScheduler
that iterate an RDD's DAG graph such asgetParentStages
,getMissingParentStages
and so on. We should abstract this process to reduce code size.