
[SPARK-5374][CORE] abstract RDD's DAG graph iteration in DAGScheduler #4134

Closed
wants to merge 3 commits

Conversation

cloud-fan
Contributor

There are many methods in DAGScheduler that iterate over an RDD's DAG, such as getParentStages, getMissingParentStages, and so on. We should abstract this traversal to reduce code size.
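For illustration only, a minimal sketch of what such an abstraction could look like; the helper name traverseParentRDDs and its exact signature are assumptions for this sketch, not the code in this patch:

    import scala.collection.mutable
    import org.apache.spark.rdd.RDD

    // Hypothetical helper: iteratively visit an RDD and all of its ancestors,
    // mirroring the visited/waitingForVisit pattern repeated across DAGScheduler.
    private def traverseParentRDDs(root: RDD[_])(visit: RDD[_] => Unit): Unit = {
      val visited = new mutable.HashSet[RDD[_]]
      val waitingForVisit = new mutable.Stack[RDD[_]]
      waitingForVisit.push(root)
      while (waitingForVisit.nonEmpty) {
        val rdd = waitingForVisit.pop()
        if (!visited(rdd)) {
          visited += rdd
          visit(rdd)
          // continue walking up the lineage through this RDD's parents
          rdd.dependencies.foreach(dep => waitingForVisit.push(dep.rdd))
        }
      }
    }

Callers such as getParentStages or stageDependsOn could then supply only the per-RDD visit logic instead of repeating the stack bookkeeping.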

visit(waitingForVisit.pop())
}
missing.toList
stage.parents.filter(s => getCacheLocs(s.rdd).contains(Nil) && !s.isAvailable)
Contributor


The rdd handled in getMissingParentStages is not always the stage's own rdd: a stage contains a sequence of RDDs, and stage.rdd is only the last (child) one. If an RDD of the stage upstream of that child RDD is cached, this line cannot filter out the parent stage behind it, but getMissingParentStages can, because it walks the whole chain and stops at cached RDDs.
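For reference, a simplified sketch of the traversal this comment refers to, paraphrased from the existing getMissingParentStages in the Spark 1.x DAGScheduler; the missing, visited, and waitingForVisit collections are the local ones that method maintains:

    // Walks every RDD inside the stage, not just stage.rdd, and stops at RDDs
    // whose partitions are all cached, so parent stages behind a cached RDD
    // are never reported as missing.
    def visit(rdd: RDD[_]): Unit = {
      if (!visited(rdd)) {
        visited += rdd
        if (getCacheLocs(rdd).contains(Nil)) {  // at least one partition is uncached
          rdd.dependencies.foreach {
            case shufDep: ShuffleDependency[_, _, _] =>
              val mapStage = getShuffleMapStage(shufDep, stage.jobId)
              if (!mapStage.isAvailable) {
                missing += mapStage
              }
            case narrowDep: NarrowDependency[_] =>
              waitingForVisit.push(narrowDep.rdd)
          }
        }
      }
    }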

@cloud-fan cloud-fan changed the title [SPARK-4654][CORE] Clean up DAGScheduler getMissingParentStages / stageDependsOn methods [SPARK-5374][CORE] abstract RDD's DAG graph iteration in DAGScheduler Jan 23, 2015
@cloud-fan
Contributor Author

ping @JoshRosen

@JoshRosen
Contributor

I'm pretty busy with other work at the moment, so it'll be a little while before I can actually review this, but I'd be glad to let Jenkins test it to see whether it uncovers any problems (like I hit in my original patch).

Jenkins, this is ok to test.

@rxin
Contributor

rxin commented Jan 23, 2015

Thanks for doing it. I took a quick look at this. While it does reduce the LOC, I feel the change is not necessary and actually makes the code harder to understand with the closures. Do we really want something like this?

@markhamstra
Contributor

I'll take a deeper look over the weekend, but on a first pass I had a similar reaction to @rxin -- I'm not seeing a lot of benefit in terms of code clarity or maintainability, and we tend to avoid making changes to the DAGScheduler that don't offer significant benefits.

@AmplabJenkins

Can one of the admins verify this patch?

@cloud-fan
Contributor Author

Closing this one, will do a more meaningful DAGScheduler refactor later.
