Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-5484][GraphX] Checkpoint every 25 iterations in Pregel #4273

Closed
wants to merge 1 commit into from

Conversation

ankurdave
Copy link
Contributor

Pregel-based iterative algorithms with more than ~50 iterations begin to slow down and eventually fail with a StackOverflowError due to Spark's lack of support for long lineage chains.

This PR causes Pregel to checkpoint the graph every 25 iterations if the checkpoint directory is set.

@ankurdave
Copy link
Contributor Author

Some users might not want to checkpoint even if their checkpoint directory is set, but I don't a good way to expose that as an option without breaking binary compatibility. @rxin

@SparkQA
Copy link

SparkQA commented Jan 29, 2015

Test build #26329 has finished for PR 4273 at commit 48364e6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rxin
Copy link
Contributor

rxin commented Jan 29, 2015

You can add a new parameter using a different method name, can't you?

@maropu
Copy link
Member

maropu commented Feb 4, 2015

How about adding a new configuration, e.g., spark.graphx.pregel.checkpoint.interval in SparkConf?

@maropu
Copy link
Member

maropu commented Feb 4, 2015

And also, this issue seems to be related to SPARK-5561.

@sryza
Copy link
Contributor

sryza commented Mar 11, 2015

Mind tagging this [GRAPHX] so it can get sorted properly?

@ankurdave ankurdave changed the title [SPARK-5484] Checkpoint every 25 iterations in Pregel [SPARK-5484][GraphX] Checkpoint every 25 iterations in Pregel Mar 11, 2015
@@ -139,6 +141,14 @@ object Pregel extends Logging {
// get to send messages. We must cache messages so it can be materialized on the next line,
// allowing us to uncache the previous iteration.
messages = g.mapReduceTriplets(sendMsg, mergeMsg, Some((newVerts, activeDirection))).cache()

if (checkpoint && i % checkpointFrequency == checkpointFrequency - 1) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should output a log if the checkpoint is not active when the checkpoint dir is not set

@srowen
Copy link
Member

srowen commented Apr 15, 2015

This seems sort of stalled, and https://issues.apache.org/jira/browse/SPARK-5561 suggests this should be accomplished with a more generalized solution anyway. Is it best to close this?

@ankurdave
Copy link
Contributor Author

Yeah, let's close this.

@ankurdave ankurdave closed this Apr 15, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants