Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-6847][Core][Streaming]Fix stack overflow issue when updateStateByKey is followed by a checkpointed dstream #10934

Closed
wants to merge 4 commits into from

Conversation

zsxwing
Copy link
Member

@zsxwing zsxwing commented Jan 27, 2016

Add a local property to indicate if checkpointing all RDDs that are marked with the checkpoint flag, and enable it in Streaming

…pointed dstream

Add a local property to indicate if checkpointing all RDDs that are marked with the checkpoint flag, and enable it in Streaming
@zsxwing
Copy link
Member Author

zsxwing commented Jan 27, 2016

/cc @andrewor14 @tdas

@@ -1535,6 +1535,10 @@ abstract class RDD[T: ClassTag](

private[spark] var checkpointData: Option[RDDCheckpointData[T]] = None

// Whether recursively checkpoint all RDDs that are marked with the checkpoint flag.
private val recursiveCheckpoint =
Option(sc.getLocalProperty("spark.checkpoint.recursive")).map(_.toBoolean).getOrElse(false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well its always recursive. The difference is whether checkpoint all that have been marked or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better name suggestion for this one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"spark.checkpoint.checkpointAllMarked" ?? @andrewor14 thoughts.

Btw, shouldnt this be a constant variable in some object?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a hard one... I think checkpointAllMarkedAncestors is least ambiguous

@SparkQA
Copy link

SparkQA commented Jan 27, 2016

Test build #50141 has finished for PR 10934 at commit 36cba8c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Copy link
Member Author

zsxwing commented Jan 27, 2016

retest this please

@SparkQA
Copy link

SparkQA commented Jan 27, 2016

Test build #50173 has finished for PR 10934 at commit ef3983b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1535,6 +1535,10 @@ abstract class RDD[T: ClassTag](

private[spark] var checkpointData: Option[RDDCheckpointData[T]] = None

// Whether checkpoint all RDDs that are marked with the checkpoint flag.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to expand on this comment:

// Whether to checkpoint all ancestor RDDs that are marked for checkpointing. By default,
// we stop as soon as we find the first such RDD, an optimization that allows us to write
// less data but is not safe for all workloads. E.g. in streaming we may checkpoint both
// an RDD and its parent in every batch, in which case the parent may never be checkpointed
// and its lineage never truncated, leading to OOMs in the long run (SPARK-6847).

@andrewor14
Copy link
Contributor

Looks great! I only have documentation and test suggestions.

@SparkQA
Copy link

SparkQA commented Jan 30, 2016

Test build #50421 has finished for PR 10934 at commit 97e39c0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

rdd.count()
// Check the two state RDDs are both checkpointed
rddsCheckpointed = stateRDDs.size == 2 && stateRDDs.forall(_.isCheckpointed)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm indentation is weird here?

@andrewor14
Copy link
Contributor

LGTM, I'll merge this once you address the minor comments

@SparkQA
Copy link

SparkQA commented Jan 31, 2016

Test build #50448 has finished for PR 10934 at commit 20e4509.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Copy link
Contributor

Merged into master.

@asfgit asfgit closed this in 6075573 Feb 1, 2016
@andrewor14
Copy link
Contributor

I did not merge this into 1.6 and before for 2 reasons:

  • It doesn't merge cleanly, and more importantly
  • This changes internal semantics and it's not technically a bug

Let me know if you disagree.

@zsxwing zsxwing deleted the recursive-checkpoint branch February 1, 2016 19:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants