[SPARK-6847][Core][Streaming]Fix stack overflow issue when updateStateByKey is followed by a checkpointed dstream #10934

zsxwing · 2016-01-27T00:24:34Z

Add a local property that indicates whether to checkpoint all RDDs that are marked with the checkpoint flag, and enable it in Streaming.


zsxwing · 2016-01-27T00:26:12Z

/cc @andrewor14 @tdas

tdas · 2016-01-27T01:23:16Z

core/src/main/scala/org/apache/spark/rdd/RDD.scala

@@ -1535,6 +1535,10 @@ abstract class RDD[T: ClassTag](

  private[spark] var checkpointData: Option[RDDCheckpointData[T]] = None

+  // Whether recursively checkpoint all RDDs that are marked with the checkpoint flag.
+  private val recursiveCheckpoint =
+    Option(sc.getLocalProperty("spark.checkpoint.recursive")).map(_.toBoolean).getOrElse(false)


Well, it's always recursive. The difference is whether we checkpoint all the RDDs that have been marked or not.

Better name suggestion for this one?

"spark.checkpoint.checkpointAllMarked" ?? @andrewor14 thoughts.

Btw, shouldn't this be a constant in some object?

This is a hard one... I think checkpointAllMarkedAncestors is the least ambiguous.
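
For reference, the lookup pattern from the diff, with the key hoisted into a constant as suggested above, could look roughly like the sketch below. It is a standalone illustration rather than the merged code: `java.util.Properties` stands in for the SparkContext local properties, and the object name `CheckpointProps` and the exact key string (built from the name proposed in this thread) are assumptions.

```scala
import java.util.Properties

// Hypothetical holder for the property key; the object name is illustrative only.
object CheckpointProps {
  // Assumed key, built from the name proposed in this thread.
  val CheckpointAllMarkedAncestors = "spark.checkpoint.checkpointAllMarkedAncestors"

  // Same shape as the diff: a missing property (null) falls back to false.
  def checkpointAllMarkedAncestors(props: Properties): Boolean =
    Option(props.getProperty(CheckpointAllMarkedAncestors)).map(_.toBoolean).getOrElse(false)
}

object CheckpointPropsDemo {
  def main(args: Array[String]): Unit = {
    val props = new Properties() // stand-in for SparkContext local properties
    println(CheckpointProps.checkpointAllMarkedAncestors(props)) // property unset
    props.setProperty(CheckpointProps.CheckpointAllMarkedAncestors, "true")
    println(CheckpointProps.checkpointAllMarkedAncestors(props)) // property enabled
  }
}
```

Keeping the key in one place means the code that sets the property and the code that reads it cannot drift apart on the string.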

SparkQA · 2016-01-27T03:01:39Z

Test build #50141 has finished for PR 10934 at commit 36cba8c.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

zsxwing · 2016-01-27T05:30:10Z

retest this please

SparkQA · 2016-01-27T07:34:59Z

Test build #50173 has finished for PR 10934 at commit ef3983b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

andrewor14 · 2016-01-29T23:21:37Z

core/src/main/scala/org/apache/spark/rdd/RDD.scala

@@ -1535,6 +1535,10 @@ abstract class RDD[T: ClassTag](

  private[spark] var checkpointData: Option[RDDCheckpointData[T]] = None

+  // Whether checkpoint all RDDs that are marked with the checkpoint flag.


We need to expand on this comment:

// Whether to checkpoint all ancestor RDDs that are marked for checkpointing. By default,
// we stop as soon as we find the first such RDD, an optimization that allows us to write
// less data but is not safe for all workloads. E.g. in streaming we may checkpoint both
// an RDD and its parent in every batch, in which case the parent may never be checkpointed
// and its lineage never truncated, leading to OOMs in the long run (SPARK-6847).
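
The difference between the two behaviours described in that comment can be shown with a toy model. This is only a sketch under stated assumptions: `Node` is a hypothetical stand-in for an RDD in a lineage graph, not Spark's `RDD` class, and the traversal is a simplified version of the idea, not the actual `doCheckpoint` implementation.

```scala
// Toy stand-in for an RDD in a lineage graph; this is not Spark's RDD class.
final class Node(val name: String, val parents: Seq[Node], val marked: Boolean) {
  var checkpointed: Boolean = false

  // Walk up the lineage after a job, following the behaviour the comment describes.
  def doCheckpoint(checkpointAllMarkedAncestors: Boolean): Unit = {
    if (marked) {
      if (checkpointAllMarkedAncestors) {
        // Checkpoint marked ancestors first, before this node's lineage is cut.
        parents.foreach(_.doCheckpoint(checkpointAllMarkedAncestors))
      }
      checkpointed = true
    } else {
      // Unmarked nodes just forward the request to their parents.
      parents.foreach(_.doCheckpoint(checkpointAllMarkedAncestors))
    }
  }
}

object LineageSketch {
  // Builds parent -> child, both marked; returns (parentCheckpointed, childCheckpointed).
  def run(checkpointAllMarkedAncestors: Boolean): (Boolean, Boolean) = {
    val parent = new Node("state", Nil, marked = true)
    val child = new Node("mapped", Seq(parent), marked = true)
    child.doCheckpoint(checkpointAllMarkedAncestors)
    (parent.checkpointed, child.checkpointed)
  }

  def main(args: Array[String]): Unit = {
    println(run(checkpointAllMarkedAncestors = false)) // (false,true)
    println(run(checkpointAllMarkedAncestors = true))  // (true,true)
  }
}
```

With the flag off, the traversal stops at the first marked node, so the marked parent is never checkpointed; with the flag on, both are.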

andrewor14 · 2016-01-29T23:43:38Z

Looks great! I only have documentation and test suggestions.

SparkQA · 2016-01-30T02:23:37Z

Test build #50421 has finished for PR 10934 at commit 97e39c0.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

andrewor14 · 2016-01-30T06:04:41Z

streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala

+          rdd.count()
+          // Check the two state RDDs are both checkpointed
+          rddsCheckpointed = stateRDDs.size == 2 && stateRDDs.forall(_.isCheckpointed)
+        }


hm indentation is weird here?

andrewor14 · 2016-01-30T06:06:25Z

LGTM, I'll merge this once you address the minor comments

SparkQA · 2016-01-31T01:20:12Z

Test build #50448 has finished for PR 10934 at commit 20e4509.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

andrewor14 · 2016-02-01T19:01:38Z

Merged into master.

andrewor14 · 2016-02-01T19:04:46Z

I did not merge this into 1.6 and earlier branches for 2 reasons:

1. It doesn't merge cleanly, and more importantly
2. This changes internal semantics and it's not technically a bug

Let me know if you disagree.

Fix stack overflow issue when updateStateByKey is followed by a check…

36cba8c


tdas reviewed Jan 27, 2016

Address TD's comments

ef3983b

andrewor14 reviewed Jan 29, 2016

Address Andrew's comments

97e39c0

andrewor14 reviewed Jan 30, 2016

Rename and fix indentation

20e4509

asfgit closed this in 6075573 Feb 1, 2016

zsxwing deleted the recursive-checkpoint branch February 1, 2016 19:42
