
[SPARK-13021][CORE] Fail fast when custom RDDs violate RDD.partition's API contract #10932

Closed
JoshRosen wants to merge 2 commits into master from JoshRosen:SPARK-13021

Conversation

JoshRosen
Contributor

Spark's `Partition` and `RDD.partitions` APIs have a contract which requires custom implementations of `RDD.partitions` to ensure that for all `x`, `rdd.partitions(x).index == x`; in other words, the `index` reported by a partition must match its position in the partitions array.

If a custom RDD implementation violates this contract, then Spark can become stuck in an infinite recomputation loop when recomputing a subset of an RDD's partitions, since the tasks that are actually run will not correspond to the missing output partitions that triggered the recomputation. Here's a link to a notebook which demonstrates this problem: https://rawgit.com/JoshRosen/e520fb9a64c1c97ec985/raw/5e8a5aa8d2a18910a1607f0aa4190104adda3424/Violating%2520RDD.partitions%2520contract.html
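For illustration, here is a minimal sketch of a contract-violating custom RDD, modeled on the `BadRDD` test class this patch adds (the class body shown here is an assumption; only the class signature appears in the test output below):

```scala
import scala.reflect.ClassTag

import org.apache.spark.{Partition, TaskContext}
import org.apache.spark.rdd.RDD

// Reversing the parent's partitions array breaks the contract:
// partitions(i).index will generally not equal i, so recomputing a
// "missing" partition re-runs the wrong task, forever.
class BadRDD[T: ClassTag](prev: RDD[T]) extends RDD[T](prev) {

  override def compute(part: Partition, context: TaskContext): Iterator[T] =
    prev.iterator(part, context)

  override protected def getPartitions: Array[Partition] =
    prev.partitions.reverse
}
```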

In order to guard against this infinite-loop behavior, this patch modifies Spark so that it fails fast and refuses to compute RDDs whose `partitions` violate the API contract.
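The check itself amounts to the following (a rough sketch; `PartitionsContract.validate` is a hypothetical standalone helper, not Spark API — in the actual patch the validation runs inside `RDD` when the result of `getPartitions` is first computed and cached):

```scala
import org.apache.spark.Partition

object PartitionsContract {
  /** Hypothetical helper: fail fast if any partition's reported index
    * does not match its position in the partitions array. */
  def validate(partitions: Array[Partition]): Unit = {
    partitions.zipWithIndex.foreach { case (partition, index) =>
      require(partition.index == index,
        s"partitions($index).index == ${partition.index}, but it should equal $index")
    }
  }
}
```

Running this once per RDD turns the silent infinite-recomputation loop into an immediate, descriptive error at job submission time.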

@RussellSpitzer
Member

I'm +1 on this in 2.0 :)

@JoshRosen
Contributor Author

An open question is whether we want to put this in 1.6.1: backporting risks breaking user code that happened to work only by accident, but it would also guard against the infinite-loop behavior.

@rxin
Contributor

rxin commented Jan 26, 2016

I don't think we should put it in 1.6.x.

@SparkQA

SparkQA commented Jan 27, 2016

Test build #50137 has finished for PR 10932 at commit 10efe2e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Jan 27, 2016

Test build #50157 has finished for PR 10932 at commit a0dd1e7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • `class BadRDD[T: ClassTag](prev: RDD[T]) extends RDD[T](prev)`

@yhuai
Contributor

yhuai commented Jan 27, 2016

LGTM

@yhuai
Contributor

yhuai commented Jan 27, 2016

Merging to master.

@asfgit closed this in 32f7411 Jan 27, 2016
@JoshRosen deleted the SPARK-13021 branch January 27, 2016 21:30