[SPARK-14468] Always enable OutputCommitCoordinator #12244

andrewor14 · 2016-04-07T22:26:16Z

What changes were proposed in this pull request?

OutputCommitCoordinator was introduced to deal with concurrent task attempts racing to write output, which can lead to data loss or corruption. For more detail, read the JIRA description.
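To make the race concrete, here is a minimal toy model of commit arbitration between task attempts. It is a sketch only, not Spark's implementation: the class `ToyCommitCoordinator` and its methods are hypothetical names, and the "first attempt to ask wins" policy is an assumption chosen for illustration.

```python
from typing import Dict, Tuple


class ToyCommitCoordinator:
    """Hypothetical single-process stand-in for a commit coordinator."""

    def __init__(self) -> None:
        # Maps (stage, partition) to the attempt number holding the commit right.
        self._authorized: Dict[Tuple[int, int], int] = {}

    def can_commit(self, stage: int, partition: int, attempt: int) -> bool:
        # Assumed policy: the first attempt to ask wins; every other attempt
        # for the same (stage, partition) is denied.
        winner = self._authorized.setdefault((stage, partition), attempt)
        return winner == attempt

    def attempt_failed(self, stage: int, partition: int, attempt: int) -> None:
        # If the authorized attempt fails, release the right so a retry can commit.
        key = (stage, partition)
        if self._authorized.get(key) == attempt:
            del self._authorized[key]


# Two attempts of the same task race to commit partition 3 of stage 0.
coordinator = ToyCommitCoordinator()
first = coordinator.can_commit(stage=0, partition=3, attempt=0)
second = coordinator.can_commit(stage=0, partition=3, attempt=1)

# The winner fails before committing, so the other attempt may now commit.
coordinator.attempt_failed(stage=0, partition=3, attempt=0)
retry = coordinator.can_commit(stage=0, partition=3, attempt=1)
```

Without some arbiter of this kind, both attempts would be free to write the same output location at once.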

Before: OutputCommitCoordinator is enabled only if speculation is enabled.
After: OutputCommitCoordinator is always enabled.

Users may still disable this through spark.hadoop.outputCommitCoordination.enabled, but they really shouldn't...
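The before/after behavior can be sketched as a small decision function over a plain dictionary of settings. This is an illustration under stated assumptions, not the patched code: the function names are hypothetical, `spark.speculation` is assumed to be the setting that "speculation is enabled" refers to, and the explicit override is assumed to take precedence in the "before" case.

```python
from typing import Mapping

COORDINATION_KEY = "spark.hadoop.outputCommitCoordination.enabled"
SPECULATION_KEY = "spark.speculation"  # assumed to be the speculation switch


def _as_bool(value: str) -> bool:
    # Treat only the string "true" (any case, surrounding spaces ignored) as True.
    return value.strip().lower() == "true"


def coordination_enabled_before(conf: Mapping[str, str]) -> bool:
    # Before: coordination follows speculation unless explicitly overridden
    # (the override precedence is an assumption of this sketch).
    if COORDINATION_KEY in conf:
        return _as_bool(conf[COORDINATION_KEY])
    return _as_bool(conf.get(SPECULATION_KEY, "false"))


def coordination_enabled_after(conf: Mapping[str, str]) -> bool:
    # After: coordination is on unless the user explicitly disables it.
    return _as_bool(conf.get(COORDINATION_KEY, "true"))


# A job that sets nothing at all: previously uncoordinated, now coordinated.
defaults_before = coordination_enabled_before({})
defaults_after = coordination_enabled_after({})
```

The practical difference shows up for jobs that never touch either setting, which is presumably the common case.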

How was this patch tested?

OutputCommitCoordinator*Suite

SparkQA · 2016-04-07T22:28:38Z

Test build #55254 has finished for PR 12244 at commit ba7fc4b.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

andrewor14 · 2016-04-07T22:40:44Z

@JoshRosen

JoshRosen · 2016-04-08T00:41:30Z

LGTM.

SparkQA · 2016-04-08T00:46:02Z

Test build #55255 has finished for PR 12244 at commit 22fafdd.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

## What changes were proposed in this pull request?

`OutputCommitCoordinator` was introduced to deal with concurrent task attempts racing to write output, leading to data loss or corruption. For more detail, read the [JIRA description](https://issues.apache.org/jira/browse/SPARK-14468).

Before: `OutputCommitCoordinator` is enabled only if speculation is enabled.
After: `OutputCommitCoordinator` is always enabled.

Users may still disable this through `spark.hadoop.outputCommitCoordination.enabled`, but they really shouldn't...

## How was this patch tested?

`OutputCommitCoordinator*Suite`

Author: Andrew Or <andrew@databricks.com>

Closes #12244 from andrewor14/always-occ.

(cherry picked from commit 3e29e37)
Signed-off-by: Andrew Or <andrew@databricks.com>

andrewor14 · 2016-04-08T00:50:56Z

Thanks, merged into master, 1.6, 1.5 and 1.4.

SparkQA · 2016-04-08T00:55:47Z

Test build #55259 has finished for PR 12244 at commit 076f6b5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

## What changes were proposed in this pull request?

`OutputCommitCoordinator` was introduced to deal with concurrent task attempts racing to write output, leading to data loss or corruption. For more detail, read the [JIRA description](https://issues.apache.org/jira/browse/SPARK-14468).

Before: `OutputCommitCoordinator` is enabled only if speculation is enabled.
After: `OutputCommitCoordinator` is always enabled.

Users may still disable this through `spark.hadoop.outputCommitCoordination.enabled`, but they really shouldn't...

## How was this patch tested?

`OutputCommitCoordinator*Suite`

Author: Andrew Or <andrew@databricks.com>

Closes apache#12244 from andrewor14/always-occ.

(cherry picked from commit 3e29e37)
Signed-off-by: Andrew Or <andrew@databricks.com>
(cherry picked from commit 77ebae3)

Do not tie OutputCommitCoordinator to speculation

ba7fc4b

Super ensure we use coordinator in test suites

8b207af

Andrew Or added 2 commits April 7, 2016 15:41

Fix style

22fafdd

Revert change to minimize conflicts

076f6b5

asfgit closed this in 3e29e37 Apr 8, 2016

andrewor14 deleted the always-occ branch April 8, 2016 00:53
