[SPARK-14468] Always enable OutputCommitCoordinator #12244

Closed
wants to merge 4 commits into from


andrewor14
Contributor

What changes were proposed in this pull request?

OutputCommitCoordinator was introduced to deal with concurrent task attempts racing to write output, leading to data loss or corruption. For more detail, read the JIRA description.
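To make the race concrete, here is a toy model of the "first attempt to ask wins" idea. It is only a sketch: `ToyCommitCoordinator`, `canCommit`, and `attemptFailed` are hypothetical names for illustration and are not Spark's actual classes or signatures.

```scala
import scala.collection.mutable

// Toy model only: names here are hypothetical, not Spark's API.
final class ToyCommitCoordinator {
  // partition id -> attempt number that was authorized to commit
  private val authorized = mutable.Map.empty[Int, Int]

  /** Returns true only for the first attempt that asks for a given partition
    * (or for that same attempt asking again). Every other attempt is denied. */
  def canCommit(partition: Int, attempt: Int): Boolean = synchronized {
    authorized.get(partition) match {
      case Some(winner) => winner == attempt
      case None =>
        authorized(partition) = attempt
        true
    }
  }

  /** If the authorized attempt fails, release the partition so a retry can commit. */
  def attemptFailed(partition: Int, attempt: Int): Unit = synchronized {
    if (authorized.get(partition).contains(attempt)) {
      authorized -= partition
    }
  }
}

object ToyCommitCoordinatorDemo {
  def main(args: Array[String]): Unit = {
    val coordinator = new ToyCommitCoordinator
    // Two attempts for the same partition race to commit.
    println(coordinator.canCommit(partition = 0, attempt = 0))
    println(coordinator.canCommit(partition = 0, attempt = 1))
    // The authorized attempt fails, so the other attempt may now commit.
    coordinator.attemptFailed(partition = 0, attempt = 0)
    println(coordinator.canCommit(partition = 0, attempt = 1))
  }
}
```

Without such an arbiter, both attempts could write the same output location, which is the data loss or corruption scenario described above.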

Before: OutputCommitCoordinator is enabled only if speculation is enabled.
After: OutputCommitCoordinator is always enabled.

Users may still disable this through spark.hadoop.outputCommitCoordination.enabled, but they really shouldn't...
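The change in default can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: the configuration is modeled as a plain `Map[String, String]`, the `CoordinationDefault` object and its helpers are hypothetical, and the "before" logic assumes the explicit setting took precedence over `spark.speculation`.

```scala
// Illustrative sketch only; not the code from this patch.
object CoordinationDefault {
  val SpeculationKey = "spark.speculation"
  val CoordinationKey = "spark.hadoop.outputCommitCoordination.enabled"

  // Hypothetical helper: read a boolean setting, falling back to a default.
  private def getBoolean(conf: Map[String, String], key: String, default: Boolean): Boolean =
    conf.get(key).map(_.toBoolean).getOrElse(default)

  /** Before (assumed): the default follows whether speculation is enabled. */
  def enabledBefore(conf: Map[String, String]): Boolean = {
    val speculation = getBoolean(conf, SpeculationKey, default = false)
    getBoolean(conf, CoordinationKey, default = speculation)
  }

  /** After: enabled unless the user explicitly turns it off. */
  def enabledAfter(conf: Map[String, String]): Boolean =
    getBoolean(conf, CoordinationKey, default = true)
}
```

With an empty configuration, `enabledBefore` yields `false` and `enabledAfter` yields `true`; setting the coordination key to `"false"` disables coordination in both versions.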

How was this patch tested?

OutputCommitCoordinator*Suite

@SparkQA

SparkQA commented Apr 7, 2016

Test build #55254 has finished for PR 12244 at commit ba7fc4b.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor Author

@JoshRosen

@JoshRosen
Contributor

LGTM.

@SparkQA

SparkQA commented Apr 8, 2016

Test build #55255 has finished for PR 12244 at commit 22fafdd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

asfgit pushed a commit that referenced this pull request Apr 8, 2016
## What changes were proposed in this pull request?

`OutputCommitCoordinator` was introduced to deal with concurrent task attempts racing to write output, leading to data loss or corruption. For more detail, read the [JIRA description](https://issues.apache.org/jira/browse/SPARK-14468).

Before: `OutputCommitCoordinator` is enabled only if speculation is enabled.
After: `OutputCommitCoordinator` is always enabled.

Users may still disable this through `spark.hadoop.outputCommitCoordination.enabled`, but they really shouldn't...

## How was this patch tested?

`OutputCommitCoordinator*Suite`

Author: Andrew Or <andrew@databricks.com>

Closes #12244 from andrewor14/always-occ.

(cherry picked from commit 3e29e37)
Signed-off-by: Andrew Or <andrew@databricks.com>
@andrewor14
Contributor Author

Thanks, merged into master, 1.6, 1.5 and 1.4.

@asfgit asfgit closed this in 3e29e37 Apr 8, 2016
@andrewor14 andrewor14 deleted the always-occ branch April 8, 2016 00:53
@SparkQA

SparkQA commented Apr 8, 2016

Test build #55259 has finished for PR 12244 at commit 076f6b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

zzcclp pushed a commit to zzcclp/spark that referenced this pull request Apr 8, 2016
## What changes were proposed in this pull request?

`OutputCommitCoordinator` was introduced to deal with concurrent task attempts racing to write output, leading to data loss or corruption. For more detail, read the [JIRA description](https://issues.apache.org/jira/browse/SPARK-14468).

Before: `OutputCommitCoordinator` is enabled only if speculation is enabled.
After: `OutputCommitCoordinator` is always enabled.

Users may still disable this through `spark.hadoop.outputCommitCoordination.enabled`, but they really shouldn't...

## How was this patch tested?

`OutputCommitCoordinator*Suite`

Author: Andrew Or <andrew@databricks.com>

Closes apache#12244 from andrewor14/always-occ.

(cherry picked from commit 3e29e37)
Signed-off-by: Andrew Or <andrew@databricks.com>
(cherry picked from commit 77ebae3)
3 participants