[SPARK-6369] [SQL] Uses commit coordinator to help committing Hive and Parquet tables #5139

liancheng · 2015-03-23T17:29:05Z

This PR leverages the output commit coordinator introduced in #4066 to help commit Hive and Parquet tables.

This PR extracts the output commit code from SparkHadoopWriter.commit into SparkHadoopMapRedUtil.commitTask, and reuses it to commit Parquet and Hive tables on the executor side.

TODO

Add tests

SparkQA · 2015-03-23T17:33:13Z

Test build #29004 has started for PR 5139 at commit dfdf3ef.

This patch merges cleanly.

marmbrus · 2015-03-23T18:50:43Z

@aarondav if you have time, I'd appreciate your input here.

SparkQA · 2015-03-23T19:19:43Z

Test build #29004 has finished for PR 5139 at commit dfdf3ef.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-03-23T19:19:46Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29004/

liancheng · 2015-03-29T03:54:16Z

@JoshRosen It would be great if you could also help review this. Thanks in advance!

aarondav · 2015-03-29T18:30:04Z

core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala

+      sparkTaskContext.attemptNumber())
+  }
+
+  def commitTask(


Please add documentation to this method explaining what it means to commitTask: we may contact the driver to become authorized to commit, which ensures that speculative tasks do not overwrite each other's output, and this may cause us to abort the task by throwing a CommitDeniedException if we cannot become authorized. Please also point to the JIRA that this fixes (the original one).
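For readers following the thread, here is a minimal sketch of the contract this comment describes. It is not the code in this PR: `Committer`, `RecordingCommitter`, `FirstWinsCoordinator`, `CommitDenied`, and `CommitSketch` are hypothetical stand-ins for Hadoop's output committer, the output commit coordinator, and `CommitDeniedException`, and the "first attempt to ask wins" policy is an assumption made purely for illustration.

```scala
import scala.collection.mutable

// Hypothetical stand-in for CommitDeniedException; not Spark's real class.
final case class CommitDenied(jobId: Long, splitId: Long, attemptId: Long)
  extends RuntimeException(s"Commit denied: job=$jobId split=$splitId attempt=$attemptId")

// Hypothetical, minimal view of an output committer.
trait Committer {
  def needsTaskCommit: Boolean
  def commitTask(): Unit
  def abortTask(): Unit
}

// Test double that records which callbacks fired.
final class RecordingCommitter extends Committer {
  var committed = false
  var aborted = false
  def needsTaskCommit: Boolean = true
  def commitTask(): Unit = { committed = true }
  def abortTask(): Unit = { aborted = true }
}

// Assumed policy: the first attempt to ask for a (job, split) pair is
// authorized; every other attempt for that pair is denied.
final class FirstWinsCoordinator {
  private val winners = mutable.Map.empty[(Long, Long), Long]

  def canCommit(jobId: Long, splitId: Long, attemptId: Long): Boolean = synchronized {
    winners.getOrElseUpdate((jobId, splitId), attemptId) == attemptId
  }
}

object CommitSketch {
  // Commits only when authorized; otherwise aborts the task and throws.
  def commitTask(
      committer: Committer,
      coordinator: FirstWinsCoordinator,
      jobId: Long,
      splitId: Long,
      attemptId: Long): Unit = {
    if (committer.needsTaskCommit) {
      if (coordinator.canCommit(jobId, splitId, attemptId)) {
        committer.commitTask()
      } else {
        committer.abortTask()
        throw CommitDenied(jobId, splitId, attemptId)
      }
    }
  }
}
```

Under these assumptions, attempt 0 of a split commits normally, while a speculative attempt 1 for the same split has `abortTask()` called and receives `CommitDenied`.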

aarondav · 2015-03-29T18:32:15Z

LGTM from my side, but @JoshRosen should confirm that the driver side is happy with this. My only comment is that, now that this code is extracted and used in a common location, we need to make sure its API is well documented.

liancheng · 2015-03-30T15:04:00Z

@aarondav Thanks! Added javadoc for this.

SparkQA · 2015-03-30T15:08:19Z

Test build #29405 has started for PR 5139 at commit 9a4b82b.

SparkQA · 2015-03-30T17:01:29Z

Test build #29405 has finished for PR 5139 at commit 9a4b82b.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.
This patch does not change any dependencies.

AmplabJenkins · 2015-03-30T17:01:34Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29405/

JoshRosen · 2015-03-30T21:50:36Z

core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala

+   * the driver in order to determine whether this attempt can commit (please see SPARK-4879 for
+   * details).
+   *
+   * Commit output coordinator is only contacted when the following two configurations are both set


"Commit output coordinator" -> "Output commit coordinator"

JoshRosen · 2015-03-30T21:59:55Z

LGTM from a Spark core point-of-view. One of the biggest risks here is passing the Long-valued parameters in the wrong order, but it looks like we've done it correctly here. I suppose that another risk might be calling the commit function with values of jobId, splitId, and attemptId that don't match / correspond to the ones used in the MapReduceTaskAttemptContext (that would undermine the whole scheme because the coordination wouldn't necessarily be guarding the right output paths), but our usage here looks fine as far as I can tell.
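One way to reduce both risks mentioned here can be sketched as follows. This is an illustration only, not something the PR does: `JobId`, `SplitId`, `AttemptId`, `TaskIds`, and `TypedIds` are hypothetical names, and the PR itself passes plain numeric ids.

```scala
// Hypothetical wrapper types; the PR itself passes plain numeric ids.
final case class JobId(value: Long)
final case class SplitId(value: Long)
final case class AttemptId(value: Long)

// Hypothetical bundle of the ids carried by a task attempt context.
final case class TaskIds(jobId: JobId, splitId: SplitId, attemptId: AttemptId)

object TypedIds {
  // Swapping two arguments here is a compile error rather than a silent bug.
  def describe(jobId: JobId, splitId: SplitId, attemptId: AttemptId): String =
    s"job=${jobId.value} split=${splitId.value} attempt=${attemptId.value}"

  // Fails fast when the ids passed to the commit call differ from the
  // ids recorded in the task attempt context.
  def requireMatching(fromContext: TaskIds, passed: TaskIds): Unit =
    require(
      fromContext == passed,
      s"ids $passed do not match task attempt context $fromContext")
}
```

The wrappers cost one small allocation per id, which is likely negligible next to a commit round trip to the driver; whether that trade-off is worthwhile in Spark itself is a separate question.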

SparkQA · 2015-03-30T23:47:44Z

Test build #29434 has started for PR 5139 at commit 72eb628.

liancheng · 2015-03-30T23:48:02Z

Thanks @JoshRosen! Fixed the typo. I'm merging this to master and 1.3.

@aarondav

…d Parquet tables

This PR leverages the output commit coordinator introduced in #4066 to help committing Hive and Parquet tables.

This PR extracts output commit code in `SparkHadoopWriter.commit` to `SparkHadoopMapRedUtil.commitTask`, and reuses it for committing Parquet and Hive tables on executor side.

TODO

- [ ] Add tests

Author: Cheng Lian <lian@databricks.com>

Closes #5139 from liancheng/spark-6369 and squashes the following commits:

72eb628 [Cheng Lian] Fixes typo in javadoc
9a4b82b [Cheng Lian] Adds javadoc and addresses @aarondav's comments
dfdf3ef [Cheng Lian] Uses commit coordinator to help committing Hive and Parquet tables

(cherry picked from commit fde6945)
Signed-off-by: Cheng Lian <lian@databricks.com>

SparkQA · 2015-03-31T01:36:55Z

Test build #29434 has finished for PR 5139 at commit 72eb628.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.
This patch does not change any dependencies.

AmplabJenkins · 2015-03-31T01:36:59Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29434/


Uses commit coordinator to help committing Hive and Parquet tables

dfdf3ef

aarondav reviewed Mar 29, 2015

Adds javadoc and addresses @aarondav's comments

9a4b82b

liancheng force-pushed the spark-6369 branch from 5fabe77 to 9a4b82b Compare March 30, 2015 15:05

JoshRosen reviewed Mar 30, 2015

Fixes typo in javadoc

72eb628

liancheng changed the title ~~[SPARK-6369] [SQL] [WIP] Uses commit coordinator to help committing Hive and Parquet tables~~ [SPARK-6369] [SQL] Uses commit coordinator to help committing Hive and Parquet tables Mar 30, 2015

asfgit closed this in fde6945 Mar 30, 2015

liancheng deleted the spark-6369 branch March 30, 2015 23:52
