[SPARK-8604] [SQL] HadoopFsRelation subclasses should set their output format class #6998

Closed
wants to merge 2 commits into from

liancheng
Contributor

HadoopFsRelation subclasses, especially ParquetRelation2, should set their own output format class, so that the default output committer can be set up correctly when appending (in which case user-defined output committers are ignored).
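The mechanism in the description can be illustrated with a small toy model in Scala. Every name in it (Committer, Format, JobSetup, CommitterSelection and the concrete classes) is hypothetical and is not Spark's or Hadoop's API. It assumes two things: when no output format has been registered, a text-like format is used as the fallback (analogous to Hadoop's Job.getOutputFormatClass defaulting to TextOutputFormat), and when appending, any user-defined committer is ignored, so the committer comes solely from the registered format.

```scala
// Toy model of committer selection. All names are hypothetical and only
// mirror the idea in the PR description; they are not Spark/Hadoop APIs.

trait Committer { def name: String }
final class GenericFileCommitter extends Committer { val name = "GenericFileCommitter" }
final class ParquetLikeCommitter extends Committer { val name = "ParquetLikeCommitter" }
final class UserDefinedCommitter extends Committer { val name = "UserDefinedCommitter" }

// Each format knows which committer it uses by default.
trait Format { def defaultCommitter(): Committer }
final class TextLikeFormat extends Format {
  def defaultCommitter(): Committer = new GenericFileCommitter
}
final class ParquetLikeFormat extends Format {
  def defaultCommitter(): Committer = new ParquetLikeCommitter
}

// What a relation may (or may not) have registered on the job before writing.
final case class JobSetup(
    outputFormat: Option[() => Format] = None,
    userCommitter: Option[() => Committer] = None)

object CommitterSelection {
  // Assumption: with no registered format, fall back to a text-like format.
  def defaultCommitter(job: JobSetup): Committer = {
    val makeFormat: () => Format = job.outputFormat.getOrElse(() => new TextLikeFormat)
    makeFormat().defaultCommitter()
  }

  // When appending, user-defined committers are ignored, so the result depends
  // entirely on which output format the relation registered.
  def choose(job: JobSetup, isAppend: Boolean): Committer =
    if (isAppend) defaultCommitter(job)
    else job.userCommitter.map(make => make()).getOrElse(defaultCommitter(job))
}
```

In this model, a relation that never registers a format gets GenericFileCommitter on append even if a user committer is configured, while registering `() => new ParquetLikeFormat` makes the append path pick ParquetLikeCommitter.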

@@ -194,6 +194,16 @@ private[sql] class OrcRelation(
}

override def prepareJobForWrite(job: Job): OutputWriterFactory = {
job.getConfiguration match {
Contributor

Why not job.setOutputFormatClass(classOf[OrcNewOutputFormat])?

Contributor Author

OrcNewOutputFormat also works in this case, but the current ORC data source is implemented on top of the old OrcOutputFormat because we are still on Hive 0.13.1. Mixing in OrcNewOutputFormat here would be confusing.
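The distinction in this exchange is between Hadoop's two MapReduce API generations. Job.setOutputFormatClass only accepts new-API org.apache.hadoop.mapreduce.OutputFormat classes, so an old-API format such as OrcOutputFormat would be registered through the old-API key instead. The sketch below models a Hadoop Configuration as an immutable Scala Map; only the two key strings come from Hadoop (written by JobConf.setOutputFormat and Job.setOutputFormatClass respectively), while the object, the Api type and register are hypothetical.

```scala
// Hypothetical model: a Hadoop Configuration reduced to an immutable Map.
// Only the two key strings below come from Hadoop itself.
object OutputFormatRegistration {
  // Key written by org.apache.hadoop.mapred.JobConf.setOutputFormat (old API).
  val OldApiKey = "mapred.output.format.class"
  // Key written by org.apache.hadoop.mapreduce.Job.setOutputFormatClass (new API).
  val NewApiKey = "mapreduce.job.outputformat.class"

  sealed trait Api
  case object OldMapredApi extends Api
  case object NewMapreduceApi extends Api

  // Records the format class name under the key matching its API generation.
  def register(conf: Map[String, String], api: Api, formatClass: String): Map[String, String] =
    api match {
      case OldMapredApi    => conf + (OldApiKey -> formatClass)
      case NewMapreduceApi => conf + (NewApiKey -> formatClass)
    }
}
```

Under this model, registering OrcOutputFormat touches only the old-API key, whereas the reviewer's suggestion (OrcNewOutputFormat via setOutputFormatClass) would populate the new-API key.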

SparkQA commented Jun 24, 2015

Test build #35723 has finished for PR 6998 at commit 6db1368.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class TakeOrderedAndProject(

SparkQA commented Jun 25, 2015

Test build #35729 has finished for PR 6998 at commit 9be51d1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class TakeOrderedAndProject(

asfgit pushed a commit that referenced this pull request Jun 25, 2015
…t format class

`HadoopFsRelation` subclasses, especially `ParquetRelation2`, should set their own output format class, so that the default output committer can be set up correctly when appending (in which case user-defined output committers are ignored).

Author: Cheng Lian <lian@databricks.com>

Closes #6998 from liancheng/spark-8604 and squashes the following commits:

9be51d1 [Cheng Lian] Adds more comments
6db1368 [Cheng Lian] HadoopFsRelation subclasses should set their output format class

(cherry picked from commit c337844)
Signed-off-by: Cheng Lian <lian@databricks.com>
asfgit closed this in c337844 Jun 25, 2015
liancheng deleted the spark-8604 branch June 25, 2015 07:08
liancheng
Contributor Author

Merged to master and branch-1.4.

3 participants