
[WIP][SPARK-36995][SQL] Add SQLFileCommitProtocol#34271

Closed
AngersZhuuuu wants to merge 1 commit into apache:master from AngersZhuuuu:SPARK-36995

@AngersZhuuuu
Contributor

What changes were proposed in this pull request?

Currently, FileCommitProtocol is used for all insert jobs. For the SQL part, we should support more commit strategies, such as writing data to a staging directory and then moving it to the target path. Such a commit protocol would support more DataSource insert cases.
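A minimal sketch of the staging-then-move strategy, assuming a local filesystem accessed through `java.nio.file` instead of Hadoop's `FileSystem`. The class `StagingDirCommit` and its methods are hypothetical, not Spark APIs, and the sketch assumes a flat staging directory (no partition subdirectories).

```scala
import java.nio.file.{Files, Path, StandardCopyOption}
import scala.jdk.CollectionConverters._

// Hypothetical sketch: tasks write into `stagingDir`; only on job commit are
// the files moved into `targetDir`, so the target never holds partial output.
// Assumes a flat staging directory (no partition subdirectories).
final class StagingDirCommit(stagingDir: Path, targetDir: Path) {

  // Task side: each task writes its output file under the staging directory.
  def writeTaskFile(name: String, contents: String): Path = {
    Files.createDirectories(stagingDir)
    Files.writeString(stagingDir.resolve(name), contents)
  }

  // Lists the staged files eagerly so the directory stream is closed before we mutate it.
  private def stagedFiles(): List[Path] = {
    val stream = Files.list(stagingDir)
    try stream.iterator().asScala.toList finally stream.close()
  }

  // Job commit: move every staged file to the target path, then remove the staging dir.
  def commitJob(): Unit = {
    Files.createDirectories(targetDir)
    stagedFiles().foreach { src =>
      Files.move(src, targetDir.resolve(src.getFileName), StandardCopyOption.REPLACE_EXISTING)
    }
    Files.delete(stagingDir)
  }

  // Job abort: drop the staged output; the target path is never touched.
  def abortJob(): Unit =
    if (Files.exists(stagingDir)) {
      stagedFiles().foreach(p => Files.delete(p))
      Files.delete(stagingDir)
    }
}

// Usage: two "tasks" write into staging, then the job commits.
val root = Files.createTempDirectory("commit-demo")
val commit = new StagingDirCommit(root.resolve(".staging"), root.resolve("target"))
commit.writeTaskFile("part-00000", "a")
commit.writeTaskFile("part-00001", "b")
commit.commitJob()
```

A real implementation would also need to handle nested partition directories and the atomicity guarantees of the underlying filesystem, which this sketch ignores.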

I plan to do this in the following steps:

  1. Add SQLFileCommitProtocol, which inherits from FileCommitProtocol, and use it throughout the SQL module in place of FileCommitProtocol. This protocol adds two necessary APIs, getOutputPath and getWorkPath, similar to PathOutputCommitter. Move the SQL-related logic into SQLHadoopMapReduceCommitProtocol and make it inherit from SQLFileCommitProtocol.
  2. Allow users to customize the staging directory used by dynamic partition inserts.
  3. Add a new commit protocol that supports writing data to a staging directory and then moving it to the target path.
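A rough, dependency-free sketch of the hierarchy in step 1, showing what getOutputPath and getWorkPath would distinguish. `FileCommitProtocolStub` is a hypothetical stand-in for Spark's FileCommitProtocol, `java.nio.file.Path` stands in for Hadoop's `Path`, and every class name here is illustrative only; the real protocols have many more methods.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical stand-in for FileCommitProtocol; only one method is modeled.
abstract class FileCommitProtocolStub {
  // Returns the path a task should write a given file to.
  def newTaskTempFile(fileName: String): Path
}

// Sketch of the proposed SQL base class: it reports where the final output
// lives (output path) and where tasks actually write (work path).
abstract class SQLFileCommitProtocolSketch(outputPath: Path) extends FileCommitProtocolStub {
  def getOutputPath(): Path = outputPath
  def getWorkPath(): Path
  override def newTaskTempFile(fileName: String): Path = getWorkPath().resolve(fileName)
}

// Writes directly into the target: work path == output path.
class DirectSQLCommitProtocol(target: Path) extends SQLFileCommitProtocolSketch(target) {
  override def getWorkPath(): Path = getOutputPath()
}

// Writes into a staging directory that a later job commit moves to the target (step 3).
class StagingSQLCommitProtocol(target: Path, stagingDir: Path) extends SQLFileCommitProtocolSketch(target) {
  override def getWorkPath(): Path = stagingDir
}

val out = Paths.get("/warehouse/tbl")
val direct = new DirectSQLCommitProtocol(out)
val staging = new StagingSQLCommitProtocol(out, Paths.get("/warehouse/tbl/.staging-job1"))
```

Both protocols report the same output path, so SQL-side code could rely on getOutputPath for the final location while each protocol decides independently where tasks write.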

Why are the changes needed?

Refactor the SQL file commit logic so that it can support more features.

Does this PR introduce any user-facing change?

Users' custom commit protocols for SQL may need to be changed to inherit from SQLFileCommitProtocol.

How was this patch tested?

Existing UTs.

@AngersZhuuuu
Contributor Author

An alternative is to simply add the following APIs to FileCommitProtocol:


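```scala
import org.apache.hadoop.fs.Path

// Proposed additions to FileCommitProtocol. The null defaults mean existing
// custom subclasses keep compiling without overriding anything.
def getOutputPath(): Path = null

def getWorkPath(): Path = null

def hasOutputPath(): Boolean = getOutputPath() != null
```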

With these defaults, existing custom FileCommitProtocol implementations stay compatible, and we can still implement more functionality.
WDYT @dongjoon-hyun @HyukjinKwon @viirya @cloud-fan @maropu @Ngone51
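A small sketch of how the default-null alternative keeps older subclasses working. `CommitProtocolWithPaths` is a hypothetical, dependency-free stand-in for FileCommitProtocol with only the three proposed methods, `java.nio.file.Path` replaces Hadoop's `Path`, and `describe` is a hypothetical caller.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical stand-in for FileCommitProtocol carrying only the proposed defaults.
abstract class CommitProtocolWithPaths {
  def getOutputPath(): Path = null
  def getWorkPath(): Path = null
  def hasOutputPath(): Boolean = getOutputPath() != null
}

// A pre-existing custom protocol: it overrides none of the new methods.
class LegacyCustomProtocol extends CommitProtocolWithPaths

// A newer protocol that opts in by reporting its paths.
class PathAwareProtocol(output: Path, work: Path) extends CommitProtocolWithPaths {
  override def getOutputPath(): Path = output
  override def getWorkPath(): Path = work
}

// Hypothetical caller: use path-based logic only when the protocol exposes paths.
def describe(p: CommitProtocolWithPaths): String =
  if (p.hasOutputPath()) s"writes to ${p.getWorkPath()}, commits to ${p.getOutputPath()}"
  else "no path information; fall back to existing behavior"

val legacy = new LegacyCustomProtocol()
val aware = new PathAwareProtocol(Paths.get("/out"), Paths.get("/out/_tmp"))
```

The null defaults mirror the proposal; returning `Option[Path]` would be the more idiomatic Scala choice, but it would change the proposed signatures.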

@SparkQA

SparkQA commented Oct 13, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48672/

@SparkQA

SparkQA commented Oct 13, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48672/

@SparkQA

SparkQA commented Oct 13, 2021

Test build #144194 has finished for PR 34271 at commit 152ec71.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • abstract class SQLFileCommitProtocol extends FileCommitProtocol


2 participants