Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-32838][SQL]Check DataSource insert command path with actual path #30057

Closed
wants to merge 12 commits into from

Conversation

AngersZhuuuu
Copy link
Contributor

@AngersZhuuuu AngersZhuuuu commented Oct 15, 2020

What changes were proposed in this pull request?

Currently, we verify path in DataSourceAnalysis

// For dynamic partition overwrite, we do not delete partition directories ahead.
// We write to staging directories and move to final partition directories after writing
// job is done. So it is ok to have outputPath try to overwrite inputpath.
 if (overwrite && !insertCommand.dynamicPartitionOverwrite) {
      DDLUtils.verifyNotReadPath(actualQuery, outputPath)
}

/**
   * Throws exception if outputPath tries to overwrite inputpath.
   */
  def verifyNotReadPath(query: LogicalPlan, outputPath: Path) : Unit = {
    val inputPaths = query.collect {
      case LogicalRelation(r: HadoopFsRelation, _, _, _) =>
        r.location.rootPaths
    }.flatten

    if (inputPaths.contains(outputPath)) {
      throw new AnalysisException(
        "Cannot overwrite a path that is also being read from.")
    }
  }

For partition table, we can know that both outputPath and collected inputPaths are table's , not partition's path.
Actually, when we overwrite table A's partition a1 select from partition a2, then path won't conflict. But it compare both use table's path, then throw AnalysisException.

So we need to check this after Optimizer(got partition pushed down), If we don't care about dynamicPartitionFilter, we can just add a check rule about this issue on SparkPlan using FileSourceScanExec.selectedPartitions.

Why are the changes needed?

Fix bug

Does this PR introduce any user-facing change?

No

How was this patch tested?

UT

@AngersZhuuuu
Copy link
Contributor Author

FYI @dongjoon-hyun @cloud-fan

@SparkQA
Copy link

SparkQA commented Oct 15, 2020

Test build #129842 has finished for PR 30057 at commit 9eb5073.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 15, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34447/

@SparkQA
Copy link

SparkQA commented Oct 15, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34448/

@SparkQA
Copy link

SparkQA commented Oct 15, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34447/

@SparkQA
Copy link

SparkQA commented Oct 15, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34448/

@SparkQA
Copy link

SparkQA commented Oct 16, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34478/

@SparkQA
Copy link

SparkQA commented Oct 16, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34478/

@SparkQA
Copy link

SparkQA commented Oct 16, 2020

Test build #129873 has finished for PR 30057 at commit 620956d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 16, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34480/

@SparkQA
Copy link

SparkQA commented Oct 16, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34480/

@SparkQA
Copy link

SparkQA commented Oct 16, 2020

Test build #129875 has finished for PR 30057 at commit 199aa8f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 17, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34540/

@SparkQA
Copy link

SparkQA commented Oct 17, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34540/

@SparkQA
Copy link

SparkQA commented Oct 17, 2020

Test build #129935 has finished for PR 30057 at commit 9d9ab88.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

throw new AnalysisException(
s"Cannot overwrite a path that is also being read from.")
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't care about dynamicPartitionFilter, we can just add a check rule about this issue on SparkPlan using selectedpartition.

@AngersZhuuuu
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Oct 17, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34549/

@SparkQA
Copy link

SparkQA commented Oct 17, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34549/

@SparkQA
Copy link

SparkQA commented Oct 17, 2020

Test build #129943 has finished for PR 30057 at commit 9d9ab88.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Copy link
Contributor Author

ping @dongjoon-hyun @cloud-fan

@dongjoon-hyun
Copy link
Member

cc @viirya

@viirya
Copy link
Member

viirya commented Oct 19, 2020

Why are the changes needed?
Fix bug

Can you describe clearly in the PR description what bug this fixes? The PR description is important for reviewers to understand the issue quickly. Please describe the issue the PR tries to fix in details. Thanks.

@SparkQA
Copy link

SparkQA commented Mar 29, 2021

Test build #136640 has finished for PR 30057 at commit 806aab1.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 29, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41222/

@yaooqinn
Copy link
Member

retest this please

@SparkQA
Copy link

SparkQA commented Mar 30, 2021

Test build #136710 has finished for PR 30057 at commit 806aab1.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 30, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41292/

@SparkQA
Copy link

SparkQA commented Mar 31, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41333/

@SparkQA
Copy link

SparkQA commented Mar 31, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41333/

@SparkQA
Copy link

SparkQA commented Mar 31, 2021

Test build #136751 has finished for PR 30057 at commit 81b1bd8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Apr 2, 2021

Kubernetes integration test unable to build dist.

exiting with code: 1
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41427/

@SparkQA
Copy link

SparkQA commented Apr 2, 2021

Test build #136849 has finished for PR 30057 at commit 81b1bd8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions
Copy link

Test build #751743803 for PR 30057 at commit 979e314.

@SparkQA
Copy link

SparkQA commented Apr 15, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41991/

@SparkQA
Copy link

SparkQA commented Apr 15, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41991/

@SparkQA
Copy link

SparkQA commented Apr 15, 2021

Test build #137415 has finished for PR 30057 at commit 979e314.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AngersZhuuuu
Copy link
Contributor Author

Gentle ping

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Aug 19, 2021
@AngersZhuuuu
Copy link
Contributor Author

gentle ping @cloud-fan @viirya

@AngersZhuuuu
Copy link
Contributor Author

retest this please

@SparkQA
Copy link

SparkQA commented Aug 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47167/

@SparkQA
Copy link

SparkQA commented Aug 20, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47167/

@SparkQA
Copy link

SparkQA commented Aug 20, 2021

Test build #142666 has finished for PR 30057 at commit 979e314.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions github-actions bot closed this Aug 21, 2021
@CHENXCHEN
Copy link

@AngersZhuuuu any update about this, thanks!

@shrprasa
Copy link
Contributor

shrprasa commented Apr 6, 2023

@AngersZhuuuu @cloud-fan We are facing this same issue with Spark 3.2. can we get this PR merged if there are no concerns?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
8 participants