Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-36706][SQL][3.1] OverwriteByExpression conversion in DataSourceV2Strategy use wrong param in translateFilter #33997

Closed
wants to merge 5 commits into from

Conversation

huaxingao
Copy link
Contributor

@huaxingao huaxingao commented Sep 14, 2021

What changes were proposed in this pull request?

The wrong parameter is used in translateFilter in the following code

      val filters = splitConjunctivePredicates(deleteExpr).map {
        filter => DataSourceStrategy.translateFilter(deleteExpr,
          supportNestedPredicatePushdown = true).getOrElse(
            throw new AnalysisException(s"Cannot translate expression to source filter: $filter"))
      }.toArray

Using this as an example

spark.table("source2_t").writeTo("testcat.table_name").overwrite($"id1" === 3 && $"id2" === 3)

The above code will generate these filters:

And(EqualTo(id1, 3),EqualTo(id2, 3))  
And(EqualTo(id1, 3),EqualTo(id2, 3))

we want to fix the code so it will generate the filters like these:

EqualTo(id1, 3)  
EqualTo(id2, 3)

This problem only exists in 3.1. In 3.2 and 3.3, we have

      val filters = splitConjunctivePredicates(deleteExpr).flatMap { pred =>
        val filter = DataSourceStrategy.translateFilter(pred, supportNestedPredicatePushdown = true)
        if (filter.isEmpty) {
          throw QueryCompilationErrors.cannotTranslateExpressionToSourceFilterError(pred)
        }
        filter
      }.toArray

Why are the changes needed?

fix a bug in the code

Does this PR introduce any user-facing change?

no

How was this patch tested?

existing tests

@github-actions github-actions bot added the SQL label Sep 14, 2021
@@ -220,6 +220,33 @@ class DataFrameWriterV2Suite extends QueryTest with SharedSparkSession with Befo
Seq(Row(1L, "a"), Row(2L, "b"), Row(4L, "d"), Row(5L, "e"), Row(6L, "f")))
}

test("Overwrite: overwrite by expression: more than one filters") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add JIRA number?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added Jira number and also updated PR description

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fix looks correct.

@SparkQA
Copy link

SparkQA commented Sep 14, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47782/

@SparkQA
Copy link

SparkQA commented Sep 14, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47782/

@SparkQA
Copy link

SparkQA commented Sep 14, 2021

Test build #143280 has finished for PR 33997 at commit 954beb3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 14, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47787/

@SparkQA
Copy link

SparkQA commented Sep 14, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47787/

@SparkQA
Copy link

SparkQA commented Sep 15, 2021

Test build #143284 has finished for PR 33997 at commit ffb4f3a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented Sep 15, 2021

cc @sunchao @dongjoon-hyun

@viirya
Copy link
Member

viirya commented Sep 15, 2021

@huaxingao Could you also mention the reason why we only need fix it for 3.1 too in the description? Thanks.

@SparkQA
Copy link

SparkQA commented Sep 15, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47790/

@SparkQA
Copy link

SparkQA commented Sep 15, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47790/

@SparkQA
Copy link

SparkQA commented Sep 15, 2021

Test build #143287 has finished for PR 33997 at commit 45b20c0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -220,6 +220,33 @@ class DataFrameWriterV2Suite extends QueryTest with SharedSparkSession with Befo
Seq(Row(1L, "a"), Row(2L, "b"), Row(4L, "d"), Row(5L, "e"), Row(6L, "f")))
}

test("SPARK-36706 Overwrite: overwrite by expression: more than one filters") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we test that the filters are changed? it seems we only check query answers but I think they should pass no matter with or without the fix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the test would pass without the fix. but couldn't figure out an easy way to get the OverwriteByExpressionExec node and check the filters on it.

@dongjoon-hyun
Copy link
Member

Hi, @huaxingao .
Shall we remove the test case and merge this PR first?
I believe we can add the test case later when you are ready.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @huaxingao , @viirya , @sunchao .
Merged to branch-3.1.

dongjoon-hyun pushed a commit that referenced this pull request Sep 20, 2021
…eV2Strategy use wrong param in translateFilter

### What changes were proposed in this pull request?
The wrong parameter is used in `translateFilter` in the following code
```
      val filters = splitConjunctivePredicates(deleteExpr).map {
        filter => DataSourceStrategy.translateFilter(deleteExpr,
          supportNestedPredicatePushdown = true).getOrElse(
            throw new AnalysisException(s"Cannot translate expression to source filter: $filter"))
      }.toArray
```

Using this as an example
```
spark.table("source2_t").writeTo("testcat.table_name").overwrite($"id1" === 3 && $"id2" === 3)
```

The above code will generate these filters:
```
And(EqualTo(id1, 3),EqualTo(id2, 3))
And(EqualTo(id1, 3),EqualTo(id2, 3))
```

 we want to fix the code so it will generate the filters like these:
```
EqualTo(id1, 3)
EqualTo(id2, 3)
```

This problem only exists in 3.1. In 3.2 and 3.3, we have

```
      val filters = splitConjunctivePredicates(deleteExpr).flatMap { pred =>
        val filter = DataSourceStrategy.translateFilter(pred, supportNestedPredicatePushdown = true)
        if (filter.isEmpty) {
          throw QueryCompilationErrors.cannotTranslateExpressionToSourceFilterError(pred)
        }
        filter
      }.toArray
```

### Why are the changes needed?
fix a bug in the code

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
existing tests

Closes #33997 from huaxingao/spark-36706.

Authored-by: Huaxin Gao <huaxin_gao@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
@huaxingao
Copy link
Contributor Author

Thanks a lot! @dongjoon-hyun @sunchao @viirya

@huaxingao huaxingao deleted the spark-36706 branch September 20, 2021 20:28
@SparkQA
Copy link

SparkQA commented Sep 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47970/

@SparkQA
Copy link

SparkQA commented Sep 20, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/47970/

fishcus pushed a commit to fishcus/spark that referenced this pull request Jan 12, 2022
…eV2Strategy use wrong param in translateFilter

### What changes were proposed in this pull request?
The wrong parameter is used in `translateFilter` in the following code
```
      val filters = splitConjunctivePredicates(deleteExpr).map {
        filter => DataSourceStrategy.translateFilter(deleteExpr,
          supportNestedPredicatePushdown = true).getOrElse(
            throw new AnalysisException(s"Cannot translate expression to source filter: $filter"))
      }.toArray
```

Using this as an example
```
spark.table("source2_t").writeTo("testcat.table_name").overwrite($"id1" === 3 && $"id2" === 3)
```

The above code will generate these filters:
```
And(EqualTo(id1, 3),EqualTo(id2, 3))
And(EqualTo(id1, 3),EqualTo(id2, 3))
```

 we want to fix the code so it will generate the filters like these:
```
EqualTo(id1, 3)
EqualTo(id2, 3)
```

This problem only exists in 3.1. In 3.2 and 3.3, we have

```
      val filters = splitConjunctivePredicates(deleteExpr).flatMap { pred =>
        val filter = DataSourceStrategy.translateFilter(pred, supportNestedPredicatePushdown = true)
        if (filter.isEmpty) {
          throw QueryCompilationErrors.cannotTranslateExpressionToSourceFilterError(pred)
        }
        filter
      }.toArray
```

### Why are the changes needed?
fix a bug in the code

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
existing tests

Closes apache#33997 from huaxingao/spark-36706.

Authored-by: Huaxin Gao <huaxin_gao@apple.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants