Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-34269][SQL][TESTS][FOLLOWUP] Test a subquery with view in aggregate's grouping expression #31352

Closed
wants to merge 5 commits into from

Conversation

imback82
Copy link
Contributor

@imback82 imback82 commented Jan 26, 2021

What changes were proposed in this pull request?

This PR is a follow-up to #31368 to add a test case that has a subquery with "view" in aggregate's grouping expression. The existing test tests a subquery with dataframe's temp view, so it doesn't contain a View node.

Why are the changes needed?

To increase the test coverage.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added a new test.

Project(newOutput, child)
override def apply(plan: LogicalPlan): LogicalPlan = {
AnalysisHelper.allowInvokingTransformsInAnalyzer {
plan transformUp {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no change below this line. If you don't like the diff, I can change the PR to something like the following:

  override def apply(plan: LogicalPlan): LogicalPlan = {
    AnalysisHelper.allowInvokingTransformsInAnalyzer {
      applyInternal(plan)
    }
  }

  private def applyInternal(plan: LogicalPlan): LogicalPlan = plan transformUp {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or I would just simply do:

AnalysisHelper.allowInvokingTransformsInAnalyzer { plan transformUp {
  ...
}}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @HyukjinKwon for the suggestion! Since allowInvokingTransformsInAnalyzer introduces one more indentation, I had to move it to align with override, which I am not sure is the style we want to follow. Please let me know!

r.write.saveAsTable("tr")
sql("create view vr as select * from tr")
checkAnswer(
sql("select a, (select sum(d) from vr where a = c) sum_d from l l1 group by 1, 2"),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the existing test with the same query worked fine because r is a dataframe temp view, which doesn't have the View node.

@github-actions github-actions bot added the SQL label Jan 26, 2021
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@imback82 . Is the example correct?
Spark 3.1.1 RC1

scala> sql("create temporary view ta(a, b) as select 1, 2")

scala> sql("create temporary view tc(c, d) as select 1, 2")

scala> sql("select a, (select sum(d) from tc where a = c) sum_d from ta group by 1, 2").show
+---+-----+
|  a|sum_d|
+---+-----+
|  1|    2|
+---+-----+

scala> spark.version
res3: String = 3.1.1

Spark 3.2.0

scala> sql("create temporary view ta(a, b) as select 1, 2")

scala> sql("create temporary view tc(c, d) as select 1, 2")

scala> sql("select a, (select sum(d) from tc where a = c) sum_d from ta group by 1, 2").show
+---+-----+
|  a|sum_d|
+---+-----+
|  1|    2|
+---+-----+

scala> spark.version
res3: String = 3.2.0-SNAPSHOT 

@imback82
Copy link
Contributor Author

imback82 commented Jan 26, 2021

Thanks @dongjoon-hyun for checking. Let me double-check and get back to you.

@dongjoon-hyun
Copy link
Member

Thanks. Maybe, do I need to use some configuration?

@SparkQA
Copy link

SparkQA commented Jan 27, 2021

Test build #134519 has finished for PR 31352 at commit 651a2ec.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imback82
Copy link
Contributor Author

Ah, it's failing only under tests:

  protected def assertNotAnalysisRule(): Unit = {
    if (Utils.isTesting &&
        AnalysisHelper.inAnalyzer.get > 0 &&
        AnalysisHelper.resolveOperatorDepth.get == 0) {
      throw new RuntimeException("This method should not be called in the analyzer")
    }
  }

Utils.isTesting is true if SPARK_TESTING env variable is set.

Is this OK?

@imback82
Copy link
Contributor Author

@cloud-fan Is this an issue if it fails only under tests (assertNotAnalysisRule() check), or no?

@HyukjinKwon
Copy link
Member

Let's remove the regression mentioned in PR description. Seems like it's not a regression.

@imback82
Copy link
Contributor Author

Let's remove the regression mentioned in PR description. Seems like it's not a regression.

In Spark 3.1.1-RC, if we start spark-shell with the SPARK_TESTING=true env variable, we see the exception whereas we don't see it in Spark 3.0. Since the exception is thrown only for "testing" mode, this would not be considered a regression?

@SparkQA
Copy link

SparkQA commented Jan 27, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39121/

@SparkQA
Copy link

SparkQA commented Jan 27, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39121/

@SparkQA
Copy link

SparkQA commented Jan 27, 2021

Test build #134536 has finished for PR 31352 at commit 7016fa5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

@imback82 thanks for catching this issue! Since it only affects tests, I don't believe it's a real bug.

I think calling a rule in doCanonicalize is tricky, so I opened a PR to clean up this: #31368 , please take a look.

@dongjoon-hyun
Copy link
Member

Thank you, @imback82 , @HyukjinKwon , and @cloud-fan .
Ya, @cloud-fan 's new PR might be a better approach.

@imback82
Copy link
Contributor Author

Thanks all! I'll close this PR and check the other one.

@imback82 imback82 closed this Jan 27, 2021
@imback82 imback82 deleted the grouping_expr branch January 27, 2021 20:07
HyukjinKwon pushed a commit to HyukjinKwon/spark that referenced this pull request Jan 29, 2021
### What changes were proposed in this pull request?

The currently SQL (temp or permanent) view resolution is done in 2 steps:
1. In `SessionCatalog`, we get the view metadata, parse the view SQL string, and wrap it with `View`.
2. At the beginning of the optimizer, we run `EliminateView`, which drops the wrapper `View`, and apply some special logic to match the view schema.

Step 2 is tricky, as we need to retain the output attr expr id, while we need to add an extra `Project` to add cast and alias. This PR simplifies the view solution by building a completed plan (with cast and alias added) in `SessionCatalog`, so that we only have 1 step.

### Why are the changes needed?

Code simplification. It also fixes issues like apache#31352

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

existing tests

Closes apache#31368 from cloud-fan/try.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan
Copy link
Contributor

Hi @imback82 , can you re-open the PR to add the tests? thanks!

@imback82 imback82 changed the title [SPARK-34252][SQL] Subquery with view in aggregate's grouping expression fails during the analysis check [SPARK-34269][SQL][TESTS][FOLLOWUP] Test a subquery with view in aggregate's grouping expression Jan 30, 2021
@imback82 imback82 reopened this Jan 30, 2021
@SparkQA
Copy link

SparkQA commented Jan 30, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39265/

@SparkQA
Copy link

SparkQA commented Jan 30, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39265/

@imback82
Copy link
Contributor Author

Hi @imback82 , can you re-open the PR to add the tests? thanks!

Updated, thanks!

@SparkQA
Copy link

SparkQA commented Jan 31, 2021

Test build #134678 has finished for PR 31352 at commit 76772e7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Merged to master.

skestle pushed a commit to skestle/spark that referenced this pull request Feb 3, 2021
### What changes were proposed in this pull request?

The currently SQL (temp or permanent) view resolution is done in 2 steps:
1. In `SessionCatalog`, we get the view metadata, parse the view SQL string, and wrap it with `View`.
2. At the beginning of the optimizer, we run `EliminateView`, which drops the wrapper `View`, and apply some special logic to match the view schema.

Step 2 is tricky, as we need to retain the output attr expr id, while we need to add an extra `Project` to add cast and alias. This PR simplifies the view solution by building a completed plan (with cast and alias added) in `SessionCatalog`, so that we only have 1 step.

### Why are the changes needed?

Code simplification. It also fixes issues like apache#31352

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

existing tests

Closes apache#31368 from cloud-fan/try.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
skestle pushed a commit to skestle/spark that referenced this pull request Feb 3, 2021
…egate's grouping expression

### What changes were proposed in this pull request?

This PR is a follow-up to apache#31368 to add a test case that has a subquery with "view" in aggregate's grouping expression. The existing test tests a subquery with dataframe's temp view, so it doesn't contain a `View` node.

### Why are the changes needed?

To increase the test coverage.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added a new test.

Closes apache#31352 from imback82/grouping_expr.

Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants