[SPARK-34269][SQL][TESTS][FOLLOWUP] Test a subquery with view in aggregate's grouping expression #31352
Project(newOutput, child)
override def apply(plan: LogicalPlan): LogicalPlan = {
  AnalysisHelper.allowInvokingTransformsInAnalyzer {
    plan transformUp {
There is no change below this line. If you don't like the diff, I can change the PR to something like the following:
override def apply(plan: LogicalPlan): LogicalPlan = {
  AnalysisHelper.allowInvokingTransformsInAnalyzer {
    applyInternal(plan)
  }
}
private def applyInternal(plan: LogicalPlan): LogicalPlan = plan transformUp {
Or I would simply do:
AnalysisHelper.allowInvokingTransformsInAnalyzer { plan transformUp {
  ...
}}
Thanks @HyukjinKwon for the suggestion! Since `allowInvokingTransformsInAnalyzer` introduces one more level of indentation, I had to move it to align with `override`, which I am not sure is the style we want to follow. Please let me know!
r.write.saveAsTable("tr")
sql("create view vr as select * from tr")
checkAnswer(
  sql("select a, (select sum(d) from vr where a = c) sum_d from l l1 group by 1, 2"),
Note that the existing test with the same query worked fine because `r` is a dataframe temp view, which doesn't have the `View` node.
@imback82, is the example correct?
Spark 3.1.1 RC1
scala> sql("create temporary view ta(a, b) as select 1, 2")
scala> sql("create temporary view tc(c, d) as select 1, 2")
scala> sql("select a, (select sum(d) from tc where a = c) sum_d from ta group by 1, 2").show
+---+-----+
|  a|sum_d|
+---+-----+
|  1|    2|
+---+-----+
scala> spark.version
res3: String = 3.1.1
Spark 3.2.0
scala> sql("create temporary view ta(a, b) as select 1, 2")
scala> sql("create temporary view tc(c, d) as select 1, 2")
scala> sql("select a, (select sum(d) from tc where a = c) sum_d from ta group by 1, 2").show
+---+-----+
|  a|sum_d|
+---+-----+
|  1|    2|
+---+-----+
scala> spark.version
res3: String = 3.2.0-SNAPSHOT
Thanks @dongjoon-hyun for checking. Let me double-check and get back to you.
Thanks. Do I perhaps need to set some configuration?
Test build #134519 has finished for PR 31352 at commit
Ah, it's failing only under tests:
protected def assertNotAnalysisRule(): Unit = {
  if (Utils.isTesting &&
      AnalysisHelper.inAnalyzer.get > 0 &&
      AnalysisHelper.resolveOperatorDepth.get == 0) {
    throw new RuntimeException("This method should not be called in the analyzer")
  }
}
Is this OK?
@cloud-fan, is it an issue if this fails only under tests (the `assertNotAnalysisRule()` check)?
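To make the behaviour discussed above concrete, here is a minimal, self-contained sketch of how such a guard could work. It is a simplified model, not Spark's actual code: `GuardModel`, the mutable `isTesting` flag, `markInAnalyzer`, and the string-based `transformUp` are hypothetical stand-ins, and the counter logic is an assumption based only on the names in the quoted snippet.

```scala
// Simplified model of the test-only guard quoted in this thread.
// NOT Spark's implementation: every name here is an illustrative stand-in.
object GuardModel {
  // Stand-in for Utils.isTesting (assumed to be true only in test runs).
  var isTesting: Boolean = true

  // Per-thread counters, named after the fields in the quoted snippet.
  private val inAnalyzer = new ThreadLocal[Int] {
    override def initialValue(): Int = 0
  }
  private val resolveOperatorDepth = new ThreadLocal[Int] {
    override def initialValue(): Int = 0
  }

  // Hypothetical helper: marks the enclosed block as running inside the analyzer.
  def markInAnalyzer[T](f: => T): T = {
    val depth = inAnalyzer.get
    inAnalyzer.set(depth + 1)
    try f finally inAnalyzer.set(depth)
  }

  // Lifts the guard for the enclosed block by raising the depth counter.
  def allowInvokingTransformsInAnalyzer[T](f: => T): T = {
    resolveOperatorDepth.set(resolveOperatorDepth.get + 1)
    try f finally resolveOperatorDepth.set(resolveOperatorDepth.get - 1)
  }

  // Same condition as the quoted check: throws only in tests, inside the
  // analyzer, and when the guard has not been lifted.
  def assertNotAnalysisRule(): Unit = {
    if (isTesting && inAnalyzer.get > 0 && resolveOperatorDepth.get == 0) {
      throw new RuntimeException("This method should not be called in the analyzer")
    }
  }

  // Stand-in for a guarded transform: checks the guard, then does some work.
  def transformUp(plan: String): String = {
    assertNotAnalysisRule()
    plan.toUpperCase
  }

  def main(args: Array[String]): Unit = {
    import scala.util.Try
    // Inside the analyzer with the guard active.
    println(Try(markInAnalyzer { transformUp("plan") }))
    // Same call, wrapped so that the guard is lifted.
    println(Try(markInAnalyzer {
      allowInvokingTransformsInAnalyzer { transformUp("plan") }
    }))
    // Outside a test run the guard is never evaluated as failing.
    isTesting = false
    println(Try(markInAnalyzer { transformUp("plan") }))
  }
}
```

Under these assumptions, only the first call fails. That is consistent with the observation above that the query works in a regular shell session and fails only under tests, and with the PR's approach of wrapping the rule body in `allowInvokingTransformsInAnalyzer`.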
Let's remove the regression mentioned in the PR description. It doesn't seem to be a regression.
In Spark 3.1.1-RC, if we start |
Kubernetes integration test starting
Kubernetes integration test status success
Test build #134536 has finished for PR 31352 at commit
Thank you, @imback82, @HyukjinKwon, and @cloud-fan.
Thanks all! I'll close this PR and check the other one.
### What changes were proposed in this pull request?
Currently, SQL (temp or permanent) view resolution is done in 2 steps:
1. In `SessionCatalog`, we get the view metadata, parse the view SQL string, and wrap it with `View`.
2. At the beginning of the optimizer, we run `EliminateView`, which drops the wrapper `View` and applies some special logic to match the view schema.

Step 2 is tricky, as we need to retain the output attr expr id, while we need to add an extra `Project` to add cast and alias. This PR simplifies view resolution by building a complete plan (with cast and alias added) in `SessionCatalog`, so that we only have 1 step.
### Why are the changes needed?
Code simplification. It also fixes issues like apache#31352
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
existing tests

Closes apache#31368 from cloud-fan/try.
Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Hi @imback82, can you re-open the PR to add the tests? Thanks!
Kubernetes integration test starting
Kubernetes integration test status success
Updated, thanks!
Test build #134678 has finished for PR 31352 at commit
+1, LGTM. Merged to master.
…egate's grouping expression
### What changes were proposed in this pull request?
This PR is a follow-up to apache#31368 to add a test case that has a subquery with "view" in aggregate's grouping expression. The existing test tests a subquery with dataframe's temp view, so it doesn't contain a `View` node.
### Why are the changes needed?
To increase the test coverage.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added a new test.

Closes apache#31352 from imback82/grouping_expr.
Authored-by: Terry Kim <yuminkim@gmail.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
What changes were proposed in this pull request?
This PR is a follow-up to #31368 that adds a test case with a subquery over a view in an aggregate's grouping expression. The existing test covers a subquery over a dataframe temp view, so it doesn't contain a `View` node.
Why are the changes needed?
To increase the test coverage.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added a new test.