-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-34421][SQL] Resolve temporary functions and views in views with CTEs #31550
[SPARK-34421][SQL] Resolve temporary functions and views in views with CTEs #31550
Conversation
Kubernetes integration test starting |
Kubernetes integration test status success |
Thanks for the fix! Not sure whether @cloud-fan and @LinhongLiu will review it, because today is the lunar new year! |
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #135114 has finished for PR 31550 at commit
|
withTempView("temp_view") { | ||
sql("CREATE TEMPORARY VIEW temp_view AS SELECT 0") | ||
|
||
intercept[AnalysisException] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also add an assertion for the error message here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, added in 1cd1cb3
Test build #135116 has finished for PR 31550 at commit
|
cc @HyukjinKwon as this became a blocker of 3.1.1 |
Kubernetes integration test starting |
Kubernetes integration test status success |
Test build #135120 has finished for PR 31550 at commit
|
Thanks for pinging me @peter-toth. I will share it in the vote thread. |
@cloud-fan @LinhongLiu @allisonwang-db @maryannxue, it would be great if reviewing this is prioritized to unblock the release. |
@@ -557,23 +557,30 @@ object ViewHelper { | |||
private def collectTemporaryObjects( | |||
catalog: SessionCatalog, child: LogicalPlan): (Seq[Seq[String]], Seq[String]) = { | |||
def collectTempViews(child: LogicalPlan): Seq[Seq[String]] = { | |||
child.collect { | |||
child.flatMap { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@peter-toth, thanks for fixing this.
any particular reason to replace collect
to flatMap
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the same comment, too. IMO the previous one looks okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like it simplifies collect(...).flatten
to flatMap(...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as @cloud-fan explained
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with minor comments
@peter-toth, just read the description, do we have analysis error when creating the permanent view with CTE and refers temp objects before #30567 |
val tempViewName = "temp_view" | ||
|
||
withTempView(tempViewName) { | ||
spark.udf.register("temp_func", identity[Int](_)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use withUserDefinedFunction
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, fixed in: ce34b5e
@@ -3943,6 +3943,43 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark | |||
} | |||
} | |||
} | |||
|
|||
test("SPARK-34421: Resolve temporary functions and views in views with CTEs") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this issue only related to a temp function case? How about a permanent one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, permanent functions are not affected: #31550 (comment)
@@ -557,23 +557,30 @@ object ViewHelper { | |||
private def collectTemporaryObjects( | |||
catalog: SessionCatalog, child: LogicalPlan): (Seq[Seq[String]], Seq[String]) = { | |||
def collectTempViews(child: LogicalPlan): Seq[Seq[String]] = { | |||
child.collect { | |||
child.flatMap { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the same comment, too. IMO the previous one looks okay.
|""".stripMargin) | ||
} | ||
assert(e.message.contains(s"Not allowed to create a permanent view `default`.`$viewName` " + | ||
s"by referencing a temporary view $tempViewName.")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, this message looks correct, so v3.0.2
has the same issue?
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.0.2
/_/
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.2)
Type in expressions to have them evaluated.
Type :help for more information.
scala> val tempViewName = "temp_view"
scala> val viewName = "view_on_temp_view"
scala> sql(s"CREATE TEMPORARY VIEW $tempViewName AS SELECT 0")
scala> sql(
| s"""
| |CREATE VIEW $viewName AS
| |WITH cte AS (
| | SELECT * FROM $tempViewName
| |)
| |SELECT * FROM cte
| |""".stripMargin)
scala> spark.table(viewName).show()
+---+
| 0|
+---+
| 0|
+---+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I agree with you, the check came with #30567 that is a 3.1 fix only.
This looks like a bug yes, details: #31550 (comment)
val e = intercept[AnalysisException] { | ||
sql( | ||
s""" | ||
|CREATE VIEW $viewName AS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the bug happens in permanent view. Is it really a regression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part isn't, please see #31550 (comment) for details.
@@ -3943,6 +3943,43 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark | |||
} | |||
} | |||
} | |||
|
|||
test("SPARK-34421: Resolve temporary functions and views in views with CTEs") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the temp view, we should verify the view capture temp objects correctly, so you should also add a permanent view/function, table with the same name and see if "select * from temp_view" can return correct result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ignore my comment, I found that is not necessary after checked your code one more time.
So it looks like the test case caused a bit of confusion here so I've split it into 2 cases.
|
Just a correction, temporary objects check for permanent views are existing for a long time, not introduced by #30567. But we never check the CTE scenario which is fixed by this PR. You can try the below queries on 3.0
|
Indeed, thanks for the correction. |
} | ||
} | ||
|
||
test("SPARK-34421: Resolve temporary objects in permanent views with CTEs") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move the test to SQLViewTestSuite
, so that it's tested against temp view, global temp view, and permanent view automatically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because temp view and permanent view have different behaviors, it's not easy to add this to SQLViewTestSuite
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah I see, makes sense.
hmm why is that? Looking at the code I think temp view and function should both trigger this bug. |
BTW let's also put the error message caused by this bug in the PR description, so that people can easily understand what happens, instead of just seeing |
Test build #135222 has finished for PR 31550 at commit
|
Sure, added to the PR description.
No idea yet, but can take a look at it tomorrow. |
It will be great to figure this out, in case there are other bugs. It doesn't block this PR though. |
take a look at "why temp view refers temp view in CTE is working without this PR". Here is an example to explain: use default;
create table same_name as select 100 as a;
create temp view test_view as select * from same_name;
select * from test_view; -- output 100
create temp view same_name as select 99 as a; -- a temp view has the same name with the table
select * from test_view; -- output 100, expected, temp view should not override table
drop table default.same_name; -- drop the table
select * from test_view; -- output 99, wrong result, we expect table not found because we dropped it |
I created |
thanks, merging to master/3.1! |
…h CTEs ### What changes were proposed in this pull request? This PR: - Fixes a bug that prevents analysis of: ``` CREATE TEMPORARY VIEW temp_view AS WITH cte AS (SELECT temp_func(0)) SELECT * FROM cte; SELECT * FROM temp_view ``` by throwing: ``` Undefined function: 'temp_func'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'. ``` - and doesn't report analysis error when it should: ``` CREATE TEMPORARY VIEW temp_view AS SELECT 0; CREATE VIEW view_on_temp_view AS WITH cte AS (SELECT * FROM temp_view) SELECT * FROM cte ``` by properly collecting temporary objects from VIEW definitions with CTEs. - Minor refactor to make the affected code more readable. ### Why are the changes needed? To fix a bug introduced with #30567 ### Does this PR introduce _any_ user-facing change? Yes, the query works again. ### How was this patch tested? Added new UT + existing ones. Closes #31550 from peter-toth/SPARK-34421-temp-functions-in-views-with-cte. Authored-by: Peter Toth <peter.toth@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 27abb6a) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@peter-toth can you open backport PRs for 3.0/2.4 to fix the permanent view bug? thanks! |
Thanks for the review. I will open backport PRs soon. |
…s with CTEs This PR: - Fixes a bug that prevents analysis of: ``` CREATE TEMPORARY VIEW temp_view AS WITH cte AS (SELECT temp_func(0)) SELECT * FROM cte; SELECT * FROM temp_view ``` by throwing: ``` Undefined function: 'temp_func'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'. ``` - and doesn't report analysis error when it should: ``` CREATE TEMPORARY VIEW temp_view AS SELECT 0; CREATE VIEW view_on_temp_view AS WITH cte AS (SELECT * FROM temp_view) SELECT * FROM cte ``` by properly collecting temporary objects from VIEW definitions with CTEs. - Minor refactor to make the affected code more readable. To fix a bug introduced with apache#30567 Yes, the query works again. Added new UT + existing ones. Closes apache#31550 from peter-toth/SPARK-34421-temp-functions-in-views-with-cte. Authored-by: Peter Toth <peter.toth@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…s with CTEs This PR: - Fixes a bug that prevents analysis of: ``` CREATE TEMPORARY VIEW temp_view AS WITH cte AS (SELECT temp_func(0)) SELECT * FROM cte; SELECT * FROM temp_view ``` by throwing: ``` Undefined function: 'temp_func'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'. ``` - and doesn't report analysis error when it should: ``` CREATE TEMPORARY VIEW temp_view AS SELECT 0; CREATE VIEW view_on_temp_view AS WITH cte AS (SELECT * FROM temp_view) SELECT * FROM cte ``` by properly collecting temporary objects from VIEW definitions with CTEs. - Minor refactor to make the affected code more readable. To fix a bug introduced with apache#30567 Yes, the query works again. Added new UT + existing ones. Closes apache#31550 from peter-toth/SPARK-34421-temp-functions-in-views-with-cte. Authored-by: Peter Toth <peter.toth@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan, I've opened #31592 to backport the change to 3.0. |
@peter-toth No need, the function is 2.4 is |
If we decided to not backport SPARK-29630 to 2.4 at that time, we probably shouldn't backport this fix as well. 3.0 should be good enough. |
Yes, found that, but the recursive
Ok. |
What changes were proposed in this pull request?
This PR:
Fixes a bug that prevents analysis of:
by throwing:
and doesn't report analysis error when it should:
by properly collecting temporary objects from VIEW definitions with CTEs.
Minor refactor to make the affected code more readable.
Why are the changes needed?
To fix a bug introduced with #30567
Does this PR introduce any user-facing change?
Yes, the query works again.
How was this patch tested?
Added new UT + existing ones.