Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-34421][SQL] Resolve temporary functions and views in views with CTEs #31550

Conversation

peter-toth
Copy link
Contributor

@peter-toth peter-toth commented Feb 11, 2021

What changes were proposed in this pull request?

This PR:

  • Fixes a bug that prevents analysis of:

    CREATE TEMPORARY VIEW temp_view AS WITH cte AS (SELECT temp_func(0)) SELECT * FROM cte;
    SELECT * FROM temp_view
    

    by throwing:

    Undefined function: 'temp_func'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.
    
  • and doesn't report analysis error when it should:

    CREATE TEMPORARY VIEW temp_view AS SELECT 0;
    CREATE VIEW view_on_temp_view AS WITH cte AS (SELECT * FROM temp_view) SELECT * FROM cte
    

    by properly collecting temporary objects from VIEW definitions with CTEs.

  • Minor refactor to make the affected code more readable.

Why are the changes needed?

To fix a bug introduced with #30567

Does this PR introduce any user-facing change?

Yes, the query works again.

How was this patch tested?

Added new UT + existing ones.

@github-actions github-actions bot added the SQL label Feb 11, 2021
@peter-toth
Copy link
Contributor Author

cc @linhongliu-db, @cloud-fan

@SparkQA
Copy link

SparkQA commented Feb 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39696/

@SparkQA
Copy link

SparkQA commented Feb 11, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39696/

@peter-toth peter-toth changed the title [SPARK-34421][SQL] Resolve temporary functions and views in views with CTEs [WIP][SPARK-34421][SQL] Resolve temporary functions and views in views with CTEs Feb 11, 2021
@peter-toth peter-toth changed the title [WIP][SPARK-34421][SQL] Resolve temporary functions and views in views with CTEs [SPARK-34421][SQL] Resolve temporary functions and views in views with CTEs Feb 11, 2021
@gatorsmile
Copy link
Member

Thanks for the fix! Not sure whether @cloud-fan and @LinhongLiu will review it, because today is the lunar new year!

CC @allisonwang-db @maryannxue

@SparkQA
Copy link

SparkQA commented Feb 11, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39698/

@SparkQA
Copy link

SparkQA commented Feb 11, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39698/

@SparkQA
Copy link

SparkQA commented Feb 11, 2021

Test build #135114 has finished for PR 31550 at commit 1f13cfe.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

withTempView("temp_view") {
sql("CREATE TEMPORARY VIEW temp_view AS SELECT 0")

intercept[AnalysisException] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also add an assertion for the error message here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, added in 1cd1cb3

@SparkQA
Copy link

SparkQA commented Feb 11, 2021

Test build #135116 has finished for PR 31550 at commit f8564cb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@peter-toth
Copy link
Contributor Author

cc @HyukjinKwon as this became a blocker of 3.1.1

@SparkQA
Copy link

SparkQA commented Feb 12, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39701/

@SparkQA
Copy link

SparkQA commented Feb 12, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39701/

@SparkQA
Copy link

SparkQA commented Feb 12, 2021

Test build #135120 has finished for PR 31550 at commit 1cd1cb3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Thanks for pinging me @peter-toth. I will share it in the vote thread.

@HyukjinKwon
Copy link
Member

@cloud-fan @LinhongLiu @allisonwang-db @maryannxue, it would be great if reviewing this is prioritized to unblock the release.

@HyukjinKwon
Copy link
Member

@@ -557,23 +557,30 @@ object ViewHelper {
private def collectTemporaryObjects(
catalog: SessionCatalog, child: LogicalPlan): (Seq[Seq[String]], Seq[String]) = {
def collectTempViews(child: LogicalPlan): Seq[Seq[String]] = {
child.collect {
child.flatMap {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peter-toth, thanks for fixing this.
any particular reason to replace collect to flatMap?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the same comment, too. IMO the previous one looks okay.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like it simplifies collect(...).flatten to flatMap(...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as @cloud-fan explained

Copy link
Contributor

@linhongliu-db linhongliu-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with minor comments

@linhongliu-db
Copy link
Contributor

@peter-toth, just read the description, do we have analysis error when creating the permanent view with CTE and refers temp objects before #30567

val tempViewName = "temp_view"

withTempView(tempViewName) {
spark.udf.register("temp_func", identity[Int](_))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use withUserDefinedFunction?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, fixed in: ce34b5e

@@ -3943,6 +3943,43 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
}
}
}

test("SPARK-34421: Resolve temporary functions and views in views with CTEs") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this issue only related to a temp function case? How about a permanent one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, permanent functions are not affected: #31550 (comment)

@@ -557,23 +557,30 @@ object ViewHelper {
private def collectTemporaryObjects(
catalog: SessionCatalog, child: LogicalPlan): (Seq[Seq[String]], Seq[String]) = {
def collectTempViews(child: LogicalPlan): Seq[Seq[String]] = {
child.collect {
child.flatMap {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the same comment, too. IMO the previous one looks okay.

|""".stripMargin)
}
assert(e.message.contains(s"Not allowed to create a permanent view `default`.`$viewName` " +
s"by referencing a temporary view $tempViewName."))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, this message looks correct, so v3.0.2 has the same issue?

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.2
      /_/
         
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.2)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val tempViewName = "temp_view"
scala> val viewName = "view_on_temp_view"
scala> sql(s"CREATE TEMPORARY VIEW $tempViewName AS SELECT 0")
scala> sql(
     |   s"""
     |     |CREATE VIEW $viewName AS
     |     |WITH cte AS (
     |     |  SELECT * FROM $tempViewName
     |     |)
     |     |SELECT * FROM cte
     |     |""".stripMargin)

scala> spark.table(viewName).show()
+---+                                                                           
|  0|
+---+
|  0|
+---+

Copy link
Contributor Author

@peter-toth peter-toth Feb 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree with you, the check came with #30567 that is a 3.1 fix only.
This looks like a bug yes, details: #31550 (comment)

val e = intercept[AnalysisException] {
sql(
s"""
|CREATE VIEW $viewName AS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the bug happens in permanent view. Is it really a regression?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part isn't, please see #31550 (comment) for details.

@@ -3943,6 +3943,43 @@ class SQLQuerySuite extends QueryTest with SharedSparkSession with AdaptiveSpark
}
}
}

test("SPARK-34421: Resolve temporary functions and views in views with CTEs") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the temp view, we should verify the view capture temp objects correctly, so you should also add a permanent view/function, table with the same name and see if "select * from temp_view" can return correct result

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ignore my comment, I found that is not necessary after checked your code one more time.

@peter-toth
Copy link
Contributor Author

peter-toth commented Feb 18, 2021

So it looks like the test case caused a bit of confusion here so I've split it into 2 cases.

@linhongliu-db
Copy link
Contributor

SPARK-34421: Resolve temporary objects in permanent views with CTEs tests the non regression part. That "temp object in permanent one" check came with #30567 but it doesn't work on permanent views with CTEs without this fix.

Just a correction, temporary objects check for permanent views are existing for a long time, not introduced by #30567. But we never check the CTE scenario which is fixed by this PR. You can try the below queries on 3.0

CREATE TEMPORARY VIEW v0 AS SELECT 0
CREATE VIEW v1 AS SELECT FROM v0

@peter-toth
Copy link
Contributor Author

SPARK-34421: Resolve temporary objects in permanent views with CTEs tests the non regression part. That "temp object in permanent one" check came with #30567 but it doesn't work on permanent views with CTEs without this fix.

Just a correction, temporary objects check for permanent views are existing for a long time, not introduced by #30567. But we never check the CTE scenario which is fixed by this PR. You can try the below queries on 3.0

CREATE TEMPORARY VIEW v0 AS SELECT 0
CREATE VIEW v1 AS SELECT FROM v0

Indeed, thanks for the correction.

}
}

test("SPARK-34421: Resolve temporary objects in permanent views with CTEs") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move the test to SQLViewTestSuite, so that it's tested against temp view, global temp view, and permanent view automatically?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because temp view and permanent view have different behaviors, it's not easy to add this to SQLViewTestSuite

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah I see, makes sense.

@cloud-fan
Copy link
Contributor

Please note that the temp view in temp view part does work even without this fix

hmm why is that? Looking at the code I think temp view and function should both trigger this bug.

@cloud-fan
Copy link
Contributor

BTW let's also put the error message caused by this bug in the PR description, so that people can easily understand what happens, instead of just seeing a bug that prevents analysis.

@SparkQA
Copy link

SparkQA commented Feb 18, 2021

Test build #135222 has finished for PR 31550 at commit ce34b5e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@peter-toth
Copy link
Contributor Author

BTW let's also put the error message caused by this bug in the PR description, so that people can easily understand what happens, instead of just seeing a bug that prevents analysis.

Sure, added to the PR description.

Please note that the temp view in temp view part does work even without this fix

hmm why is that? Looking at the code I think temp view and function should both trigger this bug.

No idea yet, but can take a look at it tomorrow.

@cloud-fan
Copy link
Contributor

No idea yet, but can take a look at it tomorrow.

It will be great to figure this out, in case there are other bugs. It doesn't block this PR though.

@linhongliu-db
Copy link
Contributor

take a look at "why temp view refers temp view in CTE is working without this PR".
It's because we do ResolveTempViews both in ResolveRelations and ResolveTables. In this case, ResolveRelations try to resolve temp_view as default.temp_view since it's not in the referred list (caused by this cte bug). But in ResolveTables, it just try to resolve all UnreolvedRelations as a temporary view, so in this case, it succeeds (by mistake).

Here is an example to explain:

use default;
create table same_name as select 100 as a;
create temp view test_view as select * from same_name;
select * from test_view; -- output 100
create temp view same_name as select 99 as a; -- a temp view has the same name with the table
select * from test_view; -- output 100, expected, temp view should not override table
drop table default.same_name; -- drop the table
select * from test_view; -- output 99, wrong result, we expect table not found because we dropped it

@cloud-fan
Copy link
Contributor

I created test_view as a permanent view and tried this example in Spark 3.0, it also gives the wrong result. I think it's another bug of the permanent view that gets inherited by the new SQL temp view. @linhongliu-db can you help to fix it?

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.1!

cloud-fan pushed a commit that referenced this pull request Feb 19, 2021
…h CTEs

### What changes were proposed in this pull request?
This PR:
- Fixes a bug that prevents analysis of:
  ```
  CREATE TEMPORARY VIEW temp_view AS WITH cte AS (SELECT temp_func(0)) SELECT * FROM cte;
  SELECT * FROM temp_view
  ```
  by throwing:
  ```
  Undefined function: 'temp_func'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.
  ```
- and doesn't report analysis error when it should:
  ```
  CREATE TEMPORARY VIEW temp_view AS SELECT 0;
  CREATE VIEW view_on_temp_view AS WITH cte AS (SELECT * FROM temp_view) SELECT * FROM cte
  ```
  by properly collecting temporary objects from VIEW definitions with CTEs.

- Minor refactor to make the affected code more readable.

### Why are the changes needed?
To fix a bug introduced with #30567

### Does this PR introduce _any_ user-facing change?
Yes, the query works again.

### How was this patch tested?
Added new UT + existing ones.

Closes #31550 from peter-toth/SPARK-34421-temp-functions-in-views-with-cte.

Authored-by: Peter Toth <peter.toth@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 27abb6a)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan cloud-fan closed this in 27abb6a Feb 19, 2021
@cloud-fan
Copy link
Contributor

@peter-toth can you open backport PRs for 3.0/2.4 to fix the permanent view bug? thanks!

@peter-toth
Copy link
Contributor Author

Thanks for the review. I will open backport PRs soon.

peter-toth added a commit to peter-toth/spark that referenced this pull request Feb 19, 2021
…s with CTEs

This PR:
- Fixes a bug that prevents analysis of:
  ```
  CREATE TEMPORARY VIEW temp_view AS WITH cte AS (SELECT temp_func(0)) SELECT * FROM cte;
  SELECT * FROM temp_view
  ```
  by throwing:
  ```
  Undefined function: 'temp_func'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.
  ```
- and doesn't report analysis error when it should:
  ```
  CREATE TEMPORARY VIEW temp_view AS SELECT 0;
  CREATE VIEW view_on_temp_view AS WITH cte AS (SELECT * FROM temp_view) SELECT * FROM cte
  ```
  by properly collecting temporary objects from VIEW definitions with CTEs.

- Minor refactor to make the affected code more readable.

To fix a bug introduced with apache#30567

Yes, the query works again.

Added new UT + existing ones.

Closes apache#31550 from peter-toth/SPARK-34421-temp-functions-in-views-with-cte.

Authored-by: Peter Toth <peter.toth@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
peter-toth added a commit to peter-toth/spark that referenced this pull request Feb 19, 2021
…s with CTEs

This PR:
- Fixes a bug that prevents analysis of:
  ```
  CREATE TEMPORARY VIEW temp_view AS WITH cte AS (SELECT temp_func(0)) SELECT * FROM cte;
  SELECT * FROM temp_view
  ```
  by throwing:
  ```
  Undefined function: 'temp_func'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.
  ```
- and doesn't report analysis error when it should:
  ```
  CREATE TEMPORARY VIEW temp_view AS SELECT 0;
  CREATE VIEW view_on_temp_view AS WITH cte AS (SELECT * FROM temp_view) SELECT * FROM cte
  ```
  by properly collecting temporary objects from VIEW definitions with CTEs.

- Minor refactor to make the affected code more readable.

To fix a bug introduced with apache#30567

Yes, the query works again.

Added new UT + existing ones.

Closes apache#31550 from peter-toth/SPARK-34421-temp-functions-in-views-with-cte.

Authored-by: Peter Toth <peter.toth@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@peter-toth
Copy link
Contributor Author

@cloud-fan, I've opened #31592 to backport the change to 3.0.
But it looks like https://issues.apache.org/jira/browse/SPARK-29630 is missing from 2.4 and probably we should backport that change before this one. So shall I open a SPARK-29630 backport PR to 2.4?

@linhongliu-db
Copy link
Contributor

linhongliu-db commented Feb 19, 2021

So shall I open a SPARK-29630 backport PR to 2.4?

@peter-toth No need, the function is 2.4 is verifyTemporaryObjectsNotExists, just manually apply your changes to that function should be fine.

@cloud-fan
Copy link
Contributor

If we decided to not backport SPARK-29630 to 2.4 at that time, we probably shouldn't backport this fix as well. 3.0 should be good enough.

@peter-toth
Copy link
Contributor Author

peter-toth commented Feb 19, 2021

So shall I open a SPARK-29630 backport PR to 2.4?

@peter-toth No need, the function is 2.4 is verifyTemporaryObjectsNotExists, just manually apply your changes to that function should be fine.

Yes, found that, but the recursive verify() inside verifyTemporaryObjectsNotExists() is missing due to missing SPARK-29630 and without that function traversing the inner children of With would be an ugly change.

If we decided to not backport SPARK-29630 to 2.4 at that time, we probably shouldn't backport this fix as well. 3.0 should be good enough.

Ok.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
8 participants