[SPARK-18389][SQL] Disallow cyclic view reference #17152

Closed
wants to merge 3 commits into from

jiangxb1987
Contributor

What changes were proposed in this pull request?

Disallow cyclic view references. A cyclic view reference may be created by the following queries:

CREATE VIEW testView AS SELECT id FROM tbl
CREATE VIEW testView2 AS SELECT id FROM testView
ALTER VIEW testView AS SELECT * FROM testView2

In the above example, a reference cycle (testView -> testView2 -> testView) exists.

We disallow cyclic view references with a check in the ALTER VIEW command: if the analyzedPlan contains a View node that refers to the view being altered, we reject the change and throw an AnalysisException.
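The check described above can be sketched without Spark as a depth-first search over view dependencies. This is only an illustration under stated assumptions, not the code in this patch: `CyclicViewCheck`, `Catalog`, and `findCycle` are hypothetical names, and the catalog is modeled as a plain map from a view name to the relations its definition reads.

```scala
// Hypothetical, Spark-free model (an assumption for illustration):
// each view name maps to the relations its definition reads from.
object CyclicViewCheck {
  type Catalog = Map[String, Seq[String]]

  /** Returns the reference cycle through `altered`, if redefining it
    * to read from `newRefs` would create one; otherwise None. */
  def findCycle(catalog: Catalog, altered: String, newRefs: Seq[String]): Option[Seq[String]] = {
    // Depth-first search. `path` is the chain of views walked so far,
    // starting at `altered`; `seen` guards against pre-existing loops.
    def visit(current: String, path: Seq[String], seen: Set[String]): Option[Seq[String]] = {
      val newPath = path :+ current
      if (current == altered) Some(newPath) // walked back to the altered view
      else if (seen.contains(current)) None
      else {
        catalog.getOrElse(current, Seq.empty).iterator
          .map(ref => visit(ref, newPath, seen + current))
          .collectFirst { case Some(cycle) => cycle }
      }
    }

    newRefs.iterator
      .map(ref => visit(ref, Seq(altered), Set.empty))
      .collectFirst { case Some(cycle) => cycle }
  }
}
```

With the catalog from the example queries, `CyclicViewCheck.findCycle(catalog, "testView", Seq("testView2")).map(_.mkString(" -> "))` would give the chain `testView -> testView2 -> testView`. A caller could raise an error at that point, as the ALTER VIEW check does with AnalysisException.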

How was this patch tested?

Tested by SQLViewSuite.test("correctly handle a cyclic view reference").

@SparkQA

SparkQA commented Mar 3, 2017

Test build #73854 has finished for PR 17152 at commit 1bd0260.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

Based on our current implementation of views, createOrReplaceTempView can still trigger a cyclic view reference. Right?

@jiangxb1987
Contributor Author

@gatorsmile Currently we don't perform recursive resolution over a temporary view, so perhaps that won't trigger a cyclic view reference. For example:

scala> spark.sql("CREATE TEMPORARY VIEW v1 AS SELECT * FROM tab")
res3: org.apache.spark.sql.DataFrame = []

scala> spark.sql("CREATE TEMPORARY VIEW v2 AS SELECT * FROM v1")
res4: org.apache.spark.sql.DataFrame = []

scala> spark.sql("ALTER VIEW v1 AS SELECT * FROM v2")
res5: org.apache.spark.sql.DataFrame = []

scala> spark.sql("SELECT * FROM v1")
res6: org.apache.spark.sql.DataFrame = [a: int, b: string]

@gatorsmile
Member

Yeah. The temporary view does not have such an issue, because we did not change it. My typo.

What I mean is CREATE OR REPLACE VIEW. AlterViewAsCommand does not cover that code path. : )

@cloud-fan
Contributor

When do other databases report this error? During view creation/alteration or during view resolution?

@jiangxb1987
Contributor Author

Hive reports the error during ALTER VIEW:

hive> CREATE VIEW v1 AS SELECT * FROM t1;
OK
Time taken: 0.556 seconds
hive> CREATE VIEW v2 AS SELECT * FROM v1;
OK
Time taken: 0.099 seconds
hive> ALTER VIEW v1 AS SELECT * FROM v2;
FAILED: SemanticException Recursive view default.v1 detected (cycle: default.v1 -> default.v2 -> default.v1).
hive> CREATE VIEW v3 AS SELECT * FROM v2;
OK
Time taken: 0.354 seconds
hive> ALTER VIEW v1 AS SELECT * FROM v3 JOIN v2;
FAILED: SemanticException Recursive view default.v1 detected (cycle: default.v1 -> default.v3 -> default.v2 -> default.v1).

sql("ALTER VIEW view1 AS SELECT * FROM view3 JOIN view2")
}.getMessage
assert(e2.contains("Recursive view `default`.`view1` detected (cycle: `default`.`view1` " +
"-> `default`.`view3` -> `default`.`view2` -> `default`.`view1`)"))
Contributor

How about this test case?

sql("alter view v1 as select * from jt where exists (select 1 from v2)")

Should we get the same exception as well?

Contributor

What is missing in your code is that whenever you hit a SubqueryExpression, you need to traverse the plan of that expression to detect cyclic references. See an example of the code in #16493.

@nsyca
Contributor

nsyca commented Mar 6, 2017

Going back to @gatorsmile's question, does this fix cover the scenario below?

sql("create or replace view v1 as select * from v2")

If this is an existing problem and your PR does not cover it, do you intend to address it in this PR?

@jiangxb1987
Contributor Author

@gatorsmile @nsyca Thank you for your comments! I've added coverage for both CREATE OR REPLACE VIEW and SubqueryExpressions.

@SparkQA

SparkQA commented Mar 7, 2017

Test build #74088 has finished for PR 17152 at commit f487af3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// Detect cyclic references from subqueries.
plan.expressions.foreach { expr =>
  if (expr.isInstanceOf[SubqueryExpression]) {
    checkCyclicViewReference(expr.asInstanceOf[SubqueryExpression].plan, path, viewIdent)
Contributor

@nsyca nsyca Mar 7, 2017

Shall we use pattern matching instead of isInstanceOf/asInstanceOf? The logic in the code looks good to me.
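For illustration, a pattern-matching traversal that also descends into subquery plans might look like the sketch below. It does not use Spark: `Plan`, `Expr`, `View`, `Filter`, `Table`, `SubqueryExpr`, and `referencesView` are hypothetical stand-ins whose names echo the Catalyst classes discussed here, and they are not the real definitions or the code that was merged.

```scala
// Hypothetical stand-ins for Catalyst classes (assumptions for illustration only).
object SubqueryTraversalSketch {
  sealed trait Expr
  final case class Literal(value: Int) extends Expr
  // Models an expression that wraps a whole plan, e.g. EXISTS (SELECT 1 FROM v2).
  final case class SubqueryExpr(plan: Plan) extends Expr

  sealed trait Plan {
    def children: Seq[Plan]
    def expressions: Seq[Expr]
  }
  final case class Table(name: String) extends Plan {
    val children: Seq[Plan] = Nil
    val expressions: Seq[Expr] = Nil
  }
  // A resolved view: `child` is the plan of the view's definition.
  final case class View(name: String, child: Plan) extends Plan {
    val children: Seq[Plan] = Seq(child)
    val expressions: Seq[Expr] = Nil
  }
  final case class Filter(condition: Expr, child: Plan) extends Plan {
    val children: Seq[Plan] = Seq(child)
    val expressions: Seq[Expr] = Seq(condition)
  }

  /** True if `plan`, including any subquery plans, reads from the view `target`. */
  def referencesView(plan: Plan, target: String): Boolean = plan match {
    case View(name, _) if name == target => true
    case other =>
      other.children.exists(referencesView(_, target)) ||
        other.expressions.exists {
          // Pattern match replaces the isInstanceOf/asInstanceOf pair.
          case SubqueryExpr(sub) => referencesView(sub, target)
          case _                 => false
        }
  }
}
```

In this model, a plan such as `Filter(SubqueryExpr(View("v2", View("v1", Table("jt")))), Table("jt"))` corresponds to the `where exists (select 1 from v2)` case raised earlier in the thread, and `referencesView(plan, "v1")` finds the reference only because the subquery plan is traversed.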

Contributor

+1

@SparkQA

SparkQA commented Mar 8, 2017

Test build #74169 has finished for PR 17152 at commit 46da41e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Contributor

thanks, merging to master!

@asfgit closed this in b9783a9 on Mar 8, 2017
@jiangxb1987 deleted the cyclic-view branch on March 8, 2017 04:53