Skip to content

[SPARK-37214][SQL] Fail query analysis earlier with invalid identifiers#34490

Closed
cloud-fan wants to merge 2 commits intoapache:masterfrom
cloud-fan:table
Closed

[SPARK-37214][SQL] Fail query analysis earlier with invalid identifiers#34490
cloud-fan wants to merge 2 commits intoapache:masterfrom
cloud-fan:table

Conversation

@cloud-fan
Copy link
Contributor

@cloud-fan cloud-fan commented Nov 5, 2021

What changes were proposed in this pull request?

This is a followup of #31427 , which introduced two issues:

  1. When we lookup spark_catalog.t, we failed earlier with The namespace in session catalog must have exactly one name part before that PR, now we fail very late in CheckAnalysis with NoSuchTableException
  2. The error message is a bit confusing now. We report Table t not found even if table t exists.

This PR fixes the 2 issues.

Why are the changes needed?

save analysis time and improve error message

Does this PR introduce any user-facing change?

no

How was this patch tested?

updated test

@github-actions github-actions bot added the SQL label Nov 5, 2021
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't throw NoSuchTableException here. The CatalogV2Utils.loadTable swallows NoSuchTableException and delays the error to CheckAnalysis, and we lost the actual error message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan
Copy link
Contributor Author

cc @holdenk @HyukjinKwon @viirya @imback82

@SparkQA
Copy link

SparkQA commented Nov 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49397/

@SparkQA
Copy link

SparkQA commented Nov 5, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49397/

@SparkQA
Copy link

SparkQA commented Nov 5, 2021

Test build #144926 has finished for PR 34490 at commit f634d80.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49403/

@SparkQA
Copy link

SparkQA commented Nov 5, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49403/

@SparkQA
Copy link

SparkQA commented Nov 5, 2021

Test build #144932 has finished for PR 34490 at commit deb664d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

}.getMessage
assert(message.contains(
"Table or view not found: a.b.c.tbl"))
assert(message.contains("requires a single-part namespace"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this could be a little misleading since this sounds like SHOW COLUMNS IN is only supported for the session catalog? (although it is currently not supported for v2 catalogs.) This may be OK until we start supporting this command for v2 catalogs.

s"V2 session catalog requires a single-part namespace: ${ident.quoted}")
def requiresSinglePartNamespaceError(ns: Seq[String]): Throwable = {
new AnalysisException(
"spark_catalog requires a single-part namespace, but got " + ns.mkString("[", ", ", "]"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't mind, can we use SESSION_CATALOG_NAME instead of spark_catalog?

- "spark_catalog requires a single-part namespace, but got " + ns.mkString("[", ", ", "]"))
+ s"${SESSION_CATALOG_NAME} requires a single-part namespace, but got " +
+   ns.mkString("[", ", ", "]"))

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM (with one nit comment and also agree with @imback82 's comment.)

@dongjoon-hyun
Copy link
Member

BTW, @cloud-fan . According to the JIRA type, is this targeting only Apache Spark 3.3.0?

@cloud-fan
Copy link
Contributor Author

is this targeting only Apache Spark 3.3.0?

This just fixes error message, so we don't have to backport. But since the change is small, I think it should be safe to backport to 3.2 at least.

@SparkQA
Copy link

SparkQA commented Nov 8, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49449/

@SparkQA
Copy link

SparkQA commented Nov 8, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49449/

@cloud-fan
Copy link
Contributor Author

thanks for review, merging to master/3.2!

@cloud-fan cloud-fan closed this in 8ab9d63 Nov 8, 2021
cloud-fan added a commit that referenced this pull request Nov 8, 2021
This is a followup of #31427 , which introduced two issues:
1. When we lookup `spark_catalog.t`, we failed earlier with `The namespace in session catalog must have exactly one name part` before that PR, now we fail very late in `CheckAnalysis` with `NoSuchTableException`
2. The error message is a bit confusing now. We report `Table t not found` even if table `t` exists.

This PR fixes the 2 issues.

save analysis time and improve error message

no

updated test

Closes #34490 from cloud-fan/table.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 8ab9d63)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@SparkQA
Copy link

SparkQA commented Nov 8, 2021

Test build #144978 has finished for PR 34490 at commit 3f38d60.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

comparePlans(parsed2, expected2)

val v3 = "CREATE TEMPORARY VIEW a.b AS SELECT 1"
intercept(v3, "It is not allowed to add database prefix")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@HyukjinKwon HyukjinKwon Nov 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this was an old test so didn't take an action but just noticed that it's pretty new. Probably we should revert this one - seems pretty frequently failing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im gonna partially revert this change alone for now because other tests seem passing fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cloud-fan pushed a commit that referenced this pull request Nov 18, 2021
…alyzer error

### What changes were proposed in this pull request?
followup of #34490 in branch-3.2, which moves the test for checking
`notAllowedToAddDBPrefixForTempViewError` in the parser phase. But it only passes in master.
In branch-3.2, the error happens in the analyzer phase.

diverge happens in PR #34283

### Why are the changes needed?
fix test

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
pass the restored test.

Closes #34633 from linhongliu-db/SPARK-37214-3.2.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
HyukjinKwon added a commit that referenced this pull request Nov 18, 2021
…flakiness

### What changes were proposed in this pull request?

This PR reverts one of the tests added at #34490 which is (pretty seriously) flaky.

### Why are the changes needed?

Other tests seem passing fine so it's likely test specific issue. so only test is proposed to be reverted for now.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

CI in this PR should test this out.

Closes #34639 from HyukjinKwon/SPARK-37308.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
sunchao pushed a commit to sunchao/spark that referenced this pull request Dec 8, 2021
This is a followup of apache#31427 , which introduced two issues:
1. When we lookup `spark_catalog.t`, we failed earlier with `The namespace in session catalog must have exactly one name part` before that PR, now we fail very late in `CheckAnalysis` with `NoSuchTableException`
2. The error message is a bit confusing now. We report `Table t not found` even if table `t` exists.

This PR fixes the 2 issues.

save analysis time and improve error message

no

updated test

Closes apache#34490 from cloud-fan/table.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 8ab9d63)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e55bab5)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
sunchao pushed a commit to sunchao/spark that referenced this pull request Dec 8, 2021
…alyzer error

### What changes were proposed in this pull request?
followup of apache#34490 in branch-3.2, which moves the test for checking
`notAllowedToAddDBPrefixForTempViewError` in the parser phase. But it only passes in master.
In branch-3.2, the error happens in the analyzer phase.

diverge happens in PR apache#34283

### Why are the changes needed?
fix test

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
pass the restored test.

Closes apache#34633 from linhongliu-db/SPARK-37214-3.2.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
catalinii pushed a commit to lyft/spark that referenced this pull request Feb 22, 2022
This is a followup of apache#31427 , which introduced two issues:
1. When we lookup `spark_catalog.t`, we failed earlier with `The namespace in session catalog must have exactly one name part` before that PR, now we fail very late in `CheckAnalysis` with `NoSuchTableException`
2. The error message is a bit confusing now. We report `Table t not found` even if table `t` exists.

This PR fixes the 2 issues.

save analysis time and improve error message

no

updated test

Closes apache#34490 from cloud-fan/table.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 8ab9d63)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
catalinii pushed a commit to lyft/spark that referenced this pull request Feb 22, 2022
…alyzer error

### What changes were proposed in this pull request?
followup of apache#34490 in branch-3.2, which moves the test for checking
`notAllowedToAddDBPrefixForTempViewError` in the parser phase. But it only passes in master.
In branch-3.2, the error happens in the analyzer phase.

diverge happens in PR apache#34283

### Why are the changes needed?
fix test

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
pass the restored test.

Closes apache#34633 from linhongliu-db/SPARK-37214-3.2.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
catalinii pushed a commit to lyft/spark that referenced this pull request Mar 4, 2022
This is a followup of apache#31427 , which introduced two issues:
1. When we lookup `spark_catalog.t`, we failed earlier with `The namespace in session catalog must have exactly one name part` before that PR, now we fail very late in `CheckAnalysis` with `NoSuchTableException`
2. The error message is a bit confusing now. We report `Table t not found` even if table `t` exists.

This PR fixes the 2 issues.

save analysis time and improve error message

no

updated test

Closes apache#34490 from cloud-fan/table.

Authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 8ab9d63)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
catalinii pushed a commit to lyft/spark that referenced this pull request Mar 4, 2022
…alyzer error

### What changes were proposed in this pull request?
followup of apache#34490 in branch-3.2, which moves the test for checking
`notAllowedToAddDBPrefixForTempViewError` in the parser phase. But it only passes in master.
In branch-3.2, the error happens in the analyzer phase.

diverge happens in PR apache#34283

### Why are the changes needed?
fix test

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
pass the restored test.

Closes apache#34633 from linhongliu-db/SPARK-37214-3.2.

Authored-by: Linhong Liu <linhong.liu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants