[SPARK-14762][SQL] TPCDS Q90 fails to parse #12537

hvanhovell · 2016-04-20T21:17:16Z

What changes were proposed in this pull request?

TPCDS Q90 fails to parse because it uses a reserved keyword as an Identifier; AT was used as an alias for one of the subqueries. AT is not a reserved keyword and should have been registerd as a in the nonReserved rule.

In order to prevent this from happening again I have added tests for all keywords that are non-reserved in Hive. See the nonReserved, sql11ReservedKeywordsUsedAsCastFunctionName & sql11ReservedKeywordsUsedAsIdentifier rules in https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g.

How was this patch tested?

Added tests to for all Hive non reserved keywords to TableIdentifierParserSuite.

cc @davies

…tests.

hvanhovell · 2016-04-20T21:17:30Z

This might fail Hive tests.

SparkQA · 2016-04-20T21:23:59Z

Test build #56412 has finished for PR 12537 at commit 5cbd4b6.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-20T22:41:37Z

Test build #56413 has finished for PR 12537 at commit de9d8be.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

gatorsmile · 2016-04-21T06:05:25Z

...atalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/TableIdentifierParserSuite.scala

+    hiveNonReservedKeyword.foreach { nonReserved =>
+      assert(TableIdentifier(nonReserved) === parseTableIdentifier(nonReserved))
+    }
+  }


I think we need one more test case. Table identifier has illegal characters. For example, tab:1. Then, it will directly fall into the group hiveNativeCommands. Then, we got a weird Hive parser error, just like

NoViableAltException(302@[192:1: tableName : (db= identifier DOT tab= identifier -> ^( TOK_TABNAME $db $tab) |tab= identifier -> ^( TOK_TABNAME $tab) );]) at org.antlr.runtime.DFA.noViableAlt(DFA.java:158) at org.antlr.runtime.DFA.predict(DFA.java:116) at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.tableName(HiveParser_FromClauseParser.java:4747) at org.apache.hadoop.hive.ql.parse.HiveParser.tableName(HiveParser.java:45920)

Yeah, we need to fix this. The HiveSqlParser passes on queries it can't parse to Hive. I think we should disable this behavior after we have integrated the SparkSqlParser & HiveSqlParser. Could you create a JIRA ticket under https://issues.apache.org/jira/browse/SPARK-14776?

https://issues.apache.org/jira/browse/SPARK-14802. Done

I am just thinking if we can capture it during the Parser? The idea is to issue a better error message when users input an incorrect error message.

How about relaxing the .g4 parsing rule for tableIdentifier? That is, allowing Parser to accept the illegal inputs? Then, verify its correctness in the visitTableIdentifier. If users input an unexpected table/database name, we can easily tell users such inputs are not allowed as the table/database names?

Also found a bug in the table identifier. Hive does not allow _a. Do you want me to fix it? Or you want to do it? Thanks!

Thank you! @hvanhovell I see your points. Fortunately, Hive metastore does not enforce it. Otherwise, we are still unable to have names starting with underscores.

Since we are unable to trust Hive metastore APIs to issue exceptions if we inputting the illegal inputs, I am trying to see if anything we did is not consistent with Hive. Otherwise, it could cause a disaster result. For example, one of them is #12446. We could drop all the partitions if users input a wrong partition column name.

@gatorsmile I think the metastore will allow names starting with underscores, since Hive allows names starting with underscores if you put the name in backticks. However we are talking about Hive, and it has surprised me before. Could you open a PR to test if the metastore allows this?

Metastore allows it. That is why I agreed we can keep it without any change, but we need to add a few test cases to check if this assumption still works if we upgrade the version later. Let me submit a PR for adding these test cases.

Will do it after #12081 is merged. : ) Thanks!

In the current Parser, we still have another issue. For example, our Parser can accept

CREATE TABLE `tab:1` ...

However, Hive metastore does not allow it. We hit an exception from the metastore:

org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: [tab:1]: is not a valid table name; org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: [tab:1]: is not a valid table name; at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:71) at org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:147) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:188)

In the next PR, I will add the extra logics to verify the names before calling Hive metastore APIs.

SparkQA · 2016-04-21T07:01:50Z

Test build #56486 has finished for PR 12537 at commit c9875e6.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

hvanhovell · 2016-04-21T07:14:47Z

retest this please

SparkQA · 2016-04-21T08:52:35Z

Test build #56501 has finished for PR 12537 at commit c9875e6.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-21T14:36:41Z

Test build #56536 has finished for PR 12537 at commit d48b19e.

This patch fails PySpark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-21T18:13:16Z

Test build #2845 has finished for PR 12537 at commit d48b19e.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2016-04-22T17:47:37Z

Test build #2857 has finished for PR 12537 at commit d48b19e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

davies · 2016-04-22T18:28:27Z

LGTM, merging into master, thanks!

Add most hive non reserved keywords to the non reserved list and add …

5cbd4b6

…tests.

Style.

de9d8be

Schema...

c9875e6

gatorsmile reviewed Apr 21, 2016
View reviewed changes

A few more illegal test after CR.

d48b19e

asfgit closed this in d060da0 Apr 22, 2016

gatorsmile mentioned this pull request Apr 23, 2016

[SPARK-14857] [SQL] Table/Database Name Validation in SessionCatalog #12618

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-14762][SQL] TPCDS Q90 fails to parse #12537

[SPARK-14762][SQL] TPCDS Q90 fails to parse #12537

hvanhovell commented Apr 20, 2016

hvanhovell commented Apr 20, 2016

SparkQA commented Apr 20, 2016

SparkQA commented Apr 20, 2016

gatorsmile Apr 21, 2016 •

edited

hvanhovell Apr 21, 2016

gatorsmile Apr 21, 2016

gatorsmile Apr 21, 2016 •

edited

gatorsmile Apr 22, 2016 •

edited

gatorsmile Apr 22, 2016

hvanhovell Apr 22, 2016

gatorsmile Apr 22, 2016 •

edited

gatorsmile Apr 22, 2016

gatorsmile Apr 22, 2016 •

edited

SparkQA commented Apr 21, 2016

hvanhovell commented Apr 21, 2016

SparkQA commented Apr 21, 2016

SparkQA commented Apr 21, 2016

SparkQA commented Apr 21, 2016

SparkQA commented Apr 22, 2016

davies commented Apr 22, 2016

[SPARK-14762][SQL] TPCDS Q90 fails to parse #12537

[SPARK-14762][SQL] TPCDS Q90 fails to parse #12537

Conversation

hvanhovell commented Apr 20, 2016

What changes were proposed in this pull request?

How was this patch tested?

hvanhovell commented Apr 20, 2016

SparkQA commented Apr 20, 2016

SparkQA commented Apr 20, 2016

gatorsmile Apr 21, 2016 • edited

Choose a reason for hiding this comment

hvanhovell Apr 21, 2016

Choose a reason for hiding this comment

gatorsmile Apr 21, 2016

Choose a reason for hiding this comment

gatorsmile Apr 21, 2016 • edited

Choose a reason for hiding this comment

gatorsmile Apr 22, 2016 • edited

Choose a reason for hiding this comment

gatorsmile Apr 22, 2016

Choose a reason for hiding this comment

hvanhovell Apr 22, 2016

Choose a reason for hiding this comment

gatorsmile Apr 22, 2016 • edited

Choose a reason for hiding this comment

gatorsmile Apr 22, 2016

Choose a reason for hiding this comment

gatorsmile Apr 22, 2016 • edited

Choose a reason for hiding this comment

SparkQA commented Apr 21, 2016

hvanhovell commented Apr 21, 2016

SparkQA commented Apr 21, 2016

SparkQA commented Apr 21, 2016

SparkQA commented Apr 21, 2016

SparkQA commented Apr 22, 2016

davies commented Apr 22, 2016

gatorsmile Apr 21, 2016 •

edited

gatorsmile Apr 21, 2016 •

edited

gatorsmile Apr 22, 2016 •

edited

gatorsmile Apr 22, 2016 •

edited

gatorsmile Apr 22, 2016 •

edited