[SPARK-12689][SQL] Migrate DDL parsing to the newly absorbed parser #10723
Test build #49237 has finished for PR 10723 at commit
cc @hvanhovell @viirya can you rename the title to "migrate describe table parsing to ..."?
Because SQLContext still uses DDLParser, it looks like I can't simply remove it.
Test build #49317 has finished for PR 10723 at commit
@cloud-fan Can you also take a look? It is related to the work of adding DDL support for creating bucketed tables.
@@ -52,26 +57,30 @@ private[sql] class SparkQl(conf: ParserConf = SimpleParserConf()) extends Cataly
        nodeToDescribeFallback(node)
      } else {
        tableType match {
-         case Token("TOK_TABTYPE", Token("TOK_TABNAME", nameParts :: Nil) :: Nil) =>
+         case Token("TOK_TABTYPE", Token("TOK_TABNAME", nameParts) :: Nil) =>
Why change this? You didn't touch the describe stuff in SparkSqlParser.g, right?
Yes. I think it was incorrect from the beginning, but it was never caught because we didn't reach this code before. I've tested it locally. Once all three commands are migrated, we can see this pass the tests.
If we parse the following SQL using the parse driver, org.apache.spark.sql.catalyst.parser.ParseDriver.parsePlan("DESCRIBE EXTENDED tbl.a", null), we would end up with the following AST:
TOK_DESCTABLE 1, 0, 6, 18
:- TOK_TABTYPE 1, 4, 6, 18
: +- TOK_TABNAME 1, 4, 6, 18
: :- tbl 1, 4, 4, 18
: +- a 1, 6, 6, 22
+- EXTENDED 1, 2, 2, 9
This change picks it up, and the old code didn't (I am sure I tested this, though :S). You can disable this in the DDL parser to see if it works now.
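The difference between the two patterns can be sketched as follows (in Python for brevity; the real code is Scala pattern matching over Catalyst AST nodes, and the function names here are illustrative):

```python
# A qualified name like "tbl.a" gives TOK_TABNAME *two* children,
# so a pattern that insists on exactly one child never matches it.

def name_parts_old(tabname_children):
    # old: case Token("TOK_TABNAME", nameParts :: Nil) -- one child only
    if len(tabname_children) == 1:
        return list(tabname_children)
    return None  # qualified names fell through to the fallback

def name_parts_new(tabname_children):
    # new: case Token("TOK_TABNAME", nameParts) -- any number of name parts
    return list(tabname_children)

print(name_parts_old(["tbl", "a"]))  # None: the old pattern misses it
print(name_parts_new(["tbl", "a"]))  # ['tbl', 'a']
```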
Could we add a test for this? The Hive test suite apparently misses this one. I could also address it in a different PR.
Actually, we have a test for the describe table command in HiveQuerySuite. Do we need another test?
Conflicts: sql/catalyst/src/main/antlr3/org/apache/spark/sql/catalyst/parser/SparkSqlLexer.g
Test build #49585 has finished for PR 10723 at commit
          // It is describing a column with the format like "describe db.table column".
          nodeToDescribeFallback(node)
-       case tableName =>
+       case tableName :: Nil =>
          // It is describing a table with the format like "describe table".
          datasources.DescribeCommand(
            UnresolvedRelation(TableIdentifier(tableName.text), None),
cleanIdentifier?
@@ -316,7 +316,7 @@ class HiveContext private[hive](
  }

  protected[sql] override def parseSql(sql: String): LogicalPlan = {
-   super.parseSql(substitutor.substitute(hiveconf, sql))
+   sqlParser.parsePlan(substitutor.substitute(hiveconf, sql))
How about gradually moving functionality from the DDL parser to SparkQl? That would allow us to test this in the meantime.
DDLParser is still used in SQLContext. Do we want to completely remove it? I have already migrated three commands, so I think we can test them all together.
        case Token("TOK_TABLEOPTIONS", options) =>
          options.map {
            case Token("TOK_TABLEOPTION", Token(key, _) :: Token(value, _) :: Nil) =>
              (key, value.replaceAll("^\'|^\"|\"$|\'$", ""))
Why not use unquoteString? It does the same and is easier to read.
I didn't know there was an unquoteString. Thanks.
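The two approaches behave the same on well-formed input. A quick sketch (in Python; unquote_string is a hypothetical stand-in for Catalyst's unquoteString helper, not its actual implementation):

```python
import re

# The quote-stripping regex from the diff, ported to Python.
def strip_quotes_regex(s):
    return re.sub(r"^'|^\"|\"$|'$", "", s)

# Hypothetical stand-in for Catalyst's unquoteString helper:
# strip one matching pair of surrounding quotes, if present.
def unquote_string(s):
    if len(s) >= 2 and s[0] == s[-1] and s[0] in ("'", '"'):
        return s[1:-1]
    return s

print(strip_quotes_regex("'path'"))  # path
print(unquote_string('"path"'))      # path
```

One subtle difference worth noting: the regex also strips mismatched quotes (a leading `'` with a trailing `"`), while a paired-quote helper like the one sketched here would leave those alone.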
Test build #49591 has finished for PR 10723 at commit
Conflicts: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DDLParser.scala
looseNonReserved
    : nonReserved | KW_FROM | KW_TO
    ;
We are allowed to use From and To in the CreateTableUsing command's options (actually, it seems we can use any string as an option key). But we can't simply add them to nonReserved, because doing so would break other existing rules. So we create a looseIdentifier and a looseNonReserved here.
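For instance, an OPTIONS clause whose keys collide with the FROM/TO keywords should still parse. A hypothetical example (the provider and option names are illustrative, not from the PR):

```sql
-- `from` and `to` appear here as option keys, not keywords, so the
-- parser must accept them where an identifier is expected.
CREATE TABLE t
USING org.apache.spark.sql.sources.SomeProvider
OPTIONS (from '1', to '100')
```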
Why not add this to the option rule directly?
Because I don't know whether we will add other reserved words later; if so, the option rule might get too long. I haven't counted whether any keywords are missing from nonReserved.
Both (current approach or adding it to the option rule) are okay for me.
Could you add your comment above as a comment in the code?
Thanks for the reminder. I've added it.
Test build #49673 has finished for PR 10723 at commit
Test build #49669 has finished for PR 10723 at commit
Test build #49675 has finished for PR 10723 at commit
          case Token(k, Nil) => k
        }.mkString(".")
        val value = unquoteString(keysAndValue.last.text)
        (key, unquoteString(value))
Unquoting twice?
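A sketch of the issue the reviewer points out (in Python; unquote_string is a hypothetical stand-in for Catalyst's unquoteString): the diff applies the helper once when building `value` and again when building the pair, so the second call is at best redundant, and a value whose quotes should survive one level of unquoting loses them.

```python
# Hypothetical stand-in for Catalyst's unquoteString helper.
def unquote_string(s):
    if len(s) >= 2 and s[0] == s[-1] and s[0] in ("'", '"'):
        return s[1:-1]
    return s

raw = "'\"nested\"'"            # source text: '"nested"'
once = unquote_string(raw)      # inner quotes kept, as intended
twice = unquote_string(once)    # inner quotes stripped too -- data loss

print(once)   # "nested"
print(twice)  # nested
```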
@viirya I have done another round. Most things are minor, but I would like to know why you want to change the treatment of quoted identifiers.
…Parser commands to new Parser This PR moves all the functionality provided by the SparkSQLParser/ExtendedHiveQlParser to the new Parser hierarchy (SparkQl/HiveQl). This also improves the current SET command parsing: the current implementation swallows ```set role ...``` and ```set autocommit ...``` commands, this PR respects these commands (and passes them on to Hive). This PR and #10723 end the use of Parser-Combinator parsers for SQL parsing. As a result we can also remove the ```AbstractSQLParser``` in Catalyst. The PR is marked WIP as long as it doesn't pass all tests. cc rxin viirya winningsix (this touches #10144) Author: Herman van Hovell <hvanhovell@questtec.nl> Closes #10905 from hvanhovell/SPARK-12866.
Conflicts: sql/catalyst/src/main/antlr3/org/apache/spark/sql/catalyst/parser/SparkSqlParser.g sql/core/src/main/scala/org/apache/spark/sql/execution/SparkQl.scala sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
@hvanhovell Thanks for reviewing this. I've updated the PR to address your comments. Please see if it looks good to you.
Test build #50258 has finished for PR 10723 at commit
Test build #50276 has finished for PR 10723 at commit
LGTM
Conflicts: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkQl.scala
Test build #50355 has finished for PR 10723 at commit
retest this please.
Test build #50366 has finished for PR 10723 at commit
It's weird.
retest this please.
It is; I can't make sense of this either. Are the tests passing locally?
Yeah, I think so. And I haven't updated the code since the last successful test.
Let's see how another round of tests goes.
Many unrelated failures, like not being able to find the Hive jar file.
Test build #50377 has finished for PR 10723 at commit
ping @rxin
@viirya I am gonna trigger another test to make sure things keep working.
retest this please
@hvanhovell ok, thanks.
Test build #50382 has finished for PR 10723 at commit
cc @rxin
Thanks - merging this into master.
JIRA: https://issues.apache.org/jira/browse/SPARK-12689
DDLParser processes three commands: createTable, describeTable and refreshTable.
This patch migrates the three commands to the newly absorbed parser.
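Illustrative forms of the three statements (sketches only; the exact syntax the parser accepts may differ slightly):

```sql
-- createTable (data source table)
CREATE TABLE t USING parquet OPTIONS (path '/tmp/t');
-- describeTable
DESCRIBE EXTENDED db.t;
-- refreshTable
REFRESH TABLE db.t;
```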