Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[SPARK-37471][SQL] spark-sql support
;
in nested bracketed comment
### What changes were proposed in this pull request? In current spark-sql, when use -e and -f, it can't support nested bracketed comment such as ``` /* SELECT /*+ BROADCAST(b) */ 4; */ SELECT 1 ; ``` When run `spark-sql -f` with `--verbose` got below error ``` park master: yarn, Application Id: application_1632999510150_6968442 /* sielect /* BROADCAST(b) */ 4 Error in query: mismatched input '4' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 30) == SQL == /* sielect /* BROADCAST(b) */ 4 ------------------------------^^^ ``` In current code ``` else if (line.charAt(index) == '/' && !insideSimpleComment) { val hasNext = index + 1 < line.length if (insideSingleQuote || insideDoubleQuote) { // Ignores '/' in any case of quotes } else if (insideBracketedComment && line.charAt(index - 1) == '*' ) { // Decrements `bracketedCommentLevel` at the beginning of the next loop leavingBracketedComment = true } else if (hasNext && !insideBracketedComment && line.charAt(index + 1) == '*') { bracketedCommentLevel += 1 } } ``` If it meet an `*/`, it will mark `leavingBracketedComment` as true, then when call next char, bracketed comment level -1. ``` if (leavingBracketedComment) { bracketedCommentLevel -= 1 leavingBracketedComment = false } ``` But when meet `/*`, it need `!insideBracketedComment`, then means if we have a case ``` /* aaa /* bbb */ ; ccc */ select 1; ``` when meet second `/*` , `insideBracketedComment` is true, so this `/*` won't be treat as a start of bracket comment. Then meet the first `*/`, bracketed comment end, this query is split as ``` /* aaa /* bbb */; => comment ccc */ select 1; => query ``` Then query failed. So here we remove the condition of `!insideBracketedComment`, then we can have `bracketedCommentLevel > 1` and since ``` def insideBracketedComment: Boolean = bracketedCommentLevel > 0 ``` So chars between all level of bracket are treated as comment. ### Why are the changes needed? In spark #37389 we support nested bracketed comment in SQL, here for spark-sql we should support too. ### Does this PR introduce _any_ user-facing change? User can use nested bracketed comment in spark-sql ### How was this patch tested? Since spark-sql console mode have special logic about handle `;` ``` while (line != null) { if (!line.startsWith("--")) { if (prefix.nonEmpty) { prefix += '\n' } if (line.trim().endsWith(";") && !line.trim().endsWith("\\;")) { line = prefix + line ret = cli.processLine(line, true) prefix = "" currentPrompt = promptWithCurrentDB } else { prefix = prefix + line currentPrompt = continuedPromptWithDBSpaces } } line = reader.readLine(currentPrompt + "> ") } ``` If we write sql as below ``` /* SELECT /*+ BROADCAST(b) */ 4\\; */ SELECT 1 ; ``` the `\\;` is escaped. Manuel test wit spark-sql -f ``` (spark.submit.pyFiles,) (spark.submit.deployMode,client) (spark.master,local[*]) Classpath elements: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 21/11/26 16:32:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 21/11/26 16:32:10 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist 21/11/26 16:32:10 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist 21/11/26 16:32:13 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0 21/11/26 16:32:13 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore yi.zhu10.12.189.175 Spark master: local[*], Application Id: local-1637915529831 /* select /* BROADCAST(b) */ 4; */ select 1 1 Time taken: 3.851 seconds, Fetched 1 row(s) C02D45VVMD6T:spark yi.zhu$ ``` In current PR, un completed bracket comment won't execute now, for SQL file ``` /* select /* BROADCAST(b) */ 4; */ select 1 ; /* select /* braoad */ ; select 1; ``` It only execute ``` /* select /* BROADCAST(b) */ 4; */ select 1 ; ``` The next part ``` /* select /* braoad */ ; select 1; ``` are still treated as inprogress SQL. Closes #34721 from AngersZhuuuu/SPARK-37471. Authored-by: Angerszhuuuu <angers.zhu@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
- Loading branch information