Skip to content

Commit

Permalink
[SPARK-37471][SQL] spark-sql support ; in nested bracketed comment
Browse files Browse the repository at this point in the history
### What changes were proposed in this pull request?
In current spark-sql, when use -e and -f, it can't support nested bracketed comment such as
 ```
/* SELECT /*+ BROADCAST(b) */ 4;
*/
SELECT  1
;
```
When run `spark-sql -f` with `--verbose` got below error
```
park master: yarn, Application Id: application_1632999510150_6968442
/* sielect /* BROADCAST(b) */ 4
Error in query:
mismatched input '4' expecting {'(', 'ADD', 'ALTER', 'ANALYZE', 'CACHE', 'CLEAR', 'COMMENT', 'COMMIT', 'CREATE', 'DELETE', 'DESC', 'DESCRIBE', 'DFS', 'DROP', 'EXPLAIN', 'EXPORT', 'FROM', 'GRANT', 'IMPORT', 'INSERT', 'LIST', 'LOAD', 'LOCK', 'MAP', 'MERGE', 'MSCK', 'REDUCE', 'REFRESH', 'REPLACE', 'RESET', 'REVOKE', 'ROLLBACK', 'SELECT', 'SET', 'SHOW', 'START', 'TABLE', 'TRUNCATE', 'UNCACHE', 'UNLOCK', 'UPDATE', 'USE', 'VALUES', 'WITH'}(line 1, pos 30)

== SQL ==
/* sielect /* BROADCAST(b) */ 4
------------------------------^^^
```

In current code
```
else if (line.charAt(index) == '/' && !insideSimpleComment) {
        val hasNext = index + 1 < line.length
        if (insideSingleQuote || insideDoubleQuote) {
          // Ignores '/' in any case of quotes
        } else if (insideBracketedComment && line.charAt(index - 1) == '*' ) {
          // Decrements `bracketedCommentLevel` at the beginning of the next loop
          leavingBracketedComment = true
        } else if (hasNext && !insideBracketedComment &&  line.charAt(index + 1) == '*') {
          bracketedCommentLevel += 1
        }
      }
```

If it meet an `*/`, it will mark  `leavingBracketedComment` as true, then  when call next char, bracketed comment level -1.
```
      if (leavingBracketedComment) {
        bracketedCommentLevel -= 1
        leavingBracketedComment = false
      }

```

But when meet `/*`,  it need   `!insideBracketedComment`, then means if we have a  case
```
/* aaa /* bbb */  ; ccc */ select 1;
```

when meet second `/*` , `insideBracketedComment` is true, so this `/*` won't be treat as a start of bracket comment.
Then meet the first `*/`, bracketed comment end, this  query is split as
```
/* aaa /* bbb */;    =>  comment
ccc */ select 1;   => query
```

Then query failed.

So here we remove the condition of `!insideBracketedComment`,  then we can have `bracketedCommentLevel > 1` and  since
```
 def insideBracketedComment: Boolean = bracketedCommentLevel > 0
```
So chars between all level of bracket are treated as comment.

### Why are the changes needed?
In spark #37389 we support nested bracketed comment in SQL, here for spark-sql we should support too.

### Does this PR introduce _any_ user-facing change?
User can use nested bracketed comment in spark-sql

### How was this patch tested?

Since spark-sql  console mode have special logic about handle `;`
```
    while (line != null) {
      if (!line.startsWith("--")) {
        if (prefix.nonEmpty) {
          prefix += '\n'
        }

        if (line.trim().endsWith(";") && !line.trim().endsWith("\\;")) {
          line = prefix + line
          ret = cli.processLine(line, true)
          prefix = ""
          currentPrompt = promptWithCurrentDB
        } else {
          prefix = prefix + line
          currentPrompt = continuedPromptWithDBSpaces
        }
      }
      line = reader.readLine(currentPrompt + "> ")
    }
```

If we write sql as below
```
/* SELECT /*+ BROADCAST(b) */ 4\\;
*/
SELECT  1
;
```
the `\\;` is escaped.

Manuel  test wit spark-sql -f
```
(spark.submit.pyFiles,)
(spark.submit.deployMode,client)
(spark.master,local[*])
Classpath elements:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
21/11/26 16:32:08 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/11/26 16:32:10 WARN HiveConf: HiveConf of name hive.stats.jdbc.timeout does not exist
21/11/26 16:32:10 WARN HiveConf: HiveConf of name hive.stats.retries.wait does not exist
21/11/26 16:32:13 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 2.3.0
21/11/26 16:32:13 WARN ObjectStore: setMetaStoreSchemaVersion called but recording version is disabled: version = 2.3.0, comment = Set by MetaStore yi.zhu10.12.189.175
Spark master: local[*], Application Id: local-1637915529831
/* select /* BROADCAST(b) */ 4;
*/
select  1

1
Time taken: 3.851 seconds, Fetched 1 row(s)
C02D45VVMD6T:spark yi.zhu$
```

In current PR, un completed bracket comment won't execute now, for SQL file
```
/* select /* BROADCAST(b) */ 4;
*/
select  1
;

/* select /* braoad */ ;
select 1;
```

It only execute
```
/* select /* BROADCAST(b) */ 4;
*/
select  1
;
```

The next part
```
/* select /* braoad */ ;
select 1;
```
 are still treated as inprogress SQL.

Closes #34721 from AngersZhuuuu/SPARK-37471.

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
  • Loading branch information
AngersZhuuuu authored and cloud-fan committed Dec 3, 2021
1 parent 544865d commit 6e19125
Show file tree
Hide file tree
Showing 2 changed files with 10 additions and 1 deletion.
Expand Up @@ -600,7 +600,7 @@ private[hive] class SparkSQLCLIDriver extends CliDriver with Logging {
} else if (insideBracketedComment && line.charAt(index - 1) == '*' ) {
// Decrements `bracketedCommentLevel` at the beginning of the next loop
leavingBracketedComment = true
} else if (hasNext && !insideBracketedComment && line.charAt(index + 1) == '*') {
} else if (hasNext && line.charAt(index + 1) == '*') {
bracketedCommentLevel += 1
}
}
Expand Down
Expand Up @@ -611,4 +611,13 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging {
Seq("--conf", s"${BUILTIN_HIVE_VERSION.key}=$builtinHiveVersion"))(
s"set ${BUILTIN_HIVE_VERSION.key};" -> builtinHiveVersion, "SET -v;" -> builtinHiveVersion)
}

test("SPARK-37471: spark-sql support nested bracketed comment ") {
runCliWithin(1.minute)(
"""
|/* SELECT /*+ HINT() */ 4; */
|SELECT 1;
|""".stripMargin -> "SELECT 1"
)
}
}

0 comments on commit 6e19125

Please sign in to comment.