[SPARK-18106][SQL] ANALYZE TABLE should raise a ParseException for invalid option #15640
dongjoon-hyun wants to merge 5 commits into apache:master from dongjoon-hyun:SPARK-18106
Conversation
Test build #67560 has finished for PR 15640 at commit
srinathshankar left a comment:
Thanks for the quick PR
```scala
// before
if (ctx.partitionSpec == null &&
    ctx.identifier != null &&
    ctx.identifier.getText.toLowerCase == "noscan") {
// after
if (ctx.partitionSpec == null && ctx.identifier != null) {
```
What if the partition spec is not null? What happens with something like
`ANALYZE TABLE mytable PARTITION (a) garbage`?
(Could you add a test for that?)
Maybe
```scala
if (ctx.identifier != null && ctx.identifier.getText.toLowerCase != "noscan") {
  throw new ParseException(s"Expected `NOSCAN` instead of `${ctx.identifier.getText}`", ctx)
}
```
could be moved to the top?
Thank you for the review; I'll handle that too.
Test case is added.
```scala
intercept("explain describe tables x", "Unsupported SQL statement")
}

test("SPARK-18106 analyze table") {
```
There are also parse tests for AnalyzeTable in sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala
Let's have these in the same place.
Ur, @srinathshankar.
I understand the reason why you put this there, so I looked at StatisticsSuite.scala in both the hive module and the sql module.
But we will not compare the values in this test case. If it's a parsing-only grammar test case, I prefer to put it in core.
What do you think about that?
If you want, I'll remove the normal cases which raise no exceptions.
Then maybe you can move those parse tests here? All I'm suggesting is that the parse tests all be together.
Thank you. But the parse tests would need to be rewritten. Is that okay? Those test cases use `assertAnalyzeCommand`, which relies on `SparkSession`, and look like the following.
```scala
def assertAnalyzeCommand(analyzeCommand: String, c: Class[_]) {
  val parsed = spark.sessionState.sqlParser.parsePlan(analyzeCommand)
  val operators = parsed.collect {
    case a: AnalyzeTableCommand => a
    case o => o
  }
  assert(operators.size === 1)
  if (operators(0).getClass() != c) {
    fail(
      s"""$analyzeCommand expected command: $c, but got ${operators(0)}
         |parsed command:
         |$parsed
       """.stripMargin)
  }
}

assertAnalyzeCommand(
  "ANALYZE TABLE Table1 COMPUTE STATISTICS",
  classOf[AnalyzeTableCommand])
```
In my opinion it's fine to rewrite and simplify. If you could do that, that would be great.
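In case a Spark-free illustration of such a simplification helps, the collect-and-compare pattern behind `assertAnalyzeCommand` can be sketched with a toy parser stand-in. Everything here is illustrative, not Spark's actual API: `Plan`, `AnalyzeTable`, `Unknown`, and `parse` are hypothetical names, and the regexes only mimic a tiny slice of the real grammar.

```scala
// Spark-free sketch: parse a string into a toy plan node, then
// assert on the class of the result, as assertAnalyzeCommand does.
sealed trait Plan
case class AnalyzeTable(table: String, noscan: Boolean) extends Plan
case class Unknown(sql: String) extends Plan

def parse(sql: String): Plan = {
  // Regex patterns in a Scala match anchor to the whole string.
  val Noscan = "(?i)analyze table (\\w+) compute statistics noscan".r
  val Plain  = "(?i)analyze table (\\w+) compute statistics".r
  sql match {
    case Noscan(t) => AnalyzeTable(t, noscan = true)
    case Plain(t)  => AnalyzeTable(t, noscan = false)
    case _         => Unknown(sql)
  }
}

def assertCommand(sql: String, c: Class[_]): Unit = {
  val parsed = parse(sql)
  assert(parsed.getClass == c, s"$sql expected $c but got $parsed")
}
```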
Yep, I have no objection to that; actually, I'd love to do that.
But I'd like to wait for some directional advice from a committer.
Let's keep the parsing unit test (this file) and the analyze table integration test separate for now.
Thank you for the guidance!
Hi, @srinathshankar.
Test build #67587 has finished for PR 15640 at commit
Test build #67588 has finished for PR 15640 at commit
Hi, @hvanhovell.
```scala
assertEqual("analyze table t compute statistics noscan",
  AnalyzeTableCommand(TableIdentifier("t"), noscan = true))
intercept("analyze table t compute statistics xxxx", "Expected `NOSCAN` instead of `xxxx`")
intercept("analyze table t partition (a) compute statistics xxxx")
```
Nit: add ``, "Expected `NOSCAN` instead of `xxxx`"`` here as well.
Thank you. It's done.
Test build #67598 has finished for PR 15640 at commit
hvanhovell left a comment:
This looks ok. I left a few comments.
```scala
// before
if (ctx.partitionSpec == null &&
    ctx.identifier != null &&
    ctx.identifier.getText.toLowerCase == "noscan") {
// after
if (ctx.identifier != null && ctx.identifier.getText.toLowerCase != "noscan") {
```
Move this check into the first if statement. There is no need to check this twice.
I also think that the treatment of the partitionSpec is quite funky. We scan the table as soon as a user defines a spec. Could you remove the null check? Maybe it is better to just log a warning message and do what the user specified.
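The option validation the reviewers are converging on can be sketched standalone. This is an illustrative simplification, not Spark's actual `AstBuilder` code: `AnalyzeOptionCheck` and `parseAnalyzeOption` are hypothetical names, and the real implementation works on an ANTLR parser context rather than an `Option[String]`.

```scala
// Hypothetical standalone sketch of the trailing-option check discussed above.
object AnalyzeOptionCheck {
  final case class ParseException(message: String) extends Exception(message)

  // Returns true when NOSCAN was given, false when no option was given,
  // and throws for any other trailing identifier.
  def parseAnalyzeOption(option: Option[String]): Boolean = option match {
    case None => false
    case Some(id) if id.toLowerCase == "noscan" => true
    case Some(id) => throw ParseException(s"Expected `NOSCAN` instead of `$id`")
  }
}
```

Doing this check first, as suggested, means every invalid trailing identifier fails fast regardless of whether a partition spec was given.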
Thank you for the review, @hvanhovell. I'll update the PR.
Test build #67792 has finished for PR 15640 at commit
The only test failure seems to be irrelevant. Let's see the final test result, which is still running.
Test build #67793 has finished for PR 15640 at commit
LGTM - merging to master. Thanks!
Thank you, @hvanhovell! Also, thank you, @srinathshankar.
…valid option
## What changes were proposed in this pull request?
Currently, the `ANALYZE TABLE` command accepts any `identifier` where the `NOSCAN` option is expected. This PR raises a ParseException for an unknown option.
**Before**
```scala
scala> sql("create table test(a int)")
res0: org.apache.spark.sql.DataFrame = []
scala> sql("analyze table test compute statistics blah")
res1: org.apache.spark.sql.DataFrame = []
```
**After**
```scala
scala> sql("create table test(a int)")
res0: org.apache.spark.sql.DataFrame = []
scala> sql("analyze table test compute statistics blah")
org.apache.spark.sql.catalyst.parser.ParseException:
Expected `NOSCAN` instead of `blah`(line 1, pos 0)
```
## How was this patch tested?
Pass the Jenkins test with a new test case.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes apache#15640 from dongjoon-hyun/SPARK-18106.