[SPARK-18106][SQL] ANALYZE TABLE should raise a ParseException for invalid option #15640
dongjoon-hyun wants to merge 5 commits into apache:master from dongjoon-hyun:SPARK-18106
Conversation
Test build #67560 has finished for PR 15640 at commit
srinathshankar left a comment:
Thanks for the quick PR
```scala
// before
if (ctx.partitionSpec == null &&
    ctx.identifier != null &&
    ctx.identifier.getText.toLowerCase == "noscan") {
// after
if (ctx.partitionSpec == null && ctx.identifier != null) {
```
What if the partition spec is not null? What happens with something like
`ANALYZE TABLE mytable PARTITION (a) garbage`?
(Could you add a test for that?)
Maybe
```scala
if (ctx.identifier != null && ctx.identifier.getText.toLowerCase != "noscan") {
  throw new ParseException(s"Expected `NOSCAN` instead of `${ctx.identifier.getText}`", ctx)
}
```
could be moved to the top?
Thank you for the review; I'll handle that too.
Test case is added.
```scala
intercept("explain describe tables x", "Unsupported SQL statement")
}

test("SPARK-18106 analyze table") {
```
There are also parse tests for AnalyzeTable in sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala
Let's have these in the same place.
Ur, @srinathshankar.
I understand the reason why you put this there, so I looked at StatisticsSuite.scala in both the hive module and the sql module.
But we will not compare the values in this test case. If it's a parsing-only grammar test case, I prefer to put it in core.
What do you think about that?
If you want, I'll remove the normal cases which raise no exceptions.
Then maybe you can move those parse tests here? All I'm suggesting is that the parse tests all be together.
Thank you. But the parse tests would need to be rewritten. Is that okay? Those test cases use `assertAnalyzeCommand`, which relies on `SparkSession`, and look like the following.
```scala
def assertAnalyzeCommand(analyzeCommand: String, c: Class[_]) {
  val parsed = spark.sessionState.sqlParser.parsePlan(analyzeCommand)
  val operators = parsed.collect {
    case a: AnalyzeTableCommand => a
    case o => o
  }
  assert(operators.size === 1)
  if (operators(0).getClass() != c) {
    fail(
      s"""$analyzeCommand expected command: $c, but got ${operators(0)}
         |parsed command:
         |$parsed
       """.stripMargin)
  }
}

assertAnalyzeCommand(
  "ANALYZE TABLE Table1 COMPUTE STATISTICS",
  classOf[AnalyzeTableCommand])
```
In my opinion it's fine to rewrite and simplify. If you could do that, that would be great.
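In case a Spark-free illustration of such a simplification helps, the collect-and-compare pattern behind `assertAnalyzeCommand` can be sketched with a toy parser stand-in. Everything here is illustrative, not Spark's actual API: `Plan`, `AnalyzeTable`, `Unknown`, and `parse` are hypothetical names, and the regexes only mimic a tiny slice of the real grammar.

```scala
// Spark-free sketch: parse a string into a toy plan node, then
// assert on the class of the result, as assertAnalyzeCommand does.
sealed trait Plan
case class AnalyzeTable(table: String, noscan: Boolean) extends Plan
case class Unknown(sql: String) extends Plan

def parse(sql: String): Plan = {
  // Regex patterns in a Scala match anchor to the whole string.
  val Noscan = "(?i)analyze table (\\w+) compute statistics noscan".r
  val Plain  = "(?i)analyze table (\\w+) compute statistics".r
  sql match {
    case Noscan(t) => AnalyzeTable(t, noscan = true)
    case Plain(t)  => AnalyzeTable(t, noscan = false)
    case _         => Unknown(sql)
  }
}

def assertCommand(sql: String, c: Class[_]): Unit = {
  val parsed = parse(sql)
  assert(parsed.getClass == c, s"$sql expected $c but got $parsed")
}
```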
Yep, I have no objection to that; actually, I'd love to do that.
But I'd like to wait for some directional advice from a committer.
Let's keep the parsing unit test (this file) and the analyze table integration test separate for now.
Thank you for the guidance!
Hi, @srinathshankar.
Test build #67587 has finished for PR 15640 at commit
Test build #67588 has finished for PR 15640 at commit
Hi, @hvanhovell.
```scala
assertEqual("analyze table t compute statistics noscan",
  AnalyzeTableCommand(TableIdentifier("t"), noscan = true))
intercept("analyze table t compute statistics xxxx", "Expected `NOSCAN` instead of `xxxx`")
intercept("analyze table t partition (a) compute statistics xxxx")
```
Nit: add ``, "Expected `NOSCAN` instead of `xxxx`"`` here as well.
Thank you. It's done.
Test build #67598 has finished for PR 15640 at commit
hvanhovell left a comment:
This looks ok. I left a few comments.
```scala
// before
if (ctx.partitionSpec == null &&
    ctx.identifier != null &&
    ctx.identifier.getText.toLowerCase == "noscan") {
// after
if (ctx.identifier != null && ctx.identifier.getText.toLowerCase != "noscan") {
```
Move this check into the first if statement. There is no need to check this twice.
I also think that the treatment of the partitionSpec is quite funky. We scan the table as soon as a user defines a spec. Could you remove the null check? Maybe it is better to just log a warning message and do what the user specified.
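The option validation the reviewers are converging on can be sketched standalone. This is an illustrative simplification, not Spark's actual `AstBuilder` code: `AnalyzeOptionCheck` and `parseAnalyzeOption` are hypothetical names, and the real implementation works on an ANTLR parser context rather than an `Option[String]`.

```scala
// Hypothetical standalone sketch of the trailing-option check discussed above.
object AnalyzeOptionCheck {
  final case class ParseException(message: String) extends Exception(message)

  // Returns true when NOSCAN was given, false when no option was given,
  // and throws for any other trailing identifier.
  def parseAnalyzeOption(option: Option[String]): Boolean = option match {
    case None => false
    case Some(id) if id.toLowerCase == "noscan" => true
    case Some(id) => throw ParseException(s"Expected `NOSCAN` instead of `$id`")
  }
}
```

Doing this check first, as suggested, means every invalid trailing identifier fails fast regardless of whether a partition spec was given.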
Thank you for the review, @hvanhovell. I'll update the PR.
Test build #67792 has finished for PR 15640 at commit
The only test failure seems to be irrelevant. Let's see the final test result, which is still running.
Test build #67793 has finished for PR 15640 at commit
LGTM - merging to master. Thanks!
Thank you, @hvanhovell! Also, thank you, @srinathshankar.
…valid option
## What changes were proposed in this pull request?
Currently, the `ANALYZE TABLE` command accepts any `identifier` where the `NOSCAN` option is expected. This PR raises a ParseException for an unknown option.
**Before**
```scala
scala> sql("create table test(a int)")
res0: org.apache.spark.sql.DataFrame = []
scala> sql("analyze table test compute statistics blah")
res1: org.apache.spark.sql.DataFrame = []
```
**After**
```scala
scala> sql("create table test(a int)")
res0: org.apache.spark.sql.DataFrame = []
scala> sql("analyze table test compute statistics blah")
org.apache.spark.sql.catalyst.parser.ParseException:
Expected `NOSCAN` instead of `blah`(line 1, pos 0)
```
## How was this patch tested?
Pass the Jenkins test with a new test case.
Author: Dongjoon Hyun <dongjoon@apache.org>
Closes apache#15640 from dongjoon-hyun/SPARK-18106.