Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-17732][SQL] ALTER TABLE DROP PARTITION should support comparators #15987

Closed
wants to merge 1 commit into from
Closed

[SPARK-17732][SQL] ALTER TABLE DROP PARTITION should support comparators #15987

wants to merge 1 commit into from

Conversation

dongjoon-hyun
Copy link
Member

@dongjoon-hyun dongjoon-hyun commented Nov 23, 2016

What changes were proposed in this pull request?

This PR aims to support comparators, e.g. '<', '<=', '>', '>=', again in Apache Spark 2.0 for backward compatibility.

Spark 1.6

scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
res0: org.apache.spark.sql.DataFrame = [result: string]

scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
res1: org.apache.spark.sql.DataFrame = [result: string]

Spark 2.0

scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
res0: org.apache.spark.sql.DataFrame = []

scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '<' expecting {')', ','}(line 1, pos 42)

After this PR, it's supported.

How was this patch tested?

Pass the Jenkins test with a newly added testcase.

@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Nov 23, 2016

Hi, @hvanhovell .
Could you review this PR?

This is the first attempt to use UnresolvedAttribute and Analyzer rule. There are two debatable issues.

  • Catalyst Analyzer doesn't know AlterTableDropPartitions, so I need to introduce CommandWithExpression trait here.

  • I added a testcase for atomic types, but we need to change ADD PARTITION, too. ADD PARTITION relates the several more parts. If possible, I want to make it as a separate PR.

@SparkQA
Copy link

SparkQA commented Nov 23, 2016

Test build #69050 has finished for PR 15987 at commit 970a904.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait CommandWithExpression extends LeafNode

@rxin
Copy link
Contributor

rxin commented Nov 23, 2016

Why does it cast it to double? The fix looks pretty weird.

@dongjoon-hyun
Copy link
Member Author

Thank you for review, @rxin . In the expression, the partition attribute are assumed String, so Analyzer injects that.

I tried to implement the following recommended way in JIRA issue, but the PR may choose not a clean way. Do you mean CommandWithExpression in command.scala and Analyzer.scala?

The Analyzer is injecting casts because the dataType of the partition attribute is a String:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L223
This is not trivial to solve. If we want to properly fix this, we need to use UnresolvedAttributes and do some add some analysis for this command. Another - hackier - way of getting there would, to explicitly exclude this command from analysis.
I am personally a proponent of option 1.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-18515][SQL] AlterTableDropPartitions fails for non-string columns [SPARK-17732][SPARK-18515][SQL] ALTER TABLE DROP PARTITION should support comparators Nov 30, 2016
@SparkQA
Copy link

SparkQA commented Nov 30, 2016

Test build #69383 has finished for PR 15987 at commit 9db07af.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait CommandWithExpression extends LeafNode

@SparkQA
Copy link

SparkQA commented Nov 30, 2016

Test build #69397 has finished for PR 15987 at commit c35aeab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait CommandWithExpression extends LeafNode

@dongjoon-hyun
Copy link
Member Author

Hi, @hvanhovell .
Could you review this PR? This PR now is the new fix + the reverted commit.

// We should not do this here in order to support general CommandWithExpression
if (catalogTable.tableType == CatalogTableType.VIEW) {
throw new AnalysisException(
"Cannot alter a view with ALTER TABLE. Please use ALTER VIEW instead")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, run function of Command does every error handlings including this kind of errors.
Now, we should do that here because this exception has higher priority than unresolved attributes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except AlterTableDropPartitionCommand, do we have other running commands which need this resolving? If not, I think we may not need to add this to Analyzer and create new abstraction CommandWithExpression.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. There is no such command.
AlterTableDropPartitionCommand is not visible in catalyst. So, it's inevitable to create some corresponding one in catalyst.

This is an real issue. So, @rxin told me it's a weird.
Until now, I didn't find a better way to support expressions on Command.
I'm asking the committers not to use expressions here.

@dongjoon-hyun dongjoon-hyun changed the title [SPARK-17732][SPARK-18515][SQL] ALTER TABLE DROP PARTITION should support comparators [SPARK-17732][SQL] ALTER TABLE DROP PARTITION should support comparators Nov 30, 2016
@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Nov 30, 2016

Hi, @hvanhovell , @gatorsmile , @rxin .

Should we remove Expression related stuff?
Originally, #15302 does not have this kind of expression issue. It's just an extension of the current Spark.

But, while supporting expressions in partition spec, I made #15704 . but it's reverted now. IMO, the followings can be different issues.

  1. SUPPORT DROP BY FILTER ([SPARK-17732][SQL] ALTER TABLE DROP PARTITION should support comparators #15302)
  2. SUPPORT EXPRESSION ON PARTITION SPEC

Especially, Apache Spark Command is not designed to carry expression, so it seems to make things complex and weird like in this PR.
Also, for 2, we need to support expressions on ADD PARTITION, too. For example, current Spark do the following, but Hive makes a partition p=10.

scala> sql("create table tbl_x (a int) partitioned by (p int)")
scala> sql("alter table tbl_x add partition (p=10.0)")
scala> sql("show partitions tbl_x").show
+---------+
|partition|
+---------+
|   p=10.0|
+---------+

In short, this is beyond of 1, I think we had better go back the old version (without expression.) How do you think about this?

…port comparators

This PR aims to support `comparators`, e.g. '<', '<=', '>', '>=', again in Apache Spark 2.0 for backward compatibility.

**Spark 1.6**

``` scala
scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
res0: org.apache.spark.sql.DataFrame = [result: string]

scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
res1: org.apache.spark.sql.DataFrame = [result: string]
```

**Spark 2.0**

``` scala
scala> sql("CREATE TABLE sales(id INT) PARTITIONED BY (country STRING, quarter STRING)")
res0: org.apache.spark.sql.DataFrame = []

scala> sql("ALTER TABLE sales DROP PARTITION (country < 'KR')")
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '<' expecting {')', ','}(line 1, pos 42)
```

Pass the Jenkins test with a newly added testcase.
@dongjoon-hyun
Copy link
Member Author

Rebased to resolve conflicts.

@dongjoon-hyun
Copy link
Member Author

Hi, @hvanhovell .
Could you give me some overall directional advice?

@SparkQA
Copy link

SparkQA commented Dec 2, 2016

Test build #69574 has finished for PR 15987 at commit c7c31e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait CommandWithExpression extends LeafNode

@dongjoon-hyun
Copy link
Member Author

Retest this please

@SparkQA
Copy link

SparkQA commented Dec 7, 2016

Test build #69797 has finished for PR 15987 at commit c7c31e1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait CommandWithExpression extends LeafNode

@dongjoon-hyun
Copy link
Member Author

Hi, @hvanhovell .
I will remove the support of expression from this PR. Is it okay for you?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants